.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

Flow dissector is a routine that parses metadata out of the packets. It's
used in various places in the networking subsystem (RFS, flow hash, etc).

BPF flow dissector is an attempt to reimplement C-based flow dissector logic
in BPF to gain all the benefits of BPF verifier (namely, limits on the
number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` is ``struct bpf_flow_keys`` and contains flow dissector input
and output arguments.

The inputs are:
  * ``nhoff`` - initial offset of the networking header
  * ``thoff`` - initial offset of the transport header, initialized to nhoff
  * ``n_proto`` - L3 protocol type, parsed out of L2 header
  * ``flags`` - optional flags

Flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should
also be adjusted accordingly.

The return code of the BPF program is either BPF_OK to indicate successful
dissection, or BPF_DROP to indicate parsing error.

__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

  +------+------+------------+-----------+
  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
  +------+------+------------+-----------+
                             ^
                             |
                             +-- flow dissector starts here


.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In case of VLAN, flow dissector can be called in two different states.


Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                       ^
                       |
                       +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of TCI
  flow_keys->thoff = nhoff
  flow_keys->n_proto = TPID

Please note that TPID can be 802.1AD and, hence, BPF program would
have to parse VLAN information twice for double tagged packets.


Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                         ^
                                         |
                                         +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In this case VLAN information has been processed before the flow dissector
and BPF flow dissector is not required to handle it.


The takeaway here is as follows: BPF flow dissector program can be called with
the optional VLAN header and should gracefully handle both cases: when single
or double VLAN is present and when it is not present. The same program
can be called in either case and has to be written carefully to handle both.


Flags
=====

``flow_keys->flags`` might contain optional input flags that work as follows:

* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells BPF flow dissector to
  continue parsing the first fragment; the default expected behavior is that
  flow dissector returns as soon as it finds out that the packet is fragmented;
  used by ``eth_get_headlen`` to estimate length of all headers for GRO.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells BPF flow dissector to
  stop parsing as soon as it reaches IPv6 flow label; used by
  ``___skb_get_hash`` and ``__skb_get_hash_symmetric`` to get flow hash.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells BPF flow dissector to stop
  parsing as soon as it reaches encapsulated headers; used by routing
  infrastructure.


Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can be used to load BPF flow dissector program as well.

The reference implementation is organized as follows:
  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
  * ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
    does ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
jmp_table is used instead to handle multiple levels of encapsulation (and
IPv6 options).


Current Limitations
===================
BPF flow dissector doesn't support exporting all the metadata that in-kernel
C-based implementation can export. Notable example is single VLAN (802.1Q)
and double VLAN (802.1AD) tags. Please refer to the ``struct bpf_flow_keys``
for a set of information that can currently be exported from the BPF context.

When BPF flow dissector is attached to the root network namespace (machine-wide
policy), users can't override it in their child network namespaces.
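
As an illustration of the VLAN handling described in the ``__sk_buff->data``
section, the following userspace C sketch shows one way the pre-VLAN,
post-VLAN and VLAN-less entry states could be normalized before L3 parsing.
It is not the reference implementation and is not a loadable BPF program:
``struct sketch_flow_keys``, ``sketch_skip_vlan`` and the ``SKETCH_*`` return
codes are hypothetical names that only mirror a subset of
``struct bpf_flow_keys`` and ``BPF_OK``/``BPF_DROP``. As a simplifying
assumption, ``n_proto`` is kept in host byte order here, whereas the kernel
structure stores it in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP     0x0800
#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8
#define VLAN_HLEN    4  /* TCI (2 bytes) + encapsulated EtherType (2 bytes) */

/* Hypothetical stand-ins for BPF_OK / BPF_DROP. */
enum { SKETCH_OK = 0, SKETCH_DROP = 2 };

/* Hypothetical subset of struct bpf_flow_keys; n_proto is host order here. */
struct sketch_flow_keys {
	uint16_t nhoff;
	uint16_t thoff;
	uint16_t n_proto;
};

static int sketch_is_vlan(uint16_t proto)
{
	return proto == ETH_P_8021Q || proto == ETH_P_8021AD;
}

/*
 * Skip up to two VLAN tags so that nhoff points at the first byte of
 * L3_HEADER and n_proto holds the encapsulated EtherType.
 */
static int sketch_skip_vlan(const uint8_t *data, const uint8_t *data_end,
			    struct sketch_flow_keys *keys)
{
	int depth;

	for (depth = 0; depth < 2; depth++) {
		const uint8_t *tag;

		/* Post-VLAN or VLAN-less state: nhoff is already at L3. */
		if (!sketch_is_vlan(keys->n_proto))
			break;

		/* Pre-VLAN state: nhoff points at the first byte of TCI. */
		tag = data + keys->nhoff;
		if (tag + VLAN_HLEN > data_end)
			return SKETCH_DROP; /* truncated tag */

		/* Bytes 2..3 after TCI carry the next EtherType or TPID. */
		keys->n_proto = (uint16_t)((tag[2] << 8) | tag[3]);
		keys->nhoff += VLAN_HLEN;
		keys->thoff = keys->nhoff;
	}

	/* More than two stacked tags are not handled by this sketch. */
	if (sketch_is_vlan(keys->n_proto))
		return SKETCH_DROP;

	return SKETCH_OK;
}
```

With a single 802.1Q tag and ``nhoff`` initially at offset 14 (the TCI), the
sketch leaves ``nhoff`` and ``thoff`` at 18 and ``n_proto`` at the
encapsulated EtherType. When called in the post-VLAN or VLAN-less state it
returns without touching the keys, which is how one routine can serve every
entry state.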