|
| 1 | +#+Title: Controlling TC qdisc TXQ selection via BPF |
| 2 | + |
| 3 | +* Use-case |
| 4 | + |
| 5 | +As a policy, we don't want any traffic generated by the Linux networking stack |
| 6 | +to use transmit queue *zero*. |
| 7 | + |
| 8 | +This use-case is connected with =AF_XDP=. The example |
| 9 | +[[file:../AF_XDP-interaction/]] sends important Real-Time traffic on XDP-socket |
| 10 | +queue zero. Some HW and NIC drivers (e.g. igb and igc) don't have enough |
| 11 | +hardware TX-queues to allocate separate queues for XDP. Thus, these queues are |
| 12 | +shared between XDP and the network stack, which can lead to lock contention |
| 13 | +as well as contention for the HW queue itself. |
| 14 | + |
| 15 | +* Example |
| 16 | + |
| 17 | +The BPF code in this example is rather simple: |
| 18 | + - See: [[file:tc_txq_policy_kern.c]] |
| 19 | + |
| 20 | +This BPF program is meant to be loaded in the TC *egress* hook. |
| 21 | + |
| 22 | +** TC-BPF loader |
| 23 | + |
| 24 | +The =tc= cmdline tool is notoriously difficult to use, and has issues (mounting |
| 25 | +the BPF file-system) on Yocto builds. |
| 26 | + |
| 27 | +Thus, [[file:tc_txq_policy.c]] contains a C-code loader that attaches the BPF-prog to |
| 28 | +the TC-hook, without depending on the =tc= command util. Furthermore, the loader |
| 29 | +uses the =bpftool= skeleton feature (which generates a header file), allowing us to |
| 30 | +build a binary that embeds the BPF-object itself, making it self-contained. |
| 31 | + |
| 32 | +* Gotchas: XPS |
| 33 | + |
| 34 | +For the TXQ (=queue_mapping=) overwrite to work, you need to *disable* XPS (Transmit |
| 35 | +Packet Steering), as XPS takes precedence over our BPF change to |
| 36 | +=queue_mapping=. This is done by writing 0 into the =xps_cpus= file of each |
| 37 | +tx-queue: =/sys/class/net/DEV/queues/tx-*/xps_cpus=. |
| 38 | + |
| 39 | +A script for configuring and disabling XPS is provided here: [[file:xps_setup_ash.sh]]. |
| 40 | + |
| 41 | +Script command line to disable XPS: |
| 42 | +#+begin_src sh |
| 43 | + sudo ./xps_setup_ash.sh --dev DEVICE --default --disable |
| 44 | +#+end_src |
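For setups where a shell script is inconvenient, the same "write 0 into every
=xps_cpus= file" step could also be done from C. The following is only a minimal
sketch and is not part of this example: the function name =xps_disable= and its
=sys_net= parameter (normally =/sys/class/net=) are hypothetical names chosen
for illustration.

#+begin_src c
#include <glob.h>
#include <stdio.h>

// Hypothetical helper: write "0" into the xps_cpus file of every TX queue
// found under <sys_net>/<dev>/queues/. Returns the number of files written,
// or -1 if no queue matched or a write failed.
int xps_disable(const char *sys_net, const char *dev)
{
	char pattern[512];
	glob_t g;
	int written = 0;
	int ok = 1;
	int n;

	n = snprintf(pattern, sizeof(pattern),
		     "%s/%s/queues/tx-*/xps_cpus", sys_net, dev);
	if (n < 0 || (size_t)n >= sizeof(pattern))
		return -1;

	// glob() returns non-zero when nothing matches or on error
	if (glob(pattern, 0, NULL, &g) != 0)
		return -1;

	for (size_t i = 0; ok && i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "w");

		if (!f) {
			ok = 0;
			break;
		}
		if (fputs("0\n", f) == EOF)
			ok = 0;
		// The write may only reach the file at fclose(), so check it too
		if (fclose(f) != 0)
			ok = 0;
		if (ok)
			written++;
	}
	globfree(&g);
	return ok ? written : -1;
}
#+end_src

A caller would typically invoke it as =xps_disable("/sys/class/net", "eth1")=
with root privileges. Taking the base directory as a parameter makes the helper
easy to try against a scratch directory first.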
| 45 | + |
| 46 | +* Different ways to view queue_mapping |
| 47 | + |
| 48 | +Notice that the =queue_mapping= set in the BPF-prog is treated like an RX-recorded |
| 49 | +number (=skb_rx_queue_recorded=). When reaching the TX-layer, it will have been |
| 50 | +decremented by one (by =skb_get_rx_queue()=) at the TX netstack processing stage |
| 51 | +(in =__dev_queue_xmit()=). |
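The off-by-one can be illustrated with a small userspace model. This is a sketch
only: =struct skb_model= and the function names are stand-ins invented for this
illustration. They mimic the "stored value is queue index plus one" convention
of the kernel helpers named above, and ignore XPS and any wrap-around against
the number of real TX queues.

#+begin_src c
#include <stdbool.h>
#include <stdint.h>

// Userspace stand-in for the single sk_buff field discussed here
struct skb_model {
	uint16_t queue_mapping;
};

// Models skb_rx_queue_recorded(): zero means "no queue recorded"
bool rx_queue_recorded(const struct skb_model *skb)
{
	return skb->queue_mapping != 0;
}

// Models skb_get_rx_queue(): the stored value is the queue index plus one
uint16_t get_rx_queue(const struct skb_model *skb)
{
	return skb->queue_mapping - 1;
}

// Value a BPF-prog would store so that the decrement yields TX queue txq
uint16_t queue_mapping_for_txq(uint16_t txq)
{
	return txq + 1;
}
#+end_src

Under this model, a BPF-prog that wants TX queue 2 stores 3 in =queue_mapping=,
and storing 1 would land on TX queue zero, which is the queue this policy is
trying to avoid.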
| 52 | + |
| 53 | +** perf probe |
| 54 | + |
| 55 | +The perf tool can be used for recording and inspecting the =skb->queue_mapping=. |
| 56 | + |
| 57 | +Remember: the BPF-prog =queue_mapping= setting has been decremented by one at this |
| 58 | +TX netstack processing stage. |
| 59 | + |
| 60 | +#+begin_src sh |
| 61 | +perf probe -a 'dev_hard_start_xmit skb->dev->name:string skb->queue_mapping skb->hash' |
| 62 | +Added new event: |
| 63 | + probe:dev_hard_start_xmit (on dev_hard_start_xmit with name=skb->dev->name:string queue_mapping=skb->queue_mapping hash=skb->hash) |
| 64 | + |
| 65 | +You can now use it in all perf tools, such as: |
| 66 | + perf record -e probe:dev_hard_start_xmit -aR sleep 1 |
| 67 | +#+end_src |
| 68 | + |
| 69 | +Afterwards, run =perf script= to see the results. |
| 70 | + |
| 71 | +** bpftrace |
| 72 | + |
| 73 | +It is also possible to monitor TXQ usage via a =bpftrace= script. |
| 74 | + * see [[file:monitor_txq_usage.bt]]. |
| 75 | + |
| 76 | +The main part of the script is: |
| 77 | +#+begin_src sh |
| 78 | + tracepoint:net:net_dev_start_xmit { |
| 79 | + $qm = args->queue_mapping; |
| 80 | + $dev = str(args->name, 15); |
| 81 | + |
| 82 | + @stat_txq_usage[$dev] = lhist($qm, 0,32,1); |
| 83 | + } |
| 84 | +#+end_src |
| 85 | + |
| 86 | +Or as a one-liner: |
| 87 | +#+begin_src sh |
| 88 | + bpftrace -e 't:net:net_dev_start_xmit {@txq[str(args->name, 15)]=lhist(args->queue_mapping, 0,32,1)}' |
| 89 | +#+end_src |
| 90 | + |
| 91 | +* Inspecting loaded BPF |
| 92 | + |
| 93 | +How do you see if these BPF TC-hook programs are loaded? |
| 94 | + |
| 95 | +** bpftool |
| 96 | + |
| 97 | +The cmdline =bpftool net= can list any network-related BPF programs: |
| 98 | + |
| 99 | +#+begin_example |
| 100 | + root@main-ctrl2:~ # bpftool net |
| 101 | + xdp: |
| 102 | + eth1(5) driver id 59 |
| 103 | + |
| 104 | + tc: |
| 105 | + eth1(5) clsact/egress not_txq_zero:[17] id 17 |
| 106 | +#+end_example |
| 107 | + |
| 108 | +There we see both the *XDP* BPF-program used by AF_XDP to redirect frames, and |
| 109 | +the *TC* hook BPF-prog loaded and attached. |
| 110 | + |
| 111 | +** tc egress |
| 112 | + |
| 113 | +The tc command needs to be longer and more explicit: |
| 114 | +#+begin_example |
| 115 | + root@main-ctrl2:~ # tc filter show dev eth1 egress |
| 116 | + filter protocol all pref 49199 bpf chain 0 |
| 117 | + filter protocol all pref 49199 bpf chain 0 handle 0x1 not_txq_zero:[17] direct-action not_in_hw id 17 tag a761e11074b78959 jited |
| 118 | +#+end_example |