Commit c14f52a

Merge pull request #38 from xdp-project/vestas06_tc_qdisc
TC policy example of overriding netstack TXQ
2 parents 007c0b6 + 91432fe commit c14f52a

7 files changed

+767
-0
lines changed

tc-policy/Makefile

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)

USER_TARGETS := tc_txq_policy
BPF_TARGETS := tc_txq_policy_kern

# Depend on bpftool for auto generating
# skeleton header file tc_txq_policy_kern.skel.h
#
BPF_SKEL_OBJ := tc_txq_policy_kern.o

LIB_DIR = ../lib

include $(LIB_DIR)/common.mk

tc-policy/README.org

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
#+Title: Controlling TC qdisc TXQ selection via BPF

* Use-case

As a policy, we don't want any traffic generated by the Linux networking stack
to use transmit queue *zero*.

This use-case is connected with =AF_XDP=. The example
[[file:../AF_XDP-interaction/]] sends important real-time traffic on XDP-socket
queue zero. Some hardware and NIC drivers (e.g. igb and igc) don't have enough
hardware TX-queues to allocate separate queues for XDP. Thus, these queues are
shared between XDP and the network stack, which can cause both lock contention
and contention for the hardware queue itself.

* Example

The BPF code in this example is rather simple:
- See: [[file:tc_txq_policy_kern.c]]

This BPF program is meant to be loaded in the TC *egress* hook.

** TC-BPF loader

The =tc= cmdline tool is notoriously difficult to use, and has issues (mounting
the BPF file-system) on Yocto builds.

Thus, [[file:tc_txq_policy.c]] contains a C-code loader that attaches the BPF-prog to
the TC-hook without depending on the =tc= command util. Furthermore, the loader
uses the =bpftool= skeleton feature (to generate a header file), which allows
creating a binary that contains the BPF-object itself, making it self-contained.

* Gotchas: XPS

For the TXQ (=queue_mapping=) overwrite to work, you need to *disable* XPS (Transmit
Packet Steering), as XPS takes precedence over our BPF change to
=queue_mapping=. This is done by writing 0 into each tx-queue file
=/sys/class/net/DEV/queues/tx-*/xps_cpus=.

A script for configuring and disabling XPS is provided here: [[file:xps_setup_ash.sh]].

Script command line to disable XPS:
#+begin_src sh
sudo ./xps_setup_ash.sh --dev DEVICE --default --disable
#+end_src
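
As a rough illustration of the sysfs write described in this section, the following C sketch clears XPS for one device. The function name =disable_xps= and its =sysfs_net= parameter are hypothetical and not part of this example; the provided script remains the intended tool. Passing the sysfs root as a parameter is only a convenience, so the helper can be tried against a scratch directory first.

```c
#include <glob.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of this example): write "0" into every
 * <sysfs_net>/<dev>/queues/tx-N/xps_cpus file.
 *
 * Returns the number of files written, 0 if no TX queue files matched,
 * or -1 on error.
 */
static int disable_xps(const char *sysfs_net, const char *dev)
{
	char pattern[512];
	glob_t matches;
	int written = 0;

	int len = snprintf(pattern, sizeof(pattern),
			   "%s/%s/queues/tx-*/xps_cpus", sysfs_net, dev);
	if (len < 0 || (size_t)len >= sizeof(pattern))
		return -1; /* path did not fit in the buffer */

	int rc = glob(pattern, 0, NULL, &matches);
	if (rc == GLOB_NOMATCH)
		return 0;
	if (rc != 0)
		return -1;

	for (size_t i = 0; i < matches.gl_pathc; i++) {
		FILE *f = fopen(matches.gl_pathv[i], "w");
		if (!f) {
			written = -1;
			break;
		}
		/* An all-zero CPU mask disables XPS for this TX queue */
		int ok = fputs("0\n", f) >= 0;
		if (fclose(f) != 0)
			ok = 0;
		if (!ok) {
			written = -1;
			break;
		}
		written++;
	}

	globfree(&matches);
	return written;
}
```

Calling =disable_xps("/sys/class/net", "eth1")= as root would be expected to clear the mask on every TX queue of =eth1=; reading back one of the =xps_cpus= files is a quick way to check.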

* Different ways to view queue_mapping

Notice that the =queue_mapping= set in the BPF-prog is treated like an RX-recorded
number (=skb_rx_queue_recorded=), which is 1-indexed with zero meaning "not
recorded". When reaching the TX-layer it will have been decremented by one (by
=skb_get_rx_queue()=) at the TX netstack processing stage (in
=__dev_queue_xmit()=).
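
The off-by-one can be made concrete with a small userspace model. This is a simplified sketch, not kernel code: the =model_*= names are made up for illustration, and the wrap-around in =model_txq_from_mapping= only approximates what the kernel does for a device without traffic classes, assuming XPS is disabled and no socket has a cached TX queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the one sk_buff field we care about */
struct model_skb {
	uint16_t queue_mapping; /* 0 means "no RX queue recorded" */
};

/* Same convention as skb_record_rx_queue(): store queue index plus one */
static void model_record_rx_queue(struct model_skb *skb, uint16_t rx_queue)
{
	skb->queue_mapping = (uint16_t)(rx_queue + 1);
}

/* Same convention as skb_rx_queue_recorded() */
static bool model_rx_queue_recorded(const struct model_skb *skb)
{
	return skb->queue_mapping != 0;
}

/* Same convention as skb_get_rx_queue(): undo the plus one */
static uint16_t model_get_rx_queue(const struct model_skb *skb)
{
	return (uint16_t)(skb->queue_mapping - 1);
}

/*
 * Hypothetical helper: which TXQ a given queue_mapping leads to on a
 * device with num_txq queues. Returns -1 when nothing is recorded, in
 * which case the real stack would fall back to hashing the flow.
 */
static int model_txq_from_mapping(const struct model_skb *skb, uint16_t num_txq)
{
	assert(num_txq > 0);

	if (!model_rx_queue_recorded(skb))
		return -1;

	uint16_t txq = model_get_rx_queue(skb);
	while (txq >= num_txq) /* wrap values beyond the last queue */
		txq = (uint16_t)(txq - num_txq);

	return txq;
}
```

With 4 TX queues, a BPF-prog that writes =queue_mapping = 4= ends up on TXQ 3 in this model, while writing 1 would land on TXQ 0, which is the queue this policy wants to avoid.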

** perf probe

The perf tool can be used for recording and inspecting =skb->queue_mapping=.

Remember: the BPF-prog =queue_mapping= setting has been decremented by one at this
TX netstack processing stage.

#+begin_src sh
perf probe -a 'dev_hard_start_xmit skb->dev->name:string skb->queue_mapping skb->hash'
Added new event:
  probe:dev_hard_start_xmit (on dev_hard_start_xmit with name=skb->dev->name:string queue_mapping=skb->queue_mapping hash=skb->hash)

You can now use it in all perf tools, such as:
  perf record -e probe:dev_hard_start_xmit -aR sleep 1
#+end_src

Afterwards, run =perf script= to see the results.

** bpftrace

It is also possible to monitor TXQ usage via a =bpftrace= script.
- See: [[file:monitor_txq_usage.bt]]

The main part of the script is:
#+begin_src sh
tracepoint:net:net_dev_start_xmit {
	$qm = args->queue_mapping;
	$dev = str(args->name, 15);

	@stat_txq_usage[$dev] = lhist($qm, 0,32,1);
}
#+end_src

Or as a one-liner:
#+begin_src sh
bpftrace -e 't:net:net_dev_start_xmit {@txq[str(args->name, 15)]=lhist(args->queue_mapping, 0,32,1)}'
#+end_src

* Inspecting loaded BPF

How do you see if these BPF TC-hook programs are loaded?

** bpftool

The cmdline =bpftool net= can list any network-related BPF program:

#+begin_example
root@main-ctrl2:~ # bpftool net
xdp:
eth1(5) driver id 59

tc:
eth1(5) clsact/egress not_txq_zero:[17] id 17
#+end_example

There we see both the *XDP* BPF-program used by AF_XDP to redirect frames, and
the *TC* hook BPF-prog loaded and attached.

** tc egress

The tc command needs to be longer and more explicit:
#+begin_example
root@main-ctrl2:~ # tc filter show dev eth1 egress
filter protocol all pref 49199 bpf chain 0
filter protocol all pref 49199 bpf chain 0 handle 0x1 not_txq_zero:[17] direct-action not_in_hw id 17 tag a761e11074b78959 jited
#+end_example

tc-policy/adv_monitor_txq_usage.bt

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
#!/usr/bin/bpftrace

//BEGIN {
//	printf("Monitor TXQ usage\n");
//	printf(" - Remember: BPF set queue_mapping is one-less here (zero-indexed)\n");
//}

tracepoint:net:net_dev_start_xmit {
	$qm = args->queue_mapping;
	$dev = str(args->name, 16);

	@stat_txq_usage[$dev] = lhist($qm, 0,32,1);
}

/*
 * More precisely we actually want to see what netdev_pick_tx() is
 * selecting, as sockets can possibly return another queue_id.
 */

kprobe:netdev_pick_tx {
	$dev = ((struct net_device *)arg0)->name;
	@record[cpu] = $dev;
}

kretprobe:netdev_pick_tx {
	$dev = @record[cpu];
	@netdev_pick_tx[$dev] = lhist(retval, 0,32,1);
}

/* Periodically print stats */
interval:s:3
{
	printf("\nPeriodic show stats - time: ");
	time();
	print(@stat_txq_usage);
	print(@netdev_pick_tx);
}

/* Default bpftrace will print all remaining maps at END */
//END {
//	printf("END stats:\n");
//}

tc-policy/monitor_txq_usage.bt

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
#!/usr/bin/bpftrace

//BEGIN {
//	printf("Monitor TXQ usage\n");
//	printf(" - Remember: BPF set queue_mapping is one-less here (zero-indexed)\n");
//}

tracepoint:net:net_dev_start_xmit {
	$qm = args->queue_mapping;
	$dev = str(args->name, 16);

	@stat_txq_usage[$dev] = lhist($qm, 0,32,1);
}

/* Periodically print stats */
interval:s:3
{
	printf("\nPeriodic show stats - time: ");
	time();
	print(@stat_txq_usage);
}

/* Default bpftrace will print all remaining maps at END */
//END {
//	printf("END stats:\n");
//}
