My other question is why does it take so long to copy an app out of a dmg and into /Applications. Like, just change some pointers to pointers to data on disk and shit.
Mount natively in macOS
> why does it take so long to copy [...] out of a dmg
Compression mostly. DMG contents can optionally be compressed using zlib, lzfse, or slow as molasses bzip2.
Also Gatekeeper.
https://github.com/albertz/PyCParser/blob/master/demos/disse...
Would be nice if they moved to storing diff sidecars.
import sys
import struct
from collections import namedtuple
# Bake the layout once into a reusable, precompiled object.
HEADER = struct.Struct(">4sIIIQQ16sQQIIII")
# struct only knows positions, not names โ pair it with a namedtuple
# to recover the named-field access that cstruct gives you for free.
Header = namedtuple("Header", [
"magic", "field4", "field8", "fieldC",
"field10", "field18", "field20",
"field30", "field38",
"field40", "field44", "field48", "field4C",
])
with open(sys.argv[1], "rb") as fh:
header = Header._make(HEADER.unpack(fh.read(HEADER.size)))
print(header)
To me, this seems significantly less readable... less Pythonic, even. The printed output is also less readable.But yeah, while I think the `cstruct` helper function to describe a binary data layout in Python is more elegant than the builtin alternatives, it would have been much less painful to just go with a minimal C command line program (or any other programming language where a struct directly maps to memory). Python and most other scripting languages have been built for manipulating text data, but suck when working with binary data.
C structs seem less bad than python structs, so why not use them? Especially why write a struct parser and create a DSL for it, when there's already one that you can use that uses a well known DSL you might already understand.
The library used in the author's post seems perfectly readable to me, enough that it didn't even register until I read your comment. Could it be tweaked slightly to not use C syntax? Sure, but it's still going to need roughly the same pattern of identifier + type (including size). Types in C are straightforward so long as you don't have functions/pointers (which have the "inside out" problem, but they're not needed for binary encodings), so you're going to be looking at pretty trivial changes to syntax. Certainly not enough to warrant this level of quibbling.
And, as a bonus, creating, say, a filesystem implementation is now often as easy as copy/pasting existing C structure definitions, either from the original source (which is usually C) or from reversing tools such as IDA/Ghidra.
Thereโs no right or wrong way in my opinion, just preferences.
2026-06-18 ยท 18 min read ยท Erik Schamper
At WWDC 2025, Apple announced macOS 26 Tahoe. One of the new features in macOS Tahoe is a new disk image format: ASIF. Designed for use with virtual machines (its documentation lives under the Virtualization framework), ASIF takes a lot of inspiration from existing virtual disk formats. Practically, that means itโs another sparse virtual disk format, and functions very similar to sparse VMDK, VHDX or QCOW2 files (for the uninitiated, it allow you to store a large disk, or file, in a smaller, โsparseโ manner).
Shortly before the release of macOS Tahoe (late 2025), I thought itโd be a fun exercise to try and write a parser for ASIF files. Itโs been a while since then, but I wanted to go back and show my process on how I approach these kinds of problems. Maybe someone unfamiliar with reverse engineering file formats can pick up a thing or two. For that reason, you can find the occasional โResearch noteโ sprinkled throughout this post with some additional insights.
Letโs create a test file with the command listed in the Apple documentation, write a test pattern to it and get started:
Research note
For testing purposes, I usually like to write a test pattern that allows me to verify the content matches the โoffsetโ. In this case, basically just numbered 1 MiB blocks of bytes. There are definitely better test patterns, but for an initial peek at the file format, itโs also important to just fill up the file with anything. Having a predictable and verifiable pattern can make later steps easier.
โฏ diskutil image create blank --fs none --format ASIF --size 1GiB file
file.asif created
โฏ diskutil image attach -nomount file.asif
/dev/disk4
โฏ python3
Python 3.14.0 (main, Oct 7 2025, 09:34:52) [Clang 17.0.0 (clang-1700.3.19.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> fh = open("/dev/disk4", "wb")
>>> for i in range(255):
... fh.write(bytes([i] * 1024 * 1024))
>>> fh.close()
โฏ hdiutil detach disk4
"disk4" ejected.
As usual, we start by eyeballing some hexdumps to see if we can discern some details.
โฏ xxd file.asif | head -5
00000000
73686477000000010000020000000000
shdwยทยทยทยทยทยทยทยทยทยทยทยท
00000010
00000000000002000000000000041400
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
00000020
8af9ead2cf3849c08eec0095cf5c7899
ยทยทยทยทยท8Iยทยทยทยทยทยท\xยท
00000030
00000000001dcd650000080000000000
ยทยทยทยทยทยทยทeยทยทยทยทยทยทยทยท
00000040
001000000200000000000000ffffffff
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
We can immediately spot some kind of file magic, followed by some big endian looking integers.
Research note
Whenever youโre reverse engineering a file format and you see some magic bytes, itโs always a good idea to search for any available information on it online. I usually search for a combination of the string/byte representation, big endian hex and little endian hex of the file magic in various search engines (Google, GitHub, VirusTotal Retrohunt). In this case, I didnโt find much useful information.
As for โspotting endiannessโ or integer fields, itโs almost like riding a bicycle after a while. I guess a tip is to scan from left to right in chunks of 4 bytes (
uint32), then 8 bytes (uint64), then possibly dividing into smaller chunks (uint16or evenuint8), until you can parse out reasonable looking integers (round base 16, or cross reference with offsets in the file, optionally multiplied by other values you spot). If you see โnatural orderโ looking integers, itโs big endian. If it looks reverse, itโs little endian.
Letโs quickly type up a rough structure, making a best guess at the integer widths and inspect it further with dissect.cstruct:
# /// script
# requires-python = ">=3.10"
# dependencies = ["dissect.cstruct"]
# ///
import sys
from dissect.cstruct import cstruct, dumpstruct
asif_def = """
struct header {
char magic[4];
uint32 field4;
uint32 field8;
uint32 fieldC;
uint64 field10;
uint64 field18;
char field20[16];
uint64 field30;
uint64 field38;
uint32 field40;
uint32 field44;
uint32 field48;
uint32 field4C;
};
"""
c_asif = cstruct(asif_def, endian=">")
with open(sys.argv[1], "rb") as fh:
header = c_asif.header(fh)
dumpstruct(header)
00000000
73686477000000010000020000000000
shdwยทยทยทยทยทยทยทยทยทยทยทยท
00000010
00000000000002000000000000041400
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
00000020
8af9ead2cf3849c08eec0095cf5c7899
ยทยทยทยทยท8Iยทยทยทยทยทยท\xยท
00000030
00000000001dcd650000080000000000
ยทยทยทยทยทยทยทeยทยทยทยทยทยทยทยท
00000040
001000000200000000000000ffffffff
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
header
magic[4]
0x0000
4 bytes
b'shdw'
field4
0x0004
4 bytes
0x1
field8
0x0008
4 bytes
0x200
fieldC
0x000c
4 bytes
0x0
field10
0x0010
8 bytes
0x200
field18
0x0018
8 bytes
0x41400
field20[16]
0x0020
16 bytes
b'\x8a\xf9\xea\xd2\xcf8I\xc0\x8e\xec\x00\x95\xcf\\x\x99'
field30
0x0030
8 bytes
0x1dcd65
field38
0x0038
8 bytes
0x80000000000
field40
0x0040
4 bytes
0x100000
field44
0x0044
4 bytes
0x2000000
field48
0x0048
4 bytes
0x0
field4C
0x004c
4 bytes
0xffffffff
Some interesting numbers, we can immediately spot two instances of 0x200, a common sector size (512 bytes). After playing around a bit multiplying and dividing different values, we can also figure out that field30 might be the virtual disk size in sector numbers (1GiB // 512 = 0x200000).
If we assume for now that field8 is the sector size, field10 and field18 might be offsets into the file. Letโs look at a slightly larger hexdump:
โฏ xxd -a file.asif | head -32
00000000
73686477000000010000020000000000
shdwยทยทยทยทยทยทยทยทยทยทยทยท
00000010
00000000000002000000000000041400
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
00000020
8af9ead2cf3849c08eec0095cf5c7899
ยทยทยทยทยท8Iยทยทยทยทยทยท\xยท
00000030
00000000001dcd650000080000000000
ยทยทยทยทยทยทยทeยทยทยทยทยทยทยทยท
00000040
001000000200000000000000ffffffff
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
00000050
00000000000000000000000000000000
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
*
00000200
00000000000000020000000000000004
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
00000210
00000000000000000000000000000000
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
*
00041240
00000000000000000000000000000001
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
00041250
00000000000000000000000000000000
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
*
00041400
00000000000000010000000000000000
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
00041410
00000000000000000000000000000000
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
*
00082440
00000000000000000000000000000001
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
00082450
00000000000000000000000000000000
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
*
00120030
c0000000000000020000000000000003
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
00120040
00000000000000000000000000000000
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
*
00200000
6d657461000000010000020000000000
metaยทยทยทยทยทยทยทยทยทยทยทยท
00200010
00000200000000000000000000000000
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
00200020
00000000000000000000000000000000
ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท
*
00200200
3c3f786d6c2076657273696f6e3d2231
ยท I often use YARA for this purpose instead of grep, which is usually a bit faster, but this will do for now. > > Another approach I sometimes take is to do a needle search on the raw disk (usually the virtual disk of a VM), and then do a reverse lookup to see which file corresponds to the offsets on disk that I get a match on. Totally overkill for this, but a useful method for other cases. Out of our matches, `diskimagescontroller` jumps out the most, and running `strings` on it reveals _a lot_ of ASIF related content, including very valuable log messages, mangled function signatures and type names! โฏ strings diskimagescontroller | grep -i "asif" [...] N7di_asif7details3dirE N7di_asif7details8dir_baseE [...] Invalid value for asif header field: %s Size cannot exceed max ASIF size Unexpected ASIF header length ( [...] asif_header Couldn't read asif's header [...] Letโs throw this puppy into IDA! The first stop of reverse engineering anything is to look for references to interesting looking strings, and this time is no exception. For a file format, it makes sense to first focus on how to interpret the fileโs header, and then go from there. We already saw some interesting strings related to โasifโs headerโ with the `strings` command, so letโs just look at some of the functions that reference those strings. Research note > At the time I did this research, I went searching online for some of the symbol names I encountered. From this I learned that there are projects on GitHub were people diff iOS firmware updates, but also that the iOS version of the same binary _might_ contain some more symbols/strings. I ended up reverse engineering the iOS version of this binary, but not necessarily for this reason. > > If there are both x86 and ARM variants of the same binary, I like to have them open simultaneously. The decompiler output of one version can sometimes be much clearer than the other. I have no real data to back this up, but my gut feeling is because of the varying levels of compiler, but also decompiler optimizations for both architectures. Regardless, I remember the version of IDA I was using would crash randomly on the x86 variant of the binary, so I just spent most of my time annotating the iOS version, even though I donโt think there were actually any meaningful differences between the two. > > Anyway, Iโm rambling, letโs continue!  Figure 1: This looks like it parsed an ASIF header We can very quickly find a function that looks like it parses values out of a raw ASIF header (`a2`) and storing it into an in-memory structure (`a1`), see Figure 1. A little lower in this function (not pictured), there are some simple checks against a bunch of these fields, paired with some helpful error/log messages (`"ASIF max_write size in header exceed the limit"`, `"Sector count is too large"`, etc). We immediately learn a lot: * The offset and byte length of each field in the header; * The name or purpose of some of these fields. If we update our header definition from before, and add just a pinch of draw-the-rest-of-the-fucking-owl, we end up with something like this: struct asif_header { uint32 header_signature; uint32 header_version; uint32 header_size; uint32 header_flags; uint64 directory_offsets[2]; char guid[16]; uint64 sector_count; uint64 max_sector_count; uint32 chunk_size; uint16 block_size; uint16 total_segments; uint64 metadata_chunk; char unk_50[16]; uint32 read_only_flags; uint32 metadata_flags; uint32 metadata_read_only_flags; }; 00000000 73686477000000010000020000000000 shdwยทยทยทยทยทยทยทยทยทยทยทยท 00000010 00000000000002000000000000041400 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท 00000020 8af9ead2cf3849c08eec0095cf5c7899 ยทยทยทยทยท8Iยทยทยทยทยทยท\\xยท 00000030 00000000001dcd650000080000000000 ยทยทยทยทยทยทยทeยทยทยทยทยทยทยทยท 00000040 001000000200000000000000ffffffff ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท 00000050 00000000000000000000000000000000 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท 00000060 00000000000000000000000000000000 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท asif\_header header\_signature 0x0000 4 bytes 0x73686477 header\_version 0x0004 4 bytes 0x1 header\_size 0x0008 4 bytes 0x200 header\_flags 0x000c 4 bytes 0x0 directory\_offsets\[2\] 0x0010 16 bytes \[512, 267264\] guid\[16\] 0x0020 16 bytes b'\\x8a\\xf9\\xea\\xd2\\xcf8I\\xc0\\x8e\\xec\\x00\\x95\\xcf\\\\x\\x99' sector\_count 0x0030 8 bytes 0x1dcd65 max\_sector\_count 0x0038 8 bytes 0x80000000000 chunk\_size 0x0040 4 bytes 0x100000 block\_size 0x0044 2 bytes 0x200 total\_segments 0x0046 2 bytes 0x0 metadata\_chunk 0x0048 8 bytes 0xffffffff unk\_50\[16\] 0x0050 16 bytes b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00' read\_only\_flags 0x0060 4 bytes 0x0 metadata\_flags 0x0064 4 bytes 0x0 metadata\_read\_only\_flags 0x0068 4 bytes 0x0 Looks like we guessed a few fields correctly in our previous attempt! Make it make sense ------------------ If youโre thinking _โthatโs a very liberal pinch of owlโ_, youโd be correct. I glossed over some details, but itโs not super interesting how I came to those. At this stage, the reverse engineering process of various functions in `diskimagescontroller` simply has a very high owl-factor. Itโs a relatively straightforward binary to reverse engineer, which in turn makes ASIF relatively straightforward to reverse engineer. The only โannoyingโ part is the liberal amount of dynamic dispatching, so thereโs some inheritance and virtual function tables to deal with, but for the most part they arenโt that complex. It doesnโt make for great blog content (and Iโm writing this waaay after I did the initial reverse engineering, so I donโt remember most of the details anymore, oops). Anyyhoooww, here are some of the important bits: * The header has a field `directory_offsets`, which is a byte offset to a sort of allocation directory; * Each directory starts with a `uint64` version, and the directory with the highest version is the current active directory. This allows for atomic updates. * Each directory has a list of tables, and each table has a list of `uint64` โdata entriesโ. * Data entries are grouped by a certain ratio, and each group is followed by a โmap entryโ, which is a bitmap for the preceding groups. * Each data entry points to a chunk of data in the ASIF file. * The chunk size is defined in the header and is typically 1 MiB. * Data entries have 55 bits usable for the chunk number and 9 bits reserved for flags. * The maximum virtual disk size with the default chunk size (which I donโt think is configurable yet) seems to be just under 4 PiB, with a small portion at the end reserved for metadata. * The actual size of the virtual disk is defined in the header, as well as the maximum size the disk can grow to. * The header contains the offset to a metadata block, which is typically `(4 PiB - 1 chunk)`, meaning itโs within the reserved area. * The metadata block contains a small header and a plist, which should contain a `internal metadata` and `user metadata` dictionary. The bit format for data entries is as follows: 0b00000000 01111111 [...] 11111111 11111111 (chunk number) 0b00111111 10000000 [...] 00000000 00000000 (reserved) 0b11000000 00000000 [...] 00000000 00000000 (flags) With the currently known flags (derived from strings from the binary): 0b00 (uninitialized) 0b01 (fully initialized) 0b10 (unmapped) 0b11 (has bitmap) If youโve ever looked at how other sparse disk formats (VMDK, VHDX, QCOW2) work, youโd find that ASIF is a quite elegant and simple format. Sure, it doesnโt have all the same features, but at least the basic stuff is pretty simple to reason about. Adding all of this information to our implementation is fairly uneventful. Read some fields from the header, verify them, seek to some offsets, bobโs your uncle. Thereโs one major detail we glossed over though. We didnโt cover how to go from an arbitrary offset inside of the virtual disk to a chunk in the ASIF file. Research note > Quick refresh on virtual disk formats like ASIF and all the other four-letter-acronyms Iโve mentioned so far. They all function on one simple principle: storing a large disk, or file, in a smaller, โsparseโ manner. The disk is chopped up into smaller pieces, and some kind of mechanism inside the file format is responsible for keeping track whether a given piece is allocated (present) or not (missing), and where in the file to find that piece. > > So, to read any given offset โinsideโ the virtual disk, we need to figure out this mechanism, or algorithm. If you take what we know (a directory with some amount of tables and some other amount of entries per table), accessing a chunk becomes `tables[X][Y]`. We just have to figure out how to calculate `X` and `Y` for any given `offset`. Most sparse file formats work in this way, with various different names for โdirectoryโ or โtableโ. This was the part that was the most annoying to reverse engineer, because a lot of the math for this was divided over several functions, and a lot of the pseudocode looked like it was a bunch of stacked macros, halfway optimized into something that barely made sense. Initially I copy pasted the pseudocode IDA generated, whipped up some plausible sounding variables names with an LLM and called it a day. It looked something like this: blocks_per_chunk = chunk_size // block_size reserved_size = 4 * chunk_size num_reserved_table_entries = ( 1 if reserved_size < blocks_per_chunk else reserved_size // blocks_per_chunk ) max_table_entries = chunk_size >> 3 num_table_entries = max_table_entries - ( max_table_entries % (num_reserved_table_entries + 1) ) num_reserved_directory_entries = (num_reserved_table_entries + num_table_entries) // ( num_reserved_table_entries + 1 ) num_usable_entries = num_table_entries - num_reserved_directory_entries # This is the size in bytes of data covered by a single table size_per_table = num_usable_entries * chunk_size max_size = block_size * header.max_sector_count num_directory_entries = (size_per_table + max_size - 1) // size_per_table And to read a chunk from the ASIF file for a given offset: table = directory.table(offset // size_per_table) # Calculate the relative chunk index within the table relative_block_index = (offset // block_size) - (table.virtual_offset // block_size) entry_index = ( relative_block_index // blocks_per_chunk + relative_block_index // blocks_per_chunk * num_reserved_table_entries ) // num_reserved_table_entries Very little attempt at actually understanding this, as evident by the lack of comments. It gave me the correct number, I was happy and had other things to do. Revisiting the topic for this blog made me take another look. An unexpected second wind ------------------------- When I first started writing this blog post in _November of 2025_ (yes, I know, and to make it even worse, I had done all the reverse engineering in _July_ of that year), I stumbled upon a Reddit post and subsequent [GitHub repository](https://github.com/huven/asif-format) from huven. Independently from my own endeavor, huven had started to document the ASIF file format. We exchanged some ideas, and then I just did other stuff until _June of 2026_, the time of this writing and hopefully the time of publishing haha. I wasnโt satisfied with my understanding on how reading from an ASIF file works and I didnโt want to half-ass this post, so I looked up huvenโs work again and started refining my implementation. The names I originally went with were along the lines of โusable entriesโ and โreserved entriesโ, and the math was hard to reason about but did calculate the correct values. In our exchanges, huven had mentioned something about bitmaps, something that I had originally completely ignored as they werenโt important for my read-only implementation. His repository describes that chunk entries are grouped into โchunk groupsโ and are followed by a bitmap. He also describes the format of the bitmap itself. With this in mind, I took a fresh look at the math as I took it from the assembly, and I could finally make sense out of it. Going back to my IDA database, it even looks like I originally took notice of the bitmaps and groups, as I named a lot of variables with โgroupโ nomenclature. I guess I shouldโve persisted with my artisan approach at the time. You may be thinking _โbut you just mentioned bitmaps and chunk groups in the previous section where you explain the important bitsโ_. Yes I retconned the post. Sue me. How reading actually works -------------------------- I almost sound like an LLM with that title. Anyway, letโs go step by step on how to go from an arbitrary offset to the correct ASIF chunk. The math for this is still based primarily around the math from `diskimagescontroller`. For an alternative explanation, you can check out [huvenโs repository](https://github.com/huven/asif-format). Letโs begin with taking a slice of an ASIF file that contains all the interesting bits we need. 00400000 c000000000000005c000000000000006 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท 00400010 00000000000000000000000000000000 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท \* 00404000 00000000000000070000000000000000 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท 00404010 00000000000000000000000000000000 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท \* 00500000 6368756e6b20302c20626c6f636b2030 chunk 0, block 0 00500010 00000000000000000000000000000000 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท \* 00600000 6368756e6b20312c20626c6f636b2030 chunk 1, block 0 00600010 00000000000000000000000000000000 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท \* 00604000 6368756e6b20312c20626c6f636b2033 chunk 1, block 3 00604010 32000000000000000000000000000000 2ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท 00604020 00000000000000000000000000000000 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท \* 00700000 55555555555555550000000000000000 UUUUUUUUยทยทยทยทยทยทยทยท 00700010 00000000000000000000000000000000 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท \* 00700200 55555555555555555555555555555555 UUUUUUUUUUUUUUUU 00700210 00000000000000000000000000000000 ยทยทยทยทยทยทยทยทยทยทยทยทยทยทยทยท Between `0x00400000` and `0x00404000` (the first table, directory not pictured) we can see three populated entries. If we ignore the flag bits, these are entries for chunks 5, 6 and 7. With a chunk size of `0x100000`, we can see data in chunks 5 and 6, and some ~students from Utrecht~ repeated `0x55` in chunk 7, which happens to be `0b01` repeated. Based on this we can learn a few things: * There are `(0x00404000 - 0x00400000) // 8 = 2048` chunk entries before a bitmap; * The bitmap uses 2 bits per **block**. * A single byte in the bitmap thus covers 4 blocks. * The bitmap is 1 chunk in size, so a full bitmap covers `4 * chunk_size` blocks. * The bitmap covers one chunk group, so the number of blocks per chunk group is equal to the bitmap coverage. * ~Blocks are marked as allocated in groups of 32 at a time~ This may have just been an artefact on the method I used to write to the disk, i.e. using `/dev/disk#` instead of `/dev/rdisk#`. This matches what huven already documented. If we combine this with the earlier math from `diskimagescontroller` and apply our newfound knowledge, we get the following: blocks_per_chunk = chunk_size // block_size # Based on the observations above num_blocks_per_group = 4 * chunk_size # We need at least one chunk in the group num_chunks_per_group = max(1, num_blocks_per_group // blocks_per_chunk) # This is the maximum amount of raw entries (uint64) a table could hold num_words_per_chunk = chunk_size // 8 # A chunk group has one entry for each chunk, plus one entry for the bitmap num_entries_per_group = num_chunks_per_group + 1 num_groups_per_table, num_remaining = divmod(num_words_per_chunk, num_entries_per_group) # The number of total entries in a table, including both chunk and bitmap entries num_table_entries = num_words_per_chunk - num_remaining # Calculate the size in bytes of data covered by a single table num_chunk_entries_per_table = num_table_entries - num_groups_per_table size_per_table = num_chunk_entries_per_table * chunk_size # Calculate the maximum size of the virtual disk max_size = block_size * header.max_sector_count # And the number of tables needed to cover that size num_tables = (size_per_table + max_size - 1) // size_per_table Then, if we want to go from a given offset (again, assuming that reads will be aligned to the chunk size), the logic becomes this: table = directory.table(offset // size_per_table) # Calculate the relative chunk index within the table relative_block_index = (offset // block_size) - (table.virtual_offset // block_size) relative_chunk_index = relative_block_index // blocks_per_chunk # Calculate the chunk group chunk_group = relative_chunk_index // num_chunks_per_group # Each chunk group has a bitmap entry, so we need to account for that in the entry index entry_index = relative_chunk_index + chunk_group I donโt know about you, but that seems a whole lot easier to understand and follow! The file format also โclickedโ for me now, especially when looking at the hexdump. Previously I had just accepted that it gave me the correct values. Accounting for the โskippingโ of a few entries in the table might still look a little bit confusing. Most other sparse disk formats choose to store their allocation bitmaps somewhere else entirely, if they even have one to begin with (the table entries themselves also record โallocation statusโ, in a way). I guess Apple really wanted to have allocation bitmaps in ASIF, didnโt want to introduce a separate mechanism, and just went _โnow kissโ_ with the existing tables. Itโs elegant, in a way, I suppose. It just made reverse engineering initially a little bit more confusing when you arenโt aware of them yet. Wrapping up ----------- The entire implementation is made available as part of [`dissect.hypervisor`](https://github.com/fox-it/dissect.hypervisor) (specifically [`asif.py`](https://github.com/fox-it/dissect.hypervisor/blob/main/dissect/hypervisor/disk/asif.py) and [`c_asif.py`](https://github.com/fox-it/dissect.hypervisor/blob/main/dissect/hypervisor/disk/c_asif.py)). The final fixups are currently still awaiting review in a [pull request](https://github.com/fox-it/dissect.hypervisor/pull/86). Itโs also all wired up into [Dissect](https://docs.dissect.tools/en/stable/), so you can run all the Dissect tools and APIs against any ASIF file: โฏ xxd ED213FEE-7B2F-410F-A7C3-C5976EBF7903.img | head -1 00000000: 7368 6477 0000 0001 0000 0200 0000 0000 shdw............ โฏ target-info ED213FEE-7B2F-410F-A7C3-C5976EBF7903.img Disks - Volumes - - - Mounts - [...] Hostname : users-Virtual-Machine Domain : None Ips : 192.168.64.2 Os family : macos Os version : macOS 26.4.1 (25E253) Architecture : aarch64-apple-macos Language : Timezone : America/Los_Angeles Install date : None Last activity : 2026-06-11T08:23:41.690898+00:00