unxip
unxip is a command-line tool designed for rapidly unarchiving Xcode XIP files and writing them to disk with good compression. Its goal is to outperform Bom (which powers xip(1) and Archive Utility) in both performance and on-disk usage.
Installation
Not much installation is needed to use unxip: simply download unxip.swift and compile it using swiftc -parse-as-library -O unxip.swift to produce the unxip binary.
Usage
The intended usage of unxip is with a single command-line parameter that represents the path to an XIP from Apple that contains Xcode. For example:
$ unxip Xcode_13.3_beta.xip # will produce Xcode-beta.app in the current directory
As the tool is still somewhat rough, its error handling is not very good at the moment. An attempt has been made to at least crash preemptively when things go wrong, but you may still run into strange behavior on edge cases. For best results, ensure that the directory you are running unxip from does not contain any existing Xcode(-beta).app bundles and that you are using a modern version of macOS on a fast APFS filesystem. For simplicity, unxip does not perform any signature verification, so if authentication is important you should use another mechanism (such as a checksum) for validation.
Design
As a purpose-built tool, unxip outperforms Bom because of several key implementation decisions. Heavy use of Swift Concurrency allows unxip to unlock parallelization opportunities that Bom largely misses, and the use of LZFSE rather than the simpler LZVN gives it higher compression ratios. To understand its design, it's important to first be familiar with the Xcode XIP format and APFS transparent compression.
XIPs, including the ones that Xcode comes in, are XAR archives, which contain a table of contents listing each file inside and the compression used for it. However, unlike most XARs, Xcode's only has two files: a bzip2-compressed Metadata that is just a few hundred bytes, and a multi-gigabyte file named Content that is stored "uncompressed". While marked as plain data, this file is an apparently proprietary archive format called pbzx. Luckily, the scheme is fairly straightforward, and several people on the internet have already tried reverse engineering it. This tool contains an independent implementation that nonetheless shares many of its core details with these efforts. The compressed content inside the pbzx is an ASCII-format cpio archive, which has been split apart into 16MB chunks that have either been individually compressed with LZMA or included as-is. Unfortunately, pbzx does not contain a table of contents, or any structure aside from these (byte-, rather than file-aligned) chunks, so distinguishing individual files is not possible without decompressing the entire buffer.
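To make the chunk layout concrete, here is a minimal sketch of walking a pbzx stream in Swift. The field layout (a "pbzx" magic followed by big-endian 64-bit fields, with a decompressed-size/stored-size pair preceding each chunk) follows the community reverse-engineering efforts mentioned above rather than any official specification, so treat it as an assumption.

import Foundation

// Walks the chunks of a pbzx stream. The layout here (a "pbzx" magic,
// big-endian 64-bit fields, one decompressed-size/stored-size pair per chunk)
// is an assumption based on community reverse engineering, not a spec.
struct PBZXChunk {
    let data: Data          // stored bytes, possibly an LZMA (xz) stream
    let isCompressed: Bool
}

func pbzxChunks(of data: Data) -> [PBZXChunk] {
    func readUInt64(at offset: Int) -> UInt64 {
        var value = UInt64(0)
        _ = withUnsafeMutableBytes(of: &value) {
            data.copyBytes(to: $0, from: offset..<offset + 8)
        }
        return UInt64(bigEndian: value)
    }

    precondition(data.prefix(4).elementsEqual("pbzx".utf8), "not a pbzx stream")
    var offset = 4
    _ = readUInt64(at: offset) // stream-level field (the 16MB window size)
    offset += 8

    var chunks = [PBZXChunk]()
    while offset < data.count {
        let decompressedSize = readUInt64(at: offset)
        let storedSize = readUInt64(at: offset + 8)
        offset += 16
        let chunk = data.subdata(in: offset..<offset + Int(storedSize))
        offset += Int(storedSize)
        // A chunk stored at its full decompressed size was included as-is.
        chunks.append(PBZXChunk(data: chunk, isCompressed: storedSize != decompressedSize))
    }
    return chunks
}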
Parsing this cpio archive gives the information needed to reconstruct an Xcode bundle, but unxip (and Bom) go through an additional step to apply transparent APFS compression to files that could benefit from it, which significantly reduces size on disk. For this operation, unxip chooses to use the LZFSE algorithm, while Bom uses the simpler LZVN. The compressed data is stored in the file's resource fork, a special header describing the compression is constructed in an xattr, and the UF_COMPRESSED flag is set on the file.
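At the syscall level, that sequence looks roughly like the sketch below. The xattr names, decmpfs header layout, and UF_COMPRESSED flag come from XNU's decmpfs support; the type constant (12, for LZFSE with data in the resource fork) and the fork's internal chunk-table layout are assumptions based on public reverse engineering, since Apple does not document any of this.

import Foundation

// Sketch of the syscall choreography for transparent compression. The decmpfs
// header layout follows XNU's decmpfs.h; the type constant (12: assumed to
// mean LZFSE with the payload in the resource fork) and the fork's internal
// layout are assumptions from public reverse engineering.
func markCompressed(path: String, resourceFork: Data, uncompressedSize: UInt64) throws {
    func check(_ result: Int32) throws {
        guard result == 0 else {
            throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno))
        }
    }

    // 1. Store the compressed blocks (plus their chunk table) in the resource fork.
    try resourceFork.withUnsafeBytes {
        try check(setxattr(path, "com.apple.ResourceFork", $0.baseAddress, $0.count, 0, XATTR_NOFOLLOW))
    }

    // 2. Write the decmpfs header: magic "cmpf", compression type, and
    // uncompressed size, all little-endian.
    var header = Data()
    withUnsafeBytes(of: UInt32(0x636d7066).littleEndian) { header.append(contentsOf: $0) } // 'cmpf'
    withUnsafeBytes(of: UInt32(12).littleEndian) { header.append(contentsOf: $0) }         // assumed LZFSE/resource fork type
    withUnsafeBytes(of: uncompressedSize.littleEndian) { header.append(contentsOf: $0) }
    try header.withUnsafeBytes {
        try check(setxattr(path, "com.apple.decmpfs", $0.baseAddress, $0.count, 0, XATTR_NOFOLLOW))
    }

    // 3. Set the flag; the data fork is expected to be empty by this point, and
    // the kernel will now serve reads from the compressed payload transparently.
    try check(chflags(path, UInt32(UF_COMPRESSED)))
}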
On the whole, this procedure is designed to be fairly linear, with the XIP being read sequentially, producing LZMA chunks that are reassembled in order to create the cpio archive, which can then be streamed to reconstruct an Xcode bundle. Unfortunately, a naive implementation of this process does not perform very well due to the varying performance bottlenecks of each step. To make matters worse, the size of Xcode makes it infeasible to operate on entirely in memory. To get around this problem, unxip parallelizes intermediate steps and then streams results in linear order, benefiting from much better processor utilization and allowing the file to be processed in a "sliding window" fashion.
On modern processors, single-threaded LZMA decoding is limited to roughly 100 MB/s; as the Xcode cpio is almost 40 GB in size, this is not really fast enough for unxip. Instead, unxip carves out each chunk from the pbzx archive into its own task (the metadata in the file format makes this fairly straightforward) and decompresses each in parallel. To limit memory usage, a cap is applied to how many chunks are resident in memory at once. Since the next step (parsing the cpio) requires logical linearity, chunks that complete early are temporarily parked until their predecessors finish, after which they are all yielded together. This preserves order while still providing an opportunity for multiple chunks to be decoded in parallel. In practice, this technique can decode the LZMA stream at effective speeds approaching 1 GB/s when provided with enough CPU cores.
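The park-and-yield trick maps naturally onto a task group, sketched below building on the PBZXChunk type from earlier. The in-flight cap is elided for brevity, and since pbzx chunks are xz streams, the sketch assumes the Compression framework's COMPRESSION_LZMA decoder (which operates on xz data) can stand in for the real decoder.

import Compression
import Foundation

struct DecompressionError: Error {}

// Decodes one 16MB (or smaller) chunk; COMPRESSION_LZMA operates on xz
// streams, which is what pbzx chunks are assumed to contain.
func decompressLZMA(_ data: Data, capacity: Int = 1 << 24) throws -> Data {
    var result = Data(count: capacity)
    let count = result.withUnsafeMutableBytes { dst in
        data.withUnsafeBytes { src in
            compression_decode_buffer(
                dst.bindMemory(to: UInt8.self).baseAddress!, capacity,
                src.bindMemory(to: UInt8.self).baseAddress!, data.count,
                nil, COMPRESSION_LZMA)
        }
    }
    guard count > 0 else { throw DecompressionError() }
    return result.prefix(count)
}

// Decompresses chunks in parallel but yields them in their original order,
// parking early finishers until their predecessors arrive.
func decompressedChunks(_ chunks: [PBZXChunk]) -> AsyncThrowingStream<Data, Error> {
    AsyncThrowingStream { continuation in
        Task {
            do {
                try await withThrowingTaskGroup(of: (Int, Data).self) { group in
                    for (index, chunk) in chunks.enumerated() {
                        group.addTask {
                            // Chunks stored as-is pass straight through.
                            let data = chunk.isCompressed ? try decompressLZMA(chunk.data) : chunk.data
                            return (index, data)
                        }
                    }
                    var parked = [Int: Data]()
                    var next = 0
                    for try await (index, data) in group {
                        parked[index] = data
                        // Yield every chunk whose predecessors have all arrived.
                        while let ready = parked.removeValue(forKey: next) {
                            continuation.yield(ready)
                            next += 1
                        }
                    }
                }
                continuation.finish()
            } catch {
                continuation.finish(throwing: error)
            }
        }
    }
}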
The linear chunk stream (now decompressed into a cpio) is then parsed in sequence to extract files, directories, and their associated metadata. cpios are naturally ordered (for example, all additional hardlinks must come after the original file), but Xcode's has an additional nice property: it's been sorted so that all directories appear before the files inside of them. This allows a sequential stream of filesystem operations to correctly produce the bundle, without running into errors from missing intermediate directories or link targets.
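For reference, here is a sketch of decoding one header from an ASCII cpio. The fixed-width octal fields follow the POSIX "portable ASCII" (odc) cpio format; that Xcode's archive uses exactly this variant is an assumption based on the description above.

import Foundation

// One cpio entry: the header is 76 bytes of fixed-width ASCII octal fields,
// followed by the NUL-terminated name and then the file data (no padding).
struct CPIOEntry {
    let name: String
    let mode: UInt32   // file type (directory, symlink, etc.) and permissions
    let fileData: Data
}

// Parses one entry, advancing `offset` past its header, name, and data.
// Returns nil at the archive's trailer record.
func parseEntry(_ data: Data, at offset: inout Int) -> CPIOEntry? {
    func octal(_ width: Int) -> Int {
        defer { offset += width }
        return Int(String(decoding: data[offset..<offset + width], as: UTF8.self), radix: 8)!
    }

    precondition(octal(6) == 0o070707, "bad cpio magic")
    _ = octal(6); _ = octal(6)               // dev, ino
    let mode = UInt32(octal(6))
    _ = octal(6); _ = octal(6); _ = octal(6) // uid, gid, nlink
    _ = octal(6); _ = octal(11)              // rdev, mtime
    let nameSize = octal(6)
    let fileSize = octal(11)

    let name = String(decoding: data[offset..<offset + nameSize - 1], as: UTF8.self) // drop trailing NUL
    offset += nameSize
    let fileData = data.subdata(in: offset..<offset + fileSize)
    offset += fileSize
    return name == "TRAILER!!!" ? nil : CPIOEntry(name: name, mode: mode, fileData: fileData)
}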
While this order simplifies the implementation, it makes it difficult for unxip to efficiently schedule filesystem operations and transparent compression. To resolve this, a dependency graph is created for each file (directories, files, and symlinks depend on their parent directory's existence; hardlinks require their target to exist) and tasks are scheduled in parallel with those constraints applied. New file writes are particularly expensive because compression is applied before the data is written to disk. While this step is already parallelized to some extent by the graph described earlier, Apple's filesystem compression implementation offers a chance for additional parallelism: it chunks data internally at 64KB boundaries, and those blocks can be compressed independently in parallel. LZFSE achieves high compression ratios and has a performant implementation, which we can take advantage of largely for free. Unlike the earlier steps, which were compute-bound, the final step of writing to disk requires interacting with the kernel; if we're careless, we can accidentally overload the system with operations and back up our entire pipeline. To prevent unconsumed chunks from sitting around in memory, we manually apply backpressure on our streams by having them yield results only when later steps are ready to consume them.
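As an illustration of the extra parallelism available inside a single file's compression, the sketch below splits data at 64KB boundaries (the block size decmpfs uses internally) and encodes the blocks concurrently with the Compression framework's LZFSE encoder. Assembling the results into the actual resource-fork layout, and the backpressure machinery, are left out.

import Compression
import Foundation

// Compresses a file's data as independent 64KB blocks in parallel, matching
// decmpfs's internal chunking. Blocks that don't shrink are kept raw.
func compressBlocks(of data: Data) async -> [Data] {
    let blockSize = 64 << 10
    let ranges = stride(from: 0, to: data.count, by: blockSize).map {
        $0..<min($0 + blockSize, data.count)
    }
    return await withTaskGroup(of: (Int, Data).self) { group in
        for (index, range) in ranges.enumerated() {
            group.addTask {
                let block = data.subdata(in: range)
                var output = Data(count: blockSize + 1024) // headroom for incompressible data
                let count = output.withUnsafeMutableBytes { dst in
                    block.withUnsafeBytes { src in
                        compression_encode_buffer(
                            dst.bindMemory(to: UInt8.self).baseAddress!, dst.count,
                            src.bindMemory(to: UInt8.self).baseAddress!, block.count,
                            nil, COMPRESSION_LZFSE)
                    }
                }
                // A zero return means encoding failed or didn't fit; store raw.
                return (index, count > 0 ? output.prefix(count) : block)
            }
        }
        var blocks = [Data](repeating: Data(), count: ranges.count)
        for await (index, block) in group {
            blocks[index] = block
        }
        return blocks
    }
}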
Overall, this architecture allows unxip to utilize CPU cores and dispatch disk writes fairly well. It is likely that there is still some room for improvement in its implementation, especially around the constants chosen for batch sizes and backoff intervals (some of which can probably be done much better by the runtime itself once it is ready). Ideas on how its performance can be further improved are always welcome :)