Zip Bombs & Tar Bombs: DoS Attacks on AI Pipelines
In Machine Learning Operations (MLOps), datasets, pre-trained model weights, and checkpoints are routinely transported as compressed archives (.zip, .tar.gz, .whl). Because AI infrastructure is designed to ingest and decompress massive files automatically, it is an attractive target for Denial of Service (DoS) attacks via decompression bombs.
A decompression bomb (commonly known as a zip bomb or tar bomb) is a maliciously crafted archive file. Its compressed size is tiny (often just a few kilobytes), but its uncompressed size is orders of magnitude larger, in extreme cases reaching petabytes of garbage data.
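The size asymmetry is easy to reproduce on a small, harmless scale. The sketch below uses only Python's standard zipfile module; the helper name build_compressible_zip and the 10 MiB payload are illustrative choices, not taken from any real attack. It compresses a run of zero bytes and compares the stored size with the declared uncompressed size.

```python
import io
import zipfile


def build_compressible_zip(payload_bytes: int) -> bytes:
    """Return an in-memory zip holding one highly compressible member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # A run of identical bytes is close to the best case for DEFLATE.
        zf.writestr("zeros.bin", b"\0" * payload_bytes)
    return buf.getvalue()


archive = build_compressible_zip(10 * 1024 * 1024)  # 10 MiB of zeros

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    info = zf.infolist()[0]

ratio = info.file_size / info.compress_size
print(f"archive on disk : {len(archive):>10,} bytes")
print(f"uncompressed    : {info.file_size:>10,} bytes")
print(f"expansion ratio : {ratio:,.0f}x")
```

A single DEFLATE layer already yields a ratio well above 100x on this input. Real bombs multiply that factor by nesting archives or reusing the same compressed data many times.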
The Impact on MLOps Infrastructure
When an automated CI/CD runner, a data ingestion script, or a naive security scanner attempts to extract a zip bomb, the consequences can be severe:
- Disk Exhaustion: The extraction process rapidly consumes all available storage on the host machine or Kubernetes node, causing unrelated services to crash.
- CPU and RAM Spikes: Decompression pins the CPU cores assigned to the extractor, and any tool that buffers decompressed output in memory rapidly exhausts system RAM.
- OOM Kills: The Linux kernel's Out-Of-Memory (OOM) Killer intervenes, forcefully terminating the process with SIGKILL (exit code 137, that is 128 + 9). In a Kubernetes environment, this can lead to OOMKilled containers, pod evictions under node memory pressure, and cascading node failures.
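Exit code 137 follows the shell convention of reporting 128 plus the signal number, and SIGKILL is signal 9. A minimal sketch of how a pipeline supervisor might decode this is shown below. The function name signal_from_exit_code is hypothetical, and the code assumes a POSIX host. Note that SIGKILL alone does not prove an OOM kill; the kernel log or the container runtime status is needed to confirm the cause.

```python
import signal
from typing import Optional


def signal_from_exit_code(code: int) -> Optional[str]:
    """Return the signal name if an exit status indicates death by signal."""
    if code < 0:
        # Python's subprocess module reports "killed by signal N" as -N.
        signum = -code
    elif code > 128:
        # Shells and container runtimes report it as 128 + N.
        signum = code - 128
    else:
        return None  # ordinary exit status, not a signal
    try:
        return signal.Signals(signum).name
    except ValueError:
        return None  # not a signal number known on this platform


for status in (0, 1, 137, -9):
    print(status, "->", signal_from_exit_code(status))
```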
A classic example is 42.zip, a 42 KB archive containing five layers of nested zip files that expand to roughly 4.5 petabytes of data.
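The 4.5 petabyte figure can be checked with simple arithmetic. The numbers below come from the commonly published description of 42.zip (sets of 16 archives per layer, with each innermost file being 4 GiB minus one byte) rather than from this document, so treat them as assumptions.

```python
FANOUT = 16               # archives per layer (published description of 42.zip)
LAYERS = 5                # nesting depth
LEAF_BYTES = 2**32 - 1    # each innermost file: 4 GiB minus one byte

leaf_files = FANOUT ** LAYERS          # fan-out compounds once per layer
total_bytes = leaf_files * LEAF_BYTES

print(f"innermost files : {leaf_files:,}")
print(f"total bytes     : {total_bytes:,}")
print(f"in petabytes    : {total_bytes / 10**15:.2f} PB")
```

The fan-out compounds at every layer, which is why nesting depth matters far more than the size of any single member.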
Infrastructure-Level Defense with Veritensor
Protecting against Decompression Bombs requires a two-tiered approach: software-level heuristics and strict infrastructure-level sandboxing. Veritensor implements both.
Tier 1: The SafeZipReader (Software Heuristics)
Before Veritensor attempts to extract an archive for deep YARA scanning, the SafeZipReader inspects the archive's internal headers (infolist).
- Compression Ratio Limits: It calculates the ratio between compress_size and file_size. If the ratio exceeds 100x, the scanner immediately raises a ZipBombError.
- Absolute Size Limits: It calculates the total uncompressed size of all files in the archive. If the total exceeds 2 Gigabytes, the scan is aborted.
- Nested Archive Detection: The engine flags nested archives (.zip inside a .zip) as MEDIUM threats, as nesting is a primary vector for exponential expansion.
Tier 2: tmpfs Sandboxing (Infrastructure Enforcement)
Software heuristics are not sufficient on their own, because an attacker can spoof archive headers (e.g., declaring an uncompressed size of 1 MB when the data actually expands to 1 TB).
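One way to reduce reliance on declared sizes is to count the bytes actually produced while streaming and stop once a budget is exceeded. The sketch below applies this to a gzip stream, the compression layer of a .tar.gz. It is a general technique shown for illustration, not a description of Veritensor's code, and the names bounded_gunzip and DecompressionBudgetExceeded are hypothetical. Because the check runs inside the same process that is under attack, it complements kernel-level limits rather than replacing them.

```python
import gzip
import io
from typing import BinaryIO


class DecompressionBudgetExceeded(Exception):
    """Raised when a stream produces more output than allowed."""


def bounded_gunzip(src: BinaryIO, dst: BinaryIO, max_bytes: int,
                   chunk_size: int = 64 * 1024) -> int:
    """Stream-decompress src into dst, aborting past max_bytes of output."""
    written = 0
    with gzip.GzipFile(fileobj=src, mode="rb") as gz:
        while True:
            # read(n) returns at most n decompressed bytes, so memory use
            # stays bounded no matter how large the stream expands.
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            written += len(chunk)
            if written > max_bytes:
                raise DecompressionBudgetExceeded(
                    f"output exceeded {max_bytes} bytes")
            dst.write(chunk)
    return written


small = gzip.compress(b"model-weights")
out = io.BytesIO()
print(bounded_gunzip(io.BytesIO(small), out, max_bytes=1024))
```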
To enforce a hard ceiling, the Veritensor Enterprise Control Plane applies limits at the Linux kernel level. The Celery worker containers responsible for archive extraction are deployed with a read-only root filesystem (read_only: true) and a strict tmpfs (RAM disk) mount limited to 2 GB (/tmp:exec,size=2G).
If an advanced zip bomb bypasses the Python header checks and attempts to decompress, it can only write to the isolated 2 GB RAM disk. Once the 2 GB limit is reached, the OS rejects further writes with ENOSPC ("No space left on device"), containing the attack without impacting the host server's physical storage or triggering a system-wide OOM cascade. Because tmpfs pages count as memory, the worker's memory limit should leave room for the full 2 GB.