YAML Deserialization Attacks: The Danger of yaml.load
YAML in the AI Stack
YAML is everywhere in Machine Learning. We use it for:
- CI/CD pipelines (GitHub Actions).
- Environment configurations (conda, docker-compose).
- Hyperparameter tuning configs (Hydra, OmegaConf).
However, the standard Python library PyYAML has a dangerous history.
The Unsafe Default
For years, calling yaml.load() with its default loader could instantiate arbitrary Python objects. Recent PyYAML releases require an explicit Loader argument and no longer construct arbitrary objects by default, but many legacy codebases and tutorials still use unsafe patterns.
An attacker can craft a YAML file that exploits specific tags to execute code:
```yaml
!!python/object/apply:os.system
args: ["cat /etc/passwd"]
```
If your training script loads this config file using a vulnerable loader, the command executes.
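A minimal sketch of the vulnerable pattern (the file name and loader choice here are illustrative):

```python
import yaml

# DANGEROUS: an unsafe loader constructs arbitrary Python objects, so
# parsing the payload above ends up calling os.system("cat /etc/passwd").
with open("job_config.yaml") as f:
    config = yaml.load(f, Loader=yaml.UnsafeLoader)  # or a legacy yaml.load(f)
```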
The Attack Vector in MLOps
This is a major risk for MLOps platforms that accept user-submitted configuration files to define training jobs. If an attacker submits a malicious job_config.yaml, they can escape the container or steal cloud credentials.
Secure Parsing
- Always use safe_load(): It restricts loading to standard data types such as dicts, lists, strings, and numbers (see the sketch after this list).
- Avoid pickle-style tags in YAML: Never enable !!python/object and its variants unless absolutely necessary, and only within a trusted boundary.
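A minimal sketch of the safe pattern, using the same hypothetical job_config.yaml:

```python
import yaml

# safe_load only builds plain data types (dicts, lists, strings, numbers, ...).
with open("job_config.yaml") as f:
    config = yaml.safe_load(f)

# Fed the malicious document above, safe_load raises
# yaml.constructor.ConstructorError instead of executing code.
```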
Auditing with Veritensor
Veritensor scans your repository for YAML files and checks for known deserialization gadgets. It also scans your Python code (via AST analysis) to detect usages of yaml.load() without the Loader=SafeLoader argument.
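Veritensor's internals aren't shown here, but a simplified sketch of that kind of AST check might look like the following (the class name and output format are illustrative):

```python
import ast
import sys

class UnsafeYamlLoadChecker(ast.NodeVisitor):
    """Flag yaml.load(...) calls that don't pass a safe Loader."""

    SAFE_LOADERS = {"SafeLoader", "CSafeLoader"}

    def visit_Call(self, node):
        func = node.func
        # Match calls of the form yaml.load(...)
        if (isinstance(func, ast.Attribute) and func.attr == "load"
                and isinstance(func.value, ast.Name) and func.value.id == "yaml"):
            loader_args = [kw.value for kw in node.keywords if kw.arg == "Loader"]
            is_safe = any(
                isinstance(v, ast.Attribute) and v.attr in self.SAFE_LOADERS
                for v in loader_args
            )
            if not is_safe:
                print(f"line {node.lineno}: yaml.load() without a safe Loader")
        self.generic_visit(node)

source = open(sys.argv[1]).read()
UnsafeYamlLoadChecker().visit(ast.parse(source))
```

Run against a Python file, this prints a warning for every yaml.load() call that lacks a safe Loader; a real scanner would also handle aliased imports and from-imports.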
By enforcing safe_load across your codebase, you eliminate an entire class of RCE vulnerabilities.