SSH Private Key Exposure: The Keys to the Kingdom

Why is id_rsa in my Dataset?

It sounds impossible. Why would anyone commit their SSH private key?

In Machine Learning projects, it happens more often than you think.

  1. Dotfile Backup: A developer writes a script to back up their configs (including ~/.ssh), and the backup accidentally ends up inside a dataset upload.
  2. Docker Context: Running COPY . . with the home directory as the build context copies everything into the image, including hidden folders such as .ssh.
  3. Git Add All: Running git add . from a home or project root directory without a proper .gitignore.
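All three scenarios come down to a directory tree being shipped without anyone looking inside it. As a rough illustration, the sketch below walks a directory and flags .ssh folders and files with default key names before the tree is uploaded, built, or committed. The function name find_risky_paths and the filename list are assumptions made for this example; the list is not exhaustive, and a renamed key would not be caught by filename alone.

```python
import os

# Illustrative list of default OpenSSH key filenames; not exhaustive.
KEY_FILENAMES = {"id_rsa", "id_dsa", "id_ecdsa", "id_ed25519"}


def find_risky_paths(root):
    """Return sorted paths under root that are .ssh directories or default key files."""
    risky = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Flag any hidden .ssh directory that would be swept up with the tree.
        for name in dirnames:
            if name == ".ssh":
                risky.append(os.path.join(dirpath, name))
        # Flag files that carry a default private key name, wherever they sit.
        for name in filenames:
            if name in KEY_FILENAMES:
                risky.append(os.path.join(dirpath, name))
    return sorted(risky)
```

Calling find_risky_paths(".") before a dataset upload or docker build gives a quick list to review. An empty list only means no default names were found, not that the tree is free of keys.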

The Impact

An SSH private key (id_rsa, id_ed25519) allows an attacker to:

  • Access your production servers.
  • Clone private repositories.
  • Pivot laterally through your internal network.

Unlike API keys, SSH keys often don't have "usage limits" or "billing alerts" that would surface abuse. An attacker can persist in your network for months.

Detection Signatures

Private keys have highly recognizable headers. You don't need AI to find them; you need strict pattern matching.

Veritensor scans all file types (including text inside archives and datasets) for these headers:

  • -----BEGIN RSA PRIVATE KEY-----
  • -----BEGIN OPENSSH PRIVATE KEY-----
  • -----BEGIN PRIVATE KEY-----
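To make the pattern-matching idea concrete, here is a minimal sketch of a scanner for the three headers listed above. It is not Veritensor's implementation; the names contains_private_key and file_contains_private_key are hypothetical, and the sketch reads raw bytes only, so it does not look inside compressed archives.

```python
import re

# Matches the three PEM headers listed above: RSA, OPENSSH, and plain PRIVATE KEY.
PRIVATE_KEY_HEADER = re.compile(rb"-----BEGIN (?:RSA |OPENSSH )?PRIVATE KEY-----")

# Longer than the longest header matched above (35 bytes), so a header that
# straddles two chunks is still seen in full.
OVERLAP = 64


def contains_private_key(data: bytes) -> bool:
    """Return True if the byte string contains one of the private key headers."""
    return PRIVATE_KEY_HEADER.search(data) is not None


def file_contains_private_key(path, chunk_size=1 << 20):
    """Scan a file in chunks so large dataset files are never loaded whole."""
    tail = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if contains_private_key(window):
                return True
            # Carry the last bytes forward to cover headers split across reads.
            tail = window[-OVERLAP:]
```

Matching on bytes rather than decoded text means the check works the same on CSV, JSON, notebooks, or binary blobs that happen to embed a key.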

Best Practices

  1. Never copy ~/.ssh into build contexts.
  2. Use SSH Agents instead of copying key files.
  3. Pre-commit Hooks: Use Veritensor as a pre-commit hook to block any commit containing these headers.
     # .pre-commit-config.yaml
     repos:
       - repo: local
         hooks:
           - id: veritensor
             name: Veritensor Scan
             entry: veritensor scan .
             language: system
             pass_filenames: false  # entry already scans "."; don't append staged filenames

Stopping a key from leaving your machine is far cheaper than remediating a server breach.