Dependency Confusion: How Hackers Infiltrate Internal Tools

The "Higher Version" Trick

In 2021, security researcher Alex Birsan got his own code running inside Apple, Microsoft, Tesla, and dozens of other companies using a technique called Dependency Confusion.

The concept is simple:

  1. A company uses an internal, private Python package named company-utils. It is hosted on a private package index, not on public PyPI.
  2. An attacker registers a package named company-utils on the public PyPI repository.
  3. The attacker gives the public package a ridiculously high version number (e.g., 99.9.9).

Why pip Chooses the Malware

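When a developer (or a CI/CD pipeline) runs pip install company-utils with both sources configured (typically the private index added through --extra-index-url), pip queries the private index and public PyPI alike.

pip has no notion of index priority. It pools the candidates from every configured index and picks the highest version that satisfies the requirement.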

It sees v1.2 on the private server and v99.9.9 on public PyPI. It treats v99.9.9 as the latest release and installs the attacker's package. The malicious code can execute during installation itself (for example, from setup.py in a source distribution), steal environment variables, and send them to the attacker.
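
The selection logic can be sketched in a few lines of Python. This is a simplified model, not pip's actual resolver: the index names, the version lists, and the helpers parse_version and pick_candidate are hypothetical, and real pip applies full PEP 440 version parsing rather than the plain dotted-number comparison used here.

```python
# Simplified model of choosing among candidates from several indexes.
# Index names and version strings are illustrative, not real data.


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version like '1.2' into a comparable tuple.

    Handles numeric release segments only (no pre-releases or epochs).
    """
    return tuple(int(part) for part in version.split("."))


def pick_candidate(candidates: dict[str, list[str]]) -> tuple[str, str]:
    """Return (index_name, version) of the highest version across all indexes."""
    pooled = [
        (parse_version(version), index_name, version)
        for index_name, versions in candidates.items()
        for version in versions
    ]
    # The index a candidate came from plays no role in the main comparison:
    # the parsed version is the first element of each tuple.
    _, index_name, version = max(pooled)
    return index_name, version


candidates = {
    "private-index": ["1.0", "1.1", "1.2"],
    "public-pypi": ["99.9.9"],
}
print(pick_candidate(candidates))  # prints ('public-pypi', '99.9.9')
```

Because every candidate lands in one pool, nothing about the private index makes its v1.2 preferable to the public v99.9.9.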

The Risk to AI Pipelines

AI teams often build internal libraries for data processing (internal-data-loader) or model serving (company-llm-client). These are prime targets. If an attacker can guess the name of your internal library (often leaked in public GitHub issues or JavaScript bundles), they can hijack your pipeline.
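
One way to gauge exposure is to check whether your internal names are already claimed on public PyPI. The sketch below assumes PyPI's JSON endpoint at https://pypi.org/pypi/<name>/json, which answers 404 for projects that do not exist. The helper names and the sample package list are illustrative only, and the demonstration at the bottom uses a stubbed lookup so it runs offline.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def exists_on_public_pypi(name: str, timeout: float = 10.0) -> bool:
    """Return True if public PyPI reports a project with this name."""
    url = PYPI_JSON_URL.format(name=name)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False  # name is unregistered on public PyPI
        raise  # any other HTTP error is not an answer; let the caller see it


def find_public_collisions(
    internal_names: Iterable[str],
    lookup: Callable[[str], bool] = exists_on_public_pypi,
) -> list[str]:
    """Return the internal names that also exist on the public index."""
    return [name for name in internal_names if lookup(name)]


# Offline demonstration with a stubbed lookup (hypothetical data).
publicly_claimed = {"company-utils"}
internal = ["company-utils", "internal-data-loader", "company-llm-client"]
print(find_public_collisions(internal, lookup=publicly_claimed.__contains__))
```

Both outcomes deserve attention. A name that already exists publicly and is not yours may be an attack in progress, while a name that is still free is one an attacker could register tomorrow.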

Defense Strategies

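  1. Scope your packages: In some ecosystems (like npm), you can publish under a scope such as @company/package. Python packaging has no widely available equivalent yet, so on PyPI a name alone does not prove who published a package.
  2. Use --index-url correctly: Point index-url in your pip.conf at a single private index that serves your internal packages and proxies the public ones you need. Avoid --extra-index-url, which makes pip consult both sources and pick the highest version.
  3. Lock your dependencies: Commit a lock file (poetry.lock or Pipfile.lock) that records hashes, so a substituted package fails verification at install time. With plain requirements files, pip's --require-hashes mode serves the same purpose.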
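
To check the second point mechanically, a small script can scan a pip configuration for settings that bring a second index into play. This is a minimal sketch: the function name audit_pip_config and the URL pypi.internal.example are hypothetical, and it inspects only the config text you pass in, not environment variables such as PIP_EXTRA_INDEX_URL or command-line flags.

```python
import configparser


def audit_pip_config(config_text: str) -> list[str]:
    """Return warnings for a pip.conf / pip.ini passed in as text."""
    # RawConfigParser disables value interpolation, so '%' characters
    # in URLs are left alone.
    parser = configparser.RawConfigParser()
    parser.read_string(config_text)

    findings = []
    has_index_url = False
    for section in parser.sections():
        if parser.has_option(section, "extra-index-url"):
            findings.append(
                f"[{section}] sets extra-index-url: pip will pool candidates "
                "from more than one index"
            )
        if parser.has_option(section, "index-url"):
            has_index_url = True
    if not has_index_url:
        findings.append("no index-url set: pip falls back to public PyPI")
    return findings


# Hypothetical config that mixes public PyPI with a private index.
risky_config = """
[global]
index-url = https://pypi.org/simple
extra-index-url = https://pypi.internal.example/simple
"""
for finding in audit_pip_config(risky_config):
    print(finding)
```

A config that sets only index-url, pointing at the private index, produces no findings in this sketch.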

Auditing with Veritensor

Veritensor helps mitigate this by scanning your dependency files. While it cannot know your private architecture, it flags packages that have:

  • Suspiciously high version numbers.
  • Names that conflict with known public packages but come from unknown sources.
  • Unpinned versions (e.g., package>=1.0), which leave the installer free to pick any newer release, including an attacker's.
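
The first and third checks are easy to approximate yourself. The following is an illustrative sketch only, not Veritensor's implementation: the function audit_requirements, the max_major threshold of 50, and the sample file contents are all assumptions chosen for the example.

```python
import re

# Package name, optional extras such as [security], then the rest of the line.
REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*(.*)$")
# An exact pin on a plain numeric version, e.g. ==1.26.4
PINNED = re.compile(r"^==(\d+(?:\.\d+)*)$")


def audit_requirements(text: str, max_major: int = 50) -> list[tuple[str, str]]:
    """Flag unpinned requirements and pins with an unusually high major version.

    Handles simple 'name==version' style lines only; URL and editable
    requirements are skipped or reported as unpinned.
    """
    findings = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):      # skip blanks and options
            continue
        match = REQUIREMENT.match(line)
        if match is None:
            continue
        name = match.group(1)
        # Keep only the version specifier, dropping markers and --hash options.
        spec = re.split(r"[;\s]", match.group(2), maxsplit=1)[0]
        pinned = PINNED.match(spec)
        if pinned is None:
            findings.append((name, "unpinned"))
            continue
        major = int(pinned.group(1).split(".")[0])
        # Calendar-versioned packages (e.g. 2024.1) would need an allowlist.
        if major >= max_major:
            findings.append((name, "suspiciously high version"))
    return findings


sample = """
numpy==1.26.4
company-utils==99.9.9
package>=1.0
requests
"""
for name, reason in audit_requirements(sample):
    print(f"{name}: {reason}")
```

On the sample above, this flags company-utils for its version and both package and requests for being unpinned, while numpy passes.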

Securing the supply chain means verifying not just what you install, but where it comes from.