Defending Against Fake HuggingFace Repository Attacks: Model Artifact Verification

Problem

On May 10, 2026, a threat actor registered the HuggingFace organisation Open-OSS and published a repository named privacy-filter. The legitimate model they were mimicking was OpenCSS/privacy-filter, a well-regarded privacy-classification model used in content pipelines. The difference is small enough to miss at a glance: Open-OSS inserts a hyphen and replaces the C of OpenCSS with an O. The fake repository recorded 244,000 downloads before HuggingFace’s trust and safety team removed it approximately 36 hours after publication.

The attack mechanism was not in the model weights. The repository contained a plausible model.safetensors file that passed automated scanners, a convincing model card with benchmark numbers copied from the legitimate project, and a scripts/preprocess.py file. The README instructions — formatted to look identical to the legitimate repository’s quickstart — told users to run python scripts/preprocess.py --setup before loading the model. That script downloaded a Rust-compiled binary from a third-party CDN, verified it with a hardcoded checksum (which only confirmed the file downloaded correctly, not that it was safe), and executed it. The binary harvested AWS credentials from ~/.aws/credentials, environment variables containing TOKEN, KEY, SECRET, or PASSWORD, and the HuggingFace token from ~/.cache/huggingface/token, then exfiltrated them to a command-and-control endpoint over HTTPS.

This incident represents a maturation of the ML supply chain attack surface. Earlier attacks targeting PyPI or npm packages relied on developers executing pip install or npm install on a typosquatted package name. The HuggingFace vector is more subtle in two respects. First, the model weights file itself can be entirely benign, so malware scanners that inspect pickle or safetensors content will find nothing. The malicious payload lives in a supplementary script presented as legitimate setup tooling. Second, the attacker exploited learned behaviour: ML developers are trained to follow README quickstart instructions without scrutiny, because model-specific preprocessing steps are genuinely common and necessary. The cognitive overhead of evaluating whether a setup script is legitimate is higher than evaluating whether a package dependency is appropriate.

Any organisation that downloads models from HuggingFace Hub, whether directly through the huggingface_hub Python library, through transformers.AutoModel.from_pretrained(), or by following documentation instructions in a Jupyter notebook, is exposed to this class of attack. The 244,000 download figure suggests the fake repository was indexed by HuggingFace search and appeared near the top of results for the model name, likely amplified by bot-driven download inflation in the hours following publication.


Threat Model

Understanding what you are defending against determines which controls are worth deploying. The HuggingFace fake-repository surface has four distinct attack vectors:

Vector 1: Typosquatted repository with malicious supplementary script. The attacker publishes a repository with a name one or two characters different from a legitimate model. The model weights are benign. The README includes setup instructions that run a script, which downloads and executes a malicious binary. This is the vector used in the May 2026 incident. The key characteristic is that automated scanning of the weights file provides no signal — the threat lives in the scripting layer.

Vector 2: Malicious pickle-serialised model weights. PyTorch .pt, .pth and pytorch_model.bin files are serialised using Python’s pickle module. A pickle stream is not a data format; it is an instruction stream that Python’s unpickler executes. A crafted file can instruct the unpickler to import and call arbitrary Python functions, for example to spawn a reverse shell, read environment variables, or import and execute additional modules, and those calls run at the moment the file is passed to torch.load() with weights_only=False (the default before PyTorch 2.6). This vector requires no supplementary script: loading the weights is the exploit.
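The point that loading is the exploit can be checked with Python’s standard pickle module alone. The harmless sketch below uses __reduce__ to make the unpickler call os.getenv on a made-up variable, DEMO_SECRET_TOKEN (a hypothetical placeholder), which is the same primitive an infostealer uses to read credentials. torch.load() on a .pt file goes through the same unpickling machinery.

```python
import os
import pickle


class Payload:
    """Any picklable object can dictate what the unpickler calls, via __reduce__."""

    def __reduce__(self):
        # Tell the unpickler to call os.getenv("DEMO_SECRET_TOKEN") at load time.
        # A real payload would name os.system or subprocess.Popen instead.
        return (os.getenv, ("DEMO_SECRET_TOKEN",))


# Serialising records a reference to os.getenv and its argument; the Payload
# class itself does not appear in the stream.
blob = pickle.dumps(Payload())

# Deserialising calls os.getenv inside pickle.loads; no method call is needed.
os.environ["DEMO_SECRET_TOKEN"] = "not-a-real-secret"
leaked = pickle.loads(blob)
print(leaked)  # not-a-real-secret
```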

Vector 3: Trojanised model alongside benign safetensors. A repository can contain both a legitimate model.safetensors and a malicious pytorch_model.bin. The transformers library’s from_pretrained() prefers model.safetensors when both files are present, so the standard loading path picks up the benign file. An attacker who understands this priority can count on the malicious file being loaded by pipelines that explicitly request PyTorch format (for example, use_safetensors=False), by tools that pass pytorch_model.bin to torch.load() directly, and by older library versions that predate safetensors support.

Vector 4: SEO-poisoned search results via download inflation. HuggingFace search ranking incorporates download counts and trending signals. A new repository can be pushed to the top of search results for a model name within hours if an attacker runs automated download scripts against it. Developers who search for a model name and click the first result — without checking the organisation name carefully — will reach the fake repository. This vector amplifies all of the above: the attacker does not need to wait for organic discovery.

The common thread across all four vectors is trust that HuggingFace Hub’s namespace implies vetting. It does not. HuggingFace Hub is a public repository. Any account can publish any model. Organisation names are not protected in the same way that verified domains are protected. Defending against these vectors requires moving trust from namespace to artifact.


Configuration and Implementation

Step 1: Verify Repository Identity Before Downloading

Before downloading any model, perform a manual identity check. A checklist of red flags:

  • Organisation account age: Navigate to https://huggingface.co/<org>. A legitimate organisation publishing a widely-used model will have been active for months or years. An organisation created within the past 30 days publishing a supposedly established model is a strong red flag.
  • Like count velocity: Hundreds of likes (HuggingFace’s equivalent of stars) appearing within 24–48 hours of repository creation signal artificial inflation.
  • Commit history: A legitimate model repository accumulates commits over time — weight updates, config changes, model card improvements. A repository with a single initial commit containing all files, published days ago, is suspicious.
  • Model card completeness: Legitimate model cards include training data disclosure, evaluation benchmarks with linked datasets, intended use, and limitations. Cards that are copied from another repository (check for identical wording) or that are minimal placeholders are red flags.
  • Organisation name character inspection: Copy the organisation name from the URL bar, not from the display text, and compare it character by character with the expected name. Lookalike characters, whether ASCII pairs such as 0 vs O and l vs 1 or Unicode homoglyphs such as Cyrillic е vs Latin e, are easy to miss when reading rendered HTML.
  • README setup instructions: Any README that asks you to run a script, execute a binary, or run a command that downloads and executes additional code outside of a standard pip install or transformers load is a significant red flag. Legitimate models do not need you to run untrusted setup scripts.
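The character-by-character comparison can be partly automated. A minimal sketch, not part of any HuggingFace tooling: compare_repo_ids is a hypothetical helper that diffs the repository ID you are about to use against the one from the authoritative source using the standard library’s difflib, and spells out every non-ASCII character by its Unicode name so homoglyphs cannot hide.

```python
import difflib
import unicodedata


def non_ascii_chars(name: str) -> list[str]:
    """Describe each non-ASCII character in `name` with its code point and Unicode name."""
    return [
        f"{ch!r} U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in name
        if ord(ch) > 127
    ]


def compare_repo_ids(expected: str, actual: str) -> list[str]:
    """Return human-readable differences between two repo IDs; empty means identical."""
    problems = [f"non-ASCII character {desc}" for desc in non_ascii_chars(actual)]
    # Character-level diff: every non-"equal" opcode is an insertion, deletion or swap.
    matcher = difflib.SequenceMatcher(a=expected, b=actual, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            problems.append(
                f"{tag}: expected {expected[i1:i2]!r} at position {i1}, got {actual[j1:j2]!r}"
            )
    return problems


# Expected ID comes from the paper or official docs; actual ID from the URL bar.
for problem in compare_repo_ids("OpenCSS/privacy-filter", "Open-OSS/privacy-filter"):
    print(problem)
```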

Step 2: Use huggingface_hub Safely

The huggingface_hub library provides controls that limit what you download. Use them.

Prefer hf_hub_download for individual files rather than cloning entire repositories:

from huggingface_hub import hf_hub_download

# Download only the model weights — no scripts, no executables
model_path = hf_hub_download(
    repo_id="OpenCSS/privacy-filter",
    filename="model.safetensors",
    # Pin to a specific commit for reproducibility and to avoid silent upstream changes.
    # Use the full 40-character commit SHA in practice; "a3f2c1d" is a placeholder.
    revision="a3f2c1d",
)

When you need the full repository snapshot, exclude scripts and executables explicitly:

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="OpenCSS/privacy-filter",
    revision="a3f2c1d",
    # Exclude anything that could execute or unpickle: scripts, binaries, notebooks
    ignore_patterns=[
        "*.py",
        "*.sh",
        "*.bash",
        "*.exe",
        "*.bin",        # PyTorch pickle weights (pytorch_model.bin); load safetensors only
        "*.pt",
        "*.pth",
        "*.pkl",
        "*.ckpt",
        "scripts/*",
        "notebooks/*",
        "tools/*",
        "*.ipynb",
    ],
    # ignore_patterns is a denylist, so unexpected file types still get through.
    # The stricter alternative is allow_patterns, but allow_patterns=["*.safetensors"]
    # alone misses config.json and tokenizer files; list those explicitly as well.
)

Never clone a HuggingFace repository with git clone. A clone fetches every file in the repository, including scripts, and Git LFS pulls every large file it tracks, so you lose the per-file include and exclude control that hf_hub_download and snapshot_download provide.

Do not use trust_remote_code=True unless the model organisation is on your approved allowlist and you have reviewed the remote code. When trust_remote_code=True is set, from_pretrained() imports Python files from the model repository and executes them locally. This is arbitrary code execution by design.
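Because ignore_patterns is a denylist, it is worth checking what actually landed on disk before anything is loaded. The sketch below is a hypothetical helper under stated assumptions: ALLOWED_SUFFIXES is an assumed allowlist for a safetensors-only model (extend it for your tokenizers), and a config.json containing an auto_map entry is flagged because that key is how transformers configs point at custom code inside the repository.

```python
import json
import tempfile
from pathlib import Path

# Assumed allowlist of file types a safetensors-only model needs; adjust per model.
ALLOWED_SUFFIXES = {".safetensors", ".json", ".txt", ".model", ".md"}
# Inert file present in most HuggingFace repositories.
IGNORED_NAMES = {".gitattributes"}


def audit_snapshot(snapshot_dir: str) -> list[str]:
    """Return problems found in a downloaded model directory; empty means it passed."""
    root = Path(snapshot_dir)
    problems = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        # Skip directories, .gitattributes, and the .cache/ metadata folder that
        # recent huggingface_hub versions write when given local_dir.
        if path.is_dir() or path.name in IGNORED_NAMES or rel.startswith(".cache/"):
            continue
        if path.suffix.lower() not in ALLOWED_SUFFIXES:
            problems.append(f"unexpected file type: {rel}")
    config_path = root / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # auto_map tells transformers to import classes from Python files in the
        # repository, which only happens with trust_remote_code=True.
        if "auto_map" in config:
            problems.append(f"config.json declares custom code: {config['auto_map']}")
    return problems


# Demo: a throwaway directory shaped like the fake repository.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "model.safetensors").write_bytes(b"")
    Path(demo_dir, "config.json").write_text('{"model_type": "bert"}', encoding="utf-8")
    Path(demo_dir, "scripts").mkdir()
    Path(demo_dir, "scripts", "preprocess.py").write_text("# setup script", encoding="utf-8")
    demo_problems = audit_snapshot(demo_dir)

print(demo_problems)  # ['unexpected file type: scripts/preprocess.py']
```

Treat a non-empty result as a hard stop: the files are on disk but nothing has been imported or unpickled yet.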

Step 3: Verify SHA256 Hashes Before Loading

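For files stored in Git LFS, which includes model weight files such as model.safetensors, HuggingFace Hub records a SHA256 of the file content. The huggingface_hub library exposes it through model_info() when called with files_metadata=True. Verify hashes before loading any file: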

import hashlib
from pathlib import Path
from huggingface_hub import hf_hub_download, model_info

REPO_ID = "OpenCSS/privacy-filter"
FILENAME = "model.safetensors"
# Pin the revision; never verify against a floating ref like "main"
REVISION = "a3f2c1d"

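def get_expected_sha(repo_id: str, filename: str, revision: str) -> str:
    """Fetch the expected SHA256 of an LFS-tracked file from HuggingFace Hub metadata."""
    # files_metadata=True is required: without it, siblings carry no hash information
    info = model_info(repo_id, revision=revision, files_metadata=True)
    for sibling in info.siblings:
        if sibling.rfilename == filename:
            # sibling.blob_id is the git object ID (SHA-1), not a SHA256, so it cannot
            # be compared with hashlib.sha256 output. Weight files live in Git LFS,
            # whose metadata records the SHA256 of the content.
            if sibling.lfs is None:
                raise ValueError(f"{filename} is not stored in LFS; no SHA256 recorded")
            return sibling.lfs.sha256
    raise FileNotFoundError(f"{filename} not found in {repo_id}@{revision}")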

def sha256_file(path: str) -> str:
    """Compute SHA256 of a local file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verified_download(repo_id: str, filename: str, revision: str) -> Path:
    """Download a model file and verify its hash before returning the path."""
    expected_hash = get_expected_sha(repo_id, filename, revision)
    local_path = hf_hub_download(repo_id=repo_id, filename=filename, revision=revision)
    actual_hash = sha256_file(local_path)

    if actual_hash != expected_hash:
        raise ValueError(
            f"Hash mismatch for {filename}!\n"
            f"  Expected: {expected_hash}\n"
            f"  Got:      {actual_hash}\n"
            "Do not load this file."
        )

    print(f"Hash verified: {filename} matches {expected_hash[:16]}...")
    return Path(local_path)

model_path = verified_download(REPO_ID, FILENAME, REVISION)

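This verification catches corruption or tampering between the Hub and your machine, including a modified file in the local cache. It does not by itself detect a malicious change on the Hub, because the expected hash is fetched from the same place as the file; to detect that, record the hash at approval time and compare against the recorded value. Pinning to a specific commit means that even if the repository’s main branch is later updated with a malicious commit, you keep downloading the commit you reviewed, since a commit’s content cannot change without changing its hash.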
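To detect a change made on the Hub itself, the expected hash has to come from somewhere other than the Hub. One approach, sketched here with a hypothetical APPROVED_HASHES record that you would commit next to the allowlist at review time, is to compare every downloaded file against the hash captured when the model was approved.

```python
import hashlib
from pathlib import Path

# Hypothetical approval record, committed next to the allowlist at review time.
# Keys are (repo_id, revision, filename); values are SHA256 hex digests.
APPROVED_HASHES: dict[tuple[str, str, str], str] = {}


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so multi-GB weights are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_record(repo_id: str, revision: str, filename: str, path: Path) -> None:
    """Raise unless `path` matches the hash recorded when the model was approved."""
    key = (repo_id, revision, filename)
    recorded = APPROVED_HASHES.get(key)
    if recorded is None:
        raise PermissionError(f"No approved hash recorded for {repo_id}@{revision}/{filename}")
    actual = sha256_of(path)
    if actual != recorded:
        raise ValueError(
            f"{filename} does not match the hash recorded at approval; do not load it.\n"
            f"  Recorded: {recorded}\n"
            f"  Got:      {actual}"
        )
```

Populated once during review, this record turns the Hub into a transport only: any later change to the file, wherever it happens, fails the check.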

Step 4: Prefer Safetensors; Never Load Untrusted Pickle Files

Safetensors is a tensor serialisation format designed specifically to prevent code execution. It contains only tensor data and metadata — no Python opcodes, no callable objects, no arbitrary serialised class instances. PyTorch pickle files (.pt, .pth, .bin) can contain all of these.

from safetensors.torch import load_file
from transformers import AutoConfig, AutoModelForSequenceClassification

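# Load config separately from weights. Assumes config.json was also downloaded into
# the same snapshot directory, e.g. a second hf_hub_download call at the same revision.
config = AutoConfig.from_pretrained(model_path.parent)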

# Load weights using safetensors — no pickle execution possible
state_dict = load_file(str(model_path))

# Instantiate model from config and load verified weights
model = AutoModelForSequenceClassification.from_config(config)
model.load_state_dict(state_dict)
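The claim that safetensors carries no code can be checked from the file layout: an 8-byte little-endian unsigned integer giving the header length, a UTF-8 JSON header that maps each tensor name to its dtype, shape and byte offsets (plus an optional __metadata__ map of strings), then raw tensor bytes. The sketch below parses only that header with the standard library; MAX_HEADER_BYTES is an assumed sanity bound.

```python
import json
import struct
from pathlib import Path

MAX_HEADER_BYTES = 100_000_000  # assumed sanity bound on the JSON header size
TENSOR_FIELDS = {"dtype", "shape", "data_offsets"}


def read_safetensors_header(path: str) -> dict:
    """Parse the JSON header of a .safetensors file without reading tensor data."""
    with Path(path).open("rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be safetensors")
        (header_len,) = struct.unpack("<Q", prefix)  # little-endian u64
        if header_len > MAX_HEADER_BYTES:
            raise ValueError(f"implausible header length: {header_len}")
        raw = f.read(header_len)
        if len(raw) != header_len:
            raise ValueError("truncated header")
    header = json.loads(raw.decode("utf-8"))
    # Entries are plain JSON data; this reader never evaluates any of it.
    for name, entry in header.items():
        if name == "__metadata__":
            continue
        if not isinstance(entry, dict) or set(entry) != TENSOR_FIELDS:
            raise ValueError(f"unexpected header entry for tensor {name!r}")
    return header
```

Run against a downloaded model.safetensors, read_safetensors_header returns the tensor index; nothing in the file is interpreted as code at any point.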

If you must load a PyTorch pickle file from a source you have not fully vetted, use weights_only=True as a minimum precaution:

import torch

# weights_only=True restricts unpickling to tensors and basic Python types, and raises
# an error if the stream references any other class or function. It is the default
# from PyTorch 2.6 onward; passing it explicitly also protects older versions.
state_dict = torch.load("model.pt", weights_only=True, map_location="cpu")

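weights_only=True is not a complete defence: PyTorch’s restricted unpickler has had bypass vulnerabilities in past releases, so keep PyTorch patched, and it does nothing about malicious scripts elsewhere in the repository. It does, however, raise the cost of exploitation significantly.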
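weights_only=True works by replacing the unpickler’s normal class lookup with an allowlist. The same idea can be sketched with the standard library, following the "Restricting Globals" pattern in Python’s pickle documentation. This illustrates the technique; it is not PyTorch’s implementation, and SAFE_GLOBALS is an assumption.

```python
import io
import os
import pickle

# Globals this loader will resolve; the set is an illustrative assumption.
# Plain dicts, lists, strings and numbers need no globals at all.
SAFE_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not listed in SAFE_GLOBALS."""

    def find_class(self, module: str, name: str):
        # Called whenever the stream asks to import module.name (GLOBAL / STACK_GLOBAL).
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        return (os.getenv, ("HOME",))


# Plain data loads normally.
state = restricted_loads(pickle.dumps({"layer.weight": [0.5, 1.5]}))

# A payload that names os.getenv is rejected before anything is called.
try:
    restricted_loads(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError as err:
    blocked = True
    print(err)  # global 'os.getenv' is forbidden
```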

Step 5: Maintain an Allowlist of Approved Model Sources

Define the set of HuggingFace organisations and specific repositories that your team is permitted to use. Store this as a checked-in configuration file:

# approved_models.py — commit this to your internal tooling repository

APPROVED_ORGS = {
    "openai-community",
    "google",
    "meta-llama",
    "mistralai",
    "OpenCSS",   # Add specific orgs after vetting
}

APPROVED_REPOS = {
    "OpenCSS/privacy-filter@a3f2c1d",
    "meta-llama/Meta-Llama-3-8B-Instruct@5b0d96f",
    # Pin to specific revisions; never approve floating refs
}

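HEX_DIGITS = set("0123456789abcdef")

def assert_approved(repo_id: str, revision: str) -> None:
    # Reject floating refs such as "main" even for approved orgs: accept only
    # 7 to 40 lowercase hex characters, i.e. a commit hash
    if not (7 <= len(revision) <= 40 and set(revision) <= HEX_DIGITS):
        raise PermissionError(f"Revision {revision!r} is not a pinned commit hash.")
    org = repo_id.split("/")[0]
    pinned = f"{repo_id}@{revision}"
    if pinned not in APPROVED_REPOS and org not in APPROVED_ORGS:
        raise PermissionError(
            f"{repo_id}@{revision} is not in the approved model list.\n"
            "Open a pull request to add it after security review."
        )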

Integrating this check into your model loading utilities means unapproved models cannot be loaded without an explicit PR that goes through review. The review process is where the organisation account age, commit history, and model card completeness checks happen — not ad-hoc at download time.

Step 6: Mirror Approved Models to an Internal Registry

Once a model has been approved and verified, mirror it to an internal artifact store. This eliminates ongoing dependency on HuggingFace Hub availability and prevents the scenario where an approved repository is later compromised and re-downloaded.

Using an S3 bucket as a model registry:

# One-time: download and verify the approved model
python -c "
from your_tooling.verified_download import verified_download
verified_download('OpenCSS/privacy-filter', 'model.safetensors', 'a3f2c1d')
"

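# Mirror to internal S3 registry. Copy the snapshot directory, whose entries carry the
# real filenames (aws s3 cp follows the cache's symlinks by default), not blobs/, whose
# files are named by hash. FULL_COMMIT_SHA is a placeholder for the pinned commit's
# full 40-character hash, which names the snapshot folder.
aws s3 cp \
  ~/.cache/huggingface/hub/models--OpenCSS--privacy-filter/snapshots/${FULL_COMMIT_SHA}/ \
  s3://your-ml-registry/models/OpenCSS/privacy-filter/a3f2c1d/ \
  --recursive \
  --sse aws:kms \
  --sse-kms-key-id arn:aws:kms:us-east-1:123456789012:key/your-key-id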

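Set the HF_ENDPOINT environment variable in training and inference environments to point at your internal mirror. HF_ENDPOINT must name a service that implements the HuggingFace Hub HTTP API, such as a Hub-compatible proxy or artifact manager; a plain S3 bucket URL will not work, so either run such a service in front of the bucket or have jobs read weights from the bucket directly.

# In training job environment
export HF_ENDPOINT=https://ml-registry.internal.yourcompany.com
export HF_HUB_VERBOSITY=warning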

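Serve the mirror over HTTPS and make it the only model source training environments can reach (Step 9 enforces this at the network layer). The gain is not transport security, since huggingface.co is already served over HTTPS, but control over content: jobs can only fetch artifacts that passed your review.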
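A missing HF_ENDPOINT does not fail loudly: huggingface_hub simply falls back to https://huggingface.co. A training job can fail closed at startup instead. A minimal sketch, assuming your mirror’s hostname is the ml-registry.internal.yourcompany.com placeholder used above; assert_internal_endpoint is a hypothetical helper.

```python
import os
from collections.abc import Mapping
from urllib.parse import urlparse

# Approved internal mirror hostnames (placeholder from the example above).
APPROVED_MIRRORS = {"ml-registry.internal.yourcompany.com"}


def assert_internal_endpoint(env: Mapping[str, str] = os.environ) -> str:
    """Fail closed unless HF_ENDPOINT names an approved HTTPS mirror."""
    endpoint = env.get("HF_ENDPOINT", "")
    if not endpoint:
        # Unset means huggingface_hub falls back to https://huggingface.co
        raise RuntimeError("HF_ENDPOINT is not set; refusing to download from the public Hub")
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or parsed.hostname not in APPROVED_MIRRORS:
        raise RuntimeError(f"HF_ENDPOINT={endpoint!r} is not an approved internal mirror")
    return endpoint


# Call assert_internal_endpoint() at the top of the job entrypoint, before importing
# huggingface_hub or transformers: huggingface_hub reads HF_ENDPOINT when imported.
```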

Step 7: Sign Model Artifacts with Cosign

Use cosign to create detached signatures over model files, stored alongside the models in your internal registry. This creates a cryptographic chain of custody: a valid signature proves that the holder of the signing key, which should be only the CI pipeline that runs your review and scanning gates, admitted that exact file to the registry.

# Generate a signing key pair (store the private key in your secrets manager)
cosign generate-key-pair

# Sign the model file after verification. With a key pair, cosign writes a
# signature only; certificates belong to keyless (OIDC-based) signing.
cosign sign-blob \
  --key cosign.key \
  --output-signature model.safetensors.sig \
  model.safetensors

# Upload the signature alongside the model
aws s3 cp model.safetensors.sig s3://your-ml-registry/models/OpenCSS/privacy-filter/a3f2c1d/model.safetensors.sig

Verify the signature before loading in any environment:

# In training job bootstrap script
cosign verify-blob \
  --key cosign.pub \
  --signature model.safetensors.sig \
  model.safetensors || { echo "Signature verification failed, aborting"; exit 1; }

The signing key’s private component should live in your secrets manager (AWS Secrets Manager, HashiCorp Vault) and only be accessible to the CI pipeline that admits models to the registry. Developers cannot sign models themselves; they open a PR to the approved-models list, and the CI pipeline performs the download, verification, cosign signing, and upload.

Step 8: Scan Model Files with ModelScan

Before approving any new model into your registry, run modelscan — a Python tool specifically designed to detect malicious code in ML model files, including pickle payloads and suspicious serialised objects:

# Install
pip install modelscan

# Scan all files in a downloaded model directory
modelscan scan -p ~/.cache/huggingface/hub/models--OpenCSS--privacy-filter/

# Scan a specific file
modelscan scan -p model.safetensors
modelscan scan -p pytorch_model.bin

# modelscan exits with a non-zero status when it reports issues or the scan fails,
# so a CI step that runs it fails on findings without extra flags
modelscan scan -p .

Integrate modelscan into your CI pipeline as a gate before cosign signing:

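# GitHub Actions example: model approval workflow
- name: Scan model artifacts
  run: |
    pip install modelscan
    modelscan scan -p ./downloaded_model/

- name: Sign approved model
  if: success()
  env:
    COSIGN_KEY: ${{ secrets.COSIGN_KEY }}
    COSIGN_PASSWORD: ${{ secrets.COSIGN_PASSWORD }}
  run: |
    # --key expects a path, so write the secret to a file on the ephemeral runner
    printf '%s' "$COSIGN_KEY" > cosign.key
    cosign sign-blob --yes --key cosign.key \
      --output-signature model.safetensors.sig \
      model.safetensors
    rm -f cosign.key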

modelscan is not a complete defence: it matches known-bad patterns and may miss novel or obfuscated techniques. It does catch opportunistic pickle payloads built from standard exploit patterns.
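To make "known-bad patterns" concrete: pickle scanners walk the opcode stream without executing it and check which module attributes it would import. The simplified sketch below uses the standard library’s pickletools; it is not modelscan’s code, the denylists are illustrative assumptions, and the STACK_GLOBAL handling is a heuristic that memo indirection can defeat, which is exactly the kind of gap that keeps scanners one layer among several.

```python
import os
import pickle
import pickletools

# Illustrative denylists; real scanners maintain larger, severity-ranked lists.
DANGEROUS_MODULES = {"os", "posix", "nt", "subprocess", "socket", "shutil", "runpy"}
DANGEROUS_NAMES = {
    f"{module}.{name}"
    for module in ("builtins", "__builtin__")
    for name in ("eval", "exec", "__import__", "getattr")
}
STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def pickle_globals(data: bytes) -> list[str]:
    """List the module.name references in a pickle stream without executing it."""
    found = []
    strings = []  # string literals pushed so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            strings.append(arg)
        elif opcode.name in ("GLOBAL", "INST"):
            # Protocols 0-3 encode "module name" inline, separated by a space.
            module, name = arg.split(" ", 1)
            found.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+ takes module and name from the stack. Heuristic: assume
            # they were the last two string literals; memo lookups (BINGET) defeat this.
            found.append(f"{strings[-2]}.{strings[-1]}")
    return found


def suspicious_globals(data: bytes) -> list[str]:
    """Return the referenced globals that match the denylists."""
    return [
        g for g in pickle_globals(data)
        if g.split(".", 1)[0] in DANGEROUS_MODULES or g in DANGEROUS_NAMES
    ]


class Payload:
    def __reduce__(self):
        # Only serialised below, never loaded, so os.getenv is never called.
        return (os.getenv, ("AWS_SECRET_ACCESS_KEY",))


print(suspicious_globals(pickle.dumps(Payload())))  # ['os.getenv']
print(suspicious_globals(pickle.dumps({"w": [1.0, 2.0]})))  # []
```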

Step 9: Control Network Egress from Training Environments

If malicious code does execute in a training or inference environment, network egress controls determine whether the attack succeeds. An infostealer binary cannot exfiltrate credentials if all outbound connections from the environment are blocked except to approved registries.

In Kubernetes training environments, use NetworkPolicy to restrict egress:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: training-job-egress
  namespace: ml-training
spec:
  podSelector:
    matchLabels:
      role: training-job
  policyTypes:
    - Egress
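  egress:
    # Allow DNS, but only to the cluster DNS pods rather than arbitrary resolvers
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allow only the internal model registry
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ml-registry
      ports:
        - protocol: TCP
          port: 443
    # All other egress is denied because this policy selects the pods for Egress;
    # in particular, no direct HuggingFace access from training pods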

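This defence-in-depth control means that even a successfully executed infostealer binary cannot open a connection to its command-and-control endpoint or reach the HuggingFace CDN to download additional payloads, and a compromised model cannot send secrets directly to an attacker-controlled endpoint during training. Two caveats: NetworkPolicy is enforced only when the cluster’s network plugin supports it, and queries to the cluster DNS service are usually forwarded to upstream resolvers, so DNS remains a narrow exfiltration channel that this policy alone does not close.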
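After applying the policy, verify it from inside a training pod rather than assuming it is enforced. A minimal standard-library sketch follows; EXPECTED_EGRESS is a hypothetical list of destinations, with the internal registry hostname taken from the placeholder used earlier.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False


# Hypothetical expectations for a pod covered by training-job-egress.
EXPECTED_EGRESS = {
    ("huggingface.co", 443): False,                       # must be blocked
    ("ml-registry.internal.yourcompany.com", 443): True,  # placeholder internal mirror
}


def check_egress(expected: dict[tuple[str, int], bool] = EXPECTED_EGRESS) -> list[str]:
    """Describe every destination whose reachability differs from the expectation."""
    failures = []
    for (host, port), should_connect in expected.items():
        reachable = can_connect(host, port)
        if reachable != should_connect:
            state = "reachable" if reachable else "unreachable"
            failures.append(f"{host}:{port} is {state}, expected the opposite")
    return failures
```

Run check_egress() from a pod labelled role: training-job, for example in a one-off debug Job; an empty list means the policy behaves as intended.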


Expected Behaviour

The following table maps download scenarios to verification outcomes and the action your tooling should take:

| Download scenario | Verification outcome | Action |
| --- | --- | --- |
| Approved org repo, pinned revision, cosign signature present and valid | All checks pass: allowlist match, hash match, signature valid | Load model and proceed |
| Approved org repo, pinned revision, cosign signature missing | Allowlist match and hash match, but no signature | Block load; trigger re-signing workflow in CI |
| Typosquatted repo (Open-OSS/privacy-filter), not on allowlist | Allowlist check fails immediately | Block download; alert security team |
| Approved repo, file hash mismatch (file tampered with in transit or in cache) | Allowlist match, hash mismatch | Block load; alert; do not cache the file |
| Internal registry copy, cosign signature valid | Hash match against registry, signature valid | Load model; fastest path for production inference |
| Unsigned model from unknown source in notebook | No allowlist entry, no signature | Block load; log researcher’s identity; prompt to open approval PR |
| modelscan finding in newly submitted model | Scanner reports malicious pickle pattern | Reject from registry; file incident report; notify HuggingFace |

Trade-offs

| Control | Security gain | Operational cost |
| --- | --- | --- |
| Allowlist-only model loading | Blocks typosquatted and unvetted repositories before they are downloaded | Slows research workflows: new models require a PR and review cycle (typically 1–2 days); researchers may work around controls using personal environments |
| Safetensors-only loading (reject .pt/.pth/.bin) | Eliminates pickle code execution at load time | Some older models and community fine-tunes are published only in PyTorch format; researchers needing these models must either find safetensors alternatives or request conversion |
| Internal model registry (mirror from HuggingFace) | Eliminates ongoing dependency on HuggingFace availability; prevents re-download of later-compromised approved repos | Storage costs (large models run 10–150 GB each); synchronisation lag when approved models receive legitimate updates; ops burden to maintain registry infrastructure |
| weights_only=True on torch.load() | Significantly reduces pickle exploit surface by rejecting arbitrary class instantiation | Some model architectures use custom serialised objects that fail to load with weights_only=True; requires model-specific workarounds or migration to safetensors |

Failure Modes

| Failure mode | Trigger condition | Consequence | Mitigation |
| --- | --- | --- | --- |
| Allowlist not updated after legitimate model release | Security team is slow to process approval PRs; researchers are blocked on a legitimate new model | Researchers load models from unapproved sources (personal machines, shadow environments) to bypass the bottleneck, creating an unmonitored shadow pipeline | Set an SLA for approval reviews (e.g., 48 hours); add a self-service fast track for models from already-approved organisations |
| Internal registry sync fails | Registry sync job errors silently; training jobs continue loading stale model versions | Production model version drifts from what was approved; models are not updated when the upstream organisation publishes legitimate security patches | Alert on sync job failure; expose model version in training job metadata so drift is detectable |
| Cosign verification skipped in Jupyter notebooks | Researchers run notebooks interactively, call huggingface_hub directly, and never invoke the verification wrapper | All cryptographic chain-of-custody protection is absent for notebook-based experimentation, which is often where new untrusted models are first loaded | Enforce verification in code rather than relying on developer discipline: wrap hf_hub_download at the library level using a custom resolver; deploy a Jupyter server extension that intercepts model loads |
| modelscan false negative on novel attack technique | Attacker uses an obfuscated or previously unseen pickle payload pattern not in modelscan’s detection rules | Malicious model admitted to internal registry with a valid cosign signature; all downstream environments trust it | Treat modelscan as one layer, not the only layer; pair with network egress control and behavioural monitoring so exfiltration is blocked even if the payload executes |

Hardening Checklist

Before loading any model from HuggingFace Hub:

  • [ ] Verified organisation account age is greater than 90 days
  • [ ] Confirmed repository name character-by-character against authoritative source (paper, official documentation)
  • [ ] Checked commit history for plausible development trajectory
  • [ ] Repository is on the approved-models allowlist with a pinned revision
  • [ ] Downloaded using hf_hub_download or snapshot_download with ignore_patterns excluding scripts
  • [ ] SHA256 hash verified against HuggingFace Hub metadata for the pinned revision
  • [ ] Model file is in safetensors format; no .pt/.pth/.bin pickle files loaded without weights_only=True
  • [ ] modelscan scan run on all downloaded files with no findings
  • [ ] Model mirrored to internal registry and signed with cosign
  • [ ] Training/inference environment has egress NetworkPolicy blocking outbound except to approved registries