OSS-Fuzz and ClusterFuzzLite: Continuous Fuzzing as a Supply Chain Security Control


Why Fuzzing Is a Supply Chain Control

The 2021 Log4Shell vulnerability (CVE-2021-44228) was a remote code execution bug triggered by a crafted string passed to a logging function. The 2022 OpenSSL buffer overflow (CVE-2022-3602) was a four-byte stack buffer overflow triggered by a crafted email address in an X.509 certificate. Both bugs lived in code that interprets attacker-controlled input, which is exactly the code fuzzers are designed to exercise.

Supply chain attackers do not need to compromise your build pipeline if the libraries you depend on already have exploitable memory corruption bugs. They target widely-used parsing and deserialization libraries because a single bug multiplies across every downstream consumer. OSS-Fuzz has found over 10,000 vulnerabilities in open-source projects since 2016, including critical bugs in libpng, FreeType, SQLite, and curl, most before any public exploit existed.

Fuzzing fits into a supply chain security posture alongside SLSA build provenance and SBOM generation. Provenance tells you what went in; an SBOM tells you what is present; fuzzing tells you whether the code is exploitable given attacker-controlled input. All three controls address different attack surfaces, and none substitutes for the others.

The threat model: an attacker supplies a crafted artifact — a certificate, a network packet, a configuration file, an archive — to your running service. Your service parses it using a library that has a latent heap overflow. The attacker gets code execution. Fuzzing catches this class of bug at development time, before the library version ships into your dependency tree.

libFuzzer Harness Structure

libFuzzer is a coverage-guided fuzzer built into LLVM. It calls a single function repeatedly with mutated inputs, tracking which code paths each input exercises and prioritizing mutations that reach new paths. The entry point is always the same:

#include <stdint.h>
#include <stddef.h>
#include "mylib.h"

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    mylib_parse(data, size);
    return 0;
}
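To make the feedback loop described above concrete, here is a toy model of it in C. It is not how libFuzzer is implemented: the target reports its own "edges" as a bit mask instead of relying on compiler instrumentation, the only mutation is overwriting one random byte, and the xorshift64 generator and every name below are illustrative choices. What it does share with libFuzzer is the retention rule: a mutated input stays in the corpus only if it reaches an edge no earlier input reached, so progress through the nested 'F', 'U', 'Z' checks is saved one step at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INPUT_LEN 4
#define MAX_CORPUS 64

/* Toy target: returns a bit mask of the "edges" it reached. The edge with
 * value 8 is only reached when the input starts with "FUZ". */
static uint32_t toy_target(const uint8_t *in) {
    uint32_t edges = 1;                     /* entry edge, always hit */
    if (in[0] == 'F') {
        edges |= 2;
        if (in[1] == 'U') {
            edges |= 4;
            if (in[2] == 'Z') {
                edges |= 8;
            }
        }
    }
    return edges;
}

/* xorshift64 pseudo-random generator with a fixed seed, so runs repeat. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;
static uint64_t next_rand(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Runs `iterations` mutate-and-execute steps starting from one all-zero seed.
 * A mutant joins the corpus only if it reaches an edge no earlier input
 * reached. Copies the most recently added input to `found` and returns the
 * union of edges reached. */
uint32_t toy_fuzz(unsigned long iterations, uint8_t found[INPUT_LEN]) {
    uint8_t corpus[MAX_CORPUS][INPUT_LEN] = {{0}};
    size_t corpus_size = 1;
    uint32_t covered = toy_target(corpus[0]);

    for (unsigned long i = 0; i < iterations; i++) {
        uint8_t cand[INPUT_LEN];
        /* Pick a corpus entry and overwrite one random byte. */
        memcpy(cand, corpus[next_rand() % corpus_size], INPUT_LEN);
        cand[next_rand() % INPUT_LEN] = (uint8_t)next_rand();

        uint32_t edges = toy_target(cand);
        if ((edges & ~covered) != 0 && corpus_size < MAX_CORPUS) {
            covered |= edges;                /* new coverage: keep it */
            memcpy(corpus[corpus_size++], cand, INPUT_LEN);
            memcpy(found, cand, INPUT_LEN);
        }
    }
    return covered;
}
```

Guessing the three-byte prefix blindly succeeds with probability 1 in 256³ (about 16.8 million) per attempt. With coverage feedback, each step is a single-byte guess made from an input already saved in the corpus, so the loop typically reaches all four edges within a few thousand iterations.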

Three rules govern harness correctness:

  1. LLVMFuzzerTestOneInput must never call exit(), and it must not call abort() on input it merely considers invalid: libFuzzer reports an exit from the target as an error and an abort as a crash, and either one ends the session. Return 0 always.
  2. Memory allocated inside the function must be freed before returning. LeakSanitizer, which libFuzzer enables by default in AddressSanitizer builds, reports each leak as a bug.
  3. The harness should exercise one coherent code path. Cramming ten different parsers into one harness reduces coverage guidance effectiveness because mutations that improve coverage for parser A may not help parser B.
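The three rules can be seen together in a small, self-contained harness. Here toy_parse_records is a hypothetical stand-in for mylib_parse that reads one-byte length-prefixed records into a heap buffer; it is defined in the same file only so the sketch compiles on its own, whereas in a real harness it would live in the library under test.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a library parser: the input is a sequence of
 * records, each a 1-byte length followed by that many payload bytes.
 * On success, *out receives a heap buffer with the concatenated payloads
 * (caller frees) and the record count is returned; returns -1 on
 * malformed input without touching *out. */
static int toy_parse_records(const uint8_t *data, size_t size, uint8_t **out) {
    uint8_t *buf = malloc(size > 0 ? size : 1);
    if (buf == NULL) {
        return -1;
    }
    size_t pos = 0, written = 0;
    int count = 0;
    while (pos < size) {
        size_t len = data[pos++];
        if (len > size - pos) {      /* bounds check before copying */
            free(buf);
            return -1;
        }
        memcpy(buf + written, data + pos, len);
        written += len;
        pos += len;
        count++;
    }
    *out = buf;
    return count;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    uint8_t *records = NULL;
    int n = toy_parse_records(data, size, &records);
    /* Rule 1: malformed input is a normal outcome, not a reason to exit. */
    if (n < 0) {
        return 0;
    }
    /* Rule 2: free everything allocated for this input. */
    free(records);
    /* Rule 3: this harness drives exactly one parser. */
    return 0;
}
```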

Corpus and Seed Corpus

The seed corpus is the set of valid inputs you provide to bootstrap the fuzzer. Without a seed corpus, the fuzzer starts from random bytes and may take hours to find the byte sequences that reach interesting code paths in a binary format parser. With a seed corpus of real-world examples, it reaches deep code paths within minutes.

Structure the seed corpus as a directory of files, one input per file:

corpus/
  seed/
    example1.cert    # real X.509 certificate
    example2.cert    # another certificate with edge-case fields
    minimal.cert     # smallest valid certificate

Pass it at fuzzer invocation:

./fuzz_cert_parser corpus/seed/ -max_len=65536 -timeout=30

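libFuzzer writes each new interesting input it discovers into the first corpus directory on the command line, so over time the corpus grows to cover more code paths. To keep a checked-in seed set pristine, list a writable working directory first, for example ./fuzz_cert_parser corpus/work/ corpus/seed/. Storing the working corpus and reusing it across runs dramatically reduces the time needed to rediscover coverage already found.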

C Harness With Initialization

Some libraries require one-time initialization. Use LLVMFuzzerInitialize for this:

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include "mylib.h"

int LLVMFuzzerInitialize(int *argc, char ***argv) {
    mylib_global_init();
    return 0;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 4) {
        return 0;
    }

    mylib_ctx *ctx = mylib_ctx_new();
    if (!ctx) {
        return 0;
    }

    mylib_parse(ctx, data, size);
    mylib_ctx_free(ctx);
    return 0;
}

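The size < 4 guard is appropriate when the library's API contract requires a minimum input length, for example a 4-byte header the caller is documented to validate. Returning early on those inputs keeps the fuzzer from reporting crashes that merely violate the contract and keeps coverage guidance focused on meaningful mutations. If the library is supposed to reject short input itself, drop the guard: a crash on a 2-byte input is then a real bug.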

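Build the harness with AddressSanitizer and libFuzzer. The fuzzer part of -fsanitize=address,fuzzer links the libFuzzer driver and adds the SanitizerCoverage instrumentation that guides mutation, so no separate coverage flags are needed:

clang -g -O1 -fsanitize=address,fuzzer \
  fuzz_cert_parser.c mylib.c -o fuzz_cert_parser

Source-based coverage flags (-fprofile-instr-generate -fcoverage-mapping) belong in a separate build used only to produce coverage reports; adding them to the fuzzing binary slows every execution.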

Writing a Go Fuzzing Harness

Go 1.18 introduced native fuzzing with go test -fuzz. No external tooling is required. Fuzz tests live alongside unit tests in _test.go files:

package myparser

import (
    "testing"
)

func FuzzParseCertificate(f *testing.F) {
    // Seed corpus entries
    f.Add([]byte{0x30, 0x82, 0x01, 0x00})
    f.Add([]byte(""))
    f.Add([]byte{0xFF, 0xFF, 0xFF, 0xFF})

    f.Fuzz(func(t *testing.T, data []byte) {
        cert, err := ParseCertificate(data)
        if err != nil {
            return
        }
        // If parsing succeeded, exercise the parsed result as well. A panic
        // in code that consumes the certificate is reported as a failure,
        // just like a panic in the parser itself.
        _ = cert.Subject.String()
    })
}

Run it locally:

go test -fuzz=FuzzParseCertificate -fuzztime=60s ./pkg/myparser/

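Go stores the inputs it generates while fuzzing in its build cache ($GOCACHE/fuzz/), not in the repository, so a later run on the same machine resumes from the coverage already achieved. Only an input that causes a failure is written to testdata/fuzz/FuzzParseCertificate/. Files in that directory are seed corpus entries that plain go test also runs as regular test cases, so checking them into version control turns every crash into a regression test. The directory plays the role of a libFuzzer seed corpus rather than of the working corpus.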

The Go fuzzer reports a failure when the fuzz function calls t.Fatal or t.Error, or panics; a nil-pointer dereference is one common kind of panic. Panics in production code are the primary signal: a panic in a parser that receives untrusted input is a denial-of-service vulnerability at minimum.

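Memory safety bugs in cgo code called from Go fuzz targets need different tooling. The -race flag detects data races, not out-of-bounds accesses; building the test with -asan (or -msan for uninitialized reads), available since Go 1.18 on linux/amd64 and linux/arm64, instruments the C code with the corresponding sanitizer. A recover wrapper is optional: the fuzzing engine already reports Go panics as failures and saves the input, so recovering only lets you add context to the failure message. It cannot catch a segmentation fault raised inside C code, which terminates the process:

f.Fuzz(func(t *testing.T, data []byte) {
    defer func() {
        // Catches panics on the Go side of the call (e.g. in the cgo
        // wrapper); a crash inside the C code itself is not recoverable.
        if r := recover(); r != nil {
            t.Fatalf("panic on %d-byte input: %v", len(data), r)
        }
    }()
    ParseCertificateFromC(data)
})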

ClusterFuzzLite: Fuzzing in GitHub Actions

ClusterFuzzLite is a trimmed-down version of the OSS-Fuzz infrastructure designed to run inside existing CI systems. It uses the same Docker-based build files as OSS-Fuzz (project.yaml, Dockerfile, and build.sh, kept in a .clusterfuzzlite/ directory in your repository) but invokes fuzzers as CI steps, such as GitHub Actions jobs, rather than on dedicated cloud VMs.

ClusterFuzzLite operates in two main fuzzing modes. Code change mode runs fuzzers for a short duration on every pull request, typically 600 to 1800 seconds, to catch regressions introduced by the change. Batch fuzzing mode runs for an hour or more on a schedule to explore the corpus in depth. Both modes share one corpus; on GitHub Actions it is stored as workflow artifacts by default or, optionally, in a separate Git storage repository.

GitHub Actions Workflow

name: ClusterFuzzLite
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * *'

permissions:
  contents: read
  security-events: write

jobs:
  Build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build fuzzers
        id: build
        uses: google/clusterfuzzlite/actions/build_fuzzers@v1
        with:
          language: c++
          sanitizer: address

  CodeChangeFuzzing:
    needs: Build
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build fuzzers
        uses: google/clusterfuzzlite/actions/build_fuzzers@v1
        with:
          language: c++
          sanitizer: address
      - name: Run fuzzers (code change mode)
        uses: google/clusterfuzzlite/actions/run_fuzzers@v1
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          fuzz-seconds: 600
          mode: code-change
          sanitizer: address
          output-sarif: true

  BatchFuzzing:
    needs: Build
    if: github.event_name == 'schedule' || github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build fuzzers
        uses: google/clusterfuzzlite/actions/build_fuzzers@v1
        with:
          language: c++
          sanitizer: address
      - name: Run fuzzers (batch mode)
        uses: google/clusterfuzzlite/actions/run_fuzzers@v1
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          fuzz-seconds: 3600
          mode: batch
          sanitizer: address

  Coverage:
    needs: BatchFuzzing
    if: github.event_name == 'schedule' || github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build fuzzers (coverage instrumentation)
        uses: google/clusterfuzzlite/actions/build_fuzzers@v1
        with:
          language: c++
          sanitizer: coverage
      - name: Generate coverage report
        uses: google/clusterfuzzlite/actions/run_fuzzers@v1
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          fuzz-seconds: 600
          mode: coverage
          sanitizer: coverage

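With output-sarif: true, code change mode writes its crash results in SARIF format. Once that file is uploaded to GitHub code scanning (typically with the github/codeql-action/upload-sarif action, which needs the security-events: write permission granted above), crashes appear in the Security tab as code scanning alerts and as annotations on the pull request diff.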

OSS-Fuzz Integration for Open-Source Projects

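If your project is open source and widely used or critical to infrastructure (OSS-Fuzz reviews projects before accepting them), OSS-Fuzz provides free, continuous fuzzing on dedicated infrastructure. Google runs the fuzzers around the clock and automatically reports bugs to maintainers through the OSS-Fuzz issue tracker. Reports become public 90 days after filing or 30 days after a fix is released, whichever comes first.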

OSS-Fuzz integration requires three files in a projects/<your-project>/ directory of the google/oss-fuzz repository, submitted as a pull request to that repository.

project.yaml

homepage: "https://github.com/your-org/your-project"
main_repo: "https://github.com/your-org/your-project"
language: c++
primary_contact: "security@your-org.com"
auto_ccs:
  - "lead-maintainer@your-org.com"
fuzzing_engines:
  - libfuzzer
  - afl
  - honggfuzz
sanitizers:
  - address
  - memory
  - undefined

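fuzzing_engines and sanitizers are independent settings. The engines differ in how they mutate inputs and schedule the corpus, so running several tends to reach code that a single engine misses. The sanitizers determine which bug classes are detected: AddressSanitizer finds memory corruption such as out-of-bounds accesses and use-after-free; UndefinedBehaviorSanitizer finds signed integer overflows, invalid shifts, and out-of-bounds indexing of fixed-size arrays; MemorySanitizer finds reads of uninitialized memory, which AddressSanitizer misses. MemorySanitizer requires every linked library to be built with its instrumentation, so prebuilt dependencies such as libssl-dev from apt produce false positives unless you build them from source in build.sh.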

Dockerfile

FROM gcr.io/oss-fuzz-base/base-builder

RUN apt-get update && apt-get install -y \
    cmake \
    pkg-config \
    libssl-dev

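# The OSS-Fuzz build context is projects/your-project/, so fetch the source
RUN git clone --depth 1 https://github.com/your-org/your-project $SRC/your-project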
COPY build.sh $SRC/
WORKDIR $SRC/your-project

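The base image gcr.io/oss-fuzz-base/base-builder provides LLVM, libFuzzer, and all sanitizer runtimes. You add only the dependencies specific to your project. Keep the image minimal: extra packages slow builds and increase the attack surface of the build environment itself. Note that the build context differs between the two systems. OSS-Fuzz builds from projects/your-project/ in the oss-fuzz repository, so the Dockerfile must fetch your source (for example with git clone), while ClusterFuzzLite builds from your repository root, where COPY . $SRC/your-project works.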

build.sh

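#!/bin/bash -eu

cd "$SRC/your-project"

# $CC/$CXX/$CFLAGS/$CXXFLAGS come from the OSS-Fuzz build environment.
# LIB_FUZZING_ENGINE is forwarded as a project-specific CMake variable: the
# project's CMakeLists.txt must link each fuzz target against it instead of
# hardcoding -fsanitize=fuzzer, so AFL and honggfuzz builds link too.
cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER="$CC" \
      -DCMAKE_CXX_COMPILER="$CXX" \
      -DCMAKE_C_FLAGS="$CFLAGS" \
      -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
      -DBUILD_FUZZING=ON \
      -DLIB_FUZZING_ENGINE="$LIB_FUZZING_ENGINE" \
      -B build_fuzz

cmake --build build_fuzz --parallel "$(nproc)"

cp build_fuzz/fuzz_cert_parser "$OUT/"
cp build_fuzz/fuzz_message_decoder "$OUT/"

# Copy seed corpus (-j stores files without their directory path)
zip -j "$OUT/fuzz_cert_parser_seed_corpus.zip" corpus/seed/*
zip -j "$OUT/fuzz_message_decoder_seed_corpus.zip" corpus/message_seed/*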

The $CC, $CXX, $CFLAGS, $CXXFLAGS, and $LIB_FUZZING_ENGINE environment variables are set by the OSS-Fuzz build system to inject the compiler, the sanitizer flags, and the fuzzing-engine library for each engine and sanitizer combination. Hardcoding them prevents OSS-Fuzz from building your harnesses with any configuration other than the one you tested. Passing them through is mandatory.

Seed corpus archives follow the naming convention <fuzzer_binary>_seed_corpus.zip. OSS-Fuzz unpacks these automatically before running the fuzzer. The OSS-Fuzz infrastructure also persists the full corpus across runs in GCS and feeds it back on each invocation.

Reading AddressSanitizer and MemorySanitizer Output

When a fuzzer-instrumented binary crashes on a malformed input, AddressSanitizer prints a structured report. Reading it correctly determines whether a bug is exploitable and where the fix should go.

==12345==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000114
READ of size 4 at 0x602000000114 thread T0
    #0 0x401234 in parse_length /src/mylib/parser.c:87:12
    #1 0x401567 in parse_record /src/mylib/parser.c:142:5
    #2 0x401890 in mylib_parse /src/mylib/mylib.c:234:3
    #3 0x402abc in LLVMFuzzerTestOneInput /src/fuzz/fuzz_parser.c:18:5

0x602000000114 is located 4 bytes to the right of 16-byte region [0x602000000100,0x602000000110)
allocated by thread T0 here:
    #0 0x7f8abc123456 in malloc (/lib/x86_64-linux-gnu/libasan.so.6+0x...)
    #1 0x401100 in parse_header /src/mylib/parser.c:56:18

SUMMARY: AddressSanitizer: heap-buffer-overflow /src/mylib/parser.c:87:12 in parse_length

Three fields matter most. The error type (heap-buffer-overflow versus heap-use-after-free versus stack-buffer-overflow) names the bug class. The access type (READ versus WRITE) matters because writes are generally more exploitable. The allocation stack shows where the undersized buffer was created, while the stack trace at the access site shows where the bounds check is missing.
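The fix usually belongs at the access site the report names, not in the harness. Below is a hypothetical shape for parse_length with the missing check restored; its signature, the offset-based interface, and the 4-byte big-endian field are assumptions, since the report shows only function names and line numbers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed version of parse_length: reads a 4-byte big-endian
 * length field at `offset` inside a buffer of `buf_len` bytes.
 * Returns 0 on success, -1 if the field would extend past the buffer. */
int parse_length(const uint8_t *buf, size_t buf_len, size_t offset,
                 uint32_t *out) {
    /* Written as a subtraction so a huge offset cannot wrap around. */
    if (offset > buf_len || buf_len - offset < 4) {
        return -1;
    }
    *out = ((uint32_t)buf[offset] << 24) |
           ((uint32_t)buf[offset + 1] << 16) |
           ((uint32_t)buf[offset + 2] << 8) |
           (uint32_t)buf[offset + 3];
    return 0;
}
```

Writing the check as buf_len - offset < 4 rather than offset + 4 > buf_len matters: the addition can wrap around for an attacker-controlled offset near SIZE_MAX and slip past the check, while the subtraction cannot underflow once offset <= buf_len has been established.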

heap-buffer-overflow WRITE is a strong exploitability signal. heap-buffer-overflow READ may be exploitable as an information leak. heap-use-after-free WRITE is frequently exploitable on modern allocators. For stack-buffer-overflow WRITE, exploitability depends on mitigations such as stack canaries, which detect an overwritten return address only when the function returns and do not protect adjacent local variables.

MemorySanitizer output is structurally similar but reports uninitialized reads, which AddressSanitizer does not detect:

==12345==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x401234 in compare_records /src/mylib/parser.c:203:8
    #1 0x401567 in find_record /src/mylib/parser.c:287:3

An uninitialized value used in a comparison makes control flow depend on stale memory contents, and if the value reaches output it can disclose leftover heap data such as pointers or secrets. These bugs are usually lower severity than out-of-bounds writes but are worth fixing before shipping.

To reproduce a crash locally from a fuzzer-generated input file:

./fuzz_cert_parser crash-a1b2c3d4e5f6

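# Or with explicit AddressSanitizer runtime options (symbolized frames, and
# abort() instead of exit so a debugger or core dump captures the crash):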
ASAN_OPTIONS=symbolize=1:abort_on_error=1 ./fuzz_cert_parser crash-a1b2c3d4e5f6

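libFuzzer writes failing inputs to files named crash-<sha1>, leak-<sha1>, timeout-<sha1>, and oom-<sha1> in the working directory (the -artifact_prefix flag changes the location). These files reproduce the bug but are not necessarily minimal; running the fuzzer on a crash file with -minimize_crash=1 -runs=10000 shrinks it. Add the reproducing input to the test suite as a regression input after the fix lands.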
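Replaying a crash file does not strictly require the libFuzzer driver. A small reader that loads the file into an exactly sized heap buffer and calls LLVMFuzzerTestOneInput once is handy under a debugger or on a build compiled without -fsanitize=fuzzer; LLVM ships a comparable standalone driver (StandaloneFuzzTargetMain.c). The sketch below is an illustration, not that file: replay_file is a hypothetical name, and the placeholder target exists only so the sketch links on its own.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder target so this sketch links on its own; delete it and link
 * with the real harness (e.g. fuzz_cert_parser.c) instead. */
static size_t g_last_size = (size_t)-1;
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)data;
    g_last_size = size;
    return 0;
}

/* Reads the file at `path` and passes its contents to the fuzz target once.
 * Returns the target's return value, or -1 if the file cannot be read. */
int replay_file(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    long len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    size_t size = (size_t)len;
    /* Exact-size allocation: a read past `size` lands outside the block,
     * so AddressSanitizer can report it, as it would under libFuzzer. */
    uint8_t *buf = malloc(size > 0 ? size : 1);
    if (buf == NULL || fread(buf, 1, size, f) != size) {
        free(buf);
        fclose(f);
        return -1;
    }
    fclose(f);
    int rc = LLVMFuzzerTestOneInput(buf, size);
    free(buf);
    return rc;
}
```

To use it, delete the placeholder, add a main that calls replay_file(argv[1]), and compile it together with the harness and library under AddressSanitizer (clang -g -fsanitize=address, without the fuzzer sanitizer so that your main is the entry point).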

Corpus Management

A large corpus slows fuzzer startup and spreads mutation effort across redundant inputs. After extended batch fuzzing runs, minimize the corpus using libFuzzer’s built-in merge mode:

./fuzz_cert_parser -merge=1 corpus/minimized/ corpus/full/

Merge mode processes inputs in corpus/full/, keeps only those that add at least one new coverage edge, and writes them to corpus/minimized/. The minimized corpus covers the same code paths with fewer files, which speeds up subsequent runs and reduces artifact storage.
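Conceptually, merge is a greedy pass: libFuzzer's classic merge considers smaller inputs first and keeps each one only if it contributes coverage not yet seen. The sketch below is a simplified model of that pass, not libFuzzer's implementation: each input's coverage is reduced to a hypothetical 64-bit feature mask, whereas libFuzzer tracks far more features and also accounts for inputs already present in the output directory.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified corpus entry: input size and the coverage features it hits,
 * packed into a 64-bit mask (a real fuzzer tracks many more). */
typedef struct {
    size_t size;
    uint64_t features;
} corpus_input;

static int by_size(const void *a, const void *b) {
    const corpus_input *x = a, *y = b;
    return (x->size > y->size) - (x->size < y->size);
}

/* Sorts `inputs` smallest-first, then keeps an input only if it adds a
 * feature not covered by the inputs kept so far. Kept inputs are moved to
 * the front of the array; returns how many were kept. */
size_t greedy_merge(corpus_input *inputs, size_t n) {
    qsort(inputs, n, sizeof *inputs, by_size);
    uint64_t covered = 0;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if ((inputs[i].features & ~covered) != 0) {
            covered |= inputs[i].features;
            inputs[kept++] = inputs[i];
        }
    }
    return kept;
}
```

Processing small inputs first is why a large input with broad coverage can be dropped: if smaller inputs already hit every feature it hits, it adds nothing, and the merged corpus is both smaller in file count and cheaper to execute.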

Store the corpus where every run can reach it. On GitHub Actions, ClusterFuzzLite keeps the corpus in workflow artifacts by default and downloads it automatically on each run. Artifacts expire and count against storage limits, so larger projects can point ClusterFuzzLite at a dedicated Git storage repository instead:

- name: Run fuzzers (batch mode)
  uses: google/clusterfuzzlite/actions/run_fuzzers@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    fuzz-seconds: 3600
    mode: batch
    sanitizer: address
    # Separate repository for corpus storage; the token in the URL needs
    # write access to it (GITHUB_TOKEN only covers the current repository)
    storage-repo: https://${{ secrets.PERSONAL_ACCESS_TOKEN }}@github.com/your-org/your-project-storage.git
    storage-repo-branch: main

When a PR introduces a new code path, the existing corpus may contain no inputs that exercise it. Code change mode starts from the stored and seed corpora and mutates toward the new path; if the path has a shallow bug, the short run may find it. The run does not report uncovered code, so check the coverage report to confirm the new code is reached at all. A new parser the fuzzer never reaches is itself actionable.

Integrating Fuzzer Coverage Into PR Gates

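In the workflow above, the Coverage job builds the fuzzers with sanitizer: coverage (source-based coverage instrumentation) and replays the stored corpus to produce a coverage report. If you export that data in lcov format (llvm-cov export -format=lcov produces it from a coverage build's profile data), you can enforce a minimum coverage threshold on new code introduced in a PR.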

After the coverage job runs, download the report and compute per-file coverage for files modified in the PR:

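# Changed C/C++ files in this PR
git diff --name-only origin/main...HEAD -- '*.c' '*.cc' '*.cpp' '*.h' \
  > changed_files.txt
[ -s changed_files.txt ] || exit 0   # no C/C++ changes to gate

# lcov --extract matches shell-style patterns against the absolute paths in
# the tracefile, so prefix each relative path with "*/". Reading them into
# an array keeps the shell from glob-expanding the patterns itself.
mapfile -t patterns < <(sed 's|^|*/|' changed_files.txt)
lcov --extract coverage.info "${patterns[@]}" --output-file pr_coverage.info

# Fail below 70% line coverage, or if the summary has no line data
lcov --summary pr_coverage.info 2>&1 | \
  awk '/lines/ {found = 1; if ($2 + 0 < 70) exit 1} END {if (!found) exit 1}'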

This gates the PR on 70% line coverage of changed files by the fuzzer corpus. The threshold is aggressive for a fuzzer gate — unlike unit test coverage, fuzzer coverage depends on how long the fuzzer has run and how mature the corpus is. A more useful gate is to fail if coverage of new code is lower than coverage of existing code, which catches cases where a developer adds a complex parser without any seed corpus entries that exercise it.
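The relative gate reduces to comparing two line-coverage ratios. The sketch below assumes you have already extracted hit and total line counts for the changed files and for the rest of the codebase (for example from two lcov summaries); the function name and the choice to pass when either total is zero are assumptions, not ClusterFuzzLite behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if line coverage of changed code is at least the coverage of
 * the existing code. Compares new_hit/new_total >= old_hit/old_total by
 * cross-multiplying in 64 bits, so no floating point is involved. */
bool coverage_gate_passes(uint32_t new_hit, uint32_t new_total,
                          uint32_t old_hit, uint32_t old_total) {
    if (new_total == 0) {
        return true;   /* the PR touches no instrumented lines */
    }
    if (old_total == 0) {
        return true;   /* no baseline to compare against */
    }
    return (uint64_t)new_hit * old_total >= (uint64_t)old_hit * new_total;
}
```

A PR adding a parser at 50% coverage to a codebase sitting at 60% fails this gate, while the same parser would pass a fixed 40% threshold; the relative form adapts automatically as the corpus matures.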

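Code change mode fails its job when it finds a crash, and its SARIF output feeds GitHub code scanning. To block merges on that result, mark the fuzzing jobs as required status checks in the branch protection rules for main:

# Branch protection for main (illustrative): check names must match the job
# names the workflow reports, shown in the UI as "ClusterFuzzLite / <job>"
required_status_checks:
  strict: true
  contexts:
    - "Build"
    - "CodeChangeFuzzing"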

With these checks required, a PR cannot merge if the fuzzer finds a crash within the 600-second window. This is not a guarantee that the code is bug-free — 600 seconds is not long enough to explore a complex parser thoroughly — but it catches the obvious cases: null pointer dereferences on empty input, integer overflows on maximum-length fields, and use-after-free bugs introduced by refactors.

The batch fuzzing job on main provides the deeper coverage, and bugs found there should enter your security triage process. When batch mode finds a new crash, ClusterFuzzLite fails the job and uploads the crashing input as a workflow artifact; a follow-up step or notification triggered by that failure can open a tracking issue.

The combination of short-duration PR fuzzing plus long-duration nightly batch fuzzing plus corpus sharing across both gives you a practical continuous fuzzing posture without dedicated fuzzing infrastructure. For open-source projects, adding OSS-Fuzz integration on top of ClusterFuzzLite gives you 24/7 fuzzing on Google’s infrastructure at no cost, with automatic security disclosure management. The three-file OSS-Fuzz integration (project.yaml, Dockerfile, build.sh) is worth the setup time for any widely-distributed parsing library.