From Coverage to Causes: Data-Centric Fuzzing for JavaScript Engines

This repository contains the official artifacts (scripts and data) for the paper "From Coverage to Causes: Data-Centric Fuzzing for JavaScript Engines", a data-centric approach to finding vulnerabilities in the V8 JavaScript engine.

Authors: Kishan Kumar Ganguly, Tim Menzies
Contact: kgangul@ncsu.edu, tjmenzies@ncsu.edu
Artifacts URL: https://github.com/anon-artifacts/DataCentricFuzzJS

🚀 Overview

This project challenges traditional coverage-guided fuzzing, which is often inefficient for complex targets like the V8 JavaScript engine. Coverage-guided fuzzers spend much of their effort on low-risk inputs, and their mutations can accidentally destroy vulnerability-triggering patterns.

Instead of prioritizing new code paths (asking "is this path new?"), we introduce feature-guided fuzzing, which asks "does this code look dangerous?"

To do this, we use:

LLM-Boosted Feature Engineering: We use a Large Language Model (LLM) to analyze historical V8 vulnerabilities and automatically generate a set of static (code-based) and dynamic (runtime-based) features indicative of high-risk inputs.
Predictive Guidance Model: We train an XGBoost classifier on these features to predict the likelihood that a new input will trigger a vulnerability.
Feature-Guided Fuzzer: We built a fuzzer (on top of Fuzzilli) that uses this model's prediction score as its primary guidance, dedicating 90% of its effort to "exploiting" high-risk seeds and 10% to "exploring" new coverage.
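
The 90/10 split described above can be sketched as a simple scheduling rule. This is a minimal illustration, not the fuzzer's actual implementation (which lives in the modified Fuzzilli sources). The seed fields `risk` and `new_coverage`, the function name `pick_seed`, and the choice to sample exploit seeds in proportion to their score are all assumptions made for the example.

```python
import random


def pick_seed(seeds, rng, exploit_prob=0.9):
    """Pick the next seed to mutate and report which mode was used.

    seeds: list of dicts with hypothetical keys
           'risk' (model score in [0, 1]) and
           'new_coverage' (True if the seed reached new edges).
    """
    if rng.random() < exploit_prob:
        # Exploit: sample in proportion to predicted risk (an assumption;
        # the README only states that high-risk seeds get 90% of the effort).
        weights = [s["risk"] for s in seeds]
        if sum(weights) > 0:
            return "exploit", rng.choices(seeds, weights=weights, k=1)[0]
    # Explore: prefer seeds that found new coverage, else fall back to any seed.
    fresh = [s for s in seeds if s["new_coverage"]] or seeds
    return "explore", rng.choice(fresh)
```

Passing an explicit `random.Random` instance keeps a run reproducible, which helps when comparing schedules across fuzzing campaigns.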

Our results show this model is highly precise (over 85% precision) with a very low false alarm rate (under 1%). We also found that only the top 25% of features are needed for this performance, making the fuzzer fast and efficient.
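
For readers unfamiliar with the metrics, the sketch below uses the usual confusion-matrix definitions of precision and false alarm rate, and shows one way to keep the top 25% of features by importance. The function names and the example inputs are hypothetical; the paper's own evaluation code is in `prediction_time_aware.py` and may differ in detail.

```python
import math


def precision(tp, fp):
    """Fraction of inputs flagged as risky that really were vulnerable."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def false_alarm_rate(fp, tn):
    """Fraction of benign inputs that were wrongly flagged as risky."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def top_features(importance, fraction=0.25):
    """Return the names of the highest-importance features.

    importance: dict mapping feature name -> importance score
                (hypothetical input, e.g. mean absolute SHAP values).
    """
    k = max(1, math.ceil(len(importance) * fraction))
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]
```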

🛠️ Installation and Setup

1. Install Dependencies

First, run the main dependency installation script:

./install_deps.sh

2. Install V8 Engine (via `jsvu`)

For most of the predictive modeling and evaluation, you will need a d8 shell. The easiest way to get a specific V8 version is by using jsvu:

# Install jsvu globally
sudo npm install -g jsvu

# Install the specific V8 version used for experiments
jsvu v8@14.1.146

3. Build V8 from Source with Fuzzing Flags

To run the full fuzzer (RQ5) and reproduce the instrumented build, you must compile V8 from source using depot_tools with specific build flags (e.g., ASAN, coverage).

# Define environment variables (adjust paths as needed)
# Assumes v8.tar.gz is in the current directory
export DEPS="$HOME/Downloads/depot_tools"
export V8_COMMIT="YOUR_SPECIFIC_V8_COMMIT_HASH" # e.g., from the paper's experiment
export OUTDIR="out/fuzz"
export NINJA_JOBS=16 # Adjust to your core count

# --- 1. Get Depot Tools ---
if [ ! -d "$DEPS" ]; then
  git clone --depth=1 https://chromium.googlesource.com/chromium/tools/depot_tools "$DEPS"
else
  echo "depot_tools already exists at $DEPS"
fi

# --- 2. Add to PATH ---
export PATH="$DEPS:$PATH"
if ! grep -Fxq 'export PATH="$HOME/Downloads/depot_tools":$PATH' "$HOME/.bashrc" 2>/dev/null; then
  echo 'export PATH="$HOME/Downloads/depot_tools":$PATH' >> "$HOME/.bashrc"
  echo "Appended depot_tools PATH to ~/.bashrc"
else
  echo "depot_tools PATH already in ~/.bashrc"
fi

# --- 3. Fetch and Build V8 ---
if [ ! -d "v8" ]; then
  tar -zxvf v8.tar.gz || true
else
  echo "v8 directory already exists, skipping extraction"
fi

# Enter V8 directory and build
cd v8 && \
git checkout "$V8_COMMIT" || true && \
gclient sync --with_branch_heads --with_tags || true && \
gn gen "$OUTDIR" --args='is_debug=false is_asan=true dcheck_always_on=true v8_static_library=true v8_enable_verify_heap=true v8_fuzzilli=true sanitizer_coverage_flags="trace-pc-guard" target_cpu="x64"' && \
ninja -j"$NINJA_JOBS" -C "$OUTDIR" d8

⚙️ Configuration

Set V8 Version in Scripts

Before running the evaluation scripts, you must ensure the dynamic feature extractor points to the correct V8 d8 binary that you installed.

File to Edit: dynamic_feature_extractor_v2.py
Action: Open this file and update the variable holding the V8 version/path to match the one you installed (e.g., v8@14.1.146).
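
Before editing the script, it can help to confirm which d8 binary is actually reachable. The helper below is a hypothetical sketch and is not part of `dynamic_feature_extractor_v2.py`: the `D8_PATH` environment variable and the fallback binary names are assumptions, so adjust them to match the name jsvu created on your machine.

```python
import os
import shutil
from pathlib import Path


def resolve_d8(env_var="D8_PATH", fallback_names=("v8", "d8")):
    """Return the path of an executable d8 shell, or None if none is found.

    env_var and fallback_names are illustrative assumptions, not names
    used by the repository's scripts.
    """
    explicit = os.environ.get(env_var)
    if explicit:
        # An explicit setting wins; reject it if it is not an executable file.
        path = Path(explicit).expanduser()
        if path.is_file() and os.access(path, os.X_OK):
            return str(path)
        return None
    # Otherwise search PATH for the first candidate name that exists.
    for name in fallback_names:
        found = shutil.which(name)
        if found:
            return found
    return None
```

The returned path can then be pasted into the variable mentioned above.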

🔬 Reproducing the Paper's Results

The following scripts reproduce the key Research Questions (RQs) from the paper.

RQ2 & RQ4 (Predictive Performance & Minimal Feature Set)

To evaluate the predictive performance (precision, recall, false alarm) of the guidance model using the time-aware cross-validation setup, run:

python3.13 prediction_time_aware.py

This script also provides the data for analyzing the minimal feature set (RQ4).
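
The idea behind a time-aware split is that a model is only ever tested on samples newer than anything it was trained on. The sketch below shows one common variant, an expanding window over date-sorted samples. It is an assumption about the general technique, not a description of how `prediction_time_aware.py` builds its folds; the `date` key and the function name are hypothetical.

```python
def time_aware_folds(samples, n_folds=3):
    """Yield (train, test) lists from an expanding window over time.

    samples: list of dicts with a sortable 'date' key (hypothetical schema).
    Each test block is strictly later in the ordering than its training data.
    """
    ordered = sorted(samples, key=lambda s: s["date"])
    block = len(ordered) // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for i in range(1, n_folds + 1):
        train = ordered[: i * block]
        # The last fold absorbs any leftover samples.
        end = (i + 1) * block if i < n_folds else len(ordered)
        test = ordered[i * block : end]
        yield train, test
```

Unlike shuffled k-fold cross-validation, this never lets a later vulnerability report inform predictions about an earlier one.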

RQ3 (Ablation Study: Static vs. Dynamic Features)

To run the ablation study that compares the performance of the static-only, dynamic-only, and combined feature models, run:

python3.13 prediction_compare_time_aware.py > compare.txt

RQ5 (Feature-Guided Fuzzing)

Running the full feature-guided fuzzer is a multi-step process. It requires populating the database, launching the prediction services, and then starting the fuzzer.

1. Populate the Database

Load the corpus (vulnerable PoCs and benign mjsunit tests) into the database.

# Load vulnerable PoCs
./load_corpus.sh corpus_pocs loaded_corpus 10 16

# Load benign test cases from mjsunit
./load_corpus.sh corpus_mjsunit loaded_corpus 10 16

2. Run Prediction Services

These services provide the model's predictions to the fuzzer in real time. Run each command in a separate terminal.

# Terminal 1: SHAP service for feature importance
python3.13 predict_minimal_shap.py

# Terminal 2: Main crash prediction service
python3.13 crash_predictor_service.py

3. Run the Fuzzer

With both services running, you can launch the fuzzer:

./run.sh loaded_corpus

The fuzzer will now communicate with the services to score and prioritize seeds based on their predicted risk.

License and Attribution

This project includes modifications to Fuzzilli, which is licensed under the Apache License 2.0.

📜 Citation

If you use this work, please cite the original paper:

@article{Ganguly2025DataCentricFuzzJS,
  title   = {From Coverage to Causes: Data-Centric Fuzzing for JavaScript Engines},
  author  = {Kishan Kumar Ganguly and Tim Menzies},
  journal = {Preprint submitted to Information and Software Technology},
  year    = {2025},
  month   = {Oct},
  note    = {Available at \url{https://github.com/anon-artifacts/DataCentricFuzzJS}}
}
