SuperSeriousLab/Gremlins.jl

Gremlins.jl

Mutation testing for Julia. Gremlins systematically corrupts your source code one operator at a time, then checks whether your test suite notices. A test suite that kills 80 %+ of mutants is actually asserting; one that kills 20 % is mostly checking that code runs without crashing.

Why Julia needs this

Vimes.jl was the only Julia mutation-testing tool ever written. Its last real commit was November 2019. It is pinned to a defunct CSTParser#location branch and cannot be installed. Nothing replaced it. Recurring Discourse threads asking "what do I use for mutation testing" still point at dead Vimes.

Gremlins is the replacement. It uses JuliaSyntax.jl (now shipped with Julia 1.10+) for byte-accurate parsing, a warm-worker pool that cut per-mutant cost 5.77× relative to process-per-mutant in the benchmark reported under Performance, and coverage-guided selection that skips mutants your tests cannot possibly reach.

Quickstart

# From the Julia REPL, in your package directory:
using Gremlins
result = mutate_warm(".")   # warm pool, auto-discovers src/, runs test/runtests.jl
print_warm_summary(result)

Or via the CLI:

julia --project=path/to/Gremlins bin/gremlins-cli.jl \
  --pkg /path/to/YourPkg \
  --warm \
  --strong 0.80 --acceptable 0.60

Sample output

━━━ Gremlins Warm Mutation Report ━━━━━━━━━━━━━━
  Package       : TeleTUI
  Score         : 28.0%  (killed=7 / eligible=25)
  Killed        : 7
  Survived      : 18
  Timeout       : 0
  NoCov         : 0
  Error         : 0
  Total         : 25
  Cache hits    : 0
  Warm-executed : 25
  Cold fallback : 0
  Worker recycles: 0
  Baseline      : 14.71s
  Runtime       : 367.85s (5.77x faster than cold)
  ── Fallback taxonomy ──
    warm_ok : 25
  ── I4 agreement (10 sampled) ──
    OK — all 10 warm results agree with cold re-runs
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

BAND	weak	kill_rate=0.28	killed=7/25

How it works

  1. Discovery (static) — JuliaSyntax parses every .jl file in src/ and walks the green tree, collecting mutation sites (byte ranges + operator). No code is executed. Mutant IDs are stable hashes of (relpath, byte-range, op-id).

  2. Baseline — runs your test suite once with --code-coverage=user to build a line-to-testfile map. Mutants on uncovered lines are marked no_coverage (a finding, not a kill).

  3. Shadow-copy crash safety: before any mutant is written to disk, Gremlins creates a disposable shadow copy of your package under mktempdir(). All on-disk mutations and test subprocesses (the cold path) run against the shadow; your real source tree is never written. A SIGKILL or OOM event therefore leaves an orphaned /tmp directory (harmless) rather than corrupted source files. An in-process try/finally restore would not be crash-safe, because SIGKILL bypasses finally; the shadow design removes that risk entirely.

  4. Warm-worker eval — a persistent Julia worker process loads your package once (paying startup cost once, not per-mutant). For each mutant, the worker evals only the changed top-level expression into the package module via Core.eval, runs the tests in a fresh Module, then restores the original. Disk is never written on the warm path (shadow applies to cold paths only).

  5. Fallback taxonomy — mutations inside macro definitions, type/struct defs, or const globals cannot safely be eval'd; these route to the cold path (subprocess per mutant, run in shadow). The report shows the fallback breakdown.

  6. Incremental cache: results are keyed on SHA256(source_content) + mutant_id + gremlins_version. mtime is not part of the key, because git checkout refreshes mtimes on untouched files. The cache is read and written against the real source tree; keys are unaffected by the shadow copy because the shadow is byte-identical.

  7. I4 agreement — after the run, a random sample of warm-executed mutants is re-run cold (in shadow) to verify that warm eval produces the same outcome as a fresh subprocess. Any mismatch is a hard error in the report.
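
The shadow-copy idea in step 3 can be sketched with the standard library alone. This is a minimal illustration, not Gremlins' actual implementation: `with_shadow` is a hypothetical helper name, and the directory layout inside the temp dir is an assumption.

```julia
# Hypothetical helper: copy the package into a temp dir and run `f` against
# the copy, so nothing is ever written to the real source tree.
function with_shadow(f, pkgdir::AbstractString)
    shadow = joinpath(mktempdir(), "pkg")   # temp dir is removed at process exit
    cp(pkgdir, shadow)                      # recursive copy of the package
    return f(shadow)
end

# Usage: a throwaway "package" with one source file.
orig = mktempdir()
mkpath(joinpath(orig, "src"))
write(joinpath(orig, "src", "A.jl"), "f(x) = x < 1\n")

mutated = with_shadow(orig) do shadow
    path = joinpath(shadow, "src", "A.jl")
    write(path, "f(x) = x <= 1\n")          # the mutant lands in the shadow only
    read(path, String)
end
```

If the process is killed inside `f`, the only leftover is the temp directory; the file under `orig` still holds the original text.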

Operator table

ID                  Name                  Mutation
relop_lt_le         relop: < → <=         < → <=
relop_le_lt         relop: <= → <         <= → <
relop_gt_ge         relop: > → >=         > → >=
relop_ge_gt         relop: >= → >         >= → >
relop_eq_neq        relop: == → !=        == → !=
relop_neq_eq        relop: != → ==        != → ==
bool_and_or         bool: && → ||         && → ||
bool_or_and         bool: || → &&         || → &&
bool_delete_not     bool: delete !        !x → x
arith_plus_minus    arith: + → -          + → -
arith_minus_plus    arith: - → +          - → +
arith_mul_div       arith: * → /          * → /
arith_div_mul       arith: / → *          / → *
literal_int_incr    literal: int+1        42 → 43
literal_int_decr    literal: int-1        42 → 41
literal_true_false  literal: true→false   true → false
literal_false_true  literal: false→true   false → true
return_nothing      return→nothing        return x → return nothing
stmt_delete         stmt delete           delete a statement from a block

Outcome taxonomy

Outcome       Meaning
killed        Test suite exited non-zero: mutant detected
survived      Test suite passed: mutant not caught
timeout       Test run exceeded 3× baseline: likely infinite loop or hang
no_coverage   No baseline coverage on the mutation site: tests cannot reach it
error         Runner infrastructure error (apply/revert failed, etc.)
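
The outcome rules can be read as a small decision function. The sketch below is illustrative only: `classify` and its keyword names are hypothetical, the order of the checks is an assumption, and the error outcome (runner failures) is left out.

```julia
# Hypothetical classifier for a single mutant run.
# Assumes: uncovered sites are never executed, and "exceeded 3× baseline"
# means strictly greater than three times the baseline wall time.
function classify(; covered::Bool, exit_code::Int, elapsed::Real, baseline::Real)
    covered || return :no_coverage              # tests cannot reach the site
    elapsed > 3 * baseline && return :timeout   # likely hang or infinite loop
    return exit_code == 0 ? :survived : :killed
end

classify(covered=true, exit_code=1, elapsed=10.0, baseline=14.71)  # :killed
```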

Mutation score = killed / (total - no_coverage - error). Mutants you cannot reach do not count for or against you.
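
As a worked instance of the formula, the sketch below recomputes the score from the sample report (killed=7, total=25, no uncovered or errored mutants). `mutation_score` is a hypothetical helper, and returning `nothing` when no mutant is eligible is an assumption rather than documented behaviour.

```julia
# Hypothetical helper implementing: killed / (total - no_coverage - error)
function mutation_score(; killed::Int, total::Int, no_coverage::Int=0, errored::Int=0)
    eligible = total - no_coverage - errored
    return eligible == 0 ? nothing : killed / eligible
end

score = mutation_score(killed=7, total=25)   # 7 / 25 = 0.28, i.e. 28.0%
```

Because uncovered and errored mutants are subtracted from the denominator, adding ten unreachable mutants to this run would leave the score at 0.28.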

Performance

Real benchmark on JUI (TeleTUI), 25 covered sites, warm worker pool, no cache:

Mode                            Total time   Per-mutant   Killed   Survived
Cold (M1, process-per-mutant)   2121.56 s    84.86 s      7        18
Warm (M2, eval-into-module)     367.85 s     14.71 s      7        18
Speedup                         5.77×        5.77×        same     same

I4 agreement: 10 sampled, 0 mismatches. Outcomes are equivalent.

The warm path works on ~80 % of mutants in practice; the remainder fall back to cold (macro defs, struct defs, const globals). The fallback taxonomy in every report shows the breakdown.

CI integration

See .github/workflows/mutation.yml.example for a ready-to-use GitHub Actions recipe that runs Gremlins on changed files in a PR and fails below the acceptable threshold.

Quick setup:

- name: Mutation gate
  run: |
    julia --project=path/to/Gremlins bin/gremlins-cli.jl \
      --pkg ${{ github.workspace }} \
      --files "$CHANGED_FILES" \
      --warm --acceptable 0.60

Exit codes: 0 = strong or acceptable, 1 = weak (below acceptable threshold), 2 = infrastructure error.
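
The mapping from score to band to exit code can be sketched as follows. The function names are hypothetical, and treating the thresholds as inclusive (a score exactly at 0.60 counts as acceptable) is an assumption; exit code 2 for infrastructure errors is not modelled here.

```julia
# Hypothetical band classification using the CLI defaults shown above.
function band(score::Real; strong::Real=0.80, acceptable::Real=0.60)
    score >= strong && return :strong
    score >= acceptable && return :acceptable
    return :weak
end

# 0 for strong or acceptable, 1 for weak.
gate_exit_code(b::Symbol) = b === :weak ? 1 : 0

gate_exit_code(band(0.28))   # 1: the sample report would fail the gate
```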

Custom operators

You can run a subset of the built-in operators or supply entirely custom ones via the operators keyword argument, available on mutate, mutate_warm, and discover.

Running a subset of DEFAULT_OPERATORS

using Gremlins

# Only test relational-operator mutations
result = mutate("path/to/MyPkg";
    operators = [OP_LT_TO_LE, OP_LE_TO_LT, OP_GT_TO_GE, OP_GE_TO_GT,
                 OP_EQ_TO_NEQ, OP_NEQ_TO_EQ])
print_summary(result)

The MutationOperator struct

struct MutationOperator
    id::Symbol        # stable Symbol used in the mutant hash (must be unique across your set)
    name::String      # human-readable label shown in reports
    matcher::Function # (node::SyntaxNode, src::String) -> Bool — true iff this node should be mutated
    replacer::Function # (node::SyntaxNode, src::String) -> String — returns the replacement text
end
  • matcher receives a JuliaSyntax.SyntaxNode and the full source text of the file as a String. Return true when the node is a candidate for mutation.
  • replacer receives the same node and source. Return a String with the replacement text that will be spliced into site.byte_range. The splice covers the exact bytes of the matched node.
  • Both functions are called only during static discovery — no code is executed.
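
To make the splice step concrete, here is a minimal sketch of replacing a byte range in source text with a replacer's output. `splice_bytes` is a hypothetical name, not part of Gremlins; it works on bytes rather than characters so that non-ASCII source before the mutation site does not shift the range.

```julia
# Hypothetical helper: replace the bytes in `range` with `replacement`.
function splice_bytes(src::String, range::UnitRange{Int}, replacement::String)
    bytes  = Vector{UInt8}(src)                 # byte copy of the source
    prefix = bytes[1:first(range)-1]
    suffix = bytes[last(range)+1:end]
    return String(vcat(prefix, Vector{UInt8}(replacement), suffix))
end

splice_bytes("a < b", 3:3, "<=")    # "a <= b"
splice_bytes("α < b", 4:4, "<=")    # "α <= b" (α occupies two bytes)
```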

Worked example — negate every boolean literal

This operator flips every true to false and every false to true in a single combined operator (equivalent to OP_TRUE_TO_FALSE + OP_FALSE_TO_TRUE merged):

using Gremlins
using JuliaSyntax

OP_FLIP_BOOL = MutationOperator(
    :flip_bool,
    "literal: flip bool",
    # matcher: any Bool literal leaf
    (node, src) -> JuliaSyntax.is_leaf(node) &&
                   JuliaSyntax.kind(node) == JuliaSyntax.K"Bool",
    # replacer: invert the value
    (node, src) -> node.val === true ? "false" : "true",
)

result = mutate("path/to/MyPkg"; operators = [OP_FLIP_BOOL])
print_summary(result)

The id field (:flip_bool) must be unique within the operator set you pass — it is part of the stable mutant-ID hash. Use a distinct symbol for each operator.
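
Why uniqueness matters can be seen from a sketch of a stable mutant ID built from (relpath, byte-range, op-id). The exact hash input Gremlins uses is not documented here, so `mutant_id` and its payload layout are assumptions for illustration only.

```julia
using SHA

# Hypothetical ID: SHA-256 over the three components, separated by NUL bytes.
function mutant_id(relpath::AbstractString, byte_range::UnitRange{Int}, op_id::Symbol)
    payload = join((relpath, first(byte_range), last(byte_range), op_id), '\0')
    return bytes2hex(sha256(payload))
end

a = mutant_id("src/A.jl", 10:13, :flip_bool)
b = mutant_id("src/A.jl", 10:13, :flip_bool)      # second operator reusing the id
c = mutant_id("src/A.jl", 10:13, :flip_bool_v2)   # distinct id
```

Here `a == b`, so two operators sharing an id would produce indistinguishable mutants at the same site, while `c` differs.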

Opt-in advanced operators

Several powerful operators are not in DEFAULT_OPERATORS and must be opted in explicitly (they are too noisy or too slow for default runs):

# Julia-unique: constant-pool swap (replace a literal with another literal in the same function)
result = mutate("path/to/MyPkg"; operators = [OP_CONST_POOL])

# Julia-unique: dispatch-contract mutations (signature type swap, union-member drop, where-bound relax)
result = mutate("path/to/MyPkg"; operators = [OP_DISPATCH_SWAP, OP_UNION_DROP, OP_WHERE_RELAX])

Limitations

const-site coverage blind spot: mutations inside const global assignments route to the cold path, but if no test exercises the const value, even indirectly (unlikely but possible), the mutant is reported as survived rather than no_coverage. Coverage data is per-line, and const-site lines that are "hit" during package load count as covered even if no test actually asserts the value.

Warm-path fallbacks — mutations inside macro definitions, struct/abstract type defs, and const globals cannot be eval'd into a running module without struct-redefinition errors or macro hygiene violations. These fall back to per-process cold runs, which are slow. If most of your mutations are in these constructs, the speedup over M1 will be lower.
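
For contrast, an ordinary method definition can be swapped in memory and restored. The sketch below shows the general eval-and-restore pattern under simplifying assumptions: `Demo` stands in for a loaded package module, `suite_passes` stands in for a test suite, and none of these names come from Gremlins.

```julia
# Stand-in for a loaded package module (a real worker would load YourPkg once).
Demo = Module(:Demo)
original = :(is_small(x) = x < 10)
mutant   = :(is_small(x) = x <= 10)   # relop_lt_le applied to the same definition
Core.eval(Demo, original)

# Hypothetical one-assertion "suite". Evaluating inside Demo always sees the
# latest method definition, which sidesteps world-age issues.
suite_passes() = Core.eval(Demo, :(is_small(10) == false))

baseline_ok = suite_passes()    # original definition passes
Core.eval(Demo, mutant)         # apply the mutant in memory, no disk write
killed = !suite_passes()        # the boundary assertion now fails
Core.eval(Demo, original)       # restore the original definition
restored_ok = suite_passes()
```

The boundary assertion at `x == 10` is what kills the mutant; a suite that only checked `is_small(3)` would let it survive.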

Julia compile cost — even on the warm path, running your test suite per mutant takes time proportional to your suite's wall time. A 60-second suite on 300 mutants = 5 hours warm, 50 hours cold. Use --files to scope runs to changed files in CI, and use the budget parameter for exploratory runs.

Equivalent mutants — Gremlins does not detect semantically equivalent mutations (mutants that change syntax but not observable behaviour). These appear as survived and inflate apparent weakness. This is known noise; document it in your reports.

Installation

using Pkg
Pkg.add("Gremlins")  # once registered in General registry

Until General registration (see docs/release-checklist.md):

Pkg.add(url="https://github.com/YOUR-ORG/Gremlins.jl")

License

MIT — see LICENSE.
