After initial taxonomic assignment through the Holi pipeline, we merged samples, refined the taxonomic profiles, quantified DNA damage patterns, and developed age-dependent filtering models to distinguish genuine ancient DNA (aDNA) from noise. The workflow integrates filterBAM, metaDMG, and R-based model fitting to assess DNA authenticity and reproducibility across technical and biological replicates.
Taxonomic alignments from Holi were post-processed using filterBAM (v1.5.1):
--iters 0 --min-read-ani 94 --min-read-count 3 \
--min-expected-breadth-ratio 0.5 --min-normalized-entropy auto \
--min-normalized-gini auto --min-breadth 0 --min-avg-read-ani 90 \
--min-coverage-evenness 0.4 --min-coverage-mean 0 --include-low-detection
These steps ensure high-confidence taxonomic assignments and consistent genome-wide coverage metrics before downstream damage estimation.
DNA cytosine deamination and model-based noise were estimated using metaDMG (v0.4-93):
- LCA assignment: each read's lowest common ancestor (LCA) was determined from all possible alignments.
- Damage model fitting: for each taxon and replicate, metaDMG estimated parameters describing observed damage (A_b) and noise (c).
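The LCA step can be illustrated with a toy example. This is only a sketch of the idea, not metaDMG's implementation: it assumes each alignment of a read has already been resolved to a root-to-tip lineage, and the function name `lowest_common_ancestor` and the example lineages are hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all root-to-tip lineages.

    `lineages` holds one list of taxon names per alignment of a read,
    ordered from the root down to the aligned reference (an assumption
    made for this sketch).
    """
    if not lineages:
        raise ValueError("need at least one lineage")
    shared = None
    # Walk down the ranks in parallel until the lineages diverge.
    for ranks in zip(*lineages):
        if all(name == ranks[0] for name in ranks):
            shared = ranks[0]
        else:
            break
    return shared


# A read that aligns equally well to a sheep and a goat reference
# can only be assigned at the rank the two lineages share.
read_alignments = [
    ["Eukaryota", "Chordata", "Mammalia", "Artiodactyla", "Bovidae", "Ovis"],
    ["Eukaryota", "Chordata", "Mammalia", "Artiodactyla", "Bovidae", "Capra"],
]
assigned = lowest_common_ancestor(read_alignments)
```

A read with a single alignment keeps its most specific taxon, while a read hitting several genera is pushed up to the rank they share, which is why LCA assignment is conservative for ambiguous reads.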
Model quality was evaluated using Lin's Concordance Correlation Coefficient (CCC):
- ρc ≥ 0.85
- Cb ≥ 0.9
- p-value < 0.1
- Noise baseline confidence interval within [0, 1]
Fits failing any criterion were classified as bad.
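The quality gate above can be sketched as follows. Lin's CCC factors as ρc = ρ · Cb, where ρ is Pearson's correlation (precision) and Cb is the bias-correction factor (accuracy). The sketch assumes the CCC compares observed and fitted values for one taxon and replicate; the source does not say which quantities were compared or which test produced the p-value, so both are treated as inputs. The function names are hypothetical.

```python
from math import sqrt


def lin_ccc(observed, fitted):
    """Return (rho_c, c_b) for two equally long sequences of numbers."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(fitted) / n
    # Population (1/n) moments, as in Lin's original definition.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_f = sum((f - mean_f) ** 2 for f in fitted) / n
    cov = sum((o - mean_o) * (f - mean_f)
              for o, f in zip(observed, fitted)) / n
    rho = cov / sqrt(var_o * var_f)                        # Pearson correlation
    rho_c = 2 * cov / (var_o + var_f + (mean_o - mean_f) ** 2)
    c_b = rho_c / rho                                      # bias correction
    return rho_c, c_b


def fit_is_good(rho_c, c_b, p_value, noise_ci):
    """Apply the four criteria listed above; `noise_ci` is (lower, upper)."""
    lower, upper = noise_ci
    return (rho_c >= 0.85
            and c_b >= 0.9
            and p_value < 0.1
            and 0 <= lower <= upper <= 1)


# A fit that tracks the data perfectly but with a constant offset:
# correlation is 1, yet the offset lowers rho_c and C_b to 4/7.
rho_c, c_b = lin_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The offset example shows why both ρc and Cb are checked: a fit can be perfectly correlated with the data and still be systematically biased.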
For each taxon with at least 500 reads, we plotted median A_b against sample age and observed the following:
- DNA damage was lower in younger layers, consistent with the expected increase of deamination with age.
- Marine sediments showed lower deamination than contemporaneous lacustrine layers.
- Post-1900 CE samples exhibited little to no measurable damage.
To define genuine aDNA signals, we:
- Computed the 5th percentile of A_b across taxa (660 CE–1900 CE)
- Smoothed these thresholds with a LOESS regression
- Retained taxa exceeding the lower 95% CI of the fit
Taxa showing authentic damage in older layers were also accepted in younger ones ("oldest pass date" rule).
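The thresholding and the "oldest pass date" rule can be sketched as below. This is an illustration under stated assumptions rather than the original R code: sample age is represented as a calendar year (CE), records are `(taxon, year, damage)` tuples, and the LOESS smoothing and confidence-interval steps are not reproduced, so `threshold_by_year` stands in for the final, already smoothed threshold. All names and numbers are hypothetical.

```python
def fifth_percentile(values):
    """5th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = 0.05 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def raw_thresholds(records):
    """Unsmoothed per-year thresholds: 5th percentile of damage across taxa."""
    by_year = {}
    for _taxon, year, damage in records:
        by_year.setdefault(year, []).append(damage)
    return {year: fifth_percentile(vals) for year, vals in by_year.items()}


def oldest_pass_filter(records, threshold_by_year):
    """Keep a taxon from its oldest passing sample onward (toward the present)."""
    oldest_pass = {}
    for taxon, year, damage in records:
        if damage > threshold_by_year[year]:
            # A smaller CE year means an older layer.
            oldest_pass[taxon] = min(year, oldest_pass.get(taxon, year))
    return [(t, y, d) for t, y, d in records
            if t in oldest_pass and y >= oldest_pass[t]]


records = [
    ("Ovis", 800, 0.20), ("Ovis", 1500, 0.02),
    ("Bos", 800, 0.01), ("Bos", 1500, 0.10),
]
# Illustrative smoothed thresholds, not values from the study.
kept = oldest_pass_filter(records, {800: 0.05, 1500: 0.05})
```

In this toy input, Ovis passes in the 800 CE layer and is therefore also kept at 1500 CE despite low damage there, whereas Bos first passes at 1500 CE and its older, failing record is dropped.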
- Figures S10–S11: Damage vs. age for individual taxa (e.g., Ovis) and replicate consistency
- Figures S12–S13: PCoA of eukaryotic community profiles across replicates
- Figures S14–S16: Combined DNA damage–age models and taxa-specific trends
- Figures S17–S19: Rarefaction and negative control validation
All plots and summaries were generated in R, integrating metaDMG, filterBAM, and custom scripts for visualization and statistical filtering.
- SampleWorkflow.html: Overview of pipeline and sample-level workflow
- Replicates.html: Detailed replicate analysis, damage modeling, and filtering results
- Negatives.html: Analysis of taxa in blanks