GeoGenetics/Tjornin_Butterfly

Butterfly

Overview

After initial taxonomic assignment through the Holi pipeline, we merged samples, refined the taxonomic profiles, quantified DNA damage patterns, and developed age-dependent filtering models to distinguish genuine ancient DNA (aDNA) from noise. The workflow integrates filterBAM, metaDMG, and R-based model fitting to assess DNA authenticity and reproducibility across technical and biological replicates.

Taxonomic Refinement

Taxonomic alignments from Holi were post-processed using filterBAM (v1.5.1):

Reassign step:

--iters 0 --min-read-ani 94 --min-read-count 3

Filter step:

--min-read-ani 94 --min-read-count 3 \
--min-expected-breadth-ratio 0.5 --min-normalized-entropy auto \
--min-normalized-gini auto --min-breadth 0 --min-avg-read-ani 90 \
--min-coverage-evenness 0.4 --min-coverage-mean 0 --include-low-detection

These steps ensure high-confidence taxonomic assignments and consistent genome-wide coverage metrics before downstream damage estimation.

DNA Damage Estimation

DNA cytosine deamination and model-based noise were estimated using metaDMG (v0.4-93):

  1. LCA assignment — each read's lowest common ancestor (LCA) was determined from all possible alignments.
  2. Damage model fitting — for each taxon and replicate, metaDMG estimated parameters describing observed damage (A_b) and noise (c).
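The LCA step can be illustrated with a minimal sketch. It assumes each alignment of a read has already been mapped to a root-first taxonomic lineage; the function name and the example lineages are hypothetical, and metaDMG's own LCA step has further options (such as similarity thresholds) that are not reproduced here.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages.

    Each lineage is a root-first list of taxon names. Returns None
    when there are no lineages or nothing is shared.
    """
    if not lineages:
        return None
    shared = None
    # Walk down the ranks in parallel until the lineages diverge.
    for ranks in zip(*lineages):
        if all(name == ranks[0] for name in ranks):
            shared = ranks[0]
        else:
            break
    return shared


# Hypothetical lineages for one read that aligned to three references.
read_hits = [
    ["Eukaryota", "Chordata", "Mammalia", "Bovidae", "Ovis", "Ovis aries"],
    ["Eukaryota", "Chordata", "Mammalia", "Bovidae", "Ovis", "Ovis ammon"],
    ["Eukaryota", "Chordata", "Mammalia", "Bovidae", "Capra", "Capra hircus"],
]
lca = lowest_common_ancestor(read_hits)
```

With these example hits the read is assigned at family level, because the third alignment falls outside the genus shared by the first two.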

Model quality was evaluated using Lin's Concordance Correlation Coefficient (CCC), where ρc is the concordance coefficient and Cb is its bias correction factor:

Good fit criteria:

  • ρc ≥ 0.85
  • Cb ≥ 0.9
  • p-value < 0.1
  • Noise baseline confidence interval within [0, 1]

Fits failing any criterion were classified as bad.
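As a rough illustration of these criteria, the sketch below computes Lin's CCC for two equal-length series and applies the four thresholds. It assumes the series are observed and model-predicted damage frequencies; that pairing, the function names, and the argument layout are assumptions made for this example, not the repository's R code.

```python
import math


def lins_ccc(x, y):
    """Return (rho_c, c_b) for two equal-length numeric sequences.

    rho_c is Lin's concordance correlation coefficient and c_b is the
    bias correction factor, so that rho_c = pearson_r * c_b.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    if var_x == 0 or var_y == 0 or cov_xy == 0:
        raise ValueError("CCC components are undefined for these inputs")
    rho_c = 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
    pearson_r = cov_xy / math.sqrt(var_x * var_y)
    return rho_c, rho_c / pearson_r


def is_good_fit(rho_c, c_b, p_value, noise_ci):
    """Apply the four good-fit criteria; noise_ci is a (lower, upper) pair."""
    lower, upper = noise_ci
    return (
        rho_c >= 0.85
        and c_b >= 0.9
        and p_value < 0.1
        and 0.0 <= lower <= upper <= 1.0
    )
```

A fit is labelled good only when all four conditions hold, which mirrors the rule that failing any single criterion marks the fit as bad.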

DNA Damage–Age Models and Filtering

For each taxon (≥500 reads), we plotted median A_b against sample age:

  • DNA damage decreased toward younger layers, consistent with deamination accumulating over time.
  • Marine sediments showed lower deamination than contemporaneous lacustrine layers.
  • Post-1900 CE samples exhibited little to no measurable damage.
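The per-taxon damage curves behind these observations can be sketched as follows. The record layout `(taxon, age, n_reads, a_b)` is hypothetical, and applying the 500-read minimum to each replicate-level record is an assumption; the threshold could equally be applied per taxon overall.

```python
from collections import defaultdict
from statistics import median

MIN_READS = 500  # read-count threshold stated in the text


def median_damage_by_age(records):
    """Return {taxon: [(age, median A_b), ...]} sorted by age.

    records is an iterable of (taxon, age, n_reads, a_b) tuples, one per
    replicate-level damage fit (hypothetical layout for this sketch).
    """
    grouped = defaultdict(list)
    for taxon, age, n_reads, a_b in records:
        if n_reads >= MIN_READS:  # assumption: filter applied per record
            grouped[(taxon, age)].append(a_b)
    curves = defaultdict(list)
    for (taxon, age), values in sorted(grouped.items()):
        curves[taxon].append((age, median(values)))
    return dict(curves)


# Hypothetical example: ages are calendar years CE.
records = [
    ("Ovis", 1200, 800, 0.125),
    ("Ovis", 1200, 900, 0.25),
    ("Ovis", 1200, 100, 0.5),   # dropped: fewer than 500 reads
    ("Ovis", 1800, 600, 0.04),
]
curves = median_damage_by_age(records)
```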

To define genuine aDNA signals, we:

  • Computed the 5th percentile of A_b across taxa (660 CE–1900 CE)
  • Smoothed these thresholds with a LOESS regression
  • Retained taxa exceeding the lower 95% CI of the fit
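The thresholding steps above can be sketched in simplified form. This example computes a 5th-percentile A_b threshold per layer age and compares a taxon against it directly; the LOESS smoothing and its lower 95% confidence bound are deliberately left out, so the raw percentile stands in for the smoothed fit. The function names and the input layout are assumptions for illustration.

```python
def percentile(values, p):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence is undefined")
    rank = (p / 100) * (len(ordered) - 1)
    lower = int(rank)  # rank is non-negative, so int() floors it
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def damage_thresholds(a_b_by_age, p=5):
    """Return {age: p-th percentile of A_b across taxa at that age}."""
    return {age: percentile(vals, p) for age, vals in sorted(a_b_by_age.items())}


def exceeds_threshold(a_b, age, thresholds):
    """True when a taxon's A_b lies above the (unsmoothed) threshold."""
    return a_b > thresholds[age]
```

In a fuller version, the per-age thresholds would be passed to a LOESS smoother before the comparison, and the lower confidence bound of that fit would replace `thresholds[age]`.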

Taxa showing authentic damage in older layers were also accepted in younger ones ("oldest pass date" rule).
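One way to read the "oldest pass date" rule is sketched below. It assumes ages are calendar years CE (so a smaller number means an older layer) and that each taxon carries a per-layer pass/fail flag from the damage filter; both the data layout and the function name are hypothetical.

```python
def apply_oldest_pass_rule(passes_by_taxon):
    """Accept a taxon in every layer at or younger than its oldest pass.

    passes_by_taxon maps taxon -> {age_ce: bool}, where the bool says
    whether the taxon passed the damage filter in that layer.
    """
    accepted = {}
    for taxon, by_age in passes_by_taxon.items():
        passing_ages = [age for age, ok in by_age.items() if ok]
        if not passing_ages:
            # Never authenticated: rejected everywhere.
            accepted[taxon] = {age: False for age in by_age}
            continue
        oldest_pass = min(passing_ages)  # smaller CE year = older layer
        accepted[taxon] = {age: age >= oldest_pass for age in by_age}
    return accepted


# Hypothetical example: Ovis passes only in the 1200 CE layer.
flags = {
    "Ovis": {900: False, 1200: True, 1600: False, 1950: False},
    "Capra": {900: False, 1200: False},
}
result = apply_oldest_pass_rule(flags)
```

Here Ovis is accepted from 1200 CE onward, including layers where its own damage signal was too weak to pass, while the 900 CE layer stays rejected.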

Visualization and Reproducibility

  • Figures S10–S11: Damage vs. age for individual taxa (e.g., Ovis) and replicate consistency
  • Figures S12–S13: PCoA of eukaryotic community profiles across replicates
  • Figures S14–S16: Combined DNA damage–age models and taxa-specific trends
  • Figures S17–S19: Rarefaction and negative control validation

All plots and summaries were generated in R, integrating metaDMG, filterBAM, and custom scripts for visualization and statistical filtering.

HTML Reports

  • SampleWorkflow.html — Overview of pipeline and sample-level workflow
  • Replicates.html — Detailed replicate analysis, damage modeling, and filtering results
  • Negatives.html — Analysis of taxa in blanks
