Storing Base Editor Data in MaveDB #317

@bencap

Base editor data is a type of functional assay data that is similar to MAVE data in many respects. In these experiments, engineered CRISPR base editors convert target bases without creating double-strand breaks, introducing mutations within a given window determined by its position relative to the CRISPR guide RNA binding location.

One important difference is that in a typical analysis, only the guide RNAs are quantified and used to calculate variant scores. The actual mutations introduced are inferred based on the guide location, relative position of the editing window, and the base editor's conversion profile. This is typically done by assuming all bases of the appropriate type in the window are converted with 100% efficiency and calculating the variant DNA sequence (and variant amino acid sequence, if relevant) that way. Because of this, it is possible that two different guide RNAs could introduce the same sequence variant if they bind near each other.
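As a rough illustration of the inference described above, the sketch below applies a 100%-efficiency conversion to every matching base in the editing window. The function name `edit_target`, the 20 nt guide length, the window at guide positions 4 to 8, the C-to-T conversion, and the toy target sequence are all assumptions chosen for the example; strand and PAM handling are ignored.

```python
def edit_target(target, guide_start, guide_len=20, window=(4, 8),
                convert_from="C", convert_to="T"):
    """Return the target with every `convert_from` base in the editing window converted.

    `guide_start` is the 0-based offset of the guide on the target.
    `window` is given as 1-based, inclusive positions within the guide.
    All parameters here are illustrative assumptions, not MaveDB definitions.
    """
    win_start = guide_start + window[0] - 1
    # Clamp the window to the guide and to the end of the target.
    win_end = min(guide_start + window[1], guide_start + guide_len, len(target))
    bases = list(target)
    for i in range(win_start, win_end):
        if bases[i] == convert_from:
            bases[i] = convert_to  # assume 100% conversion efficiency
    return "".join(bases)


# Toy target with a single C at index 5 (hypothetical sequence).
target = "GATTACAGGTTAAGGATTGAGGT"

# Two guides offset by one base: both editing windows cover the same C.
variant_a = edit_target(target, guide_start=0)
variant_b = edit_target(target, guide_start=1)
same_variant = variant_a == variant_b
```

Here both guides yield the same edited sequence, which is why the inferred variant cannot serve as a unique index for the score table.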

To address this and more faithfully represent the way the data was generated and analyzed, we should add a new score set type in MaveDB specifically for base editor data. Key features and implementation notes are outlined below:

  • Base editors should be able to use existing experiment and experiment set records with no changes.
  • Instead of an hgvs_ index column, base editor score sets use a new column guide_sequence as the index column.
    • Guides should be validated to ensure that they are unique, like all index columns.
    • Guides should be validated to ensure that they contain only ACGT.
  • hgvs_nt is required but does not need to be unique.
  • hgvs_pro is optional.
  • Base editor score sets do not have a target sequence. All hgvs_ columns should be specified with a versioned chromosome scaffold, transcript, or protein identifier from RefSeq as was introduced for the SGE data.
    • Many base editor experiments hit multiple functional elements (many genes) so this should be supported.
  • Base editor experiments may have multiple score sets, with each score set describing the variants with a different editing window width.
  • Many base editor variants will be multi-nucleotide variants, since the assumption is that all nucleotides in the editing window have been changed.
  • It will be useful to generate VRS objects for the base editor variants, but this can wait if the mapping service will struggle with this type of multi-nucleotide and multi-amino acid data.
  • The most extensive changes will be required in the validator code to accept this new type of uploaded data table.
  • We may be able to auto-detect whether a score set is a base editor score set simply by the presence of the guide_sequence column, but it is probably better to not do this and force an explicit choice. We may want to use the same column name for CRISPR-based technologies like SGE where including the guide is valuable metadata but not an index value.
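The column rules listed above could be checked along these lines. This is a minimal sketch, not the existing MaveDB validator: the function name `validate_base_editor_rows`, the row-as-dict input, the uppercase-only guide check, the loose RefSeq prefix pattern, and the placeholder accession `NM_000000.1` are all assumptions made for illustration.

```python
import re

# Guides may contain only A, C, G, T (uppercase assumed here).
VALID_GUIDE = re.compile(r"[ACGT]+")
# Loose check for a versioned RefSeq-style prefix such as "NM_000000.1:" (assumption).
VERSIONED_PREFIX = re.compile(r"[A-Z]{2}_\d+\.\d+:")


def validate_base_editor_rows(rows):
    """Return a list of error strings for a base editor score table.

    `rows` is a list of dicts keyed by column name (hypothetical input shape).
    """
    errors = []
    seen_guides = set()
    for n, row in enumerate(rows, start=1):
        guide = (row.get("guide_sequence") or "").strip()
        if not guide:
            errors.append(f"row {n}: guide_sequence is required")
        elif not VALID_GUIDE.fullmatch(guide):
            errors.append(f"row {n}: guide_sequence must contain only ACGT")
        elif guide in seen_guides:
            errors.append(f"row {n}: duplicate guide_sequence {guide}")
        seen_guides.add(guide)

        # hgvs_nt is required but may repeat across rows.
        hgvs_nt = (row.get("hgvs_nt") or "").strip()
        if not hgvs_nt:
            errors.append(f"row {n}: hgvs_nt is required")
        elif not VERSIONED_PREFIX.match(hgvs_nt):
            errors.append(f"row {n}: hgvs_nt needs a versioned RefSeq identifier")

        # hgvs_pro is optional; check the prefix only when present.
        hgvs_pro = (row.get("hgvs_pro") or "").strip()
        if hgvs_pro and not VERSIONED_PREFIX.match(hgvs_pro):
            errors.append(f"row {n}: hgvs_pro needs a versioned RefSeq identifier")
    return errors


# Two distinct guides may share one hgvs_nt value (placeholder accession).
ok_rows = [
    {"guide_sequence": "ACGTACGTACGTACGTACGT", "hgvs_nt": "NM_000000.1:c.10C>T"},
    {"guide_sequence": "CGTACGTACGTACGTACGTA", "hgvs_nt": "NM_000000.1:c.10C>T"},
]
ok_errors = validate_base_editor_rows(ok_rows)
```

Keeping the uniqueness check on `guide_sequence` only, and not on `hgvs_nt`, reflects the point that nearby guides can produce the same inferred variant.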

Labels

  • app: backend (Task implementation touches the backend)
  • app: database (Task implementation requires database changes)
  • type: discussion (Team discussion required)
  • type: feature (New feature)
