Output gene name for mapped targets by bencap · Pull Request #67 · VariantEffect/dcd_mapping2

bencap · 2025-12-12T22:29:08Z

This pull request introduces a new mechanism in the mapping pipeline for inferring a gene symbol for each mapped target and attaching it, with provenance, to the target's annotation. It adds the `compute_target_gene_info` function, which determines a single gene symbol per target using a prioritized approach, and integrates this logic into the mapping API. It also restructures the `reference_sequences` data structure to use a new `TargetAnnotation` class, and makes several supporting improvements and bug fixes.

Gene symbol inference and annotation improvements:

- Added the `compute_target_gene_info` async function in `annotate.py`, which determines a single gene symbol per target using a prioritized strategy (selected transcript, alignment overlap, variant spans, or a fallback to target metadata) and returns a `GeneInfo` object with provenance. Supporting helper functions for overlap-based inference and interval merging were also added.
- Integrated gene symbol inference into the `map_scoreset` API route: for each target, the computed gene info is attached to its `TargetAnnotation` in the response.
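The prioritized selection can be pictured as an ordered list of strategies where the first one to produce a symbol wins and records its own name as provenance. The sketch below is a synchronous simplification (the real function is async); `select_gene_info` and the lambda strategies are hypothetical stand-ins, and only the `selection_method` labels are taken from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class GeneInfo:
    """A gene symbol together with the method that produced it (provenance)."""

    hgnc_symbol: Optional[str]
    selection_method: str


# Each strategy pairs a selection-method label with a zero-argument callable
# that returns a symbol, or None when it cannot decide.
Strategy = Tuple[str, Callable[[], Optional[str]]]


def select_gene_info(strategies: Sequence[Strategy]) -> Optional[GeneInfo]:
    """Try strategies in priority order and return the first symbol found."""
    for method, strategy in strategies:
        try:
            symbol = strategy()
        except Exception:
            # A failing strategy falls through to the next one instead of
            # failing the surrounding mapping job.
            continue
        if symbol:
            return GeneInfo(hgnc_symbol=symbol, selection_method=method)
    # No strategy produced a symbol: leave the target without gene info.
    return None


# Usage: the transcript strategy has nothing, so alignment overlap wins.
info = select_gene_info(
    [
        ("tx_selection", lambda: None),
        ("alignment_max_covered_bases", lambda: "BRCA1"),
        ("target_metadata", lambda: "BRCA1"),
    ]
)
print(info)  # GeneInfo(hgnc_symbol='BRCA1', selection_method='alignment_max_covered_bases')
```

Catching broad exceptions per strategy is one way to get the "never fail the job" behavior; a real implementation might log the error before moving on.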

Data structure and schema changes:

- Updated the `reference_sequences` structure in `map_scoreset` to use `TargetAnnotation` objects, which include a `layers` attribute and a new `gene_info` field. Adjusted all code paths to reference the new structure.
- Added `TargetAnnotation` and `GeneInfo` imports to the relevant modules so the new schema types are used consistently.
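To make the new shape concrete, here is a minimal dataclass sketch of a `reference_sequences` mapping whose values are `TargetAnnotation` objects carrying `layers` and `gene_info`. The field types, the `"genomic"` layer key, and the placeholder sequence ID are assumptions for illustration, not the project's actual schema (which may use different model classes).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class GeneInfo:
    hgnc_symbol: Optional[str]
    selection_method: str


@dataclass
class TargetAnnotation:
    # Hypothetical: per-layer reference-sequence metadata keyed by layer name.
    layers: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # None when no gene info could be selected for the target.
    gene_info: Optional[GeneInfo] = None


# reference_sequences maps each target name to its TargetAnnotation.
reference_sequences: Dict[str, TargetAnnotation] = {
    "target_1": TargetAnnotation(
        layers={"genomic": {"sequence_id": "placeholder"}},
        gene_info=GeneInfo(hgnc_symbol="BRCA1", selection_method="tx_selection"),
    ),
    "target_2": TargetAnnotation(layers={"genomic": {"sequence_id": "placeholder"}}),
}

# asdict() recurses into nested dataclasses, so the result is plain dicts
# that json.dumps can serialize directly (None becomes null).
payload = {name: asdict(annotation) for name, annotation in reference_sequences.items()}
serialized = json.dumps(payload, indent=2)
```

The same `asdict` step is one way a command-line path could emit `layers` and `gene_info` in its JSON output alongside the API response.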

Supporting infrastructure and environment:

- Added `ENSEMBL_API_URL` to the `.env.dev` settings file to support Ensembl API queries for gene overlap.
- Imported `request_with_backoff` and `ENSEMBL_API_URL` in `lookup.py` to enable robust gene feature queries.
- Minor: added the `Any` import in `lookup.py` for type hinting.
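A minimal standard-library sketch of an overlap query with exponential backoff. It targets the public Ensembl REST `overlap/region` endpoint with `feature=gene`; the helper names, the retry policy (retry on HTTP 429, 5xx, and network errors), and the defaults are assumptions, and this is not the project's `request_with_backoff` implementation. `base_url` stands in for whatever `ENSEMBL_API_URL` holds, presumably something like `https://rest.ensembl.org`.

```python
import json
import random
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, List, TypeVar

T = TypeVar("T")


def build_overlap_url(base_url: str, chrom: str, start: int, end: int, species: str = "human") -> str:
    """Ensembl REST overlap/region URL restricted to gene features (1-based, inclusive region)."""
    return f"{base_url.rstrip('/')}/overlap/region/{species}/{chrom}:{start}-{end}?feature=gene"


def is_retryable(exc: Exception) -> bool:
    """Retry on rate limiting (429), server errors (5xx), and network failures."""
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code == 429 or exc.code >= 500
    return isinstance(exc, (urllib.error.URLError, TimeoutError))


def with_backoff(
    fetch: Callable[[], T],
    retries: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), sleeping base_delay * 2**attempt (plus jitter) between failed attempts."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as exc:
            if not is_retryable(exc) or attempt == retries - 1:
                raise
            sleep(base_delay * 2**attempt + random.uniform(0, base_delay))
    raise ValueError("retries must be at least 1")


def get_overlapping_genes(base_url: str, chrom: str, start: int, end: int) -> List[Dict[str, Any]]:
    """Return the gene features Ensembl reports as overlapping chrom:start-end."""
    request = urllib.request.Request(
        build_overlap_url(base_url, chrom, start, end),
        headers={"Content-Type": "application/json"},
    )

    def fetch() -> List[Dict[str, Any]]:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read())

    return with_backoff(fetch)
```

Each returned gene feature includes fields such as `start`, `end`, `biotype`, and `external_name` (usually the gene symbol). Injecting `sleep` keeps the backoff logic testable without real delays.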

Bug fixes and code improvements:

- Fixed score parsing logic in `annotate.py` to correctly handle zero-valued scores: it now checks `is not None` instead of truthiness, in multiple locations.
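The zero-score bug is the classic Python truthiness pitfall: `0` and `0.0` are falsy, so an `if score:` guard treats a real score of zero as missing. The two hypothetical helpers below contrast the old and new checks; they are illustrations, not the functions in `annotate.py`.

```python
from typing import Optional


def parse_score_truthy(raw: Optional[float]) -> Optional[float]:
    # Buggy pattern: 0.0 is falsy, so a genuine zero score becomes None.
    return float(raw) if raw else None


def parse_score(raw: Optional[float]) -> Optional[float]:
    # Fixed pattern: only a missing score (None) maps to None; 0.0 is kept.
    return float(raw) if raw is not None else None


print(parse_score_truthy(0.0), parse_score(0.0))  # None 0.0
```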

These changes collectively improve the accuracy and transparency of gene symbol assignment in the API, and lay the groundwork for robust, provenance-aware gene annotation in downstream analyses.


bencap · 2025-12-12T22:29:42Z

See #66 for information on computing gene info for regulatory targets, which will be included in a future release.

sallybg

This is great! The only comment I have is that there are some changes in the router function (`src/api/routers/map.py`) which we should also recreate in the corresponding command-line function (`save_mapped_output_json`, which is in `src/dcd_mapping/annotate.py`). I think we just need to add the `layers` and `gene_info` properties to the `reference_sequences` dict when accessing the mapper from the command line, unless you had a reason for not including them there.

Computes a new `gene_info` property for all mapped targets. This property consists of an `hgnc_symbol` and a `selection_method`. The `hgnc_symbol` is the HGNC symbol of the gene to which the target relates. The `selection_method` records how this symbol was selected and may be one of:

- `tx_selection`: via the selected transcript
- `alignment_max_covered_bases`: based on the gene feature (via Ensembl) that covered the most bases of the aligned target
- `variants_max_covered_bases`: same as `alignment_max_covered_bases`, but based on variant bases rather than aligned bases
- `target_metadata`: based on parsing the target metadata the user supplied
- `target_category`: no gene info was selected because the target was not protein coding (see #66)

Various helpers supporting this calculation were added to `dcd_mapping.annotate`. Gene info selection should not cause job failures; if selection fails, the target is simply left without gene info.
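The `*_max_covered_bases` methods reduce to an interval computation: merge the target's aligned (or variant) intervals so overlapping blocks are not double-counted, count how many of those bases fall inside each gene feature, and keep the gene with the most. The sketch below assumes half-open `[start, end)` coordinates on a single chromosome and one interval per gene symbol; the function names are hypothetical, not the helpers added to `dcd_mapping.annotate`. Ensembl reports 1-based, fully closed coordinates, so a real implementation would convert (for example, `start - 1`) before comparing.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Half-open [start, end) interval on a single chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(merged: List[Interval], feature: Interval) -> int:
    """Number of bases in the merged intervals that fall inside the feature."""
    f_start, f_end = feature
    return sum(max(0, min(end, f_end) - max(start, f_start)) for start, end in merged)


def max_covered_gene(intervals: Iterable[Interval], genes: Dict[str, Interval]) -> Optional[str]:
    """Symbol of the gene covering the most bases, or None if nothing overlaps."""
    merged = merge_intervals(intervals)
    best_symbol: Optional[str] = None
    best_bases = 0
    for symbol, feature in genes.items():
        bases = covered_bases(merged, feature)
        if bases > best_bases:  # strict: ties keep the earlier gene
            best_symbol, best_bases = symbol, bases
    return best_symbol


# Two overlapping alignment blocks merge into [100, 200): GENE_A covers 30 bases, GENE_B 60.
print(max_covered_gene([(100, 150), (120, 200)], {"GENE_A": (0, 130), "GENE_B": (140, 1000)}))  # GENE_B
```

Ties go to the first gene encountered because only a strictly larger count replaces the current best; a real implementation might choose a different tie-break.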


sallybg · 2026-01-30T22:15:02Z

The addition of gene info to the command line output looks good!

bencap added 3 commits December 12, 2025 11:24

fix: Update score assignment to handle None values correctly in allele mapping

8c77d7d

feat: Add hgnc_symbol field to TxSelectResult and update protein reference selection to set it

fd2b69d

feat: Implement get_overlapping_features_for_region function

6598890

This function queries the Ensembl API with exponential backoff as needed, returning a list of features which overlap the passed region.

bencap requested a review from sallybg December 12, 2025 22:29

bencap linked an issue Dec 12, 2025 that may be closed by this pull request

Determine Mapped Target HGNC Name During Mapping #55

Open

This was referenced Dec 13, 2025

Use new layers/gene info format in mapped target metadata VariantEffect/mavedb-api#611

Merged

Use mapped HGNC name for assay facts gene text when available VariantEffect/mavedb-ui#597

Merged

sallybg approved these changes Jan 30, 2026


bencap force-pushed the feature/bencap/55/hgnc-name-for-mapped-targets branch from 5d52e0c to 9669507 January 30, 2026 19:47

Merge branch 'mavedb-dev' into feature/bencap/55/hgnc-name-for-mapped-targets

f503cc7

bencap merged commit 8b34557 into mavedb-dev Jan 30, 2026
6 checks passed

bencap deleted the feature/bencap/55/hgnc-name-for-mapped-targets branch January 30, 2026 23:57

bencap linked an issue Feb 2, 2026 that may be closed by this pull request

Compute HGNC (or other) symbols for regulatory targets #66

Open

bencap mentioned this pull request Feb 2, 2026

Release 2026.1.0 #69

Open

bencap merged 5 commits into mavedb-dev from feature/bencap/55/hgnc-name-for-mapped-targets
