When to use --no_force_align #2

@sbliven

It's unclear to me when to use ProGraph's --no_force_align option. The README describes it as:

"do not force alignment of initial Methionine"

What's the scientific motivation for skipping the initial M by default?

I ask because of a potential bug in the interaction with the --repeat option, which matches the sequences against a T-REKS output alignment. T-REKS files reference sequence positions, so every coordinate is off by one if the M was stripped.

I can think of several possible solutions:

  • Default to --no_force_align when the --repeat option is also specified
  • For each sequence, store a flag indicating whether it has been truncated. If so, account for that when reading in the repeats file
  • Be more permissive when verifying the FASTA/T-REKS alignment. Automatically recover from off-by-one errors in the coordinates. (This would have the side benefit of supporting malformed T-REKS files that use 0-based indexes rather than the correct 1-based positions.)
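
To make the second and third options concrete, here is a minimal Python sketch. It is not ProGraph's actual code: `Sequence`, `strip_initial_met`, `repeat_slice`, and `recover_shift` are hypothetical names invented for illustration, and it assumes repeats arrive as 1-based, inclusive (start, end) pairs in the coordinates of the original, unstripped sequence.

```python
from dataclasses import dataclass


@dataclass
class Sequence:
    """Stored residues plus the number of leading residues that were stripped."""
    residues: str
    offset: int = 0  # 1 if the initial M was removed, otherwise 0


def strip_initial_met(raw: str) -> Sequence:
    """Drop a leading methionine and record that it was dropped (hypothetical helper)."""
    if raw.startswith("M"):
        return Sequence(raw[1:], offset=1)
    return Sequence(raw, offset=0)


def repeat_slice(seq: Sequence, start: int, end: int) -> str:
    """Option 2: translate a 1-based inclusive range on the original sequence
    into a slice of the stored (possibly truncated) sequence."""
    lo = start - 1 - seq.offset  # 0-based start in the stored string
    hi = end - seq.offset        # exclusive end in the stored string
    if lo < 0 or hi > len(seq.residues) or lo >= hi:
        raise ValueError(f"repeat {start}-{end} falls outside the stored sequence")
    return seq.residues[lo:hi]


def recover_shift(residues: str, start: int, end: int, expected: str,
                  shifts=(0, -1, 1)):
    """Option 3: return the first shift for which the range matches the
    expected repeat text, or None if no candidate shift matches."""
    for shift in shifts:
        lo = start - 1 + shift
        hi = end + shift
        if 0 <= lo < hi <= len(residues) and residues[lo:hi] == expected:
            return shift
    return None


if __name__ == "__main__":
    seq = strip_initial_met("MKTAYKTAY")
    print(repeat_slice(seq, 2, 5))                        # KTAY
    print(recover_shift(seq.residues, 2, 5, "KTAY"))      # -1
```

With the offset stored per sequence, the repeats reader never has to guess. The permissive variant is simpler to bolt on, but in highly repetitive sequences a shifted range can match by coincidence, so it would probably be safer to require the same shift across all repeats in a file.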
