Skip to content

Debugging notes

Sina Majidian edited this page Dec 29, 2025 · 3 revisions

This instruction is high-level and for developing purposes of FastOMA. If you are interested in trying it out, please contact us by email or github issue and we would be happy to provide more in-depth, step-by-step instruction.

How to know the destination of my genes during HOG inference step:

Go to the relevant folder in work . Re-run this step using this where writing gene tree and MSA are activated. Nb. for big trees this could generate hundreds of files.

fastoma-infer-subhogs  --input-rhog-folder  0   --species-tree species_tree_checked.nwk    --output-pickles pickle_hogs  --parallel  -vv  --msa-filter-method col-row-threshold    --gap-ratio-row 0.3    --gap-ratio-col 0.5    --number-of-samples-per-hog 5   --msa-write    --gene-trees-write  > log  2>&1

In the script FastOMA/_infer_subhog.py, change this and re-install with pip. In the current code this is False and the intermediate pickle files are removed.

keep_subhog_each_pickle = True

Using the species tree, find the internal node names:

tree_= Tree(wdir+"species_tree_checked.nwk",format=1)
print(len(tree_))
species_interest = "Saccharomyces_cerevisiae.R64-1-1.pep.all"
nodes_with_sac=[]
for node in tree_.traverse("postorder"):
    leaves= node.get_leaves()
    leaves_names= [i.name for i in leaves]
    if species_interest in leaves_names:
        nodes_with_sac.append(node)
print(len(nodes_with_sac))
nodes_with_sac

Then check the files created in the rhog_HOGID

import os
found_nodes=[]
files= os.listdir(wdir+"/rhog_E0793066_noon")
print(len(files),files[:2])
for i in nodes_with_sac_names:
    if i+".pickle" in files:
        found_nodes.append(i)
print("FastOMA didn't complete HOG inference for the following internal nodes ")
print([i for i in nodes_with_sac_names if i not in found_nodes])

print("FastOMA completed HOG inference for the following internal nodes", found_nodes)



Then read the pickle file. Here the internal node name is 4891. You can get the integer gene id, from the HOG fasta file.

import pickle

gene_id= "2009000009"
for internal_node_name in found_nodes:
    input_pickle=wdir+"/rhog_E0793066_noon/"+internal_node_name+'.pickle' # 1294330
    print(input_pickle)
    handle=open(input_pickle,'rb')
    hogs = pickle.load(handle)
    handle.close()
    member_secleted=[]
    for hog in  hogs:
        members = hog.get_members()
        for m in members:
            if gene_id in m:
                member_secleted.append(hog)
    #print(len(member_secleted))
    if member_secleted:
        print("For the node," +internal_node_name+', this HOG is found ', member_secleted, 'that include the gene of interest.')
    else:
        print("For the node," +internal_node_name+', no HOG found with the gene of interest')
        

        

Then you can compare the last node that has the gene and the next node. In HOG with gene you can get its subhog ID

hog_secleted=[]
all_members=[]
for hog in  hogs:
    members = hog.get_members()
    #print(len(members))
    all_members.append(members)
    for m in members:
        if gene_id in m:
            hog_secleted.append(hog)
hog_secleted

This gives us the subhog ID of the gene at the level.

Then for the gene tree of the level without the gene, you can upload the gene tree on beta.phylo.io as extended NWK (NHX). On the right side, use selecte the leaf label ,D, as hogid. you can see what happens to the those genes from the same subHOG. Only 5 of genes per subHOG are represented in the gene tree.

Clone this wiki locally