Debugging notes

Sina Majidian edited this page Dec 29, 2025 · 4 revisions

How to find out where my genes end up during the HOG inference step:

Go to the relevant folder in work. Re-run this step with the command below, in which writing of the gene trees and MSAs is activated. N.B. for big trees this can generate hundreds of files.

fastoma-infer-subhogs  --input-rhog-folder  0   --species-tree species_tree_checked.nwk    --output-pickles pickle_hogs  --parallel  -vv  --msa-filter-method col-row-threshold    --gap-ratio-row 0.3    --gap-ratio-col 0.5    --number-of-samples-per-hog 5   --msa-write    --gene-trees-write  > log  2>&1

In the script FastOMA/_infer_subhog.py, set the variable below to True and re-install with pip. In the current code it is False, so the intermediate pickle files are removed.

keep_subhog_each_pickle = True

Using the species tree, find the nodes (the leaf and its internal ancestors) whose leaves include the species of interest:

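import os
from ete3 import Tree

# wdir is the directory that holds species_tree_checked.nwk
tree_ = Tree(os.path.join(wdir, "species_tree_checked.nwk"), format=1)  # format=1 reads internal node names
print(len(tree_))  # number of leaves, i.e. species
species_interest = "Saccharomyces_cerevisiae.R64-1-1.pep.all"
nodes_with_sac = []
for node in tree_.traverse("postorder"):
    leaves = node.get_leaves()
    leaves_names = [leaf.name for leaf in leaves]
    # keep every node whose subtree contains the species of interest
    if species_interest in leaves_names:
        nodes_with_sac.append(node)
print(len(nodes_with_sac))
nodes_with_sac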
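
The loop above calls get_leaves() on every node, which can get slow on a large species tree. Only the leaf of the species and its ancestors can contain that species, so the same set of nodes can be collected by walking up from the leaf to the root. The sketch below shows the idea on a tiny stand-in Node class (hypothetical, with made-up node names) so that it runs without ete3; on an ete3 tree the .up attribute of a node plays the same role.

```python
class Node:
    """Minimal stand-in for a tree node (hypothetical, not the ete3 class)."""

    def __init__(self, name, up=None):
        self.name = name
        self.up = up  # parent node, None for the root


def lineage(leaf):
    """Return the leaf followed by all of its ancestors, ending at the root."""
    nodes = []
    node = leaf
    while node is not None:
        nodes.append(node)
        node = node.up
    return nodes


# Toy tree: root -> inner_A -> leaf (names are placeholders)
root = Node("root")
inner = Node("inner_A", up=root)
leaf = Node("Saccharomyces_cerevisiae.R64-1-1.pep.all", up=inner)

names = [n.name for n in lineage(leaf)]
print(names)
```

The order differs from the postorder traversal only in that it is guaranteed to go from leaf to root, which is also the order in which the subHOGs are built up the tree.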

Then check which files were created in the rhog_HOGID folder (here rhog_E0793066):

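import os
files = os.listdir(os.path.join(wdir, "rhog_E0793066"))
print(len(files), files[:2])
# names of the nodes collected from the species tree
nodes_with_sac_names = [node.name for node in nodes_with_sac]
for name in nodes_with_sac_names:
    if name + ".pickle" in files:
        print(name)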
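
The same check can be wrapped in a small reusable helper. This is only a sketch: the function name nodes_with_pickle is made up here, and the demo runs on a temporary folder with placeholder file names rather than on a real rhog folder.

```python
import tempfile
from pathlib import Path


def nodes_with_pickle(folder, node_names):
    """Return the node names that have a '<name>.pickle' file in folder."""
    # Path.stem strips only the last suffix, so dots inside a name are kept
    stems = {p.stem for p in Path(folder).glob("*.pickle")}
    return [name for name in node_names if name in stems]


# Demo on a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for fname in ["inner_A.pickle", "species.v1.pep.all.pickle", "notes.txt"]:
        (Path(tmp) / fname).touch()
    found = nodes_with_pickle(tmp, ["inner_A", "inner_B", "species.v1.pep.all"])

print(found)
```

Building the set of file stems once avoids scanning the file list again for every node name.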

Then read the pickle file of one of these nodes and look for the gene of interest. Here the internal node name is 4891.

import os
import pickle

input_pickle = os.path.join(wdir, "rhog_E0793066", "4891.pickle")
print(input_pickle)
# FastOMA must be importable here, because the pickle stores its HOG objects
with open(input_pickle, "rb") as handle:
    hogs = pickle.load(handle)
hogs_selected = []
gene_id = "2009000009"
for hog in hogs:
    members = hog.get_members()
    for m in members:
        # substring match on the member identifier
        if gene_id in m:
            hogs_selected.append(hog)
print(len(hogs_selected))
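
The loop above appends a HOG once for every matching member, so a HOG can show up more than once if the substring matches several of its members. A possible variant that reports each HOG at most once is sketched below. The helper name hogs_with_gene and the DemoHog class are hypothetical; DemoHog only mimics the get_members() method used above so that the example runs without FastOMA, and the member identifiers are placeholders.

```python
def hogs_with_gene(hogs, gene_id):
    """Return each HOG that has at least one member containing gene_id."""
    return [
        hog for hog in hogs
        if any(gene_id in member for member in hog.get_members())
    ]


class DemoHog:
    """Hypothetical stand-in that exposes only get_members()."""

    def __init__(self, members):
        self._members = members

    def get_members(self):
        return self._members


# Placeholder member identifiers, not real FastOMA output
demo = [
    DemoHog(["gene_2009000009", "copy_2009000009"]),  # two matching members
    DemoHog(["gene_3000000001"]),                     # no match
]
hits = hogs_with_gene(demo, "2009000009")
print(len(hits))
```

Because this is a substring test, a short gene_id can also match longer identifiers that merely contain it; comparing whole identifiers is stricter if the exact member format is known.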
