
Contributor

@ChristopherMancuso ChristopherMancuso commented Sep 9, 2025

This branch is for checking the domino implementation and improving it as needed. Some items that might be considered are

Questions

  • Should any additional kwargs be added?
  • Should parameter defaults be changed at all?
  • retain_relevant_slices finds slices using connected components; might this need to change to Louvain for really dense networks at some point?
  • In retain_relevant_slices, why did they not use the number of perturbed nodes in the cc instead of all perturbed nodes?
  • In retain_relevant_slices, why does hypergeom use G_org for the universe but only the perturbed nodes found in G_modularity (this doesn't account for perturbed nodes that are not in slices)?
  • In prune_slice, since relevant_slices are found with G_modularity, why look at them in the context of the full graph (G) here?
  • In prune_slice, run_pcst and linear_threshold we can set the threshold and influence values of a node; do we ever need to do this?
  • In prune_slice and run_pcst, vertices_prizes seems to be a mix of floats and booleans (i.e. the Active value); is this OK?
  • In general I don't really get how get_putative_modules works in prune_slice; should this be changed at all? In particular
    • It seems not great to just be looking at connected components
    • The if-statement clause cur_modularity <= best_modularity + improvement_delta makes no sense to me; if anything it seems like that should be a >= sign
  • In the Louvain implementation of PyGenePlexus, is there a place we could do that in Domino?
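
To make the hypergeom universe question above concrete, here is a minimal sketch (standard library only, with made-up toy numbers that are not taken from DOMINO or PyGenePlexus). It keeps the universe fixed at the size of G_org and only changes how many perturbed nodes are counted as "successes" in the population: all of them, or only those that survive into G_modularity. The function name and all values are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n nodes from N nodes, K of which are perturbed."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical toy numbers, chosen only for illustration.
n_universe = 1000       # nodes in the original network (G_org)
n_perturbed_all = 50    # every perturbed node in G_org
n_perturbed_kept = 30   # perturbed nodes that are still present in G_modularity
slice_size = 40         # nodes in the slice being tested
slice_hits = 8          # perturbed nodes inside that slice

p_all = hypergeom_sf(slice_hits, n_universe, n_perturbed_all, slice_size)
p_kept = hypergeom_sf(slice_hits, n_universe, n_perturbed_kept, slice_size)
print(f"successes = all perturbed nodes:          p = {p_all:.3g}")
print(f"successes = perturbed nodes in slices:    p = {p_kept:.3g}")
```

With the universe held at G_org, counting fewer perturbed nodes as successes lowers the expected number of hits, so the same slice gets a smaller p-value. Whether that is the intended behavior is exactly the open question.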

Things I noticed in domino implementation

  • Individual slices are never considered in domino, so the initial slices step just removes some network genes, as the slices are recombined right away
  • Since linear_threshold only uses G_cc, the diffusion step only adds scores for fast_pcst, and not for nodes outside of G_cc. Does this mean all "propagated genes" come from finding relevant slices (and not from diffusion), and that diffusion can only prune those?
  • In prune_slice and in get_putative_modules, full_G and module_threshold are dropped from the arguments. In the original DOMINO there was code for finding sig_values for each sub-slice, but those sig_values were never actually used
  • In final_modules (and other places), modules that are below the min_clust_size parameter are included in the p-value adjustment calculations, and they should probably be removed beforehand
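
As an illustration of the last point, here is a small sketch of how the order of size filtering and p-value adjustment can change the result. It assumes a Benjamini-Hochberg style correction (the actual adjustment used in the domino code may differ), and the module names, sizes and p-values are invented.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical modules: (name, size, raw p-value).
modules = [
    ("m1", 25, 0.010),
    ("m2", 3, 0.200),
    ("m3", 12, 0.020),
    ("m4", 4, 0.500),
    ("m5", 30, 0.030),
]
min_clust_size = 10

# Option A: adjust over every module, then drop the small ones afterwards.
adj_all = dict(
    zip([mod[0] for mod in modules], benjamini_hochberg([mod[2] for mod in modules]))
)

# Option B: drop the small modules first, then adjust over what is left.
kept = [mod for mod in modules if mod[1] >= min_clust_size]
adj_kept = dict(
    zip([mod[0] for mod in kept], benjamini_hochberg([mod[2] for mod in kept]))
)

for name in adj_kept:
    print(name, round(adj_all[name], 4), round(adj_kept[name], 4))
```

In this toy case the three large modules move from an adjusted value of about 0.05 (small modules included) to about 0.03 (small modules removed first), which could decide whether they pass a cutoff such as domino_module_threshold.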

@ChristopherMancuso
Contributor Author

The main functions to look at are

  • geneplexus/cluster_input()
  • _geneplexus/_generate_clusters()
  • everything in _clustering/domino.py
  • everything in _clustering/domino_utls.py

Some example code to run Geneplexus with clustering is

import geneplexus

### you will need to change the paths ###

# load in a gene set
# the code here loads the examples but it would be great if you tried some different ones
fp_pygp = "/Users/mancchri/Desktop/repos/PyGenePlexus/"
input_genes = geneplexus.util.read_gene_list(f"{fp_pygp}example/input_genes.txt")

# this is the path to the needed data. If you don't have it the downloader function should work (I can send code if needed)
# or you can get the data from here https://zenodo.org/records/14750555
fp_data = "/Users/mancchri/Desktop/CIDA_unsorted/Arjun/GenePlexusZoo_webserver/gpdata_prop/regular" # the good one right now

# set other variables
sp_trn = "Human" # needs to be a string of "Human", "Mouse", "Fly", "Zebrafish", "Worm", "Yeast"
gsc_trn = "Combined" # needs to be a STRING of either "GO", "Monarch", "Combined"
sp_res = ["Mouse", "Human", "Fly", "Yeast"] # can be a list or a single string (options same as sp_trn)
gsc_res = "Combined" # can be a list with one entry per species in sp_res, or a single string (a string is repeated for each species in sp_res)
features = "SixSpeciesN2V"
net_type = "STRING" # needs to be a string, options "STRING", "BioGRID", "IMP"

# instantiate the class
gp = geneplexus.GenePlexus(file_loc = fp_data,
                           gsc_trn = gsc_trn,
                           gsc_res = gsc_res,
                           features = features,
                           net_type = net_type,
                           sp_trn = sp_trn,
                           sp_res = sp_res,
                           log_level="INFO")
# print(gp.sp_trn)
# print(gp.gsc_trn_original)
# print(gp.gsc_trn)
# print(gp.sp_res)
# print(gp.gsc_res_original)
# print(gp.gsc_res)
# print(gp.model_info["All-Genes"].results)
# print(list(gp.model_info))
# # run some functions
gp.load_genes(input_genes)
#### if you want to use the louvain method, uncomment the block below and comment out lines 49-55 (the domino block) ####
# lv_kwargs = {"louvain_max_size" : 70, "louvain_max_tries" : 3, "louvain_res" : 1, "louvain_seed" : 323}
# gp.cluster_input(
#     clust_method = "louvain",
#     clust_min_size = 5, # sets the minimum size of the clusters to keep. If you change this, also change the gp.fit argument min_num_genes. This will be done automatically in the future
#     clust_weighted = True, # whether or not to use edge weights when making the clusters
#     clust_kwargs = lv_kwargs,
# )
dm_kwargs = {"domino_res" : 1, "domino_slice_thresh" : 0.3, "domino_n_steps" : 20, "domino_module_threshold" : 0.05, "domino_seed" : 123}
gp.cluster_input(
    clust_method = "domino",
    clust_min_size = 10,
    clust_weighted = True,
    clust_kwargs = dm_kwargs,
)
# print(gp.model_info["All-Genes"].model_genes)
# print(gp.model_info["Cluster-01"].model_genes)
# print(list(gp.model_info))
gp.fit(min_num_pos = 5)
# print(gp.model_info["All-Genes"].avgps)
# print(gp.model_info["Cluster-01"].avgps)
gp.predict()
# print(gp.model_info["All-Genes"].results["Fly-Combined"].df_probs)
# print(gp.model_info["Cluster-01"].results["Fly-Combined"].df_probs)
gp.make_sim_dfs()
# print(gp.model_info["All-Genes"].results["Fly-Combined"].df_sim)
# print(gp.model_info["Cluster-01"].results["Fly-Combined"].df_sim)
gp.make_small_edgelist()
# print(gp.model_info["All-Genes"].results["Fly-Combined"].df_edge_sym)
# print(gp.model_info["Cluster-01"].results["Fly-Combined"].df_edge_sym)
# gp.alter_validation_df()
# print(gp.df_convert_out_subset)
gp.save_class("/Users/mancchri/Desktop/save_test/mytest")

ChristopherMancuso and others added 5 commits October 5, 2025 19:56
* added default params to API except kwargs

* renamed data_dir to file_loc in cli

* flipped behavior of autodownloader and renamed it

* updated some defaults and changed parameter names

* removed the quiet argument

* updated help in pipeline control arguments

* updated some of the clustering arguments in cli

* updated cluster_input to take None for kwargs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added defaults for fit in CLI

* updated params for small edgelist in CLI

* renamed outdir to output_dir

* added defaults for save_class added ALL_SAVES

* changed how output_dir of None is handled

* cleaned up some config params

* updated Types in config

* added cluster kwargs to config

* changed what file scale is saved to

* changed how logreg_kwargs called

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This fixed how filenames are generated when no overwriting is selected. The new behavior is to look for a folder or zip file with the name and increment by one if either already exists.
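
A rough sketch of the naming behavior described in that commit message, not the actual PyGenePlexus code: the helper name next_free_name and the "_1", "_2" suffix format are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def next_free_name(parent, stem):
    """Return a name that clashes with neither a folder nor a .zip file in parent."""
    parent = Path(parent)

    def taken(name):
        # A name counts as used if a folder/file or a zip archive already has it.
        return (parent / name).exists() or (parent / f"{name}.zip").exists()

    if not taken(stem):
        return stem
    counter = 1
    while taken(f"{stem}_{counter}"):
        counter += 1
    return f"{stem}_{counter}"


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    first = next_free_name(tmp, "mytest")      # nothing exists yet
    (Path(tmp) / "mytest").mkdir()
    second = next_free_name(tmp, "mytest")     # the folder is taken
    (Path(tmp) / "mytest_1.zip").touch()
    third = next_free_name(tmp, "mytest")      # a zip blocks the next candidate

print(first, second, third)
```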
* got old tests working

* added new tests for clustering and res params

* fixed save name bug for windows

* change min python v3.10 and latest v3.13

* tmp remove py v3.13

* tmp keep only py v 3.10

* pipeline test now sets -fl

* pipeline test now sets -sr to Mouse for cli

* updated assertion to be Mouse-Combined

* add back in test for py v3.11

* add back in test for py v3.12

* change requirements to greater than

* tmp remove domino from pytests

* add back in test for py v3.13

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* changed docs packages to be >=

* updated pre-commit-config.yaml based on #303

* updated tests to not be duplicated in PRs

* updated tests to not be duplicated in PRs take 2

* removed safe from black in pre-commit yaml

* change pre-commit back to normal

* changes from rolling back reorder and black

* updated the main RTD figure

* added new functions to geneplexus.geneplexus page

* added clust_kwargs defaults to docs

* updated the main RTD figure

* swapped order to have input_genes worked if added in class

* updates to main RTD page

* updated to how to use with R

* updated RTD CLI page

* change data_dir to file_loc in download.py

* added None as file_loc option in download.py

* updated some doc strings in download.py

* updated download code in RTD API doc

* updated RTD API page

* add logreg_kwargs to have defaults other than None

* removed functions starting with _

* stopped properties from showing up

* added all class methods to attribute doc strings

* added to example to get all major outputs

* updated the examples scripts

* updated the examples scripts typos

* removed geneplexus CLI args from README

* small README changes

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

2 participants