Basic skeleton for building the package by KasperThystrup · Pull Request #110 · ssi-dk/MMASeq

KasperThystrup · 2026-01-15T09:53:29Z

Providing required files for building as package.

Current commands (modules) MUST work:
mmadeploy (If this works mmaseq also works)
mmacreate

A lot of restructuring and cleanup has been done to facilitate a meaningful wrapper. This commit includes:
* A conceptual wrapper script
* Removal of unused variables
* A deployment_dir, which is used to store databases and conda environments as created by snakemake
* Reorganisation of files
* Metadata renamed to target_screening for now
* Analysis relevant files are moved into config/analysis (e.g. species_configs)
* reads/ and assemblies/ folders now located in the data folder
* samplesheet now located in the data folder
* examples/ folder removed
* Added Test and Deploy to .gitignore; I suggest using Test/Results for output_folder and Deploy for deployment_dir
* MLST-db-update rule removed completely, as MLST has removed the update script

* Illumina and Assembly paths removed from config and snakefile; they must now be specified directly in the samplesheet
* Samplesheet changes:
  * Added Illumina_mate1 and _mate2
  * Removed Nanopore path
  * Renamed Assembly path variable
* Removed sample_to_illumina, _to_nanopore, & _assembly vars and replaced with samplesheet
* Samples can be accessed as samplesheet.index
* Updated rules to access samplesheet directory
* Replaced lambda wildcards with lambda wc, to avoid confusion with snakemake.wildcards object
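
The samplesheet-driven lookup described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the pipeline's actual code: only `Illumina_mate1` and `Illumina_mate2` are column names taken from the commit message, while the `sample` and `Assembly` column names, the file paths and the function names are assumptions. The `SimpleNamespace` object stands in for the wildcards object that Snakemake passes to an input function.

```python
import csv
import io
from types import SimpleNamespace

# Hypothetical samplesheet content. Only Illumina_mate1 and Illumina_mate2
# are column names from the commit message; everything else is assumed.
SAMPLESHEET_TSV = (
    "sample\tIllumina_mate1\tIllumina_mate2\tAssembly\n"
    "S1\t/data/reads/S1_R1.fastq.gz\t/data/reads/S1_R2.fastq.gz\t/data/assemblies/S1.fasta\n"
    "S2\t/data/reads/S2_R1.fastq.gz\t/data/reads/S2_R2.fastq.gz\t\n"
)


def load_samplesheet(handle):
    """Return a dict keyed by sample name, holding one row (dict) per sample."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sample"]: row for row in reader}


samplesheet = load_samplesheet(io.StringIO(SAMPLESHEET_TSV))


def illumina_reads(wc):
    """Input function in the `lambda wc: ...` style: look up both mates."""
    row = samplesheet[wc.sample]
    return [row["Illumina_mate1"], row["Illumina_mate2"]]


# Stand-in for the wildcards object Snakemake would pass to an input function.
wc = SimpleNamespace(sample="S1")
reads = illumina_reads(wc)
```

Naming the argument `wc` rather than `wildcards` keeps the function parameter visually distinct from the global wildcards object, which is the confusion the commit mentions.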

Breaks during:

    Obtaining file:///home/cucumbergebt/repos/push/ssi_analysis_utility
    Installing build dependencies: started
    Installing build dependencies: finished with status 'done'
    Checking if build backend supports build_editable: started
    Checking if build backend supports build_editable: finished with status 'done'
    Getting requirements to build editable: started
    Getting requirements to build editable: finished with status 'done'
    Installing backend dependencies: started
    Installing backend dependencies: finished with status 'done'
    Preparing editable metadata (pyproject.toml): started
    Preparing editable metadata (pyproject.toml): finished with status 'done'
    INFO: pip is looking at multiple versions of microbemapper to determine which version is compatible with other requirements. This could take a while.

* If an assembly specified in the samplesheet exists, it will be symlinked (unless it already exists in results as a regular file rather than a link)
* Tool versioning removed from assemblies, as this would disrupt the circumvention of assembly linking. Think about alternatives, see #112
* Migrated contents of config/analysis into config/ for simplicity
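
The linking rule in the first bullet could look roughly like the sketch below. The function name `link_assembly` and the directory layout are hypothetical, and replacing an already existing link is an assumption; the commit only states that a regular file in results must not be overwritten by a link.

```python
import tempfile
from pathlib import Path


def link_assembly(source, target):
    """Symlink source at target unless target is already a regular file.

    Returns True if a link exists at target afterwards, False if an
    existing regular file was left untouched.
    """
    source, target = Path(source), Path(target)
    if target.is_symlink():
        target.unlink()  # assumption: an existing link is simply refreshed
    elif target.exists():
        return False  # a real file (e.g. a pipeline result) is kept as-is
    target.parent.mkdir(parents=True, exist_ok=True)
    target.symlink_to(source.resolve())
    return True


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data" / "S1.fasta"
    src.parent.mkdir()
    src.write_text(">contig1\nACGT\n")

    dst = Path(tmp) / "results" / "S1" / "assembly.fasta"
    linked_first = link_assembly(src, dst)  # creates the link
    linked_again = link_assembly(src, dst)  # replaces the existing link
    is_link = dst.is_symlink()

    real = Path(tmp) / "results" / "S2" / "assembly.fasta"
    real.parent.mkdir(parents=True)
    real.write_text(">existing\nTTTT\n")
    kept_file = link_assembly(src, real)  # regular file is not replaced
    real_is_link = real.is_symlink()
```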

* In an attempt to fix pathing, I added a root variable to the snakemake config.yaml in the launcher using create_config(...) -> Unfortunately that yaml file is corrupt
* The pathing issue before attaching root to the config file was that src/mmaseq/helper_functions:determine_sample_configs didn't get the correct config_dir location. The config_dir was hardcoded as a relative path in the Snakefile, but packaging doesn't include the folder structure when building, so snakemake would look in the current folder upon execution rather than the intended repo folder.

* Added a src/ folder to uphold standards
* Migrated helper_functions to an internal utils/ folder, potentially split into several themed files
* Renamed variables based on agreed rules: UPPERCASE -> Constants (mostly for pkg installation location), lowercase -> Dynamic (samplesheet, outdir ... etc)

* Separated commonly used functions into aptly named scripts in the utils/ folder
* Made utils a callable module by adding an __init__
* Formulated standard imports from utils and all file paths as all import object
* Renamed a few variables
* Create.py works, setup.py unfinished, mmaseq.py unstable

3 modules have now been implemented:
* mmadeploy: Used to deploy conda environments and databases. Will download and examine a test dataset (with an option to run on a minimal 1-sample dataset)
* mmacreate: Used to create a samplesheet. The input directory is screened recursively for .fasta and paired-end .fastq.gz files, which in turn are written to samplesheet.tsv in the output directory
* mmaseq: Reads the specified samplesheet, creates a pipeline config and finally executes the pipeline

utils/__init__.py functionality removed and module functionality restored to each module script file to enhance code transparency
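
As an illustration of what the mmacreate screening step does, here is a rough standard-library sketch. It is not the module's implementation: the function names, the `sample` and `Assembly` column names, and the assumed mate naming (`<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`, or `_1` / `_2`) are all hypothetical.

```python
import csv
import re
import tempfile
from pathlib import Path

# Assumed mate naming: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz (or _1/_2).
MATE_PATTERN = re.compile(r"^(?P<sample>.+?)_R?(?P<mate>[12])\.fastq\.gz$")


def collect_samples(input_dir):
    """Recursively collect assemblies and paired-end reads per sample name."""
    samples = {}
    for path in sorted(Path(input_dir).rglob("*")):
        if not path.is_file():
            continue
        match = MATE_PATTERN.match(path.name)
        if match:
            entry = samples.setdefault(match["sample"], {})
            entry[f"Illumina_mate{match['mate']}"] = str(path)
        elif path.name.endswith(".fasta"):
            name = path.name[: -len(".fasta")]
            samples.setdefault(name, {})["Assembly"] = str(path)
    return samples


def write_samplesheet(samples, output_dir):
    """Write <output_dir>/samplesheet.tsv with one row per sample."""
    columns = ["sample", "Illumina_mate1", "Illumina_mate2", "Assembly"]
    out_path = Path(output_dir) / "samplesheet.tsv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=columns, delimiter="\t", restval=""
        )
        writer.writeheader()
        for name, files in sorted(samples.items()):
            writer.writerow({"sample": name, **files})
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "input"
    (root / "run1").mkdir(parents=True)
    for name in ("run1/S1_R1.fastq.gz", "run1/S1_R2.fastq.gz", "S2.fasta"):
        (root / name).touch()
    samples = collect_samples(root)
    sheet = write_samplesheet(samples, Path(tmp) / "out")
    lines = sheet.read_text().splitlines()
```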

Kleborate requires AMRfinderplus for certain annotation tasks. Since databases and environments are deployed separately, the system path to the AMRfinder DB doesn't exist in the Kleborate environment. Solution: created a kleborate_amrfinder setup rule, which symlinks the amrfinder DB into the environment's system folder. An issue has been created requesting options to specify database paths: klebgenomics/Kleborate#111

Dynamic output file names can be a bit of a pain. For now, output is expected as a directory and versioning has been removed. This means that the final output will NOT be part of the longtable. An issue has been created internally (#120), and a request for a static output file name has been reported: klebgenomics/Kleborate#110

* Changed the deploy:'--small' argument to deploy:'--update' to highlight that all rules will be run with this option.
* Added a mmaseq:'--force' option, which forces a rerun of all necessary rules.
* Running mmadeploy --update ... will now ensure that databases are always rebuilt. To save time, the small dataset will be used.
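
A rough sketch of how the two flags might be wired up with `argparse` is shown below. The flag names `--update` and `--force` come from the commit message; the parser layout, the help texts and the mapping of `--force` onto Snakemake's `--forceall` option are assumptions, not the launcher's confirmed behaviour.

```python
import argparse


def build_deploy_parser():
    """Parser sketch for mmadeploy (layout assumed)."""
    parser = argparse.ArgumentParser(prog="mmadeploy")
    parser.add_argument(
        "--update",
        action="store_true",
        help="rebuild all databases, using the small test dataset",
    )
    return parser


def build_mmaseq_parser():
    """Parser sketch for mmaseq (layout assumed)."""
    parser = argparse.ArgumentParser(prog="mmaseq")
    parser.add_argument(
        "--force",
        action="store_true",
        help="force a rerun of all necessary rules",
    )
    return parser


def snakemake_flags(force):
    """Assumed mapping: --force is passed on as Snakemake's --forceall."""
    return ["--forceall"] if force else []


deploy_args = build_deploy_parser().parse_args(["--update"])
run_args = build_mmaseq_parser().parse_args([])
forced_flags = snakemake_flags(True)
default_flags = snakemake_flags(run_args.force)
```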

KasperThystrup added 12 commits January 9, 2026 08:29

Removed unused scripts

0364e74

Basic skeleton for building the package

9f9865a

Added policy for determining versions

f84df2e

Pipe named, variables updated, and launcher pseudocoded

2ac0fc7

Partially completed launcher

adfabc6

Launcher has been added. Config files are now automatically created and executed FROM the launcher. All downstream files have been polished a bit to support the new launcher. Symbolic links of assembly files have NOT YET been added!

Resolved merge conflicts

cf4c967

Bugfixing logdir and renaming :-(

0641772

Reduced amount of assemblies in testing to only C. diff sample

321c78f

KasperThystrup mentioned this pull request Feb 12, 2026

Wrangler #107

Closed

KasperThystrup and others added 17 commits February 12, 2026 19:58

Merge branch 'pkging' into wrangler

c6b7b1d

Launcher merged into pkging

3bdac87

Launcher into pkging

"-z" missing in the curl command for fetching the database

d2a7cc8

Add setuptools to SISTR environment dependencies

e6d00b5

sistr package doesn't automatically build all deps

Commenced renaming

4166698

Environment polishing: Setuptools deprecation fix and meningotype biopython added

a402235

Renaming functions

4c26ea4

Partial packaging, unstable

6651d82

Fixed paths to run the wrapper test case

0e212a9

Refactor according to PEP8 and reimplementation of create_symlink function to be more efficient

fa91628

Implemented "normalize_samplesheet_paths" to account for relative paths in a samplesheet

3f88e08

It converts all relative paths found in the samplesheet to absolute paths and then feeds the samplesheet to Snakemake. Implemented to guarantee that Snakemake always receives valid paths.
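
The path normalization described here can be sketched as below. Only the function name `normalize_samplesheet_paths` appears in the commit; the signature, the row representation (a list of dicts), the `Assembly` column name and the choice of base directory are assumptions made for illustration.

```python
from pathlib import Path

# "Assembly" is an assumed column name; the mate columns are from the PR.
PATH_COLUMNS = ("Illumina_mate1", "Illumina_mate2", "Assembly")


def normalize_samplesheet_paths(rows, base_dir, path_columns=PATH_COLUMNS):
    """Return copies of rows with relative paths made absolute.

    Relative paths are interpreted relative to base_dir (assumed here to be
    the directory containing the samplesheet). Empty cells and paths that
    are already absolute are left unchanged.
    """
    base_dir = Path(base_dir).resolve()
    normalized = []
    for row in rows:
        fixed = dict(row)
        for column in path_columns:
            value = fixed.get(column, "")
            if value and not Path(value).is_absolute():
                fixed[column] = str((base_dir / value).resolve())
        normalized.append(fixed)
    return normalized


rows = [
    {
        "sample": "S1",
        "Illumina_mate1": "reads/S1_R1.fastq.gz",
        "Illumina_mate2": "/abs/reads/S1_R2.fastq.gz",
        "Assembly": "",
    }
]
result = normalize_samplesheet_paths(rows, "/project/data")
```

Returning new rows instead of editing in place keeps the user's samplesheet untouched while Snakemake only ever sees absolute paths.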

Fixed rules not running .py scripts from workflow/scripts/ due to relative paths

b0c360f

The path is now inferred from the workflow.basedir(), converted to an absolute path and passed on to all the rules, so as to avoid mistakes due to relative paths. Fixed missing assembly from data/

Refactor:

19a887d

- All fixed variables in the Snakefile are now uppercase for consistency and clearer distinction from wildcards
- Changed such variables in all rules
- Improved readability of the mmaseq launcher

fixed small bug in the path if the --test was selected

dd8cdc9

Restructure of the packaging to make it functional for both pip commands

5ed090d

Restructuring folder

e8866a2

KasperThystrup added 23 commits March 9, 2026 14:38

Documentation (#118)

da980f9

* Added documentation using materials for mkdocs
* Deployed github actions
* Renamed home to index to fix github actions
* Connected github actions in ci.yaml
* Added a logo

---------

Authored-by: SimoneScrima <simonescrima@gmail.com>

Removed unused imports

df9b303

Restored mmaseq.py

f5f65eb

After module creation, mmaseq.py was a bit messy and unstable. Now fixed

Stability and deploy-params changes

15a9ee6

* Added some stability fixes when running from deploy
* deploy no longer uses outdir; instead it writes to MMAseqTest in cur dir
* if deploy --small is selected, assemblies are NOT ignored, for the sake of including at least one assembly pipeline step

Merge pull request #121 from ssi-dk/pkg_org

7f2fe8b

Reorganization and Launch modularization

Deploy creates output into deploydir rather than cwd

db15214

Merge pull request #122 from ssi-dk/pkg_org

6550ca8

Deploy creates output into deploydir rather than cwd

Typos and print statements

ec5877e

Improved logging in launcher scripts

4813265

* Added a TRACE level below DEBUG, to showcase minute details
* Provided more details on the different modules in the description texts
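
For reference, a custom level below DEBUG can be registered with Python's standard `logging` module roughly as follows. The numeric value 5, the logger name and the helper function are assumptions; the commit only states that a TRACE level below DEBUG was added.

```python
import io
import logging

TRACE = 5  # assumed value; anything below logging.DEBUG (10) would do
logging.addLevelName(TRACE, "TRACE")


def trace(logger, message, *args):
    """Emit a message at the custom TRACE level."""
    logger.log(TRACE, message, *args)


# Capture output in memory so the behaviour is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("mmaseq.sketch")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False

logger.setLevel(TRACE)
trace(logger, "resolved %d samples", 3)  # shown: logger is at TRACE

logger.setLevel(logging.DEBUG)
trace(logger, "hidden at DEBUG")  # suppressed: TRACE is below DEBUG

output = stream.getvalue()
```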

Adjustments to log

096b89b

Premature commit previously - More fixes and log adjustments

a6c9774

* Improved parser functions and text

5b80796

* Streamlined logging functionality
* Minor code fixes

Improved logging in launcher scripts

a537128

* Added a TRACE level below DEBUG, to showcase minute details
* Provided more details on the different modules in the description texts

Adding host identification and enhanced integrity checkup on testset download

39a1286

* Connections are established per host, based on ftp paths in reads.url
* Missing test sample files are downloaded as chunks
* Old chunks are removed if detected, as they indicate download errors
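
Grouping download paths per host, so that one connection per host suffices, might look like the sketch below. The function name and the example URLs are hypothetical, and how reads.url is actually parsed is not shown in the PR.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_paths_by_host(urls):
    """Map each FTP host to the list of remote paths requested from it."""
    grouped = defaultdict(list)
    for url in urls:
        parsed = urlparse(url)
        grouped[parsed.hostname].append(parsed.path)
    return dict(grouped)


# Placeholder URLs; the real test dataset locations are not assumed here.
urls = [
    "ftp://ftp.example.org/vol1/S1_1.fastq.gz",
    "ftp://ftp.example.org/vol1/S1_2.fastq.gz",
    "ftp://mirror.example.net/reads/S2_1.fastq.gz",
]
by_host = group_paths_by_host(urls)
```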

Merged local branch 'deploy_from_hosts' into 'pkging'

218f313

Resolved merge conflict error

b612529

Fixed hosts issues

966acf5

* Retries occur per path, not per host
* Downloading occurs outside try: connect
* ftp disconnect migrated to an individual function
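
The per-path retry policy can be illustrated with a small sketch in which the actual transfer is injected as a callable. The names `download_with_retries` and `fake_fetch`, the limit of three attempts and the use of `OSError` as the failure signal are assumptions; deploy.py may differ on all of these.

```python
def download_with_retries(paths, fetch, max_attempts=3):
    """Try each path up to max_attempts times; return (done, failed)."""
    done, failed = [], []
    for path in paths:
        for attempt in range(1, max_attempts + 1):
            try:
                fetch(path)
            except OSError:
                continue  # retry this path only; other paths are unaffected
            done.append(path)
            break
        else:
            failed.append(path)  # every attempt for this path failed
    return done, failed


# Fake fetcher for demonstration: "flaky" fails once, "broken" always fails.
calls = {}


def fake_fetch(path):
    calls[path] = calls.get(path, 0) + 1
    if path == "broken" or (path == "flaky" and calls[path] == 1):
        raise OSError(f"transfer failed for {path}")


done, failed = download_with_retries(["ok", "flaky", "broken"], fake_fetch)
```

Counting attempts per path means one persistently failing file does not use up the retry budget of the other files on the same host.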

Bugfix: deploy.py NameError -> host not defined

00b634d

KasperThystrup wants to merge 56 commits into dev from pkging
