
Basic skeleton for building the package #110

Open
KasperThystrup wants to merge 56 commits into dev from pkging


@KasperThystrup
Collaborator

@KasperThystrup KasperThystrup commented Jan 15, 2026

Providing required files for building as package.

Current commands (modules) MUST work:
mmadeploy (If this works mmaseq also works)
mmacreate

A lot of restructuring and cleanup has been made, to facilitate a meaningful wrapper. This commit includes:
* A conceptual wrapper script
* Removal of unused variables
* A deployment_dir, which is used to store databases and conda environments as created by snakemake.
* Reorganisation of files.
  * Metadata renamed to target_screening for now
  * Analysis relevant files are moved into config/analysis (e.g. species_configs)
  * reads/ and assemblies/ folder now located in data folder
  * samplesheet now located in data folder
  * examples/ folder removed
  * Added Test and Deploy to .gitignore, I suggest to use Test/Results for output_folder, and Deploy for deployment_dir
* MLST-db-update rule removed completely, as MLST has removed the update script
* Illumina and Assembly paths removed from config and snakefile, now it must be specified directly in the samplesheet
* Samplesheet changes:
  * Added Illumina_mate1 and _mate2
  * Removed Nanopore path
  * Renamed Assembly path variable
* Removed sample_to_illumina, _to_nanopore, & _assembly vars and replaced with samplesheet
* Samples can be accessed as samplesheet.index
* Updated rules to access samplesheet directory
Replaced lambda wildcards with lambda wc, to avoid confusion with snakemake.wildcards object
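
A minimal sketch of the samplesheet lookup described above, written as plain Python rather than as a Snakemake rule. Only the `Illumina_mate1`/`Illumina_mate2` columns and the `lambda wc` naming come from this PR; the `sample` column name, the example file paths, and the helper `load_samplesheet` are assumptions made for illustration.

```python
import csv
import io
from types import SimpleNamespace

# Hypothetical samplesheet content; the "sample" column name is an assumption.
SAMPLESHEET_TSV = (
    "sample\tIllumina_mate1\tIllumina_mate2\n"
    "S1\tdata/reads/S1_R1.fastq.gz\tdata/reads/S1_R2.fastq.gz\n"
    "S2\tdata/reads/S2_R1.fastq.gz\tdata/reads/S2_R2.fastq.gz\n"
)


def load_samplesheet(handle):
    """Return a dict keyed by sample name, with one row (dict) per sample."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sample"]: row for row in reader}


samplesheet = load_samplesheet(io.StringIO(SAMPLESHEET_TSV))

# Input functions name their argument `wc`, so it cannot be mistaken for the
# snakemake.wildcards object that is available inside scripts.
get_mate1 = lambda wc: samplesheet[wc.sample]["Illumina_mate1"]
get_mate2 = lambda wc: samplesheet[wc.sample]["Illumina_mate2"]

# Stand-in for the wildcards object Snakemake passes to an input function.
wc = SimpleNamespace(sample="S1")
print(get_mate1(wc), get_mate2(wc))
```

In a rule, such a function would be given as `input: get_mate1`, and Snakemake would call it with the rule's wildcards.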
Breaks during Obtaining file:///home/cucumbergebt/repos/push/ssi_analysis_utility
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable: started
  Checking if build backend supports build_editable: finished with status 'done'
  Getting requirements to build editable: started
  Getting requirements to build editable: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing editable metadata (pyproject.toml): started
  Preparing editable metadata (pyproject.toml): finished with status 'done'
INFO: pip is looking at multiple versions of microbemapper to determine which version is compatible with other requirements. This could take a while.
Launcher has been added. Config files are now automatically created and executed FROM the launcher. All downstream files have been polished a bit to support the new launcher. Symbolic links of assembly files have NOT YET been added!
* If an assembly specified in the samplesheet exists, it will be symlinked (unless it already exists in results as a file and not a link)
* Tool versioning removed from assemblies, as this would disrupt the circumvention of assembly linking - Think about alternatives, see #112
* Migrated contents of config/analysis into config/ for simplicity
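
The linking rule above could be sketched roughly as follows. This is not the code from this PR: the function name `link_assembly` and the example paths are hypothetical, and the sketch assumes that an existing symlink may be refreshed while a regular file in results is never touched.

```python
import os
import tempfile
from pathlib import Path


def link_assembly(source, target):
    """Symlink `source` to `target` unless `target` is already a regular file.

    Returns True if a link was created or refreshed, False otherwise.
    """
    source, target = Path(source), Path(target)
    if not source.exists():
        return False  # assembly listed in the samplesheet is missing
    if target.exists() and not target.is_symlink():
        return False  # a real file in results is kept, never overwritten
    if target.is_symlink():
        target.unlink()  # replace an existing (possibly stale) link
    target.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(source.resolve(), target)
    return True


# Small demonstration in a throwaway directory (paths are made up).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data" / "assemblies" / "S1.fasta"
    src.parent.mkdir(parents=True)
    src.write_text(">contig1\nACGT\n")
    dest = Path(tmp) / "results" / "S1" / "assembly.fasta"
    print(link_assembly(src, dest), dest.is_symlink())
```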
@KasperThystrup KasperThystrup mentioned this pull request Feb 12, 2026
KasperThystrup and others added 17 commits February 12, 2026 19:58
Launcher into pkging
sistr package doesn't automatically build all deps
* In an attempt to fix pathing, I added a root variable to the snakemake config.yaml in the launcher using create_config(...) -> Unfortunately that yaml file is corrupt
* The pathing issue, before attaching root to the config file, was that src/mmaseq/helper_functions:determine_sample_configs didn't get the correct config_dir location. The config_dir was hardcoded as a relative path in the Snakefile, but the packaging doesn't include the folder structure when building, so snakemake would look in the current folder upon execution rather than in the intended repo folder.
…hs in a samplesheet.

It converts all relative paths found in the samplesheet to absolute paths and then feeds them to Snakemake.
Implemented to guarantee that Snakemake always receives valid paths.
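
As an illustration of that conversion, here is a minimal sketch under stated assumptions: the function name `absolutize_row` is hypothetical, the `Assembly` column name is assumed (only `Illumina_mate1`/`_mate2` are named in this PR), and relative paths are resolved against a caller-supplied base directory.

```python
from pathlib import Path

# Columns expected to hold file paths; "Assembly" is an assumed column name.
PATH_COLUMNS = ("Illumina_mate1", "Illumina_mate2", "Assembly")


def absolutize_row(row, base_dir):
    """Return a copy of `row` in which relative paths are made absolute.

    Empty cells and already-absolute paths are left unchanged.
    """
    base_dir = Path(base_dir)
    fixed = dict(row)  # do not mutate the caller's row
    for column in PATH_COLUMNS:
        value = fixed.get(column)
        if not value:
            continue
        path = Path(value)
        if not path.is_absolute():
            fixed[column] = str((base_dir / path).resolve())
    return fixed


row = {
    "sample": "S1",
    "Illumina_mate1": "reads/S1_R1.fastq.gz",
    "Illumina_mate2": "/abs/S1_R2.fastq.gz",
    "Assembly": "",
}
print(absolutize_row(row, Path.cwd()))
```

Which base directory is correct (the current working directory or the samplesheet's own folder) is a design decision the sketch leaves to the caller.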
…ative paths, now it is inferred

from the workflow.basedir(), converted to an absolute path, and passed on to all the rules,
so as to avoid mistakes due to relative paths.

Fixed missing assembly from data/
- All fixed variables in the Snakefile are now uppercase for consistency and clearer distinction from wildcards
- Changed such variables in all rules
- Improved readability of the mmaseq launcher
* Added documentation using materials for mkdocs

* Deployed github actions

* Renamed home to index to fix github actions

* Connected github actions in ci.yaml

* Added a logo

---------

Authored-by: SimoneScrima <simonescrima@gmail.com>
* Added a src/ folder to uphold standards
* Migrated helper_functions to an internal utils/ folder, potentially split into several themed files
* Renamed variables based on agreed rules: UPPERCASE -> Constants (mostly for pkg installation location), lowercase -> Dynamic (samplesheet, outdir ... etc)
* Separated commonly used functions into aptly named scripts in the utils/ folder
* Made utils a callable module by adding an __init__
* Formulated standard imports from utils and all file paths as all import object
* Renamed a few variables
* Create.py works, setup.py unfinished, mmaseq.py unstable
After module creation mmaseq.py was a bit messy and unstable. Now fixed
3 modules have now been implemented:
* mmadeploy: Used to deploy conda environments and databases. Will download and examine a test dataset (with an option to run on a minimal single-sample dataset)
* mmacreate: Used to create a samplesheet. The input directory is screened recursively for .fasta and paired-end .fastq.gz files, which are then written to <output directory>/samplesheet.tsv
* mmaseq: Reads the specified samplesheet, creates a pipeline config, and finally executes the pipeline
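
The recursive screening done by mmacreate might look roughly like the sketch below. It is an illustration, not the module's code: the mate naming pattern (`_R1`/`_R2` or `_1`/`_2`), the `sample` and `Assembly` column names, and both function names are assumptions, and the sketch does not check that both mates of a pair are present.

```python
import csv
import re
import tempfile
from pathlib import Path

# Assumed mate naming: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz (or _1 / _2).
MATE_PATTERN = re.compile(r"^(?P<sample>.+?)_R?(?P<mate>[12])\.fastq\.gz$")


def scan_inputs(input_dir):
    """Recursively collect assemblies and paired-end reads per sample."""
    samples = {}
    for path in sorted(Path(input_dir).rglob("*")):
        if not path.is_file():
            continue
        match = MATE_PATTERN.match(path.name)
        if match:
            entry = samples.setdefault(match["sample"], {})
            entry[f"Illumina_mate{match['mate']}"] = str(path)
        elif path.name.endswith(".fasta"):
            name = path.name[: -len(".fasta")]
            samples.setdefault(name, {})["Assembly"] = str(path)
    return samples


def write_samplesheet(samples, output_dir):
    """Write <output_dir>/samplesheet.tsv with one row per sample."""
    columns = ["sample", "Illumina_mate1", "Illumina_mate2", "Assembly"]
    out_path = Path(output_dir) / "samplesheet.tsv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=columns, delimiter="\t", restval=""
        )
        writer.writeheader()
        for name in sorted(samples):
            writer.writerow({"sample": name, **samples[name]})
    return out_path


# Demonstration on empty placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "input"
    (root / "run1").mkdir(parents=True)
    for file_name in ("S1_R1.fastq.gz", "S1_R2.fastq.gz", "S2.fasta"):
        (root / "run1" / file_name).write_bytes(b"")
    sheet = write_samplesheet(scan_inputs(root), Path(tmp) / "out")
    print(sheet.read_text())
```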
utils/__init__.py functionality removed and module functionality restored to each module script file to enhance code transparency
Kleborate requires AMRfinderplus for certain annotation tasks. Since databases and environments are deployed separately, the system path to the AMRfinder DB doesn't exist in Kleborate.
Solution: Created kleborate_amrfinder setup rule, which symlinks amrfinder DB into environment system folder.
An issue has been created, requesting options to specify database paths: klebgenomics/Kleborate#111
Dynamic output file names can be a bit of a pain. For now, output is expected as a directory and versioning has been removed. This means that the final output will NOT be part of the longtable.
Issues have been created internally: #120 and a request for a static output file name has been reported klebgenomics/Kleborate#110
* Added some stability fixes when running from deploy
* deploy no longer uses outdir, instead it writes to MMAseqTest in cur dir
* if deploy --small is selected, assemblies are NOT ignored, for the sake of including at least one assembly pipeline step
Reorganization and Launch modularization
Deploy creates output into deploydir rather than cwd
* Added a TRACE level below DEBUG, to showcase minute details
* Provided more details on the different modules in the description texts
* Streamlined logging functionality
* Minor code fixes
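
For reference, a custom level below DEBUG can be registered with the standard `logging` module as sketched here. The numeric value 5, the `trace` method, and the logger name are assumptions for illustration and not necessarily what this PR implements.

```python
import logging

# DEBUG is 10, so any value from 1 to 9 sorts below it; 5 is an assumed choice.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


def trace(self, message, *args, **kwargs):
    """Log `message` at TRACE level if the logger is enabled for it."""
    if self.isEnabledFor(TRACE):
        self._log(TRACE, message, args, **kwargs)


# Attach the helper so every logger gains a .trace() method.
logging.Logger.trace = trace

logger = logging.getLogger("mmaseq_example")  # hypothetical logger name
logger.setLevel(TRACE)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

logger.trace("resolved samplesheet path")  # shown only at TRACE verbosity
logger.debug("loaded %d samples", 2)
```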
…download

* Connections are established per host, based on the ftp paths in reads.url
* Missing test sample files are downloaded as chunks
* Old chunks are removed if detected, as they indicate download errors
* Retries occur per path, not per host
* Downloading occurs outside try: connect
* ftp disconnect moved into its own function
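
The download behaviour listed above could be sketched as follows. All function names, the `.part` suffix for chunk files, the retry count, and the example host are assumptions; the demonstration uses a fake fetch function instead of a real FTP server, and `fetch_to_file` expects an object with the `retrbinary` method of `ftplib.FTP`.

```python
from collections import defaultdict
from pathlib import Path
from urllib.parse import urlparse


def group_by_host(urls):
    """Group remote paths by FTP host so one connection serves each host."""
    grouped = defaultdict(list)
    for url in urls:
        parsed = urlparse(url)
        grouped[parsed.hostname].append(parsed.path)
    return dict(grouped)


def fetch_to_file(ftp, remote_path, local_path, blocksize=1 << 20):
    """Download into a chunk file first, then move it into place."""
    local_path = Path(local_path)
    chunk_path = local_path.with_name(local_path.name + ".part")
    if chunk_path.exists():
        chunk_path.unlink()  # a leftover chunk indicates a failed download
    with chunk_path.open("wb") as handle:
        ftp.retrbinary(f"RETR {remote_path}", handle.write, blocksize=blocksize)
    chunk_path.replace(local_path)


def download_with_retries(paths, fetch, max_retries=3):
    """Retry each path on its own; return the paths that never succeeded."""
    failed = []
    for path in paths:
        for _attempt in range(max_retries):
            try:
                fetch(path)
                break
            except OSError:
                continue  # retry this path only, not the whole host
        else:
            failed.append(path)
    return failed


# Demonstration with a fake fetch that fails once per path.
urls = [
    "ftp://ftp.example.org/reads/S1_R1.fastq.gz",
    "ftp://ftp.example.org/reads/S1_R2.fastq.gz",
]
attempts = defaultdict(int)


def flaky_fetch(path):
    attempts[path] += 1
    if attempts[path] < 2:
        raise OSError("simulated connection drop")


for host, paths in group_by_host(urls).items():
    print(host, download_with_retries(paths, flaky_fetch))
```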
* Changed the deploy:'--small' argument to deploy:'--update' to highlight that all rules will be run with this option.
* Added a mmaseq:'--force' option which forces a rerun of all necessary rules.
* Running mmadeploy --update ... will now ensure that databases are always rebuilt. To save time, the small dataset will be used.