autoQuaC Watcher monitors one or more input folders for new mass-spectrometry files (e.g., *.raw), waits until candidates are stable (unchanged size across consecutive scans), copies them into a temporary working directory, generates a mcquac.json params file from a template, and launches Nextflow using the Docker profile. On completion, it delivers results to the configured output folder and prevents reprocessing.
In short: drop → process → deliver → don’t reprocess.
- Features
- Requirements
- Quick start
- Configuration
- How it works
- Run it
- Run as a systemd service
- Troubleshooting
- Project layout
- License
- Watcher for any number of input folders (supports glob patterns like `*std.raw`).
- Robust candidate check: only files that remain unchanged (same size in ≥2 consecutive scans) are picked up.
- Automatic job creation:
  - copies candidates to `tmp/<hash>/input/`
  - generates `mcquac.json` from a template
  - injects FASTA & spike-in paths and replaces placeholders
- Nextflow runner (Docker profile) with status markers (`.ready`, `.working`, `.finish`, `.error`) and Nextflow logs.
- Post-processing & delivery:
  - on success: selects the best `*.hdf5` from `tmp/<hash>/output/**` and writes `<output>/<SRC_STEM>.hdf5`
  - on failure: if no `.hdf5` exists, writes `<output>/<SRC_STEM>.error.log` (copied from `.nextflow.log`)
  - updates `ignore.txt` to prevent reprocessing
- Optional SMB mounts with version fallback (tries SMB 3.1.1 → 3.0 → 2.1).
- Per-`io_pair` McQuaC template: each `io_pairs[*]` entry can optionally define `mcquac_template` (default: `mcquac.json`).
- Robust SMB handling: mount checks and automatic remount/retry attempts are performed before scanning inputs and before delivering results to the output share.
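The robust candidate check can be illustrated with a small Python sketch. It assumes the watcher keeps the sizes from the previous scan in memory and that ignore entries are plain file names; `StabilityTracker` is a hypothetical name, not necessarily how `src/search.py` implements it.

```python
from pathlib import Path


class StabilityTracker:
    """Hypothetical helper: report files whose size did not change between two scans."""

    def __init__(self) -> None:
        # Size of every matching file seen in the previous scan.
        self._last_sizes: dict[Path, int] = {}

    def scan(self, folder: Path, pattern: str, ignored: set[str]) -> list[Path]:
        """Run one scan and return the files that currently count as stable candidates."""
        current: dict[Path, int] = {}
        for path in folder.glob(pattern):
            # Skip directories and anything listed in an ignore file (matched by name).
            if path.is_file() and path.name not in ignored:
                current[path] = path.stat().st_size
        # Stable = present in the previous scan with exactly the same size.
        stable = [p for p, size in current.items() if self._last_sizes.get(p) == size]
        self._last_sizes = current
        return sorted(stable)
```

Calling `scan()` once per interval returns a file only from the second scan that sees it at an unchanged size, so a file the instrument is still writing is skipped until its size settles. The caller is expected to add picked-up files to an ignore list so they are not returned again.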
- Debian/Ubuntu-like system (WSL works as well).
- Docker Engine + Docker Compose plugin.
- Java ≥ 11 (required by Nextflow).
- Python 3.10+ (uses modern type annotations like `X | Y`).
- For SMB/network shares: `cifs-utils` (for `mount.cifs`).
The included `setup.sh` can help install prerequisites, download Nextflow into the project, clone the McQuaC repository, and generate/update `config/app.json`.
- Prepare the repo (if not done yet):

      chmod +x ./setup.sh
      ./setup.sh

  The script can install Docker/Java (if missing), download Nextflow locally, clone McQuaC, and create/update `config/app.json` with proper paths.
- Adjust the configuration:
  - Review `config/app.json` (see below), especially `mcquac_path`, `nextflow_bin`, `io_pairs`, and the optional `mounts`.
  - Place your FASTA file under `config/fasta/` (top level, e.g., `human.fasta`).
  - Place your spike-in table under `config/spike/` (top level, `*.csv`).
  - (Optional) Add additional templates under `config/` (e.g., `mcquac_dda.json`) and select them per `io_pair`.
- Start the watcher:

      python3 main.py    # or, if main.py is executable: ./main.py
- Check the results:
  - Logs: `tmp/<hash>/logs/nextflow-YYYYMMDD-HHMMSS.log` (and `.nextflow.log` in `tmp/<hash>/`)
  - On success: `<output>/<SRC_STEM>.hdf5`
  - On error (no `.hdf5` produced): `<output>/<SRC_STEM>.error.log`
Minimal `config/app.json` example:
{
"interval_minutes": 6,
"default_pattern": "*std.raw",
"mcquac_path": "/path/to/McQuaC/main.nf",
"nextflow_bin": "/path/to/nextflow",
"mounts": [],
"continue_on_mount_error": false,
"unmount_on_exit": true,
"io_pairs": [
{
"input": "/data/in",
"output": "/data/out",
"pattern": "*std.raw",
"mcquac_template": "mcquac_dda.json"
}
]
}

Fields:
- `interval_minutes`: watcher polling interval in minutes (converted to seconds internally).
- `default_pattern`: glob used when an `io_pair` doesn't specify its own `pattern`.
- `mcquac_path`: path to the McQuaC `main.nf`.
- `nextflow_bin` (optional): path to the Nextflow binary. Resolution order: `$NEXTFLOW_BIN` → `app.json:nextflow_bin` → local `./nextflow` → `PATH`.
- `mounts` (optional): SMB share definitions (see below).
- `continue_on_mount_error` (optional, bool): continue even if a mount fails.
- `unmount_on_exit` (optional, bool): unmount shares when shutting down.
- `io_pairs`: list of objects `{ input, output, pattern?, mcquac_template? }`:
  - `input` (string): source folder to scan for new files
  - `output` (string): target folder where final results are written
  - `pattern` (string, optional): glob pattern to filter input files
  - `mcquac_template` (string, optional): filename of a McQuaC template located in `./config/` (default: `mcquac.json`)
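As a rough illustration of how these fields fit together, the sketch below loads `app.json`, converts `interval_minutes` to seconds, fills in `default_pattern` and the `mcquac.json` template default per `io_pair`, and rejects an empty `io_pairs` list. `load_app_config` is hypothetical; the real `src/load_config.py` may validate more or differently, and the defaults used for `continue_on_mount_error` and `unmount_on_exit` are assumptions.

```python
import json
from pathlib import Path
from typing import Any


def load_app_config(path: Path) -> dict[str, Any]:
    """Hypothetical loader: read config/app.json and apply the documented defaults."""
    cfg = json.loads(path.read_text(encoding="utf-8"))
    pairs = cfg.get("io_pairs")
    if not isinstance(pairs, list) or not pairs:
        raise ValueError("io_pairs must be a non-empty list")

    normalized = []
    for pair in pairs:
        if not isinstance(pair.get("input"), str) or not isinstance(pair.get("output"), str):
            raise ValueError(f"io_pair needs string 'input' and 'output': {pair!r}")
        # A pair's own pattern wins; otherwise fall back to default_pattern.
        pattern = pair.get("pattern") or cfg.get("default_pattern")
        if not pattern:
            raise ValueError(f"no pattern and no default_pattern for {pair['input']}")
        normalized.append({
            "input": pair["input"],
            "output": pair["output"],
            "pattern": pattern,
            # Template filename looked up in ./config/, defaulting to mcquac.json.
            "mcquac_template": pair.get("mcquac_template") or "mcquac.json",
        })

    return {
        "interval_seconds": float(cfg["interval_minutes"]) * 60,  # minutes -> seconds
        "mcquac_path": cfg["mcquac_path"],
        "nextflow_bin": cfg.get("nextflow_bin"),  # None: resolve via the other sources
        "mounts": cfg.get("mounts", []),
        # Assumed defaults for the optional flags (not documented values).
        "continue_on_mount_error": bool(cfg.get("continue_on_mount_error", False)),
        "unmount_on_exit": bool(cfg.get("unmount_on_exit", False)),
        "io_pairs": normalized,
    }
```

With the minimal example above, `interval_seconds` comes out as 360.0 and the single pair keeps its own `pattern` and `mcquac_dda.json` template.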
Example `mounts` entry with domain and extra options:
{
"mounts": [
{
"name": "archive",
"host": "192.168.10.20",
"share": "Archiv",
"mountpoint": "/mnt/archive",
"username": "user123",
"password": "SECRET",
"domain": "ACME",
"vers": null,
"file_mode": "0664",
"dir_mode": "0775",
"extra_opts": ["noserverino"]
}
]
}

Notes:
- Mounting requires root privileges (run the watcher with `sudo`) and `cifs-utils`.
- The mounter checks mountpoint health and tries SMB versions 3.1.1 → 3.0 → 2.1.
- Security: passwords are stored in clear text in `config/app.json`. Restrict access to the `config/` folder accordingly.
Each pair defines a watched folder (`input`) and a final target (`output`).

- The target folder maintains an `ignore.txt` file. After a successful run, the source base filename is appended to it to prevent re-processing.
  - Remove a line from `ignore.txt` to force a re-run for that file.
- Each watcher thread also keeps a thread-local ignore file under `tmp/` to avoid duplicates while the process is running.
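A minimal sketch of the `ignore.txt` bookkeeping, assuming one base filename per line (file name including extension) matched exactly; `is_ignored` and `mark_processed` are hypothetical names.

```python
from pathlib import Path


def _entries(ignore_file: Path) -> set[str]:
    """Read the non-empty, stripped lines of an ignore file (missing file = no entries)."""
    if not ignore_file.is_file():
        return set()
    lines = ignore_file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def is_ignored(ignore_file: Path, name: str) -> bool:
    return name in _entries(ignore_file)


def mark_processed(ignore_file: Path, name: str) -> None:
    """Append `name` once; add a newline first if the file does not end with one."""
    if is_ignored(ignore_file, name):
        return
    existing = ignore_file.read_text(encoding="utf-8") if ignore_file.is_file() else ""
    prefix = "\n" if existing and not existing.endswith("\n") else ""
    with ignore_file.open("a", encoding="utf-8") as fh:
        fh.write(f"{prefix}{name}\n")
```

Because entries are matched line by line, deleting a line is enough to make that file eligible again as far as `ignore.txt` is concerned.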
Each `io_pair` can optionally specify its own McQuaC configuration template via `mcquac_template`, for example:

    "mcquac_template": "mcquac_dda.json"

Rules:
- The template file must exist in `./config/` (example: `./config/mcquac_dda.json`).
- If `mcquac_template` is not set, `./config/mcquac.json` is used by default.

Templates are plain JSON files and may contain placeholders (see below).
Put the FASTA and spike-in files at the top level of these folders:
config/
├── fasta/ # *.fasta (the "newest" file is used automatically)
└── spike/ # *.csv (same rule)
During job creation:
- the newest `*.fasta` is injected as `main_fasta_file`
- the newest `*.csv` is injected as `main_spike_file`
Supported placeholders inside the template (strings are replaced automatically):
- `%%%INPUT%%%` / `%%%OUTPUT%%%` (template → job params)
- `%%%FASTA%%%` (and legacy variants) → resolved FASTA path
- `%%%SPIKE%%%` (and legacy variants) → resolved spike-in CSV path
- Watch: for each `io_pair`, a thread scans its input folder every `interval_minutes`. A file becomes a candidate only if it appears with the same size in at least two consecutive scans and is not matched by either ignore file (the thread-local file under `tmp/...` or `<output>/ignore.txt`).
- Copy & job: each candidate is copied to `tmp/<hash>/input/`. For each hash, the system creates:
  - `mcquac.json`, generated from the template in `./config/` named by `io_pair.mcquac_template` (default: `mcquac.json`); FASTA/spike-in paths are injected and placeholders are replaced
  - `info.json` (metadata: paths, source, watch context)
  - `.ready` (signal for the runner)
- Run: the Nextflow runner consumes jobs from `tmp/*/.ready`, resolves the Nextflow binary (order: `$NEXTFLOW_BIN` → `app.json:nextflow_bin` → local `./nextflow` → `PATH`), and starts:

      nextflow run -profile docker <mcquac main.nf> -params-file mcquac.json

  Logs are written to `tmp/<hash>/logs/`. The status marker moves from `.ready` → `.working` → `.finish` (or `.error`).
- Post-processing & delivery:
  - Success (`rc == 0`):
    - Searches `tmp/<hash>/output/**` recursively for `*.hdf5` and selects the best one (newest, then largest).
    - Copies it directly to the final output folder as `<output>/<SRC_STEM>.hdf5`.
    - Updates `ignore.txt`.
    - Empties `tmp/<hash>/{input,output,work}` and marks the job as delivered (`.delivered`).
  - Failure (`rc != 0`):
    - If no `.hdf5` was produced, copies `tmp/<hash>/.nextflow.log` to `<output>/<SRC_STEM>.error.log`.
    - Marks the job as delivered (`.delivered`) once the log copy succeeds.

  If the output share is temporarily unavailable (e.g., SMB disconnect), mount checks and remount attempts are performed and delivery is retried until it succeeds.
Important: `main.py` currently clears `./tmp` on startup. To keep "pending delivery" jobs across restarts (e.g., after a reboot), comment out the `nuke_tmp()` call in `main.py`.
- Direct:

      python3 main.py

- With a virtual env (optional):

      python3 -m venv .venv
      source .venv/bin/activate
      python3 main.py

- Test Nextflow (optional):

      ./nextflow run hello
      ./nextflow run hello -with-docker
You can run MCQuaC Watcher as a long-running background service using systemd.
Important:
- If you do not use SMB mounts (`"mounts": []` in `config/app.json`), you can run it as a user service.
- If you use SMB mounts, MCQuaC Watcher needs to call `mount.cifs` and therefore must run as root as a system service.
In the examples below, replace /path/to/mcquac-watcher with the actual project directory and adapt the Python/venv path as needed.
User service (no SMB mounts): use this if `mounts` in `config/app.json` is empty and the watcher itself does not mount any network shares.
- Create a user unit:

      mkdir -p ~/.config/systemd/user
      nano ~/.config/systemd/user/mcquac-watcher.service
- Paste the following unit file:

      [Unit]
      Description=MCQuaC Watcher (user service)
      After=network-online.target docker.service
      Wants=network-online.target

      [Service]
      Type=simple
      WorkingDirectory=/path/to/mcquac-watcher
      ExecStart=/path/to/mcquac-watcher/.venv/bin/python -u main.py
      Restart=always
      RestartSec=10
      StandardOutput=journal
      StandardError=journal

      [Install]
      WantedBy=default.target
- Reload and start:

      systemctl --user daemon-reload
      systemctl --user start mcquac-watcher.service
      systemctl --user enable mcquac-watcher.service

- View logs:

      journalctl --user-unit mcquac-watcher.service -f
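On WSL, systemd must be enabled and the distro restarted so that `systemctl --user` works. To keep the user service running after you log out, enable lingering for your account: `loginctl enable-linger $USER`.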
System service (root, with SMB mounts): use this if you configured any `mounts` in `config/app.json`. Mounting CIFS shares requires root privileges.
- Create a system unit as root:

      sudo nano /etc/systemd/system/mcquac-watcher.service
- Paste the following unit file:

      [Unit]
      Description=MCQuaC Watcher (root + SMB mounts)
      After=network-online.target docker.service
      Wants=network-online.target

      [Service]
      Type=simple
      WorkingDirectory=/path/to/mcquac-watcher
      ExecStart=/path/to/mcquac-watcher/.venv/bin/python -u main.py
      Restart=always
      RestartSec=10
      # Example if you want to force a specific Nextflow binary:
      # Environment="NEXTFLOW_BIN=/path/to/mcquac-watcher/nextflow"

      [Install]
      WantedBy=multi-user.target
- Reload and start:

      sudo systemctl daemon-reload
      sudo systemctl start mcquac-watcher.service
      sudo systemctl enable mcquac-watcher.service

- View logs:

      sudo journalctl -u mcquac-watcher.service -f
- Docker daemon not reachable: ensure the service is running and your user is in the `docker` group. Log out and back in, or run `newgrp docker`.
- `nextflow` not found: set `nextflow_bin` in `config/app.json` or export `NEXTFLOW_BIN`.
- `main.nf` missing: verify `mcquac_path` and that the McQuaC repo/branch is cloned correctly.
- SMB mount errors: run with `sudo`, install `cifs-utils`, and check the network and port 445; optionally set `"continue_on_mount_error": true`.
- No watchers active: ensure `io_pairs` in `config/app.json` is a non-empty list.
- No output delivered / network share flaky: delivery is retried automatically. If you reboot, make sure `tmp/` is not wiped on startup (comment out `nuke_tmp()` in `main.py`).
- Stuck temp state / cleanup: clear the working area:

      python3 -m src.clear
      # or run the helper directly if you prefer:
      # python3 src/clear.py
project/
├── main.py
├── src/
│ ├── load_config.py # read & validate config/app.json (incl. mounts, nextflow_bin)
│ ├── search.py # watcher thread (stable candidates via repeated scans)
│ ├── size.py # file listing & size helper with glob + ignore patterns
│ ├── copier.py # copy candidates → tmp/<hash>; generate mcquac.json/info.json/.ready
│ ├── job_creater.py # apply %%%INPUT%%% / %%%OUTPUT%%% into mcquac.json from template
│ ├── mcquac_runner.py # consume .ready, run Nextflow, deliver results, write .delivered
│ ├── mounter.py # optional SMB mounting from config
│ └── clear.py # `nuke_tmp()` to clean ./tmp
├── config/
│ ├── app.json # main configuration
│ ├── mcquac.json # default McQuaC template (placeholders supported)
│ ├── mcquac_*.json # optional additional templates (selectable via mcquac_template)
│ ├── fasta/ # *.fasta (top level)
│ └── spike/ # *.csv (top level)
├── tmp/ # working directory (hash folders)
└── setup.sh # optional bootstrap script
--------------------------------------------------------------------------------
autoQuaC
--------------------------------------------------------------------------------
Copyright 2021, Ruhr University Bochum, Medizinisches Proteom-Center
This software is released under a three-clause BSD license:
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.
* Neither the name of any author or any participating institution may be used to
endorse or promote products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- McQuaC (mpc-bioinformatics) for the underlying QC pipeline.