This repository was archived by the owner on Nov 10, 2025. It is now read-only.

vicinitas-therapeutics/pyRapidFire


pyRapidFire

A simple program to analyze protein-compound complex RapidFire data at Vicinitas using either UniDec or OpenMS FlashDeconv.

Details

The program is currently in development and runs as a pipeline, driven mainly by main.py. It takes in a folder of either raw MS data or mzML files and runs them through the pipeline. If the data is raw MS data, a conversion Docker container is called via a REST API; more details below.
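As a rough illustration of the conversion call, the sketch below builds a request for the converter API. The endpoint URL, service name, and payload shape here are hypothetical, not the actual converter API:

```python
import os

# Hypothetical endpoint -- the real converter service and port may differ.
CONVERTER_URL = "http://converter:8080/convert"

def build_conversion_request(raw_path: str) -> tuple[str, dict]:
    """Build the URL and JSON payload for a raw-to-mzML conversion request."""
    payload = {
        "input_file": os.path.basename(raw_path),
        "output_format": "mzML",
    }
    return CONVERTER_URL, payload

# A real pipeline would then POST this, e.g. requests.post(url, json=payload).
```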

Steps

  1. The program takes in a folder of data.
  2. A metadata file is uploaded that contains protein masses, compound masses, file identifications, and other information. Note that if IC50 values are needed, concentration values must also be included in the metadata file.
  3. Either UniDec or FlashDeconv is called to process the data.
  • If UniDec is called, the program runs the data through the UniDec Python API.
  • If FlashDeconv is called, the program runs the data through a CLI call.
  4. Results from the processing are then uploaded to a database.
  5. Compound complex modifications are then calculated and matched for each well.
  6. Within each well, a percentage intensity is calculated for each protein-compound complex.
  7. These matches are uploaded separately to the database.
  8. Using each protein-compound modification number (e.g. Mod0 or Mod1), the IC50 values are calculated and uploaded to the database.
  9. Plots of each curve are generated and printed to ... png files.
  10. TODO: create a web UI to display the results.
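The per-well percentage-intensity step above can be sketched as follows; the function name and data shapes are illustrative, not the actual implementation:

```python
def percent_intensity(complex_intensities: dict[str, float]) -> dict[str, float]:
    """Convert raw intensities for each protein-compound complex in a well
    into percentages of the well's total intensity."""
    total = sum(complex_intensities.values())
    if total == 0:
        return {name: 0.0 for name in complex_intensities}
    return {name: 100.0 * i / total for name, i in complex_intensities.items()}
```

For example, a well with intensities `{"Mod0": 750.0, "Mod1": 250.0}` yields `{"Mod0": 75.0, "Mod1": 25.0}`.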

Installation

Installation via both pip and Poetry is supported. The program is designed to run in a Docker container. The folder layout allows a package to be built. To build the package, run the following commands:

which poetry || pip install poetry
poetry build

This will create a .whl file that can be installed via pip.

pip install dist/pyRapidFire-VERSION_NUMBER-py3-none-any.whl

Additionally, a full docker-compose file is provided to run the program. It starts the program, a database, and the needed converter API services.

File Structure

  • main.py - the main file that runs the program. It has a pipeline function that calls most of the other functions.

  • database.py - contains the database class that is used to upload data to the database.

  • protein_deconvolution.py - contains the functions used to process the data. It has two classes: protein_well and protein_decon_unidec. The protein_decon_unidec class aggregates the wells by a single compound/VCNT-ID. The protein_well class stores the data for each well; it also contains the matching function simple_match, used to match the protein.

    • When UniDec is used, the method needs to know the estimated mass of the protein and the range of masses to search. Additionally, it is helpful for it to know the charge state of the protein.
    • FlashDeconv does not need the estimated mass of the protein or the range of masses to search, and has improved resolution/mass accuracy.
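A minimal sketch of the kind of mass matching simple_match performs; the signature and tolerance value are assumptions, not the actual code:

```python
def match_masses(observed: list[float], expected: dict[str, float],
                 tol: float = 2.0) -> dict[str, float]:
    """Match each expected protein/protein-compound mass (Da) to the
    closest observed deconvolved mass within a tolerance window."""
    matches = {}
    for name, target in expected.items():
        # Closest observed mass to this expected complex mass.
        best = min(observed, key=lambda m: abs(m - target), default=None)
        if best is not None and abs(best - target) <= tol:
            matches[name] = best
    return matches
```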
  • helper.py - contains helper functions used to process the data: mainly functions to find the files, and a function to help fit the IC50 curves.

  • analysis.py - contains the functions used to analyze the data, mainly the calculation of the IC50 values. The IC50 values are processed in the IC50_Curves class.
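A four-parameter logistic fit of the kind IC50_Curves might perform can be sketched as below; the use of scipy.optimize.curve_fit and the function names are assumptions about the helper fit function, not the actual code:

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(conc, bottom, top, ic50, hill):
    """Four-parameter logistic dose-response curve."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)

def fit_ic50(conc, response):
    """Fit the 4PL model to dose-response data and return the estimated IC50."""
    p0 = [min(response), max(response), float(np.median(conc)), 1.0]
    popt, _ = curve_fit(four_pl, conc, response, p0=p0, maxfev=10000)
    return popt[2]
```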

Development

Things to know

The system is designed with a database in mind: the database stores both the data and the results of the analysis, and most methods and functions are written accordingly. Additionally, there is a custom logger that logs to both a file and the database. If a logger object is not passed to the database class, a default logger is created. The caveat is that the logger itself needs a database connection. == This means that environment variables are needed ==. These are:

  • DB_USER - the username for the database
  • DB_PASS - the password for the database
  • DB_HOST - the host for the database
  • DB_NAME - the name of the database
  • DB_CERT_PATH - the path to the certificate for the database
  • DB_CERT_NAME - the name of the certificate file
  • DATA_PATH - the path to the data folder for data to be processed from
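A minimal sketch of validating these variables before connecting; the helper name is illustrative, not part of the package:

```python
import os

REQUIRED_VARS = ["DB_USER", "DB_PASS", "DB_HOST", "DB_NAME",
                 "DB_CERT_PATH", "DB_CERT_NAME", "DATA_PATH"]

def check_environment() -> dict[str, str]:
    """Return the required settings, raising if any are missing or empty."""
    missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]
    if missing:
        raise EnvironmentError(f"Missing environment variables: {missing}")
    return {v: os.environ[v] for v in REQUIRED_VARS}
```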

Loading modules order

Due to the logger's need for a database connection, the modules must be loaded in a specific order. If you are creating a new run script/program, load the dotenv module prior to loading the pyrapidfire database (RapidFireDB) and logging_db modules, because the database connection is needed for the logger.

An example would be as follows:

import os
from dotenv import load_dotenv
from pyrapidfire import database
from pyrapidfire import logging_db

load_dotenv()
logger = logging_db.get_logger()
logger.name = "pyRapidFire"  # Set the logger name (can also be __name__)
obj = database.RapidFireDB(sqlalchemy=True, direct_connect=True, logger=logger)
obj.get_experiments()

Details about the logger

The logger is created by the logging_db.get_logger() function, which returns a logger object that logs to both a file and the database via a custom handler. Additional handlers can be added with logger.addHandler() to log to the console or another file. The logger has a custom attribute expid that can be set to the experiment ID so that it is recorded in the database, and the logger's name is likewise recorded in the database.

import os
from dotenv import load_dotenv
from pyrapidfire import database
from pyrapidfire import logging_db

load_dotenv()
logger = logging_db.get_logger()
logger.handlers[0].db.expid = 1 # Set the experiment id for the logger
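The database handler described above can be sketched as a logging.Handler subclass whose emit writes each record to the database; the internal names (`db`, `insert_log`) are assumptions, not the package's actual API:

```python
import logging

class DBHandler(logging.Handler):
    """Log handler that writes each record to a database-like sink.

    Here `db` is any object with an insert_log(expid, name, level, message)
    method -- the real handler writes to the RapidFire database instead.
    """
    def __init__(self, db, expid=None):
        super().__init__()
        self.db = db
        self.expid = expid  # experiment id logged alongside each message

    def emit(self, record):
        self.db.insert_log(self.expid, record.name, record.levelname,
                           self.format(record))

class ListDB:
    """Stand-in for a real database connection, for demonstration."""
    def __init__(self):
        self.rows = []
    def insert_log(self, expid, name, level, message):
        self.rows.append((expid, name, level, message))
```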

TODOs

  • Add a Docker container for running either UniDec or FlashDeconv.
  • Change to a class-based processing system, removing main.py from the run.
  • Add a logger to the program.
  • Add a web UI for the program.
  • Move code into a better file and folder structure.
