GitHub - davidcoscor/ncRNA-AI: Source code supporting David da Costa Correia's MSc thesis project "Predicting non-coding RNA function using Artificial Intelligence"

ncRNA-AI

Source code supporting David da Costa Correia's MSc thesis project "Predicting non-coding RNA function using Artificial Intelligence".

Supervised by Hugo Martiniano, PhD and Francisco Couto, PhD.

Executed at FCUL & INSA, Portugal in 2023-2024.

Main Contributions

- an ncRNA-Phenotype Relational Corpus (ncoRP) [Download]
- an ncRNA-Phenotype Relation Dataset aggregating 5 databases [Download]
- an embedding-based Entity Recognition and Linking pipeline (using FAISS and SentenceTransformers)
- an Ollama-based LLM binary classification framework
  - supporting an LLM Relation Extraction methodology
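
The embedding-based linking idea in the list above can be pictured as a nearest-neighbour lookup: embed a mention, embed every vocabulary label, and keep the closest label if it is similar enough. The sketch below is only an illustration and is not the code in `src/FAISS_EL.py`. To stay dependency-free it swaps the SentenceTransformers encoder for a toy character-trigram count vector and the FAISS index for a brute-force search. The names `embed`, `cosine`, `link`, the `VOCAB` entries, their identifiers, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt


def embed(text: str) -> Counter:
    """Toy embedding: character-trigram counts (stand-in for a sentence encoder)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link(mention: str, vocabulary: dict[str, str], threshold: float = 0.5):
    """Return (identifier, label, score) for the closest label, or None below threshold."""
    query = embed(mention)
    # Brute-force search over all labels (a vector index would replace this loop).
    best = max(
        ((identifier, label, cosine(query, embed(label)))
         for identifier, label in vocabulary.items()),
        key=lambda item: item[2],
    )
    return best if best[2] >= threshold else None


# Hypothetical vocabulary: identifier -> label (illustrative entries only).
VOCAB = {
    "ID:1": "autism spectrum disorder",
    "ID:2": "breast cancer",
    "ID:3": "microRNA 21",
}
```

Calling `link("autism spectrum disorders", VOCAB)` returns the `ID:1` entry, while a mention with no trigram overlap, such as `link("xyzzy", VOCAB)`, returns `None`. With a real sentence encoder the threshold would need to be tuned on annotated data.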

All the described pipelines are easily adaptable to work with any pair of entities.
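
One way to picture this adaptability is a binary classification prompt whose entity types are plain parameters. The sketch below is a minimal illustration under stated assumptions, not the framework in `src/llm_re.py`: it assumes a local Ollama server reachable at its default address through the `/api/generate` endpoint, and the prompt wording, the model name `llama3`, and the function names `build_prompt`, `parse_answer`, and `classify` are all hypothetical.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed to be running).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(sentence: str, entity_a: str, entity_b: str,
                 type_a: str = "ncRNA", type_b: str = "phenotype") -> str:
    """Build a yes/no question about a relation; the wording is an assumption."""
    return (
        f"Sentence: {sentence}\n"
        f"Does the sentence state a relation between the {type_a} '{entity_a}' "
        f"and the {type_b} '{entity_b}'? Answer with 'yes' or 'no' only."
    )


def parse_answer(text: str):
    """Map a free-text reply to True/False, or None when the first word is neither."""
    tokens = text.strip().lower().split()
    first = tokens[0].strip(".,!:;'\"") if tokens else ""
    return {"yes": True, "no": False}.get(first)


def classify(sentence: str, entity_a: str, entity_b: str, model: str = "llama3"):
    """Ask the model one yes/no question and return the parsed label."""
    payload = json.dumps({
        "model": model,  # hypothetical model name; use any model pulled locally
        "prompt": build_prompt(sentence, entity_a, entity_b),
        "stream": False,  # request a single JSON object instead of a stream
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)["response"]
    return parse_answer(reply)
```

Switching to another entity pair only means passing different `type_a` and `type_b` values to `build_prompt`. Returning `None` for ambiguous replies keeps unparseable answers separate from genuine negatives.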

File information

ncRNA-AI
├── src                       | Contains the developed modules that support the pipelines
│   ├── articles_download.py  | Implements a simple framework to download articles using NCBI's E-utils
│   ├── FAISS_EL.py           | FAISS and SentenceTransformers Entity Recognition and Linking Tool
│   ├── relDS.py              | Implements RelationDataset class
│   └── llm_re.py             | Implements an Ollama-based LLM binary classification framework
├── utils                     | Contains other utility python scripts
├── misc                      | Contains supporting jupyter notebooks for data/output analysis
├── data                      | Contains the data (raw and processed) used by the pipelines
├── outputs                   | Contains the final outputs from the pipelines
├── ncoRP_creation.py         | ncoRP corpus creation pipeline
├── dataset_creation.py       | Relation Dataset creation pipeline
├── llm_exp.py                | LLM methodology implementation
├── asd_cs.py                 | Autism Spectrum Disorder Case Study pipeline
├── download_data.sh          | Script to download all the necessary raw data
├── env.yaml                  | Dependencies
...

Additional information may be found in each file's header.
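
For readers unfamiliar with NCBI's E-utils, the following sketch shows the usual two-step pattern: `esearch` to turn a query into PubMed identifiers, then `efetch` to retrieve the records. It is an illustration only and does not reproduce `src/articles_download.py`. The function names, the default of 20 results, and the choice of plain-text abstracts are assumptions.

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """URL that searches `db` for `term` and returns matching IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def efetch_url(ids: list[str], db: str = "pubmed") -> str:
    """URL that fetches plain-text abstracts for a list of record IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": "abstract", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def download_abstracts(term: str, retmax: int = 20) -> str:
    """Search for `term`, then fetch the abstracts of the hits (needs network access)."""
    with urllib.request.urlopen(esearch_url(term, retmax=retmax)) as response:
        ids = json.load(response)["esearchresult"]["idlist"]
    if not ids:
        return ""
    with urllib.request.urlopen(efetch_url(ids)) as response:
        return response.read().decode("utf-8")
```

A call such as `download_abstracts("lncRNA autism", retmax=5)` would return the concatenated abstracts as one string. For larger downloads, NCBI's usage guidelines on request rates and API keys apply.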
