A Final Year Project (FYP) utilizing a Hybrid Deep Learning Architecture (LSTM + Transformer-based FinBERT) to predict cryptocurrency market trends by fusing Technical (OHLCV) and Fundamental (News Sentiment) data.
| Role | Name | Key Responsibilities |
|---|---|---|
| Supervisor | Sir Muhammad Adeel Zahid | Project Guidance & Evaluation |
| Data Scientist | Haider Ali | Model Architecture, Pipeline Design, Trading Logic & Streamlit |
| Data Scientist | Nimrah Akbar | Data Preprocessing, Sentiment Analysis, Optimization & Backtesting |
- Executive Summary
- System Architecture
- Data Science Methodology
- Technology Stack
- Directory Structure
- Installation & Setup
- Project Roadmap
- Disclaimer
The Multi-Modal Crypto Trading Bot is an algorithmic trading system designed to address the stochastic nature of cryptocurrency markets. Traditional quantitative models often fall short because they rely solely on historical price data (Technical Analysis) and ignore the substantial impact of social sentiment and news events (Fundamental Analysis).
Our Solution: We propose a Multi-Modal Fusion Model that processes two distinct data streams:
- Time-Series Data: Analyzed using Long Short-Term Memory (LSTM) networks to capture temporal dependencies in price action.
- Unstructured Text Data: Analyzed using FinBERT (Financial Bidirectional Encoder Representations from Transformers) to extract sentiment polarity from news.
The system outputs a unified Confidence Score for Buy/Sell/Hold signals, which are executed via a simulated trading engine with risk-management rules (an RSI filter and a Stop-Loss).
The project follows a standard ETL (Extract, Transform, Load) pipeline integrated with an Inference Engine.
```mermaid
graph TD;
    subgraph "Data Ingestion Layer"
        A[Yahoo Finance API] -->|Raw OHLCV| C(Data Preprocessing);
        B[Kaggle & RSS Feeds] -->|Raw Text| C;
        C -->|MinMax Normalization| D[Cleaned Data Store];
    end
    subgraph "Deep Learning Layer"
        D -->|Seq Length 60| E[LSTM Model];
        D -->|Tokenization| F[FinBERT Model];
    end
    subgraph "Decision Layer"
        E -->|Price Prediction| G{Fusion Engine};
        F -->|Sentiment Score| G;
        G -->|Trade Logic| H[Trade Execution];
    end
    subgraph "Presentation Layer"
        H -->|Metrics| I[Streamlit Dashboard];
    end
```
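The MinMax Normalization step in the diagram rescales each numeric feature to a common range before it reaches the LSTM. The minimal NumPy sketch below illustrates the idea; the helper names `fit_minmax`, `apply_minmax` and `invert_minmax` are hypothetical (scikit-learn's `MinMaxScaler` offers the same `fit` / `transform` / `inverse_transform` behaviour), and fitting on the training rows only is an assumed precaution against leaking future price ranges into training.

```python
import numpy as np


def fit_minmax(train: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return per-column (min, range) computed on the training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    # Guard against constant columns, which would otherwise divide by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range


def apply_minmax(x: np.ndarray, col_min: np.ndarray, col_range: np.ndarray) -> np.ndarray:
    """Training rows land in [0, 1]; later rows may fall outside that range."""
    return (x - col_min) / col_range


def invert_minmax(x_scaled: np.ndarray, col_min: np.ndarray, col_range: np.ndarray) -> np.ndarray:
    """Map scaled values back to original units (e.g. to read a predicted price in USD)."""
    return x_scaled * col_range + col_min


# Toy OHLCV rows: Open, High, Low, Close, Volume (illustrative numbers only).
ohlcv = np.array([
    [100.0, 110.0, 95.0, 105.0, 1000.0],
    [105.0, 115.0, 100.0, 110.0, 1500.0],
    [110.0, 120.0, 105.0, 115.0, 2000.0],
])
col_min, col_range = fit_minmax(ohlcv[:2])  # fit on the first two rows only
scaled = apply_minmax(ohlcv, col_min, col_range)
restored = invert_minmax(scaled, col_min, col_range)
```

Keeping the fitted `col_min` and `col_range` around matters because the LSTM predicts in scaled units, and the decision layer needs the prediction back in price units.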
We use Recurrent Neural Networks (RNNs), specifically stacked LSTMs, to model the sequential nature of financial data.
- Problem Type: Many-to-One Regression.
- Input Features: Open, High, Low, Close, Volume.
- Lookback Window: 60 days (the model uses the previous 60 days of data to predict day t+1).
- Optimization: Adam Optimizer with Mean Squared Error (MSE) loss function.
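The many-to-one setup with a 60-day lookback can be made concrete by showing how training samples are cut from the scaled feature matrix. This is a sketch, not the project's code: the function name `make_windows` is hypothetical, and it assumes the column order Open, High, Low, Close, Volume with next-day Close as the regression target.

```python
import numpy as np

LOOKBACK = 60   # lookback window from the methodology above
CLOSE_COL = 3   # assumed column order: Open, High, Low, Close, Volume


def make_windows(features: np.ndarray, lookback: int = LOOKBACK,
                 target_col: int = CLOSE_COL):
    """Build many-to-one samples: `lookback` past rows -> the next row's Close.

    Returns X with shape (n_samples, lookback, n_features) and y with shape
    (n_samples,), the 3-D layout a Keras LSTM layer expects as input.
    """
    n_rows, n_features = features.shape
    n_samples = n_rows - lookback
    if n_samples <= 0:
        raise ValueError("need more rows than the lookback window")
    X = np.empty((n_samples, lookback, n_features), dtype=features.dtype)
    y = np.empty(n_samples, dtype=features.dtype)
    for i in range(n_samples):
        X[i] = features[i:i + lookback]            # days i .. i+lookback-1
        y[i] = features[i + lookback, target_col]  # the following day (t+1)
    return X, y


# Example: 100 days of 5 scaled features -> 40 training samples.
demo = np.random.default_rng(0).random((100, 5))
X, y = make_windows(demo)
print(X.shape, y.shape)  # (40, 60, 5) (40,)
```

Arrays in this shape can be fed to a stacked `tf.keras` LSTM compiled with `optimizer="adam"` and `loss="mse"`, matching the optimisation settings listed above.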
Instead of a basic lexicon-based approach (such as VADER), we apply Transfer Learning using FinBERT.
- Architecture: A BERT-base model further pre-trained on a financial corpus (TRC2) and then fine-tuned for financial sentiment classification.
- Output: Softmax probabilities for three classes: [Positive, Negative, Neutral].
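The section does not state how the three class probabilities become the single $S_{score}$ used by the decision rule, or how several headlines on the same day are combined. One common choice, assumed here and not confirmed by the project, is $S = P(\text{positive}) - P(\text{negative})$, which lies in $[-1, 1]$, averaged per calendar date. The helpers `sentiment_score` and `daily_sentiment` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean

# Class order as listed above: [Positive, Negative, Neutral].
POS, NEG, NEU = 0, 1, 2


def sentiment_score(probs):
    """Collapse one headline's softmax output into a score in [-1, 1].

    Assumption: score = P(positive) - P(negative); neutral mass pulls it toward 0.
    """
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("expected softmax probabilities summing to 1")
    return probs[POS] - probs[NEG]


def daily_sentiment(headlines):
    """Average headline scores per calendar date, returned in date order.

    `headlines` is an iterable of (date_string, probs) pairs, e.g. FinBERT
    outputs paired with each article's publication date.
    """
    by_day = defaultdict(list)
    for day, probs in headlines:
        by_day[day].append(sentiment_score(probs))
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


news = [
    ("2024-01-02", [0.80, 0.10, 0.10]),
    ("2024-01-02", [0.20, 0.60, 0.20]),
    ("2024-01-03", [0.05, 0.90, 0.05]),
]
print(daily_sentiment(news))
```

Averaging per date gives one sentiment value per trading day, which can then be joined to the OHLCV rows by date during preprocessing.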
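The trading signal is generated based on a Confluence Strategy:

$$
\text{Signal} =
\begin{cases}
\text{BUY}, & \text{if } P_{pred} > P_{curr} \times 1.01 \text{ AND } S_{score} > 0.2 \text{ AND } RSI < 70 \\
\text{SELL}, & \text{if } P_{pred} < P_{curr} \times 0.99 \text{ OR } S_{score} < -0.5 \\
\text{HOLD}, & \text{otherwise}
\end{cases}
$$

where $P_{pred}$ is the LSTM's predicted price for day t+1, $P_{curr}$ is the current price, $S_{score}$ is the FinBERT sentiment score, and $RSI$ is the Relative Strength Index.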
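The confluence rule translates directly into a function. Because the BUY branch depends on RSI, the sketch also includes an RSI helper; the 14-period window and Wilder's smoothing are common textbook defaults and are assumptions here, not values taken from the project's `indicators.py`. Treating a missing RSI as "no BUY" is likewise an assumed choice.

```python
def rsi(closes, period=14):
    """Relative Strength Index of the latest close, using Wilder's smoothing.

    Returns None when there are not enough closes (period + 1 are needed).
    """
    if len(closes) <= period:
        return None
    changes = [cur - prev for prev, cur in zip(closes[:-1], closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    # Seed with a simple average over the first `period` changes...
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    # ...then apply Wilder's recursive smoothing to the remaining changes.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
    if avg_loss == 0:
        # No losses: 100 if prices rose, 50 for a completely flat window.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def confluence_signal(p_pred, p_curr, s_score, rsi_value):
    """Return "BUY", "SELL" or "HOLD" following the confluence rule above."""
    # Assumption: a missing RSI (None) blocks BUY rather than raising an error.
    rsi_ok = rsi_value is not None and rsi_value < 70
    if p_pred > p_curr * 1.01 and s_score > 0.2 and rsi_ok:
        return "BUY"
    if p_pred < p_curr * 0.99 or s_score < -0.5:
        return "SELL"
    return "HOLD"


print(confluence_signal(102, 100, 0.3, 55))  # BUY
print(confluence_signal(102, 100, 0.3, 75))  # HOLD: RSI filter blocks the entry
```

The BUY and SELL conditions cannot both hold (they require opposite price moves and opposite sentiment), so the order of the two checks does not change the result.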
| Component | Technology | Reasoning |
|---|---|---|
| Environment | Google Colab | Utilizes free T4 GPU for faster model training. |
| Core | Python 3.10 | Standard language for Data Science & AI. |
| Deep Learning | TensorFlow / Keras | High-level API for rapid prototyping of LSTMs. |
| NLP | HuggingFace Transformers | State-of-the-art pre-trained models (FinBERT). |
| Data Manipulation | Pandas & NumPy | Vectorized operations for large datasets. |
| Visualization | Streamlit & Matplotlib | Interactive dashboards (Streamlit) and static charts (Matplotlib) for end-users. |
| Version Control | Git & GitHub | Collaborative development. |
```text
📦 FYP-Crypto-Trading-Bot
 ┣ 📂 datasets # Raw & Processed Data (Managed via Google Drive)
 ┣ 📂 models # Serialized Models (.h5 for LSTM, .pt for BERT)
 ┣ 📂 notebooks # Experimental Jupyter Notebooks
 ┃ ┣ 📜 01_Data_Preprocessing.ipynb # Cleaning & Feature Engineering
 ┃ ┣ 📜 02_Model_Training.ipynb # Model Training Loop
 ┃ ┗ 📜 03_Backtesting_Engine.ipynb # Strategy Simulation
 ┣ 📂 src # Production Scripts
 ┃ ┣ 📜 data_loader.py # Yahoo Finance Fetcher
 ┃ ┗ 📜 indicators.py # RSI/MACD Logic
 ┣ 📜 requirements.txt # Dependencies
 ┗ 📜 README.md # Documentation
```
Since this project requires GPU acceleration, we highly recommend running the notebooks on Google Colab.
```bash
git clone https://github.com/haiderali-01/FYP-Crypto-Trading-Bot.git
```

The bot requires a persistent storage layer to save datasets and trained models.
- Create a folder in your Google Drive named `FYP DATAETS`.
- Upload `news_data.csv` (downloaded from Kaggle) into this folder.
- Note: Market price data is fetched automatically by our `data_loader.py` script.
Open the notebooks in the `notebooks/` folder in the following order:
1. `01_Data_Preprocessing`: Syncs price and news data.
2. `02_Model_Training`: Trains the AI models.
3. `03_Backtesting_Engine`: Simulates the trading strategy.
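The backtesting notebook simulates the strategy, but its engine is not shown here, so the following is only a minimal long-only sketch of how signals and a stop-loss interact. The function name `backtest`, the 5% stop-loss, same-close fills, zero fees and all-in position sizing are hypothetical choices for illustration, not the project's settings.

```python
def backtest(closes, signals, stop_loss=0.05, initial_cash=10_000.0):
    """Long-only paper-trading loop driven by BUY/SELL/HOLD signals.

    Simplifying assumptions (hypothetical, for illustration): orders fill at
    the same day's close, fees and slippage are ignored, the full balance is
    invested on each BUY, and the stop-loss is checked against closes only.
    """
    if len(closes) == 0 or len(closes) != len(signals):
        raise ValueError("closes and signals must be non-empty and equal length")
    cash, units, entry = float(initial_cash), 0.0, None
    trades = []  # (day index, action, price)
    for day, (price, signal) in enumerate(zip(closes, signals)):
        if units > 0 and price <= entry * (1 - stop_loss):
            # Stop-loss: exit once the close falls stop_loss below the entry.
            cash, units, entry = units * price, 0.0, None
            trades.append((day, "STOP", price))
        elif units > 0 and signal == "SELL":
            cash, units, entry = units * price, 0.0, None
            trades.append((day, "SELL", price))
        elif units == 0 and signal == "BUY":
            units, cash, entry = cash / price, 0.0, price
            trades.append((day, "BUY", price))
    final_equity = cash + units * closes[-1]
    return final_equity, trades


equity, trades = backtest(
    closes=[100, 102, 110, 104, 90],
    signals=["BUY", "HOLD", "SELL", "BUY", "HOLD"],
)
print(round(equity, 2), [action for _, action, _ in trades])
# 9519.23 ['BUY', 'SELL', 'BUY', 'STOP']
```

In the example, the second position is closed by the stop-loss on the last day (90 is below 104 × 0.95 = 98.8) even though no SELL signal was issued, which is the protective behaviour the stop-loss is meant to add.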
| Phase | Description | Status |
|---|---|---|
| Phase 1 | Data Engineering: ETL Pipelines, Feature Engineering (RSI, MACD). | 🟡 In Progress |
| Phase 2 | Model Development: LSTM Training & FinBERT Integration. | ⚪ Planned |
| Phase 3 | Strategy Formulation: Developing the Hybrid Decision Engine. | ⚪ Planned |
| Phase 4 | Validation: Backtesting on 2024 unseen data & Performance Metrics. | ⚪ Planned |
| Phase 5 | Deployment: Streamlit Dashboard & Final Reporting. | ⚪ Planned |
This software is designed as a real-time algorithmic trading system capable of executing live market strategies. However, it is developed primarily for academic research as a Data Science Final Year Project, and its strategies are evaluated through backtesting and simulation.
Important Note for Users:
- Not Financial Advice: The signals generated by this bot come from probabilistic AI models (LSTM/FinBERT). They are not financial advice and carry no guarantee of accuracy.
- Market Risk: Cryptocurrency markets are highly volatile. The authors (Haider Ali & Nimrah Akbar) and the Supervisor are not liable for any financial losses incurred while using this software in a live production environment.
- Use Responsibly: We recommend running this system in Paper Trading Mode (Simulation) before deploying real capital.