A comprehensive implementation of a GPT (Generative Pre-trained Transformer) model built from scratch and fine-tuned for cyberbullying detection. This project demonstrates deep understanding of transformer architecture, transfer learning, and practical ML engineering.
This project implements a complete GPT model architecture from scratch using PyTorch, then adapts it for binary classification of cyberbullying content. The model achieves 87.97% validation accuracy by leveraging transfer learning with pre-trained GPT-2 weights and fine-tuning on a cyberbullying detection dataset.
- GPT Architecture from Scratch: Complete implementation of transformer blocks, multi-head attention, and feed-forward networks
- Transfer Learning: Integration of pre-trained GPT-2 (124M parameters) weights
- High Accuracy: 87.97% validation accuracy on cyberbullying detection
- Production-Ready API: FastAPI server for real-time inference
- Interactive Testing: Command-line interface for model testing
- Comprehensive Logging: Training metrics and visualization
- Python 3.8+
- CUDA-capable GPU (recommended for training, optional for inference)
- ~4GB VRAM for training
1. Clone the repository (or navigate to the project directory)

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Prepare the dataset (a quick format check is sketched after this list):

   - Place your training CSV file at `./cyberbullying_detector/data/cyberbullying_train.csv`
   - Place your validation CSV file at `./cyberbullying_detector/data/cyberbullying_validation.csv`
   - CSV format: columns should include `Text` (text content) and `Label` (0 or 1)

4. Download pre-trained GPT-2 weights (automatic on first run):

   - The model will automatically download GPT-2 Small (124M) weights on the first training run
   - Weights are cached in the `pretrained_gpt2/` directory
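If you are assembling your own dataset, a quick sanity check of the CSV format described in step 3 can be done with a few lines of pandas (a minimal sketch, assuming the paths above; this helper is not part of the repository):

```python
import pandas as pd

# Hypothetical format check for the training CSV
df = pd.read_csv("./cyberbullying_detector/data/cyberbullying_train.csv")
assert {"Text", "Label"}.issubset(df.columns), "CSV must contain 'Text' and 'Label' columns"
assert set(df["Label"].unique()) <= {0, 1}, "'Label' must be binary (0 or 1)"
print(df["Label"].value_counts())
```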
```
the_better_threads_project/
├── gpt/                                  # GPT model implementation
│   ├── gpt.py                            # Main BTPModel class
│   └── transformer_block/
│       ├── transformer_block.py          # Transformer block
│       ├── multi_head_attention.py
│       ├── feed_forward.py
│       └── layer_norm.py
│
├── cyberbullying_detector/               # Cyberbullying detection module
│   ├── cb_detect_train.py                # Training orchestration
│   └── utils/
│       ├── cb_detect_datasets.py         # Dataset class
│       ├── cb_detect_train_classifer.py  # Training loop
│       ├── cb_detect_evaluate.py         # Evaluation functions
│       ├── cb_detect_loss_funcs.py       # Loss and accuracy
│       └── cb_detect_run.py              # Inference function
│
├── pretrained_gpt2/                      # GPT-2 integration
│   ├── create_gpt2_model.py              # Model creation with weights
│   └── utils/
│       ├── load_weights_into_gpt.py
│       └── weights_downloader.py
│
├── main.py                               # Main entry point (interactive testing)
├── api.py                                # FastAPI server
├── run_api.py                            # API server runner
├── global_utils.py                       # Utility functions
├── requirements.txt                      # Dependencies
├── final_trained_model.pt                # Trained model checkpoint
└── case_study.md                         # Comprehensive case study
```
To train the model on your dataset:
```python
from main import train

train()
```

Or run it as a one-liner from the shell:

```bash
python -c "from main import train; train()"
```

Training Configuration (in `cyberbullying_detector/cb_detect_train.py`):
- Batch size: 64
- Learning rate: 1e-5
- Epochs: 3
- Optimizer: AdamW with weight decay (0.01)
- Scheduler: Cosine annealing (1e-5 → 1e-6)
- Gradient clipping: 1.0
The trained model will be saved as `final_trained_model.pt`.
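As a rough sketch of how that configuration maps onto PyTorch (the `model` and `train_loader` arguments are assumed to come from the repo's own modules; the real loop lives in `cb_detect_train_classifer.py`):

```python
import torch

def train_classifier(model, train_loader, num_epochs=3):
    """Sketch of the training setup described above, not the repo's actual code.

    `model` and `train_loader` are assumed to be the classifier and DataLoader
    built elsewhere in the project.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=num_epochs * len(train_loader), eta_min=1e-6
    )
    for _ in range(num_epochs):
        for input_ids, labels in train_loader:
            optimizer.zero_grad()
            logits = model(input_ids)                                # (batch, 2) class logits
            loss = torch.nn.functional.cross_entropy(logits, labels)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip at 1.0
            optimizer.step()
            scheduler.step()
```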
Test the model interactively from the command line:
```bash
python main.py
```

This will:

- Load the trained model
- Start an interactive loop
- Accept text input and return predictions
- Type `quit` to exit

Example:

```
> You are so stupid!
('cyberbullying', 0.95)
> Have a great day!
('not cyberbullying', 0.89)
```
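The loop behind this is essentially a read-classify-print cycle; a minimal sketch (assuming a `classify(text)` helper that wraps the inference function in `cb_detect_run.py`; the real loop is in `main.py`):

```python
def interactive_loop(classify):
    """Hypothetical read-classify-print loop; main.py implements the real one."""
    while True:
        text = input("> ").strip()
        if text.lower() == "quit":
            break
        label, confidence = classify(text)   # e.g. ('cyberbullying', 0.95)
        print((label, confidence))
```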
Start the FastAPI server for HTTP-based inference:
```bash
python run_api.py
```

Or using uvicorn directly:

```bash
uvicorn api:app --host 0.0.0.0 --port 8000 --reload
```

The API will be available at http://localhost:8000.
1. Detect Cyberbullying
   - POST `/detect`
   - Request: `{ "message": "Your text message here" }`
   - Response: `{ "isCyberbullying": true, "confidence": 0.95 }`

2. Health Check
   - GET `/health` - Returns API and model status

3. API Documentation
   - GET `/docs` - Swagger UI
   - GET `/redoc` - ReDoc documentation
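In outline, the `/detect` endpoint in `api.py` boils down to something like the following (a sketch using the field names above; `run_inference` is a placeholder name for the repo's actual model call, not a confirmed function):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class DetectRequest(BaseModel):
    message: str

class DetectResponse(BaseModel):
    isCyberbullying: bool
    confidence: float

def run_inference(text: str):
    """Placeholder for the repo's inference function (see cb_detect_run.py)."""
    raise NotImplementedError

@app.post("/detect", response_model=DetectResponse)
def detect(request: DetectRequest) -> DetectResponse:
    label, confidence = run_inference(request.message)  # e.g. ('cyberbullying', 0.95)
    return DetectResponse(isCyberbullying=(label == "cyberbullying"), confidence=confidence)

@app.get("/health")
def health():
    return {"status": "ok"}
```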
Using curl:
```bash
curl -X POST "http://localhost:8000/detect" \
  -H "Content-Type: application/json" \
  -d '{"message": "You are so stupid!"}'
```

Using Python:

```python
import requests

response = requests.post(
    "http://localhost:8000/detect",
    json={"message": "You are so stupid!"}
)
result = response.json()
print(f"Cyberbullying: {result['isCyberbullying']}")
print(f"Confidence: {result['confidence']}")
```

Using JavaScript:

```javascript
const response = await fetch("http://localhost:8000/detect", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ message: "You are so stupid!" }),
});
const result = await response.json();
console.log(result);
```

The model follows the GPT architecture with the following components:
- Embedding Layer
  - Token embeddings: 50,257 vocabulary size (GPT-2)
  - Positional embeddings: 512-1024 context length
  - Embedding dimension: 768

- Transformer Blocks (12 layers)
  - Multi-head self-attention (12 heads, 64 dims per head)
  - Feed-forward network (768 → 3072 → 768)
  - Layer normalization (pre-norm architecture)
  - Residual connections
  - Dropout (0.1)

- Output Head
  - Classification: Linear layer (768 → 2 classes)
  - Generative: Linear layer (768 → vocab_size)
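These numbers match the standard GPT-2 Small configuration. An illustrative config dict of the kind such a model is built from (key names are assumptions for illustration, not necessarily those used in `gpt.py`):

```python
# Illustrative GPT-2 Small configuration; key names may differ from those in gpt.py.
GPT2_SMALL_CONFIG = {
    "vocab_size": 50257,      # GPT-2 BPE vocabulary
    "context_length": 1024,   # maximum sequence length
    "emb_dim": 768,           # embedding dimension
    "n_heads": 12,            # attention heads (64 dims per head)
    "n_layers": 12,           # transformer blocks
    "drop_rate": 0.1,         # dropout
    "num_classes": 2,         # cyberbullying / not cyberbullying
}
```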
- Pre-trained Weights: GPT-2 Small (124M parameters)
- Freezing: All layers frozen except (sketched after this list):
  - Last transformer block
  - Final layer normalization
  - Classification head (newly initialized)
- Fine-tuning: 3 epochs with learning rate 1e-5
- Pooling: Mean pooling over sequence length (handles variable-length inputs)
- Masking: Proper attention masking for padding tokens
- Output: Binary classification (cyberbullying / not cyberbullying)
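A minimal sketch of that freezing scheme (attribute names such as `trf_blocks`, `final_norm`, and `out_head` are assumptions about the model class, not confirmed names):

```python
def freeze_for_finetuning(model):
    """Freeze everything, then unfreeze only the parts listed above.

    Attribute names (trf_blocks, final_norm, out_head) are assumed for illustration.
    """
    for param in model.parameters():
        param.requires_grad = False
    for param in model.trf_blocks[-1].parameters():   # last transformer block
        param.requires_grad = True
    for param in model.final_norm.parameters():       # final layer normalization
        param.requires_grad = True
    for param in model.out_head.parameters():         # classification head (newly initialized)
        param.requires_grad = True
```

The mean-pooling-with-mask adaptation can likewise be sketched in a few lines (tensor names are illustrative; the actual implementation lives in the detector module):

```python
import torch

def masked_mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average the 768-dim token representations while ignoring padding positions.

    hidden_states:  (batch, seq_len, emb_dim) output of the final transformer block
    attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)     # zero out padding, then sum over tokens
    counts = mask.sum(dim=1).clamp(min=1.0)        # number of real tokens per example
    return summed / counts                         # (batch, emb_dim) pooled representation
```

The pooled vector is then passed through the 768 → 2 classification head to produce the binary prediction.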
Final Metrics:
- Validation Accuracy: 87.97%
- Training Accuracy: 86.41%
- Validation Loss: 0.293
- Training Loss: 0.293
Training Progress:
- Epoch 1: 86.72% validation accuracy
- Epoch 2: 87.19% validation accuracy
- Epoch 3: 87.97% validation accuracy
- Training time: ~50 minutes
The model shows excellent generalization with closely aligned train/validation metrics, indicating no overfitting.
| Parameter | Value | Description |
|---|---|---|
| Learning Rate | 1e-5 | Small LR for fine-tuning |
| Weight Decay | 0.01 | L2 regularization |
| Batch Size | 64 | Training batch size |
| Epochs | 3 | Number of training epochs |
| Gradient Clipping | 1.0 | Prevents exploding gradients |
| Dropout | 0.1 | Regularization rate |
| Optimizer | AdamW | With weight decay |
| Scheduler | Cosine Annealing | LR decay (1e-5 → 1e-6) |
- Custom Components: LayerNorm and GELU implemented from scratch
- Causal Masking: Maintains autoregressive property
- Efficient Data Loading: 4 worker processes, pin memory
- Comprehensive Logging: Training metrics logged to `logs/training_logs.txt`
- Visualization: Loss and accuracy plots generated during training
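To illustrate the first item in the list above, from-scratch LayerNorm and GELU modules typically look like the following (an illustrative sketch; the repo's own versions in `layer_norm.py` and `feed_forward.py` may differ in detail):

```python
import math
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    """Layer normalization over the last (embedding) dimension with learnable scale/shift."""
    def __init__(self, emb_dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(emb_dim))
        self.shift = nn.Parameter(torch.zeros(emb_dim))

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        return self.scale * (x - mean) / torch.sqrt(var + self.eps) + self.shift

class GELU(nn.Module):
    """Tanh approximation of GELU, as used in GPT-2."""
    def forward(self, x):
        return 0.5 * x * (1.0 + torch.tanh(
            math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
        ))
```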
- `torch`: Deep learning framework
- `tiktoken`: GPT-2 tokenizer
- `fastapi`: Web framework for the API
- `uvicorn`: ASGI server
- `pandas`: Data manipulation
- `numpy`: Numerical operations
- `matplotlib`: Visualization
- `tqdm`: Progress bars
- Primary Source: "Build a Large Language Model from Scratch" by Sebastian Raschka
- Pre-trained Model: GPT-2 Small (124M) by OpenAI
- Dataset: Kaggle Cyberbullying Detection Dataset
- Tokenizer: GPT-2 BPE tokenizer (tiktoken)
- The model automatically uses GPU if available, otherwise falls back to CPU
- Pre-trained GPT-2 weights are downloaded automatically on first run
- Model checkpoints are cached in `pickle_vars/` for faster subsequent runs
- Training logs are saved to `logs/training_logs.txt`
- Visualization plots are saved as PNG files during training
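The GPU fallback mentioned in the first note is the standard PyTorch pattern, roughly:

```python
import torch

# Standard device selection: prefer CUDA when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# The model and input tensors are then moved to this device with .to(device).
```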
This project demonstrates:
- Deep Learning Fundamentals: Understanding of transformer architecture, attention mechanisms, and normalization techniques
- Transfer Learning: Effective application of pre-trained models to downstream tasks
- ML Engineering: Complete pipeline from data preprocessing to production API
- PyTorch Expertise: Clean, modular, production-ready code implementation
- Problem-Solving: Adapting generative models for classification tasks
This project is for educational and demonstration purposes.
For a detailed technical case study, see `case_study.md`.