🚨 AI-Driven Threat Detection & Prioritization

👥 Team Members

Garach Viraj
Vaghasiya Jil
Dedakiya Manav

📜 Disclaimer

This project integrates a publicly available AI model licensed under the MIT License
(reference implementation: https://github.com/mahaswetaroy1/cybersecurity-threat-ai.git).

The model was pre-trained and adapted for this system.
All architecture design, preprocessing pipelines, risk scoring logic, API integration, and dashboard components were developed independently during the hackathon.

📌 Project Overview

Modern security teams face alert fatigue caused by massive volumes of logs and monitoring alerts, increasing the risk of missing critical threats.

This project presents an AI-driven threat detection and prioritization system that:

- Detects anomalous behavior in network traffic
- Assigns dynamic risk scores
- Prioritizes alerts based on severity
- Provides visual insights through a web dashboard

The goal is to improve SOC efficiency, early threat detection, and decision-making clarity.

✨ Key Features

- Real-time anomaly detection
- Risk-based threat scoring and prioritization
- Interactive web dashboard for monitoring and alerts
- Machine learning models:
  - Random Forest
  - XGBoost
  - Neural Networks
- Robust preprocessing pipeline for imbalanced datasets
- Model explainability using feature importance
- Secure configuration using environment variables
- Modular and scalable architecture
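
As an illustration of configuration through environment variables, the sketch below reads settings at startup and refuses to run without a secret. The variable names (`THREAT_API_HOST`, `THREAT_API_PORT`, `THREAT_SECRET_KEY`) and the `Settings` class are hypothetical and are not taken from the project's source; only the default host and port match the address given in the usage guide.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    secret_key: str


def load_settings(env=os.environ) -> Settings:
    """Build settings from a mapping of environment variables.

    All variable names here are illustrative assumptions.
    """
    secret = env.get("THREAT_SECRET_KEY")
    if not secret:
        # Fail fast instead of falling back to a hard-coded secret.
        raise RuntimeError("THREAT_SECRET_KEY must be set")
    return Settings(
        host=env.get("THREAT_API_HOST", "127.0.0.1"),
        port=int(env.get("THREAT_API_PORT", "5000")),
        secret_key=secret,
    )
```

Passing the mapping as a parameter keeps the function easy to test with a plain dictionary, while production code can call `load_settings()` with no arguments.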

Dataset Requirements

The training pipeline expects a KDD-style network intrusion dataset in ARFF format. After preprocessing, a numeric CSV file is generated for model training.

Raw Input Format

- File type: .arff
- Structure: Tabular network traffic data
- Mandatory label column: class

Each row represents a single network event or flow, and each column represents a feature or label.

Preprocessing Pipeline

The preprocessing step:

- Loads ARFF-formatted data
- Decodes categorical features
- Encodes all non-numeric fields using label encoding
- Outputs a fully numeric CSV file for training
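
The steps above can be sketched with the standard library alone. This is a minimal illustration and not the code in src/preprocess.py: the function names, the sample data, and the tiny ARFF reader (which handles only dense rows without quoted commas) are all assumptions made for the example.

```python
import csv
import io

# Hypothetical three-row sample in ARFF form.
SAMPLE = """@relation demo
@attribute duration real
@attribute protocol_type {tcp,udp,icmp}
@attribute class {normal,anomaly}
@data
0,tcp,normal
12,udp,anomaly
3,tcp,normal
"""


def read_arff(text):
    """Parse a simple dense ARFF document into (column_names, rows)."""
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            rows.append([v.strip().strip("'\"") for v in line.split(",")])
        elif line.lower().startswith("@attribute"):
            names.append(line.split(None, 2)[1].strip("'\""))
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def label_encode(rows):
    """Replace each non-numeric column with integer codes (sorted label order)."""
    columns = [list(col) for col in zip(*rows)]
    mappings = {}
    for idx, col in enumerate(columns):
        if all(is_number(v) for v in col):
            continue  # already numeric, leave untouched
        codes = {label: code for code, label in enumerate(sorted(set(col)))}
        mappings[idx] = codes
        columns[idx] = [str(codes[v]) for v in col]
    return [list(r) for r in zip(*columns)], mappings


def to_csv(names, rows):
    """Serialize the header and encoded rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return buf.getvalue()


names, rows = read_arff(SAMPLE)
encoded, mappings = label_encode(rows)
csv_text = to_csv(names, encoded)
```

Keeping the `mappings` dictionary matters in practice: the same label-to-code mapping has to be applied at prediction time, otherwise the model sees differently encoded inputs.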

⚠ The original datasets used during development are not included due to privacy, security, and size considerations.

Required Data Format

The preprocessing pipeline expects tabular data in CSV or ARFF format with features similar to the following:

| Feature Name | Description |
|---|---|
| `timestamp` | Event or flow timestamp |
| `src_ip` | Source IP address |
| `dst_ip` | Destination IP address |
| `src_port` | Source port |
| `dst_port` | Destination port |
| `protocol` | Network protocol (TCP, UDP, ICMP, etc.) |
| `packet_count` | Number of packets |
| `byte_count` | Number of bytes transferred |
| `flow_duration` | Duration of the network flow |
| `flag_counts` | TCP flag statistics |
| `class` | Normal / Attack (or attack category) |

✅ The class column is mandatory for supervised training.
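
A quick header check can catch schema problems before preprocessing starts. The sketch below is an assumption about how such a check might look, not part of the project: `check_header` and `check_csv` are hypothetical helpers that treat a missing `class` column as an error and report other missing features as a list, since the table above describes them as similar rather than strictly required.

```python
import csv
import io

EXPECTED_FEATURES = [
    "timestamp", "src_ip", "dst_ip", "src_port", "dst_port",
    "protocol", "packet_count", "byte_count", "flow_duration", "flag_counts",
]
LABEL_COLUMN = "class"


def check_header(header):
    """Return expected features absent from the header.

    Raises ValueError if the mandatory label column is absent.
    """
    present = {name.strip() for name in header}
    if LABEL_COLUMN not in present:
        raise ValueError(f"missing mandatory label column: {LABEL_COLUMN!r}")
    return [name for name in EXPECTED_FEATURES if name not in present]


def check_csv(text):
    """Run the header check on CSV text (first row is the header)."""
    reader = csv.reader(io.StringIO(text))
    return check_header(next(reader, []))


missing = check_csv("src_ip,dst_ip,protocol,class\n1.1.1.1,2.2.2.2,TCP,Normal\n")
```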

Data Preprocessing

The preprocessing pipeline includes:

- Label encoding of categorical features
- Handling missing or inconsistent values
- Class imbalance mitigation (oversampling / weighting)
- Feature normalization where required

Implemented in:

src/preprocess.py
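
To make three of these steps concrete, here is a small standard-library sketch of median imputation, balanced class weighting, and min-max normalization. It is illustrative only: the function names are hypothetical, and the README does not state which imputation or scaling method src/preprocess.py actually uses. The weighting formula, n_samples / (n_classes * count), is the common "balanced" heuristic.

```python
from collections import Counter
from statistics import median


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    mid = median(observed)
    return [mid if v is None else v for v in values]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - lo) / (hi - lo) for v in values]


labels = ["normal"] * 8 + ["attack"] * 2
weights = balanced_class_weights(labels)
```

With eight normal rows and two attack rows, the minority class receives the larger weight, so errors on attacks cost more during training.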

Using Your Own Dataset

1. Place the dataset inside the data/ directory.
2. Ensure it follows the feature structure described above.
3. Run preprocessing:

   python src/preprocess.py

4. Train the model:

   python src/train.py

Trained models are automatically saved to models/.

Synthetic & Test Data

For experimentation:

You may use either of the following:

- Synthetic datasets
- Simulated network traffic matching the schema

This allows testing without exposing real or sensitive data.
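
One way to produce such data is a small generator that emits rows with the column names from the schema table. The sketch below is an assumption for experimentation, not a project utility: the value ranges, the attack ratio, the single-integer `flag_counts`, and the rule that attacks have higher packet counts are all invented for illustration and do not model real traffic.

```python
import csv
import io
import random
from datetime import datetime, timedelta, timezone

COLUMNS = [
    "timestamp", "src_ip", "dst_ip", "src_port", "dst_port", "protocol",
    "packet_count", "byte_count", "flow_duration", "flag_counts", "class",
]


def synthetic_flows(n, attack_ratio=0.1, seed=0):
    """Yield n fake flow records; all value ranges are arbitrary."""
    rng = random.Random(seed)  # seeded for reproducible test data
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for i in range(n):
        is_attack = rng.random() < attack_ratio
        packets = rng.randint(500, 5000) if is_attack else rng.randint(1, 200)
        yield {
            "timestamp": (start + timedelta(seconds=i)).isoformat(),
            "src_ip": f"10.0.{rng.randint(0, 255)}.{rng.randint(1, 254)}",
            "dst_ip": f"192.168.1.{rng.randint(1, 254)}",
            "src_port": rng.randint(1024, 65535),
            "dst_port": rng.choice([22, 53, 80, 443]),
            "protocol": rng.choice(["TCP", "UDP", "ICMP"]),
            "packet_count": packets,
            "byte_count": packets * rng.randint(40, 1500),
            "flow_duration": round(rng.uniform(0.01, 60.0), 3),
            "flag_counts": rng.randint(0, 20),
            "class": "Attack" if is_attack else "Normal",
        }


def flows_to_csv(rows):
    """Serialize flow records to CSV text with the schema's column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


csv_text = flows_to_csv(synthetic_flows(3))
```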

⚠ Preprocessing must be completed before training.
Ensure data/KDDTrain+Multi.csv is generated by src/preprocess.py.

🚀 Installation & Setup

Prerequisites

- Python 3.11+
- pip
- Virtual environment tool (venv recommended)

Installation Steps

1️⃣ Clone the repository

git clone https://github.com/0Manav0/AI-threat-detect.git
cd AI-threat-detect

2️⃣ Create and activate virtual environment

python -m venv venv
source venv/bin/activate      # Linux / macOS
venv\Scripts\activate         # Windows

3️⃣ Install dependencies

pip install -r requirements.txt

🟠 Usage Guide

A. Preprocess Data

python src/preprocess.py

B. Train the Model

python src/train.py

C. Test Predictions (Optional)

python src/predict.py

D. Deploy API

python src/deploy.py

➡ Visit: http://127.0.0.1:5000

Submit requests through the web interface.

📁 Project Structure

cybersecurity-threat-ai/
├── models/            # Trained ML models
├── data/              # Input datasets (user-provided)
├── templates/         # HTML templates
├── static/            # CSS, JS, assets
├── src/               # Core AI & API logic
│   ├── preprocess.py
│   ├── train.py
│   ├── predict.py
│   └── deploy.py
├── requirements.txt
├── README.md
├── .gitignore
└── LICENSE

📖 How It Works

🔍 Data Ingestion

Network or log data is loaded, cleaned, and normalized.

📊 Feature Engineering

Traffic behavior, protocol patterns, and statistical features are extracted.

🤖 Model Training

ML models learn patterns distinguishing normal and malicious activity.

🚨 Anomaly Detection

Incoming data is scored to detect suspicious behavior.

⚡ Alert Prioritization

Threats are ranked using model confidence and risk scoring logic.
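
The README does not spell out the scoring formula, so the following is only one plausible sketch: multiply the model's confidence by a per-category severity weight and sort descending. The `Alert` class, the weights, and the use of KDD-style category names (`dos`, `probe`, `r2l`, `u2r`) are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical severity weights per attack category (0 to 1).
SEVERITY_WEIGHTS = {"dos": 0.7, "probe": 0.4, "r2l": 0.9, "u2r": 1.0}
DEFAULT_WEIGHT = 0.5  # used for categories not listed above


@dataclass
class Alert:
    alert_id: str
    category: str
    confidence: float  # model probability that the event is malicious


def risk_score(alert):
    """Score an alert on a 0-100 scale: confidence times severity weight."""
    weight = SEVERITY_WEIGHTS.get(alert.category, DEFAULT_WEIGHT)
    return round(100 * alert.confidence * weight, 1)


def prioritize(alerts):
    """Return alerts ordered from highest to lowest risk score."""
    return sorted(alerts, key=risk_score, reverse=True)


alerts = [
    Alert("a", "probe", 0.9),
    Alert("b", "u2r", 0.6),
    Alert("c", "dos", 0.8),
]
ranked = prioritize(alerts)
```

In this toy input the high-confidence probe ends up last: a confident detection of a low-severity category ranks below a less certain detection of a privilege-escalation attempt, which is the behavior a severity-weighted queue is meant to produce.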

📈 Visualization

A dashboard presents alerts, trends, and insights for analysts.
