
LSTM Autoencoder–based anomaly detection system for CAN-Bus traffic, featuring synthetic attack generation and reconstruction-error classification.


Yigtwxx/CANomaly-LSTM


🚗 CANomaly-LSTM

LSTM Autoencoder–Based Anomaly Detection for automotive CAN-Bus

Built with Python and PyTorch.



✨ Overview

CANomaly-LSTM is an end-to-end anomaly detection pipeline for Controller Area Network (CAN) security. As modern vehicles become more connected, their in-vehicle networks are increasingly exposed to cyberattacks. This project uses deep learning to identify malicious activity on the CAN bus.

The system uses an LSTM (Long Short-Term Memory) autoencoder to learn the temporal patterns of normal CAN traffic. Traffic that the model reconstructs poorly, i.e. with a high reconstruction error, is flagged as anomalous; the approach targets attacks such as spoofing, replay, and DoS attempts. Training requires no labeled attack data: labels are needed only to choose the detection threshold and to evaluate the model.
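The detection rule can be sketched as follows. This is a minimal illustration, assuming each window is a list of per-message feature vectors and that its reconstruction comes from the trained autoencoder; the mean-squared-error metric and the function names `reconstruction_error` and `is_anomalous` are assumptions, not necessarily what the repository computes.

```python
from typing import Sequence

# A window is a sequence of time steps, each a sequence of feature values.
Window = Sequence[Sequence[float]]


def reconstruction_error(window: Window, recon: Window) -> float:
    """Mean squared error over all time steps and features (assumed error metric)."""
    total, count = 0.0, 0
    for x_t, r_t in zip(window, recon):
        for x, r in zip(x_t, r_t):
            total += (x - r) ** 2
            count += 1
    return total / count


def is_anomalous(window: Window, recon: Window, threshold: float) -> bool:
    # A window the autoencoder cannot reproduce well is flagged as an attack.
    return reconstruction_error(window, recon) > threshold


# Toy example: a closely reconstructed window vs. a poorly reconstructed one.
normal = [[0.1, 0.2], [0.3, 0.4]]
good_recon = [[0.1, 0.2], [0.3, 0.5]]
bad_recon = [[0.9, 0.9], [0.9, 0.9]]
flag_good = is_anomalous(normal, good_recon, threshold=0.1)
flag_bad = is_anomalous(normal, bad_recon, threshold=0.1)
```

Because the autoencoder only ever sees normal traffic during training, it has no incentive to reproduce attack patterns well, which is what makes the error usable as an anomaly score.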

Key Capabilities

  • Synthetic Traffic Generation: Generates synthetic CAN traffic with configurable normal patterns and injected attack scenarios.
  • Unsupervised Learning: Trains only on normal data, so it can in principle flag previously unseen (zero-day) attacks that deviate from normal traffic.
  • Automated Thresholding: Selects the reconstruction-error threshold that maximizes the F1-score on labeled evaluation data.
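One straightforward way to pick an F1-maximizing threshold is to try every distinct reconstruction error as a candidate and keep the best. This is a minimal sketch, assuming per-sample errors and binary labels (1 = attack) are already available, for example from data/recon_errors.csv; `f1_at` and `best_f1_threshold` are hypothetical names, not functions from the repository.

```python
from typing import Sequence, Tuple


def f1_at(errors: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1-score when every sample with error > threshold is predicted as an attack."""
    tp = fp = fn = 0
    for err, label in zip(errors, labels):
        predicted_attack = err > threshold
        if predicted_attack and label == 1:
            tp += 1
        elif predicted_attack:  # predicted attack, actually normal
            fp += 1
        elif label == 1:  # predicted normal, actually attack
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(errors: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Try each distinct observed error as a threshold; return (threshold, best F1)."""
    best_threshold, best_f1 = min(errors), -1.0
    for candidate in sorted(set(errors)):
        score = f1_at(errors, labels, candidate)
        if score > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_threshold, best_f1 = candidate, score
    return best_threshold, best_f1


errors = [0.10, 0.20, 0.15, 0.90, 0.80, 0.30]
labels = [0, 0, 0, 1, 1, 0]
threshold, f1 = best_f1_threshold(errors, labels)
```

This brute-force sweep is quadratic in the number of samples; for large datasets, sorting once and updating TP/FP counts cumulatively gives the same result in O(n log n).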

💡 Features

🛡️ Comprehensive Attack Simulation

The built-in generator supports 4 distinct attack types to test system robustness:

  • Spoofing: Injecting fake messages with legitimate IDs.
  • Replay: Re-transmitting valid captured messages to deceive ECUs.
  • Unauthorized ID: Broadcasting messages with IDs not defined in the system DBC.
  • Payload Corruption: Randomizing data bytes to simulate fuzzing or sensor malfunctions.
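The four attack types can be sketched as transformations on a time-ordered message trace. This is an illustrative sketch, not the code in src/generate_can_dataset.py: the `CanMessage` record, the function names, the uniform random payloads, and the 11-bit standard ID range are all assumptions.

```python
import random
from dataclasses import dataclass, replace
from typing import AbstractSet, List, Tuple


@dataclass(frozen=True)
class CanMessage:
    timestamp: float        # seconds
    can_id: int             # 11-bit standard identifier (assumed)
    data: Tuple[int, ...]   # 8 payload bytes
    label: int = 0          # 0 = normal, 1 = attack


def random_payload(rng: random.Random) -> Tuple[int, ...]:
    return tuple(rng.randrange(256) for _ in range(8))


def spoofing(victim: CanMessage, t: float, rng: random.Random) -> CanMessage:
    # New fake frame reusing a legitimate ID with a fabricated payload.
    return CanMessage(t, victim.can_id, random_payload(rng), label=1)


def replay(captured: CanMessage, t: float) -> CanMessage:
    # Exact copy of an earlier valid frame, re-sent at a later time.
    return replace(captured, timestamp=t, label=1)


def unauthorized_id(known_ids: AbstractSet[int], t: float, rng: random.Random) -> CanMessage:
    # Frame whose ID is not among the IDs the network defines.
    unknown = [i for i in range(0x800) if i not in known_ids]
    return CanMessage(t, rng.choice(unknown), random_payload(rng), label=1)


def payload_corruption(msg: CanMessage, rng: random.Random) -> CanMessage:
    # Same ID and timing as an existing frame, but randomized data bytes.
    return replace(msg, data=random_payload(rng), label=1)


def inject(trace: List[CanMessage], frame: CanMessage) -> List[CanMessage]:
    # Insert an attack frame and keep the trace ordered by timestamp.
    return sorted(trace + [frame], key=lambda m: m.timestamp)


rng = random.Random(0)
known_ids = {0x100, 0x200}
normal = [CanMessage(0.00, 0x100, (0,) * 8),
          CanMessage(0.01, 0x200, (1,) * 8),
          CanMessage(0.02, 0x100, (0,) * 8)]
attacked = inject(normal, spoofing(normal[0], 0.005, rng))
attacked = inject(attacked, unauthorized_id(known_ids, 0.015, rng))
attacked = inject(attacked, replay(normal[1], 0.03))
corrupted = payload_corruption(normal[2], rng)
```

Spoofing, replay, and unauthorized-ID attacks add frames to the bus, while payload corruption modifies an existing frame in place, so the latter leaves the message timing untouched.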

🧠 Advanced Model Architecture

  • Input Features: One-hot encoded CAN IDs + Normalized Payload (8 bytes) + Inter-Arrival Time (IAT).
  • Sliding Window: Processes data in sequences (window size: 50, stride: 5) to capture temporal context.
  • Autoencoder: Compresses each input window into a latent representation and reconstructs the window from it; a high reconstruction error indicates an anomaly.
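The feature layout and windowing described above can be sketched as follows. This assumes raw payload bytes are scaled to [0, 1] by dividing by 255 and that the IAT is measured between consecutive frames on the whole bus (it could instead be per ID); the helper names are hypothetical.

```python
from typing import List, Sequence


def inter_arrival_times(timestamps: Sequence[float]) -> List[float]:
    """Time since the previous frame on the bus; 0.0 for the first frame (assumed convention)."""
    return [0.0] + [b - a for a, b in zip(timestamps, timestamps[1:])]


def encode_message(can_id: int, data: Sequence[int], iat: float,
                   id_vocab: Sequence[int]) -> List[float]:
    """One-hot CAN ID + 8 payload bytes scaled to [0, 1] + inter-arrival time."""
    one_hot = [1.0 if can_id == known else 0.0 for known in id_vocab]
    payload = [b / 255.0 for b in data]  # assumed min-max scaling of raw bytes
    return one_hot + payload + [iat]


def sliding_windows(rows: Sequence, window: int = 50, stride: int = 5) -> List[list]:
    """Overlapping windows of `window` consecutive rows, advancing by `stride` rows."""
    return [list(rows[s:s + window]) for s in range(0, len(rows) - window + 1, stride)]


ids = [0x100, 0x200, 0x300]
times = [i * 0.01 for i in range(120)]
iats = inter_arrival_times(times)
rows = [encode_message(ids[i % 3], [i % 256] * 8, iats[i], ids) for i in range(120)]
windows = sliding_windows(rows)  # (120 - 50) // 5 + 1 = 15 windows
```

Each row has len(id_vocab) + 8 + 1 features, and a trace of n messages yields (n - 50) // 5 + 1 windows when n is at least 50. A stride smaller than the window size means each message appears in up to 10 overlapping windows.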

📁 Project Structure

CANomaly-LSTM/
├── data/                    # Data storage
│   ├── can_data.csv         # Generated synthetic dataset
│   └── recon_errors.csv     # Model outputs (errors & labels)
│
├── outputs/                 # Results & Visualizations
│   ├── confusion_matrix.png # Visual performance metric
│   └── confusion_report.txt # Detailed classification metrics
│
├── src/                     # Source Code
│   ├── generate_can_dataset.py  # Data generation with attack injection
│   ├── train_lstm_ae.py         # LSTM Autoencoder training loop
│   └── plot_confusion.py        # Evaluation & plotting scripts
│
├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt         # Python dependencies
└── SECURITY.md

🛠️ Installation

  1. Clone the repository

    git clone https://github.com/Yigtwxx/CANomaly-LSTM.git
    cd CANomaly-LSTM
  2. Create a Virtual Environment (Optional but Recommended)

    python -m venv .venv
    # Windows:
    .venv\Scripts\activate
    # Linux/Mac:
    source .venv/bin/activate
  3. Install Dependencies

    pip install -r requirements.txt

🚀 Usage

Follow these steps to run the complete pipeline:

1. Data Generation

Generate a new synthetic dataset containing both normal traffic and injected attacks.

python src/generate_can_dataset.py

Output: data/can_data.csv

2. Model Training

Train the LSTM Autoencoder on the normal subset of the data.

python src/train_lstm_ae.py

Output: The trained model is kept in memory only (it is not saved to disk); reconstruction errors are written to data/recon_errors.csv.
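A minimal PyTorch sketch of an LSTM autoencoder trained only on normal windows is shown below. The layer sizes (hidden=64, latent=16), the decoder design that repeats the latent vector across time steps, full-batch Adam training, and the names `LSTMAutoencoder`, `train`, and `window_errors` are assumptions for illustration, not the contents of src/train_lstm_ae.py.

```python
import torch
from torch import nn


class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64, latent: int = 16):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent)
        self.from_latent = nn.Linear(latent, hidden)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.output = nn.Linear(hidden, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, n_features)
        _, (h, _) = self.encoder(x)             # h: (num_layers, batch, hidden)
        z = self.to_latent(h[-1])               # (batch, latent) bottleneck
        seq_len = x.size(1)
        dec_in = self.from_latent(z).unsqueeze(1).repeat(1, seq_len, 1)
        out, _ = self.decoder(dec_in)           # (batch, window, hidden)
        return self.output(out)                 # (batch, window, n_features)


def train(model: nn.Module, normal_windows: torch.Tensor,
          epochs: int = 5, lr: float = 1e-3) -> float:
    """Full-batch training on normal windows only; returns the final loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(normal_windows), normal_windows)
        loss.backward()
        opt.step()
    return loss.item()


@torch.no_grad()
def window_errors(model: nn.Module, windows: torch.Tensor) -> torch.Tensor:
    """Mean squared reconstruction error per window."""
    model.eval()
    recon = model(windows)
    return ((recon - windows) ** 2).mean(dim=(1, 2))


torch.manual_seed(0)
normal = torch.rand(32, 50, 12)  # 32 windows, window size 50, 12 features (toy data)
model = LSTMAutoencoder(n_features=12)
final_loss = train(model, normal, epochs=3)
errs = window_errors(model, normal)
```

A real training run would iterate over mini-batches (for example with `torch.utils.data.DataLoader`) and compute errors on held-out windows that include the injected attacks.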

3. Evaluation

Calculate metrics, find the optimal threshold, and generate the confusion matrix.

python src/plot_confusion.py

Output: outputs/confusion_matrix.png & outputs/confusion_report.txt
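The evaluation step can be approximated with the standard library: load errors and labels, apply a threshold, and derive the confusion matrix and metrics. This is a sketch under assumptions: the CSV column names `error` and `label` are hypothetical (check the header of data/recon_errors.csv), and the plotting done by src/plot_confusion.py is omitted.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO, Tuple


def load_errors(f: TextIO) -> Tuple[List[float], List[int]]:
    # Column names "error" and "label" are assumed, not taken from the repository.
    rows = [(float(r["error"]), int(r["label"])) for r in csv.DictReader(f)]
    return [e for e, _ in rows], [y for _, y in rows]


def confusion_matrix(errors: Sequence[float], labels: Sequence[int],
                     threshold: float) -> List[List[int]]:
    """[[TN, FP], [FN, TP]]: rows = true class, columns = predicted class."""
    cm = [[0, 0], [0, 0]]
    for err, label in zip(errors, labels):
        predicted = 1 if err > threshold else 0
        cm[label][predicted] += 1
    return cm


def metrics(cm: List[List[int]]) -> Dict[str, float]:
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / (tn + fp + fn + tp),
            "precision": precision, "recall": recall, "f1": f1}


sample = "error,label\n0.10,0\n0.20,0\n0.90,1\n0.40,1\n0.70,0\n"
errors, labels = load_errors(io.StringIO(sample))
cm = confusion_matrix(errors, labels, threshold=0.5)
report = metrics(cm)
```

With a real file, replace `io.StringIO(sample)` with `open("data/recon_errors.csv", newline="")`.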


📊 Results

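On the generated dataset, the model separates normal traffic from attack traffic with high precision but moderate recall: almost all of its alerts are genuine attacks, but roughly two in five attack instances go undetected.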

Confusion Matrix

[Figure: confusion matrix, saved to outputs/confusion_matrix.png]

Performance Metrics

Metric      Score    Description
Accuracy    98.91%   Share of all samples classified correctly; high partly because normal samples dominate the dataset.
Precision   95.52%   Share of raised alerts that are true attacks (few false positives).
Recall      60.95%   Share of attack instances detected (roughly 3 in 5).
F1-Score    0.7442   Harmonic mean of precision and recall.

(Metrics computed at the F1-maximizing threshold of 0.665126.)
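The reported numbers can be cross-checked: F1 follows directly from precision and recall, and together with accuracy they imply the approximate share of attack samples in the evaluation data. This assumes a binary normal/attack evaluation; the inputs are the rounded values from the table above, so the derived share is approximate.

```python
precision, recall, accuracy = 0.9552, 0.6095, 0.9891

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Per attack sample, errors are missed attacks (1 - R) plus false alarms R(1 - P)/P.
# The overall error rate 1 - accuracy equals prevalence times that quantity.
prevalence = (1 - accuracy) / ((1 - recall) + recall * (1 - precision) / precision)

print(f"F1 = {f1:.4f}")                   # F1 = 0.7442
print(f"attack share = {prevalence:.1%}")  # attack share = 2.6%
```

With only about 2.6% of samples being attacks, a detector that never raised an alert would already score roughly 97% accuracy, which is why precision, recall, and F1 are the more informative metrics here.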


💬 Contact

Yiğit Erdoğan


Note: This project is for educational and research purposes. Always test security tools in controlled environments.
