GitHub - Trojan3877/LogSight-AI: LogSight-AI is a real-time AIOps platform that ingests Kubernetes logs at > 50 k lines/sec, tokenizes them with a C++ SIMD engine, clusters patterns on-the-fly using HDBSCAN + Isolation Forest

LogSight-AI — Real-Time AIOps Log Intelligence Platform

LogSight-AI is a real-time AIOps platform designed to ingest, analyze, and monitor log data streams using machine learning to detect anomalies, failures, and system irregularities.

The system bridges:

Log ingestion pipelines
Streaming analytics
Machine learning inference
Observability dashboards

Core Capabilities

Real-time log ingestion and parsing
Anomaly detection using ML models
Time-series pattern recognition
Alert generation for system anomalies
Monitoring dashboards (Streamlit / Grafana)
API-driven inference layer

System Architecture

Log Sources → Streaming Pipeline → Feature Extraction → ML Model → Anomaly Detection → Dashboard / Alerts

Tech Stack

Layer	Technology
Language	Python
Backend API	FastAPI
Dashboard	Streamlit
ML Tracking	MLflow
Containerization	Docker
Orchestration	Kubernetes
Monitoring	Prometheus + Grafana

Data Flow

Logs are ingested from system sources
Streaming pipeline processes events in real-time
Features are extracted from log patterns
ML model detects anomalies
Results are visualized and monitored
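
The stages above can be pictured as a chain of Python generators, where each stage consumes the previous one lazily. This is only a minimal sketch under stated assumptions: the line format (`<timestamp> <LEVEL> <message>`), the function names, and the placeholder detection rule are all hypothetical and are not taken from the `logsight` package.

```python
import re
from typing import Iterable, Iterator

# Assumed line format (hypothetical): "<timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")


def ingest(lines: Iterable[str]) -> Iterator[str]:
    """Ingestion: yield non-empty raw lines from any iterable source."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def extract_features(lines: Iterable[str]) -> Iterator[dict]:
    """Feature extraction: parse each line into a small feature record."""
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not match the assumed format
        yield {
            "ts": m["ts"],
            "is_error": m["level"] in {"ERROR", "CRITICAL"},
            "length": len(m["msg"]),
        }


def detect(records: Iterable[dict], max_length: int = 200) -> Iterator[dict]:
    """Detection: placeholder rule standing in for a real model."""
    for rec in records:
        rec["anomaly"] = rec["is_error"] or rec["length"] > max_length
        yield rec


def run_pipeline(lines: Iterable[str]) -> list[dict]:
    """Reporting: collect flagged records for a dashboard or alert sink."""
    return [r for r in detect(extract_features(ingest(lines))) if r["anomaly"]]


sample = [
    "2024-01-01T00:00:00Z INFO service started",
    "",
    "2024-01-01T00:00:01Z ERROR database connection refused",
    "not a structured line",
]
flagged = run_pipeline(sample)
```

Because every stage is a generator, no stage holds more than one record in memory at a time, which is one simple way to keep a streaming pipeline lightweight.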

Use Cases

Infrastructure monitoring
Failure detection
Incident response automation
Cloud system observability
DevOps / SRE automation

Performance & Design Considerations

Low-latency streaming inference
Scalable microservices architecture
Efficient memory usage for log parsing
Horizontal scaling via Kubernetes
Real-time dashboard updates

Why This Project Matters

Modern systems generate massive volumes of logs.

This project demonstrates:

Real-time AI system design
Production-grade observability architecture
ML applied to infrastructure reliability
End-to-end AIOps pipeline implementation

🚀 How to Run

Prerequisites

Python 3.9+
Docker (optional, for containerized runs)

Local Installation

# 1. Clone the repository
git clone https://github.com/Trojan3877/LogSight-AI.git
cd LogSight-AI

# 2. (Recommended) Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 3. Install the package with all dependencies
pip install -e ".[dev]"

# 4. Verify the installation
logsight health

Analyse a Log File

# Analyse a local log file
logsight analyze /var/log/syslog

# Pipe logs from stdin
cat app.log | logsight stdin

# Adjust thresholds
logsight analyze app.log --threshold 3.0 --window 200 --spike-threshold 0.3

Environment Variables

Copy .env.example to .env and customise as needed:

cp .env.example .env

Variable	Default	Description
`LOGSIGHT_THRESHOLD`	`2.5`	Z-score threshold for anomaly detection
`LOGSIGHT_WINDOW`	`100`	Sliding-window size for spike detection
`LOGSIGHT_SPIKE_THRESHOLD`	`0.25`	Error-rate fraction that constitutes a spike
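
To make the three settings concrete, the sketch below reads them from the environment with the defaults listed in the table and applies a plain z-score check. It is an illustration only: `Settings`, `load_settings`, and `is_zscore_anomaly` are hypothetical names, and how the `logsight` package itself parses these variables or computes its score is not documented here.

```python
import os
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    threshold: float        # z-score above which a value is treated as anomalous
    window: int             # number of recent events kept for spike detection
    spike_threshold: float  # error-rate fraction that counts as a spike


def load_settings(env=os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        threshold=float(env.get("LOGSIGHT_THRESHOLD", "2.5")),
        window=int(env.get("LOGSIGHT_WINDOW", "100")),
        spike_threshold=float(env.get("LOGSIGHT_SPIKE_THRESHOLD", "0.25")),
    )


def is_zscore_anomaly(history: list[float], value: float, threshold: float) -> bool:
    """True when `value` is more than `threshold` sample std devs from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


settings = load_settings({})  # empty mapping, so the defaults apply
history = [10, 12, 11, 9, 10, 11, 12, 10]  # e.g. errors per minute (made-up data)
```

With this made-up history, a reading of 30 is far outside the usual range and is flagged, while a reading of 11 is not.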

Docker

# Build the image
docker build -t logsight-ai:latest .

# Verify the container starts correctly
docker run --rm logsight-ai:latest health

# Analyse a log file from the host
docker run --rm \
  -v /var/log:/logs:ro \
  logsight-ai:latest analyze /logs/syslog

Running Tests

# Run the full test suite
pytest

# Run with coverage report
pytest --cov=logsight --cov-report=term-missing

CI/CD

GitHub Actions automatically runs linting and tests on every push and pull request (see .github/workflows/ci.yml).

📌 Future Improvements

LLM-based log summarization
Root cause analysis using AI agents
Distributed log ingestion (Kafka integration)
Advanced anomaly detection (transformers, LSTMs)

❓ Why did you build LogSight-AI?

Modern distributed systems generate massive volumes of logs, making manual monitoring inefficient and error-prone. LogSight-AI was built to automate log analysis using machine learning, enabling real-time anomaly detection and improving system reliability.

❓ What problem does this solve?

Traditional log monitoring systems rely on static rules and thresholds, which often fail in dynamic environments. LogSight-AI addresses this by:

Learning patterns from historical log data
Detecting anomalies in real-time
Reducing alert fatigue through intelligent filtering
Improving incident response time
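
One simple form of alert filtering is a per-source cooldown, so that a flapping component produces one alert rather than hundreds. The sketch below shows that idea only; it is an assumption about what "intelligent filtering" could look like, not a description of how LogSight-AI filters alerts, and `AlertThrottle` is a hypothetical name.

```python
class AlertThrottle:
    """Suppress repeat alerts for the same key within a cooldown period."""

    def __init__(self, cooldown_seconds: float = 300.0) -> None:
        self.cooldown = cooldown_seconds
        self._last_sent: dict[str, float] = {}

    def should_send(self, key: str, now: float) -> bool:
        """Return True if an alert for `key` should be emitted at time `now`."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still inside the cooldown for this key
        self._last_sent[key] = now
        return True


throttle = AlertThrottle(cooldown_seconds=60)
decisions = [
    throttle.should_send("db", now=0),    # first alert for "db"
    throttle.should_send("db", now=30),   # within 60 s of the last sent alert
    throttle.should_send("api", now=31),  # different key, tracked separately
    throttle.should_send("db", now=61),   # cooldown for "db" has elapsed
]
```

Passing the clock in as `now` keeps the class deterministic and easy to test; a caller would typically supply `time.monotonic()`.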

❓ How does the system work end-to-end?

Logs are ingested from system sources
Streaming pipeline processes incoming data
Features are extracted (timestamps, frequency, patterns)
Machine learning model evaluates log behavior
Anomalies are detected and flagged
Results are visualized in dashboards and alerts
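
As an illustration of the feature-extraction step (timestamps, frequency, patterns), the sketch below masks variable tokens to turn each message into a template and counts events per minute. The masking rules, function names, and sample events are assumptions made for this example and are not the project's actual feature set.

```python
import re
from collections import Counter
from datetime import datetime

# Mask variable parts so lines that differ only in IDs or numbers share a template.
# Order matters: IP addresses are masked before bare numbers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message: str) -> str:
    """Pattern feature: replace variable tokens with placeholders."""
    for pattern, token in _MASKS:
        message = pattern.sub(token, message)
    return message


def minute_bucket(timestamp: str) -> str:
    """Timestamp feature: truncate an ISO-8601 timestamp to minute resolution."""
    return datetime.fromisoformat(timestamp).strftime("%Y-%m-%dT%H:%M")


def extract(events):
    """Frequency features: events per minute and occurrences per template."""
    per_minute = Counter()
    templates = Counter()
    for ts, msg in events:
        per_minute[minute_bucket(ts)] += 1
        templates[to_template(msg)] += 1
    return per_minute, templates


events = [
    ("2024-01-01T00:00:05", "timeout after 30 ms connecting to 10.0.0.7"),
    ("2024-01-01T00:00:40", "timeout after 45 ms connecting to 10.0.0.9"),
    ("2024-01-01T00:01:10", "user 42 logged in"),
]
per_minute, templates = extract(events)
```

The two timeout lines collapse into a single template, so a downstream detector can reason about how often a kind of event occurs rather than about each unique string.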

❓ Why use machine learning for logs instead of rules?

Rule-based systems:

Break in dynamic environments
Require constant manual updates

ML-based systems:

Adapt to changing system behavior
Detect unknown patterns
Reduce human intervention

❓ What type of machine learning is used?

The system focuses on:

Time-series anomaly detection
Unsupervised / semi-supervised learning
Pattern recognition in log sequences

Future improvements may include:

Transformer-based anomaly detection
LSTM-based sequence modeling

❓ How is real-time performance handled?

Streaming ingestion minimizes latency
Lightweight feature extraction ensures fast processing
Model inference is optimized for low-latency execution
Containerized deployment allows horizontal scaling
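
One way to keep per-event work constant is to maintain statistics incrementally instead of recomputing them over stored history. The sketch below uses Welford's online algorithm for the running mean and variance; it is offered as an example of lightweight streaming computation, not as the project's implementation, and `RunningStats` is a hypothetical name.

```python
import math


class RunningStats:
    """Welford's online algorithm: O(1) time and memory per update."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stdev(self) -> float:
        """Sample standard deviation; 0.0 until at least two values are seen."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def zscore(self, x: float) -> float:
        """Distance of `x` from the running mean in standard deviations."""
        s = self.stdev
        return (x - self.mean) / s if s > 0 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```

Each update touches three numbers regardless of how many events have been seen, so the cost of scoring a new event does not grow with the length of the stream.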

❓ How does the system scale?

LogSight-AI is designed with scalability in mind:

Docker for containerization
Kubernetes for orchestration
Stateless services for horizontal scaling
Monitoring via Prometheus + Grafana

❓ How are anomalies defined?

Anomalies are deviations from learned normal behavior, such as:

Sudden spikes in error logs
Unusual frequency patterns
Unexpected log sequences
Rare or unseen events
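
Two of these definitions are easy to sketch directly: an error-rate spike over a sliding window, and a first-seen event. The example below is a hedged illustration; the class names are hypothetical, the defaults mirror the documented `LOGSIGHT_WINDOW` and `LOGSIGHT_SPIKE_THRESHOLD` values, and the choices to wait for a full window and to use a strict `>` comparison are assumptions.

```python
from collections import deque


class SpikeDetector:
    """Flag a spike when the error fraction in the last `window` events is too high."""

    def __init__(self, window: int = 100, spike_threshold: float = 0.25) -> None:
        self.events = deque(maxlen=window)  # oldest entries drop off automatically
        self.spike_threshold = spike_threshold

    def observe(self, is_error: bool) -> bool:
        self.events.append(is_error)
        if len(self.events) < self.events.maxlen:
            return False  # assumption: stay quiet until the window is full
        error_rate = sum(self.events) / len(self.events)
        return error_rate > self.spike_threshold


class NoveltyTracker:
    """Flag log templates that have not been seen before."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def is_new(self, template: str) -> bool:
        if template in self.seen:
            return False
        self.seen.add(template)
        return True


detector = SpikeDetector(window=4, spike_threshold=0.25)
stream = [False, False, False, True, True, False]
spikes = [detector.observe(e) for e in stream]

tracker = NoveltyTracker()
novel = [tracker.is_new(t) for t in ["disk full", "disk full", "oom killed"]]
```

With a window of 4, the first full window holds one error in four (exactly 0.25, not above the threshold), and the next two windows hold two errors in four, which is flagged.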

❓ What are the main engineering challenges?

Handling high-volume log streams
Designing low-latency pipelines
Avoiding false positives in anomaly detection
Maintaining model performance over time
Ensuring system scalability

❓ How would you improve this system?

Planned enhancements include:

LLM-based log summarization
Root cause analysis using AI agents
Kafka-based distributed streaming
Transformer-based anomaly detection
Multi-region observability

❓ How does this compare to industry tools?

LogSight-AI addresses the same problem space as observability tools such as:

Datadog
Splunk
Elastic Observability

However, it differentiates itself by:

Integrating ML directly into the pipeline
Supporting real-time inference
Being fully customizable and extensible

❓ What did you learn from building this?

Designing real-time ML systems
Building scalable data pipelines
Applying ML to infrastructure problems
Understanding observability engineering
Bridging DevOps and AI (AIOps)

❓ Why is this project important for AI engineering?

This project demonstrates:

End-to-end ML system design
Real-time inference pipelines
Production-ready architecture
Practical application of AI to real-world systems

❓ How would this perform in production?

With proper deployment (Kubernetes + monitoring), the system is designed to:

Handle high-volume log streams
Scale horizontally
Provide low-latency anomaly detection
Integrate with alerting systems

❓ Who would use this system?

DevOps Engineers
Site Reliability Engineers (SREs)
Cloud Infrastructure Teams
AI/ML Engineers working on AIOps

❓ What makes this project stand out?

Combines AI + DevOps (rare skill combination)
Real-time system design (not batch ML)
Production-ready architecture
Focus on observability and reliability

❓ How does this relate to large-scale AI systems?

Large-scale AI systems, such as those run by OpenAI, Meta, and Netflix, rely heavily on:

Monitoring pipelines
Anomaly detection
Infrastructure observability

LogSight-AI reflects these real-world engineering requirements.

Multi-cluster observability support
