An intelligent, distributed microservice architecture that uses a Reinforcement Learning (PPO) agent to dynamically route tasks across a cluster of Java Spring Boot worker nodes based on real-time hardware capacity.
This project simulates a distributed system with three primary components:
- Master Node (API Gateway & Router): A Java Spring Boot application that acts as the single point of entry. It maintains a strictly ordered, thread-safe registry of all active workers and their hardware states.
- Worker Nodes (The Cluster): Isolated Java Spring Boot instances. Upon boot, they dynamically read their CPU and RAM constraints from the Docker environment and register themselves with the Master Node via gRPC/HTTP payloads.
- AI Inference Service: A Python FastAPI microservice hosting a trained Proximal Policy Optimization (PPO) model (stable-baselines3). It receives the deterministic cluster state from the Master Node, calculates the optimal distribution of the incoming payload, and returns the target worker index.
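
The "strictly ordered, thread-safe registry" matters because the model returns a worker index, so index `i` has to mean the same worker on every request. The sketch below illustrates that idea in Python for brevity; the real Master Node is a Java Spring Boot application, and the class names, field names, and observation layout here are assumptions made for illustration, not the project's actual code.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerState:
    worker_id: str
    cpu_cores: float  # hypothetical field name
    ram: float        # hypothetical field name; units are whatever the workers report


class WorkerRegistry:
    """Thread-safe registry that always reports workers in the same order."""

    def __init__(self):
        self._lock = threading.Lock()
        self._workers = {}

    def register(self, state: WorkerState) -> None:
        # Re-registering an existing worker_id overwrites its previous state.
        with self._lock:
            self._workers[state.worker_id] = state

    def snapshot(self) -> list:
        # Sort by worker_id so the ordering does not depend on which
        # worker happened to register first.
        with self._lock:
            return [self._workers[key] for key in sorted(self._workers)]


def to_observation(workers, task_size: float) -> list:
    """Flatten the ordered snapshot plus the incoming task into one vector."""
    obs = [task_size]
    for w in workers:
        obs.extend([w.cpu_cores, w.ram])
    return obs
```

With this layout, registering `worker-2` before `worker-1` still yields a snapshot ordered `worker-1`, `worker-2`, so a returned index of `0` always refers to `worker-1`.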
- Reinforcement Learning Routing: Replaces static algorithms (Round Robin, Least Connections) with an AI model trained to prevent node starvation and optimize global cluster throughput.
- Active Load Shedding: Protects the cluster from DDoS-level starvation by proactively dropping tasks (returning 429/503/500 HTTP status codes) when the RL agent detects the cluster is at maximum capacity.
- Fully Containerized: An isolated Docker bridge network lets the internal microservices communicate securely with each other without exposing internal ports to the host machine.
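
One plausible way to wire up load shedding is to give the agent one extra action that means "drop this task". The sketch below assumes that encoding (actions `0..N-1` pick a worker, action `N` sheds) and picks 503 as the shed status. Both choices are assumptions for illustration; the project may encode shedding differently and may choose between 429, 503, and 500 by other rules.

```python
from typing import Optional, Tuple

# Assumption: a single shed status. Which of 429/503/500 applies in which
# situation is not specified here.
SHED_STATUS = 503


def interpret_action(action: int, num_workers: int) -> Tuple[int, Optional[int]]:
    """Map an agent action to (http_status, worker_index).

    Assumes actions 0..num_workers-1 select a worker and the extra
    action num_workers means "shed this task".
    """
    if not 0 <= action <= num_workers:
        raise ValueError(f"action {action} is outside 0..{num_workers}")
    if action == num_workers:
        return SHED_STATUS, None  # task dropped, no worker chosen
    return 200, action
```

For a four-worker cluster, `interpret_action(2, 4)` routes to the worker at index 2, while `interpret_action(4, 4)` sheds the task.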
- Docker & Docker Compose (WSL2 integration recommended for Windows)
- Java 17+ (For local development/compilation)
- Python 3.10+ (For local training/stress-testing)
- Maven
Due to strict microservice registration dependencies, the cluster must be booted in a specific sequence to ensure the Master Node is ready to accept registrations before the workers wake up.
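
If you script the boot, you can make the ordering explicit by polling the Master Node before starting the workers. The helper below is a minimal sketch: `wait_until_ready` and `master_is_up` are hypothetical names, and the health URL is an assumption (it presumes the Master Node's port is published on the host and that Spring Boot Actuator is enabled).

```python
import time
import urllib.request
from typing import Callable


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 60.0,
                     interval_s: float = 1.0) -> bool:
    """Call probe() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused, timeouts, and urllib errors all derive
            # from OSError; treat them as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def master_is_up(url: str = "http://localhost:8080/actuator/health") -> bool:
    # Hypothetical URL: adjust host, port, and path to your setup.
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.status == 200
```

Calling `wait_until_ready(master_is_up)` between the two boot steps returns `True` once the Master Node answers, or `False` after the timeout, so the script can abort instead of starting workers that cannot register.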
1. Boot the Master Node & AI Service
docker-compose up --build -d master-node ai-service
2. Boot the Worker Fleet
docker-compose up --build -d worker-1 worker-2 worker-3 worker-4

You can test the AI's adaptability by altering the hardware constraints of the workers in your docker-compose.yml.
worker-2:
  environment:
    - WORKER_ID=worker-2
    - WORKER_RAM=8192.0
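
To show how these variables could flow into a registration, here is a minimal sketch of reading them and building a payload. It is written in Python for brevity even though the workers are Java Spring Boot instances. The function name, the payload keys, and the fallback RAM value are assumptions; only `WORKER_ID` and `WORKER_RAM` come from the compose file above.

```python
import os
from typing import Mapping, Optional


def read_worker_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a registration payload from WORKER_ID and WORKER_RAM."""
    env = os.environ if env is None else env
    worker_id = env["WORKER_ID"]  # required; raises KeyError if missing
    # Assumption: fall back to 1024.0 when WORKER_RAM is not set.
    ram = float(env.get("WORKER_RAM", "1024.0"))
    if ram <= 0:
        raise ValueError(f"WORKER_RAM must be positive, got {ram}")
    # Hypothetical payload keys; the real registration schema may differ.
    return {"workerId": worker_id, "ram": ram}
```

Passing a plain dict instead of the real environment makes the function easy to check: `read_worker_config({"WORKER_ID": "worker-2", "WORKER_RAM": "8192.0"})` yields a payload carrying `worker-2` and `8192.0`.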