
darsh9510/distributed-rl-load-balancer

Autonomous RL Load Balancer

An intelligent, distributed microservice architecture that uses a Reinforcement Learning (PPO) agent to dynamically route tasks across a cluster of Java Spring Boot worker nodes based on real-time hardware capacity.

Architecture Overview

This project simulates a distributed system with three primary components:

  1. Master Node (API Gateway & Router): A Java Spring Boot application that acts as the single point of entry. It maintains a strictly ordered, thread-safe registry of all active workers and their hardware states.
  2. Worker Nodes (The Cluster): Isolated Java Spring Boot instances. Upon boot, they dynamically read their CPU and RAM constraints from the Docker environment and register themselves with the Master Node via gRPC/HTTP payloads.
  3. AI Inference Service: A Python FastAPI microservice hosting a trained Proximal Policy Optimization (PPO) model (stable-baselines3). It receives the deterministic cluster state from the Master Node, calculates the optimal distribution of the incoming payload, and returns the target worker index.
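Because the AI service answers with a worker index rather than a worker name, index i has to refer to the same worker on every request. The following is a minimal Python sketch of that idea, not the project's Java implementation: the class names, the `cpu_cores` and `ram_mb` fields, and the flattened state layout are all assumptions made for illustration.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerState:
    worker_id: str
    cpu_cores: float  # hypothetical field name
    ram_mb: float     # hypothetical field name


class WorkerRegistry:
    """Thread-safe registry that keeps workers in first-registration order."""

    def __init__(self):
        self._lock = threading.Lock()
        self._workers = {}  # dicts preserve insertion order (Python 3.7+)

    def register(self, state):
        with self._lock:
            # Re-registering an existing ID updates its state but keeps its slot,
            # so a worker's index never changes once assigned.
            self._workers[state.worker_id] = state

    def snapshot(self):
        with self._lock:
            return list(self._workers.values())

    def state_vector(self):
        """Flatten to [cpu_0, ram_0, cpu_1, ram_1, ...] in registration order."""
        vec = []
        for worker in self.snapshot():
            vec.extend([worker.cpu_cores, worker.ram_mb])
        return vec
```

With this layout, the index returned by the model can be used directly as a position in `snapshot()`.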

Key Features

  • Reinforcement Learning Routing: Replaces static algorithms (Round Robin, Least Connections) with an AI model trained to prevent node starvation and optimize global cluster throughput.
  • Active Load Shedding: Protects the cluster from overload under DDoS-level traffic by proactively dropping tasks (returning HTTP 429, 503, or 500 status codes) when the RL agent detects that the cluster is at maximum capacity.
  • Fully Containerized: An isolated Docker bridge network lets the internal microservices communicate with each other without exposing their internal ports to the host machine.
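One way to combine routing and load shedding in a single decision is to give the agent one extra action that means "drop this task". The sketch below assumes exactly that action layout (indices 0..N-1 select a worker, index N sheds) and picks 429 as the rejection status; neither choice is confirmed by the project, and `route_or_shed` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple

HTTP_OK = 200
HTTP_TOO_MANY_REQUESTS = 429


def route_or_shed(action: int, worker_ids: List[str]) -> Tuple[int, Optional[str]]:
    """Map an agent action to (http_status, target_worker_id).

    Assumed layout: actions 0..N-1 pick a worker, action N sheds the task.
    """
    n = len(worker_ids)
    if 0 <= action < n:
        return HTTP_OK, worker_ids[action]
    if action == n:
        # The agent judged the cluster to be full: reject instead of queueing.
        return HTTP_TOO_MANY_REQUESTS, None
    raise ValueError(f"action {action} is outside the range 0..{n}")
```

Treating an out-of-range action as an error rather than silently shedding makes a mismatch between the model's action space and the live worker count visible early.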

Prerequisites

  • Docker & Docker Compose (WSL2 integration recommended for Windows)
  • Java 17+ (for local development and compilation)
  • Python 3.10+ (for local training and stress testing)
  • Maven

Boot Sequence

Workers register with the Master Node as soon as they start, so the cluster must be booted in a specific order: the Master Node has to be ready to accept registrations before any worker comes up.

1. Boot the Master Node & AI Service

docker-compose up --build -d master-node ai-service

2. Boot the Worker Fleet

docker-compose up --build -d worker-1 worker-2 worker-3 worker-4
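Between the two steps it can help to confirm that the Master Node is actually accepting connections instead of waiting a fixed number of seconds. A small polling helper along these lines could be used; it assumes the Master Node's port is published to the host, and the host and port in the usage note are placeholders, not values taken from the project.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=30.0, interval_s=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

For example, `wait_for_port("localhost", 8080)` (hypothetical port) could run after step 1, with step 2 started only if it returns `True`. An open port shows that the process is listening, not that the application has finished initializing, so this is a coarse check.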

Configuration (Docker Compose)

You can test the AI's adaptability by altering the hardware constraints of the workers in your docker-compose.yml.

  worker-2:
    environment:
      - WORKER_ID=worker-2
      - WORKER_RAM=8192.0
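The workers themselves are Java services, but the way such variables are typically consumed at startup can be sketched briefly in Python. Only `WORKER_ID` and `WORKER_RAM` come from the snippet above; the `WorkerConfig` type, the validation rules, and the reading of `WORKER_RAM` as megabytes are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerConfig:
    worker_id: str
    ram: float  # unit assumed to be MB; the README does not state it


def load_worker_config(env=None):
    """Read and validate worker settings from environment variables."""
    env = os.environ if env is None else env
    worker_id = env.get("WORKER_ID")
    if not worker_id:
        raise ValueError("WORKER_ID must be set")
    raw_ram = env.get("WORKER_RAM")
    if raw_ram is None:
        raise ValueError("WORKER_RAM must be set")
    ram = float(raw_ram)  # raises ValueError on non-numeric input
    if ram <= 0:
        raise ValueError("WORKER_RAM must be positive")
    return WorkerConfig(worker_id=worker_id, ram=ram)
```

Failing fast on a missing or malformed value keeps a misconfigured worker from registering a bogus capacity with the Master Node.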
