105 changes: 57 additions & 48 deletions CIVIC_INTELLIGENCE.md
# 🧠 Daily Civic Intelligence Refinement Engine

The Civic Intelligence Engine is a self-improving AI infrastructure that runs daily at midnight (UTC) to analyze civic issues, detect trends, and optimize system parameters automatically.

## 🚀 Overview

Every day at 00:00 UTC, the system:
1. **Analyzes** all civic issues submitted in the last 24 hours.
2. **Detects** new patterns, trending topics, and geographic clusters.
3. **Refines** severity scoring weights based on manual overrides (adaptive learning).
4. **Optimizes** duplicate detection thresholds based on clustering density.
5. **Generates** a "Civic Intelligence Index" score.
6. **Archives** a daily snapshot for transparency and auditability.

---

## ⚙️ Core Components

### 1. Trend Detection (`backend/trend_analyzer.py`)
* **Keyword Extraction**: Identifies the top 5 most common keywords (excluding stop words) from issue descriptions.
* **Category Spikes**: Flags a category as a "spike" when its volume exceeds 5 and has grown by more than 50% compared to the previous day's snapshot.
* **Geographic Clustering**: Uses DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to identify hotspots where multiple issues are reported in close proximity (e.g., multiple reports of the same pothole).
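The clustering itself is covered by `backend/spatial_utils.py`; the keyword step can be sketched as follows (a minimal illustration with a made-up stop-word list, not the actual `TrendAnalyzer` code):

```python
from collections import Counter

# Illustrative stop-word list; the real TrendAnalyzer may use a larger one.
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "at", "of", "and", "near"}

def top_keywords(descriptions: list[str], n: int = 5) -> list[str]:
    """Return the n most common non-stop-word tokens across issue descriptions."""
    counts = Counter()
    for text in descriptions:
        for token in text.lower().split():
            word = token.strip(".,!?")
            if word and word not in STOP_WORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(n)]
```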

### 2. Adaptive Weight Optimization (`backend/adaptive_weights.py`)
The system learns from human actions to improve its automated severity scoring.

* **Logic**: If administrators manually upgrade the severity of issues in a specific category (e.g., changing "pothole" from Low to Critical) more than 3 times in a day, the system infers that its default weight for that category is too low.
* **Action**: The category multiplier in `modelWeights.json` is automatically increased by 10%.
* **Constraint**: Weights are clamped between 0.5x and 3.0x to prevent runaway values.
* **Goal**: Automatically classify similar future issues at the correct severity, reducing the need for manual intervention.
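A minimal sketch of this rule (constants taken from the bullets above; the function name and signature are illustrative, not the actual `AdaptiveWeights` API):

```python
UPGRADE_THRESHOLD = 3              # more than 3 manual upgrades per day triggers a boost
BOOST_FACTOR = 1.10                # +10% per boost
MIN_WEIGHT, MAX_WEIGHT = 0.5, 3.0  # clamp to prevent runaway values

def refine_weights(multipliers: dict[str, float], upgrades: dict[str, int]) -> dict[str, float]:
    """Return a new multiplier map, boosting categories with frequent manual upgrades."""
    refined = dict(multipliers)
    for category, count in upgrades.items():
        if count > UPGRADE_THRESHOLD:
            boosted = refined.get(category, 1.0) * BOOST_FACTOR
            refined[category] = max(MIN_WEIGHT, min(MAX_WEIGHT, boosted))
    return refined
```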

### 3. Duplicate Pattern Learning
The system adjusts the radius used for spatial deduplication based on the density of issues.

* **Logic**:
    * **High Density** (> 5 clusters): Increases the search radius by 5% to better group related issues.
    * **High Volume (> 50 issues), No Clusters**: Increases the search radius by 5%, as the current radius may be too strict.
    * **Low Volume** (< 10 issues): Decays the radius by 5% to improve precision.
* **Implementation**: Updates `duplicate_search_radius` in `modelWeights.json`. This value is consumed by the `create_issue` endpoint for real-time deduplication.
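These rules can be sketched as a pure function (thresholds from the logic above; the real implementation persists the result to `modelWeights.json`, and its exact branch ordering may differ):

```python
def adjust_radius(radius: float, cluster_count: int, issue_count: int) -> float:
    """Apply the duplicate-radius rules described above to yesterday's radius."""
    if cluster_count > 5:
        return radius * 1.05  # high density: widen to group related reports
    if cluster_count == 0 and issue_count > 50:
        return radius * 1.05  # high volume, no clusters: radius may be too strict
    if issue_count < 10:
        return radius * 0.95  # low volume: decay for precision
    return radius
```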

### 4. Civic Intelligence Index (`backend/civic_intelligence.py`)
A daily score (0-100) reflecting the city's civic health and system responsiveness.

* **Formula**: `Base (70) + (Resolved Issues * 2) - (New Issues * 0.5)`
* **Insights**:
    * **Top Emerging Concern**: The category with the highest volume.
    * **Highest Severity Region**: The geographic center of the largest issue cluster.
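A sketch of the index calculation (the clamp to the 0-100 range is an assumption inferred from the documented score range, not confirmed by the source):

```python
from typing import Optional

def civic_index(resolved_count: int, new_count: int) -> float:
    """Base (70) + (Resolved Issues * 2) - (New Issues * 0.5), clamped to 0-100."""
    score = 70 + 2.0 * resolved_count - 0.5 * new_count
    return max(0.0, min(100.0, score))  # clamp is an assumption from the stated range

def top_emerging_concern(category_counts: dict) -> Optional[str]:
    """The category with the highest volume, per the Insights above."""
    if not category_counts:
        return None
    return max(category_counts, key=category_counts.get)
```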

---

## 📁 Data & Transparency

### Daily Snapshots
Snapshots are stored in `backend/data/dailySnapshots/YYYY-MM-DD.json`.
They contain:
* `trends`: Top keywords, category distribution, clusters, and detected spikes.
* `civic_index`: The daily score and insights.
* `weight_updates`: A detailed audit log of automatic weight adjustments (which weight changed, the old value, the new value, and the reason).
* `model_weights`: A copy of the full weight configuration at snapshot time, for reproducibility.
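The archiving step can be sketched as below. The `archive_snapshot` helper is hypothetical; the `weight_updates` key name matches the payload asserted in `backend/tests/test_civic_intelligence_system.py`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def archive_snapshot(base_dir: str, trends: dict, civic_index: dict,
                     weight_updates: dict, model_weights: dict) -> Path:
    """Write a dated JSON snapshot with the fields listed above (hypothetical helper)."""
    snapshot = {
        "trends": trends,
        "civic_index": civic_index,
        "weight_updates": weight_updates,  # key name as asserted by the test suite
        "model_weights": model_weights,    # full copy, for reproducibility
    }
    out_dir = Path(base_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{datetime.now(timezone.utc):%Y-%m-%d}.json"
    path.write_text(json.dumps(snapshot, indent=2))
    return path
```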

### Model Weights
Dynamic configuration is stored in `backend/data/modelWeights.json`.
* `category_multipliers`: Dynamic severity weights.
* `duplicate_search_radius`: Dynamic search radius in meters.

---

## 🛠️ Architecture

* **Scheduler**: A lightweight `asyncio` loop in `backend/scheduler.py` triggers the job.
* **Execution**: `CivicIntelligenceEngine.run_daily_cycle` runs in a separate thread to avoid blocking the main event loop.
* **Persistence**: All state changes are persisted to JSON files, ensuring "Local First" architecture with no external API dependencies for core logic.
3 changes: 2 additions & 1 deletion backend/requirements-render.txt
@@ -19,4 +19,5 @@ SpeechRecognition
pydub
googletrans==4.0.2
langdetect
indic-nlp-library
scikit-learn
numpy<2.0.0
10 changes: 7 additions & 3 deletions backend/routers/issues.py
@@ -29,6 +29,7 @@
send_status_notification
)
from backend.spatial_utils import get_bounding_box, find_nearby_issues
from backend.adaptive_weights import adaptive_weights
from backend.cache import recent_issues_cache, nearby_issues_cache
from backend.hf_api_service import verify_resolution_vqa
from backend.dependencies import get_http_client
@@ -93,9 +93,12 @@ async def create_issue(

if latitude is not None and longitude is not None:
try:
# Get dynamic search radius from adaptive weights
search_radius = adaptive_weights.get_duplicate_search_radius()

# Find existing open issues within search_radius (default 50m)
# Optimization: Use bounding box to filter candidates in SQL
min_lat, max_lat, min_lon, max_lon = get_bounding_box(latitude, longitude, search_radius)

# Performance Boost: Use column projection to avoid loading full model instances
open_issues = await run_in_threadpool(
@@ -118,7 +122,7 @@
)

nearby_issues_with_distance = find_nearby_issues(
open_issues, latitude, longitude, radius_meters=search_radius
)

if nearby_issues_with_distance:
10 changes: 5 additions & 5 deletions backend/spatial_utils.py
@@ -167,14 +167,14 @@ def cluster_issues_dbscan(issues: List[Issue], eps_meters: float = 30.0) -> List
[issue.latitude, issue.longitude] for issue in valid_issues
])

# Convert eps from meters to radians
# The haversine metric expects inputs in radians and eps in radians
R = 6371000.0  # Earth's radius in meters
eps_radians = eps_meters / R

# Perform DBSCAN clustering
try:
db = DBSCAN(eps=eps_radians, min_samples=1, metric='haversine').fit(
np.radians(coordinates)
)
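As a sanity check on the conversion above (not part of the diff; `haversine_m` is a hypothetical helper): for two points on the same meridian, the angular distance `d / R` equals their latitude difference in radians, so an `eps` of `eps_meters / R` radians corresponds to `eps_meters` of great-circle distance.

```python
import math

R = 6371000.0  # Earth's radius in meters, as in the fixed code

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters (the metric DBSCAN applies to radian inputs)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

eps_meters = 30.0
eps_radians = eps_meters / R  # what the corrected code passes as DBSCAN's eps
```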

152 changes: 152 additions & 0 deletions backend/tests/test_civic_intelligence_system.py
@@ -0,0 +1,152 @@
import os
import json
import pytest
import shutil
import tempfile
from datetime import datetime, timedelta, timezone
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from unittest.mock import patch
from backend.database import Base
from backend.models import Issue, EscalationAudit, EscalationReason, Grievance, SeverityLevel, JurisdictionLevel, GrievanceStatus, Jurisdiction
from backend.civic_intelligence import CivicIntelligenceEngine
from backend.adaptive_weights import AdaptiveWeights

# Test Data
MOCK_WEIGHTS = {
"category_multipliers": {
"pothole": 1.0,
"garbage": 1.0
},
"duplicate_search_radius": 50.0,
"severity_keywords": {},
"urgency_patterns": [],
"category_keywords": {}
}

@pytest.fixture
def temp_dirs():
# Create temp directories for snapshots and weights
temp_dir = tempfile.mkdtemp()
weights_file = os.path.join(temp_dir, "modelWeights.json")
snapshots_dir = os.path.join(temp_dir, "dailySnapshots")
os.makedirs(snapshots_dir)

# Initialize weights
with open(weights_file, 'w') as f:
json.dump(MOCK_WEIGHTS, f)

yield temp_dir, weights_file, snapshots_dir

shutil.rmtree(temp_dir)

@pytest.fixture
def db_session():
# In-memory SQLite DB
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
yield session
session.close()

def test_daily_civic_intelligence_cycle(temp_dirs, db_session):
temp_dir, weights_file, snapshots_dir = temp_dirs

# Patch paths to use temp directory
with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
patch("backend.civic_intelligence.SessionLocal", return_value=db_session):
⚠️ Potential issue | 🟠 Major

Isolate the adaptive-weights singleton to avoid cross-test state leakage.

The test patches `DATA_FILE`, but `CivicIntelligenceEngine` uses a module-level `adaptive_weights` object with cached state. That can make this test order-dependent if cached weights survive into later tests. Consider additionally patching `backend.civic_intelligence.adaptive_weights` with a fresh `AdaptiveWeights` instance (and resetting its `_weights`/`_last_loaded` cache) for the duration of the test.

# Reload adaptive weights to pick up temp file
weights_system = AdaptiveWeights()
weights_system._weights = None # Force reload
weights_system._load_weights()

assert weights_system.get_category_multipliers()["pothole"] == 1.0
assert weights_system.get_duplicate_search_radius() == 50.0

# --- 1. Setup Data ---
now = datetime.now(timezone.utc)

# A. Create scattered issues to trigger "many issues, no clusters" -> increase radius
# Create 51 issues far apart so they don't form clusters (TrendAnalyzer requires >=3 items per cluster)
# But total volume > 50 triggers radius increase if no clusters found.
for i in range(51):
issue = Issue(
description=f"Scattered issue {i}",
category="pothole",
latitude=18.5204 + (i * 0.005), # ~500m apart
longitude=73.8567 + (i * 0.005),
created_at=now - timedelta(hours=2)
)
db_session.add(issue)

# B. Create manual severity upgrades (to trigger weight increase)
# We need Grievances linked to EscalationAudits
# Create a Jurisdiction first
jurisdiction = Jurisdiction(
level=JurisdictionLevel.LOCAL,
geographic_coverage={"city": "Pune"},
responsible_authority="PMC",
default_sla_hours=48
)
db_session.add(jurisdiction)
db_session.flush()

for i in range(4): # 4 upgrades > 3 threshold
grievance = Grievance(
category="pothole",
severity=SeverityLevel.LOW,
current_jurisdiction_id=jurisdiction.id,
assigned_authority="PMC",
sla_deadline=now + timedelta(days=2),
status=GrievanceStatus.OPEN
)
db_session.add(grievance)
db_session.flush()

audit = EscalationAudit(
grievance_id=grievance.id,
previous_authority="Bot",
new_authority="Admin",
reason=EscalationReason.SEVERITY_UPGRADE,
timestamp=now - timedelta(hours=5)
)
db_session.add(audit)

db_session.commit()

# --- 2. Run Cycle ---
engine = CivicIntelligenceEngine()
engine.run_daily_cycle()

# --- 3. Verify Results ---

# A. Check Snapshot creation
snapshot_files = os.listdir(snapshots_dir)
assert len(snapshot_files) == 1, "Snapshot file was not created"

with open(os.path.join(snapshots_dir, snapshot_files[0]), 'r') as f:
snapshot = json.load(f)

# Verify Index Data
assert snapshot["civic_index"]["new_issues_count"] >= 51

# Verify Weight Updates Logged
assert "pothole" in snapshot["weight_updates"]
assert snapshot["weight_updates"]["pothole"] == 4

# B. Check Weight Updates Persistence
# Force reload to verify persistence
weights_system._last_loaded = 0 # Force reload
weights_system._load_weights()

# Severity weight for "pothole" should increase
new_pothole_weight = weights_system.get_category_multipliers()["pothole"]
assert new_pothole_weight > 1.0, f"Pothole weight should increase from 1.0, got {new_pothole_weight}"

# Radius should increase due to high volume with no clusters
# (cluster_count == 0 and len(issues_24h) > 50)
new_radius = weights_system.get_duplicate_search_radius()
assert new_radius > 50.0, f"Radius should have increased from 50.0, got {new_radius}"