105 changes: 57 additions & 48 deletions CIVIC_INTELLIGENCE.md
# 🧠 Daily Civic Intelligence Refinement Engine

The Civic Intelligence Engine is a self-improving AI infrastructure that runs daily at midnight (UTC) to analyze civic issues, detect trends, and optimize system parameters automatically.

## 🚀 Overview

Every day at 00:00 UTC, the system:
1. **Analyzes** all civic issues submitted in the last 24 hours.
2. **Detects** new patterns, trending topics, and geographic clusters.
3. **Refines** severity scoring weights based on manual overrides (adaptive learning).
4. **Optimizes** duplicate detection thresholds based on clustering density.
5. **Generates** a "Civic Intelligence Index" score.
6. **Archives** a daily snapshot for transparency and auditability.

---

## ⚙️ Core Components

### 1. Trend Detection (`backend/trend_analyzer.py`)
* **Keyword Extraction**: Identifies the top 5 most common keywords (excluding stop words) from issue descriptions.
* **Category Spikes**: Flags a category as a "spike" when its volume exceeds 5 and has grown by more than 50% compared to the previous day's snapshot.
* **Geographic Clustering**: Uses DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to identify hotspots where multiple issues are reported in close proximity (e.g., multiple reports of the same pothole).
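The clustering itself is covered by `backend/spatial_utils.py`; the keyword step can be sketched as follows (a minimal illustration with a made-up stop-word list, not the actual `TrendAnalyzer` code):

```python
from collections import Counter

# Illustrative stop-word list; the real TrendAnalyzer may use a larger one.
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "at", "of", "and", "near"}

def top_keywords(descriptions: list[str], n: int = 5) -> list[str]:
    """Return the n most common non-stop-word tokens across issue descriptions."""
    counts = Counter()
    for text in descriptions:
        for token in text.lower().split():
            word = token.strip(".,!?")
            if word and word not in STOP_WORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(n)]
```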

### 2. Adaptive Weight Optimization (`backend/adaptive_weights.py`)
The system learns from human actions to improve its automated severity scoring.

* **Logic**: If administrators manually upgrade the severity of issues in a specific category (e.g., changing "pothole" from Low to Critical) more than 3 times in a day, the system infers that its default weight for that category is too low.
* **Action**: The category multiplier in `modelWeights.json` is automatically increased by 10%.
* **Constraint**: Weights are clamped between 0.5x and 3.0x to prevent runaway values.
* **Goal**: Automatically classify similar future issues at the correct severity, reducing the need for manual intervention.
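A minimal sketch of this rule (constants taken from the bullets above; the function name and signature are illustrative, not the actual `AdaptiveWeights` API):

```python
UPGRADE_THRESHOLD = 3              # more than 3 manual upgrades per day triggers a boost
BOOST_FACTOR = 1.10                # +10% per boost
MIN_WEIGHT, MAX_WEIGHT = 0.5, 3.0  # clamp to prevent runaway values

def refine_weights(multipliers: dict[str, float], upgrades: dict[str, int]) -> dict[str, float]:
    """Return a new multiplier map, boosting categories with frequent manual upgrades."""
    refined = dict(multipliers)
    for category, count in upgrades.items():
        if count > UPGRADE_THRESHOLD:
            boosted = refined.get(category, 1.0) * BOOST_FACTOR
            refined[category] = max(MIN_WEIGHT, min(MAX_WEIGHT, boosted))
    return refined
```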

### 3. Duplicate Pattern Learning
The system adjusts the radius used for spatial deduplication based on the density of issues.

* **Logic**:
    * **High Density** (> 5 clusters): Increases the search radius by 5% to better group related issues.
    * **High Volume (> 50 issues), No Clusters**: Increases the search radius by 5%, as the current radius may be too strict.
    * **Low Volume** (< 10 issues): Decays the radius by 5% to improve precision.
* **Implementation**: Updates `duplicate_search_radius` in `modelWeights.json`. This value is consumed by the `create_issue` endpoint for real-time deduplication.
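These rules can be sketched as a pure function (thresholds from the logic above; the real implementation persists the result to `modelWeights.json`, and its exact branch ordering may differ):

```python
def adjust_radius(radius: float, cluster_count: int, issue_count: int) -> float:
    """Apply the duplicate-radius rules described above to yesterday's radius."""
    if cluster_count > 5:
        return radius * 1.05  # high density: widen to group related reports
    if cluster_count == 0 and issue_count > 50:
        return radius * 1.05  # high volume, no clusters: radius may be too strict
    if issue_count < 10:
        return radius * 0.95  # low volume: decay for precision
    return radius
```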

### 4. Civic Intelligence Index (`backend/civic_intelligence.py`)
A daily score (0-100) reflecting the city's civic health and system responsiveness.

* **Formula**: `Base (70) + (Resolved Issues * 2) - (New Issues * 0.5)`
* **Insights**:
    * **Top Emerging Concern**: The category with the highest volume.
    * **Highest Severity Region**: The geographic center of the largest issue cluster.
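A sketch of the index calculation (the clamp to the 0-100 range is an assumption inferred from the documented score range, not confirmed by the source):

```python
from typing import Optional

def civic_index(resolved_count: int, new_count: int) -> float:
    """Base (70) + (Resolved Issues * 2) - (New Issues * 0.5), clamped to 0-100."""
    score = 70 + 2.0 * resolved_count - 0.5 * new_count
    return max(0.0, min(100.0, score))  # clamp is an assumption from the stated range

def top_emerging_concern(category_counts: dict) -> Optional[str]:
    """The category with the highest volume, per the Insights above."""
    if not category_counts:
        return None
    return max(category_counts, key=category_counts.get)
```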

---

## 📁 Data & Transparency

### Daily Snapshots
Snapshots are stored in `backend/data/dailySnapshots/YYYY-MM-DD.json`.
They contain:
* `trends`: Top keywords, category distribution, clusters, and detected spikes.
* `civic_index`: The daily score and insights.
* `weight_updates`: A detailed audit log of automatic weight adjustments (which weight changed, the old value, the new value, and the reason).
* `model_weights`: A copy of the full weight configuration at snapshot time, for reproducibility.
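The archiving step can be sketched as below. The `archive_snapshot` helper is hypothetical; the `weight_updates` key name matches the payload asserted in `backend/tests/test_civic_intelligence_system.py`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def archive_snapshot(base_dir: str, trends: dict, civic_index: dict,
                     weight_updates: dict, model_weights: dict) -> Path:
    """Write a dated JSON snapshot with the fields listed above (hypothetical helper)."""
    snapshot = {
        "trends": trends,
        "civic_index": civic_index,
        "weight_updates": weight_updates,  # key name as asserted by the test suite
        "model_weights": model_weights,    # full copy, for reproducibility
    }
    out_dir = Path(base_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{datetime.now(timezone.utc):%Y-%m-%d}.json"
    path.write_text(json.dumps(snapshot, indent=2))
    return path
```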

### Model Weights
Dynamic configuration is stored in `backend/data/modelWeights.json`.
* `category_multipliers`: Dynamic severity weights.
* `duplicate_search_radius`: Dynamic search radius in meters.

---

## 🛠️ Architecture

* **Scheduler**: A lightweight `asyncio` loop in `backend/scheduler.py` triggers the job.
* **Execution**: `CivicIntelligenceEngine.run_daily_cycle` runs in a separate thread to avoid blocking the main event loop.
* **Persistence**: All state changes are persisted to JSON files, ensuring "Local First" architecture with no external API dependencies for core logic.
3 changes: 2 additions & 1 deletion backend/requirements-render.txt
@@ -19,4 +19,5 @@ SpeechRecognition
pydub
googletrans==4.0.2
langdetect
indic-nlp-library
scikit-learn
numpy<2.0.0
10 changes: 7 additions & 3 deletions backend/routers/issues.py
@@ -29,6 +29,7 @@
send_status_notification
)
from backend.spatial_utils import get_bounding_box, find_nearby_issues
from backend.adaptive_weights import adaptive_weights
from backend.cache import recent_issues_cache, nearby_issues_cache
from backend.hf_api_service import verify_resolution_vqa
from backend.dependencies import get_http_client
@@ -93,9 +93,12 @@ async def create_issue(

if latitude is not None and longitude is not None:
try:
# Get dynamic search radius from adaptive weights
search_radius = adaptive_weights.get_duplicate_search_radius()

# Find existing open issues within search_radius (default 50m)
# Optimization: Use bounding box to filter candidates in SQL
min_lat, max_lat, min_lon, max_lon = get_bounding_box(latitude, longitude, search_radius)

# Performance Boost: Use column projection to avoid loading full model instances
open_issues = await run_in_threadpool(
@@ -118,7 +122,7 @@
)

nearby_issues_with_distance = find_nearby_issues(
open_issues, latitude, longitude, radius_meters=search_radius
)

if nearby_issues_with_distance:
10 changes: 5 additions & 5 deletions backend/spatial_utils.py
@@ -167,14 +167,14 @@ def cluster_issues_dbscan(issues: List[Issue], eps_meters: float = 30.0) -> List
[issue.latitude, issue.longitude] for issue in valid_issues
])

# Convert eps from meters to radians
# The haversine metric expects inputs in radians and eps in radians
R = 6371000.0  # Earth's radius in meters
eps_radians = eps_meters / R

# Perform DBSCAN clustering
try:
db = DBSCAN(eps=eps_radians, min_samples=1, metric='haversine').fit(
np.radians(coordinates)
)
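As a sanity check on the conversion above (not part of the diff; `haversine_m` is a hypothetical helper): for two points on the same meridian, the angular distance `d / R` equals their latitude difference in radians, so an `eps` of `eps_meters / R` radians corresponds to `eps_meters` of great-circle distance.

```python
import math

R = 6371000.0  # Earth's radius in meters, as in the fixed code

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters (the metric DBSCAN applies to radian inputs)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

eps_meters = 30.0
eps_radians = eps_meters / R  # what the corrected code passes as DBSCAN's eps
```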

152 changes: 152 additions & 0 deletions backend/tests/test_civic_intelligence_system.py
@@ -0,0 +1,152 @@
import os
import json
import pytest
import shutil
import tempfile
from datetime import datetime, timedelta, timezone
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from unittest.mock import patch
from backend.database import Base
from backend.models import Issue, EscalationAudit, EscalationReason, Grievance, SeverityLevel, JurisdictionLevel, GrievanceStatus, Jurisdiction
from backend.civic_intelligence import CivicIntelligenceEngine
from backend.adaptive_weights import AdaptiveWeights

# Test Data
MOCK_WEIGHTS = {
"category_multipliers": {
"pothole": 1.0,
"garbage": 1.0
},
"duplicate_search_radius": 50.0,
"severity_keywords": {},
"urgency_patterns": [],
"category_keywords": {}
}

@pytest.fixture
def temp_dirs():
# Create temp directories for snapshots and weights
temp_dir = tempfile.mkdtemp()
weights_file = os.path.join(temp_dir, "modelWeights.json")
snapshots_dir = os.path.join(temp_dir, "dailySnapshots")
os.makedirs(snapshots_dir)

# Initialize weights
with open(weights_file, 'w') as f:
json.dump(MOCK_WEIGHTS, f)

yield temp_dir, weights_file, snapshots_dir

shutil.rmtree(temp_dir)

@pytest.fixture
def db_session():
# In-memory SQLite DB
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
yield session
session.close()

def test_daily_civic_intelligence_cycle(temp_dirs, db_session):
temp_dir, weights_file, snapshots_dir = temp_dirs

# Patch paths to use temp directory
with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
patch("backend.civic_intelligence.SessionLocal", return_value=db_session):
⚠️ Potential issue | 🟠 Major

Isolate the adaptive-weights singleton to avoid cross-test state leakage.

The test patches `DATA_FILE`, but `CivicIntelligenceEngine` uses a module-level `adaptive_weights` object with cached state. That can make this test order-dependent if cached weights survive into later tests. Consider additionally patching `backend.civic_intelligence.adaptive_weights` with a fresh `AdaptiveWeights` instance (and resetting its `_weights`/`_last_loaded` cache) for the duration of the test.

# Reload adaptive weights to pick up temp file
weights_system = AdaptiveWeights()
weights_system._weights = None # Force reload
weights_system._load_weights()

assert weights_system.get_category_multipliers()["pothole"] == 1.0
assert weights_system.get_duplicate_search_radius() == 50.0

# --- 1. Setup Data ---
now = datetime.now(timezone.utc)

# A. Create scattered issues to trigger "many issues, no clusters" -> increase radius
# Create 51 issues far apart so they don't form clusters (TrendAnalyzer requires >=3 items per cluster)
# But total volume > 50 triggers radius increase if no clusters found.
for i in range(51):
issue = Issue(
description=f"Scattered issue {i}",
category="pothole",
latitude=18.5204 + (i * 0.005), # ~500m apart
longitude=73.8567 + (i * 0.005),
created_at=now - timedelta(hours=2)
)
db_session.add(issue)

# B. Create manual severity upgrades (to trigger weight increase)
# We need Grievances linked to EscalationAudits
# Create a Jurisdiction first
jurisdiction = Jurisdiction(
level=JurisdictionLevel.LOCAL,
geographic_coverage={"city": "Pune"},
responsible_authority="PMC",
default_sla_hours=48
)
db_session.add(jurisdiction)
db_session.flush()

for i in range(4): # 4 upgrades > 3 threshold
grievance = Grievance(
category="pothole",
severity=SeverityLevel.LOW,
current_jurisdiction_id=jurisdiction.id,
assigned_authority="PMC",
sla_deadline=now + timedelta(days=2),
status=GrievanceStatus.OPEN
)
db_session.add(grievance)
db_session.flush()

audit = EscalationAudit(
grievance_id=grievance.id,
previous_authority="Bot",
new_authority="Admin",
reason=EscalationReason.SEVERITY_UPGRADE,
timestamp=now - timedelta(hours=5)
)
db_session.add(audit)

db_session.commit()

# --- 2. Run Cycle ---
engine = CivicIntelligenceEngine()
engine.run_daily_cycle()

# --- 3. Verify Results ---

# A. Check Snapshot creation
snapshot_files = os.listdir(snapshots_dir)
assert len(snapshot_files) == 1, "Snapshot file was not created"

with open(os.path.join(snapshots_dir, snapshot_files[0]), 'r') as f:
snapshot = json.load(f)

# Verify Index Data
assert snapshot["civic_index"]["new_issues_count"] >= 51

# Verify Weight Updates Logged
assert "pothole" in snapshot["weight_updates"]
assert snapshot["weight_updates"]["pothole"] == 4

# B. Check Weight Updates Persistence
# Force reload to verify persistence
weights_system._last_loaded = 0 # Force reload
weights_system._load_weights()

# Severity weight for "pothole" should increase
new_pothole_weight = weights_system.get_category_multipliers()["pothole"]
assert new_pothole_weight > 1.0, f"Pothole weight should increase from 1.0, got {new_pothole_weight}"

# Radius should increase due to high volume with no clusters
# (cluster_count == 0 and len(issues_24h) > 50)
new_radius = weights_system.get_duplicate_search_radius()
assert new_radius > 50.0, f"Radius should have increased from 50.0, got {new_radius}"