Implement Daily Civic Intelligence Refinement Engine #476
The documentation for the engine is rewritten:

```diff
@@ -1,62 +1,71 @@
-# Daily Civic Intelligence Refinement Engine
+# 🧠 Daily Civic Intelligence Refinement Engine
 
-## Overview
-VishwaGuru's Civic Intelligence Engine is a self-improving AI system that runs daily at midnight to analyze civic issues, detect trends, and optimize the system's severity scoring logic based on real-world patterns.
+## 🚀 Overview
+The Civic Intelligence Engine is a self-improving AI infrastructure that runs daily at midnight (UTC) to analyze civic issues, detect trends, and optimize system parameters automatically.
 
-## Architecture
-The engine is composed of the following modules:
-1. **TrendAnalyzer (`backend/trend_analyzer.py`):** Extracts top keywords and identifies geographic clusters using DBSCAN.
-2. **AdaptiveWeights (`backend/adaptive_weights.py`):** Manages dynamic severity scoring weights stored in `backend/data/modelWeights.json`.
-3. **CivicIntelligenceEngine (`backend/civic_intelligence.py`):** The orchestrator that runs the daily cycle.
+Every day at 00:00, the system:
+1. **Analyzes** all civic issues submitted in the last 24 hours.
+2. **Detects** new patterns, trending topics, and geographic clusters.
+3. **Refines** severity scoring weights based on manual overrides (adaptive learning).
+4. **Optimizes** duplicate detection thresholds based on clustering density.
+5. **Generates** a "Civic Intelligence Index" score.
+6. **Archives** a daily snapshot for transparency and auditability.
 
-## Daily Cycle Algorithm
-Every day at 00:00 UTC, the system performs the following steps:
+---
 
+## ⚙️ Core Components
```
```diff
-### 1. Trend Detection
-* Analyzes all issues submitted in the last 24 hours.
-* **Keyword Extraction:** Identifies top 5 most common keywords (excluding stop words).
-* **Category Spikes:** Compares current category volume with the previous day's snapshot. A category is flagged as a "spike" if:
-  * Volume > 5
-  * Increase > 50% compared to yesterday.
-* **Geographic Clustering:** Uses DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to find clusters of issues (e.g., multiple reports of the same pothole).
+### 1. Trend Detection (`backend/trend_analyzer.py`)
+* **Keyword Extraction**: Identifies top 5 most common keywords (excluding stop words) from issue descriptions.
+* **Category Spikes**: Detects categories with a >50% increase in volume compared to the previous day.
+* **Geographic Clustering**: Uses DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to identify hotspots where multiple issues are reported in close proximity.
```
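The keyword-extraction step described in the trend-detection section can be sketched with a simple counter. The stop-word list and the `top_keywords` helper below are illustrative, not the project's actual implementation:

```python
from collections import Counter
import re

# Hypothetical minimal stop-word list; the real list is not shown in the diff.
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "of", "and", "to", "near"}

def top_keywords(descriptions, k=5):
    """Return the k most common non-stop-word tokens across issue descriptions."""
    words = []
    for text in descriptions:
        # Lowercase and keep alphabetic tokens only, dropping stop words.
        words.extend(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS)
    return [word for word, _ in Counter(words).most_common(k)]

print(top_keywords([
    "Large pothole on the main road",
    "Pothole near the school",
    "Streetlight broken on main road",
]))
# → ['pothole', 'main', 'road', 'large', 'school']
```

`Counter.most_common` breaks ties by first-encountered order, which is good enough for a daily "top 5" summary.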
```diff
-### 2. Adaptive Weight Optimization
-The system learns from manual interventions:
-* **Input:** Queries `EscalationAudit` logs for "Severity Upgrades" (manual overrides where an admin increased the severity).
-* **Logic:** If a category receives ≥ 3 manual upgrades in 24 hours, its severity multiplier is increased by 10% (x1.1).
-* **Goal:** To automatically classify similar future issues as higher severity, reducing the need for manual intervention.
+### 2. Adaptive Weight Optimization (`backend/adaptive_weights.py`)
+The system learns from human actions to improve its automated severity scoring.
 
+* **Logic**: If administrators manually upgrade the severity of issues in a specific category (e.g., changing "pothole" from Low to Critical) more than 3 times in a day, the system infers that its default weight for that category is too low.
+* **Action**: The category multiplier in `modelWeights.json` is automatically increased by 10%.
+* **Constraint**: Weights are clamped between 0.5x and 3.0x to prevent runaway values.
```
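Note that the old doc text says "≥ 3 manual upgrades" while the new text says "more than 3 times"; the sketch below follows the new wording. The constant and function names are illustrative, not the actual `backend/adaptive_weights.py` API:

```python
UPGRADE_THRESHOLD = 3    # manual severity upgrades per day before the weight adapts
BOOST = 1.1              # +10% per refinement
MIN_W, MAX_W = 0.5, 3.0  # clamp range stated in the diff

def refine_weight(current: float, manual_upgrades: int) -> float:
    """Raise a category multiplier by 10% when admins override it often enough."""
    if manual_upgrades > UPGRADE_THRESHOLD:
        current *= BOOST
    # Clamp to prevent runaway values.
    return max(MIN_W, min(MAX_W, current))

print(refine_weight(1.0, 4))   # → 1.1
print(refine_weight(2.9, 10))  # → 3.0 (clamped from ~3.19)
```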
```diff
 ### 3. Duplicate Pattern Learning
-* **Input:** Geographic clustering density.
-* **Logic:**
-  * If many clusters (>5) are found: Increase duplicate search radius (x1.05) to better group reports.
-  * If volume is high (>50) but no clusters: Increase radius (x1.05) as the current radius might be too strict.
-  * If volume is low (<10) and radius is large: Decay radius (x0.95) to improve precision.
+The system adjusts the radius used for spatial deduplication based on the density of issues.
 
+* **Logic**:
+  * **High Density** (> 5 clusters): Increases search radius by 5% to better group related issues.
+  * **High Volume, No Clusters**: Increases search radius to catch potential duplicates that are slightly further apart.
+  * **Low Volume**: Decays radius slightly (by 5%) to improve precision.
+* **Implementation**: Updates `duplicate_search_radius` in `modelWeights.json`. This value is consumed by the `create_issue` endpoint for real-time deduplication.
```
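The three radius rules can be sketched in a few lines. Thresholds are taken from the old bullet list (>5 clusters, >50 issues, <10 issues); the actual backend code may differ:

```python
def adjust_radius(radius_m: float, cluster_count: int, issue_count: int) -> float:
    """Adjust the duplicate-search radius (meters) per the documented rules."""
    if cluster_count > 5:                        # high density: widen grouping
        return radius_m * 1.05
    if cluster_count == 0 and issue_count > 50:  # high volume, no clusters found
        return radius_m * 1.05
    if issue_count < 10:                         # low volume: tighten for precision
        return radius_m * 0.95
    return radius_m

print(adjust_radius(50.0, 6, 20))   # high density → ~52.5
print(adjust_radius(50.0, 0, 60))   # high volume, no clusters → ~52.5
print(adjust_radius(50.0, 1, 5))    # low volume → ~47.5
```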
```diff
-## Data Storage & Auditability
+### 4. Civic Intelligence Index (`backend/civic_intelligence.py`)
+A daily score (0-100) reflecting the civic health and system responsiveness.
 
-### Model Weights
-* Dynamic weights are stored in `backend/data/modelWeights.json`.
-* This file is hot-reloaded by the application without restart.
+* **Formula**: `Base (70) + (Resolved Issues * 2) - (New Issues * 0.5)`
+* **Insights**:
+  * **Top Emerging Concern**: The category with the highest volume.
+  * **Highest Severity Region**: The geographic center of the largest issue cluster.
 
+---
```
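The index formula can be checked with a few lines. The clamp to the stated 0-100 range is an assumption; the actual implementation may handle out-of-range scores differently:

```python
def civic_index(resolved_issues: int, new_issues: int) -> float:
    """Base (70) + (Resolved Issues * 2) - (New Issues * 0.5), clamped to 0-100."""
    score = 70 + resolved_issues * 2 - new_issues * 0.5
    return max(0.0, min(100.0, score))

print(civic_index(10, 20))   # → 80.0  (70 + 20 - 10)
print(civic_index(0, 200))   # → 0.0   (clamped)
print(civic_index(30, 0))    # → 100.0 (clamped from 130)
```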
```diff
+## 📁 Data & Transparency
 
 ### Daily Snapshots
-* Stored in `backend/data/dailySnapshots/YYYY-MM-DD.json`.
-* Contains:
-  * `civic_index`: The calculated score and metrics.
-  * `trends`: Keywords, distribution, clusters, and detected spikes.
-  * `weight_changes`: A detailed audit log of what weights were changed, the old value, the new value, and the reason.
-  * `model_weights`: A copy of the full weight configuration at the time of the snapshot for full reproducibility.
-
-## Evolution Logic
-The system evolves by:
-1. **Self-Correction:** If admins constantly upgrade "Pothole" severity, the system learns that "Pothole" is more critical than initially configured.
-2. **Dynamic Sensitivity:** The duplicate detection radius "breathes" (expands/contracts) based on the density of reports, adapting to urban density changes or event-driven spikes.
+Snapshots are stored in `backend/data/dailySnapshots/YYYY-MM-DD.json`.
+They contain:
+* `trends`: Top keywords, category distribution, clusters.
+* `civic_index`: The daily score and insights.
+* `weight_changes`: Audit log of any automatic weight adjustments.
+* `model_weights`: The state of weights at that time.
 
+### Model Weights
+Dynamic configuration is stored in `backend/data/modelWeights.json`.
+* `category_multipliers`: Dynamic severity weights.
+* `duplicate_search_radius`: Dynamic search radius in meters.
 
+---
 
+## 🛠️ Architecture
 
+* **Scheduler**: A lightweight `asyncio` loop in `backend/scheduler.py` triggers the job.
+* **Execution**: `CivicIntelligenceEngine.run_daily_cycle` runs in a separate thread to avoid blocking the main event loop.
+* **Persistence**: All state changes are persisted to JSON files, ensuring "Local First" architecture with no external API dependencies for core logic.
```
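For illustration, the scheduler described in the architecture notes can be sketched with a small `asyncio` loop. The helper names `seconds_until_midnight` and `daily_loop` are assumptions, not the actual contents of `backend/scheduler.py`:

```python
import asyncio
from datetime import datetime, timedelta, timezone

def seconds_until_midnight(now: datetime) -> float:
    """Seconds from `now` (timezone-aware) until the next 00:00 UTC."""
    tomorrow = (now + timedelta(days=1)).date()
    next_midnight = datetime.combine(tomorrow, datetime.min.time(), tzinfo=timezone.utc)
    return (next_midnight - now).total_seconds()

async def daily_loop(run_cycle):
    """Minimal scheduler: sleep until midnight UTC, run the job, repeat."""
    while True:
        await asyncio.sleep(seconds_until_midnight(datetime.now(timezone.utc)))
        # Run the blocking cycle in a worker thread so the event loop stays responsive,
        # matching the "separate thread" note above.
        await asyncio.to_thread(run_cycle)
```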
Dependency updates for the clustering stack:

```diff
@@ -19,4 +19,5 @@ SpeechRecognition
 pydub
 googletrans==4.0.2
 langdetect
 indic-nlp-library
+scikit-learn
+numpy<2.0.0
```
The `eps` conversion fix in `cluster_issues_dbscan`:

```diff
@@ -167,14 +167,14 @@ def cluster_issues_dbscan(issues: List[Issue], eps_meters: float = 30.0) -> List
         [issue.latitude, issue.longitude] for issue in valid_issues
     ])
 
-    # Convert eps from meters to degrees (approximate)
-    # 1 degree latitude ≈ 111,000 meters
-    # 1 degree longitude ≈ 111,000 * cos(latitude) meters
-    eps_degrees = eps_meters / 111000  # Rough approximation
+    # Convert eps from meters to radians
+    # Haversine metric expects inputs in radians and eps in radians
+    R = 6371000.0  # Earth's radius in meters
+    eps_radians = eps_meters / R
 
     # Perform DBSCAN clustering
     try:
-        db = DBSCAN(eps=eps_degrees, min_samples=1, metric='haversine').fit(
+        db = DBSCAN(eps=eps_radians, min_samples=1, metric='haversine').fit(
             np.radians(coordinates)
        )
```
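The fix above matters because scikit-learn's haversine metric works entirely in radians: the fitted coordinates are radians and `eps` is an angular distance, so a metric threshold must be divided by the Earth's radius. A stdlib-only sanity check (helper name is illustrative, not from the project):

```python
import math

R = 6371000.0  # Earth's mean radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

eps_meters = 30.0
eps_radians = eps_meters / R  # the corrected conversion

# Two points 0.0002° of latitude apart are about 22 m apart, inside the 30 m eps.
d = haversine_m(28.6139, 77.2090, 28.6141, 77.2090)
assert d / R < eps_radians  # angular distance below the eps threshold
```

The old `eps_meters / 111000` value was a degree approximation, which is the wrong unit once `metric='haversine'` is fitted on `np.radians(coordinates)`.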
The new test file `backend/tests/test_civic_intelligence_system.py` begins:

```diff
@@ -0,0 +1,152 @@
+import os
+import json
+import pytest
+import shutil
+import tempfile
+from datetime import datetime, timedelta, timezone
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+from unittest.mock import patch
```
**Copilot AI (Feb 25, 2026):** The `SessionLocal` mock is incorrectly configured. `SessionLocal` is called as a factory function (`SessionLocal()`) in `civic_intelligence.py` line 45, so the mock needs to be callable and return the session. Import `MagicMock` from `unittest.mock`, then use: `mock_session_local = MagicMock(return_value=db_session)`, `patch("backend.civic_intelligence.SessionLocal", mock_session_local)`. Currently, the test passes `db_session` directly as `return_value` to `patch`, which will fail when `SessionLocal()` is called because `db_session` is a `Session` object, not a callable.

Suggested change:

```diff
-from unittest.mock import patch
+from unittest.mock import patch, MagicMock
```
**Review comment:** Isolate the adaptive-weights singleton to avoid cross-test state leakage.

Lines 58-60 patch `DATA_FILE`, but `CivicIntelligenceEngine` uses a module-level `adaptive_weights` object with cached state. That can make this test order-dependent if cached weights survive into later tests.

🧪 Suggested isolation patch:

```diff
@@
 def test_daily_civic_intelligence_cycle(temp_dirs, db_session):
-    temp_dir, weights_file, snapshots_dir = temp_dirs
+    _, weights_file, snapshots_dir = temp_dirs
+    test_weights = AdaptiveWeights()
@@
-    with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
-         patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
-         patch("backend.civic_intelligence.SessionLocal", return_value=db_session):
+    with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
+         patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
+         patch("backend.civic_intelligence.SessionLocal", return_value=db_session), \
+         patch("backend.civic_intelligence.adaptive_weights", test_weights):
@@
-    weights_system = AdaptiveWeights()
-    weights_system._weights = None  # Force reload
-    weights_system._load_weights()
+    test_weights._weights = None  # Force reload from patched DATA_FILE
+    test_weights._last_loaded = 0
+    test_weights._load_weights()
@@
-    assert weights_system.get_category_multipliers()["pothole"] == 1.0
-    assert weights_system.get_duplicate_search_radius() == 50.0
+    assert test_weights.get_category_multipliers()["pothole"] == 1.0
+    assert test_weights.get_duplicate_search_radius() == 50.0
@@
-    weights_system._last_loaded = 0  # Force reload
-    weights_system._load_weights()
+    test_weights._last_loaded = 0  # Force reload
+    test_weights._load_weights()
@@
-    new_pothole_weight = weights_system.get_category_multipliers()["pothole"]
+    new_pothole_weight = test_weights.get_category_multipliers()["pothole"]
@@
-    new_radius = weights_system.get_duplicate_search_radius()
+    new_radius = test_weights.get_duplicate_search_radius()
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-    with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
-         patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
-         patch("backend.civic_intelligence.SessionLocal", return_value=db_session):
+def test_daily_civic_intelligence_cycle(temp_dirs, db_session):
+    _, weights_file, snapshots_dir = temp_dirs
+    test_weights = AdaptiveWeights()
+    with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
+         patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
+         patch("backend.civic_intelligence.SessionLocal", return_value=db_session), \
+         patch("backend.civic_intelligence.adaptive_weights", test_weights):
+        test_weights._weights = None  # Force reload from patched DATA_FILE
+        test_weights._last_loaded = 0
+        test_weights._load_weights()
+        assert test_weights.get_category_multipliers()["pothole"] == 1.0
+        assert test_weights.get_duplicate_search_radius() == 50.0
+        # Simulate weight update
+        test_weights._last_loaded = 0  # Force reload
+        test_weights._load_weights()
+        new_pothole_weight = test_weights.get_category_multipliers()["pothole"]
+        new_radius = test_weights.get_duplicate_search_radius()
```
🤖 Prompt for AI Agents

> Verify each finding against the current code and only fix it if needed. In `backend/tests/test_civic_intelligence_system.py` around lines 58-60: the test is leaking module-level `adaptive_weights` state used by `CivicIntelligenceEngine`; instead of only patching `backend.adaptive_weights.DATA_FILE`, replace or reset the module-level `adaptive_weights` object so cached weights can't persist across tests; patch `backend.civic_intelligence.adaptive_weights` to a fresh `AdaptiveWeights` instance (or call its reset/clear method) before constructing `CivicIntelligenceEngine`, ensuring `DATA_FILE` is still set to `weights_file` and `SessionLocal`/snapshot patches remain in place.
**Copilot AI (Feb 25, 2026):** The comment incorrectly states "Radius should increase because of clustering (> 5 clusters)", but the test setup creates scattered issues specifically to avoid forming clusters (see the comments at lines 73-74). The radius increase is actually triggered by the "high volume, no clusters" condition (`cluster_count == 0 and len(issues_24h) > 50`) from `civic_intelligence.py` lines 134-136. The comment should be corrected to reflect the actual test scenario.

Suggested change:

```diff
-    # Radius should increase because of clustering (> 5 clusters)
+    # Radius should increase due to high volume of issues with no clusters (cluster_count == 0 and len(issues_24h) > 50)
```
**Review comment:** Snapshot field name is inconsistent with tested payload.

Line 57 documents `weight_changes`, but `backend/tests/test_civic_intelligence_system.py` (lines 138-139) validates `weight_updates`. Please align the doc with the actual snapshot schema (or vice versa).