diff --git a/CLUSTERING.md b/CLUSTERING.md
index f002e5ed..17c6356e 100644
--- a/CLUSTERING.md
+++ b/CLUSTERING.md
@@ -4,11 +4,12 @@
The clustering mechanism groups similar reports within each domain using unsupervised machine learning (SBERT embeddings and agglomerative clustering) and creates a bucket for each cluster.
-## One-time clustering of existing reports
+## Running Full Clustering
+Note that running full clustering **will delete existing clusters and cluster-based buckets** and recreate them from scratch. In general, it only needs to be run once.
-This command clusters reports by similarity and creates buckets for existing reports and intended to be run once.
-Note that rerunning this command **will delete existing clusters and cluster-based buckets** and recreate them from scratch.
+There are two ways to run full clustering:
+### 1. Using the Command Line
```bash
# Cluster reports for a specific domain only
@@ -48,6 +49,73 @@ The command performs the following steps:
6. Clusters are saved to the database along with corresponding buckets. Each bucket receives a signature containing the domain and cluster ID for future report assignment.
+### 2. Using the UI
+
+Full clustering can also be started from the web interface at `/reportmanager/clustering/`:
+
+1. **Prerequisites for local development**:
+ ```bash
+ # 1. Install and start Redis (example for macOS, using Homebrew)
+ brew install redis
+ brew services start redis
+
+ # 2. Start the Celery worker to handle background tasks
+ uv run --extra server celery -A celeryconf worker --loglevel=info --concurrency=1 -Q celery,cron
+ ```
+
+2. **Running Full Clustering**:
+ - Navigate to the Clustering page
+ - Optionally specify a domain to cluster only reports from that domain
+ - Click "Run Clustering" to start the process
+ - The job runs in the background and you can monitor its progress in the job history table
+
+3. **Job Types**:
+ - **Full**: Re-clusters all reports from scratch (deletes existing clusters)
+ - **Incremental**: Automatically triages new unbucketed reports against existing clusters (runs hourly via Celery Beat)
+
+The web interface shows:
+- Job history with status, completion time, and number of buckets created
+- Real-time progress updates (polls every 10 seconds)
+- Error messages if a job fails
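
The polling behaviour described above can be sketched as follows. This is a minimal illustration rather than the component's actual code: `wait_for_latest_job`, `fetch_jobs`, and `max_polls` are hypothetical names, and the sketch assumes that the job list is ordered newest first and that a job still in progress has `completed_at` set to `None`.

```python
import time
from typing import Callable, Optional


def wait_for_latest_job(
    fetch_jobs: Callable[[], list],
    interval_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll until the most recent job reports a completion time.

    fetch_jobs is a hypothetical callable that returns the job history
    as a list of dicts, newest first (an assumption, not a documented
    contract). Returns the finished job, or None if the poll budget
    runs out first.
    """
    for attempt in range(max_polls):
        jobs = fetch_jobs()
        # Assumption: a running job has completed_at == None.
        if jobs and jobs[0].get("completed_at") is not None:
            return jobs[0]
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    return None
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Whether a failed job also sets `completed_at` is not stated here, so a real client would likely check the job status as well.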
+
+## Incremental Triage of New Reports
+
+After the initial full clustering, new incoming reports need to be assigned to appropriate buckets. This is handled by the `triage_new_reports` command, which runs every hour.
+
+### How it works
+
+1. **Match to existing clusters**: For each unbucketed report, the system:
+ - Generates a semantic embedding for the report text
+ - For each cluster in the report's domain:
+   - Compares the report to every member of that cluster
+   - Finds the N most similar members
+   - Calculates the average of those similarity scores
+ - Assigns the report to the cluster with the highest average similarity, if that average exceeds the domain's threshold
+
+2. **Cluster unmatched reports**: Reports that don't match any existing cluster are clustered among themselves:
+ - Groups similar unmatched reports into new clusters
+ - Creates new cluster-based buckets for these groups
+
+3. **Domain-based fallback**: Reports that still don't cluster are assigned to default domain-based buckets
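
The matching step (step 1 above) can be sketched roughly as follows. This is an illustrative sketch, not the project's implementation: the function names are hypothetical, cosine similarity is assumed as the similarity measure, and the `top_n` and `threshold` defaults are placeholders rather than the values the system actually uses.

```python
import math
from typing import Dict, Hashable, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_cluster(
    report_vec: Vector,
    clusters: Dict[Hashable, Sequence[Vector]],
    top_n: int = 3,          # placeholder for "N"
    threshold: float = 0.5,  # placeholder for the domain's threshold
) -> Optional[Hashable]:
    """Return the cluster whose top-N mean similarity is highest and
    strictly above the threshold, or None if no cluster qualifies."""
    best_id, best_score = None, threshold
    for cluster_id, members in clusters.items():
        if not members:
            continue
        # Compare the report to every member, keep the N most similar.
        sims = sorted((cosine(report_vec, m) for m in members), reverse=True)
        top = sims[:top_n]
        score = sum(top) / len(top)
        if score > best_score:
            best_id, best_score = cluster_id, score
    return best_id
```

A `None` result corresponds to a report that matched no existing cluster and would move on to step 2.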
+
+### Report Quality Criteria
+
+As with full clustering, reports are only considered for clustering if they have:
+- Non-empty comment text
+- ML validity probability > 0.03 (not spam/invalid)
+
+Low-quality reports skip clustering and go directly to domain-based buckets.
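
As a rough sketch of the quality check described above: `is_clusterable` and its parameter names are hypothetical, and treating whitespace-only comments or a missing probability as low quality is an assumption. Only the 0.03 cutoff comes from the criteria listed here.

```python
from typing import Optional

# Cutoff taken from the criteria above.
MIN_VALIDITY_PROBABILITY = 0.03


def is_clusterable(
    comment: Optional[str], validity_probability: Optional[float]
) -> bool:
    """Return True if a report should be considered for clustering."""
    # Assumption: whitespace-only comments count as empty.
    if comment is None or not comment.strip():
        return False
    # Assumption: a report without an ML score is treated as low quality.
    if validity_probability is None:
        return False
    # The criterion is a strict "greater than" comparison.
    return validity_probability > MIN_VALIDITY_PROBABILITY
```

Reports for which this returns `False` would skip clustering and go to the domain-based bucket.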
+
+### Running Manually
+
+You can also run triage manually:
+```bash
+uv run -p 3.12 --extra=server server/manage.py triage_new_reports
+```
+
+Note: This command requires that at least one full clustering run has already completed successfully.
+
## Clustering algorithm details
### Semantic Embeddings
diff --git a/pyproject.toml b/pyproject.toml
index 4dd11a06..fc22d1a0 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -61,6 +61,7 @@ server = [
"djangorestframework==3.16.1",
"google-cloud-bigquery==3.40.0",
"pyyaml==6.0.3",
+ "redis[hiredis]>=5.0.0",
"scikit-learn>=1.3.0",
"sentence-transformers>=2.2.0",
"whitenoise==6.11.0",
diff --git a/server/frontend/src/api.js b/server/frontend/src/api.js
index ac07c60b..963d23be 100644
--- a/server/frontend/src/api.js
+++ b/server/frontend/src/api.js
@@ -12,6 +12,14 @@ export const retrieveBucket = async (id) =>
export const listBuckets = async (params) =>
(await mainAxios.get("/reportmanager/rest/buckets/", { params })).data;
+export const clusterReports = async (domain = null) =>
+ (await mainAxios.post("/reportmanager/rest/buckets/cluster/", { domain }))
+ .data;
+
+export const listClusteringJobs = async (params) =>
+ (await mainAxios.get("/reportmanager/rest/clustering-jobs/", { params }))
+ .data;
+
export const reportStats = async (params) =>
(await mainAxios.get("/reportmanager/rest/reports/stats/", { params })).data;
diff --git a/server/frontend/src/components/Clustering.vue b/server/frontend/src/components/Clustering.vue
new file mode 100644
index 00000000..e901fb1d
--- /dev/null
+++ b/server/frontend/src/components/Clustering.vue
@@ -0,0 +1,233 @@
+
+
+
+ Report Clustering
+
+
+
Cluster Similar Reports
+
+ This tool analyzes reports and groups similar ones into clusters,
+ creating buckets automatically based on similarity.
+
+
+
+
What happens when you run clustering?
+
+
Existing clusters and cluster-based buckets will be deleted
+
Reports are analyzed for similarity using ML
+
Similar reports are grouped into clusters
+
+ New buckets are created for each cluster with description format:
+ "domain [Cluster cluster_id]"
+
+
Reports are automatically assigned to their cluster buckets
+
+ This operation runs in the background and may take several minutes
+
+
+
+
+ Clustering is currently in
+ progress. Please wait until it completes before starting a new run.
+
+
+
+
+
+
+ Leave empty to cluster reports across all domains, or specify a domain
+ to cluster only reports from that domain.
+
+
+
+
+
+
+ You don't have permission to run clustering.
+
+
+
+ {{ successMessage }}
+
+
+
+ {{ errorMessage }}
+
+
+
+
+
History
+
+ Loading...
+
+
+
+ No clustering jobs have been run yet.
+
+
+
+
+
+
Type
+
Started At
+
Completed At
+
Status
+
Domain
+
Cluster-based buckets created
+
Error
+
+
+
+
+
+ Full
+ Incremental
+
+
{{ formatDate(job.started_at) }}
+
+ {{
+ formatDate(job.completed_at)
+ }}
+ In Progress
+