Merged

39 commits
56c191f
feat(assessment): Implement assessment evaluation orchestration service
vprashrex Mar 31, 2026
bce0944
chore(migrations): renumber assessment migration to revision 55
vprashrex Apr 27, 2026
f2e512e
Refactor assessment evaluation handling to use dedicated AssessmentRu…
vprashrex Apr 27, 2026
0c04860
feat(assessment): Enhance image handling with MIME type detection and…
vprashrex Apr 27, 2026
cd946ba
refactor: Remove unused feature flag imports from models
vprashrex Apr 27, 2026
360edee
refactor: Clean up logger statements and improve code formatting in a…
vprashrex Apr 27, 2026
e0af9d6
refactor: Remove unused feature flag dependency from assessment routes
vprashrex Apr 27, 2026
593eea3
refactor(tests): Update cron job tests to use AsyncMock for asynchron…
vprashrex Apr 27, 2026
147ff6c
feat(assessment): Add new API documentation and enhance assessment fu…
vprashrex Apr 28, 2026
3e44365
Merge branch 'main' into feature/assessment
vprashrex Apr 28, 2026
b0efcac
refactor: Organize imports across multiple files for improved readabi…
vprashrex Apr 28, 2026
928cad3
test: Add unit tests for assessment utilities, mappers, parsing, proc…
vprashrex Apr 29, 2026
2f6cdb8
feat: Enhance assessment processing and export functionality with imp…
vprashrex Apr 29, 2026
bc55665
feat: Refactor assessment processing and improve callback payload han…
vprashrex Apr 29, 2026
77235a4
feat: Enhance dataset handling by rejecting legacy Excel format (.xls…
vprashrex Apr 29, 2026
148b619
feat: Add tests for Excel dataset handling, including parsing and err…
vprashrex Apr 29, 2026
f2d7c24
Add comprehensive tests for assessment functionality
vprashrex Apr 29, 2026
bc07542
refactor: Simplify patch statements in assessment tests for improved …
vprashrex Apr 29, 2026
535a65a
refactor: Remove assessment event broadcasting and related code from …
vprashrex Apr 30, 2026
91c011e
Refactor assessment tests and routes for improved structure and clarity
vprashrex May 3, 2026
a54e105
refactor: Clean up comments and remove unused imports in assessment p…
vprashrex May 3, 2026
fc2180c
refactor: Update map_kaapi_to_openai_params to accept session and kaa…
vprashrex May 3, 2026
ebc6ead
Refactor config and document CRUD operations to enforce tagging rules
vprashrex May 4, 2026
4a4b9c6
refactor: Remove tag handling from document CRUD operations and relat…
vprashrex May 4, 2026
7c82585
refactor: Update type hints and reorganize imports in documents.py fo…
vprashrex May 4, 2026
54f856d
refactor: Reorganize imports and clean up exception handling in docum…
vprashrex May 4, 2026
e011850
refactor: Update tag parameter formatting in config routes for consis…
vprashrex May 4, 2026
6086a0b
Merge branch 'main' into feature/assessment
vprashrex May 4, 2026
50e5c09
refactor: Remove commented-out sections and reorganize import stateme…
vprashrex May 5, 2026
fbcc36d
refactor: Standardize tag parameter usage across config and version e…
vprashrex May 5, 2026
4757c05
refactor: Simplify description formatting for ConfigUpdate tag field
vprashrex May 5, 2026
81bac63
feat: Enhance assessment CRUD operations with error handling and type…
vprashrex May 5, 2026
6feb82a
refactor: Improve code readability by formatting long lines in assess…
vprashrex May 5, 2026
7fbf109
refactor: Change status field type from Literal to str for flexibilit…
vprashrex May 5, 2026
f9e40a4
fix: Import _load_dataset_rows in _load_dataset_rows_for_run for prop…
vprashrex May 5, 2026
9a83031
refactor: Rename mapping functions for clarity and update related usages
vprashrex May 5, 2026
7d94a2f
Merge branch 'main' into feature/assessment
vprashrex May 6, 2026
96404ea
Merge branch 'main' into feature/assessment
AkhileshNegi May 6, 2026
48fdba4
refactor: simplify error message in create_assessment_dataset function
vprashrex May 6, 2026
227 changes: 227 additions & 0 deletions backend/app/alembic/versions/055_add_assessment_manager_table.py
@@ -0,0 +1,227 @@
"""add assessment and assessment_run tables

Revision ID: 055
Revises: 054
Create Date: 2026-03-26 23:30:00.000000

"""

import sqlalchemy as sa
import sqlmodel.sql.sqltypes
from alembic import op
from sqlalchemy.dialects import postgresql

# revision identifiers, used by Alembic.
revision = "055"
down_revision = "054"
branch_labels = None
depends_on = None


def upgrade():
op.create_table(
"assessment",
sa.Column(
"id",
sa.Integer(),
nullable=False,
comment="Unique identifier for the assessment",
),
sa.Column(
"experiment_name",
sqlmodel.sql.sqltypes.AutoString(),
nullable=False,
comment="Name of the experiment grouping its config runs",
),
sa.Column(
"dataset_id",
sa.Integer(),
nullable=False,
comment="Reference to the evaluation dataset",
),
sa.Column(
"status",
sqlmodel.sql.sqltypes.AutoString(),
nullable=False,
server_default="pending",
comment=(
"Aggregate status: pending, processing, completed, "
"completed_with_errors, failed"
),
),
sa.Column(
"organization_id",
sa.Integer(),
nullable=False,
comment="Reference to the organization",
),
sa.Column(
"project_id",
sa.Integer(),
nullable=False,
comment="Reference to the project",
),
sa.Column(
"inserted_at",
sa.DateTime(),
nullable=False,
comment="Timestamp when the assessment was created",
),
sa.Column(
"updated_at",
sa.DateTime(),
nullable=False,
comment="Timestamp when the assessment was last updated",
),
sa.ForeignKeyConstraint(
["dataset_id"],
["evaluation_dataset.id"],
name="fk_assessment_dataset_id",
ondelete="CASCADE",
),
sa.ForeignKeyConstraint(
["organization_id"],
["organization.id"],
name="fk_assessment_organization_id",
ondelete="CASCADE",
),
sa.ForeignKeyConstraint(
["project_id"],
["project.id"],
Comment on lines +83 to +90 (Collaborator): name is missing as in line:79 above

name="fk_assessment_project_id",
ondelete="CASCADE",
),
sa.PrimaryKeyConstraint("id"),
)
op.create_index(
op.f("ix_assessment_experiment_name"),
"assessment",
["experiment_name"],
unique=False,
)
op.create_index(
"idx_assessment_org_project",
"assessment",
["organization_id", "project_id", "inserted_at"],
unique=False,
)
op.create_index(
"idx_assessment_status",
"assessment",
["status"],
unique=False,
)

op.create_table(
"assessment_run",
sa.Column(
"id",
sa.Integer(),
nullable=False,
comment="Unique identifier for the assessment run",
),
sa.Column(
"assessment_id",
sa.Integer(),
nullable=False,
comment="Reference to the parent assessment",
),
sa.Column(
"config_id",
sa.Uuid(),
nullable=False,
comment="Reference to the stored config used",
),
sa.Column(
"config_version",
sa.Integer(),
nullable=False,
comment="Version of the config used",
),
sa.Column(
"status",
sqlmodel.sql.sqltypes.AutoString(),
nullable=False,
server_default="pending",
comment="Run status: pending, processing, completed, failed",
),
sa.Column(
"batch_job_id",
sa.Integer(),
nullable=True,
comment="Reference to the batch job processing this run",
),
sa.Column(
"total_items",
sa.Integer(),
nullable=False,
server_default="0",
comment="Total number of dataset items in this run",
),
sa.Column(
"input",
postgresql.JSONB(astext_type=sa.Text()),
nullable=False,
comment=(
"Assessment input: prompt_template, text_columns, attachments, "
"output_schema"
),
),
sa.Column(
"object_store_url",
sqlmodel.sql.sqltypes.AutoString(),
nullable=True,
comment="S3 URL of processed batch results",
),
sa.Column(
"error_message",
sa.Text(),
nullable=True,
comment="Error message if the run failed",
),
sa.Column(
"inserted_at",
sa.DateTime(),
nullable=False,
comment="Timestamp when the run was created",
),
sa.Column(
"updated_at",
sa.DateTime(),
nullable=False,
comment="Timestamp when the run was last updated",
),
sa.ForeignKeyConstraint(
["assessment_id"],
["assessment.id"],
name="fk_assessment_run_assessment_id",
ondelete="CASCADE",
),
sa.ForeignKeyConstraint(
["config_id"],
["config.id"],
name="fk_assessment_run_config_id",
),
sa.ForeignKeyConstraint(
["batch_job_id"],
["batch_job.id"],
name="fk_assessment_run_batch_job_id",
ondelete="SET NULL",
),
sa.PrimaryKeyConstraint("id"),
)
op.create_index(
"idx_assessment_run_assessment_id",
"assessment_run",
["assessment_id"],
unique=False,
)


def downgrade():
op.drop_index("idx_assessment_run_assessment_id", table_name="assessment_run")
op.drop_table("assessment_run")
op.drop_index("idx_assessment_status", table_name="assessment")
op.drop_index("idx_assessment_org_project", table_name="assessment")
op.drop_index(op.f("ix_assessment_experiment_name"), table_name="assessment")
op.drop_table("assessment")
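The `status` column comment above lists the aggregate states (`pending, processing, completed, completed_with_errors, failed`). A minimal sketch of how child `assessment_run` statuses might collapse into that aggregate; the function name and exact precedence rules are assumptions for illustration, not taken from the PR:

```python
def aggregate_assessment_status(run_statuses: list[str]) -> str:
    """Collapse child assessment_run statuses into one aggregate status."""
    states = set(run_statuses)
    if not states or states == {"pending"}:
        return "pending"
    # Any in-flight run keeps the parent in 'processing'.
    if "pending" in states or "processing" in states:
        return "processing"
    # All runs are terminal (completed/failed) past this point.
    if states == {"failed"}:
        return "failed"
    if "failed" in states:
        return "completed_with_errors"
    return "completed"
```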
82 changes: 82 additions & 0 deletions backend/app/alembic/versions/056_add_config_tag.py
@@ -0,0 +1,82 @@
"""add tag column to config table

Revision ID: 056
Revises: 055
Comment on lines +1 to +4
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need two migrations?

Create Date: 2026-05-03 12:00:00.000000

"""

import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql

# revision identifiers, used by Alembic.
revision = "056"
down_revision = "055"
branch_labels = None
depends_on = None


CONFIG_TAG_VALUES = ("default", "ASSESSMENT")
DEFAULT_TAG_SERVER_DEFAULT = sa.text("'default'::config_tag")


def upgrade():
config_tag = postgresql.ENUM(
*CONFIG_TAG_VALUES,
name="config_tag",
create_type=False,
)
config_tag.create(op.get_bind(), checkfirst=True)

with op.get_context().autocommit_block():
op.execute("ALTER TYPE config_tag ADD VALUE IF NOT EXISTS 'default'")
op.execute("ALTER TYPE config_tag ADD VALUE IF NOT EXISTS 'ASSESSMENT'")

op.add_column(
"config",
sa.Column(
"tag",
config_tag,
nullable=False,
server_default=DEFAULT_TAG_SERVER_DEFAULT,
comment=(
"Tag classifying the config: "
"'default' for general use, 'ASSESSMENT' for configs used in assessments."
),
),
)

op.execute(
"""
UPDATE config
SET tag = 'ASSESSMENT'
FROM (
SELECT DISTINCT config_id
FROM assessment_run
) AS assessment_configs
WHERE config.id = assessment_configs.config_id
"""
)

with op.get_context().autocommit_block():
op.create_index(
"idx_config_project_id_tag_active",
"config",
["project_id", "tag", sa.text("updated_at DESC")],
unique=False,
postgresql_where=sa.text("deleted_at IS NULL"),
postgresql_concurrently=True,
)


def downgrade():
with op.get_context().autocommit_block():
op.drop_index(
"idx_config_project_id_tag_active",
table_name="config",
postgresql_concurrently=True,
)

op.drop_column("config", "tag")
sa.Enum(name="config_tag").drop(op.get_bind(), checkfirst=True)
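The backfill `UPDATE` above re-tags every config that is already referenced by an `assessment_run` row. A pure-Python sketch of the same rule (the function name and dict shape are illustrative, not part of the migration):

```python
def backfill_config_tags(
    config_ids: list[str],
    assessment_run_config_ids: list[str],
) -> dict[str, str]:
    """Mirror the SQL backfill: configs referenced by any assessment_run
    become 'ASSESSMENT'; everything else keeps the 'default' server default."""
    referenced = set(assessment_run_config_ids)  # SELECT DISTINCT config_id
    return {
        cid: "ASSESSMENT" if cid in referenced else "default"
        for cid in config_ids
    }
```

The `autocommit_block()` wrappers are needed because `CREATE INDEX ... CONCURRENTLY` cannot run inside a transaction, and `ALTER TYPE ... ADD VALUE` has similar transaction restrictions on older PostgreSQL versions.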
7 changes: 7 additions & 0 deletions backend/app/api/docs/assessment/create_run.md
@@ -0,0 +1,7 @@
Start an assessment across one or more stored config versions.

Creates an assessment and one child assessment run per config, then submits each
run to batch processing.

Optional `system_instruction` is forwarded into each generated provider request
as the system/developer instruction for that assessment run.
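A request body for this endpoint might look roughly like the following. The field names here are guesses reconstructed from the columns in migration 055 (`config_id`, `config_version`, and the `input` payload's `prompt_template`/`output_schema`), not the actual request schema:

```json
{
  "experiment_name": "prompt-v2-comparison",
  "dataset_id": 42,
  "configs": [
    {"config_id": "00000000-0000-0000-0000-000000000001", "config_version": 3}
  ],
  "system_instruction": "You are a strict grader.",
  "prompt_template": "Rate this answer: {answer}",
  "output_schema": {"type": "object", "properties": {"score": {"type": "number"}}}
}
```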
4 changes: 4 additions & 0 deletions backend/app/api/docs/assessment/delete_dataset.md
@@ -0,0 +1,4 @@
Delete an assessment dataset.

This removes dataset metadata and associated storage references for the
given dataset in the current organization and project.
4 changes: 4 additions & 0 deletions backend/app/api/docs/assessment/export_assessment_results.md
@@ -0,0 +1,4 @@
Export results for all child runs under an assessment.

For `json`, returns a flat list in the API response. For `csv`/`xlsx`,
returns one file for a single run or a ZIP archive when multiple runs exist.
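The delivery rule described above can be summarized as a small decision function (a sketch; the helper name and return labels are invented for illustration):

```python
def export_packaging(export_format: str, run_count: int) -> str:
    """How export results are delivered, per format and number of child runs."""
    if export_format == "json":
        return "inline-list"  # flat list in the API response
    if export_format in ("csv", "xlsx"):
        # One file per run: single run downloads directly, multiple runs zip up.
        return "single-file" if run_count == 1 else "zip-archive"
    raise ValueError(f"unsupported export format: {export_format}")
```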
3 changes: 3 additions & 0 deletions backend/app/api/docs/assessment/export_run_results.md
@@ -0,0 +1,3 @@
Export results for a single assessment run.

Supports `json`, `csv`, and `xlsx` output formats.
3 changes: 3 additions & 0 deletions backend/app/api/docs/assessment/get_assessment.md
@@ -0,0 +1,3 @@
Get an assessment by ID.

Returns aggregate run counts and status metadata for the assessment.
3 changes: 3 additions & 0 deletions backend/app/api/docs/assessment/get_dataset.md
@@ -0,0 +1,3 @@
Get a single assessment dataset by ID.

Optionally include a signed URL to download the original uploaded file.
3 changes: 3 additions & 0 deletions backend/app/api/docs/assessment/get_run.md
@@ -0,0 +1,3 @@
Get a single assessment run by ID.

Returns run metadata, status, config reference, and assessment input payload.
3 changes: 3 additions & 0 deletions backend/app/api/docs/assessment/list_assessments.md
@@ -0,0 +1,3 @@
List assessments for the current organization/project.

Each record includes aggregate status counters across its child runs.
3 changes: 3 additions & 0 deletions backend/app/api/docs/assessment/list_datasets.md
@@ -0,0 +1,3 @@
List assessment datasets for the current organization and project.

Supports pagination via `limit` and `offset`.
4 changes: 4 additions & 0 deletions backend/app/api/docs/assessment/list_runs.md
@@ -0,0 +1,4 @@
List assessment runs for the current organization/project.

Optionally filter by `assessment_id` to list runs for a specific parent
assessment.
4 changes: 4 additions & 0 deletions backend/app/api/docs/assessment/retry_assessment.md
@@ -0,0 +1,4 @@
Retry an existing assessment.

Reuses the original dataset and config references from the selected
assessment and creates a fresh assessment with new child runs.
4 changes: 4 additions & 0 deletions backend/app/api/docs/assessment/retry_run.md
@@ -0,0 +1,4 @@
Retry a single assessment run.

Creates a new assessment using the same dataset and config used by the
selected child run.
4 changes: 4 additions & 0 deletions backend/app/api/docs/assessment/upload_dataset.md
@@ -0,0 +1,4 @@
Upload a CSV or Excel dataset for assessment workflows.

The file is stored in object storage and indexed as an assessment dataset
for the current organization and project.
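A sketch of the upload-side validation implied here and by the "rejecting legacy Excel format (.xls…)" commit in this PR; the function name, allowed-extension set, and error messages are assumptions:

```python
ALLOWED_DATASET_EXTENSIONS = {".csv", ".xlsx"}

def validate_dataset_filename(filename: str) -> str:
    """Return the normalized extension, rejecting unsupported file types."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext == ".xls":
        # Legacy Excel is explicitly rejected; users re-save as .xlsx.
        raise ValueError("Legacy Excel (.xls) files are not supported; re-save as .xlsx")
    if ext not in ALLOWED_DATASET_EXTENSIONS:
        raise ValueError(f"Unsupported dataset file type: {filename}")
    return ext
```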
4 changes: 4 additions & 0 deletions backend/app/api/docs/config/create_version.md
@@ -6,6 +6,10 @@ create a new version under the same configuration with an incremented version number
Version numbers are automatically incremented sequentially (1, 2, 3, etc.)
and cannot be manually set or skipped.

When `tag` is omitted, this endpoint only resolves general configurations
(configs tagged `default`). Pass an explicit tag such as `ASSESSMENT` to
target tagged config surfaces.

## Important
- This endpoint accepts partial updates using dict[str, Any] for config_blob.
- Only the fields that need to be updated should be provided.
4 changes: 4 additions & 0 deletions backend/app/api/docs/config/get_version.md
@@ -1,4 +1,8 @@
Retrieve a specific version of a configuration.

When `tag` is omitted, this endpoint only resolves versions for general
configurations (configs tagged `default`). Pass an explicit tag such as
`ASSESSMENT` to target tagged config surfaces.

Returns the complete version details including the full configuration
blob (config_blob) with all LLM parameters.