Assessment: AI Assessment Pipeline #788
Merged
39 commits
56c191f feat(assessment): Implement assessment evaluation orchestration service (vprashrex)
bce0944 chore(migrations): renumber assessment migration to revision 55 (vprashrex)
f2e512e Refactor assessment evaluation handling to use dedicated AssessmentRu… (vprashrex)
0c04860 feat(assessment): Enhance image handling with MIME type detection and… (vprashrex)
cd946ba refactor: Remove unused feature flag imports from models (vprashrex)
360edee refactor: Clean up logger statements and improve code formatting in a… (vprashrex)
e0af9d6 refactor: Remove unused feature flag dependency from assessment routes (vprashrex)
593eea3 refactor(tests): Update cron job tests to use AsyncMock for asynchron… (vprashrex)
147ff6c feat(assessment): Add new API documentation and enhance assessment fu… (vprashrex)
3e44365 Merge branch 'main' into feature/assessment (vprashrex)
b0efcac refactor: Organize imports across multiple files for improved readabi… (vprashrex)
928cad3 test: Add unit tests for assessment utilities, mappers, parsing, proc… (vprashrex)
2f6cdb8 feat: Enhance assessment processing and export functionality with imp… (vprashrex)
bc55665 feat: Refactor assessment processing and improve callback payload han… (vprashrex)
77235a4 feat: Enhance dataset handling by rejecting legacy Excel format (.xls… (vprashrex)
148b619 feat: Add tests for Excel dataset handling, including parsing and err… (vprashrex)
f2d7c24 Add comprehensive tests for assessment functionality (vprashrex)
bc07542 refactor: Simplify patch statements in assessment tests for improved … (vprashrex)
535a65a refactor: Remove assessment event broadcasting and related code from … (vprashrex)
91c011e Refactor assessment tests and routes for improved structure and clarity (vprashrex)
a54e105 refactor: Clean up comments and remove unused imports in assessment p… (vprashrex)
fc2180c refactor: Update map_kaapi_to_openai_params to accept session and kaa… (vprashrex)
ebc6ead Refactor config and document CRUD operations to enforce tagging rules (vprashrex)
4a4b9c6 refactor: Remove tag handling from document CRUD operations and relat… (vprashrex)
7c82585 refactor: Update type hints and reorganize imports in documents.py fo… (vprashrex)
54f856d refactor: Reorganize imports and clean up exception handling in docum… (vprashrex)
e011850 refactor: Update tag parameter formatting in config routes for consis… (vprashrex)
6086a0b Merge branch 'main' into feature/assessment (vprashrex)
50e5c09 refactor: Remove commented-out sections and reorganize import stateme… (vprashrex)
fbcc36d refactor: Standardize tag parameter usage across config and version e… (vprashrex)
4757c05 refactor: Simplify description formatting for ConfigUpdate tag field (vprashrex)
81bac63 feat: Enhance assessment CRUD operations with error handling and type… (vprashrex)
6feb82a refactor: Improve code readability by formatting long lines in assess… (vprashrex)
7fbf109 refactor: Change status field type from Literal to str for flexibilit… (vprashrex)
f9e40a4 fix: Import _load_dataset_rows in _load_dataset_rows_for_run for prop… (vprashrex)
9a83031 refactor: Rename mapping functions for clarity and update related usages (vprashrex)
7d94a2f Merge branch 'main' into feature/assessment (vprashrex)
96404ea Merge branch 'main' into feature/assessment (AkhileshNegi)
48fdba4 refactor: simplify error message in create_assessment_dataset function (vprashrex)
227 changes: 227 additions & 0 deletions
backend/app/alembic/versions/055_add_assessment_manager_table.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,227 @@ | ||
| """add assessment and assessment_run tables | ||
|
|
||
| Revision ID: 055 | ||
| Revises: 054 | ||
| Create Date: 2026-03-26 23:30:00.000000 | ||
|
|
||
| """ | ||
|
|
||
| import sqlalchemy as sa | ||
| import sqlmodel.sql.sqltypes | ||
| from alembic import op | ||
| from sqlalchemy.dialects import postgresql | ||
|
|
||
| # revision identifiers, used by Alembic. | ||
| revision = "055" | ||
| down_revision = "054" | ||
| branch_labels = None | ||
| depends_on = None | ||
|
|
||
|
|
||
| def upgrade(): | ||
| op.create_table( | ||
| "assessment", | ||
| sa.Column( | ||
| "id", | ||
| sa.Integer(), | ||
| nullable=False, | ||
| comment="Unique identifier for the assessment", | ||
| ), | ||
| sa.Column( | ||
| "experiment_name", | ||
| sqlmodel.sql.sqltypes.AutoString(), | ||
| nullable=False, | ||
| comment="Name of the experiment grouping its config runs", | ||
| ), | ||
| sa.Column( | ||
| "dataset_id", | ||
| sa.Integer(), | ||
| nullable=False, | ||
| comment="Reference to the evaluation dataset", | ||
| ), | ||
| sa.Column( | ||
| "status", | ||
| sqlmodel.sql.sqltypes.AutoString(), | ||
| nullable=False, | ||
| server_default="pending", | ||
| comment=( | ||
| "Aggregate status: pending, processing, completed, " | ||
| "completed_with_errors, failed" | ||
| ), | ||
| ), | ||
| sa.Column( | ||
| "organization_id", | ||
| sa.Integer(), | ||
| nullable=False, | ||
| comment="Reference to the organization", | ||
| ), | ||
| sa.Column( | ||
| "project_id", | ||
| sa.Integer(), | ||
| nullable=False, | ||
| comment="Reference to the project", | ||
| ), | ||
| sa.Column( | ||
| "inserted_at", | ||
| sa.DateTime(), | ||
| nullable=False, | ||
| comment="Timestamp when the assessment was created", | ||
| ), | ||
| sa.Column( | ||
| "updated_at", | ||
| sa.DateTime(), | ||
| nullable=False, | ||
| comment="Timestamp when the assessment was last updated", | ||
| ), | ||
| sa.ForeignKeyConstraint( | ||
| ["dataset_id"], | ||
| ["evaluation_dataset.id"], | ||
| name="fk_assessment_dataset_id", | ||
| ondelete="CASCADE", | ||
| ), | ||
| sa.ForeignKeyConstraint( | ||
| ["organization_id"], | ||
| ["organization.id"], | ||
| name="fk_assessment_organization_id", | ||
| ondelete="CASCADE", | ||
| ), | ||
| sa.ForeignKeyConstraint( | ||
| ["project_id"], | ||
| ["project.id"], | ||
| name="fk_assessment_project_id", | ||
| ondelete="CASCADE", | ||
| ), | ||
| sa.PrimaryKeyConstraint("id"), | ||
| ) | ||
| op.create_index( | ||
| op.f("ix_assessment_experiment_name"), | ||
| "assessment", | ||
| ["experiment_name"], | ||
| unique=False, | ||
| ) | ||
| op.create_index( | ||
| "idx_assessment_org_project", | ||
| "assessment", | ||
| ["organization_id", "project_id", "inserted_at"], | ||
| unique=False, | ||
| ) | ||
| op.create_index( | ||
| "idx_assessment_status", | ||
| "assessment", | ||
| ["status"], | ||
| unique=False, | ||
| ) | ||
|
|
||
| op.create_table( | ||
| "assessment_run", | ||
| sa.Column( | ||
| "id", | ||
| sa.Integer(), | ||
| nullable=False, | ||
| comment="Unique identifier for the assessment run", | ||
| ), | ||
| sa.Column( | ||
| "assessment_id", | ||
| sa.Integer(), | ||
| nullable=False, | ||
| comment="Reference to the parent assessment", | ||
| ), | ||
| sa.Column( | ||
| "config_id", | ||
| sa.Uuid(), | ||
| nullable=False, | ||
| comment="Reference to the stored config used", | ||
| ), | ||
| sa.Column( | ||
| "config_version", | ||
| sa.Integer(), | ||
| nullable=False, | ||
| comment="Version of the config used", | ||
| ), | ||
| sa.Column( | ||
| "status", | ||
| sqlmodel.sql.sqltypes.AutoString(), | ||
| nullable=False, | ||
| server_default="pending", | ||
| comment="Run status: pending, processing, completed, failed", | ||
| ), | ||
| sa.Column( | ||
| "batch_job_id", | ||
| sa.Integer(), | ||
| nullable=True, | ||
| comment="Reference to the batch job processing this run", | ||
| ), | ||
| sa.Column( | ||
| "total_items", | ||
| sa.Integer(), | ||
| nullable=False, | ||
| server_default="0", | ||
| comment="Total number of dataset items in this run", | ||
| ), | ||
| sa.Column( | ||
| "input", | ||
| postgresql.JSONB(astext_type=sa.Text()), | ||
| nullable=False, | ||
| comment=( | ||
| "Assessment input: prompt_template, text_columns, attachments, " | ||
| "output_schema" | ||
| ), | ||
| ), | ||
| sa.Column( | ||
| "object_store_url", | ||
| sqlmodel.sql.sqltypes.AutoString(), | ||
| nullable=True, | ||
| comment="S3 URL of processed batch results", | ||
| ), | ||
| sa.Column( | ||
| "error_message", | ||
| sa.Text(), | ||
| nullable=True, | ||
| comment="Error message if the run failed", | ||
| ), | ||
| sa.Column( | ||
| "inserted_at", | ||
| sa.DateTime(), | ||
| nullable=False, | ||
| comment="Timestamp when the run was created", | ||
| ), | ||
| sa.Column( | ||
| "updated_at", | ||
| sa.DateTime(), | ||
| nullable=False, | ||
| comment="Timestamp when the run was last updated", | ||
| ), | ||
| sa.ForeignKeyConstraint( | ||
| ["assessment_id"], | ||
| ["assessment.id"], | ||
| name="fk_assessment_run_assessment_id", | ||
| ondelete="CASCADE", | ||
| ), | ||
| sa.ForeignKeyConstraint( | ||
| ["config_id"], | ||
| ["config.id"], | ||
| name="fk_assessment_run_config_id", | ||
| ), | ||
| sa.ForeignKeyConstraint( | ||
| ["batch_job_id"], | ||
| ["batch_job.id"], | ||
| name="fk_assessment_run_batch_job_id", | ||
| ondelete="SET NULL", | ||
| ), | ||
| sa.PrimaryKeyConstraint("id"), | ||
| ) | ||
| op.create_index( | ||
| "idx_assessment_run_assessment_id", | ||
| "assessment_run", | ||
| ["assessment_id"], | ||
| unique=False, | ||
| ) | ||
|
|
||
|
|
||
| def downgrade(): | ||
| op.drop_index("idx_assessment_run_assessment_id", table_name="assessment_run") | ||
| op.drop_table("assessment_run") | ||
| op.drop_index("idx_assessment_status", table_name="assessment") | ||
| op.drop_index("idx_assessment_org_project", table_name="assessment") | ||
| op.drop_index(op.f("ix_assessment_experiment_name"), table_name="assessment") | ||
| op.drop_table("assessment") | ||
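
The `status` column comment on `assessment` lists five aggregate values, but the migration does not say how they are derived from the child runs. Below is a minimal sketch of one plausible roll-up rule. The function name `aggregate_status` and the precedence of the checks are assumptions for illustration, not taken from this PR.

```python
from collections.abc import Iterable

# Run-level statuses named in the assessment_run.status column comment.
RUN_STATUSES = {"pending", "processing", "completed", "failed"}


def aggregate_status(run_statuses: Iterable[str]) -> str:
    """Roll child run statuses up into one assessment status (hypothetical rule)."""
    statuses = list(run_statuses)
    unknown = set(statuses) - RUN_STATUSES
    if unknown:
        raise ValueError(f"unknown run status: {sorted(unknown)}")
    if not statuses:
        return "pending"  # assumption: no runs yet means nothing has started
    if "processing" in statuses:
        return "processing"
    if "pending" in statuses:
        # assumption: a mix of finished and pending runs counts as in progress
        return "pending" if set(statuses) == {"pending"} else "processing"
    if all(s == "completed" for s in statuses):
        return "completed"
    if all(s == "failed" for s in statuses):
        return "failed"
    # Every run has finished, with at least one success and one failure.
    return "completed_with_errors"
```

Under this rule an assessment only reaches a terminal status once no child run is still pending or processing.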
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| """add tag column to config table | ||
|
|
||
| Revision ID: 056 | ||
| Revises: 055 | ||
|
Review comment on lines +1 to +4 (Collaborator): do we need two migrations? |
||
| Create Date: 2026-05-03 12:00:00.000000 | ||
|
|
||
| """ | ||
|
|
||
| import sqlalchemy as sa | ||
| from alembic import op | ||
| from sqlalchemy.dialects import postgresql | ||
|
|
||
| # revision identifiers, used by Alembic. | ||
| revision = "056" | ||
| down_revision = "055" | ||
| branch_labels = None | ||
| depends_on = None | ||
|
|
||
|
|
||
| CONFIG_TAG_VALUES = ("default", "ASSESSMENT") | ||
| DEFAULT_TAG_SERVER_DEFAULT = sa.text("'default'::config_tag") | ||
|
|
||
|
|
||
| def upgrade(): | ||
| config_tag = postgresql.ENUM( | ||
| *CONFIG_TAG_VALUES, | ||
| name="config_tag", | ||
| create_type=False, | ||
| ) | ||
| config_tag.create(op.get_bind(), checkfirst=True) | ||
|
|
||
| with op.get_context().autocommit_block(): | ||
| op.execute("ALTER TYPE config_tag ADD VALUE IF NOT EXISTS 'default'") | ||
| op.execute("ALTER TYPE config_tag ADD VALUE IF NOT EXISTS 'ASSESSMENT'") | ||
|
|
||
| op.add_column( | ||
| "config", | ||
| sa.Column( | ||
| "tag", | ||
| config_tag, | ||
| nullable=False, | ||
| server_default=DEFAULT_TAG_SERVER_DEFAULT, | ||
| comment=( | ||
| "Tag classifying the config: " | ||
| "'default' for general use, 'ASSESSMENT' for configs used in assessments. " | ||
| ), | ||
| ), | ||
| ) | ||
|
|
||
| op.execute( | ||
| """ | ||
| UPDATE config | ||
| SET tag = 'ASSESSMENT' | ||
| FROM ( | ||
| SELECT DISTINCT config_id | ||
| FROM assessment_run | ||
| ) AS assessment_configs | ||
| WHERE config.id = assessment_configs.config_id | ||
| """ | ||
| ) | ||
|
|
||
| with op.get_context().autocommit_block(): | ||
| op.create_index( | ||
| "idx_config_project_id_tag_active", | ||
| "config", | ||
| ["project_id", "tag", sa.text("updated_at DESC")], | ||
| unique=False, | ||
| postgresql_where=sa.text("deleted_at IS NULL"), | ||
| postgresql_concurrently=True, | ||
| ) | ||
|
|
||
|
|
||
| def downgrade(): | ||
| with op.get_context().autocommit_block(): | ||
| op.drop_index( | ||
| "idx_config_project_id_tag_active", | ||
| table_name="config", | ||
| postgresql_concurrently=True, | ||
| ) | ||
|
|
||
| op.drop_column("config", "tag") | ||
| sa.Enum(name="config_tag").drop(op.get_bind(), checkfirst=True) | ||
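
The backfill step in `upgrade()` tags every config that already appears in `assessment_run`. The sketch below replays that step against an in-memory SQLite database to show its effect. SQLite, the simplified two-column tables, the sample rows, and the `IN (SELECT ...)` form (used here in place of PostgreSQL's `UPDATE ... FROM`) are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE config (id TEXT PRIMARY KEY, tag TEXT NOT NULL DEFAULT 'default');
    CREATE TABLE assessment_run (id INTEGER PRIMARY KEY, config_id TEXT NOT NULL);
    INSERT INTO config (id) VALUES ('cfg-a'), ('cfg-b'), ('cfg-c');
    INSERT INTO assessment_run (config_id) VALUES ('cfg-a'), ('cfg-a'), ('cfg-c');
    """
)

# Same intent as the migration's backfill: tag configs referenced by any run.
conn.execute(
    """
    UPDATE config
    SET tag = 'ASSESSMENT'
    WHERE id IN (SELECT DISTINCT config_id FROM assessment_run)
    """
)

# Each row is an (id, tag) pair, so the cursor can feed dict() directly.
tags = dict(conn.execute("SELECT id, tag FROM config ORDER BY id"))
print(tags)
```

Configs that no run references keep the column's server default, which is why the column can be added as `NOT NULL` before the backfill runs.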
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| Start an assessment across one or more stored config versions. | ||
|
|
||
| Creates an assessment and one child assessment run per config, then submits each | ||
| run to batch processing. | ||
|
|
||
| Optional `system_instruction` is forwarded into each generated provider request | ||
| as the system/developer instruction for that assessment run. |
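
The flow described above (one parent assessment, one child run per config, each run submitted for batch processing) can be sketched with plain dataclasses. The class names, the `submit` callback, and the move to `processing` after submission are hypothetical stand-ins, not the service code from this PR.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ConfigRef:
    config_id: str
    config_version: int


@dataclass
class Run:
    config: ConfigRef
    system_instruction: Optional[str]
    status: str = "pending"


@dataclass
class Assessment:
    experiment_name: str
    dataset_id: int
    runs: list[Run] = field(default_factory=list)


def start_assessment(
    experiment_name: str,
    dataset_id: int,
    configs: list[ConfigRef],
    submit: Callable[[Run], None],
    system_instruction: Optional[str] = None,
) -> Assessment:
    """Create one child run per config, then hand each run to `submit`."""
    assessment = Assessment(experiment_name, dataset_id)
    for cfg in configs:
        # The optional system instruction is copied onto every run.
        assessment.runs.append(Run(cfg, system_instruction))
    for run in assessment.runs:
        submit(run)
        run.status = "processing"  # assumption: submitted runs are in progress
    return assessment
```

Creating all runs before submitting any keeps the parent record complete even if a later submission fails, which is one possible reason to separate the two loops.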
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| Delete an assessment dataset. | ||
|
|
||
| This removes dataset metadata and associated storage references for the | ||
| given dataset in the current organization and project. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| Export results for all child runs under an assessment. | ||
|
|
||
| For `json`, returns a flat list in the API response. For `csv`/`xlsx`, | ||
| returns one file for a single run or a ZIP archive when multiple runs exist. |
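
A rough sketch of the export shapes described above, using only the standard library and covering `json` and `csv` (an `xlsx` writer would need a third-party package). The function names, the `run_id` field, and the file names are assumptions for illustration.

```python
import csv
import io
import zipfile


def export_json(runs: dict[int, list[dict]]) -> list[dict]:
    """Flatten all runs into one list, tagging each row with its run id."""
    return [{"run_id": run_id, **row} for run_id, rows in runs.items() for row in rows]


def _to_csv(rows: list[dict]) -> bytes:
    """Serialize rows to CSV bytes with a sorted union of keys as the header."""
    buf = io.StringIO()
    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def export_csv(runs: dict[int, list[dict]]) -> tuple[str, bytes]:
    """Return (filename, payload): a bare CSV for one run, a ZIP for several."""
    if len(runs) == 1:
        run_id, rows = next(iter(runs.items()))
        return f"run_{run_id}.csv", _to_csv(rows)
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for run_id, rows in runs.items():
            zf.writestr(f"run_{run_id}.csv", _to_csv(rows))
    return "assessment_export.zip", archive.getvalue()
```

Returning the filename alongside the payload lets a caller pick the right content type without inspecting the bytes.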
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| Export results for a single assessment run. | ||
|
|
||
| Supports `json`, `csv`, and `xlsx` output formats. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| Get an assessment by ID. | ||
|
|
||
| Returns aggregate run counts and status metadata for the assessment. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| Get a single assessment dataset by ID. | ||
|
|
||
| Optionally include a signed URL to download the original uploaded file. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| Get a single assessment run by ID. | ||
|
|
||
| Returns run metadata, status, config reference, and assessment input payload. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| List assessments for the current organization/project. | ||
|
|
||
| Each record includes aggregate status counters across its child runs. |
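
As a sketch of what per-assessment status counters could look like, the snippet below tallies child run statuses with `collections.Counter`. The key names (`total_runs`, `completed_runs`, and so on) are hypothetical. The PR text shown here does not give the actual response fields.

```python
from collections import Counter

# Run-level statuses named in the assessment_run.status column comment.
RUN_STATUSES = ("pending", "processing", "completed", "failed")


def run_counters(run_statuses: list[str]) -> dict[str, int]:
    """Count child runs per status, reporting zero for statuses with no runs."""
    counts = Counter(run_statuses)
    summary = {"total_runs": len(run_statuses)}
    for status in RUN_STATUSES:
        summary[f"{status}_runs"] = counts[status]  # Counter yields 0 if absent
    return summary
```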
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| List assessment datasets for the current organization and project. | ||
|
|
||
| Supports pagination via `limit` and `offset`. |
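
For readers unfamiliar with `limit`/`offset` pagination, a minimal in-memory sketch follows. The default values and the validation rules are assumptions. The endpoint's real defaults and bounds are not shown in this diff.

```python
def paginate(items: list, limit: int = 100, offset: int = 0) -> list:
    """Return at most `limit` items, skipping the first `offset` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    # Slicing past the end of the list simply yields a shorter (or empty) page.
    return items[offset : offset + limit]
```

A client walks the collection by increasing `offset` by `limit` until a page comes back shorter than `limit`.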
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| List assessment runs for the current organization/project. | ||
|
|
||
| Optionally filter by `assessment_id` to list runs for a specific parent | ||
| assessment. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| Retry an existing assessment. | ||
|
|
||
| Reuses the original dataset and config references from the selected | ||
| assessment and creates a fresh assessment with new child runs. |
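
The retry behaviour described above leaves the original assessment untouched and builds a new one from the same references. A minimal sketch, with a hypothetical `Assessment` dataclass and a caller-supplied `new_id` standing in for a database-generated key:

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    id: int
    dataset_id: int
    # (config_id, config_version) pairs, one per child run
    configs: list[tuple[str, int]]
    run_statuses: list[str] = field(default_factory=list)


def retry_assessment(original: Assessment, new_id: int) -> Assessment:
    """Build a fresh assessment that reuses the original dataset and configs."""
    return Assessment(
        id=new_id,
        dataset_id=original.dataset_id,
        configs=list(original.configs),  # copy so the two records stay independent
        run_statuses=["pending"] * len(original.configs),
    )
```

Because the retry is a new record, the results of the original assessment remain available for comparison.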
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| Retry a single assessment run. | ||
|
|
||
| Creates a new assessment using the same dataset and config used by the | ||
| selected child run. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| Upload a CSV or Excel dataset for assessment workflows. | ||
|
|
||
| The file is stored in object storage and indexed as an assessment dataset | ||
| for the current organization and project. |
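
One commit in this PR mentions rejecting the legacy Excel format. A sketch of what an upload-time extension check could look like is shown below. The accepted set (`.csv`, `.xlsx`), the function name, and the error messages are assumptions, and a real check would likely also inspect the file contents.

```python
from pathlib import Path

# Assumed accepted extensions; the PR's actual list is not shown in this diff.
ALLOWED_SUFFIXES = {".csv", ".xlsx"}


def validate_dataset_filename(filename: str) -> str:
    """Return the normalized suffix, or raise ValueError for unsupported files."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".xls":
        raise ValueError("legacy Excel (.xls) is not supported; use .xlsx or .csv")
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported dataset file type: {suffix or '(none)'}")
    return suffix
```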
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,8 @@ | ||
| Retrieve a specific version of a configuration. | ||
|
|
||
| When `tag` is omitted, this endpoint only resolves versions for general | ||
| configurations: configs tagged `default`. Pass | ||
| an explicit tag such as `ASSESSMENT` for tagged config surfaces. | ||
|
|
||
| Returns the complete version details including the full configuration | ||
| blob (config_blob) with all LLM parameters. |
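
The tag rule above (an omitted `tag` resolves only configs tagged `default`) can be sketched as a lookup over an in-memory mapping. The `resolve_version` function and the dictionary layout are hypothetical and only illustrate the filtering behaviour.

```python
from typing import Optional

DEFAULT_TAG = "default"


def resolve_version(
    configs: dict[str, dict],
    config_id: str,
    version: int,
    tag: Optional[str] = None,
) -> Optional[dict]:
    """Return the version blob only if the config carries the effective tag."""
    effective_tag = tag if tag is not None else DEFAULT_TAG
    config = configs.get(config_id)
    if config is None or config["tag"] != effective_tag:
        return None  # hidden from this surface, as if it did not exist
    return config["versions"].get(version)
```

With this rule a caller that never passes `tag` cannot see `ASSESSMENT` configs, so existing clients keep their previous view of the data.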
Review comment: name is missing, as in line 79 above.