Jsonl based evaluation flow #300

Draft
yangm2 wants to merge 5 commits into codeforpdx:main from yangm2:jsonl-based-evaluation-flow

yangm2 commented Mar 9, 2026

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update
  • Infrastructure
  • Maintenance

Description

This PR starts moving the test scenarios out of the original CSV file and into the JSONL format, with a scenario schema that we can validate.
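
As a rough illustration of the CSV-to-JSONL direction described above, here is a minimal sketch. It is not the code in this PR: the column names `city` and `facts` are taken from the scenario structure mentioned in this description, while the `;` separator inside the `facts` cell, the function name `csv_to_jsonl`, and the sample row are assumptions made up for the example.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text (header row, one scenario per row) into JSONL text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Assumption: the "facts" cell packs several facts separated by ";".
        facts = [fact.strip() for fact in row["facts"].split(";") if fact.strip()]
        record = {"city": row["city"].strip(), "facts": facts}
        lines.append(json.dumps(record, ensure_ascii=False))
    # One JSON object per line, each line newline-terminated.
    return "".join(line + "\n" for line in lines)


# Made-up sample row, for illustration only.
sample_csv = "city,facts\nPortland,Tenant received a notice; Rent is paid monthly\n"
jsonl_text = csv_to_jsonl(sample_csv)
```

Each output line is a standalone JSON object, so a single scenario can be edited or validated without touching the rest of the file.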

The scenario schema matters because the evaluation flow (e.g. the instructions to the LLM-as-a-Judge) references the underlying structure of the scenarios (e.g. facts, city).
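
To make the validation idea concrete, here is a minimal standard-library sketch of checking each JSONL line against a set of required fields. It does not reproduce the schema file added in this PR: `SCENARIO_FIELDS`, `validate_scenario`, and `validate_jsonl` are hypothetical names, and only the `facts` and `city` fields come from the description above.

```python
import json

# Hypothetical stand-in for the real schema: field name -> expected Python type.
SCENARIO_FIELDS = {"city": str, "facts": list}


def validate_scenario(record: dict) -> list[str]:
    """Return a list of problems found in one scenario (empty if it is valid)."""
    errors = []
    for name, expected in SCENARIO_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"{name}: expected {expected.__name__}")
    return errors


def validate_jsonl(text: str) -> dict[int, list[str]]:
    """Validate JSONL text; map each bad 1-based line number to its problems."""
    problems = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems[lineno] = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(record, dict):
            problems[lineno] = ["expected a JSON object"]
            continue
        errors = validate_scenario(record)
        if errors:
            problems[lineno] = errors
    return problems
```

Reporting problems by line number keeps the output actionable when the dataset has been edited by hand or exported from a web interface.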

This flow also makes it easier to leverage the LangSmith web interface to make changes to the dataset. This is important for the non-coder members of the team who are responsible for the legal content of the dataset.

Unrelated: this PR also puts the PR Code Review criteria into CLAUDE.md.

Related Tickets & Documents

  • Related Issue #
  • Closes #

QA Instructions, Screenshots, Recordings

Please replace this line with instructions on how to test your changes, a note on the devices and browsers this has been tested on, as well as any relevant images for UI changes.

Added/updated tests?

  • Yes
  • No, and this is why: not yet
  • I need help with writing tests

Documentation

  • If this PR changes the system architecture, Architecture.md has been updated

[optional] Are there any post deployment tasks we need to perform?

yangm2 added 2 commits March 8, 2026 17:43
add a schema file to enforce scenario structure and validation flow
add CLI to work with LangSmith website
yangm2 self-assigned this Mar 9, 2026

yangm2 commented Mar 9, 2026

@TruMichael-jpg & @dan-moncada - just a heads-up that this is something I'm working on to enable the LangSmith web interface flow. Check out the updates to EVALUATION.md to get an idea of the direction the flow is going.

@leekahung - heads-up that I'm moving the evaluation stuff into a separate directory.
