On testability, evaluation and enhancements #80

pasevin · 2025-08-10T09:59:54Z

Hey @CasJam, thank you for putting this out! It can become an excellent framework for professionals and organizations!

I'm already thinking about the contributions I could make to this repo, and I see a few PRs already rolling in from other devs.
Reviewing the pending PRs, I've noticed that they include significant adjustments and improvements tailored to their authors' specific needs.

Some even extend the framework with custom scripts to build dashboards, etc.

That's all awesome, but I can't stop thinking about the potential for this to get out of hand pretty quickly. What do you think about this, and how will you approach reviewing and accepting or rejecting these PRs going forward?

You have a neat, minimal base that is working well. I'm pleasantly surprised that you found a balance between keeping the context lean and still keeping the agent sane. It's able to follow the rules pretty consistently.

How can we avoid overcomplicating things and still test and evaluate the results while adding new features?

Have you considered implementing something like JSON Schema validation (with versioning) for instruction files? Or adding structured logging to track agent decision points, and creating a regression test suite with golden master testing?
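To make the logging and golden master idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `record_decision` helper, the log fields (`step`, `rule`, `action`), and the rule names are hypothetical and not part of this repo. The idea is to record each agent decision point as a structured entry, serialize the log deterministically, and compare a new run against a previously approved ("golden") log.

```python
import json


def record_decision(log, step, rule, action):
    """Append one structured decision point to an in-memory log (hypothetical fields)."""
    log.append({"step": step, "rule": rule, "action": action})


def serialize(log):
    """Serialize with sorted keys so the same log always produces the same text."""
    return json.dumps(log, indent=2, sort_keys=True)


def diff_against_golden(log, golden_text):
    """Return a list of (index, expected, actual) tuples where the run deviates."""
    expected = json.loads(golden_text)
    mismatches = []
    for i in range(max(len(expected), len(log))):
        exp = expected[i] if i < len(expected) else None
        act = log[i] if i < len(log) else None
        if exp != act:
            mismatches.append((i, exp, act))
    return mismatches


# An approved run becomes the golden master (in practice, a file kept in the repo).
golden_log = []
record_decision(golden_log, 1, "hypothetical-rule-a", "read_file")
record_decision(golden_log, 2, "hypothetical-rule-b", "ask_user")
GOLDEN = serialize(golden_log)

# A later run that takes a different action at step 2 shows up as a regression.
new_log = []
record_decision(new_log, 1, "hypothetical-rule-a", "read_file")
record_decision(new_log, 2, "hypothetical-rule-b", "write_file")
regressions = diff_against_golden(new_log, GOLDEN)
```

An empty `regressions` list would mean the run matches the approved behavior. Agent output is not fully deterministic, so a real suite would probably compare only the stable parts of each decision rather than the full text.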

There are a lot of interesting paths we could take, up to and including defining a domain-specific language :)

The key is to move from "hope the agent follows instructions" to "guarantee the agent follows instructions through systematic validation and formal methods."
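As a rough illustration of what "systematic validation" could look like for instruction files, here is a small Python sketch. It is a hand-rolled stand-in for a real JSON Schema validator, and the field names (`schema_version`, `name`, `rules`) and the supported version set are my assumptions, not the actual format used in this repo.

```python
# Hypothetical: versions of the instruction format that the tooling understands.
SUPPORTED_VERSIONS = {"1.0"}

# Hypothetical required fields and their expected Python types.
REQUIRED_FIELDS = {"schema_version": str, "name": str, "rules": list}


def validate_instruction(doc):
    """Return a list of error strings; an empty list means the document passed."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            errors.append(f"{field} must be {expected_type.__name__}")
    # Reject versions the tooling does not know, instead of guessing at their shape.
    version = doc.get("schema_version")
    if isinstance(version, str) and version not in SUPPORTED_VERSIONS:
        errors.append(f"unsupported schema_version: {version}")
    return errors


ok = validate_instruction({"schema_version": "1.0", "name": "example", "rules": []})
bad = validate_instruction({"schema_version": "2.0", "name": "example"})
```

A check like this could run in CI on every PR, so a change to an instruction file fails fast when it drifts from the agreed format. It only checks the shape of the files, though, not how the agent behaves when it reads them.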

Of course, this is a big step, and maybe you don't want to go down this path at all and would rather leave things as they are, letting people create their own forks, which is fine too!

Replies: 0 comments