Hey @CasJam, thank you for putting this out! It can become an excellent framework for professionals and organizations!
I'm already thinking about the contributions I could make to this repo, and I see a few PRs already rolling in from other devs.
Reviewing the pending PRs, I've noticed that they include significant adjustments and improvements tailored to their authors' specific needs.
Some even extend the framework with custom scripts to build dashboards, etc.
That's all awesome, but I can't stop thinking about the potential for this to get out of hand pretty quickly. What do you think about this, and how will you approach reviewing and accepting or rejecting these PRs going forward?
You have a neat, minimal base that is working well. I'm pleasantly surprised that you found the balance between not bloating the context and still keeping the agent sane. It's able to follow the rules pretty consistently.
How can we avoid overcomplicating things and still test and evaluate the results while adding new features?
Have you considered implementing something like a JSON Schema validation (with versioning) for instruction files? Or adding structured logging to track agent decision points and create a regression test suite with golden master testing?
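To make the validation idea a bit more concrete, here's a rough sketch. It assumes the instruction files (or their metadata) could be expressed as JSON with a `schema_version` field, which may not match how the repo stores them today. The field names (`name`, `rules`, `owner`) and the `validate_instructions` helper are made up for illustration, and the check is a hand-rolled stand-in for a real JSON Schema validator:

```python
import json

# Hypothetical per-version requirements: field name -> expected Python type.
# A real setup would likely keep one JSON Schema document per version instead.
SCHEMAS = {
    1: {"name": str, "rules": list},
    2: {"name": str, "rules": list, "owner": str},
}


def validate_instructions(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the file is valid."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top level must be an object"]

    # Pick the rule set from the version the file declares.
    version = doc.get("schema_version")
    if type(version) is not int or version not in SCHEMAS:
        return [f"unsupported schema_version: {version!r}"]

    errors = []
    for field, expected in SCHEMAS[version].items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"{field} must be {expected.__name__}")
    return errors
```

Something like this could run in CI on every PR, so a contribution that changes the instruction format has to bump the version explicitly.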
There are a lot of interesting paths we could take, all the way up to defining a domain-specific language :)
The key is to move from "hope the agent follows instructions" to "guarantee the agent follows instructions through systematic validation and formal methods."
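For the structured logging and golden master part, a minimal sketch might look like the following. It assumes each agent run can emit its decision points as a list of dicts; the event fields (`ts`, `step`, `choice`) and the `check_against_golden` helper are hypothetical, not something the framework provides:

```python
import json
import tempfile
from pathlib import Path


def normalize(events):
    """Drop volatile fields (only the timestamp here) so runs are comparable."""
    return [{k: v for k, v in e.items() if k != "ts"} for e in events]


def check_against_golden(events, golden_path: Path) -> bool:
    """Compare a run's decision log to the stored golden master.

    On the first run the golden file is created and the check passes.
    """
    current = normalize(events)
    if not golden_path.exists():
        golden_path.write_text(json.dumps(current, indent=2, sort_keys=True))
        return True
    return json.loads(golden_path.read_text()) == current


if __name__ == "__main__":
    run = [{"ts": 1.0, "step": "pick_task", "choice": "inbox"}]
    with tempfile.TemporaryDirectory() as tmp:
        golden = Path(tmp) / "golden.json"
        check_against_golden(run, golden)  # first run records the baseline
        print(check_against_golden(run, golden))
```

Since agent output is not deterministic, the comparison would probably need to be looser in practice (for example, checking only which steps were taken), but even a strict version would flag when a PR changes agent behavior.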
Of course, this is a big step, and maybe you don't want to go down this road at all and would rather leave the repo as is, letting people create their own forks, which is fine too!