Agentsim is an open-source tool for turning vague software requests into structured delivery packages.
It is designed for solo software freelancers who need help with planning, building, reviewing, and handing off client work.
A user can give Agentsim a rough client request:
Build an inventory request system for a flower company
Agentsim turns that request into a project package containing planning documents, technical notes, a runnable app prototype, QA review, handoff notes, and trace files.
Agentsim focuses on durable project artifacts rather than chat transcripts.
Goal -> Plan -> Decisions -> Artifacts -> Review -> Final package
Agentsim is alpha, pre-1.0 software. The current repo is a CLI proof of concept for one narrow software-freelance workflow, not a general-purpose agent platform or production delivery system.
Current capability:
- run a local CLI workflow from a high-level client request
- generate planning, technical, review, client handoff, trace, and app prototype files
- run in deterministic mock mode without provider keys
- persist run, task, artifact, event, message, and approval state under each run
- inspect workflow graph, artifacts, context, tools, and a local read-only run viewer from the CLI
- import optional read-only repo context for local package generation
- inspect and resume local runs from the CLI after approvals are resolved
- run a deterministic mock eval suite across several freelance software prompts
- use a model-agnostic Chat Completions-compatible provider in live mode when configured
Current limitations:
- generated apps are simple prototypes and still need human review before client delivery
- the workflow is tuned for the first software-freelance demo, not arbitrary domains
- approvals, workspace execution, command execution, repair loops, and provider routing are still early
- the workflow graph is local-first; it is not a distributed scheduler
- generated-app install/build command checks are disabled by default and require explicit opt-in
- no hosted service, SaaS dashboard, or production deployment automation exists yet
Solo software freelancers often do the work of a small software team by themselves:
- clarifying vague client requests
- defining scope and risks
- choosing architecture
- building prototypes
- testing and reviewing delivery quality
- writing handoff notes
- preserving decisions for later changes
Large teams have process, specialists, review loops, and delivery discipline. Solo developers usually have scattered notes, a chat window, and a lot of project context they must keep in their head.
Agentsim provides a structured workflow for turning client requests into reviewed, handoff-ready software artifacts.
The first use case is freelance software delivery:
vague client request
-> AI-assisted workflow
-> scoped plan
-> reviewed runnable prototype
-> handoff-ready package
The generated package currently follows this shape:
outputs/{runId}/final-package/
+-- client/
| +-- project-summary.md
| +-- handoff-notes.md
| +-- user-guide.md
+-- planning/
| +-- requirements.md
| +-- scope.md
| +-- assumptions.md
| +-- timeline.md
| +-- risks.md
+-- technical/
| +-- architecture.md
| +-- task-breakdown.md
+-- app/
| +-- package.json
| +-- src/
| +-- README.md
+-- review/
| +-- qa-report.md
| +-- code-review.md
| +-- known-issues.md
+-- trace/
  +-- events.jsonl
  +-- agent-messages.json
  +-- agent-actions.json
  +-- decisions.json
  +-- app-spec.json
  +-- app-validation.json
  +-- workflow-graph.json
  +-- tool-registry.json
  +-- artifact-lineage.json
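One way to check that a run produced this package shape is to compare the files found on disk against a required list. The sketch below is illustrative only: `REQUIRED_PACKAGE_FILES` and `missingPackageFiles` are hypothetical names, not part of the Agentsim codebase, and the list covers only a subset of the tree.

```typescript
// Hypothetical completeness check. The paths mirror part of the package tree;
// the constant and function names are assumptions made for this example.
const REQUIRED_PACKAGE_FILES: readonly string[] = [
  "client/project-summary.md",
  "client/handoff-notes.md",
  "client/user-guide.md",
  "planning/requirements.md",
  "planning/scope.md",
  "planning/assumptions.md",
  "planning/timeline.md",
  "planning/risks.md",
  "technical/architecture.md",
  "technical/task-breakdown.md",
  "app/package.json",
  "review/qa-report.md",
  "review/code-review.md",
  "review/known-issues.md",
  "trace/events.jsonl",
  "trace/decisions.json",
  "trace/artifact-lineage.json",
];

// Returns the required paths that are absent from the given relative paths.
// Backslashes are normalized so Windows-style paths compare equal.
function missingPackageFiles(presentPaths: readonly string[]): string[] {
  const present = new Set(presentPaths.map((p) => p.replace(/\\/g, "/")));
  return REQUIRED_PACKAGE_FILES.filter((required) => !present.has(required));
}
```

The function takes a plain list of relative paths, so the directory walk that produces that list can stay separate from the comparison.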
The v0 demo is intentionally small. It validates the core workflow before the project expands.
For the flower-company inventory example, a useful run should produce artifacts such as:
- planning/requirements.md describing request creation, list viewing, status updates, and admin needs
- technical/architecture.md explaining the simple local app structure and data model
- review/qa-report.md checking whether the generated package matches the scoped requirements
- trace/events.jsonl showing the workflow events that led to the final package
- trace/agent-messages.json showing structured task assignments, handoffs, and review requests between agents
- trace/agent-actions.json showing each role-owned action, input artifacts, output artifact, model/source, and status
- trace/product-brief.json recording the prompt-specific product brief used before DomainSpec/AppSpec derivation
- trace/app-spec.json recording the deterministic runnable app contract, renderer archetype, deferred features, unresolved questions, and acceptance scenarios used by the renderer
- trace/app-validation.json recording structured generated-app validation checks, including command checks when enabled
- app/test-report.md summarizing generated-app syntax, install, build, and local API smoke checks, or skipped execution when commands are not enabled
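Because trace/events.jsonl is a JSON Lines file, it can be read one line at a time. The following is a minimal sketch of that, assuming each line is a JSON object and that events carry a string `type` field. The field name, the sample event types, and the helper names `parseJsonl` and `countByType` are assumptions for illustration, not the documented Agentsim event schema.

```typescript
// Assumed event shape: an arbitrary JSON object per line.
type TraceEvent = Record<string, unknown>;

// Parses JSON Lines text, skipping blank lines and reporting the failing line number.
function parseJsonl(text: string): TraceEvent[] {
  const events: TraceEvent[] = [];
  text.split(/\r?\n/).forEach((rawLine, index) => {
    const line = rawLine.trim();
    if (line === "") return;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      throw new Error(`Invalid JSON on line ${index + 1}`);
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error(`Expected a JSON object on line ${index + 1}`);
    }
    events.push(parsed as TraceEvent);
  });
  return events;
}

// Counts events by their (assumed) "type" field; events without one are grouped as "unknown".
function countByType(events: readonly TraceEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    const rawType = event["type"];
    const type = typeof rawType === "string" ? rawType : "unknown";
    counts[type] = (counts[type] ?? 0) + 1;
  }
  return counts;
}
```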
Agentsim is not trying to be a foundation model, a cloud IDE, or a generic agent marketplace.
The project is built around a few principles:
- Artifacts are more useful than chat transcripts.
- Review loops are safer than unchecked generation.
- Important decisions should be traceable.
- Domain-specific workflows are more useful than generic agent swarms.
- High-risk actions should require human approval.
- Model providers should be replaceable.
The goal is to make AI-assisted software delivery easier to inspect, review, and continue over time.
Agentsim currently has a CLI-based proof of concept:
- TypeScript CLI
- local filesystem workspace
- mock mode for deterministic no-key runs
- model-agnostic live provider mode through the Chat Completions-compatible ModelProvider boundary
- artifact-producing agent steps for intake, planning, architecture, implementation, QA, and delivery
- durable artifact generation
- artifact lineage, decision logs, approval records, and event traces
- AppSpec-driven runnable generated app package for prompt-specific single-entity crud-workflow, booking-lite, and inventory-lite prototypes
- optional generated-app execution evidence for syntax, install, build, health, list, create, and status-transition checks under the local dev command policy
- dependency-free local run viewer for Goal, Progress, Decisions, Artifacts, Review, Trace, and Final Package inspection
The current generated app contract is intentionally narrow: live mode may use a model to extract a schema-constrained product brief, but app code is still rendered by controlled local renderers with explicit capability manifests. Broader behavior such as external calendar sync, production inventory math, auth, deployment, payments, and multi-entity relations is captured as deferred unless a renderer explicitly supports it.
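The split between supported and deferred behavior can be pictured as a lookup against a renderer's capability manifest. This is a rough sketch under stated assumptions: the `CapabilityManifest` shape, the feature labels, and `planFeatures` are hypothetical and do not describe the actual manifest format.

```typescript
// Hypothetical manifest shape: an archetype name plus the features its renderer supports.
interface CapabilityManifest {
  archetype: string;
  supports: readonly string[];
}

interface FeaturePlan {
  supported: string[];
  deferred: string[];
}

// Features the renderer explicitly supports are kept; everything else is deferred.
function planFeatures(requested: readonly string[], manifest: CapabilityManifest): FeaturePlan {
  const supports = new Set(manifest.supports);
  const plan: FeaturePlan = { supported: [], deferred: [] };
  for (const feature of requested) {
    (supports.has(feature) ? plan.supported : plan.deferred).push(feature);
  }
  return plan;
}

// Example manifest for the inventory-lite archetype (feature labels are illustrative).
const inventoryLite: CapabilityManifest = {
  archetype: "inventory-lite",
  supports: ["list", "create", "status-transition"],
};
```

With this manifest, a request for `["list", "auth", "payments", "create"]` keeps `list` and `create` and defers `auth` and `payments`.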
- Node.js 20.11+
- pnpm
Install dependencies:
pnpm install
Run the demo:
pnpm demo "Build an inventory request system for a flower company"
Use mock mode explicitly for deterministic no-key runs:
pnpm demo "Build an inventory request system for a flower company" --mock
Use live mode with any Chat Completions-compatible provider:
AGENTSIM_MODEL_PROVIDER=chat-completions-compatible
AGENTSIM_MODEL_API_KEY=...
AGENTSIM_MODEL_BASE_URL=https://api.openai.com/v1
AGENTSIM_MODEL_NAME=gpt-4.1-mini
pnpm demo "Build an inventory request system for a flower company" --live
Existing OPENAI_* and OPENAI_COMPATIBLE_* environment variables are still supported as aliases.
Optional live-provider reliability controls are AGENTSIM_MODEL_TIMEOUT_MS, AGENTSIM_MODEL_MAX_RETRIES, AGENTSIM_MODEL_RETRY_BASE_DELAY_MS, and AGENTSIM_MODEL_MAX_TOKENS.
If no live key is configured and --live is not passed, Agentsim falls back to mock mode.
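The mode rules can be summarized as a small decision function. This is a sketch, not the CLI's actual logic: `selectModelMode` is a hypothetical name, it ignores the OPENAI_* aliases, and it assumes that a configured key with no flag selects live mode and that passing both flags is an error.

```typescript
type ModelMode = "mock" | "live";

interface ModeFlags {
  mock?: boolean;
  live?: boolean;
}

// Explicit flags win; otherwise the presence of a key decides (an assumption).
function selectModelMode(flags: ModeFlags, env: Record<string, string | undefined>): ModelMode {
  if (flags.mock && flags.live) {
    throw new Error("Choose either --mock or --live, not both");
  }
  if (flags.mock) return "mock";
  if (flags.live) return "live";
  const key = env["AGENTSIM_MODEL_API_KEY"];
  // No key and no --live flag: fall back to deterministic mock mode.
  return key !== undefined && key.trim() !== "" ? "live" : "mock";
}
```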
Agentsim owns the runtime orchestration, artifact graph, review state, event trace, and final package validation. Live providers only implement the small ModelProvider.generate() boundary.
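To illustrate how small that boundary can be, here is a hedged sketch of a provider interface with a deterministic mock implementation. Only the name `ModelProvider.generate()` comes from this README; the request and result shapes, `mockText`, and `MockProvider` are assumptions made for the example.

```typescript
// Assumed request/result shapes; the real contract lives in src/types.ts and may differ.
interface GenerateRequest {
  system?: string;
  prompt: string;
}

interface GenerateResult {
  text: string;
}

interface ModelProvider {
  generate(request: GenerateRequest): Promise<GenerateResult>;
}

// Deterministic stand-in for a model: the same prompt always yields the same text.
function mockText(prompt: string): string {
  let hash = 0;
  for (let i = 0; i < prompt.length; i++) {
    hash = (hash * 31 + prompt.charCodeAt(i)) >>> 0; // keep within unsigned 32 bits
  }
  return `mock:${hash.toString(16)}`;
}

class MockProvider implements ModelProvider {
  generate(request: GenerateRequest): Promise<GenerateResult> {
    return Promise.resolve({ text: mockText(request.prompt) });
  }
}
```

Keeping the interface to one method means a live provider only needs to translate a request into an HTTP call and a response into text, while orchestration stays outside the provider.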
Each run writes internal state beside the final package:
outputs/{runId}/state/
+-- run.json
+-- tasks.json
+-- messages.json
+-- events.jsonl
+-- artifacts.json
+-- approvals.json
+-- domain-spec.json
+-- domain-inference.json
+-- workflow-graph.json
+-- tool-registry.json
Use the inspection commands to read that state:
pnpm agentsim inspect <runId>
pnpm agentsim tasks <runId>
pnpm agentsim events <runId>
pnpm agentsim artifacts <runId>
pnpm agentsim approvals <runId>
pnpm agentsim graph <runId>
pnpm agentsim contexts <runId>
pnpm agentsim viewer <runId> --port 4317
pnpm agentsim tools
agentsim viewer starts a read-only HTTP server on 127.0.0.1. It serves a normalized run model and final-package previews from the selected run output. It does not provide chat, approval, rerun, deployment, or editing controls.
Approval records can be resolved locally:
pnpm agentsim approve <runId> <approvalId>
pnpm agentsim reject <runId> <approvalId>
pnpm agentsim resume <runId> [--mock|--live]
If a run pauses for approval, resolve it with approve or reject. resume rehydrates the existing run state from outputs/{runId}/state/ and continues the scheduler from the same run ID. It refuses terminal runs and runs with pending approvals. When no provider flag is supplied, resume uses the persisted run model mode.
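The resume rules above can be sketched as a pure check over persisted state. The status values, field names, and `planResume` are hypothetical and chosen for illustration; the real shapes of run.json and approvals.json may differ.

```typescript
type ModelMode = "mock" | "live";
type RunStatus = "running" | "paused" | "completed" | "failed"; // assumed status values

interface PersistedRun {
  id: string;
  status: RunStatus;
  modelMode: ModelMode;
}

interface ApprovalRecord {
  id: string;
  status: "pending" | "approved" | "rejected";
}

type ResumeDecision = { ok: true; mode: ModelMode } | { ok: false; reason: string };

function planResume(
  run: PersistedRun,
  approvals: readonly ApprovalRecord[],
  flagMode?: ModelMode,
): ResumeDecision {
  // Terminal runs are refused.
  if (run.status === "completed" || run.status === "failed") {
    return { ok: false, reason: `run ${run.id} is already ${run.status}` };
  }
  // Runs with unresolved approvals are refused.
  const pending = approvals.filter((approval) => approval.status === "pending");
  if (pending.length > 0) {
    return { ok: false, reason: `pending approvals: ${pending.map((a) => a.id).join(", ")}` };
  }
  // Without --mock or --live, reuse the mode persisted with the run.
  return { ok: true, mode: flagMode ?? run.modelMode };
}
```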
The current mock workflow only uses safe default operations, so completed mock runs usually have auto-approved artifact records rather than pending dangerous tool approvals. Dangerous tools such as run_command stay disabled unless the orchestrator is explicitly configured to allow commands.
Optional local context can be imported without copying private source into the final package:
pnpm agentsim run "Build a client portal for this project" --mock --repo .
Command execution remains disabled by default for agent tools. Generated-app install/build checks are run automatically only when commands are explicitly enabled and allowed by the selected command policy. Model/tool-initiated run_command calls still require tool approval, a context policy that allows run_command, and a command policy:
pnpm agentsim run "Build an inventory request system" --mock --allow-commands --command-policy dev
Available command policies are strict, dev, and unsafe-local. unsafe-local requires --allow-commands and is intended only for trusted local workspaces.
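The flag rules stated above can be validated before a run starts. The sketch below encodes only what this section says (three policy names, commands off by default, unsafe-local requiring --allow-commands). It does not model what each policy permits, and `resolveCommandSettings` and its types are hypothetical names.

```typescript
type CommandPolicy = "strict" | "dev" | "unsafe-local";
const COMMAND_POLICIES: readonly CommandPolicy[] = ["strict", "dev", "unsafe-local"];

interface CommandFlags {
  allowCommands: boolean;
  commandPolicy?: string;
}

interface CommandSettings {
  commandsEnabled: boolean;
  policy: CommandPolicy | undefined;
}

function isCommandPolicy(value: string): value is CommandPolicy {
  return (COMMAND_POLICIES as readonly string[]).indexOf(value) !== -1;
}

function resolveCommandSettings(flags: CommandFlags): CommandSettings {
  let policy: CommandPolicy | undefined;
  const rawPolicy = flags.commandPolicy;
  if (rawPolicy !== undefined) {
    if (!isCommandPolicy(rawPolicy)) {
      throw new Error(`Unknown command policy: ${rawPolicy}`);
    }
    policy = rawPolicy;
  }
  if (policy === "unsafe-local" && !flags.allowCommands) {
    throw new Error("unsafe-local requires --allow-commands");
  }
  // Commands stay disabled unless explicitly enabled.
  return { commandsEnabled: flags.allowCommands, policy };
}
```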
pnpm typecheck
pnpm test
pnpm build
pnpm eval:mock
pnpm eval:mock writes JSON and Markdown reports under outputs/evals/, including run duration, validation status, and a coarse failure category for each prompt.
Useful entry points:
- src/cli.ts - CLI command surface
- src/workflow.ts - compatibility wrapper for the orchestrator
- src/orchestrator.ts - run, resume, scheduler, validation, and approval orchestration
- src/types.ts - core contracts
- src/agents/ - role definitions and artifact-producing step registry
- src/core/ - artifacts, events, hashing, redaction, path safety, repositories, scheduler, tools, and workspace behavior
- src/viewer/ - read-only local run viewer model, HTTP server, HTML, CSS, and browser-side rendering
- src/providers/ - model provider boundary
- src/templates/ - generated delivery package templates
- tests/ - workflow, CLI, hashing, and redaction coverage
Agentsim is currently built around these concepts:
- Agent
- AgentStep
- Task
- Run
- RunTask
- Artifact
- Workspace
- Decision
- Approval
- Event
Planned contracts include:
- Organization
- Project
- DomainPack
- WorkspaceDriver
- ModelProvider
- ArtifactStore
- EventStore
The current orchestrator compiles the existing AgentStep registry into persisted tasks, marks dependency-ready tasks, records task lifecycle events, supports approval pause/resume, and keeps the existing final package contract intact. Tools execute through permission-aware wrappers, with dangerous actions requiring approval and explicit command opt-in before execution.
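Marking dependency-ready tasks can be sketched as one pass over the persisted task list. The `RunTask` fields and status values below are assumptions for illustration, and `markReadyTasks` is a hypothetical helper rather than the scheduler's real API.

```typescript
type TaskStatus = "pending" | "ready" | "running" | "completed" | "failed"; // assumed values

interface RunTask {
  id: string;
  dependsOn: readonly string[];
  status: TaskStatus;
}

// A pending task becomes ready once every task it depends on has completed.
function markReadyTasks(tasks: readonly RunTask[]): RunTask[] {
  const completed = new Set(
    tasks.filter((task) => task.status === "completed").map((task) => task.id),
  );
  return tasks.map((task): RunTask =>
    task.status === "pending" && task.dependsOn.every((dep) => completed.has(dep))
      ? { ...task, status: "ready" }
      : task,
  );
}

// Example: step names follow the README's agent steps; the dependency chain is assumed.
const exampleTasks: RunTask[] = [
  { id: "intake", dependsOn: [], status: "completed" },
  { id: "planning", dependsOn: ["intake"], status: "pending" },
  { id: "architecture", dependsOn: ["planning"], status: "pending" },
];
```

In this example only `planning` becomes ready, because `architecture` still waits on a task that has not completed.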
Validation results are converted into review verdicts. Passing validation completes the run. Unrecoverable validation failures fail the run. Recoverable failures emit a repair-loop-disabled event by default; the experimental repair path can create a bounded fix task but does not yet execute automated repairs.
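The mapping from validation results to run outcomes reads naturally as a small decision function. This sketch is an interpretation of the paragraph above, not the orchestrator's code: the type names, the `repairEnabled` option, and the use of `maxAttempts` to express "bounded" are all assumptions.

```typescript
type ValidationResult =
  | { status: "passed" }
  | { status: "failed"; recoverable: boolean };

type NextAction =
  | { kind: "complete-run" }
  | { kind: "fail-run" }
  | { kind: "emit-event"; event: "repair-loop-disabled" }
  | { kind: "create-fix-task"; maxAttempts: number };

interface RepairOptions {
  repairEnabled: boolean; // the experimental repair path is off by default
  maxAttempts: number;    // assumed way of bounding the fix task
}

function nextAction(
  result: ValidationResult,
  options: RepairOptions = { repairEnabled: false, maxAttempts: 1 },
): NextAction {
  if (result.status === "passed") return { kind: "complete-run" };
  if (!result.recoverable) return { kind: "fail-run" };
  if (!options.repairEnabled) return { kind: "emit-event", event: "repair-loop-disabled" };
  // The fix task is only created here; nothing in this sketch executes a repair.
  return { kind: "create-fix-task", maxAttempts: options.maxAttempts };
}
```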
The current implementation is deliberately thin. The priority is to make the software-delivery workflow useful before expanding the platform surface area.
The best contributions right now improve the current vertical slice instead of widening the platform too early.
Who should contribute:
- developers interested in practical CLI tools and local-first workflows
- people who care about artifact quality, review loops, and traceability
- freelancers or technical reviewers who can identify gaps in handoff packages
- contributors willing to keep changes small, testable, and aligned with the current scope
High-impact areas:
- make the CLI demo more reliable across different software project prompts
- add artifact validation for final package completeness
- improve the generated app quality without bloating the demo
- strengthen QA review artifacts and known-issues reporting
- extend the local workspace driver with safe command execution
- improve trace files so every run is easier to inspect and debug
- add domain-pack boundaries only where they make the software-freelance workflow cleaner
Please avoid starting with:
- full web dashboards
- agent marketplaces
- complex persistent memory
- Docker-per-agent infrastructure
- SaaS account systems
- generic multi-agent chat UI
The project needs contributors who want to make one narrow workflow genuinely useful before making the system broad.
Read CONTRIBUTING.md before opening a pull request.
Agentsim uses the Developer Certificate of Origin. Sign off commits with:
git commit -s
Good pull requests are small, testable, and aligned with the artifact-first product direction.
The public near-term path is:
- harder CLI demo
- local workspace execution loop
- web artifact viewer
Agentsim is licensed under the Apache License 2.0. See LICENSE.
Code generated by Agentsim for a user's project belongs to that user, subject to the licenses of any dependencies, templates, or third-party assets included in the generated project.