Agentsim

Agentsim is an open-source tool for turning vague software requests into structured delivery packages.

It is designed for solo software freelancers who need help with planning, building, reviewing, and handing off client work.

A user can give Agentsim a rough client request:

Build an inventory request system for a flower company

Agentsim turns that request into a project package containing planning documents, technical notes, a runnable app prototype, QA review, handoff notes, and trace files.

Agentsim focuses on durable project artifacts rather than chat transcripts.

Goal -> Plan -> Decisions -> Artifacts -> Review -> Final package

Project Status

Agentsim is alpha, pre-1.0 software. The current repo is a CLI proof of concept for one narrow software-freelance workflow, not a general-purpose agent platform or production delivery system.

Current capability:

run a local CLI workflow from a high-level client request
generate planning, technical, review, client handoff, trace, and app prototype files
run in deterministic mock mode without provider keys
persist run, task, artifact, event, message, and approval state under each run
inspect workflow graph, artifacts, context, tools, and a local read-only run viewer from the CLI
import optional read-only repo context for local package generation
inspect and resume local runs from the CLI after approvals are resolved
run a deterministic mock eval suite across several freelance software prompts
use a model-agnostic Chat Completions-compatible provider in live mode when configured

Current limitations:

generated apps are simple prototypes and still need human review before client delivery
the workflow is tuned for the first software-freelance demo, not arbitrary domains
approvals, workspace execution, command execution, repair loops, and provider routing are still early
the workflow graph is local-first; it is not a distributed scheduler
generated-app install/build command checks are disabled by default and require explicit opt-in
no hosted service, SaaS dashboard, or production deployment automation exists yet

Problem

Solo software freelancers often do the work of a small software team by themselves:

clarifying vague client requests
defining scope and risks
choosing architecture
building prototypes
testing and reviewing delivery quality
writing handoff notes
preserving decisions for later changes

Large teams have process, specialists, review loops, and delivery discipline. Solo developers usually have scattered notes, a chat window, and a lot of project context they must keep in their heads.

Agentsim provides a structured workflow for turning client requests into reviewed, handoff-ready software artifacts.

Current Focus

The first use case is freelance software delivery:

vague client request
-> AI-assisted workflow
-> scoped plan
-> reviewed runnable prototype
-> handoff-ready package

The generated package currently follows this shape:

outputs/{runId}/final-package/
+-- client/
|   +-- project-summary.md
|   +-- handoff-notes.md
|   +-- user-guide.md
+-- planning/
|   +-- requirements.md
|   +-- scope.md
|   +-- assumptions.md
|   +-- timeline.md
|   +-- risks.md
+-- technical/
|   +-- architecture.md
|   +-- task-breakdown.md
+-- app/
|   +-- package.json
|   +-- src/
|   +-- README.md
+-- review/
|   +-- qa-report.md
|   +-- code-review.md
|   +-- known-issues.md
+-- trace/
    +-- events.jsonl
    +-- agent-messages.json
    +-- agent-actions.json
    +-- decisions.json
    +-- app-spec.json
    +-- app-validation.json
    +-- workflow-graph.json
    +-- tool-registry.json
    +-- artifact-lineage.json

The v0 demo is intentionally small. It validates the core workflow before the project expands.
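
The package shape above lends itself to a simple completeness check. The following is a minimal sketch, not Agentsim's own validator: the file list is copied from the tree, while the names EXPECTED_FILES and findMissingFiles are hypothetical. The app/src/ directory is left out because the tree does not list its contents.

```typescript
// Files listed in the final-package tree above, relative to final-package/.
// Hypothetical constant: Agentsim's real validation rules may differ.
const EXPECTED_FILES: readonly string[] = [
  "client/project-summary.md",
  "client/handoff-notes.md",
  "client/user-guide.md",
  "planning/requirements.md",
  "planning/scope.md",
  "planning/assumptions.md",
  "planning/timeline.md",
  "planning/risks.md",
  "technical/architecture.md",
  "technical/task-breakdown.md",
  "app/package.json",
  "app/README.md",
  "review/qa-report.md",
  "review/code-review.md",
  "review/known-issues.md",
  "trace/events.jsonl",
  "trace/agent-messages.json",
  "trace/agent-actions.json",
  "trace/decisions.json",
  "trace/app-spec.json",
  "trace/app-validation.json",
  "trace/workflow-graph.json",
  "trace/tool-registry.json",
  "trace/artifact-lineage.json",
];

// Returns every expected file that is absent from `presentPaths`.
// `presentPaths` holds forward-slash paths relative to final-package/.
function findMissingFiles(presentPaths: readonly string[]): string[] {
  const present = new Set(presentPaths);
  return EXPECTED_FILES.filter((file) => !present.has(file));
}
```

A caller would collect the relative paths found under outputs/{runId}/final-package/ and pass them in; an empty result means every listed file is present.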

For the flower-company inventory example, a useful run should produce artifacts such as:

planning/requirements.md describing request creation, list viewing, status updates, and admin needs
technical/architecture.md explaining the simple local app structure and data model
review/qa-report.md checking whether the generated package matches the scoped requirements
trace/events.jsonl showing the workflow events that led to the final package
trace/agent-messages.json showing structured task assignments, handoffs, and review requests between agents
trace/agent-actions.json showing each role-owned action, input artifacts, output artifact, model/source, and status
trace/product-brief.json recording the prompt-specific product brief used before DomainSpec/AppSpec derivation
trace/app-spec.json recording the deterministic runnable app contract, renderer archetype, deferred features, unresolved questions, and acceptance scenarios used by the renderer
trace/app-validation.json recording structured generated-app validation checks, including command checks when enabled
app/test-report.md summarizing generated-app syntax, install, build, and local API smoke checks, or skipped execution when commands are not enabled
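
Because trace/events.jsonl is a JSON Lines file, it can be summarized with a few lines of code. This is a rough sketch under one loud assumption: that each event carries a string `type` field. The actual event schema is not documented in this README, so TraceEvent, parseJsonl, and countByType are hypothetical names.

```typescript
// Hypothetical event shape: the real fields in events.jsonl may differ.
interface TraceEvent {
  type: string;
  [key: string]: unknown;
}

// Parses JSON Lines text: one JSON object per non-empty line.
function parseJsonl(text: string): TraceEvent[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TraceEvent);
}

// Counts events per `type` to give a quick overview of a run.
function countByType(events: TraceEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    counts[event.type] = (counts[event.type] ?? 0) + 1;
  }
  return counts;
}
```

Reading the file contents as a string and passing them to parseJsonl is enough to get a per-type tally for a run.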

Design Principles

Agentsim is not trying to be a foundation model, a cloud IDE, or a generic agent marketplace.

The project is built around a few principles:

Artifacts are more useful than chat transcripts.
Review loops are safer than unchecked generation.
Important decisions should be traceable.
Domain-specific workflows are more useful than generic agent swarms.
High-risk actions should require human approval.
Model providers should be replaceable.

The goal is to make AI-assisted software delivery easier to inspect, review, and continue over time.

Current Capabilities

Agentsim currently has a CLI-based proof of concept:

TypeScript CLI
local filesystem workspace
mock mode for deterministic no-key runs
model-agnostic live provider mode through the Chat Completions-compatible ModelProvider boundary
artifact-producing agent steps for intake, planning, architecture, implementation, QA, and delivery
durable artifact generation
artifact lineage, decision logs, approval records, and event traces
AppSpec-driven runnable generated app package for prompt-specific single-entity crud-workflow, booking-lite, and inventory-lite prototypes
optional generated-app execution evidence for syntax, install, build, health, list, create, and status-transition checks under the local dev command policy
dependency-free local run viewer for Goal, Progress, Decisions, Artifacts, Review, Trace, and Final Package inspection

The current generated app contract is intentionally narrow: live mode may use a model to extract a schema-constrained product brief, but app code is still rendered by controlled local renderers with explicit capability manifests. Broader behavior such as external calendar sync, production inventory math, auth, deployment, payments, and multi-entity relations is captured as deferred unless a renderer explicitly supports it.
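
The split between supported and deferred behavior can be pictured as a lookup against a renderer's capability manifest. The sketch below only illustrates that idea; CapabilityManifest, FeatureSplit, and splitFeatures are hypothetical names, and the feature strings are made up rather than taken from Agentsim's schemas.

```typescript
// Hypothetical manifest shape: Agentsim's real manifest schema is not shown in this README.
interface CapabilityManifest {
  archetype: string; // for example "inventory-lite"
  supportedFeatures: string[];
}

interface FeatureSplit {
  supported: string[];
  deferred: string[];
}

// Anything the renderer does not explicitly support is recorded as deferred.
function splitFeatures(requested: string[], manifest: CapabilityManifest): FeatureSplit {
  const supportedSet = new Set(manifest.supportedFeatures);
  const result: FeatureSplit = { supported: [], deferred: [] };
  for (const feature of requested) {
    if (supportedSet.has(feature)) {
      result.supported.push(feature);
    } else {
      result.deferred.push(feature);
    }
  }
  return result;
}
```

With a manifest that supports only list and create operations, a request that also mentions payments would end up with payments in the deferred list instead of in generated code.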

Quickstart

Requirements

Node.js 20.11+
pnpm

Install dependencies:

pnpm install

Run the demo:

pnpm demo "Build an inventory request system for a flower company"

Use mock mode explicitly for deterministic no-key runs:

pnpm demo "Build an inventory request system for a flower company" --mock

Use live mode with any Chat Completions-compatible provider:

AGENTSIM_MODEL_PROVIDER=chat-completions-compatible
AGENTSIM_MODEL_API_KEY=...
AGENTSIM_MODEL_BASE_URL=https://api.openai.com/v1
AGENTSIM_MODEL_NAME=gpt-4.1-mini
pnpm demo "Build an inventory request system for a flower company" --live

Existing OPENAI_* and OPENAI_COMPATIBLE_* environment variables are still supported as aliases. Optional live-provider reliability controls are AGENTSIM_MODEL_TIMEOUT_MS, AGENTSIM_MODEL_MAX_RETRIES, AGENTSIM_MODEL_RETRY_BASE_DELAY_MS, and AGENTSIM_MODEL_MAX_TOKENS.
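
This README does not say how retries are spaced. As an illustration only, the sketch below assumes exponential backoff, where AGENTSIM_MODEL_RETRY_BASE_DELAY_MS is the first delay and each later retry waits twice as long. RetryConfig, retryDelayMs, and retrySchedule are hypothetical names.

```typescript
// Hypothetical config shape mirroring two of the environment variables above.
interface RetryConfig {
  maxRetries: number; // AGENTSIM_MODEL_MAX_RETRIES
  baseDelayMs: number; // AGENTSIM_MODEL_RETRY_BASE_DELAY_MS
}

// Delay before retry number `attempt` (0-based), doubling on each attempt.
// Assumption: exponential backoff; Agentsim's real policy may differ.
function retryDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** attempt;
}

// Full list of delays a caller would wait through before giving up.
function retrySchedule(config: RetryConfig): number[] {
  return Array.from({ length: config.maxRetries }, (_, attempt) =>
    retryDelayMs(attempt, config.baseDelayMs),
  );
}
```

Under this assumption, three retries with a 500 ms base delay would wait 500, 1000, and 2000 ms.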

If no live key is configured and --live is not passed, Agentsim falls back to mock mode.

Agentsim owns the runtime orchestration, artifact graph, review state, event trace, and final package validation. Live providers only implement the small ModelProvider.generate() boundary.
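
To make the provider boundary concrete, here is a minimal sketch. The README names ModelProvider.generate() but not its parameters, so the single-string signature below is an assumption, as are EchoMockProvider and buildChatRequest. The request layout (a POST to the /chat/completions path with model and messages fields and a Bearer token) follows the widely used Chat Completions convention.

```typescript
// Hypothetical signature: only the method name generate() comes from the README.
interface ModelProvider {
  generate(prompt: string): Promise<string>;
}

// Deterministic stand-in in the spirit of mock mode: same input, same output.
class EchoMockProvider implements ModelProvider {
  async generate(prompt: string): Promise<string> {
    return `mock-response:${prompt.length}:${prompt.slice(0, 24)}`;
  }
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; messages: { role: "user"; content: string }[] };
}

// Builds the HTTP request that a Chat Completions-compatible endpoint expects.
// A live provider would POST `body` as JSON to `url` with these headers.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  prompt: string,
): ChatRequest {
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: { model, messages: [{ role: "user", content: prompt }] },
  };
}
```

Keeping the boundary this small means swapping providers only changes how one request is built and sent, while orchestration, review state, and validation stay untouched.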

Inspect A Run

Each run writes its internal state beside the final package:

outputs/{runId}/state/
+-- run.json
+-- tasks.json
+-- messages.json
+-- events.jsonl
+-- artifacts.json
+-- approvals.json
+-- domain-spec.json
+-- domain-inference.json
+-- workflow-graph.json
+-- tool-registry.json

Use the inspection commands to read that state:

pnpm agentsim inspect <runId>
pnpm agentsim tasks <runId>
pnpm agentsim events <runId>
pnpm agentsim artifacts <runId>
pnpm agentsim approvals <runId>
pnpm agentsim graph <runId>
pnpm agentsim contexts <runId>
pnpm agentsim viewer <runId> --port 4317
pnpm agentsim tools

agentsim viewer starts a read-only HTTP server on 127.0.0.1. It serves a normalized run model and final-package previews from the selected run output. It does not provide chat, approval, rerun, deployment, or editing controls.

Approval records can be resolved locally:

pnpm agentsim approve <runId> <approvalId>
pnpm agentsim reject <runId> <approvalId>
pnpm agentsim resume <runId> [--mock|--live]

If a run pauses for approval, resolve it with approve or reject. The resume command then rehydrates the existing run state from outputs/{runId}/state/ and continues the scheduler under the same run ID. It refuses terminal runs and runs that still have pending approvals. When no provider flag is supplied, resume uses the model mode persisted with the run.
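
The resume rules described above can be written down as a small decision function. This is a sketch under stated assumptions: the status names, the RunState shape, and planResume are hypothetical and do not reflect the actual contents of run.json or approvals.json.

```typescript
// Hypothetical status names and record shapes, for illustration only.
type RunStatus = "running" | "paused" | "completed" | "failed";
type ModelMode = "mock" | "live";

interface RunState {
  status: RunStatus;
  modelMode: ModelMode; // mode persisted with the original run
  approvals: { id: string; status: "pending" | "approved" | "rejected" }[];
}

type ResumeDecision =
  | { ok: true; mode: ModelMode }
  | { ok: false; reason: string };

// Refuses terminal runs and runs with pending approvals; otherwise picks the
// explicit flag if one was given, falling back to the persisted mode.
function planResume(run: RunState, flagMode?: ModelMode): ResumeDecision {
  if (run.status === "completed" || run.status === "failed") {
    return { ok: false, reason: `run is terminal (${run.status})` };
  }
  const pending = run.approvals.filter((a) => a.status === "pending");
  if (pending.length > 0) {
    const ids = pending.map((a) => a.id).join(", ");
    return { ok: false, reason: `pending approvals: ${ids}` };
  }
  return { ok: true, mode: flagMode ?? run.modelMode };
}
```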

The current mock workflow only uses safe default operations, so completed mock runs usually have auto-approved artifact records rather than pending approvals for dangerous tools. Dangerous tools such as run_command stay disabled unless the orchestrator is explicitly configured to allow commands.

Optional local context can be imported without copying private source into the final package:

pnpm agentsim run "Build a client portal for this project" --mock --repo .

Command execution remains disabled by default for agent tools. Generated-app install/build checks are run automatically only when commands are explicitly enabled and allowed by the selected command policy. Model/tool-initiated run_command calls still require tool approval, a context policy that allows run_command, and a command policy:

pnpm agentsim run "Build an inventory request system" --mock --allow-commands --command-policy dev

Available command policies are strict, dev, and unsafe-local. unsafe-local requires --allow-commands and is intended only for trusted local workspaces.
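
The gates in front of a model-initiated run_command call can be summarized as a checklist. The sketch below models only those gates, not which commands each policy permits, since that is not described here. RunCommandRequest and unmetRunCommandGates are hypothetical names.

```typescript
// Policy names come from the README; what each one permits is not modeled here.
type CommandPolicy = "strict" | "dev" | "unsafe-local";

// Hypothetical request shape gathering the conditions listed above.
interface RunCommandRequest {
  allowCommands: boolean; // --allow-commands was passed
  commandPolicy?: CommandPolicy; // --command-policy value, if any
  toolApproved: boolean; // the tool approval was granted
  contextAllowsRunCommand: boolean; // context policy permits run_command
}

// Returns the unmet conditions; an empty list means every gate is open.
function unmetRunCommandGates(req: RunCommandRequest): string[] {
  const unmet: string[] = [];
  if (!req.allowCommands) unmet.push("commands not enabled (--allow-commands)");
  if (req.commandPolicy === undefined) unmet.push("no command policy selected");
  if (!req.toolApproved) unmet.push("tool approval missing");
  if (!req.contextAllowsRunCommand) unmet.push("context policy does not allow run_command");
  return unmet;
}
```

Returning the full list of unmet conditions, instead of a single boolean, makes it easier to explain in a trace why a command was refused.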

Development

pnpm typecheck
pnpm test
pnpm build
pnpm eval:mock

pnpm eval:mock writes JSON and Markdown reports under outputs/evals/, including run duration, validation status, and a coarse failure category for each prompt.

Useful entry points:

src/cli.ts - CLI command surface
src/workflow.ts - compatibility wrapper for the orchestrator
src/orchestrator.ts - run, resume, scheduler, validation, and approval orchestration
src/types.ts - core contracts
src/agents/ - role definitions and artifact-producing step registry
src/core/ - artifacts, events, hashing, redaction, path safety, repositories, scheduler, tools, and workspace behavior
src/viewer/ - read-only local run viewer model, HTTP server, HTML, CSS, and browser-side rendering
src/providers/ - model provider boundary
src/templates/ - generated delivery package templates
tests/ - workflow, CLI, hashing, and redaction coverage

Project Model

Agentsim is currently built around these concepts:

Agent
AgentStep
TaskRun
Run
Task
Artifact
Workspace
Decision
Approval
Event

Planned contracts include:

Organization
Project
DomainPack
WorkspaceDriver
ModelProvider
ArtifactStore
EventStore

The current orchestrator compiles the existing AgentStep registry into persisted tasks, marks dependency-ready tasks, records task lifecycle events, supports approval pause/resume, and keeps the existing final package contract intact. Tools execute through permission-aware wrappers, with dangerous actions requiring approval and explicit command opt-in before execution.
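
Marking dependency-ready tasks amounts to a single pass over the task list. The following is a minimal sketch of that idea, assuming a hypothetical Task shape with a dependsOn list and hypothetical status names; the persisted format in tasks.json may look different.

```typescript
// Hypothetical status names and task shape, for illustration only.
type TaskStatus = "pending" | "ready" | "running" | "completed" | "failed";

interface Task {
  id: string;
  dependsOn: string[];
  status: TaskStatus;
}

// A pending task becomes ready once every dependency has completed.
// Returns a new array and leaves the input tasks unmodified.
function markReadyTasks(tasks: Task[]): Task[] {
  const completed = new Set(
    tasks.filter((task) => task.status === "completed").map((task) => task.id),
  );
  return tasks.map((task): Task =>
    task.status === "pending" && task.dependsOn.every((dep) => completed.has(dep))
      ? { ...task, status: "ready" }
      : task,
  );
}
```

A local scheduler could call this after each task finishes, then start whichever tasks have moved to ready.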

Validation results are converted into review verdicts. Passing validation completes the run. Unrecoverable validation failures fail the run. Recoverable failures emit a repair-loop-disabled event by default; the experimental repair path can create a bounded fix task but does not yet execute automated repairs.
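
The outcomes described above can be sketched as a small mapping function. ValidationResult, NextStep, and nextStep are hypothetical names, and the step kinds are illustrative labels; only the repair-loop-disabled event name comes from this README.

```typescript
// Hypothetical result shape: a failure is either recoverable or not.
type ValidationResult =
  | { passed: true }
  | { passed: false; recoverable: boolean };

// Illustrative labels for what the orchestrator does next.
type NextStep =
  | { kind: "complete-run" }
  | { kind: "fail-run" }
  | { kind: "emit-event"; event: "repair-loop-disabled" }
  | { kind: "create-fix-task" };

// Repair is off by default; the experimental path only creates a bounded
// fix task and does not execute an automated repair.
function nextStep(result: ValidationResult, repairEnabled = false): NextStep {
  if (result.passed) return { kind: "complete-run" };
  if (!result.recoverable) return { kind: "fail-run" };
  return repairEnabled
    ? { kind: "create-fix-task" }
    : { kind: "emit-event", event: "repair-loop-disabled" };
}
```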

The current implementation is deliberately thin. The priority is to make the software-delivery workflow useful before expanding the platform surface area.

Contributing

The best contributions right now improve the current vertical slice instead of widening the platform too early.

Who should contribute:

developers interested in practical CLI tools and local-first workflows
people who care about artifact quality, review loops, and traceability
freelancers or technical reviewers who can identify gaps in handoff packages
contributors willing to keep changes small, testable, and aligned with the current scope

High-impact areas:

make the CLI demo more reliable across different software project prompts
add artifact validation for final package completeness
improve the generated app quality without bloating the demo
strengthen QA review artifacts and known-issues reporting
extend the local workspace driver with safe command execution
improve trace files so every run is easier to inspect and debug
add domain-pack boundaries only where they make the software-freelance workflow cleaner

Please avoid starting with:

full web dashboards
agent marketplaces
complex persistent memory
Docker-per-agent infrastructure
SaaS account systems
generic multi-agent chat UI

The project needs contributors who want to make one narrow workflow genuinely useful before making the system broad.

Read CONTRIBUTING.md before opening a pull request.

Agentsim uses the Developer Certificate of Origin. Sign off commits with:

git commit -s

Good pull requests are small, testable, and aligned with the artifact-first product direction.

Roadmap

The public near-term path is:

harder CLI demo
local workspace execution loop
web artifact viewer

License

Agentsim is licensed under the Apache License 2.0. See LICENSE.

Code generated by Agentsim for a user's project belongs to that user, subject to the licenses of any dependencies, templates, or third-party assets included in the generated project.
