Simili Bot

AI-Powered GitHub Issue Intelligence.

Automatically detect duplicate issues, find similar issues with semantic search, and intelligently route issues across repositories.

Features

Semantic Duplicate Detection — Find related issues using AI-powered embeddings, not just keyword matching.
Cross-Repository Search — Search for similar issues across your organization.
Intelligent Routing — Automatically transfer issues to the correct repository based on content.
Smart Triage — AI-powered labeling and quality assessment.
Modular Pipeline — Customize workflows with plug-and-play steps.
Multi-Repo Support — Central configuration with per-repo overrides.

Architecture

Simili uses a "Lego with Blueprints" architecture:

Lego Blocks: Independent, reusable pipeline steps (Gatekeeper, Similarity, Triage, etc.).
Blueprints: Pre-defined workflows for common use cases.
State Branch: Git-based state management using an orphan branch (no comment scanning).

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Gatekeeper  │───▶│  Similarity │───▶│   Triage    │───▶│   Action    │
│   Check     │    │   Search    │    │  Analysis   │    │  Executor   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

Quick Start

Simili-Bot supports both Single-Repository and Organization-wide setups.

Setup Guides

| Guide | Description |
| --- | --- |
| Single Repo Setup | Instructions for setting up Simili-Bot on a standalone repository. |
| Organization Setup | Best practices for deploying across an organization using Reusable Workflows. |

AI Provider Configuration

Simili supports both Gemini and OpenAI.

Set at least one key: GEMINI_API_KEY or OPENAI_API_KEY.
If both keys are set, Gemini takes precedence.
If only one key is set, Simili uses that provider.
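The selection rules above can be sketched in a few lines of Go. The function name `selectProvider` and the returned provider strings are assumptions made for this example; only the two environment variable names and the precedence rule come from this README.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// selectProvider applies the documented rules: Gemini wins when both keys
// are present, otherwise whichever key is set decides. The name and return
// values are illustrative, not Simili's internal API.
func selectProvider(geminiKey, openAIKey string) (string, error) {
	switch {
	case geminiKey != "":
		return "gemini", nil
	case openAIKey != "":
		return "openai", nil
	default:
		return "", errors.New("set GEMINI_API_KEY or OPENAI_API_KEY")
	}
}

func main() {
	provider, err := selectProvider(os.Getenv("GEMINI_API_KEY"), os.Getenv("OPENAI_API_KEY"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using provider:", provider)
}
```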

Default models:

LLM: gemini-2.0-flash-lite (Gemini), gpt-5.2 (OpenAI)
Embeddings: text-embedding-004 (Gemini), text-embedding-3-small (OpenAI)

If you override embedding.model, keep embedding.dimensions aligned with the model:

text-embedding-004 -> 768
gemini-embedding-001 -> 3072
text-embedding-3-small -> 1536
text-embedding-3-large -> 3072
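A mismatch between model and dimensions is easy to catch before any vectors are written. The sketch below copies the table above into a map and checks a configured pair against it. `checkDimensions` and `knownDimensions` are hypothetical helpers, and letting unknown models through unchecked is an assumption of this example, not documented Simili behavior.

```go
package main

import "fmt"

// knownDimensions copies the model-to-dimension pairs listed in this README.
var knownDimensions = map[string]int{
	"text-embedding-004":     768,
	"gemini-embedding-001":   3072,
	"text-embedding-3-small": 1536,
	"text-embedding-3-large": 3072,
}

// checkDimensions returns an error when a known model is paired with the
// wrong vector size. Models missing from the map are not checked.
func checkDimensions(model string, dims int) error {
	want, ok := knownDimensions[model]
	if !ok {
		return nil
	}
	if dims != want {
		return fmt.Errorf("embedding model %q expects %d dimensions, got %d", model, want, dims)
	}
	return nil
}

func main() {
	fmt.Println(checkDimensions("gemini-embedding-001", 3072)) // <nil>
	fmt.Println(checkDimensions("gemini-embedding-001", 768))  // mismatch error
}
```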

Examples

We provide copy-pasteable examples to get you started quickly:

Multi-Repo Examples: Includes shared workflow, caller workflow, and central config.
Single-Repo Examples: Standard workflow and configuration.

Available Workflows

You can specify a workflow in your simili.yaml or define custom steps.

| Preset | Description |
| --- | --- |
| `issue-triage` | Full pipeline: similarity search, duplicate check, triage analysis, and action execution. |
| `similarity-only` | Runs similarity search only. Useful for "Find Similar Issues" features without auto-triage. |
| `index-only` | Indexes issues to the vector database without providing feedback. |

CLI Commands

Simili provides a CLI for local development, testing, and batch operations.

`simili index`

Bulk index issues from a GitHub repository into the vector database.

simili index --repo owner/repo --workers 5 --limit 100

Flags:

--repo (required): Target repository (owner/name)
--workers: Number of concurrent workers (default: 5)
--since: Start from issue number or timestamp
--limit: Maximum issues to index
--dry-run: Simulate without writing to database

`simili process`

Process a single issue through the pipeline.

simili process --issue issue.json --workflow issue-triage --dry-run

Flags:

--issue: Path to issue JSON file
--workflow: Workflow preset to run (default: "issue-triage")
--dry-run: Run without side effects
--repo, --org, --number: Override issue fields

`simili batch`

Process multiple issues from a JSON file in batch mode. All operations run in dry-run mode to prevent GitHub writes.

simili batch --file issues.json --format csv --out-file results.csv --workers 5

Use Cases:

Test bot logic on historical data without spamming repositories
Generate reports showing similarity analysis and duplicate detection
Analyze issues from repositories where you lack write access
Bulk identify transfer recommendations and quality scores

Flags:

--file (required): Path to JSON file with array of issues
--out-file: Output file path (stdout if not specified)
--format: Output format: json or csv (default: json)
--workers: Number of concurrent workers (default: 1)
--workflow: Workflow preset (default: "issue-triage")
--collection: Override Qdrant collection name
--threshold: Override similarity threshold
--duplicate-threshold: Override duplicate confidence threshold
--top-k: Override max similar issues to show
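To show how --threshold and --top-k plausibly interact, the Go sketch below keeps matches at or above a score threshold, sorts them by score, and truncates the list. The `Match` type and `filterMatches` function are hypothetical, and whether Simili compares with `>=` or treats a non-positive top-k as "no limit" is an assumption of this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Match is a hypothetical search hit: an issue number and a similarity score.
type Match struct {
	Number int
	Score  float64
}

// filterMatches drops hits below threshold, orders the rest from most to
// least similar, and keeps at most topK of them. A topK of zero or less
// means no limit in this sketch.
func filterMatches(matches []Match, threshold float64, topK int) []Match {
	kept := make([]Match, 0, len(matches))
	for _, m := range matches {
		if m.Score >= threshold {
			kept = append(kept, m)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if topK > 0 && len(kept) > topK {
		kept = kept[:topK]
	}
	return kept
}

func main() {
	hits := []Match{{101, 0.91}, {102, 0.40}, {103, 0.72}, {104, 0.66}}
	fmt.Println(filterMatches(hits, 0.65, 2)) // [{101 0.91} {103 0.72}]
}
```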

Input Format:

Create a JSON file with an array of issues:

[
  {
    "org": "owner",
    "repo": "repo-name",
    "number": 123,
    "title": "Issue title",
    "body": "Issue description...",
    "state": "open",
    "labels": ["bug", "high-priority"],
    "author": "username",
    "created_at": "2026-02-10T10:00:00Z"
  }
]
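If you generate or validate this file programmatically, a struct like the one below mirrors the fields shown above. The `Issue` type and `parseIssues` helper are a sketch for illustration, not Simili's own types. Parsing `created_at` as `time.Time` assumes RFC 3339 timestamps, as in the sample.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Issue mirrors the fields of the sample input. It is a hypothetical type
// for this example, not Simili's internal model.
type Issue struct {
	Org       string    `json:"org"`
	Repo      string    `json:"repo"`
	Number    int       `json:"number"`
	Title     string    `json:"title"`
	Body      string    `json:"body"`
	State     string    `json:"state"`
	Labels    []string  `json:"labels"`
	Author    string    `json:"author"`
	CreatedAt time.Time `json:"created_at"`
}

// parseIssues decodes a JSON array of issues.
func parseIssues(data []byte) ([]Issue, error) {
	var issues []Issue
	if err := json.Unmarshal(data, &issues); err != nil {
		return nil, fmt.Errorf("parse issues: %w", err)
	}
	return issues, nil
}

func main() {
	sample := []byte(`[{
		"org": "owner", "repo": "repo-name", "number": 123,
		"title": "Issue title", "body": "Issue description...",
		"state": "open", "labels": ["bug", "high-priority"],
		"author": "username", "created_at": "2026-02-10T10:00:00Z"
	}]`)
	issues, err := parseIssues(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, is := range issues {
		fmt.Printf("%s/%s#%d %q\n", is.Org, is.Repo, is.Number, is.Title)
	}
}
```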

Output Formats:

JSON: Full pipeline results with detailed analysis
CSV: Flattened summary for spreadsheet analysis

Example Workflow:

# 1. Index repository issues
simili index --repo ballerina-platform/ballerina-library --workers 10

# 2. Prepare test issues in batch.json
# 3. Run batch analysis
simili batch --file batch.json --format csv --out-file analysis.csv --workers 5

# 4. Review results
cat analysis.csv

Configuration

Minimal .github/simili.yaml example:

qdrant:
  url: "${QDRANT_URL}"
  api_key: "${QDRANT_API_KEY}"
  collection: "my-issues"

embedding:
  provider: "gemini"
  api_key: "${GEMINI_API_KEY}"
  model: "gemini-embedding-001"
  dimensions: 3072  # must match the embedding model

llm:
  provider: "gemini"
  api_key: "${GEMINI_API_KEY}"
  model: "gemini-2.5-flash"
  # temperature: 0.3

defaults:
  similarity_threshold: 0.65
  max_similar_to_show: 5

Notes:

llm.model defaults to gemini-2.5-flash when omitted.
llm.api_key can be omitted if GEMINI_API_KEY is set.
You can override the model at runtime with LLM_MODEL.
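Read together, these notes suggest a resolution order of environment variable, then config file, then built-in default. The Go sketch below encodes that order. `resolveModel` is a hypothetical helper, and the exact precedence inside Simili is an assumption inferred from the notes above.

```go
package main

import (
	"fmt"
	"os"
)

// defaultLLMModel is the fallback named in the notes above.
const defaultLLMModel = "gemini-2.5-flash"

// resolveModel picks the LLM model: a runtime override (LLM_MODEL) first,
// then the value from the config file, then the default.
func resolveModel(envModel, configModel string) string {
	if envModel != "" {
		return envModel
	}
	if configModel != "" {
		return configModel
	}
	return defaultLLMModel
}

func main() {
	// The empty second argument stands in for a config without llm.model.
	fmt.Println(resolveModel(os.Getenv("LLM_MODEL"), ""))
}
```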

Development

# Clone the repository
git clone https://github.com/similigh/simili-bot.git
cd simili-bot

# Build
go build ./...

# Run tests
go test ./...

# Lint
go vet ./...

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.

Made by the Simili Team
