AI-powered GitHub issue intelligence - semantic duplicate detection, cross-repo search, and intelligent issue routing

similigh/simili-bot

Simili Bot

AI-Powered GitHub Issue Intelligence.

Automatically detect duplicate issues, find similar issues with semantic search, and intelligently route issues across repositories.


Features

  • Semantic Duplicate Detection — Find related issues using AI-powered embeddings, not just keyword matching.
  • Cross-Repository Search — Search for similar issues across your organization.
  • Intelligent Routing — Automatically transfer issues to the correct repository based on content.
  • Smart Triage — AI-powered labeling and quality assessment.
  • Modular Pipeline — Customize workflows with plug-and-play steps.
  • Multi-Repo Support — Central configuration with per-repo overrides.

Architecture

Simili uses a "Lego with Blueprints" architecture:

  • Lego Blocks: Independent, reusable pipeline steps (Gatekeeper, Similarity, Triage, etc.).
  • Blueprints: Pre-defined workflows for common use cases.
  • State Branch: Git-based state management using an orphan branch (no comment scanning).
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Gatekeeper  │───▶│  Similarity │───▶│   Triage    │───▶│   Action    │
│   Check     │    │   Search    │    │  Analysis   │    │  Executor   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
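
The step-and-blueprint idea can be sketched in Go as follows. This is a minimal illustration, not Simili's actual code: the `Step` interface, `Context` type, `ErrSkip` sentinel, and the step names are all hypothetical and chosen only to mirror the diagram above.

```go
package main

import (
	"errors"
	"fmt"
)

// Issue is a hypothetical, simplified issue record used only in this sketch.
type Issue struct {
	Title string
	Body  string
}

// Context carries state between steps; the field names are illustrative.
type Context struct {
	Issue Issue
	Trace []string // names of the steps that ran, in order
}

// Step is one "Lego block": an independent unit that reads and updates the context.
type Step interface {
	Name() string
	Run(ctx *Context) error
}

// ErrSkip lets a step (for example a gatekeeper) stop the pipeline without failing it.
var ErrSkip = errors.New("skip remaining steps")

// stepFunc adapts a plain function to the Step interface.
type stepFunc struct {
	name string
	fn   func(*Context) error
}

func (s stepFunc) Name() string           { return s.name }
func (s stepFunc) Run(ctx *Context) error { return s.fn(ctx) }

// RunPipeline executes steps in order and stops at the first error or skip.
func RunPipeline(ctx *Context, steps []Step) error {
	for _, s := range steps {
		ctx.Trace = append(ctx.Trace, s.Name())
		if err := s.Run(ctx); err != nil {
			if errors.Is(err, ErrSkip) {
				return nil
			}
			return fmt.Errorf("step %s: %w", s.Name(), err)
		}
	}
	return nil
}

// Blueprint returns a fixed ordering of steps that mirrors the diagram.
func Blueprint() []Step {
	noop := func(*Context) error { return nil }
	return []Step{
		stepFunc{"gatekeeper", func(c *Context) error {
			if c.Issue.Title == "" {
				return ErrSkip // nothing to analyze
			}
			return nil
		}},
		stepFunc{"similarity", noop},
		stepFunc{"triage", noop},
		stepFunc{"action", noop},
	}
}

func main() {
	ctx := &Context{Issue: Issue{Title: "Crash on startup"}}
	if err := RunPipeline(ctx, Blueprint()); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ctx.Trace) // prints [gatekeeper similarity triage action]
}
```

Keeping each step behind a small interface is what makes a blueprint just an ordered slice: a different workflow is a different slice of the same blocks.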

Quick Start

Simili-Bot supports both Single-Repository and Organization-wide setups.

Setup Guides

| Guide | Description |
| --- | --- |
| Single Repo Setup | Instructions for setting up Simili-Bot on a standalone repository. |
| Organization Setup | Best practices for deploying across an organization using Reusable Workflows. |

AI Provider Configuration

Simili supports both Gemini and OpenAI.

  • Set at least one key: GEMINI_API_KEY or OPENAI_API_KEY
  • If both keys are set, Gemini takes precedence
  • If only one key is set, Simili uses that provider
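
The selection rules above can be expressed as a small function. This is a sketch of the stated precedence only; the function name `ResolveProvider` and the provider strings it returns are assumptions, not Simili's internal API.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// ResolveProvider picks a provider from the two API keys.
// Gemini wins when both keys are present; an error is returned when neither is set.
func ResolveProvider(geminiKey, openAIKey string) (string, error) {
	switch {
	case geminiKey != "":
		return "gemini", nil
	case openAIKey != "":
		return "openai", nil
	default:
		return "", errors.New("set GEMINI_API_KEY or OPENAI_API_KEY")
	}
}

func main() {
	p, err := ResolveProvider(os.Getenv("GEMINI_API_KEY"), os.Getenv("OPENAI_API_KEY"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("provider:", p)
}
```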

Default models:

  • LLM: gemini-2.0-flash-lite (Gemini), gpt-5.2 (OpenAI)
  • Embeddings: text-embedding-004 (Gemini), text-embedding-3-small (OpenAI)

If you override embedding.model, keep embedding.dimensions aligned with the model:

  • text-embedding-004 -> 768
  • gemini-embedding-001 -> 3072
  • text-embedding-3-small -> 1536
  • text-embedding-3-large -> 3072
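
A mismatch between model and dimensions is easy to catch before any vectors are written. The sketch below hard-codes the four pairs listed above; `CheckDimensions` is a hypothetical helper, and letting unknown model names pass unchecked is an assumption of this example.

```go
package main

import "fmt"

// knownDimensions mirrors the model-to-dimension pairs listed above.
var knownDimensions = map[string]int{
	"text-embedding-004":     768,
	"gemini-embedding-001":   3072,
	"text-embedding-3-small": 1536,
	"text-embedding-3-large": 3072,
}

// CheckDimensions reports an error when dims disagrees with a known model.
// Models not in the table are not validated (assumption of this sketch).
func CheckDimensions(model string, dims int) error {
	want, ok := knownDimensions[model]
	if !ok {
		return nil
	}
	if dims != want {
		return fmt.Errorf("embedding.dimensions is %d but %s produces %d", dims, model, want)
	}
	return nil
}

func main() {
	fmt.Println(CheckDimensions("gemini-embedding-001", 3072)) // prints <nil>
	fmt.Println(CheckDimensions("text-embedding-004", 1536))
}
```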

Examples

We provide copy-pasteable examples to get you started quickly.

Available Workflows

You can specify a workflow in your simili.yaml or define custom steps.

| Preset | Description |
| --- | --- |
| issue-triage | Full pipeline: similarity search, duplicate check, triage analysis, and action execution. |
| similarity-only | Runs similarity search only. Useful for "Find Similar Issues" features without auto-triage. |
| index-only | Indexes issues to the vector database without providing feedback. |

CLI Commands

Simili provides a CLI for local development, testing, and batch operations.

simili index

Bulk index issues from a GitHub repository into the vector database.

simili index --repo owner/repo --workers 5 --limit 100

Flags:

  • --repo (required): Target repository (owner/name)
  • --workers: Number of concurrent workers (default: 5)
  • --since: Start from issue number or timestamp
  • --limit: Maximum issues to index
  • --dry-run: Simulate without writing to database
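
To illustrate how `--workers` and `--limit` might interact, here is a generic worker-pool sketch. It is not taken from Simili's source: `IndexAll` and the `indexFn` callback are hypothetical stand-ins for the real indexing call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// IndexAll fans issue numbers out to a fixed number of workers and returns
// how many were indexed without error. limit <= 0 means "no limit".
// indexFn is a stand-in for the real embedding-and-upsert call.
func IndexAll(numbers []int, workers, limit int, indexFn func(int) error) (indexed int64) {
	if limit > 0 && limit < len(numbers) {
		numbers = numbers[:limit]
	}
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				if err := indexFn(n); err == nil {
					atomic.AddInt64(&indexed, 1)
				}
			}
		}()
	}
	for _, n := range numbers {
		jobs <- n
	}
	close(jobs) // lets the workers drain the channel and exit
	wg.Wait()
	return indexed
}

func main() {
	numbers := make([]int, 250)
	for i := range numbers {
		numbers[i] = i + 1
	}
	// Comparable in spirit to: simili index --workers 5 --limit 100
	got := IndexAll(numbers, 5, 100, func(int) error { return nil })
	fmt.Println(got) // prints 100
}
```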

simili process

Process a single issue through the pipeline.

simili process --issue issue.json --workflow issue-triage --dry-run

Flags:

  • --issue: Path to issue JSON file
  • --workflow: Workflow preset to run (default: "issue-triage")
  • --dry-run: Run without side effects
  • --repo, --org, --number: Override issue fields

simili batch

Process multiple issues from a JSON file in batch mode. All operations run in dry-run mode to prevent GitHub writes.

simili batch --file issues.json --format csv --out-file results.csv --workers 5

Use Cases:

  • Test bot logic on historical data without spamming repositories
  • Generate reports showing similarity analysis and duplicate detection
  • Analyze issues from repositories where you lack write access
  • Bulk identify transfer recommendations and quality scores

Flags:

  • --file (required): Path to JSON file with array of issues
  • --out-file: Output file path (stdout if not specified)
  • --format: Output format: json or csv (default: json)
  • --workers: Number of concurrent workers (default: 1)
  • --workflow: Workflow preset (default: "issue-triage")
  • --collection: Override Qdrant collection name
  • --threshold: Override similarity threshold
  • --duplicate-threshold: Override duplicate confidence threshold
  • --top-k: Override max similar issues to show
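
The `--threshold` and `--top-k` overrides are easiest to understand with a toy example. In Simili the vector search is delegated to Qdrant, so the code below is only an in-memory illustration of the idea, using cosine similarity as an assumed metric; `Cosine`, `Match`, and `TopMatches` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Cosine returns the cosine similarity of two vectors,
// or 0 when the lengths differ or either vector has zero norm.
func Cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Match pairs an issue number with its similarity score.
type Match struct {
	Number int
	Score  float64
}

// TopMatches keeps candidates scoring at least threshold, sorts them by
// descending score (ties broken by issue number), and returns at most k.
func TopMatches(query []float64, index map[int][]float64, threshold float64, k int) []Match {
	var out []Match
	for number, vec := range index {
		if s := Cosine(query, vec); s >= threshold {
			out = append(out, Match{Number: number, Score: s})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Number < out[j].Number
	})
	if k >= 0 && len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	index := map[int][]float64{1: {1, 0}, 2: {0, 1}, 3: {1, 1}}
	for _, m := range TopMatches([]float64{1, 0}, index, 0.65, 5) {
		fmt.Printf("#%d %.2f\n", m.Number, m.Score)
	}
}
```

With a threshold of 0.65, issue 2 (orthogonal to the query) is dropped, while issues 1 and 3 are returned in that order.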

Input Format:

Create a JSON file with an array of issues:

[
  {
    "org": "owner",
    "repo": "repo-name",
    "number": 123,
    "title": "Issue title",
    "body": "Issue description...",
    "state": "open",
    "labels": ["bug", "high-priority"],
    "author": "username",
    "created_at": "2026-02-10T10:00:00Z"
  }
]
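
For tooling that generates or validates such a file, the shape above maps onto a Go struct. The JSON field names come from the example; the `BatchIssue` type, the `ParseIssues` helper, and the rule that `org`, `repo`, and `number` are required are assumptions of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// BatchIssue mirrors one element of the input array shown above.
type BatchIssue struct {
	Org       string    `json:"org"`
	Repo      string    `json:"repo"`
	Number    int       `json:"number"`
	Title     string    `json:"title"`
	Body      string    `json:"body"`
	State     string    `json:"state"`
	Labels    []string  `json:"labels"`
	Author    string    `json:"author"`
	CreatedAt time.Time `json:"created_at"` // RFC 3339 timestamp
}

// ParseIssues decodes a JSON array of issues and applies a basic sanity check.
// Treating org, repo, and number as required is an assumption of this sketch.
func ParseIssues(data []byte) ([]BatchIssue, error) {
	var issues []BatchIssue
	if err := json.Unmarshal(data, &issues); err != nil {
		return nil, fmt.Errorf("decode issues: %w", err)
	}
	for i, is := range issues {
		if is.Org == "" || is.Repo == "" || is.Number <= 0 {
			return nil, fmt.Errorf("issue at index %d: org, repo, and number are required", i)
		}
	}
	return issues, nil
}

func main() {
	data := []byte(`[{"org":"owner","repo":"repo-name","number":123,
		"title":"Issue title","labels":["bug"],"created_at":"2026-02-10T10:00:00Z"}]`)
	issues, err := ParseIssues(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s/%s#%d\n", issues[0].Org, issues[0].Repo, issues[0].Number) // prints owner/repo-name#123
}
```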

Output Formats:

  • JSON: Full pipeline results with detailed analysis
  • CSV: Flattened summary for spreadsheet analysis

Example Workflow:

# 1. Index repository issues
simili index --repo ballerina-platform/ballerina-library --workers 10

# 2. Prepare test issues in batch.json
# 3. Run batch analysis
simili batch --file batch.json --format csv --out-file analysis.csv --workers 5

# 4. Review results
cat analysis.csv

Configuration

Minimal .github/simili.yaml example:

qdrant:
  url: "${QDRANT_URL}"
  api_key: "${QDRANT_API_KEY}"
  collection: "my-issues"

embedding:
  provider: "gemini"
  api_key: "${GEMINI_API_KEY}"
  model: "gemini-embedding-001"
  dimensions: 3072  # must match the embedding model

llm:
  provider: "gemini"
  api_key: "${GEMINI_API_KEY}"
  model: "gemini-2.5-flash"
  # temperature: 0.3

defaults:
  similarity_threshold: 0.65
  max_similar_to_show: 5

Notes:

  • llm.model defaults to gemini-2.5-flash when omitted.
  • llm.api_key can be omitted if GEMINI_API_KEY is set.
  • You can override the model at runtime with LLM_MODEL.
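
The notes above imply a lookup order for the LLM model: the LLM_MODEL environment variable, then llm.model from the config, then the default. The sketch below assumes that order; `ResolveModel` is a hypothetical helper and not part of Simili's API.

```go
package main

import (
	"fmt"
	"os"
)

// defaultLLMModel is the default named in the notes above.
const defaultLLMModel = "gemini-2.5-flash"

// ResolveModel returns the first non-empty value among the runtime override,
// the configured model, and the default, in that order (assumed precedence).
func ResolveModel(envModel, configModel string) string {
	if envModel != "" {
		return envModel
	}
	if configModel != "" {
		return configModel
	}
	return defaultLLMModel
}

func main() {
	// An empty config value stands for an omitted llm.model key.
	fmt.Println(ResolveModel(os.Getenv("LLM_MODEL"), ""))
}
```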

Development

# Clone the repository
git clone https://github.com/similigh/simili-bot.git
cd simili-bot

# Build
go build ./...

# Run tests
go test ./...

# Lint
go vet ./...

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.


Made by the Simili Team
