| title | CodeSage |
|---|---|
| colorFrom | blue |
| colorTo | green |
| sdk | streamlit |
| sdk_version | 1.28.0 |
| app_file | ui/app.py |
| pinned | false |
CodeSage is a codebase analysis tool that combines static analysis and Retrieval-Augmented Generation (RAG) to assist with code exploration and refactoring. It uses Abstract Syntax Tree (AST) parsing, vector embeddings, and dependency graph traversal to answer architectural queries and compute complexity metrics.
The system implements a RAG pipeline to process natural language queries about the codebase:
- Vector Search: Converts queries into dense vector embeddings and retrieves relevant code segments from a local FAISS index.
- Graph Context: Extracts structural relationships (imports, class hierarchies, function calls) using an in-memory
NetworkXgraph. - LLM Integration: Passes the context to a Large Language Model (via Groq) to synthesize responses based on the provided codebase data.
Constructs a relational dependency graph mapping files, classes, and methods. Users can supply a target module to identify its downstream dependencies and evaluate the potential impact of modifications.
Utilizes Python's native ast module to analyze source code complexity. It tracks:
- Cyclomatic complexity.
- Function length and parameter counts.
- Multiple-exit-point control structures.
Integrates LLM-driven tools for code maintenance:
- Bug Hunter: Analyzes error tracebacks, locates the failing source file using the vector index, and suggests fixes.
- Auto-Refactor: Processes specified code blocks to recommend PEP-8 compliant and structurally improved alternatives.
A watchdog daemon monitors the active directory tree for modifications, updating semantic vectors and the AST graph to maintain index consistency.
- Analysis:
ast,networkx - Vector Search:
sentence-transformers(all-MiniLM-L6-v2),faiss-cpu - LLM API: Groq (
llama-3.1-8b-instant) - UI Framework: Streamlit
1. Environment Setup Requires Python 3.10 or higher.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt2. Configuration
Copy the .env.example template to .env and configure your API keys:
GROQ_API_KEY=your_api_key_here
LLM_MODEL=llama-3.1-8b-instant
EMBEDDING_MODEL=all-MiniLM-L6-v2The application requires an initial index of the codebase to process queries.
1. Build the Index Parse the target directory to generate the required index artifacts:
python3 main.py index2. Start the Interface Launch the Streamlit dashboard locally:
python3 main.py uiThe Streamlit client runs on localhost:8501 by default.
3. Run the File Monitor (Optional) Start a background process to update the index upon file modifications:
python3 main.py watch