A web application that analyzes GitHub repositories to measure contributor quality and productivity. Get insights into commits, pull requests, issues, and code quality with AI-powered analysis.
- Fetches GitHub Data: Retrieves commits, pull requests, issues, and comments from any repository
- AI Quality Analysis: Uses OpenAI to evaluate the quality of commit messages, PR descriptions, and issues
- Contributor Metrics: Tracks individual contributor statistics including lines changed, PRs created, and quality scores
- Code Quality Analysis: Performs static analysis on Python code to measure complexity and maintainability
- Interactive Dashboard: Visualizes all metrics with charts, graphs, and detailed breakdowns
- Persistent Storage: Saves all analysis to a PostgreSQL database for quick access
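To make the complexity metric in the feature list concrete, here is a minimal sketch using only the standard library. It is not the app's implementation (the project lists Radon for that). It is a simplified approximation of cyclomatic complexity that counts branching constructs with `ast`, and `rough_complexity` and `SAMPLE` are hypothetical names chosen for illustration.

```python
import ast

# Node types that open an extra execution path. This is a simplified
# approximation of cyclomatic complexity, not Radon's exact algorithm.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension)


def rough_complexity(source: str) -> dict[str, int]:
    """Return {function_name: 1 + number of branch points} per function."""
    tree = ast.parse(source)
    scores = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            children = list(ast.walk(node))
            branches = sum(isinstance(c, _BRANCH_NODES) for c in children)
            # A boolean expression with n operands adds n - 1 paths.
            branches += sum(len(c.values) - 1
                            for c in children if isinstance(c, ast.BoolOp))
            scores[node.name] = 1 + branches
    return scores


# Hypothetical input used only to exercise the function.
SAMPLE = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    for i in range(n):
        if i % 2:
            return "odd-found"
    return "done"
'''

print(rough_complexity(SAMPLE))
```

Higher scores indicate functions with more independent paths, which are usually harder to test and maintain.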
# Clone the repository
git clone <repository-url>
cd github_analysis
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Create a .env file in the project root (or copy from .env.example):
cp .env.example .env
Edit .env and add your configuration:
# Database Configuration (Required)
DATABASE_URL=postgresql://username:password@host:port/database_name
# API Keys (Optional - can also be entered through the web UI)
GITHUB_TOKEN=your_github_token
OPENAI_API_KEY=your_openai_api_key
Important: The DATABASE_URL must point to a valid PostgreSQL database. The API keys can be provided either in the .env file or through the web interface.
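Because a malformed DATABASE_URL is an easy mistake to make, a quick sanity check before starting the app can help. The sketch below is one possible approach using only the standard library. `check_database_url` is a hypothetical helper, not part of this project, and it only inspects the URL's shape without connecting to the database.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> dict:
    """Split a PostgreSQL URL into parts; raise ValueError if it looks wrong."""
    parts = urlsplit(url)
    # Accept "postgresql" and driver-qualified forms such as "postgresql+psycopg2".
    if parts.scheme.split("+")[0] != "postgresql":
        raise ValueError(f"expected a postgresql:// URL, got scheme {parts.scheme!r}")
    database = parts.path.lstrip("/")
    if not parts.hostname or not database:
        raise ValueError("DATABASE_URL needs both a host and a database name")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is PostgreSQL's default port
        "database": database,
    }


# Example values are placeholders, not real credentials.
print(check_database_url("postgresql://alice:secret@db.example.com:5433/metrics"))
```

In practice the URL would come from `os.environ["DATABASE_URL"]` after the .env file has been loaded.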
streamlit run app.py
The application will open in your browser at http://localhost:8501
- Enter API Keys on the home page:
  - GitHub Personal Access Token (created in your GitHub developer settings)
  - OpenAI API Key (from your OpenAI account)
- Enter Repository URL: Paste the full GitHub URL (e.g., https://github.com/owner/repo)
- Click "Analyze Repository": The app will fetch and analyze all repository data
- View Results: You are automatically redirected to the dashboard with interactive visualizations
- Commit history and statistics
- Pull request analysis with review quality
- Issue tracking and description quality
- Code quality metrics (complexity, maintainability)
- Language breakdown and file statistics
- Individual contribution metrics
- Quality scores (0-10 scale) for commits, PRs, and issues
- Visual comparisons with radar charts and graphs
- Leaderboards and rankings
- Contributors: Compare contributors across multiple dimensions
- Pull Requests: Detailed PR list with quality indicators
- Issues: Issue tracking with quality metrics
- Code Quality: Static analysis results and improvement suggestions
- Repository Content: Language distribution and file structure
The app uses URL routing for easy navigation:
- Home: http://localhost:8501/
- Analyzing: http://localhost:8501/?page=analyse&url=REPO_URL
- Repository Dashboard: http://localhost:8501/?owner=USERNAME&repo=REPO_NAME
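The URL patterns above can be generated from a repository URL, which is handy for bookmarks or scripts. The following is a minimal sketch rather than the app's own routing code. The helper names (`split_repo_url`, `analyse_link`, `dashboard_link`) are hypothetical, and it assumes the app runs at Streamlit's default address.

```python
from urllib.parse import urlencode, urlsplit

BASE_URL = "http://localhost:8501/"  # default local address used in this README


def split_repo_url(repo_url: str) -> tuple[str, str]:
    """Extract (owner, repo) from a URL like https://github.com/owner/repo."""
    parts = urlsplit(repo_url)
    segments = [s for s in parts.path.split("/") if s]
    if parts.hostname != "github.com" or len(segments) < 2:
        raise ValueError(f"not a GitHub repository URL: {repo_url!r}")
    owner, repo = segments[0], segments[1]
    return owner, repo.removesuffix(".git")  # tolerate clone-style URLs


def analyse_link(repo_url: str) -> str:
    """Build the 'Analyzing' page link; the repo URL is percent-encoded."""
    return BASE_URL + "?" + urlencode({"page": "analyse", "url": repo_url})


def dashboard_link(repo_url: str) -> str:
    """Build the 'Repository Dashboard' link from owner and repo name."""
    owner, repo = split_repo_url(repo_url)
    return BASE_URL + "?" + urlencode({"owner": owner, "repo": repo})


print(analyse_link("https://github.com/owner/repo"))
print(dashboard_link("https://github.com/owner/repo"))
```

Percent-encoding the `url` parameter keeps the slashes and colon in the repository URL from being misread as part of the app's own query string.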
All analysis results are stored in a PostgreSQL database. The connection URL is configured via the DATABASE_URL environment variable in .env. Previously analyzed repositories appear on the home page with options to:
- View Dashboard: See existing analysis
- Re-analyze: Fetch fresh data and update metrics
- Python 3.9+
- PostgreSQL database (configured via .env file)
- GitHub Personal Access Token (can be set in .env or via web UI)
- OpenAI API Key (can be set in .env or via web UI)
- Internet connection for API calls
- GitHub API: Free (5,000 requests/hour for authenticated users)
- OpenAI API: ~$0.01 per 100 items analyzed (using gpt-5-nano model)
- Storage: Minimal (PostgreSQL database)
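As a rough worked example of the OpenAI figure above, the sketch below turns item counts into an estimated cost. It assumes each commit, pull request, and issue counts as one analyzed item, which the README does not state explicitly. `estimate_openai_cost` is a hypothetical helper and the rate is the approximate one quoted in the list.

```python
COST_PER_100_ITEMS_USD = 0.01  # approximate rate quoted in this README


def estimate_openai_cost(commits: int, pull_requests: int, issues: int) -> float:
    """Estimate analysis cost in USD, assuming one analyzed item per record."""
    items = commits + pull_requests + issues
    return round(items / 100 * COST_PER_100_ITEMS_USD, 4)


# Hypothetical repository: 1,200 commits, 250 pull requests, 550 issues.
print(estimate_openai_cost(1200, 250, 550))
```

For that hypothetical repository of 2,000 items, the estimate comes to about $0.20.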
If you hit GitHub's rate limit, wait an hour or use a different token.
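Rather than waiting a full hour, you can work out how long remains until the limit resets. GitHub's REST API reports this in the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers, the latter as UTC epoch seconds. The sketch below shows one way to turn those headers into a wait time. `seconds_until_reset` is a hypothetical helper, not part of this app, and it expects lowercase header names in a plain mapping.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to wait before retrying; 0.0 if requests remain."""
    now = time.time() if now is None else now
    # Treat a missing header as "quota available" rather than guessing.
    remaining = int(headers.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("x-ratelimit-reset", now))
    return max(0.0, reset_at - now)


# Hypothetical header values for illustration.
exhausted = {"x-ratelimit-remaining": "0", "x-ratelimit-reset": "1700000600"}
print(seconds_until_reset(exhausted, now=1700000000))
```

Passing `now` explicitly keeps the function easy to test; omit it to use the current time.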
Some repositories may not have Python files, which affects code quality analysis. Other metrics will still be available.
API keys are required. Set them in the .env file or enter them through the web interface. Keys entered through the web interface persist for the duration of your browser session.
- Streamlit: Web application framework
- PyGithub: GitHub API wrapper
- OpenAI: AI-powered quality analysis
- PostgreSQL: Production-grade database
- SQLAlchemy: Database ORM
- Plotly: Interactive visualizations
- Radon: Python code quality analysis
This project is for educational purposes as part of a data engineering course.