IIT KGP Internship Agent

An AI-powered multi-agent pipeline that automates personalized research internship outreach for IIT Kharagpur students.

Features

- **CV Parsing**: Extracts skills, projects, and interests from PDF/DOCX CVs using Gemini
- **Faculty Scraping**: Scrapes 188+ faculty profiles, including real email addresses, from the IIT KGP website
- **Intelligent Matching**: Auto-selects relevant departments (CS, EC, EE) based on your CV
- **Research Enrichment**: Uses the Serper API to find each professor's publications and research areas
- **Personalized Emails**: Generates tailored outreach emails with match scores (0.0-1.0)
- **Cover Letters**: Creates a professional cover letter for each professor

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        LangGraph Workflow                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────────┐   │
│  │   Parse CV   │───►│ Recommend    │───►│   Scrape Faculty     │   │
│  │   (Gemini)   │    │ Departments  │    │   (Playwright)       │   │
│  └──────────────┘    └──────────────┘    └──────────┬───────────┘   │
│                                                      │               │
│                                                      ▼               │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                    Enrich Profiles                            │   │
│  │                    (Serper API)                               │   │
│  └──────────────────────────────────────┬───────────────────────┘   │
│                                          │                           │
│                                          ▼                           │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │               Generate Personalized Emails                    │   │
│  │                       (Gemini)                                │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                       │
└─────────────────────────────────────────────────────────────────────┘
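
The five stages in the diagram can be read as functions that each take the shared state and return a partial update. The sketch below is plain Python rather than LangGraph, and every function name and state key in it is a hypothetical stand-in (the real nodes and schema live in `src/graph/workflow.py` and `src/graph/state.py`). It only illustrates the order in which the stages hand data to each other.

```python
from typing import Callable, TypedDict


class PipelineState(TypedDict, total=False):
    # Hypothetical state keys; the real schema is defined in src/graph/state.py.
    cv_path: str
    cv: dict
    departments: list[str]
    faculty: list[dict]
    enriched: list[dict]
    emails: list[dict]


Node = Callable[[PipelineState], PipelineState]


def parse_cv(state: PipelineState) -> PipelineState:
    # Placeholder for the Gemini-backed CV parser.
    return {"cv": {"skills": ["machine learning"], "source": state["cv_path"]}}


def recommend_departments(state: PipelineState) -> PipelineState:
    # Placeholder: the real step maps CV skills to departments.
    return {"departments": ["CS"]}


def scrape_faculty(state: PipelineState) -> PipelineState:
    # Placeholder for the Playwright scraper.
    return {
        "faculty": [
            {"name": "Example Professor", "department": code}
            for code in state["departments"]
        ]
    }


def enrich_profiles(state: PipelineState) -> PipelineState:
    # Placeholder for the Serper-based enrichment.
    return {"enriched": [{**prof, "publications": []} for prof in state["faculty"]]}


def generate_emails(state: PipelineState) -> PipelineState:
    # Placeholder for the Gemini email generator.
    return {"emails": [{"professor_name": prof["name"]} for prof in state["enriched"]]}


# Same order as the diagram: each node runs once, one after the other.
NODES: list[Node] = [
    parse_cv,
    recommend_departments,
    scrape_faculty,
    enrich_profiles,
    generate_emails,
]


def run_pipeline(cv_path: str) -> PipelineState:
    state: PipelineState = {"cv_path": cv_path}
    for node in NODES:
        state = {**state, **node(state)}  # merge each node's partial update
    return state
```

Returning partial updates instead of mutating the state keeps each stage easy to test on its own, which is also how a graph-based orchestrator typically merges node results.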

Quick Start

1. Clone & Install

git clone <repo-url>
cd internship-using-ai-agents
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install chromium

2. Configure API Keys

cp .env.example .env
# Edit .env with your keys:
# - GOOGLE_API_KEY (required for Gemini - get from aistudio.google.com)
# - SERPER_API_KEY (free tier: 2500/month at serper.dev)

3. Run Pipeline

# Basic run with 10 professors
python -m src.main --cv your_cv.pdf --limit 10

# Target specific department
python -m src.main --cv your_cv.pdf --departments "Computer Science" --limit 20

# Set minimum match score
python -m src.main --cv your_cv.pdf --limit 15 --min-score 0.3
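
For readers who want to see how these flags fit together, here is a minimal `argparse` sketch. The flag names come from the commands above; the defaults, the help text, and the choice to accept several department names are assumptions, not the behavior of `src/main.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="src.main", description="Internship outreach pipeline"
    )
    parser.add_argument("--cv", required=True, help="Path to a PDF or DOCX CV")
    # Assumed: one or more department names; omitted means auto-select from the CV.
    parser.add_argument("--departments", nargs="+", default=None)
    # Assumed defaults; the real defaults are not documented here.
    parser.add_argument("--limit", type=int, default=10, help="Maximum number of professors")
    parser.add_argument(
        "--min-score", type=float, default=0.0, help="Drop matches below this score (0.0-1.0)"
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--cv", "your_cv.pdf", "--limit", "15", "--min-score", "0.3"]
)
print(args.cv, args.limit, args.min_score, args.departments)
```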

Project Structure

devsoc/
├── src/
│   ├── agents/                 # AI Agents
│   │   ├── cv_parser.py        # Agent 1: CV Parsing
│   │   ├── faculty_scraper.py  # Agent 2: Faculty Scraping
│   │   ├── research_enrichment.py  # Agent 3: Research Enrichment
│   │   └── personalization.py  # Agent 4: Email Generation
│   │
│   ├── tools/                  # Agent Tools
│   │   ├── document_parser.py  # PDF/DOCX extraction
│   │   ├── web_scraper.py      # Playwright scraper
│   │   └── search_tool.py      # Serper API integration
│   │
│   ├── graph/                  # LangGraph Workflow
│   │   ├── workflow.py         # Node definitions & edges
│   │   └── state.py            # TypedDict state schema
│   │
│   ├── schemas/                # Pydantic Models
│   │   ├── cv.py               # StudentCV, Project, etc.
│   │   ├── faculty.py          # FacultyProfile
│   │   ├── research.py         # EnrichedProfile
│   │   └── output.py           # EmailOutput
│   │
│   ├── utils/                  # Utilities
│   │   ├── config.py           # Pydantic Settings
│   │   ├── logger.py           # Structured logging
│   │   └── department_recommender.py  # CV → Department mapping
│   │
│   └── main.py                 # CLI entry point
│
├── data/
│   ├── raw/                    # Scraped faculty profiles
│   ├── enriched/               # Enriched with research data
│   └── outputs/                # Generated emails (JSON)
│
├── requirements.txt
├── .env.example
└── ARCHITECTURE.md             # Detailed architecture docs

Configuration

| Environment Variable | Description | Required |
|---|---|---|
| `GOOGLE_API_KEY` | Google API key for Gemini | Yes |
| `SERPER_API_KEY` | Serper.dev API key (free tier) | Yes |
| `SCRAPER_DELAY` | Delay between requests (default: 2.0s) | No |
| `LLM_MODEL` | Gemini model (default: gemini-1.5-flash) | No |
| `LOG_LEVEL` | Logging level (default: INFO) | No |
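
The project loads these values through Pydantic Settings in `src/utils/config.py`. As a rough illustration of the same required/optional split, the sketch below uses only the standard library; the `Settings` class and `load_settings` function are hypothetical names, while the variable names and defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    serper_api_key: str
    scraper_delay: float = 2.0
    llm_model: str = "gemini-1.5-flash"
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from env, failing fast when a required key is missing."""
    required = ("GOOGLE_API_KEY", "SERPER_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        serper_api_key=env["SERPER_API_KEY"],
        scraper_delay=float(env.get("SCRAPER_DELAY", "2.0")),
        llm_model=env.get("LLM_MODEL", "gemini-1.5-flash"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing the environment in as a mapping makes the loader testable without touching the real process environment.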

Sample Output

{
  "professor_name": "Abhijnan Chakraborty",
  "professor_email": "abhijnan@cse.iitkgp.ac.in",
  "department": "CS",
  "match_reasons": ["AI research", "DeepFake detection"],
  "overall_match_score": 0.65,
  "email_subject": "Research Internship Inquiry - AI-Powered Platforms",
  "email_body": "Dear Professor Chakraborty,\n\n..."
}
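
Records shaped like the one above are easy to post-process. The following sketch filters and ranks them by `overall_match_score`; the helper name `top_matches` and the second sample record are made up for illustration, and how the pipeline lays out files under `data/outputs/` is not assumed here.

```python
import json


def top_matches(records: list[dict], min_score: float = 0.3) -> list[dict]:
    """Keep records at or above min_score, best match first."""
    kept = [r for r in records if r["overall_match_score"] >= min_score]
    return sorted(kept, key=lambda r: r["overall_match_score"], reverse=True)


# Illustrative data only; the second record is invented for the example.
raw = """[
  {"professor_name": "Abhijnan Chakraborty", "department": "CS", "overall_match_score": 0.65},
  {"professor_name": "Example Professor", "department": "EE", "overall_match_score": 0.20}
]"""

for record in top_matches(json.loads(raw)):
    print(f'{record["professor_name"]} ({record["department"]}): {record["overall_match_score"]:.2f}')
```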

How It Works

1. CV Parsing

- Extracts text from PDF/DOCX using PyMuPDF/python-docx
- Gemini structures the data into skills, projects, internships, and interests
- Identifies relevant departments based on skills (ML → CS, VLSI → EC)
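
The skill-to-department step can be pictured as a keyword lookup. This is a minimal sketch, not the logic in `src/utils/department_recommender.py`: only the ML → CS and VLSI → EC pairs come from this README, and the table and function names are hypothetical.

```python
# Hypothetical keyword table; only ML -> CS and VLSI -> EC come from the README.
KEYWORD_TO_DEPARTMENT = {
    "machine learning": "CS",
    "ml": "CS",
    "vlsi": "EC",
}


def recommend_departments(skills: list[str]) -> list[str]:
    """Return department codes for the given skills, in first-match order."""
    found: list[str] = []
    for skill in skills:
        dept = KEYWORD_TO_DEPARTMENT.get(skill.strip().lower())
        if dept and dept not in found:  # skip unknown skills and duplicates
            found.append(dept)
    return found
```

For example, `recommend_departments(["ML", "VLSI"])` yields both `CS` and `EC`, while a CV with no recognized keywords yields an empty list, which a caller could treat as "fall back to a default set".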

2. Faculty Scraping

- Navigates to department pages (`/department/CS`, `/department/EC`)
- Selects "Faculty" from the dropdown and waits for the dynamic content to load
- Extracts name, email, designation, research areas, and profile URL
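
Because the scraper pauses between requests (`SCRAPER_DELAY`, 2.0 seconds by default), the department loop can be sketched as below. The host in `BASE_URL` is an assumption (only the `/department/<code>` path appears in this README), and `visit` is a hypothetical callback standing in for the Playwright navigation and extraction.

```python
import time
from typing import Callable, Iterable

# Assumed host; only the /department/<code> path is from the README.
BASE_URL = "https://www.iitkgp.ac.in"


def department_url(code: str) -> str:
    return f"{BASE_URL}/department/{code}"


def visit_departments(
    codes: Iterable[str],
    visit: Callable[[str], list[dict]],
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Call visit(url) for each department, pausing `delay` seconds between requests."""
    profiles: list[dict] = []
    for index, code in enumerate(codes):
        if index > 0:
            sleep(delay)  # pause between requests, not before the first one
        profiles.extend(visit(department_url(code)))
    return profiles
```

Injecting `sleep` keeps the pacing logic testable without actually waiting.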

3. Research Enrichment

- Searches Google via the Serper API for each professor
- Runs three queries: general info, publications, and Google Scholar
- Enhances profiles with recent research and publications
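
To make the three-queries-per-professor step concrete, the sketch below builds the query strings and the corresponding Serper requests without sending them. The query wording and function names are hypothetical; the endpoint and `X-API-KEY` header follow Serper's public search API as commonly documented, so check them against your account's API reference before relying on them.

```python
import json
import urllib.request

SERPER_URL = "https://google.serper.dev/search"


def build_queries(name: str) -> list[str]:
    # Hypothetical wording for the three query types: general, publications, Scholar.
    return [
        f'"{name}" IIT Kharagpur',
        f'"{name}" IIT Kharagpur publications',
        f'"{name}" Google Scholar',
    ]


def build_request(query: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a Serper search request."""
    return urllib.request.Request(
        SERPER_URL,
        data=json.dumps({"q": query}).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


requests_to_send = [
    build_request(query, api_key="YOUR_SERPER_API_KEY")
    for query in build_queries("Abhijnan Chakraborty")
]
# Sending is left out on purpose: each urllib.request.urlopen(request) call
# would spend one search from the quota.
```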

4. Email Generation

- Calculates a match score from the overlap between the CV and the professor's profile
- Gemini generates a personalized email referencing:
  - The professor's specific research areas
  - The student's relevant projects
  - Common interests and technologies
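
The README does not spell out how the match score is computed. One simple way to turn CV ↔ professor overlap into a number between 0.0 and 1.0 is Jaccard similarity over normalized terms, sketched here as an assumption rather than the formula used in `src/agents/personalization.py`; `match_score` is a hypothetical name.

```python
def match_score(
    student_terms: list[str], professor_terms: list[str]
) -> tuple[float, list[str]]:
    """Jaccard overlap between two term lists, plus the shared terms."""
    student = {term.strip().lower() for term in student_terms}
    professor = {term.strip().lower() for term in professor_terms}
    if not student or not professor:
        return 0.0, []
    shared = sorted(student & professor)
    # Shared terms divided by all distinct terms across both lists.
    return len(shared) / len(student | professor), shared


score, reasons = match_score(
    ["Deep Learning", "Computer Vision", "Python"],
    ["deep learning", "computer vision", "fairness"],
)
print(round(score, 2), reasons)
```

The shared terms returned alongside the score could feed a field like `match_reasons` in the sample output, and the score itself is what a `--min-score` threshold would be compared against.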

Performance

| Metric | Value |
|---|---|
| Departments scraped | 5 (CS, EC, EE, MA, CH) |
| Faculty profiles | 188 total |
| Serper searches per professor | 3 queries |
| Email generation | ~15s per email |
| Average match score | 0.55 - 0.70 |
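
These figures make it easy to estimate the cost of a run over every scraped profile. The arithmetic below assumes emails are generated one after another and uses the 2500-search Serper free-tier figure quoted in Quick Start; the variable names are illustrative.

```python
FACULTY_PROFILES = 188
QUERIES_PER_PROFESSOR = 3
SECONDS_PER_EMAIL = 15      # approximate per-email generation time
FREE_TIER_SEARCHES = 2500   # Serper free-tier figure quoted in Quick Start

searches_per_full_run = FACULTY_PROFILES * QUERIES_PER_PROFESSOR
full_runs_in_free_tier = FREE_TIER_SEARCHES // searches_per_full_run
email_minutes = FACULTY_PROFILES * SECONDS_PER_EMAIL / 60

print(searches_per_full_run)   # 564
print(full_runs_in_free_tier)  # 4
print(email_minutes)           # 47.0
```

In practice `--limit` keeps a run far smaller than this: a 10-professor run needs about 30 searches and roughly 2.5 minutes of email generation.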

Development

# Run tests
pytest tests/

# Format code
black src/

# Type checking
mypy src/

🤝 Contributing

We welcome contributions! We have prepared a detailed roadmap of standalone features that are ready to be built. Whether you are a beginner or an expert, there is something for you.

👉 Check out the Community Roadmap & Contribution Guide (`COMMUNITY_ROADMAP.md`)

Quick Start for Contributors

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

License

MIT License - Built for DevSoC IIT Kharagpur

Credits

- LangGraph - Agent orchestration framework
- Playwright - Web scraping
- Serper.dev - Google Search API
- Google Gemini - LLM for NLU/NLG
