An intelligent attendance analytics chatbot for NSUT that predicts leave eligibility and provides insights into your attendance patterns using web scraping and conversational AI.
The logged-in workspace now combines chat with an interactive dashboard:
- subject, semester, date-range, status, and search filters
- overall metrics, subject comparison bars, cumulative attendance trend, subject table, and date-wise records
- authenticated profile header with portal name, roll number, and captured student photo when the portal exposes it
- chat commands for summary, subject-wise details, absences, risk/safe subjects, profile, website surfaces, and shortcut help
- Performance: Playwright is 2-3x faster than Selenium for modern web apps
- Better Frame Handling: Seamlessly navigates complex frame structures (like the NSUT portal's banner/data frames)
- Built-in Captcha Support: Easy screenshot capture for headless automation
- Multi-language: Works with Python, Node.js, Java, .NET (we use Python)
- Sync & Async: We use sync API for simplicity; async available for scaling
The NSUT portal enforces CAPTCHA to prevent automated abuse. Our flow:
- User submits roll number + password
- Backend loads the NSUT login form (framed)
- Playwright captures the CAPTCHA image & sends to frontend
- User solves CAPTCHA in the UI
- Backend submits CAPTCHA + credentials → scrapes attendance data
- Results cached for 5 minutes to avoid repeated logins
    ┌─────────────┐                    ┌──────────────────┐
    │  Frontend   │◄──── JSON API ────►│  Flask Backend   │
    │  (HTML/JS)  │                    │  (app.py)        │
    └─────────────┘                    └──────────────────┘
                                                │
                                       ┌────────▼─────────┐
                                       │  Scraper         │
                                       │  (playwright,    │
                                       │  beautifulsoup)  │
                                       └──────────────────┘
                                                │
                                                ▼
                                       ┌──────────────────┐
                                       │  NSUT Portal     │
                                       │(framed structure)│
                                       └──────────────────┘
    Kairon/
    ├── README.md                  # This file
    ├── requirements.txt           # Project dependencies
    ├── .env                       # Credentials (DO NOT COMMIT)
    │
    ├── backend/
    │   ├── app.py                 # Flask API routes
    │   ├── scraper.py             # Web scraper (Playwright/BeautifulSoup)
    │   ├── chatbot.py             # Chatbot Q&A engine
    │   ├── playwright_manager.py  # Playwright lifecycle manager
    │   ├── logging_config.py      # Structured logging setup
    │   ├── requirements.txt       # Backend-specific dependencies
    │   └── data/                  # Cached attendance JSON files
    │
    ├── frontend/
    │   ├── index.html             # Main UI
    │   ├── style.css              # Styling
    │   └── js/
    │       └── app.js             # Frontend logic
    │
    ├── css/
    │   └── main.css               # Shared CSS
    │
    └── .venv/                     # Python virtual environment (gitignored)
- Python 3.12+
- macOS / Linux / Windows (with WSL2)
cd /Volumes/algsoch/sachin/Kairon
# Create a Python 3.12 virtual environment
python3.12 -m venv .venv
# Activate it
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate
# Ensure pip is installed in the venv
.venv/bin/python -m ensurepip --upgrade
.venv/bin/python -m pip install --upgrade pip setuptools wheel
# Install all project requirements
.venv/bin/python -m pip install -r requirements.txt
# Download Playwright browsers (required for scraping)
.venv/bin/python -m playwright install chromium
Create a .env file in the project root:
cat > .env << 'EOF'
roll_no=YOUR_ROLL_NUMBER
password=YOUR_PASSWORD
EOF
Never commit .env to Git. It's already in .gitignore.
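How the backend actually reads these values is not shown here (it may well use a library such as python-dotenv). As a rough, stdlib-only sketch of what loading the two keys above could look like, assuming a hypothetical helper named `load_env_file`:

```python
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Hypothetical helper for illustration; not part of the project code.
    """
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so passwords may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values
```

Calling `load_env_file(".env")` from the project root would return a dict with the `roll_no` and `password` keys defined above.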
cd backend
../.venv/bin/python app.py
You should see:
* Serving Flask app 'app'
* Debug mode: on
* Running on http://127.0.0.1:5000
Navigate to http://127.0.0.1:5000 and log in with your NSUT credentials.
All endpoints return JSON. Every endpoint except login and the cache check requires a session_id.
Start login flow. Frontend sends roll number + password; backend captures CAPTCHA.
Request:
{
"rollno": "2024ABC0000",
"password": "your_password"
}
semester is intentionally not required. The scraper reads the authenticated attendance form and tries likely year/semester filters internally.
Response (Success):
{
"success": true,
"session_id": "uuid-string",
"captcha_base64": "data:image/png;base64,..."
}
Next Step: User solves CAPTCHA & calls /api/captcha.
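The `captcha_base64` field in the response above is a PNG data URI. As an illustrative sketch (the helper names are hypothetical, and the real backend may build the string differently), screenshot bytes such as those returned by a Playwright screenshot call could be wrapped and unwrapped like this:

```python
import base64


def png_to_data_uri(png_bytes):
    """Wrap raw PNG bytes in a data URI suitable for an <img src=...>."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def data_uri_to_bytes(data_uri):
    """Recover the raw PNG bytes from a data URI (useful when debugging)."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:image/png;base64"):
        raise ValueError("unexpected data URI header")
    return base64.b64decode(payload)
```

The frontend can assign the returned string directly to an image element's `src`, so no separate image endpoint is needed.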
Submit CAPTCHA solution & scrape attendance.
Request:
{
"session_id": "uuid-string",
"captcha": "ABC123"
}
Response (Success):
{
"success": true,
"message": "Login successful! I've fetched your attendance data..."
}
Chat with the attendance assistant. Try:
- "HI" → Full attendance dashboard
- "SW" → Subject-wise list, then enter a number for details
- "TOTAL" → Overall attendance and total absent classes
- "ABSENT" → Subject-wise absences, plus exact dates when v2 day-wise data exists
- "SAFE" → Subjects where the student can skip classes while staying above 75%
- "RISK" → Borderline or below-threshold subjects
- "PLAN" → Priority action plan with next-missed-class impact
- "SEMESTERS" → Synced semester/year filters and semester-wise summary
- "PROFILE" → Authenticated student profile summary in the local app
- "CALENDAR" → Portal marks such as GH/TL/CS/MB
- "WEBSITE" → Authenticated website sections discovered after login
- "MEMEC303" → Details for one subject by code/name
Request:
{
"session_id": "uuid-string",
"message": "PROFILE"
}
Response:
{
"assistant_version": "data-analysis-assistant-v2",
"reply": "**Student profile from attendance portal data**\n\n- Name: **Example Student**\n- Roll no: **20...000**\n- Degree: **B.Tech.**\n- Department: **MECHANICAL ENGINEERING**\n- Semester: **3**\n- Academic year: **2025-26**\n- Portal photo: **available**"
}
Public docs use redacted sample identifiers. Do not paste a real roll number, encrypted portal URL, student ID, or portal screenshot into README/PR text.
Load previous attendance data from local cache. Skip CAPTCHA if cached.
Request:
{
"rollno": "2024ABC0000"
}
Response (if cache exists):
{
"success": true,
"session_id": "new-uuid",
"message": "Loaded from cache",
"assistant_version": "data-analysis-assistant-v2",
"cache_schema_version": 2,
"cache_needs_refresh": false
}
Get raw attendance analysis (JSON).
Request:
{
"session_id": "uuid-string"
}
Response:
{
"success": true,
"analysis": {
"schema_version": 2,
"student": {
"name": "Example Student",
"rollno": "2024ABC0000",
"department": "MECHANICAL ENGINEERING",
"degree": "B.Tech.",
"photo_available": true
},
"attendance": [
{
"subject": "Strength of Materials",
"code": "MEMEC303",
"attended": 37,
"total": 49,
"absent": 12,
"percentage": 75.51,
"status_75": "borderline",
"status_65": "safe",
"absent_dates": ["2025-08-01", "2025-08-04"]
}
],
"insights": {
"overall_percentage": 82.51,
"total_attended": 217,
"total_classes": 263,
"total_absent": 46
}
}
}
The server logs all endpoint access with request/response details:
[INFO] POST /api/login | Status: 200 | Duration: 8.45s
[INFO] POST /api/captcha | Status: 200 | Duration: 15.32s
[INFO] POST /api/chat | Status: 200 | Duration: 0.12s
[ERROR] POST /api/login | Status: 401 | Reason: Invalid credentials
Edit backend/app.py, in the login() function, change:
scraper = AttendanceScraper(use_mock=False)
to:
scraper = AttendanceScraper(use_mock=True)
Then restart the server. Mock logins return instant results without contacting NSUT.
cd backend
../.venv/bin/python -m pytest test_scraper.py -v
- New scraper logic? → Add to backend/scraper.py → AttendanceScraper class
- New chatbot features? → Add to backend/chatbot.py → ChatbotEngine class
- New API route? → Add to backend/app.py → Register with @app.route()
- Frontend logic? → Edit frontend/js/app.js
Set in backend/app.py:
import logging
logging.basicConfig(level=logging.DEBUG)
- Activate venv: source .venv/bin/activate
- Reinstall deps: .venv/bin/python -m pip install -r requirements.txt
- Run: .venv/bin/python -m playwright install chromium
- Verify: .venv/bin/python -c "from playwright.sync_api import sync_playwright; sync_playwright().start()"
- NSUT portal may be down or changed structure
- Check: Visit https://www.imsnsit.org/imsnsit/ manually
- Debug screenshot saved as debug_menu_final.png in backend/ (on error)
- Sessions last 5 minutes
- Re-login from scratch: POST to /api/login again
# Kill process using port 5000
lsof -i :5000 | grep LISTEN | awk '{print $2}' | xargs kill -9
# Or use different port in app.py:
# app.run(port=5001)
See diagram below (generated with Mermaid)
Each user gets a unique session_id UUID. The server maintains active sessions for 5 minutes before cleanup.
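A minimal sketch of such a store, assuming an in-memory dict and a hypothetical `SessionStore` class (the actual session handling in app.py is not shown in this README and may differ):

```python
import time
import uuid

SESSION_TTL_SECONDS = 5 * 60  # sessions last 5 minutes


class SessionStore:
    """In-memory session map with expiry. Illustrative only."""

    def __init__(self, ttl=SESSION_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._sessions = {}

    def create(self, data):
        """Store data under a fresh UUID and return that session_id."""
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = (self._clock() + self._ttl, data)
        return session_id

    def get(self, session_id):
        """Return the session data, or None if unknown or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            del self._sessions[session_id]
            return None
        return data

    def cleanup(self):
        """Drop every expired session and return how many were removed."""
        now = self._clock()
        expired = [sid for sid, (exp, _) in self._sessions.items() if now >= exp]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)
```

A plain dict like this only works with a single worker process, which matches the `--workers 1` Gunicorn setting used for deployment.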
Attendance data is cached per user (rollno) in backend/data/<rollno>.json. Check cache before re-scraping.
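The cache lookup could be sketched as follows. The function names are hypothetical and the real scraper may store extra metadata; the only detail taken from this README is the `backend/data/<rollno>.json` layout.

```python
import json
from pathlib import Path


def cache_path(data_dir, rollno):
    """Build <data_dir>/<rollno>.json, keeping only letters and digits
    so a crafted roll number cannot escape the data directory."""
    safe = "".join(ch for ch in rollno if ch.isalnum())
    if not safe:
        raise ValueError("rollno must contain letters or digits")
    return Path(data_dir) / f"{safe}.json"


def save_cache(data_dir, rollno, analysis):
    """Write the analysis dict as JSON and return the file path."""
    path = cache_path(data_dir, rollno)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(analysis, indent=2), encoding="utf-8")
    return path


def load_cache(data_dir, rollno):
    """Return the cached analysis dict, or None when no cache exists."""
    path = cache_path(data_dir, rollno)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

A caller would try `load_cache("backend/data", rollno)` first and start the CAPTCHA login flow only when it returns `None`.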
After initial scrape, the bot clicks on subject links to fetch per-day attendance records (Present/Absent for each date).
- 75%: Default threshold (stricter), the minimum for eligibility
- 65%: Extended threshold (more lenient), a backup option
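The chatbot's exact formulas are not reproduced in this README, so the following is one plausible way to compute "safe to skip" and "needed classes" for either threshold. It assumes every future class counts toward the total, and both function names are hypothetical. Skipping k classes keeps attendance at attended / (total + k), while attending n more classes raises it to (attended + n) / (total + n).

```python
def classes_safe_to_skip(attended, total, threshold_pct):
    """Largest k with attended / (total + k) >= threshold_pct / 100.

    Rearranged: total + k <= 100 * attended / threshold_pct.
    threshold_pct is an integer such as 75 or 65.
    """
    max_total = (attended * 100) // threshold_pct
    return max(0, max_total - total)


def classes_needed(attended, total, threshold_pct):
    """Smallest n with (attended + n) / (total + n) >= threshold_pct / 100.

    Rearranged: n * (100 - threshold_pct) >= threshold_pct * total - 100 * attended.
    """
    deficit = threshold_pct * total - 100 * attended
    if deficit <= 0:
        return 0  # already at or above the threshold
    return -(-deficit // (100 - threshold_pct))  # ceiling division
```

For the sample subject in the API section (37 of 49 classes attended), `classes_safe_to_skip(37, 49, 75)` gives 0 and `classes_safe_to_skip(37, 49, 65)` gives 7, which is consistent with it being borderline at 75% but safe at 65%. Integer arithmetic is used to avoid floating-point rounding at the boundary.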
Because the app uses Playwright to open the NSUT portal in a hidden Chromium browser, Render needs a build step that installs Chromium and a start command that binds Gunicorn to Render's assigned port.
Since the Flask backend automatically serves the frontend files, you only need to deploy a single Web Service!
The easiest path is the provided render.yaml Blueprint. Use the manual settings below when you want to configure the Web Service yourself from the Render dashboard.
- Go to your Render Dashboard.
- Click New -> Blueprint.
- Connect your GitHub repository.
- Select the repository containing render.yaml.
- Click Apply.
If you do not use Blueprint, create New -> Web Service and use these values:
| Render field | Value |
|---|---|
| Runtime | Python 3 |
| Root Directory | Leave empty, or set to repository root |
| Build Command | pip install --upgrade pip && pip install -r backend/requirements.txt && PLAYWRIGHT_BROWSERS_PATH=/opt/render/project/playwright python -m playwright install --with-deps chromium |
| Start Command | cd backend && gunicorn app:app --bind 0.0.0.0:$PORT --workers 1 --threads 1 --timeout 180 |
| Health Check Path | /api/config |
Add these environment variables in Environment:
| Key | Value |
|---|---|
| PLAYWRIGHT_BROWSERS_PATH | /opt/render/project/playwright |
| PYTHON_VERSION | 3.12.4 |
| HOST | 0.0.0.0 |
| roll_no | Your test roll number, only if you want the login form prefilled |
| password | Your portal password, only as a Render secret |
| ATTENDANCE_YEAR | Optional preferred academic year, for example 2025-26 |
| ATTENDANCE_SEMESTER | Optional preferred semester, for example 4 |
| CAPTCHA_SOLVER | Optional: runanywhere or tesseract |
| RUNANYWHERE_CAPTCHA_URL | Required only when CAPTCHA_SOLVER=runanywhere |
| RUNANYWHERE_API_KEY | Required only when your Runanywhere endpoint needs an API key |
Render supplies PORT automatically; do not hard-code it. Keep backend/data/, backend/scrape/, .env, screenshots, and debug HTML out of git because they are local runtime artifacts and may contain portal data.
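On Render the Start Command already passes `$PORT` to Gunicorn. If you also want `python app.py` to honour the same variables locally, one possible approach is sketched below; `resolve_bind_address` is a hypothetical helper, and the defaults mirror the local address shown earlier (127.0.0.1:5000).

```python
import os


def resolve_bind_address(environ=os.environ):
    """Read HOST and PORT from the environment, with local-dev defaults."""
    host = environ.get("HOST", "127.0.0.1")
    port = int(environ.get("PORT", "5000"))
    return host, port
```

In app.py this could be used as `host, port = resolve_bind_address()` followed by `app.run(host=host, port=port)`, so the port is never hard-coded.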
If the live app says Playwright browser failed to start, the deployed service was built without Chromium or with a different PLAYWRIGHT_BROWSERS_PATH than runtime. Update the Build Command and environment variable above, then trigger Manual Deploy -> Clear build cache & deploy on Render.
Note: The first deployment can take 2-4 minutes because Chromium is downloaded during the build.
MIT License. See LICENSE file (if present).
For issues, check logs or raise an issue in the repository.
Happy learning!
- Raw HTML Parsing: Bypasses the portal's complex frameset architecture and CSS display: none restrictions to reliably extract links.
- Captcha Streaming: Captures the NSUT captcha image and streams it to the modern UI for human-in-the-loop solving.
- Local Caching: Saves your deep-scraped day-wise data to backend/data/ locally so you only have to log in once!
- Intelligent Predictions: Calculates "Safe to Skip" and "Needed Classes" based on dynamic 75% and 65% thresholds.