Mission: Build "GraphMind"—an advanced conversational AI that uses Graph of Thoughts (GoT) and Mixture of Experts (MoE) to provide verified answers using ONLY data scraped from MetaKGP.
Duration: 5 Days (Monday - Friday)
| Event | Date | Time | Details |
|---|---|---|---|
| Kickoff | Mon, Jan 12 | 5:00 PM | Problem Release & Team Formation |
| Code Freeze | Fri, Jan 16 | 12:00 PM | Submission Deadline (Strict) |
Large Language Models (LLMs) often hallucinate when dealing with niche, institutional knowledge. They generate plausible-sounding but factually incorrect information because they lack real-time access to specific local data.
Your task is to build a chatbot that answers questions strictly using data you scrape from MetaKGP / MetaWiki.
Why this is hard:
- No Pre-made Dataset: You must build the pipeline to scrape, clean, and index the data yourself.
- Stale Data Risks: LLMs have outdated internal knowledge about IIT Kharagpur; you must force them to use your scraped data.
- Hallucination: If the scraper misses a page, the LLM might make something up. Your verification system must prevent this.
- Team Size: Strictly 4 Members per team.
Your solution must integrate these three techniques:
- Scraper: You must write a script to crawl `wiki.metakgp.org` (and related MetaWiki pages).
- Ingestion: Clean the HTML/Wikitext and chunk it for retrieval (RAG).
- Constraint: NO external datasets allowed. If the answer isn't on MetaKGP, the bot should say "I don't know."
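The ingestion step above can be sketched with the standard library alone (a real pipeline would likely use BeautifulSoup or an HTML-aware splitter; the chunk sizes and function names here are illustrative assumptions, not a prescribed design):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text from a page, skipping script/style content."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def clean_html(html: str) -> str:
    """Strip markup and normalize whitespace in a fetched wiki page."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(" ".join(parser.parts).split())

def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split cleaned text into overlapping word windows for retrieval.
    Overlap keeps facts that straddle a boundary retrievable."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[start:start + size])
            for start in range(0, max(len(words) - overlap, 1), step)]
```

Fetching itself (e.g. `requests.get` per wiki URL, with rate limiting) plugs in before `clean_html`; keep it in your submitted code since using a pre-downloaded dump is disallowed.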
Model reasoning as a directed graph.
- Nodes: Facts extracted from your scraped documents.
- Edges: Logical connections between different wiki pages.
- Goal: Connect disparate pieces of info (e.g., connect a Society page to a Student page).
Implement specific verifiers that check against your scraped data:
- Source Matcher: "Does the text in the retrieved chunk actually support this claim?"
- Hallucination Hunter: "Is the bot inventing details not present in the scraped context?"
- Logic Expert: "Does the conclusion follow from the premises?"
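As a baseline for the Source Matcher expert, lexical overlap between a claim and its retrieved chunk already catches gross hallucinations (a stronger system would use embeddings or an NLI model; the threshold and stopword list below are arbitrary assumptions):

```python
import re

STOPWORDS = {"the", "a", "an", "of", "is", "are", "in", "and", "to"}

def tokens(text: str) -> set[str]:
    """Lowercased content words, stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}

def source_matcher(claim: str, chunk: str, threshold: float = 0.6) -> bool:
    """Accept a claim only if most of its content words appear in the
    retrieved chunk; anything below the threshold is flagged as
    unsupported (a likely hallucination)."""
    claim_toks = tokens(claim)
    if not claim_toks:
        return False
    supported = len(claim_toks & tokens(chunk)) / len(claim_toks)
    return supported >= threshold
```

The Hallucination Hunter and Logic Expert can share this interface (claim in, verdict out), so the MoE layer is just a vote across experts.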
User: "Who are the governors of the Technology Literary Society?"
System Output:
- Step 1 (Scrape/Retrieve): System searches vector store for "Technology Literary Society governors".
- Step 2 (Reasoning Paths):
- Path A: Claims "John Doe" (Based on 2018 data). -> Context Expert: Outdated.
- Path B: Claims "Jane Smith" (Based on hallucination). -> Source Matcher: Citation missing.
- Path C: Claims "Current Governors listed in 2025 section". -> Source Matcher: Verified.
- Step 3 (Final Answer): "The current governors are... [List]. (Source: MetaKGP/TLS_Page)"
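Step 1's retrieval can be prototyped without a vector store by ranking chunks on word overlap with the query (a stand-in only; the expected production path is embedding search, and the sample chunks here are invented):

```python
def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the query.
    Crude keyword overlap standing in for vector-store similarity."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]
```

Each retrieved chunk should carry its source URL so Step 3's answer can cite the exact MetaKGP page.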
- Allowed Source: ONLY `wiki.metakgp.org` (and associated MetaWiki domains).
- Forbidden: Wikipedia, Google Search API, or relying on the model's pre-trained knowledge for answers.
- Scraping: You must implement the scraping logic. Using a pre-downloaded dump is not allowed—your code must show how data is fetched.
- Open Source Only: use open-source tooling (LangChain, Scrapy, BeautifulSoup, Selenium, etc.).
- API Limits: stay within the provided free-tier limits ($50/team).
| Criteria | Points | Description |
|---|---|---|
| Data Pipeline | 30 | Effectiveness of the scraper, cleaning, and indexing strategies. |
| Verification (MoE) | 30 | Ability to detect and stop hallucinations using the experts. |
| MetaKGP Fidelity | 20 | CRITICAL: Answers must be traceable back to specific MetaKGP URLs. |
| UX & Demo | 20 | Working chatbot, citation links, and graph visualization. |
Deadline: Friday, Jan 16 @ 12:00 PM.
- Fork this repository.
- Create a folder: `submissions/YOUR_TEAM_NAME`.
- Include your Scraper Code and Chatbot Code.
- Add a `README.md` using the Submission Template.
- Open a Pull Request (PR) to the `main` branch.
- Source Code (Scraper + Bot).