Turn educational text into a validated, difficulty-ranked quiz using an agentic pipeline.
Deterministic by default, with an optional, free LLM-based refinement step.
Students and educators often struggle to convert raw notes into effective assessment material. Existing tools either require manual effort or rely entirely on opaque large language models that can hallucinate, behave inconsistently, and are difficult to validate.
Autonomous Knowledge Extractor is an agentic, modular pipeline that processes educational text end-to-end to:
- Extract key concepts and definitions
- Organize concepts into a hierarchical knowledge graph
- Generate quiz questions from grounded definitions
- Rank questions by difficulty
- Validate difficulty logic and consistency
The system runs locally using deterministic logic, with an optional LLM refinement step that improves wording without affecting correctness.
- Agentic architecture with clearly separated stages
- Explicit knowledge graph (IS_A, PART_OF, CONTAINS relationships)
- Deterministic core logic (same input produces same output)
- Automatic difficulty ranking based on graph structure
- Built-in self-validation of quiz difficulty
- Optional LLM refinement with safe fallback (no dependency on LLM availability)
Raw Text → Preprocessor → Concept Extractor → Hierarchy Builder
→ Knowledge Graph → Quiz Generator → Difficulty Ranker
→ Difficulty Validator → Final Quiz
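The staged flow above can be pictured as plain functions passing artifacts forward. This is an illustrative sketch, not the project's actual API; function names and the definition heuristic are assumptions.

```python
# Illustrative pipeline sketch: each stage takes the previous stage's
# artifact and returns the next one. Names are assumptions, not the
# real module interface.

def preprocess(text: str) -> list[str]:
    # Split raw text into cleaned sentence candidates.
    return [s.strip() for s in text.split(".") if s.strip()]

def extract_concepts(sentences: list[str]) -> dict[str, str]:
    # Keep only definition-style sentences ("X is Y").
    concepts = {}
    for s in sentences:
        if " is " in s:
            term, definition = s.split(" is ", 1)
            concepts[term.strip()] = definition.strip()
    return concepts

def run_pipeline(text: str) -> dict[str, str]:
    return extract_concepts(preprocess(text))

quiz_source = run_pipeline("A graph is a set of nodes and edges. Hello world.")
```

Each stage is deterministic, so the same input text always yields the same concept dictionary.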
Python 3.10 or newer is required.
Clone the repository and move into the project directory:
git clone https://github.com/yourusername/autonomous-knowledge-extractor.git
cd autonomous-knowledge-extractor
There are no mandatory external dependencies.
Run the built-in demo text:
python -m src.main --demo
Create a text file (for example, notes.txt) containing educational content and run:
python -m src.main --input-file notes.txt
Enable optional LLM-based wording refinement:
python -m src.main --input-file notes.txt --use-llm
The system automatically falls back to deterministic output if no API token is available or the LLM call fails.
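The safe-fallback behavior can be sketched as follows; the environment variable name and the `call_llm` helper are hypothetical stand-ins, not the project's actual interface.

```python
import os

def refine_wording(question: str) -> str:
    # Optionally refine question wording with an LLM, falling back to
    # the deterministic question on any failure. "LLM_API_TOKEN" is an
    # assumed variable name for illustration.
    token = os.environ.get("LLM_API_TOKEN")
    if not token:
        return question  # no token: deterministic fallback
    try:
        return call_llm(question, token)
    except Exception:
        return question  # LLM error: same safe fallback

def call_llm(question: str, token: str) -> str:
    # Placeholder for a real client call; not implemented here.
    raise NotImplementedError
```

Because the fallback returns the original question unchanged, the LLM step can only improve wording, never break correctness.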
Heuristic NLP identifies noun phrases and definition-style sentences to extract candidate concepts.
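One such heuristic can be sketched with a regular expression over definition-style sentences; the pattern and function name are illustrative assumptions, not the project's implementation.

```python
import re

# Hypothetical definition-sentence heuristic: a capitalized term followed
# by a copula or "refers to", then the definition body.
DEFINITION_PATTERN = re.compile(
    r"^(?P<term>[A-Z][\w\s]*?)\s+(?:is|are|refers to)\s+(?P<definition>.+)$"
)

def extract_definitions(sentences: list[str]) -> dict[str, str]:
    results = {}
    for s in sentences:
        m = DEFINITION_PATTERN.match(s.strip())
        if m:
            results[m.group("term")] = m.group("definition")
    return results
```

Sentences that do not match the definition shape are simply skipped, which keeps non-definitional noise out of the candidate set.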
Semantic relationships such as IS_A and PART_OF are inferred to build a knowledge graph.
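A minimal typed-edge graph illustrates the idea; the relation names come from the text above, but the storage layout and class name are assumptions.

```python
from collections import defaultdict

class KnowledgeGraph:
    # Sketch of a typed-edge concept graph: each node maps to a list of
    # (relation, target) pairs.
    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, source: str, relation: str, target: str) -> None:
        assert relation in {"IS_A", "PART_OF", "CONTAINS"}
        self.edges[source].append((relation, target))

    def parents(self, node: str) -> list[str]:
        # Follow IS_A edges upward to more general concepts.
        return [t for rel, t in self.edges[node] if rel == "IS_A"]

kg = KnowledgeGraph()
kg.add("binary tree", "IS_A", "tree")
kg.add("tree", "IS_A", "graph")
kg.add("tree", "CONTAINS", "nodes")
```

Restricting relations to a fixed vocabulary keeps the graph easy to validate in later stages.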
Questions are generated only from grounded definitions to avoid noise and hallucinations.
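Grounded generation can be as simple as templating over extracted (term, definition) pairs; the question template and provenance field here are illustrative.

```python
# Sketch: every question is backed by an extracted definition, so the
# answer is always grounded in the source text.
def generate_questions(definitions: dict[str, str]) -> list[dict]:
    questions = []
    for term, definition in definitions.items():
        questions.append({
            "question": f"What is {term}?",
            "answer": definition,
            "source": "definition",  # provenance marker, an assumed field
        })
    return questions
```

Since no text is invented beyond the template, the generator cannot hallucinate answers.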
Difficulty is computed using graph depth, concept specificity, and relationship complexity.
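A hedged sketch of such a score combines IS_A depth with a relationship-count term; the weights and edge layout (`node -> [(relation, target)]`) are assumptions, not the project's formula.

```python
# Structural difficulty sketch: concepts deeper in the IS_A hierarchy
# and with more outgoing relations score as harder.
def difficulty(edges: dict, node: str) -> float:
    depth, current, seen = 0, node, set()
    while current not in seen:  # guard against cycles
        seen.add(current)
        parents = [t for r, t in edges.get(current, []) if r == "IS_A"]
        if not parents:
            break
        current = parents[0]
        depth += 1
    complexity = len(edges.get(node, []))  # outgoing relation count
    return depth + 0.5 * complexity  # weights are illustrative
```

Because the score depends only on graph structure, it is fully deterministic.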
A validation stage checks that difficulty ordering is logically consistent before output is shown.
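The consistency check can be expressed as a simple monotonicity test over the ranked list; the specific rule chosen here is an assumption.

```python
# Sketch of the self-validation step: ranked questions must appear in
# non-decreasing difficulty order before the quiz is released.
def validate_ordering(ranked_questions: list[dict]) -> bool:
    scores = [q["difficulty"] for q in ranked_questions]
    return all(a <= b for a, b in zip(scores, scores[1:]))
```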
The system follows an agentic workflow:
- Decomposition: each stage is handled by a specialized agent
- State passing: agents communicate via structured artifacts such as graphs and quizzes
- Reflection: the validator critiques and approves the output before release
This mirrors modern agentic system design without relying entirely on LLM reasoning.
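The reflection step above can be pictured as a small approve-or-repair loop; the repair strategy, retry count, and names are illustrative assumptions.

```python
# Sketch of reflection: a validator agent critiques the quiz artifact,
# and a repair step re-sorts it before the next check.
def release(quiz: list[dict], validator, max_retries: int = 2) -> list[dict]:
    for _ in range(max_retries + 1):
        if validator(quiz):
            return quiz  # approved for release
        quiz = sorted(quiz, key=lambda q: q["difficulty"])  # repair step
    raise ValueError("quiz failed validation after retries")
```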
MIT License.
Built for Hackathon 2025.