A full-stack RAG (Retrieval-Augmented Generation) application built with Embabel that enables intelligent conversations over your documents. Ask questions about your content, and get accurate, context-aware answers powered by AI and semantic search.
Knowledge Agent is a demonstration project showcasing how to build production-ready AI agents using the Embabel framework on the JVM. It combines:
- Intelligent document understanding - Ingest markdown files, PDFs, and other documents
- Semantic search - Find relevant information using vector embeddings and Apache Lucene
- Conversational AI - Chat naturally with your documents using OpenAI's GPT models
- Full-stack experience - Modern React frontend with Chakra UI and robust Java backend
- Spring Security - Built-in authentication for secure access
This project serves as both a working application and a reference implementation for building your own AI-powered knowledge bases.
The chat interface in action - asking questions about Embabel blog posts with real-time agent event monitoring
DICE at work - extracting and managing knowledge propositions with entity mentions, reasoning, and confidence scores
- Document Ingestion: Automatically process and index documents from the data/ directory using Apache Tika
- Semantic Search: Leverage Lucene-based vector search to find contextually relevant information
- Conversational Interface: Intuitive chat UI with real-time streaming responses
- Proposition Memory: Extract and manage structured knowledge propositions from conversations with DICE (Dynamic Insight Capture Engine)
- Authentication: Spring Security integration with user-aware responses
- Modern UI: React + TypeScript + Vite with Chakra UI components
- Production-Ready: Spring Boot backend optimized for reliability and performance
```
knowledge-agent/
├── agent/                    # Spring Boot backend application
│   ├── src/main/java/
│   │   └── dev/jettro/knowledge/
│   │       ├── chat/         # Chat actions and SSE streaming
│   │       ├── ingest/       # Document ingestion endpoints
│   │       └── security/     # Authentication configuration
│   └── pom.xml
├── frontend/                 # React + Vite frontend
│   ├── src/
│   │   ├── components/       # React components (chat, auth, UI)
│   │   ├── hooks/            # Custom React hooks
│   │   ├── context/          # React context providers
│   │   └── api.ts            # Backend API client
│   └── package.json
├── data/                     # Document corpus (markdown files)
└── pom.xml                   # Parent Maven configuration
```
Backend:
- Java 21
- Spring Boot 3.5.9
- Embabel Agent SDK 0.3.1 (RAG framework)
- Apache Lucene (vector search)
- Apache Tika (document processing)
- OpenAI API (LLM and embeddings)
Frontend:
- React 19
- TypeScript
- Vite 7
- Chakra UI 3
- Server-Sent Events (SSE) for streaming
- Java 21 or higher
- Maven 3.6+
- Node.js 20+ (automatically installed by frontend-maven-plugin)
- OpenAI API Key
- Set up your OpenAI API key:

  ```bash
  export OPENAI_API_KEY='your-api-key-here'
  ```

- Add your documents (optional): Place markdown files or other documents in the data/ directory. The project includes sample blog posts about Embabel and related topics.
The project uses Maven to orchestrate both backend and frontend builds:
```bash
# Build everything (backend + frontend)
# The frontend build is automatically triggered during the Maven build
# process via the frontend-maven-plugin
mvn clean package
```

```bash
# Run the Spring Boot application
cd agent
mvn spring-boot:run

# Or run the packaged JAR
java -jar target/agent-1.0-SNAPSHOT.jar
```

The application will start on http://localhost:8080.
For active frontend development with hot reloading:
```bash
# Terminal 1: Run the backend
cd agent
mvn spring-boot:run
```

```bash
# Terminal 2: Run the frontend dev server
cd frontend
npm install
npm run dev
```

The frontend dev server runs on http://localhost:5173 with an API proxy to the backend.
Before chatting, you need to ingest documents into the search index:
```bash
# Using curl
curl -X POST http://localhost:8080/ingest

# Or visit the ingestion endpoint in your browser (requires authentication)
```

This processes all files in the data/ directory and indexes them for semantic search.
- Open http://localhost:8080 in your browser
- Log in with your credentials (configure in Spring Security)
- Ask questions about your documents:
- "What is Embabel?"
- "Tell me about building agents"
- "Explain the RAG implementation"
The AI assistant will search your documents and provide contextually relevant answers, addressing you by your username.
The application includes DICE (Dynamic Insight Capture Engine) for extracting and managing structured knowledge propositions from conversations.
What are Propositions?
Propositions are structured, factual statements extracted from text that capture knowledge about entities and their relationships. Each proposition includes:
- Text: The factual statement (e.g., "Jettro asked for information about blogs about Embabel")
- Mentions: Entities involved with their roles (SUBJECT, OBJECT, etc.)
- Confidence: The AI's confidence in the proposition's accuracy (0-1)
- Decay: Memory decay factor for time-based relevance
- Reasoning: Explanation of why the proposition was extracted
- Status: ACTIVE, INACTIVE, or DEPRECATED
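The fields listed above can be sketched as a small Java record. This is an illustrative sketch only: the project's actual types may differ, the entity-mentions field is omitted for brevity, and the exponential decay formula below is an assumption about how confidence and decay might combine, not DICE's actual implementation.

```java
// Hedged sketch of a proposition with the fields described above.
public class PropositionDemo {

    enum Status { ACTIVE, INACTIVE, DEPRECATED }

    record Proposition(String text, double confidence, double decay,
                       String reasoning, Status status) {
        // Assumed exponential time decay: relevance falls off as the
        // proposition ages, scaled by its decay factor.
        double relevanceAfterDays(long days) {
            return confidence * Math.exp(-decay * days);
        }
    }

    public static void main(String[] args) {
        var p = new Proposition(
            "Jettro asked for information about blogs about Embabel",
            0.9, 0.05, "Stated directly in the user's message", Status.ACTIVE);
        System.out.printf("relevance today: %.2f, after 30 days: %.2f%n",
            p.relevanceAfterDays(0), p.relevanceAfterDays(30));
    }
}
```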
Using the Propositions Interface:
- Navigate to the Propositions tab in the application
- View all extracted propositions with their metadata
- Use the Test Extraction Pipeline to test proposition extraction on custom text
- Delete propositions that are no longer relevant
Propositions enable the agent to build long-term memory of user preferences, behaviors, and relationships, making conversations more personalized over time.
Key files to understand the implementation:
- ChatActions.java - Core AI action that handles user messages and orchestrates RAG
- IngestController.java - Document ingestion and indexing logic
- ChatConfiguration.java - Embabel agent configuration
- PropositionCard.tsx - Reusable component for displaying propositions
- PropositionList.tsx - Proposition management interface
- PropositionExtractor.tsx - Test extraction pipeline UI
- App.tsx - Frontend application and chat interface
- application.yml - Model configuration (GPT models, embeddings)
Edit agent/src/main/resources/application.yml:
```yaml
embabel:
  models:
    default-llm: gpt-5-mini
    default-embedding-model: text-embedding-3-small
    llms:
      CHEAPEST: gpt-5-mini
      standard: gpt-5-mini
      best: gpt-5
```

By default, Lucene creates an index at ./.lucene-index. This can be customized via Embabel configuration.
This project demonstrates several Embabel concepts:
- Actions - Event-driven AI behaviors triggered by user messages
- RAG (Retrieval-Augmented Generation) - Using ToolishRag to ground AI responses in your documents
- Conversation Management - Maintaining chat history and context
- Output Channels - Streaming responses via SSE
- Security Integration - User-aware AI agents with Spring Security
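To make the SSE output channel concrete, here is a minimal sketch of how Server-Sent Events framing works on the wire: each event is a run of `data:` lines terminated by a blank line. The parser below is an illustration of the protocol, not code from this project; the payload contents are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of SSE framing as used by streaming chat endpoints.
public class SseParser {

    // Collects the data payload of each event; a blank line ends an event,
    // and multiple "data:" lines within one event are joined with newlines.
    static List<String> parse(String raw) {
        List<String> events = new ArrayList<>();
        StringBuilder data = new StringBuilder();
        for (String line : raw.split("\n", -1)) {
            if (line.startsWith("data:")) {
                if (data.length() > 0) data.append('\n');
                data.append(line.substring(5).stripLeading());
            } else if (line.isEmpty() && data.length() > 0) {
                events.add(data.toString()); // blank line terminates the event
                data.setLength(0);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        // Two events: one single-line, one spanning two data lines.
        String stream = "data: Hello\n\ndata: streamed\ndata: tokens\n\n";
        System.out.println(parse(stream).size() + " events"); // prints "2 events"
    }
}
```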
For more examples and detailed documentation, visit the Embabel documentation.
This is a personal demonstration project, but feel free to:
- Fork and experiment with your own enhancements
- Use it as a template for your own Embabel projects
- Share feedback and ideas
This project is provided as-is for educational and demonstration purposes.
- Explore the sample documents in the data/ directory for examples
- Check out the blog posts about Embabel and agent development
- Review the code comments for implementation details
Ready to build your own AI agent? Start by adding your documents to the data/ directory, running the ingestion endpoint, and asking questions!