This is a focused MVP for the flow you described:
- upload an arXiv-style PDF
- generate an anime-style guided lesson from the paper
- show Airi's transcript on screen
- ask questions by typing
- tell her to carry on through the next lesson beat

Stack:
- Next.js App Router
- OpenAI Responses API for PDF-aware analysis
- OpenAI TTS for spoken playback
- Web Audio API loudness tracking for VRM mouth animation
- Three.js + `@pixiv/three-vrm` for the browser avatar
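
To illustrate the loudness-driven mouth animation mentioned in the stack, here is a minimal sketch of how a frame of audio samples could be turned into a mouth-open weight. The function names (`rmsLoudness`, `mouthOpenness`, `smoothWeight`) and the noise floor, gain, and smoothing values are assumptions made for illustration, not taken from this project's code.

```typescript
// Hypothetical helpers: names and default constants are illustrative assumptions.

/** Root-mean-square loudness of one frame of time-domain samples in [-1, 1]. */
function rmsLoudness(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  samples.forEach((sample) => {
    sumSquares += sample * sample;
  });
  return Math.sqrt(sumSquares / samples.length);
}

/**
 * Map loudness to a mouth-open weight in [0, 1].
 * Loudness below `noiseFloor` keeps the mouth closed; `gain` scales the rest.
 */
function mouthOpenness(loudness: number, noiseFloor = 0.02, gain = 4): number {
  const boosted = (loudness - noiseFloor) * gain;
  return Math.min(1, Math.max(0, boosted));
}

/** Move part of the way from the previous weight to the target to reduce flicker. */
function smoothWeight(previous: number, target: number, factor = 0.5): number {
  return previous + (target - previous) * factor;
}
```

In the browser, the samples would typically come from a Web Audio `AnalyserNode` (for example via `getFloatTimeDomainData`) once per animation frame, and the smoothed weight would be applied to a VRM mouth expression. How this project wires those pieces together is not shown here.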

Setup:
- Install dependencies: `npm install`
- Add your API key: `copy .env.example .env.local`
- Start the app: `npm run dev`
- Open `http://localhost:3000`
- Click `Build lesson` after choosing a PDF.

Notes:
- The backend uploads the PDF to OpenAI with `purpose: "user_data"` and asks `gpt-5` to plan and teach the lesson.
- The initial response creates the study plan only. Each `Carry on` generates the next spoken beat so the lesson length stays controllable.
- Spoken playback uses OpenAI TTS via `/api/tts`, with a feminine `shimmer` default voice and browser loudness analysis driving VRM mouth expressions in the canvas viewer.
- You can change the TTS voice with `OPENAI_TTS_VOICE` in `.env.local`.
- You can enable a branch-specific paywall by setting `NEXT_PUBLIC_ENABLE_PAYWALL=true`. When enabled, uploads remain available and the `Build lesson` action opens a pricing modal instead of starting lesson generation. Set `NEXT_PUBLIC_PAYWALL_CTA_URL` to send users to checkout, and optionally set `NEXT_PUBLIC_PAYWALL_PRICE` to override the default `$99 / mth` label.
- A bundled sample girl model is available at `public/vrm/AvatarSample_A.vrm` and loads by default.
- `AvatarSample_A.vrm` is sourced from the `madjin/vrm-samples` repository; keep that attribution with the sample asset and follow the usage terms noted in that project README.
- The avatar uses a small procedural idle pose in the browser: gentle sway, breathing, and arm settling on top of the neutral pose.
- This is an AIRI-style tutor shell, not the full `moeru-ai/airi` application.
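
As a rough illustration of the paywall settings described in these notes, the sketch below reads the three `NEXT_PUBLIC_PAYWALL`-related variables into a small config object and decides what the `Build lesson` action should do. The environment variable names and the `$99 / mth` default come from the notes above; the `PaywallConfig` type, `readPaywallConfig`, and `resolveBuildLessonAction` are hypothetical names, and the project's real code may be organized differently.

```typescript
// Hypothetical types and helpers; only the env variable names and the
// default price label are taken from the README.

interface PaywallConfig {
  enabled: boolean;
  ctaUrl: string | null; // checkout link, if one is configured
  priceLabel: string;
}

const DEFAULT_PRICE_LABEL = "$99 / mth";

/** Build the paywall config from a plain map of environment values. */
function readPaywallConfig(env: Record<string, string | undefined>): PaywallConfig {
  const ctaUrl = (env["NEXT_PUBLIC_PAYWALL_CTA_URL"] ?? "").trim();
  const priceLabel = (env["NEXT_PUBLIC_PAYWALL_PRICE"] ?? "").trim();
  return {
    enabled: env["NEXT_PUBLIC_ENABLE_PAYWALL"] === "true",
    ctaUrl: ctaUrl === "" ? null : ctaUrl,
    priceLabel: priceLabel === "" ? DEFAULT_PRICE_LABEL : priceLabel,
  };
}

type BuildLessonAction = "start-lesson" | "open-pricing-modal";

/** With the paywall on, Build lesson opens the pricing modal instead of generating. */
function resolveBuildLessonAction(config: PaywallConfig): BuildLessonAction {
  return config.enabled ? "open-pricing-modal" : "start-lesson";
}
```

One caveat when adapting this: Next.js inlines `NEXT_PUBLIC_` variables at build time only where they are referenced literally (for example `process.env.NEXT_PUBLIC_ENABLE_PAYWALL`), so in client code the map passed to a helper like this would need to be assembled from explicit references rather than by passing `process.env` itself.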
To move this MVP closer to a full tutor experience, the next iterations should focus on:
- Stability: harden upload and lesson generation flows, improve retry handling, surface clearer errors, add request timeouts, and cover key flows with integration tests.
- Expressiveness: expand avatar animation beyond mouth movement with gestures, head turns, emotion states, better timing between speech and motion, and more natural lesson pacing.
- Quizzes: generate checkpoint questions after each lesson segment, support multiple-choice and short-answer formats, grade responses, and adapt the next explanation based on mistakes.
- Images: extract or generate supporting visuals for paper concepts such as diagrams, figures, summaries, and step-by-step breakdowns that appear alongside the tutor.
- Videos: add short visual explainers for dynamic concepts, timeline-style lesson playback, and synchronized audio, captions, and scene changes for a more complete teaching flow.
- Full tutor features: introduce memory across sessions, progress tracking, lesson objectives, study recommendations, and a richer back-and-forth teaching loop that feels closer to a complete personal tutor.
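
For the stability item, retry handling and request timeouts could start from something like the sketch below. It is a generic pattern under assumed defaults, not code from this project: `backoffDelayMs`, `withTimeout`, and `withRetry` are hypothetical helpers, and the retry count, timeout, and delay values are placeholders.

```typescript
// Hypothetical helpers with placeholder defaults; not taken from this project.

/** Delay before the retry that follows attempt `attempt` (0-based), doubled each time and capped. */
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/** Reject if `work` has not settled within `timeoutMs`. */
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

/** Run `task`, retrying on failure or timeout up to `retries` extra times. */
async function withRetry<T>(
  task: () => Promise<T>,
  retries = 2,
  timeoutMs = 30000,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await withTimeout(task(), timeoutMs);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Wait before the next attempt; no wait after the final failure.
        await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt, baseMs)));
      }
    }
  }
  throw lastError;
}
```

Note that `withTimeout` only stops waiting; it does not cancel the underlying request. Cancelling an in-flight `fetch` would additionally need an `AbortController` signal passed into the task.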
