nntrainer · jayden0701 · Jun 4, 2026 · Jun 5, 2026 · Jun 5, 2026 · Jun 4, 2026
diff --git a/.github/actions/override-submodules-from-pr/action.yml b/.github/actions/override-submodules-from-pr/action.yml
diff --git a/.github/codeql/codeql-config.yml b/.github/codeql/codeql-config.yml
diff --git a/.gitignore b/.gitignore
@@ -1,43 +1,44 @@
-# Build outputs
-build/
-builddir/
-*.o
-*.a
-*.so
-*.dylib
-*.dll
-*.exe
+# AI Agent directories
+.opencode/plans/
 
-# Meson subproject cache / auto-generated wrap redirects
-subprojects/packagecache/
-subprojects/.wraplock
-subprojects/*.wrap
-!subprojects/gtest.wrap
-# Wrap-extracted source trees (fetched by `meson setup`, never checked in)
-subprojects/googletest-*/
+# Proprietary model sources — internal only; must never be committed to the
+# public remote. The public tree ships a fixed allow-list of model directories;
+# every OTHER model subdirectory is ignored so a new proprietary model dropped
+# into the tree cannot be `git add`ed by accident. (gitignore never hides files
+# that are already tracked, so the allow-listed dirs below stay tracked.)
+/src/models/*/
+!/src/models/qnn/
+/src/models/qnn/*/
+!/src/models/qnn/gemma4-e2b-qnn/
 
-# Test staging (download_qwen3_0.6b.sh)
-.test_cache/
-models/qwen3-0.6b-w16a16/
+# Build directories (meson)
+builddir/
+builddir_x86/
+builddir_android/
 
-# NNTrainer runtime log directory, created at test time
-logs/
+# NDK build outputs
+src/jni/libs/
+src/jni/obj/
+api/jni/libs/
+api/jni/obj/
+api-app/jni/libs/
+api-app/jni/obj/
 
-# Encoder staging
-encoder/
-encoder-*.tar.gz
-json.hpp
+# Copied headers
+include/
+!qnn/jni/qnn/PAL/include/
 
-# Android build outputs
-jni/libs/
-jni/obj/
+# res (model/QNN resource dirs)
+res/
+!Android/SampleTestAPP/src/main/res/
 
-# Editor junk
-.vscode/
-.idea/
-.DS_Store
-*.swp
+cross/android-aarch64.cross
 
-# Python
-__pycache__/
-*.pyc
+*.dex
+*.bin
+*.json
+*.so
+*.log
+# Internal-only docs — must NOT ship in public releases (kept locally only)
+docs/tasks/
+docs/superpowers/
diff --git a/.gitmodules b/.gitmodules
@@ -1,3 +1,6 @@
-[submodule "subprojects/nntrainer"]
-	path = subprojects/nntrainer
-	url = https://github.com/eunjuyang/nntrainer.git
+[submodule "nntrainer"]
+	path = nntrainer
+	url = https://github.sec.samsung.net/j2z0-lee/nntrainer
+[submodule "xgrammar"]
+	path = xgrammar
+	url = https://github.com/mlc-ai/xgrammar.git
diff --git a/Android/.gitignore b/Android/.gitignore
@@ -0,0 +1,15 @@
+*.iml
+.gradle
+/local.properties
+/.idea/caches
+/.idea/libraries
+/.idea/modules.xml
+/.idea/workspace.xml
+/.idea/navEditor.xml
+/.idea/assetWizardSettings.xml
+.DS_Store
+/build
+/captures
+.externalNativeBuild
+.cxx
+local.properties
diff --git a/Android/Architecture.md b/Android/Architecture.md
@@ -0,0 +1,196 @@
+# Android Architecture 📱
+
+This document describes the current Android state of Quick.AI and separates it
+from the planned REST/foreground-service layer that older documents described
+as if it already existed.
+
+## ✅ Current Gradle Modules
+
+The Android build currently includes:
+
+```text
+Android/
+├── QuickDotAI/       # AAR module
+└── SampleTestAPP/    # Direct sample app using the AAR
+```
+
+`Android/settings.gradle.kts` includes only `:QuickDotAI` and
+`:SampleTestAPP`.
+
+## 🧱 QuickDotAI AAR
+
+`QuickDotAI` exposes the public Kotlin API in
+`com.example.quickdotai`.
+
+Key files:
+
+| File | Role |
+|---|---|
+| `QuickDotAI.kt` | Public interface and `BackendResult` / `StreamSink` contracts |
+| `Types.kt` | Serializable request/response DTOs, model enums, errors, metrics |
+| `NativeQuickDotAI.kt` | Kotlin wrapper around one native `CausalLmHandle` |
+| `NativeCausalLm.kt` | Low-level JNI declarations |
+| `LiteRTLm.kt` | LiteRT-LM engine wrapper for the `gemma4` (`ModelIds.GEMMA4`) model |
+| `NativeChatSession.kt` | Native chat-session helper |
+| `LiteRTLmChatSession.kt` | LiteRT-LM chat-session helper |
+| `ImageStore.kt` | Per-session image cache |
+| `LlavaNextImageProcessor.kt` | Native multimodal preprocessing helper |
+| `SigLipNaFlexImageProcessor.kt` | SigLIP/LFM2 fixed-size native preprocessing |
+| `JepaImageProcessor.kt` | JEPA/QNN native preprocessing |
+| `src/main/cpp/quickai_jni.cpp` | JNI bridge to `quick_dot_ai_api.h` |
+| `src/main/cpp/CMakeLists.txt` | Builds `libquickai_jni.so` and links `libquick_dot_ai_api.so` |
+
+## 🔌 Native Path
+
+`NativeQuickDotAI` owns one native handle:
+
+```text
+NativeQuickDotAI
+  └── NativeCausalLm.ensureLoaded()
+      ├── System.loadLibrary("qnn_context")
+      └── System.loadLibrary("quickai_jni")
+            └── links/calls libquick_dot_ai_api.so
+```
+
+The native API surface is declared in `api/quick_dot_ai_api.h`.
+The preferred calls are handle-based:
+
+- `loadModelHandle`
+- `runModelHandleWithMessagesStreaming`
+- `runModelHandleWithJsonStreaming`
+- `runMultimodalHandleStreaming`
+- `loadMultimodalCompositionJson`
+- `cancelModelHandle`
+- `destroyModelHandle`
+
+For descriptor-driven multimodal composition requests, `NativeQuickDotAI`
+builds JSON from the `LoadModelRequest` composition fields and calls
+`loadMultimodalCompositionJsonNative`, which forwards to the C API entry point
+`loadMultimodalCompositionJson()`.
+
+## ModelCatalog
+
+Model selection in the AAR is driven by the `ModelCatalog` singleton. Models
+are identified by string ids rather than an enum.
+
+### Seeding
+
+`ModelCatalog` is seeded on first access by calling `nativeQueryCatalog()`
+through JNI, which delegates to `getModelCatalogJson()` in
+`libquick_dot_ai_api.so`. Hardcoded LiteRT descriptors (e.g., `gemma4`) are
+merged in at the Kotlin layer.
+
+### Key types
+
+| Type | Role |
+|---|---|
+| `enum class RuntimeKind { NATIVE, LITERT }` | Selects the engine path |
+| `enum class Capability { STREAMING, MESSAGES_API, MULTIMODAL, TOOL_USE, EMBEDDING, MULTI_IMAGE, VISION_ENCODER }` | Per-model feature flags |
+| `enum class ModelRole { UNKNOWN, TEXT_LLM, VISION_ENCODER, CONNECTOR, COMPOSITION }` | Component/composition role |
+| `data class ModelDescriptor(id, family, displayName, runtime, backends, capabilities, role, embeddingDim, compatibleWith)` | Descriptor from the catalog |
+| `object ModelIds` | String constants for well-known model ids |
+| `object ModelCatalog` | Singleton: `all()`, `families()`, `selectable()`, `selectableFamilies()`, `runtimesFor(family)`, `backendsFor(family, rt)`, `resolve(family, rt, backend)`, `byId(id)` |
+
+### 3-axis cascading UI
+
+`SampleTestAPP` presents a 3-axis cascading UI:
+
+1. **Family** — populated from `ModelCatalog.selectableFamilies()`
+2. **Runtime chip row** — populated from `ModelCatalog.runtimesFor(selectedFamily)`
+3. **Backend chip row** — populated from `ModelCatalog.backendsFor(selectedFamily, selectedRuntime)`
+
+The app lists only **selectable** (generative) models. Embedding-only models
+such as `tiny-bert` — which expose only the `EMBEDDING` capability and have no
+public output path — are filtered out by `selectableFamilies()`. They remain in
+the AAR catalog and are still reachable through `ModelCatalog.all()` /
+`ModelCatalog.byId(...)`.
+
+The resolved descriptor is obtained via `ModelCatalog.resolve(family, runtime, backend)`
+and passed directly to `createEngine()`.
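+
+The cascade above can be sketched with a toy in-memory catalog. This is only an
+illustration: `ToyCatalog`, the trimmed `ModelDescriptor`, and the `toy-a` /
+`toy-b` entries are hypothetical stand-ins, and only the method names are taken
+from this document. The real `selectableFamilies()` also filters out
+embedding-only models, which this sketch does not model.
+
+```kotlin
+// Hypothetical stand-ins; the AAR's real types carry more fields.
+enum class RuntimeKind { NATIVE, LITERT }
+
+data class ModelDescriptor(
+    val id: String,
+    val family: String,
+    val runtime: RuntimeKind,
+    val backends: List<String>,
+)
+
+class ToyCatalog(private val models: List<ModelDescriptor>) {
+    // Axis 1: every distinct family in the catalog.
+    fun selectableFamilies(): List<String> = models.map { it.family }.distinct()
+
+    // Axis 2: runtimes available for the chosen family.
+    fun runtimesFor(family: String): List<RuntimeKind> =
+        models.filter { it.family == family }.map { it.runtime }.distinct()
+
+    // Axis 3: backends available for the chosen family and runtime.
+    fun backendsFor(family: String, rt: RuntimeKind): List<String> =
+        models.filter { it.family == family && it.runtime == rt }
+            .flatMap { it.backends }
+            .distinct()
+
+    // Collapse the three choices into one descriptor, or null if none matches.
+    fun resolve(family: String, rt: RuntimeKind, backend: String): ModelDescriptor? =
+        models.firstOrNull {
+            it.family == family && it.runtime == rt && backend in it.backends
+        }
+}
+
+// Made-up entries for illustration only.
+val toyCatalog = ToyCatalog(
+    listOf(
+        ModelDescriptor("toy-a-cpu", "toy-a", RuntimeKind.NATIVE, listOf("cpu")),
+        ModelDescriptor("toy-a-npu", "toy-a", RuntimeKind.NATIVE, listOf("npu")),
+        ModelDescriptor("toy-b", "toy-b", RuntimeKind.LITERT, listOf("cpu", "gpu")),
+    )
+)
+
+fun main() {
+    val family = toyCatalog.selectableFamilies().first()
+    val runtime = toyCatalog.runtimesFor(family).first()
+    val backend = toyCatalog.backendsFor(family, runtime).last()
+    println(toyCatalog.resolve(family, runtime, backend)?.id)
+}
+```
+
+Each axis narrows the candidates for the next one, so a UI built this way can
+only offer combinations that `resolve` is able to satisfy.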
+
+### Engine factory
+
+```kotlin
+QuickDotAI.createEngine(context, descriptor: ModelDescriptor): QuickDotAI
+```
+
+`createEngine` dispatches to `NativeQuickDotAI` (for `RuntimeKind.NATIVE`) or
+`LiteRTLm` (for `RuntimeKind.LITERT`) based on `descriptor.runtime`.
+
+### LoadModelRequest
+
+`LoadModelRequest.modelId` is a `String` catalog id. The cache key is
+`"$modelId:${quantization.name}"` for legacy single-model loads. For
+composition loads, the key includes `compositionId`, `llmModelId`,
+`llmBackend`, `visionModelId`, `visionBackend`, `connectorModelId`,
+`connectorBackend`, and quantization so one process cannot accidentally reuse a
+handle loaded with a different component/backend tuple.
+
+The composition fields are:
+
+```kotlin
+compositionId
+llmModelId
+llmBackend
+visionModelId
+visionBackend
+connectorModelId
+connectorBackend
+```
+
+When `compositionId` is null, `NativeQuickDotAI.load()` dispatches to
+`loadModelHandleByNameNative`. When `compositionId` is set, it dispatches to
+`loadMultimodalCompositionJsonNative`.
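+
+As a rough sketch of the cache-key and dispatch rules described in this
+section: `ToyLoadRequest` is a hypothetical mirror of the fields named above
+(with `quantization` simplified to a `String`), and the `|` separator, the `-`
+placeholder, and the field ordering of the composition key are assumptions,
+not the AAR's actual format.
+
+```kotlin
+// Hypothetical mirror of the request fields named in this section.
+data class ToyLoadRequest(
+    val modelId: String,
+    val quantization: String,
+    val compositionId: String? = null,
+    val llmModelId: String? = null,
+    val llmBackend: String? = null,
+    val visionModelId: String? = null,
+    val visionBackend: String? = null,
+    val connectorModelId: String? = null,
+    val connectorBackend: String? = null,
+)
+
+fun cacheKey(r: ToyLoadRequest): String =
+    if (r.compositionId == null) {
+        // Legacy single-model key, as stated in the text.
+        "${r.modelId}:${r.quantization}"
+    } else {
+        // Assumed layout: every component/backend field takes part in the key.
+        listOf(
+            r.compositionId, r.llmModelId, r.llmBackend,
+            r.visionModelId, r.visionBackend,
+            r.connectorModelId, r.connectorBackend, r.quantization,
+        ).joinToString("|") { it ?: "-" }
+    }
+
+// Name of the JNI entry point that load() would pick for this request.
+fun loadEntryPoint(r: ToyLoadRequest): String =
+    if (r.compositionId == null) "loadModelHandleByNameNative"
+    else "loadMultimodalCompositionJsonNative"
+```
+
+Two composition requests that differ only in, say, `visionBackend` produce
+different keys here, which is the property the text relies on to avoid handle
+reuse across component/backend tuples.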
+
+## 🌗 LiteRT Runtime Path
+
+`LiteRTLm` is selected for the `gemma4` (`ModelIds.GEMMA4`) model and takes a
+`.litertlm` file path through `LoadModelRequest.modelPath`. Setting
+`visionBackend` to a non-null value enables multimodal calls for engines and
+models that support image input.
+
+## 🧵 Threading Model
+
+A `QuickDotAI` instance is not internally thread-safe. Host apps should drive a
+loaded engine from one worker thread. `SampleTestAPP` follows this pattern with
+a background dispatcher.
+
+Streaming callbacks are delivered to the caller-provided `StreamSink`.
+Apps that update UI must marshal callbacks to the main thread.
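+
+A minimal sketch of this pattern, using plain JVM executors so it runs off
+device. `ToySink` and `marshalTo` are hypothetical names, not the AAR's
+`StreamSink` API, and the second executor merely stands in for the Android main
+thread (on a device, posting through a `Handler` bound to
+`Looper.getMainLooper()` would play that role).
+
+```kotlin
+import java.util.concurrent.Executor
+import java.util.concurrent.Executors
+import java.util.concurrent.TimeUnit
+
+// Hypothetical stand-in for a streaming callback contract.
+fun interface ToySink {
+    fun onToken(token: String)
+}
+
+// Wraps a sink so every callback is re-posted to the target executor.
+fun marshalTo(target: Executor, sink: ToySink): ToySink =
+    ToySink { token -> target.execute { sink.onToken(token) } }
+
+fun runDemo(): String {
+    val engineThread = Executors.newSingleThreadExecutor() // the one worker thread
+    val uiThread = Executors.newSingleThreadExecutor()     // stand-in for the main thread
+    val received = mutableListOf<String>()                 // only touched on uiThread
+    val uiSink = marshalTo(uiThread) { received.add(it) }
+
+    engineThread.execute {
+        // Fake generation loop; a real engine call would go here.
+        listOf("Hel", "lo").forEach(uiSink::onToken)
+    }
+
+    // Drain the worker first, then the UI stand-in, before reading the result.
+    engineThread.shutdown()
+    engineThread.awaitTermination(1, TimeUnit.SECONDS)
+    uiThread.shutdown()
+    uiThread.awaitTermination(1, TimeUnit.SECONDS)
+    return received.joinToString("")
+}
+
+fun main() {
+    println(runDemo())
+}
+```
+
+Keeping all engine calls on one single-thread executor serializes access to the
+engine without locks, and the wrapper keeps UI state confined to one thread.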
+
+## 🧪 SampleTestAPP
+
+`SampleTestAPP` is the current runnable Android sample. It links the
+`:QuickDotAI` module directly; it does not start a REST service and does not
+communicate over sockets.
+
+## 🗺️ Planned Service Layer
+
+The following pieces are design targets, not current Gradle modules:
+
+| Planned component | Status |
+|---|---|
+| `LauncherApp` foreground-service bootstrap UI | Planned |
+| `QuickAIService` remote foreground service | Planned |
+| NanoHTTPD loopback REST server | Planned |
+| `RequestDispatcher`, `ModelRegistry`, `ModelWorker` | Planned |
+| Standalone REST client app | Planned |
+
+When implemented, the service layer should wrap the same `QuickDotAI` AAR
+contract rather than inventing a separate model API.
+
+## 📦 Packaging
+
+`apk-build-install.sh` performs the current full Android workflow:
+
+1. Build native libraries with `./build.sh --platform=android --enable-qnn --clean`.
+2. Install/copy native shared libraries through `apk_install_android.sh`.
+3. Copy `.so` files into `Android/QuickDotAI/prebuilt_libs/`.
+4. Run Gradle install for `:SampleTestAPP`.
+
+Set `NDK_ROOT` inside `apk-build-install.sh` before using it on a new machine.
+
+## 📎 Related Docs
+
+- [QuickDotAI AAR API](QuickDotAI/README.md)
+- [Android Native Async & Streaming](AsyncAndStreaming.md)
+- [Main README](../README.md)