58 changes: 0 additions & 58 deletions .github/actions/override-submodules-from-pr/action.yml

This file was deleted.

11 changes: 0 additions & 11 deletions .github/codeql/codeql-config.yml

This file was deleted.

73 changes: 37 additions & 36 deletions .gitignore
@@ -1,43 +1,44 @@
# Build outputs
build/
builddir/
*.o
*.a
*.so
*.dylib
*.dll
*.exe
# AI Agent directories
.opencode/plans/

# Meson subproject cache / auto-generated wrap redirects
subprojects/packagecache/
subprojects/.wraplock
subprojects/*.wrap
!subprojects/gtest.wrap
# Wrap-extracted source trees (fetched by `meson setup`, never checked in)
subprojects/googletest-*/
# Proprietary model sources — internal only; must never be committed to the
# public remote. The public tree ships a fixed allow-list of model directories;
# every OTHER model subdirectory is ignored so a new proprietary model dropped
# into the tree cannot be `git add`ed by accident. (gitignore never hides files
# that are already tracked, so the allow-listed dirs below stay tracked.)
/src/models/*/
!/src/models/qnn/
/src/models/qnn/*/
!/src/models/qnn/gemma4-e2b-qnn/

# Test staging (download_qwen3_0.6b.sh)
.test_cache/
models/qwen3-0.6b-w16a16/
# Build directories (meson)
builddir/
builddir_x86/
builddir_android/

# NNTrainer runtime log directory, created at test time
logs/
# NDK build outputs
src/jni/libs/
src/jni/obj/
api/jni/libs/
api/jni/obj/
api-app/jni/libs/
api-app/jni/obj/

# Encoder staging
encoder/
encoder-*.tar.gz
json.hpp
# Copied headers
include/
!qnn/jni/qnn/PAL/include/

# Android build outputs
jni/libs/
jni/obj/
# res (model/QNN resource dirs)
res/
!Android/SampleTestAPP/src/main/res/

# Editor junk
.vscode/
.idea/
.DS_Store
*.swp
cross/android-aarch64.cross

# Python
__pycache__/
*.pyc
*.dex
*.bin
*.json
*.so
*.log
# Internal-only docs — must NOT ship in public releases (kept locally only)
docs/tasks/
docs/superpowers/
9 changes: 6 additions & 3 deletions .gitmodules
@@ -1,3 +1,6 @@
[submodule "subprojects/nntrainer"]
path = subprojects/nntrainer
url = https://github.com/eunjuyang/nntrainer.git
[submodule "nntrainer"]
path = nntrainer
url = https://github.sec.samsung.net/j2z0-lee/nntrainer
[submodule "xgrammar"]
path = xgrammar
url = https://github.com/mlc-ai/xgrammar.git
15 changes: 15 additions & 0 deletions Android/.gitignore
@@ -0,0 +1,15 @@
*.iml
.gradle
/local.properties
/.idea/caches
/.idea/libraries
/.idea/modules.xml
/.idea/workspace.xml
/.idea/navEditor.xml
/.idea/assetWizardSettings.xml
.DS_Store
/build
/captures
.externalNativeBuild
.cxx
local.properties
196 changes: 196 additions & 0 deletions Android/Architecture.md
@@ -0,0 +1,196 @@
# Android Architecture 📱

This document describes the current Android state of Quick.AI and separates it
from the planned REST/foreground-service layer that older documents described
as if it already existed.

## ✅ Current Gradle Modules

The Android build currently includes:

```text
Android/
├── QuickDotAI/ # AAR module
└── SampleTestAPP/ # Direct sample app using the AAR
```

`Android/settings.gradle.kts` includes only `:QuickDotAI` and
`:SampleTestAPP`.

## 🧱 QuickDotAI AAR

`QuickDotAI` exposes the public Kotlin API in
`com.example.quickdotai`.

Key files:

| File | Role |
|---|---|
| `QuickDotAI.kt` | Public interface and `BackendResult` / `StreamSink` contracts |
| `Types.kt` | Serializable request/response DTOs, model enums, errors, metrics |
| `NativeQuickDotAI.kt` | Kotlin wrapper around one native `CausalLmHandle` |
| `NativeCausalLm.kt` | Low-level JNI declarations |
| `LiteRTLm.kt` | LiteRT-LM engine wrapper for the `gemma4` (`ModelIds.GEMMA4`) model |
| `NativeChatSession.kt` | Native chat-session helper |
| `LiteRTLmChatSession.kt` | LiteRT-LM chat-session helper |
| `ImageStore.kt` | Per-session image cache |
| `LlavaNextImageProcessor.kt` | Native multimodal preprocessing helper |
| `SigLipNaFlexImageProcessor.kt` | SigLIP/LFM2 fixed-size native preprocessing |
| `JepaImageProcessor.kt` | JEPA/QNN native preprocessing |
| `src/main/cpp/quickai_jni.cpp` | JNI bridge to `quick_dot_ai_api.h` |
| `src/main/cpp/CMakeLists.txt` | Builds `libquickai_jni.so` and links `libquick_dot_ai_api.so` |

## 🔌 Native Path

`NativeQuickDotAI` owns one native handle:

```text
NativeQuickDotAI
└── NativeCausalLm.ensureLoaded()
├── System.loadLibrary("qnn_context")
└── System.loadLibrary("quickai_jni")
└── links/calls libquick_dot_ai_api.so
```

The native API surface is declared in `api/quick_dot_ai_api.h`.
The preferred calls are handle-based:

- `loadModelHandle`
- `runModelHandleWithMessagesStreaming`
- `runModelHandleWithJsonStreaming`
- `runMultimodalHandleStreaming`
- `loadMultimodalCompositionJsonNative`
- `cancelModelHandle`
- `destroyModelHandle`
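Together, these handle-based calls imply a load, stream, (optionally cancel), destroy lifecycle. The following is a minimal sketch of an `AutoCloseable` wrapper that guarantees `destroyModelHandle` runs exactly once. `HandleApi`, its method signatures, `ModelSession`, and the use of `Long` handles with `0L` meaning "no handle" are all hypothetical stand-ins. The real signatures live in `api/quick_dot_ai_api.h` and `NativeCausalLm.kt`.

```kotlin
// Hypothetical stand-in for the JNI surface. Method names mirror the C API
// entry points listed above; parameter and return types are assumptions.
interface HandleApi {
    fun loadModelHandle(modelId: String): Long
    fun runModelHandleWithMessagesStreaming(handle: Long, prompt: String, onToken: (String) -> Unit)
    fun cancelModelHandle(handle: Long)
    fun destroyModelHandle(handle: Long)
}

// Owns exactly one native handle and releases it on close().
class ModelSession(private val api: HandleApi, modelId: String) : AutoCloseable {
    // Assumption: 0L is used as the "no handle / already destroyed" sentinel.
    private var handle: Long = api.loadModelHandle(modelId)

    fun stream(prompt: String, onToken: (String) -> Unit) {
        check(handle != 0L) { "session already closed" }
        api.runModelHandleWithMessagesStreaming(handle, prompt, onToken)
    }

    fun cancel() {
        if (handle != 0L) api.cancelModelHandle(handle)
    }

    override fun close() {
        if (handle != 0L) {
            api.destroyModelHandle(handle)
            handle = 0L // a second close() becomes a no-op
        }
    }
}

// In-memory fake so the lifecycle can be exercised without native code.
class FakeApi : HandleApi {
    val destroyed = mutableListOf<Long>()
    override fun loadModelHandle(modelId: String): Long = 42L
    override fun runModelHandleWithMessagesStreaming(handle: Long, prompt: String, onToken: (String) -> Unit) {
        prompt.split(" ").forEach(onToken) // pretend each word is a token
    }
    override fun cancelModelHandle(handle: Long) {}
    override fun destroyModelHandle(handle: Long) { destroyed += handle }
}
```

Wrapping the handle this way lets callers write `ModelSession(api, id).use { ... }` so the native handle is released even if streaming throws.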

For descriptor-driven multimodal composition requests, `NativeQuickDotAI`
builds JSON from the `LoadModelRequest` composition fields and calls
`loadMultimodalCompositionJsonNative`, which forwards to the C API entry point
`loadMultimodalCompositionJson()`.
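The JSON-building step can be sketched as follows, assuming a flat object whose camelCase keys match the Kotlin field names and that null fields are omitted. The actual key names expected by `loadMultimodalCompositionJson()` are not documented here, so `CompositionFields`, `jsonString`, and the `toJson()` output format are hypothetical. Backends are modeled as plain strings for simplicity.

```kotlin
// Hypothetical mirror of the composition fields on LoadModelRequest.
data class CompositionFields(
    val compositionId: String,
    val llmModelId: String? = null,
    val llmBackend: String? = null,
    val visionModelId: String? = null,
    val visionBackend: String? = null,
    val connectorModelId: String? = null,
    val connectorBackend: String? = null,
)

// Minimal JSON string escaping: quotes, backslashes, and control characters.
fun jsonString(s: String): String = buildString {
    append('"')
    for (c in s) {
        when {
            c == '"' -> append("\\\"")
            c == '\\' -> append("\\\\")
            c == '\n' -> append("\\n")
            c < ' ' -> append("\\u%04x".format(c.code))
            else -> append(c)
        }
    }
    append('"')
}

// Serializes the non-null fields as one flat JSON object (key names assumed).
fun CompositionFields.toJson(): String {
    val pairs = listOf(
        "compositionId" to compositionId,
        "llmModelId" to llmModelId,
        "llmBackend" to llmBackend,
        "visionModelId" to visionModelId,
        "visionBackend" to visionBackend,
        "connectorModelId" to connectorModelId,
        "connectorBackend" to connectorBackend,
    )
    return pairs
        .filter { it.second != null }
        .joinToString(",", "{", "}") { (k, v) -> "${jsonString(k)}:${jsonString(v!!)}" }
}
```

A real implementation would more likely use `org.json` or kotlinx.serialization; hand-rolled escaping just keeps the sketch dependency-free.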

## 🗂️ ModelCatalog

Model selection in the AAR is driven by the `ModelCatalog` singleton. Models
are identified by string ids rather than an enum.

### Seeding

`ModelCatalog` is seeded on first access by calling `nativeQueryCatalog()`
through JNI, which delegates to `getModelCatalogJson()` in
`libquick_dot_ai_api.so`. Hardcoded LiteRT descriptors (e.g., `gemma4`) are
merged in at the Kotlin layer.
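A sketch of the seeding step, assuming the JNI query yields already-parsed descriptors and that a hardcoded LiteRT descriptor replaces a native one with the same id (the actual precedence is not specified here). `queryNative` stands in for `nativeQueryCatalog()`; `Descriptor` and `SeededCatalog` are simplified hypothetical types. Kotlin's `lazy` delegate provides the "seed on first access" behavior with synchronized one-time initialization.

```kotlin
// Simplified hypothetical descriptor; the real ModelDescriptor has more fields.
data class Descriptor(val id: String, val family: String, val runtime: String)

class SeededCatalog(
    // Stands in for the JNI call nativeQueryCatalog(); assumed to return parsed entries.
    private val queryNative: () -> List<Descriptor>,
    // Stands in for the hardcoded LiteRT descriptors merged at the Kotlin layer.
    private val hardcodedLiteRt: List<Descriptor>,
) {
    // Built once, on first access; lazy() is synchronized by default.
    private val index: Map<String, Descriptor> by lazy {
        val merged = LinkedHashMap<String, Descriptor>() // preserves insertion order
        queryNative().forEach { merged[it.id] = it }
        // Assumed precedence: a LiteRT entry replaces a native one with the same id.
        hardcodedLiteRt.forEach { merged[it.id] = it }
        merged
    }

    fun all(): List<Descriptor> = index.values.toList()
    fun byId(id: String): Descriptor? = index[id]
}
```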

### Key types

| Type | Role |
|---|---|
| `enum class RuntimeKind { NATIVE, LITERT }` | Selects the engine path |
| `enum class Capability { STREAMING, MESSAGES_API, MULTIMODAL, TOOL_USE, EMBEDDING, MULTI_IMAGE, VISION_ENCODER }` | Per-model feature flags |
| `enum class ModelRole { UNKNOWN, TEXT_LLM, VISION_ENCODER, CONNECTOR, COMPOSITION }` | Component/composition role |
| `data class ModelDescriptor(id, family, displayName, runtime, backends, capabilities, role, embeddingDim, compatibleWith)` | Descriptor from the catalog |
| `object ModelIds` | String constants for well-known model ids |
| `object ModelCatalog` | Singleton: `all()`, `families()`, `selectable()`, `selectableFamilies()`, `runtimesFor(family)`, `backendsFor(family, rt)`, `resolve(family, rt, backend)`, `byId(id)` |
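Read as Kotlin, the table corresponds roughly to the declarations below. The enum constants and `ModelDescriptor` property names come from the table; the property types (strings for `family`/`displayName`, sets for backends and capabilities, `Int?` for `embeddingDim`), the default values, and the `Backend` enum with its `CPU`/`GPU`/`NPU` constants are assumptions, since the document does not list them.

```kotlin
enum class RuntimeKind { NATIVE, LITERT }

enum class Capability {
    STREAMING, MESSAGES_API, MULTIMODAL, TOOL_USE, EMBEDDING, MULTI_IMAGE, VISION_ENCODER
}

enum class ModelRole { UNKNOWN, TEXT_LLM, VISION_ENCODER, CONNECTOR, COMPOSITION }

// Hypothetical: the backend values are not documented here.
enum class Backend { CPU, GPU, NPU }

// Property names follow the table; types and defaults are assumptions.
data class ModelDescriptor(
    val id: String,
    val family: String,
    val displayName: String,
    val runtime: RuntimeKind,
    val backends: Set<Backend>,
    val capabilities: Set<Capability>,
    val role: ModelRole = ModelRole.UNKNOWN,
    val embeddingDim: Int? = null,
    val compatibleWith: List<String> = emptyList(),
)

// Abridged: only the id named in this document.
object ModelIds {
    const val GEMMA4 = "gemma4"
}
```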

### 3-axis cascading UI

`SampleTestAPP` presents a 3-axis cascading UI:

1. **Family** — populated from `ModelCatalog.selectableFamilies()`
2. **Runtime chip row** — populated from `ModelCatalog.runtimesFor(selectedFamily)`
3. **Backend chip row** — populated from `ModelCatalog.backendsFor(selectedFamily, selectedRuntime)`

The app lists only **selectable** (generative) models. Embedding-only models
such as `tiny-bert`, which expose only the `EMBEDDING` capability and have no
public output path, are filtered out by `selectableFamilies()`. They remain in
the AAR catalog and are still reachable through `ModelCatalog.all()` /
`ModelCatalog.byId(...)`.

The resolved descriptor is obtained via `ModelCatalog.resolve(family, runtime, backend)`
and passed directly to `createEngine()`.
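The three-axis cascade can be sketched over a flat descriptor list. This assumes that "selectable" means "has at least one capability other than `EMBEDDING`" and that `resolve` returns the first descriptor matching all three axes. Both rules, the `Desc` type, and the `Cascade` class are hypothetical simplifications, not `ModelCatalog`'s actual logic; capabilities and backends are plain strings here.

```kotlin
enum class RuntimeKind { NATIVE, LITERT }

// Simplified hypothetical descriptor.
data class Desc(
    val id: String,
    val family: String,
    val runtime: RuntimeKind,
    val backends: List<String>,
    val capabilities: Set<String>,
)

class Cascade(private val all: List<Desc>) {
    // Assumed rule: embedding-only models are not selectable.
    private fun Desc.isSelectable() = capabilities.any { it != "EMBEDDING" }

    // Axis 1: families shown in the family picker.
    fun selectableFamilies(): List<String> =
        all.filter { it.isSelectable() }.map { it.family }.distinct()

    // Axis 2: runtimes offered for the chosen family.
    fun runtimesFor(family: String): List<RuntimeKind> =
        all.filter { it.family == family && it.isSelectable() }.map { it.runtime }.distinct()

    // Axis 3: backends offered for the chosen family and runtime.
    fun backendsFor(family: String, rt: RuntimeKind): List<String> =
        all.filter { it.family == family && it.runtime == rt && it.isSelectable() }
            .flatMap { it.backends }
            .distinct()

    // Final lookup: first descriptor matching all three axes, or null.
    fun resolve(family: String, rt: RuntimeKind, backend: String): Desc? =
        all.firstOrNull { it.family == family && it.runtime == rt && backend in it.backends }
}
```

Each chip row is derived from the selection above it, so changing the family resets the valid runtime and backend choices without any extra bookkeeping.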

### Engine factory

```kotlin
QuickDotAI.createEngine(context, descriptor: ModelDescriptor): QuickDotAI
```

`createEngine` dispatches to `NativeQuickDotAI` (for `RuntimeKind.NATIVE`) or
`LiteRTLm` (for `RuntimeKind.LITERT`) based on `descriptor.runtime`.
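The dispatch amounts to an exhaustive `when` over the runtime. In this sketch the engine classes are empty stand-ins named after `NativeQuickDotAI` and `LiteRTLm`, `createEngine` is a top-level function rather than a member of `QuickDotAI`, and the Android `Context` parameter is dropped so it runs on a plain JVM. Constructor arguments are assumptions.

```kotlin
enum class RuntimeKind { NATIVE, LITERT }

// Trimmed-down descriptor: only what the dispatch needs.
data class ModelDescriptor(val id: String, val runtime: RuntimeKind)

// Stand-ins for the public interface and its two implementations.
interface QuickDotAI { val descriptor: ModelDescriptor }
class NativeQuickDotAI(override val descriptor: ModelDescriptor) : QuickDotAI
class LiteRTLm(override val descriptor: ModelDescriptor) : QuickDotAI

// Exhaustive `when` expression: adding a RuntimeKind constant without a
// matching branch becomes a compile error instead of a silent fallthrough.
fun createEngine(descriptor: ModelDescriptor): QuickDotAI = when (descriptor.runtime) {
    RuntimeKind.NATIVE -> NativeQuickDotAI(descriptor)
    RuntimeKind.LITERT -> LiteRTLm(descriptor)
}
```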

### LoadModelRequest

`LoadModelRequest.modelId` is a `String` catalog id. The cache key is
`"$modelId:${quantization.name}"` for legacy single-model loads. For
composition loads, the key includes `compositionId`, `llmModelId`,
`llmBackend`, `visionModelId`, `visionBackend`, `connectorModelId`,
`connectorBackend`, and quantization so one process cannot accidentally reuse a
handle loaded with a different component/backend tuple.
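A sketch of the two cache-key shapes. The legacy form `"$modelId:${quantization.name}"` is taken from the text. For compositions, the `composition:` prefix, the field order, the `|` separator, and `-` as the placeholder for null fields are assumptions, as is the `Quantization` enum and its constants.

```kotlin
// Hypothetical quantization enum; the real constants may differ.
enum class Quantization { W4A16, W16A16 }

// Trimmed-down request with the fields that feed the cache key.
data class LoadModelRequest(
    val modelId: String,
    val quantization: Quantization,
    val compositionId: String? = null,
    val llmModelId: String? = null,
    val llmBackend: String? = null,
    val visionModelId: String? = null,
    val visionBackend: String? = null,
    val connectorModelId: String? = null,
    val connectorBackend: String? = null,
)

fun LoadModelRequest.cacheKey(): String =
    if (compositionId == null) {
        "$modelId:${quantization.name}" // legacy single-model key
    } else {
        // Every component/backend slot participates, so two requests that
        // differ in any slot never share a cached handle.
        listOf(
            compositionId, llmModelId, llmBackend,
            visionModelId, visionBackend,
            connectorModelId, connectorBackend,
            quantization.name,
        ).joinToString("|", prefix = "composition:") { it ?: "-" }
    }
```

Swapping only `llmBackend` (for example from a CPU to an NPU backend) therefore yields a different key and forces a fresh load.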

The composition fields are:

```kotlin
compositionId
llmModelId
llmBackend
visionModelId
visionBackend
connectorModelId
connectorBackend
```

When `compositionId` is null, `NativeQuickDotAI.load()` dispatches
`loadModelHandleByNameNative`. When `compositionId` is set, it dispatches
`loadMultimodalCompositionJsonNative`.
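The branch on `compositionId` can be sketched as below. `LoadBridge` is a hypothetical interface standing in for the two JNI entry points named above (their real parameter lists are likely richer), `LoadRequest` is a trimmed-down request, and JSON serialization is left to a caller-supplied `toJson` function.

```kotlin
// Hypothetical stand-ins for the two JNI entry points.
interface LoadBridge {
    fun loadModelHandleByNameNative(modelId: String): Long
    fun loadMultimodalCompositionJsonNative(json: String): Long
}

// Trimmed-down request: only the fields the dispatch decision needs.
data class LoadRequest(val modelId: String, val compositionId: String? = null)

// A null compositionId means a legacy single-model load; anything else is a
// descriptor-driven composition, serialized by the caller-supplied toJson.
fun load(bridge: LoadBridge, req: LoadRequest, toJson: (LoadRequest) -> String): Long =
    if (req.compositionId == null) {
        bridge.loadModelHandleByNameNative(req.modelId)
    } else {
        bridge.loadMultimodalCompositionJsonNative(toJson(req))
    }
```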

## 🌗 LiteRT Runtime Path

`LiteRTLm` is selected for the `gemma4` (`ModelIds.GEMMA4`) model and takes the
path to a `.litertlm` file through `LoadModelRequest.modelPath`. Setting
`visionBackend` to a non-null value enables multimodal calls when the engine and
model support image input.

## 🧵 Threading Model

A `QuickDotAI` instance is not internally thread-safe. Host apps should drive a
loaded engine from one worker thread. `SampleTestAPP` follows this pattern with
a background dispatcher.

Streaming callbacks are delivered to the caller-provided `StreamSink`.
Apps that update UI must marshal callbacks to the main thread.
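A sketch of the one-worker-thread pattern using `java.util.concurrent`. `Engine` and `Sink` are hypothetical simplifications of `QuickDotAI` and `StreamSink`, and `EngineDriver` is an illustrative name. `uiExecutor` can be any `Executor`; on Android it would typically post to the main thread, for example through a `Handler` bound to `Looper.getMainLooper()`.

```kotlin
import java.util.concurrent.Executor
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.Future

// Hypothetical simplification of StreamSink.
interface Sink {
    fun onToken(token: String)
    fun onDone()
}

// Hypothetical engine: not thread-safe, so only one thread may touch it.
class Engine {
    val threadsSeen = mutableSetOf<String>() // records which threads called in
    fun generate(prompt: String, sink: Sink) {
        threadsSeen += Thread.currentThread().name
        prompt.split(" ").forEach(sink::onToken) // pretend each word is a token
        sink.onDone()
    }
}

// Serializes every engine call onto a single worker thread and re-posts
// each callback onto uiExecutor (the main thread in an Android app).
class EngineDriver(private val engine: Engine, private val uiExecutor: Executor) : AutoCloseable {
    private val worker: ExecutorService =
        Executors.newSingleThreadExecutor { r -> Thread(r, "engine-worker") }

    fun generate(prompt: String, sink: Sink): Future<*> = worker.submit(Runnable {
        engine.generate(prompt, object : Sink {
            override fun onToken(token: String) = uiExecutor.execute { sink.onToken(token) }
            override fun onDone() = uiExecutor.execute { sink.onDone() }
        })
    })

    override fun close() = worker.shutdown()
}
```

Because the worker executor has exactly one thread, requests queue up in submission order and the engine never sees concurrent calls, without any locking inside the engine itself.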

## 🧪 SampleTestAPP

`SampleTestAPP` is the current runnable Android sample. It links the
`:QuickDotAI` module directly; it does not start a REST service and does not
communicate over sockets.

## 🗺️ Planned Service Layer

The following pieces are design targets, not current Gradle modules:

| Planned component | Status |
|---|---|
| `LauncherApp` foreground-service bootstrap UI | Planned |
| `QuickAIService` remote foreground service | Planned |
| NanoHTTPD loopback REST server | Planned |
| `RequestDispatcher`, `ModelRegistry`, `ModelWorker` | Planned |
| Standalone REST client app | Planned |

When implemented, the service layer should wrap the same `QuickDotAI` AAR
contract rather than inventing a separate model API.

## 📦 Packaging

`apk-build-install.sh` performs the current full Android workflow:

1. Build native libraries with `./build.sh --platform=android --enable-qnn --clean`.
2. Install/copy native shared libraries through `apk_install_android.sh`.
3. Copy `.so` files into `Android/QuickDotAI/prebuilt_libs/`.
4. Run Gradle install for `:SampleTestAPP`.

Set `NDK_ROOT` inside `apk-build-install.sh` before using it on a new machine.

## 📎 Related Docs

- [QuickDotAI AAR API](QuickDotAI/README.md)
- [Android Native Async & Streaming](AsyncAndStreaming.md)
- [Main README](../README.md)