gjuha

gjuha (Albanian) — the language

The most modern Albanian language learning experience ever built.

iOS-native · Offline-first · Culturally immersive · A1 → B2

What is Gjuha?

Albanian is spoken by 8 million people. It has no major learning app. No Duolingo course. No Rosetta Stone. No Babbel.

Gjuha is here to change that.

Built for the foreign partners of Albanians, the diaspora reconnecting with their roots, the expats living in Tirana, and the language nerds drawn to one of the oldest surviving Indo-European languages on earth.

This is not a hobby project. It is a long-term platform designed to reach Duolingo-scale depth — A1 through B2 — with better grammar transparency, real cultural immersion, and an engine that generates exercises rather than hardcoding sentences.

Features

✦  Skill tree home screen          ✦  Verb conjugation exercises
✦  Lesson session engine           ✦  Noun declension system
✦  Multiple choice exercises       ✦  Grammar reference tables
✦  Translation input               ✦  Fill-in-the-blank
✦  Word matching                   ✦  Spaced repetition (planned)
✦  Streak + XP system              ✦  Cultural immersion modules (planned)
✦  Offline-first — always works    ✦  Audio pronunciation (planned)
✦  Dark mode adaptive UI           ✦  Gheg dialect mode (planned)

Tech Stack

Layer	Choice	Why
UI	SwiftUI	Native, declarative, fast
Architecture	TCA 1.23	Unidirectional, testable, composable
Persistence	SwiftData	First-party persistence framework, iOS 17+
DI	swift-dependencies	Mockable, testable, TCA-native
Content	Custom engine + JSON seed	Generated exercises, not hardcoded
Distribution	App Store	iOS 17+

Architecture

The project uses The Composable Architecture (TCA) with strict feature isolation. Every screen is a self-contained reducer module.

Gjuha/
├── App/                        # @main, AppFeature, root NavigationStack
│
├── Features/                   # Self-contained TCA modules
│   ├── Onboarding/             # Goal setting, reason selection
│   ├── Home/                   # Skill tree, streak header, lesson nodes
│   ├── Lesson/                 # Session coordinator (hearts, progress, XP)
│   ├── Exercise/               # MCQ, translate, fill-blank, match
│   ├── Vocabulary/             # Browser with CEFR filter + search
│   ├── Grammar/                # Reference tables + conjugation viewer
│   ├── Streak/                 # Streak logic + XP tracking
│   └── Profile/                # Stats dashboard, daily goal picker
│
├── Core/
│   ├── DesignSystem/           # GjuhaColors, GjuhaFonts, shared components
│   ├── Engine/                 # Exercise generator, scoring, curriculum
│   ├── Extensions/             # Swift + SwiftUI extensions
│   └── Utilities/              # Constants, helpers
│
├── Data/
│   ├── Models/                 # SwiftData @Model classes (Word, Lesson)
│   ├── Repositories/           # Protocol + Live + Mock implementations
│   ├── SwiftData/              # Schema, container, migrations
│   └── Seed/                   # JSON datasets — the content engine fuel
│       ├── Vocabulary/         # word_frequency_top6000.csv + curated entries
│       ├── Lessons/            # Unit + lesson definitions
│       └── Grammar/            # Morphology rules, conjugation tables
│
└── Resources/
    ├── Assets.xcassets/        # 14 adaptive color pairs (light + dark)
    ├── Audio/                  # Pronunciation files (planned)
    └── Fonts/                  # Custom typeface (planned)

Key Architectural Decisions

TCA everywhere. No MVVM, no Combine, no ViewModels. Every screen is a @Reducer with typed State, Action, and an Effect-based body.
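The State / Action / reducer shape described above can be sketched in plain Swift, without importing TCA. This is only an illustration of the unidirectional flow: `LessonSession`, its fields, the three starting hearts, and the 10 XP award are all hypothetical, and a real feature would use TCA's `@Reducer` macro and return effects rather than this hand-rolled `reduce` function.

```swift
// Plain-Swift sketch of a unidirectional feature. This is NOT TCA's API;
// every name and number below is a hypothetical stand-in.
struct LessonSession {
    struct State: Equatable {
        var hearts = 3          // assumed starting hearts
        var xp = 0
        var exerciseIndex = 0
        var isFinished = false
    }

    enum Action {
        case answered(correct: Bool)
    }

    let exerciseCount: Int

    // The only place state changes: an action comes in, state is mutated.
    func reduce(into state: inout State, action: Action) {
        guard !state.isFinished else { return }
        switch action {
        case .answered(let correct):
            if correct {
                state.xp += 10  // assumed XP per correct answer
                state.exerciseIndex += 1
            } else {
                state.hearts -= 1
            }
            // The session ends when hearts run out or every exercise is done.
            state.isFinished = state.hearts == 0 || state.exerciseIndex == exerciseCount
        }
    }
}
```

Because state is a value type and every change goes through one function, a test can replay a list of actions and compare the resulting state with `==`.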

Value types in State. SwiftData @Model classes never enter TCA State. LessonSummary is a Sendable value type that mirrors Lesson for use in reducers — keeping Swift 6 strict concurrency clean throughout.
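A minimal sketch of that split, assuming a handful of fields. The `Lesson` class below is a plain stand-in (in the app it would carry SwiftData's `@Model` macro), and the property names are hypothetical since the real model is not shown here.

```swift
// Stand-in for the SwiftData model. In the app this class would be marked
// @Model; the fields are assumptions made for illustration.
final class Lesson {
    let id: String
    var title: String
    var cefrLevel: String
    var isCompleted: Bool

    init(id: String, title: String, cefrLevel: String, isCompleted: Bool = false) {
        self.id = id
        self.title = title
        self.cefrLevel = cefrLevel
        self.isCompleted = isCompleted
    }
}

// Immutable snapshot that is safe to store in reducer state and to pass
// across concurrency boundaries.
struct LessonSummary: Sendable, Equatable, Identifiable {
    let id: String
    let title: String
    let cefrLevel: String
    let isCompleted: Bool

    // Copies the values out of the reference-type model at one point in time.
    init(_ lesson: Lesson) {
        id = lesson.id
        title = lesson.title
        cefrLevel = lesson.cefrLevel
        isCompleted = lesson.isCompleted
    }
}
```

The snapshot does not track later changes to the model, so a feature would request a fresh `LessonSummary` after each write.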

Protocol-based repositories. Every data source sits behind a Sendable protocol, with a Live implementation for the app and a Mock for tests and previews. Implementations are wired in via @Dependency.
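The pattern could look roughly like the sketch below. The protocol name, its single method, and `WordSummary` are hypothetical, the "live" type filters an in-memory array where the app would query SwiftData, and the `@Dependency` registration is left out so the example runs without third-party packages.

```swift
import Foundation

// Hypothetical value type returned by the repository.
struct WordSummary: Sendable, Equatable {
    let albanian: String
    let english: String
}

// The seam that features depend on.
protocol VocabularyRepository: Sendable {
    func words(matching query: String) -> [WordSummary]
}

// Live implementation. An in-memory array stands in for the real store.
struct LiveVocabularyRepository: VocabularyRepository {
    let store: [WordSummary]

    func words(matching query: String) -> [WordSummary] {
        guard !query.isEmpty else { return store }
        let needle = query.lowercased()
        return store.filter {
            $0.albanian.lowercased().contains(needle)
                || $0.english.lowercased().contains(needle)
        }
    }
}

// Mock implementation for tests and previews: returns canned data and
// ignores the query.
struct MockVocabularyRepository: VocabularyRepository {
    var stubbed: [WordSummary] = [WordSummary(albanian: "mirë", english: "good")]

    func words(matching query: String) -> [WordSummary] { stubbed }
}
```

A feature that only knows about `VocabularyRepository` can be handed either implementation, which is what keeps previews and tests independent of the database.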

Content engine, not hardcoded sentences. All exercises are generated from seed data (vocabulary + templates + morphology rules). No exercise is written by hand.
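As a rough illustration of generating from seed data, the sketch below crosses prompt templates with vocabulary entries to produce multiple-choice exercises. The types, the `{word}` placeholder, and the way distractors are picked are assumptions for this example, not the project's engine, and morphology rules are left out.

```swift
import Foundation

// Hypothetical seed types.
struct VocabularyEntry {
    let albanian: String
    let english: String
}

struct Template {
    let prompt: String  // contains the placeholder "{word}"
}

struct Exercise: Equatable {
    let prompt: String
    let options: [String]
    let answer: String
}

// Produces one exercise per (template, entry) pair.
func generateExercises(
    vocabulary: [VocabularyEntry],
    templates: [Template],
    optionCount: Int = 3
) -> [Exercise] {
    var result: [Exercise] = []
    for template in templates {
        for (index, entry) in vocabulary.enumerated() {
            // Distractors are the entries that follow this one, wrapping
            // around, so the output is deterministic.
            var options = [entry.english]
            var offset = 1
            while options.count < min(optionCount, vocabulary.count) {
                options.append(vocabulary[(index + offset) % vocabulary.count].english)
                offset += 1
            }
            let prompt = template.prompt.replacingOccurrences(of: "{word}", with: entry.albanian)
            // Sorting hides the position of the correct answer.
            result.append(Exercise(prompt: prompt, options: options.sorted(), answer: entry.english))
        }
    }
    return result
}
```

With T templates and V entries this yields T × V exercises, which is how a small seed grows into a much larger exercise pool.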

Content Dataset

The content pipeline is already running:

Dataset	Status	Size
Curated vocabulary seed	✅ Done	600 entries (A1–A2)
Word frequency list	✅ Generated	Top 6,000 from Tatoeba corpus
Albanian sentence corpus	✅ Generated	~2,600 sentences (sqi)
Sentence templates	✅ Done	300 templates
Exercise seed	✅ Done	1,000 exercises
Morphology rules	✅ Done	Verb + noun patterns
A1–B2 curriculum plan	✅ Done	120 lessons across 12 units
Grammar topic dataset	🔄 In progress	Conjugation tables, case rules
Audio recordings	📋 Planned	Phase 2

Target vocabulary: 4,000–6,000 entries covering A1–B2. Target exercises: 30,000–60,000 generated variations.

Content Pipeline

# Pull open datasets (Tatoeba + Wiktionary + Leipzig)
bash Scripts/tools/fetch_sources.sh

# Generate frequency list + sentence corpus
python3 Scripts/tools/build_from_sources.py

# Output:
#   Gjuha/Data/Seed/Vocabulary/word_frequency_top6000.csv
#   Gjuha/Data/Seed/sentences_sqi_50k.csv
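The build script itself is Python and its internals are not shown in this README. Purely to illustrate what a frequency list is, the idea can be sketched in Swift as below; the tokenisation rule (split on anything that is not a letter) and the alphabetical tie-break are assumptions of this sketch.

```swift
// Counts word occurrences across sentences and returns the most frequent.
func frequencyList(from sentences: [String], top limit: Int) -> [(word: String, count: Int)] {
    var counts: [String: Int] = [:]
    for sentence in sentences {
        // Lowercase, then split on non-letters so punctuation is dropped.
        // Note: this also splits contractions such as "s'ka" at the apostrophe.
        let words = sentence.lowercased().split(whereSeparator: { !$0.isLetter })
        for word in words {
            counts[String(word), default: 0] += 1
        }
    }
    return counts
        // Highest count first; ties broken alphabetically for stable output.
        .sorted { $0.value != $1.value ? $0.value > $1.value : $0.key < $1.key }
        .prefix(limit)
        .map { (word: $0.key, count: $0.value) }
}
```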

Albanian — Why It's Interesting

Albanian is one of the oldest surviving branches of Indo-European — an entire branch unto itself, not a sub-family. It has no close living relatives.

mirëdita        →  good day (mirë + ditë — "good" + "day")
faleminderit    →  thank you (falem + nderit, roughly "I bow to your honour")
ju lutem        →  please / you're welcome
shtëpi          →  house (generally traced to Latin hospitium, an early loan)

It features postposed definite articles (the article attaches to the end of the noun), five grammatical cases, two major dialects (Tosk in the south, Gheg in the north), and verb moods that most Western learners have never encountered.

Gjuha teaches all of it — clearly, progressively, with cultural context.
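To make the postposed article concrete, here is a deliberately simplified sketch of the nominative singular definite form for regular nouns. It is not the project's morphology engine. It assumes the gender is already known, and it ignores stressed final vowels (kafe → kafeja, vëlla → vëllai) and irregular stems (libër → libri).

```swift
enum Gender {
    case masculine
    case feminine
}

// Simplified rule set: regular nouns, nominative singular only.
func definiteForm(of noun: String, gender: Gender) -> String {
    guard let last = noun.last else { return noun }
    switch gender {
    case .masculine:
        // Final -ë is dropped before the article: djalë -> djali.
        if last == "ë" { return String(noun.dropLast()) + "i" }
        // -u after k, g, h (zog -> zogu); otherwise -i (mal -> mali).
        return noun + ("kgh".contains(last) ? "u" : "i")
    case .feminine:
        switch last {
        case "ë": return String(noun.dropLast()) + "a"   // vajzë -> vajza
        case "e": return String(noun.dropLast()) + "ja"  // lule -> lulja (unstressed -e)
        default:  return noun + "a"                      // shtëpi -> shtëpia
        }
    }
}
```

So "mal" (mountain) becomes "mali" (the mountain): the article is a suffix, not a separate word placed before the noun.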

Curriculum Plan

12 units · 120 lessons · A1 → B2

Unit 01  — Survival Basics        [A1]  Greetings, pronouns, to be/have
Unit 02  — Everyday Life          [A1]  Work, transport, health, market
Unit 03  — Talking About You      [A1]  People, adjectives, past tense
Unit 04  — Getting Around         [A2]  Directions, places, plans
Unit 05  — People & Relationships [A2]  Family, feelings, social life
Unit 06  — Language in Action     [A2]  Opinions, requests, comparison
Unit 07  — Albanian Society       [B1]  Culture, media, current affairs
Unit 08  — Work & Ambition        [B1]  Career, formal register
Unit 09  — Nature & Environment   [B1]  Climate, geography, seasons
Unit 10  — Deeper Connections     [B1]  Abstract emotions, storytelling
Unit 11  — Fluency Building       [B2]  Idioms, nuance, debate
Unit 12  — Cultural Mastery       [B2]  Literature, history, identity
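One way the plan above might be held as data is sketched below. The type names are hypothetical, and the even split of 10 lessons per unit is an assumption derived from 120 lessons across 12 units; the real lesson counts per unit are not stated here.

```swift
// CEFR levels in ascending order; Comparable is synthesized from case order.
enum CEFRLevel: Comparable, Sendable {
    case a1, a2, b1, b2
}

struct CurriculumUnit: Sendable {
    let number: Int
    let title: String
    let level: CEFRLevel
    var lessonCount = 10  // assumed even split: 120 lessons / 12 units
}

enum Curriculum {
    // Titles and levels are taken from the plan above.
    static let units: [CurriculumUnit] = [
        CurriculumUnit(number: 1, title: "Survival Basics", level: .a1),
        CurriculumUnit(number: 2, title: "Everyday Life", level: .a1),
        CurriculumUnit(number: 3, title: "Talking About You", level: .a1),
        CurriculumUnit(number: 4, title: "Getting Around", level: .a2),
        CurriculumUnit(number: 5, title: "People & Relationships", level: .a2),
        CurriculumUnit(number: 6, title: "Language in Action", level: .a2),
        CurriculumUnit(number: 7, title: "Albanian Society", level: .b1),
        CurriculumUnit(number: 8, title: "Work & Ambition", level: .b1),
        CurriculumUnit(number: 9, title: "Nature & Environment", level: .b1),
        CurriculumUnit(number: 10, title: "Deeper Connections", level: .b1),
        CurriculumUnit(number: 11, title: "Fluency Building", level: .b2),
        CurriculumUnit(number: 12, title: "Cultural Mastery", level: .b2),
    ]

    // Units a learner working towards `level` would cover.
    static func units(upTo level: CEFRLevel) -> [CurriculumUnit] {
        units.filter { $0.level <= level }
    }

    static var totalLessons: Int {
        units.reduce(0) { $0 + $1.lessonCount }
    }
}
```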

Design System

A semantic token system built around Albanian character — not borrowed from Duolingo.

// Colors
Color.gjuha.accent          // Deep Albanian red — primary actions
Color.gjuha.streak          // Flame orange — motivation layer
Color.gjuha.xp              // Gold — reward moments
Color.gjuha.background      // Off-white / near-black (adaptive)
Color.gjuha.surface         // Card background (adaptive)

// Typography
Font.gjuha.displayLarge     // 48pt Black Rounded — hero moments
Font.gjuha.headingLarge     // 28pt Bold Rounded — screen titles
Font.gjuha.exercisePrompt   // 26pt Semibold — the question
Font.gjuha.answerOption     // 18pt Medium — tappable answers
Font.gjuha.bodyRegular      // 16pt Regular — body copy

All 14 colors are defined as named asset pairs with automatic light/dark variants. All spacing follows a strict 4pt grid (4, 8, 12, 16, 24, 32, 48).
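The spacing scale could be expressed as named tokens along these lines. The token names (`xxs` through `xxl`) are hypothetical; only the seven values come from the list above. `Double` is used so the sketch runs without SwiftUI, where `CGFloat` would be the natural type.

```swift
// Hypothetical spacing tokens for the 4pt grid.
enum GjuhaSpacing {
    static let xxs: Double = 4
    static let xs: Double = 8
    static let sm: Double = 12
    static let md: Double = 16
    static let lg: Double = 24
    static let xl: Double = 32
    static let xxl: Double = 48

    // The full scale, smallest to largest.
    static let scale: [Double] = [xxs, xs, sm, md, lg, xl, xxl]

    // True when a value is a whole multiple of the 4pt base unit.
    static func isOnGrid(_ value: Double) -> Bool {
        value.truncatingRemainder(dividingBy: 4) == 0
    }
}
```

Routing layout code through tokens like these, rather than literal numbers, makes an off-grid value such as 10 easy to spot in review or to catch in a unit test.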

Roadmap

Phase 1 — Foundation (current)

Phase 2 — Content Depth

Phase 3 — Engagement

Spaced repetition vocabulary review
Streak mechanics + daily notifications
Achievements system
Cultural immersion modules

Phase 4 — Scale

A2 full curriculum
Remote content packs (downloadable A2, B1)
User accounts + sync
Gheg dialect mode

Getting Started

# Requirements: Xcode 26+, iOS 17+ simulator

git clone https://github.com/pdhespollari/gjuha.git
cd gjuha
open Gjuha.xcodeproj

On first build, Xcode will ask you to trust the TCA macros. Click Trust & Enable, then run.

Regenerate the Xcode project

The project is spec-managed with XcodeGen:

brew install xcodegen
xcodegen generate --spec project.yml

Run the content pipeline

# 1. Download open datasets (~1.8 GB total — Tatoeba + Wiktionary)
bash Scripts/tools/fetch_sources.sh

# 2. Build frequency list + sentence corpus
python3 Scripts/tools/build_from_sources.py

Design Principles

Grammar transparency. Albanian grammar is complex and systematic. Gjuha explains the why — declension cases, verb moods, definiteness — not just the surface forms. Better than Duolingo's surface-level approach.

Cultural authenticity. Real Albanian life: coffee culture, besa (the code of honour), xhiro (the evening walk), regional identity, hospitality. Not textbook sentences about going to the library.

Minimal but energetic. Premium aesthetic. Not childish. Not a Duolingo reskin. A distinct identity built for adults who take language seriously.

Offline-first. The full core course lives in the app bundle. Works on a plane. Works in Shkodër with no signal. Always.
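Reading bundled seed content needs nothing beyond Foundation's `Codable`. The sketch below decodes a small inline JSON sample; the `SeedWord` fields and the JSON shape are assumptions, since the actual seed schema is not shown here. In the app the `Data` would come from a file in the bundle rather than a string literal.

```swift
import Foundation

// Hypothetical shape of one vocabulary seed record.
struct SeedWord: Codable, Equatable, Sendable {
    let albanian: String
    let english: String
    let cefr: String
}

// Decodes an array of seed records; throws if the JSON does not match.
func loadSeed(from data: Data) throws -> [SeedWord] {
    try JSONDecoder().decode([SeedWord].self, from: data)
}

// Inline sample standing in for a bundled JSON file.
let sampleJSON = """
[
  {"albanian": "mirëdita", "english": "good day", "cefr": "A1"},
  {"albanian": "faleminderit", "english": "thank you", "cefr": "A1"}
]
"""
```

Because decoding involves no network call, the same path works with no connectivity, which is the point of shipping the core course inside the bundle.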

Content engine. One vocabulary entry powers dozens of exercise variations. The engine generates from templates × morphology rules × vocabulary. No exercise is written by hand.
