ukazure.llm

A small C# console project that teaches how a large language model is built, one step at a time.

The repo is aimed at software developers rather than data scientists. Each lesson keeps the mechanics visible and uses tiny Azure and programming-flavoured examples so you can see what the code is doing without needing a GPU cluster or an existential crisis.

What is in here

The interactive lessons currently cover:

lesson1 - a count-based bigram model
lesson2 - replacing counts with learnable weights
lesson3 - sequence windows and ordered context
lesson4 - attention
lesson5 - a tiny transformer-style block
lesson6 - training, inference, and sampling
lesson7 - prompting, retrieval, and grounded answers
nanogpt - a TorchSharp-backed nanoGPT track that trains a small character-level model on an Azure developer training document
nanogptv2 - the nanoGPT track with embeddings, attention, residuals, feed-forward blocks, and per-position logits

Repository layout

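src/ukazure.llm.cli/   Console application source, lessons, nanoGPT tracks, and training data
docs/                  Supporting walkthroughs and architecture notes
README.md              How to run the project and what each command demonstrates
ukazure.llm.cli.csproj Project file used by the dotnet run commands below
ukazure.llm.sln        Solution file
global.json            .NET SDK configuration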

Requirements

.NET 10 SDK
a terminal that supports interactive console input
macOS users running nanogpt or nanogptv2 may need to run brew install libomp for TorchSharp's CPU backend

This project uses:

Spectre.Console for the interactive lesson UI
the official OpenAI .NET SDK for the optional live model call in lesson 7
TorchSharp-cpu for the practical nanoGPT commands (nanogpt and nanogptv2)

How to run

From the repo root:

dotnet run --project ukazure.llm.cli.csproj -- lesson1

Replace lesson1 with any of the commands listed below: lesson1 through lesson7, nanogpt, or nanogptv2.

If you run the app without a valid command argument, it prints the available commands:

dotnet run --project ukazure.llm.cli.csproj

Example output:

Usage: dotnet run lesson1|lesson2|lesson3|lesson4|lesson5|lesson6|lesson7|nanogpt|nanogptv2

Available commands:
  lesson1  A tiny bigram model built from counts
  lesson2  Replace counts with learnable weights
  lesson3  Model sequencing with context windows
  lesson4  Attention lets the model choose what to focus on
  lesson5  A tiny transformer-style block
  lesson6  Training, inference, and sampling
  lesson7  Prompting, retrieval, and grounded answers
  nanogpt    nanoGPT in C# with TorchSharp
  nanogptv2  nanoGPT v2: per-position logits

Example lesson flow

Lesson 1

Lesson 1 is interactive. You choose the training sentences, then pick a starting token for generation.

Example run:

Choose the training sentences for lesson 1
  [x] azure deploys to the cloud
  [x] azure scales in the cloud
  [x] dotnet builds in the cloud
  [ ] dotnet runs in containers

Choose the starting token for generation
  azure

Representative output:

Lesson 1: A tiny bigram model built from counts

Seed: azure
Current output: azure

Step 1: azure -> deploys
Current output: azure deploys

Step 2: deploys -> to
Current output: azure deploys to

Final output: azure deploys to the cloud
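The counting idea behind lesson 1 fits in a few lines of C#. This is a minimal sketch, not the lesson's code: it assumes whitespace tokenisation and greedy selection, breaks ties in favour of the follower seen first during training, and `Generate` is a hypothetical name. With the three sentences selected above, it happens to produce the same final output as the representative run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The training sentences selected in the example run.
string[] sentences =
{
    "azure deploys to the cloud",
    "azure scales in the cloud",
    "dotnet builds in the cloud",
};

// For every token, record each token that followed it, in training order.
var followers = new Dictionary<string, List<string>>();
foreach (var sentence in sentences)
{
    string[] tokens = sentence.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i + 1 < tokens.Length; i++)
    {
        if (!followers.TryGetValue(tokens[i], out var seen))
        {
            seen = new List<string>();
            followers[tokens[i]] = seen;
        }
        seen.Add(tokens[i + 1]);
    }
}

// Greedy generation: repeatedly pick the most frequent follower (the highest bigram count).
// GroupBy keeps first-seen order and OrderByDescending is stable, so ties go to the earlier follower.
string Generate(string seed, int maxSteps)
{
    var output = new List<string> { seed };
    string current = seed;
    for (int step = 0; step < maxSteps; step++)
    {
        if (!followers.TryGetValue(current, out var candidates))
            break; // no bigram starts with this token, so generation stops
        current = candidates
            .GroupBy(token => token)
            .OrderByDescending(group => group.Count())
            .First()
            .Key;
        output.Add(current);
    }
    return string.Join(" ", output);
}

string generated = Generate("azure", 10);
Console.WriteLine(generated); // azure deploys to the cloud
```

Generation stops at "cloud" because no training sentence continues after it, so the model has no count to follow.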

Lesson 6

Lesson 6 uses the tiny transformer-style block from lesson 5 and shows the difference between training behaviour and inference behaviour.

Representative output:

Lesson 6: Training, inference, and sampling

Top predictions after "az deployment group create":
  with         94.8%
  to            2.1%
  deploy        1.0%

Greedy output:
az deployment group create with

Temperature 0.7, top-k 3:
az deployment group create with bicep

Temperature 1.2, top-k 5:
az deployment group create with json
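Greedy decoding always takes the highest-scoring token. Temperature and top-k instead change the distribution the next token is sampled from. The sketch below uses the usual definitions: keep the k largest logits, divide them by the temperature, apply softmax, then draw one token. The vocabulary, the logits, and the helper names are hypothetical and are not taken from lesson 6.

```csharp
using System;
using System.Linq;

// Turn raw scores (logits) into probabilities restricted to the top k tokens.
// Lower temperature sharpens the distribution; higher temperature flattens it.
double[] TopKProbabilities(double[] logits, double temperature, int k)
{
    var keep = logits.Select((value, index) => (value, index))
                     .OrderByDescending(p => p.value)
                     .Take(k)
                     .Select(p => p.index)
                     .ToHashSet();
    double max = keep.Max(idx => logits[idx]); // subtract the max for numerical stability
    var weights = new double[logits.Length];   // tokens outside the top k stay at 0
    foreach (int i in keep)
        weights[i] = Math.Exp((logits[i] - max) / temperature);
    double total = weights.Sum();
    return weights.Select(w => w / total).ToArray();
}

// Draw one index from a probability distribution.
int Sample(double[] probabilities, Random rng)
{
    double r = rng.NextDouble();
    double cumulative = 0;
    for (int i = 0; i < probabilities.Length; i++)
    {
        cumulative += probabilities[i];
        if (r < cumulative) return i;
    }
    // Guard against floating-point rounding: fall back to the last token still in play.
    return Array.FindLastIndex(probabilities, p => p > 0);
}

// Greedy decoding is just argmax over the logits.
int Greedy(double[] logits) => Array.IndexOf(logits, logits.Max());

string[] vocabulary = { "with", "to", "deploy", "bicep", "json" };
double[] nextTokenLogits = { 4.0, 0.9, 0.2, 0.1, -0.5 }; // hypothetical scores

var cool = TopKProbabilities(nextTokenLogits, temperature: 0.7, k: 3);
var warm = TopKProbabilities(nextTokenLogits, temperature: 1.2, k: 5);
var random = new Random(42);
Console.WriteLine($"greedy: {vocabulary[Greedy(nextTokenLogits)]}");
Console.WriteLine($"sampled (T=0.7, k=3): {vocabulary[Sample(cool, random)]}");
Console.WriteLine($"sampled (T=1.2, k=5): {vocabulary[Sample(warm, random)]}");
```

With these numbers, the top token keeps about 98% of the probability at temperature 0.7 with k = 3, but only about 85% at temperature 1.2 with k = 5. That is why the warmer setting wanders more often.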

Lesson 7

Lesson 7 moves from model internals to application architecture. You choose a developer question; the lesson then retrieves relevant documents, builds a grounded prompt, and compares the answers produced with and without retrieval.

Representative output:

Choose the developer question for lesson 7
  How should I store secrets for my Azure app without hard-coding credentials?

Retrieved:
  Azure Key Vault
  Managed Identities
  Azure App Service

Without retrieval:
You should use a secure service for secrets, avoid hard-coded credentials, and prefer platform features that reduce direct secret handling.

With retrieval:
Store secrets in Azure Key Vault instead of appsettings files or source code. Use managed identities so the app can authenticate without storing passwords or client secrets. If the app runs on Azure App Service, configure app settings to reference Key Vault secrets.
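A grounded prompt places the retrieved text in front of the question and tells the model to stay within it. The template wording, the `BuildGroundedPrompt` name, and the short document snippets below are illustrative assumptions, not lesson 7's actual prompt.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative snippets standing in for what the retriever returned.
var retrieved = new List<(string Title, string Text)>
{
    ("Azure Key Vault", "Key Vault stores secrets, keys, and certificates."),
    ("Managed Identities", "Managed identities let Azure resources authenticate without stored credentials."),
};

// Build a prompt that asks the model to answer only from the supplied context.
string BuildGroundedPrompt(string question, IReadOnlyList<(string Title, string Text)> documents)
{
    var builder = new StringBuilder();
    builder.AppendLine("Answer the question using only the context below.");
    builder.AppendLine("If the context does not contain the answer, say so.");
    builder.AppendLine();
    builder.AppendLine("Context:");
    foreach (var (title, text) in documents)
        builder.AppendLine($"[{title}] {text}");
    builder.AppendLine();
    builder.AppendLine($"Question: {question}");
    return builder.ToString();
}

string prompt = BuildGroundedPrompt(
    "How should I store secrets for my Azure app without hard-coding credentials?",
    retrieved);
Console.WriteLine(prompt);
```

Labelling each snippet with its source title makes it easier for a reader, or the model, to point back to the evidence behind an answer.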

nanoGPT

The nanogpt command is separate from the numbered lessons. It is a C# rewrite track inspired by Andrej Karpathy's nanoGPT repo, using TorchSharp rather than hand-written arrays for the practical training machinery.

It reads a condensed training document from src/ukazure.llm.cli/data/nanogpt-training.txt. The document is based on the Azure developer guide:

https://docs.azure.cn/en-us/guides/developer/azure-developer-guide

TorchSharp provides tensors, automatic gradients, cross-entropy, and AdamW. The current implementation is intentionally small: it is a character-level model trained on Azure developer text, plus an interactive document-grounded question loop.
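Character-level means that every distinct character in the document becomes one token ID. Each training example is then a fixed-length window of characters paired with the character that follows it. A minimal sketch in plain C#, assuming a sorted character vocabulary and a small hypothetical block size of 8 (the sample run below reports 32). The real command works with TorchSharp tensors, and `Encode` and `Decode` are hypothetical names.

```csharp
using System;
using System.Linq;

// A tiny stand-in for the training document.
string text = "az webapp up\naz functionapp create\n";

// Character-level vocabulary: each distinct character gets one integer ID.
char[] vocabulary = text.Distinct().OrderBy(c => c).ToArray();
var toId = vocabulary.Select((c, i) => (c, i)).ToDictionary(p => p.c, p => p.i);

int[] Encode(string s) => s.Select(ch => toId[ch]).ToArray();
string Decode(int[] ids) => new string(ids.Select(i => vocabulary[i]).ToArray());

// Each example: blockSize characters of context, plus the next character as the target.
const int blockSize = 8;
int[] data = Encode(text);
var examples = Enumerable.Range(0, data.Length - blockSize)
    .Select(start => (Context: data[start..(start + blockSize)], Target: data[start + blockSize]))
    .ToList();

Console.WriteLine($"Vocabulary size: {vocabulary.Length}");
Console.WriteLine($"Examples: {examples.Count}");
Console.WriteLine($"First: \"{Decode(examples[0].Context)}\" -> '{vocabulary[examples[0].Target]}'");
```

The vocabulary size reported by nanogpt is simply the number of distinct characters in its training document.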

Representative output:

nanoGPT: C# with TorchSharp

Path: data/nanogpt-training.txt

Vocabulary size: 60
Block size:     32
Embedding size: 64
Hidden size:    128

Step   1: loss = 4.1052
Step  20: loss = 3.1908
Step  40: loss = 3.0859
Step  60: loss = 2.8595
Step  80: loss = 2.5768

Prompt: az
Sample: az ...

Ask questions about data/nanogpt-training.txt.
Try: What is App Service useful for? or Why would I use Bicep or ARM templates?

Question: What is App Service useful for?
Answer: App Service is useful when a team wants a fast path to publish web projects. App Service for Linux can run custom container images for web applications. Hybrid Connections can connect an App Service application to on premises resources.
Evidence:
  App Service is useful when a team wants a fast path to publish web projects. (score 3)
  App Service for Linux can run custom container images for web applications. (score 2)
  Hybrid Connections can connect an App Service application to on premises resources. (score 2)

Submit a blank question to leave the question loop and finish the command. The generated sample still comes from the tiny TorchSharp character model; the question loop is deliberately grounded in the training document so the demo can answer useful questions without pretending that a small character model has suddenly become a semantic assistant.

Useful demo questions:

What is App Service useful for?
When should I use Azure Functions?
How can developers manage Azure resources?
What does Azure Monitor help with?
Why would I use Bicep or ARM templates?

The evidence score is a simple lexical overlap score from the local document retriever. It is intentionally visible and imperfect, which makes it useful for explaining why production retrieval systems often add embeddings, chunking, reranking, and a final LLM answer-generation step.
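A lexical overlap score can be as simple as counting how many distinct question words also appear in a sentence. The sketch below assumes lower-casing, splitting on spaces and punctuation, and a small hand-written stop-word list. The local retriever's exact rules may differ, so treat `Terms`, `Rank`, and the stop words as hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Words that carry little meaning for matching (an illustrative list).
var stopWords = new HashSet<string>
{
    "a", "an", "the", "is", "are", "what", "for", "to", "of", "how", "can",
    "when", "should", "why", "would", "does", "with", "i", "use", "on",
};

// Lower-case words with punctuation removed, minus stop words.
HashSet<string> Terms(string text) =>
    text.ToLowerInvariant()
        .Split(new[] { ' ', '.', ',', '?', '!', '\n' })
        .Where(w => w.Length > 0 && !stopWords.Contains(w))
        .ToHashSet();

// Score each sentence by the number of distinct question terms it shares, best first.
List<(string Sentence, int Score)> Rank(string question, IEnumerable<string> sentences)
{
    var questionTerms = Terms(question);
    return sentences
        .Select(s => (Sentence: s, Score: Terms(s).Count(t => questionTerms.Contains(t))))
        .Where(r => r.Score > 0)
        .OrderByDescending(r => r.Score)
        .ToList();
}

string[] document =
{
    "App Service is useful when a team wants a fast path to publish web projects.",
    "App Service for Linux can run custom container images for web applications.",
    "Azure Functions runs small pieces of code in response to events.",
};

var ranked = Rank("What is App Service useful for?", document);
foreach (var (sentence, score) in ranked)
    Console.WriteLine($"{sentence} (score {score})");
```

The weakness is easy to see here: a question phrased with synonyms ("host a website") would score zero against these sentences. That gap is what embeddings and reranking are meant to close.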

For a deeper technical walkthrough of the current implementation, see docs/nanogpt-technical-walkthrough.md.

nanoGPT v2

The nanogptv2 command keeps the same training document and interactive flow as nanogpt, but rebuilds the model, starting with how tokens enter it.

Version 1 manually expands token IDs into one-hot vectors and feeds the flattened result into a Linear layer. Version 2 sends integer token IDs into a TorchSharp Embedding layer first. The embedding table learns a dense vector for each character token during training.
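For a single token, the two approaches compute the same thing. Multiplying a one-hot vector by a weight matrix selects one row of that matrix, and an embedding lookup returns that row directly. A small sketch with plain arrays (sizes and weights are arbitrary; `OneHotTimesWeights` and `EmbeddingLookup` are hypothetical helpers, not TorchSharp APIs):

```csharp
using System;

const int vocabularySize = 5;
const int embeddingSize = 3;

// A weight table with one row per token. The values are arbitrary.
double[,] weights = new double[vocabularySize, embeddingSize];
for (int tok = 0; tok < vocabularySize; tok++)
    for (int dim = 0; dim < embeddingSize; dim++)
        weights[tok, dim] = tok * 10 + dim;

// Version 1 style: build a one-hot vector and multiply it by the weight matrix.
double[] OneHotTimesWeights(int tokenId)
{
    var oneHot = new double[vocabularySize];
    oneHot[tokenId] = 1.0;
    var result = new double[embeddingSize];
    for (int e = 0; e < embeddingSize; e++)
        for (int t = 0; t < vocabularySize; t++)
            result[e] += oneHot[t] * weights[t, e];
    return result;
}

// Version 2 style: an embedding layer reads the row for that token ID.
double[] EmbeddingLookup(int tokenId)
{
    var row = new double[embeddingSize];
    for (int e = 0; e < embeddingSize; e++)
        row[e] = weights[tokenId, e];
    return row;
}

Console.WriteLine(string.Join(", ", OneHotTimesWeights(2))); // 20, 21, 22
Console.WriteLine(string.Join(", ", EmbeddingLookup(2)));    // 20, 21, 22
```

The lookup skips all the multiply-by-zero work, which matters more and more as the vocabulary grows.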

Version 2 also adds positional embeddings. A second embedding table learns a vector for each position in the context window. The model adds token vector plus position vector before processing the sequence.
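In array terms, the vector for position t is x[t] = tokenTable[id_t] + positionTable[t]. A sketch with small hand-written tables (in the model, both tables are learned; the values and the `Embed` name here are hypothetical):

```csharp
using System;

const int blockSize = 4;
const int embeddingSize = 2;

// Hypothetical tables: 3 tokens and 4 positions, each mapped to a 2-dimensional vector.
double[][] tokenTable = { new[] { 1.0, 0.0 }, new[] { 0.0, 1.0 }, new[] { 1.0, 1.0 } };
double[][] positionTable = { new[] { 0.1, 0.1 }, new[] { 0.2, 0.2 }, new[] { 0.3, 0.3 }, new[] { 0.4, 0.4 } };

// x[t] = tokenTable[id_t] + positionTable[t] for every position in the window.
double[][] Embed(int[] tokenIds)
{
    if (tokenIds.Length > blockSize) throw new ArgumentException("context longer than blockSize");
    var x = new double[tokenIds.Length][];
    for (int t = 0; t < tokenIds.Length; t++)
    {
        x[t] = new double[embeddingSize];
        for (int e = 0; e < embeddingSize; e++)
            x[t][e] = tokenTable[tokenIds[t]][e] + positionTable[t][e];
    }
    return x;
}

// Token 2 appears at positions 0 and 3 and now gets two different vectors.
var embedded = Embed(new[] { 2, 0, 1, 2 });
Console.WriteLine(string.Join(" | ", Array.ConvertAll(embedded, v => string.Join(", ", v))));
```

Without the position term, both occurrences of token 2 would be identical, and the later layers could not tell where in the window each one sits.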

The current v2 step adds multi-head causal self-attention. The attention layer builds query, key, and value projections from the embedded sequence, splits them into multiple heads, runs causal attention independently in each head, concatenates the results, and projects them back to the original embedding width.
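The sketch below shows causal masking and head splitting with plain arrays. It simplifies under stated assumptions: the query, key, and value projections and the final output projection are left out (the usage passes the same matrix for q, k, and v), scores use the standard 1/sqrt(headSize) scaling, and `CausalHead` and `MultiHead` are hypothetical names.

```csharp
using System;

// Causal scaled dot-product attention for one head.
// q, k, v are [positions][headSize]; position t only attends to positions 0..t.
double[][] CausalHead(double[][] q, double[][] k, double[][] v)
{
    int positions = q.Length, size = q[0].Length;
    var output = new double[positions][];
    for (int t = 0; t < positions; t++)
    {
        // Scores against current and earlier positions only: this is the causal mask.
        var weights = new double[t + 1];
        double max = double.NegativeInfinity;
        for (int s = 0; s <= t; s++)
        {
            for (int i = 0; i < size; i++) weights[s] += q[t][i] * k[s][i];
            weights[s] /= Math.Sqrt(size);
            max = Math.Max(max, weights[s]);
        }
        // Softmax over the visible positions (subtracting max for stability).
        double total = 0;
        for (int s = 0; s <= t; s++)
        {
            weights[s] = Math.Exp(weights[s] - max);
            total += weights[s];
        }
        // Output is the attention-weighted average of the visible value vectors.
        output[t] = new double[size];
        for (int s = 0; s <= t; s++)
            for (int i = 0; i < size; i++)
                output[t][i] += weights[s] / total * v[s][i];
    }
    return output;
}

// Split the width into equal head slices, run each head independently, then concatenate.
double[][] MultiHead(double[][] q, double[][] k, double[][] v, int heads)
{
    int positions = q.Length, width = q[0].Length, headSize = width / heads;
    if (headSize * heads != width) throw new ArgumentException("width must divide evenly into heads");

    double[][] Slice(double[][] matrix, int head) =>
        Array.ConvertAll(matrix, row => row[(head * headSize)..((head + 1) * headSize)]);

    var result = new double[positions][];
    for (int t = 0; t < positions; t++) result[t] = new double[width];
    for (int h = 0; h < heads; h++)
    {
        double[][] headOutput = CausalHead(Slice(q, h), Slice(k, h), Slice(v, h));
        for (int t = 0; t < positions; t++)
            Array.Copy(headOutput[t], 0, result[t], h * headSize, headSize);
    }
    return result;
}

// Four positions, width 4, two heads of size 2.
double[][] x =
{
    new[] { 1.0, 0.0, 0.5, 0.5 },
    new[] { 0.0, 1.0, 0.5, -0.5 },
    new[] { 1.0, 1.0, 0.0, 1.0 },
    new[] { 0.5, 0.5, 1.0, 0.0 },
};
var attended = MultiHead(x, x, x, heads: 2);
foreach (var row in attended)
    Console.WriteLine(string.Join(", ", Array.ConvertAll(row, value => value.ToString("F3"))));
```

Because of the mask, the first position can only attend to itself, and changing a later position never alters the outputs for earlier ones. This is what lets every position be trained to predict its own next character.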

The model now also uses layer normalisation before attention and before the feed-forward network. Layer normalisation keeps each token vector in a steadier numerical range, which helps the later layers train more predictably.
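Layer normalisation rescales each token vector to zero mean and unit variance, then applies a learned per-feature scale and shift. A minimal sketch, assuming the common formulation with the usual default epsilon of 1e-5 (the repo's settings are not stated here, and `LayerNorm` is a hypothetical helper):

```csharp
using System;
using System.Linq;

// Normalise one token vector to mean 0 and variance 1, then apply a learnable
// scale (gamma) and shift (beta). Epsilon guards against division by zero.
double[] LayerNorm(double[] x, double[] gamma, double[] beta, double epsilon = 1e-5)
{
    double mean = x.Average();
    double variance = x.Select(v => (v - mean) * (v - mean)).Average();
    double inverseStd = 1.0 / Math.Sqrt(variance + epsilon);
    return x.Select((v, i) => (v - mean) * inverseStd * gamma[i] + beta[i]).ToArray();
}

double[] token = { 2.0, 4.0, 6.0, 8.0 };
double[] scale = { 1.0, 1.0, 1.0, 1.0 }; // typical initial gamma
double[] shift = { 0.0, 0.0, 0.0, 0.0 }; // typical initial beta
double[] normalised = LayerNorm(token, scale, shift);
Console.WriteLine(string.Join(", ", normalised.Select(n => n.ToString("F3"))));
```

Each vector is normalised on its own, across its features, so the result does not depend on batch size.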

The v2 model also uses residual connections. Attention no longer replaces the embedded sequence; instead, it produces an update that is added back to the original sequence. The feed-forward block follows the same pattern. In code terms, the model computes x = x + attention(layerNorm(x)) and then x = x + feedForward(layerNorm(x)).
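The residual pattern can be written as a small generic block. In this sketch, the sub-layers are passed in as functions so the wiring stays visible. `Block`, `Add`, and the stand-in sub-layers are hypothetical. Each normalisation gets its own function, because attention and the feed-forward network have separate layer norms.

```csharp
using System;
using System.Linq;

// Element-wise sum of two [positions][width] matrices.
double[][] Add(double[][] a, double[][] b) =>
    a.Select((row, t) => row.Select((value, i) => value + b[t][i]).ToArray()).ToArray();

// One pre-norm block: each sub-layer sees a normalised copy of x and returns
// an update that is added back to x instead of replacing it.
double[][] Block(
    double[][] x,
    Func<double[][], double[][]> attention,
    Func<double[][], double[][]> feedForward,
    Func<double[][], double[][]> layerNorm1,
    Func<double[][], double[][]> layerNorm2)
{
    x = Add(x, attention(layerNorm1(x)));   // x = x + attention(layerNorm(x))
    x = Add(x, feedForward(layerNorm2(x))); // x = x + feedForward(layerNorm(x))
    return x;
}

// Stand-in sub-layers for illustration only.
Func<double[][], double[][]> identity = m => m;
Func<double[][], double[][]> zeroUpdate = m => m.Select(r => new double[r.Length]).ToArray();
Func<double[][], double[][]> smallUpdate = m => m.Select(r => r.Select(v => 0.1 * v).ToArray()).ToArray();

double[][] input = { new[] { 1.0, 2.0 }, new[] { 3.0, 4.0 } };
var unchanged = Block(input, zeroUpdate, zeroUpdate, identity, identity);
var nudged = Block(input, smallUpdate, zeroUpdate, identity, identity);
Console.WriteLine(string.Join(" | ", nudged.Select(r => string.Join(", ", r))));
```

Because the update is added rather than substituted, a sub-layer that contributes nothing leaves the sequence unchanged. Gradients also get a direct path back through the addition, which is a large part of why residual connections make deeper stacks easier to train.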

The feed-forward block is now explicit. It transforms each token vector independently with embedding -> hidden -> embedding, preserving the sequence shape so the result can be added back through a residual connection. This differs from v1, where the flattened context went straight through one feed-forward-style network to produce logits.
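A position-wise feed-forward block can be sketched as two linear layers with an activation between them, applied to each token vector separately. ReLU is an assumption here (the activation the repo uses is not stated). The weights are arbitrary, and `Linear` and `FeedForward` are hypothetical helpers.

```csharp
using System;
using System.Linq;

// Dense layer: output[o] = bias[o] + sum_i input[i] * w[i, o].
double[] Linear(double[] input, double[,] w, double[] bias)
{
    var output = (double[])bias.Clone();
    for (int o = 0; o < output.Length; o++)
        for (int i = 0; i < input.Length; i++)
            output[o] += input[i] * w[i, o];
    return output;
}

// embedding -> hidden -> embedding, applied to each token vector on its own.
double[][] FeedForward(double[][] sequence, double[,] w1, double[] b1, double[,] w2, double[] b2) =>
    sequence.Select(token =>
    {
        var hidden = Linear(token, w1, b1).Select(h => Math.Max(0.0, h)).ToArray(); // ReLU (assumed)
        return Linear(hidden, w2, b2);
    }).ToArray();

// Embedding width 2, hidden width 4; all values are hypothetical.
double[,] inWeights = { { 1.0, 0.0, -1.0, 0.5 }, { 0.0, 1.0, 0.5, -1.0 } };
double[] inBias = { 0.0, 0.0, 0.0, 0.0 };
double[,] outWeights = { { 1.0, 0.0 }, { 0.0, 1.0 }, { 0.5, 0.5 }, { -0.5, 0.5 } };
double[] outBias = { 0.1, -0.1 };

double[][] seq = { new[] { 1.0, 2.0 }, new[] { -1.0, 0.5 }, new[] { 0.0, 0.0 } };
var ffOut = FeedForward(seq, inWeights, inBias, outWeights, outBias);
Console.WriteLine(string.Join(" | ", ffOut.Select(r => string.Join(", ", r))));
```

The output keeps the sequence shape, three positions of width 2, so it can be added straight back through the residual connection. The output for any single token matches what you get by running that token through the block on its own.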

The language-model head now produces logits for every position in the context window. Instead of flattening the whole sequence before scoring, v2 applies a final embedding -> vocabulary projection to each position, producing logits of shape batch x blockSize x vocabularySize. For now, the training loop still uses only the final position's logits, so the behaviour remains comparable with v1.
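A sketch of the shapes involved. It uses a three-dimensional array for the batch x blockSize x vocabularySize logits and computes cross-entropy only at the final position, matching the training loop described above. Random logits and hypothetical targets stand in for real model output, and `At` and `CrossEntropy` are hypothetical helpers.

```csharp
using System;
using System.Linq;

// logits[b, t, v]: one score per vocabulary entry, for every position of every sequence.
const int batch = 2, blockSize = 3, vocabularySize = 4;
var logits = new double[batch, blockSize, vocabularySize];
var random = new Random(1);
for (int bi = 0; bi < batch; bi++)
    for (int ti = 0; ti < blockSize; ti++)
        for (int vi = 0; vi < vocabularySize; vi++)
            logits[bi, ti, vi] = random.NextDouble();

// The vocabulary scores for sequence b at position t.
double[] At(int b, int t) =>
    Enumerable.Range(0, vocabularySize).Select(v => logits[b, t, v]).ToArray();

// Cross-entropy for one position: -log softmax(scores)[target], via log-sum-exp.
double CrossEntropy(double[] scores, int target)
{
    double max = scores.Max();
    double logSumExp = max + Math.Log(scores.Sum(s => Math.Exp(s - max)));
    return logSumExp - scores[target];
}

// Only the last position of each sequence contributes to the loss for now.
int[] finalTargets = { 1, 3 }; // hypothetical next-character IDs
double loss = Enumerable.Range(0, batch)
    .Average(b => CrossEntropy(At(b, blockSize - 1), finalTargets[b]));
Console.WriteLine($"final-position loss: {loss:F4}");
```

When every logit is equal, the loss is ln(vocabularySize). For the 60-character vocabulary in the nanogpt sample run, that is ln 60 ≈ 4.09, close to the step-1 loss of 4.1052 shown there. This is roughly where an untrained model should start. Moving to a sequence-level loss would average CrossEntropy over every position instead of only blockSize - 1.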

That gives the next architecture steps somewhere sensible to attach:

sequence-level logits
sequence-level loss
stacking the block
checkpoint save/load

Lesson 7 configuration

Lesson 7 can optionally call a live model through the OpenAI .NET SDK.

By default, the lesson still works without any configuration. If no API key is present, it falls back to the locally composed grounded answer so the demo remains runnable.

The easiest way to enable the live call is to edit lesson7.config.json in the repo root:

{
  "OpenAiApiKey": "your-api-key-here"
}

This file is ignored by git, so you can keep your local key there without committing it.

Alternatively, you can set an environment variable:

export OPENAI_API_KEY="your-api-key-here"

Then run lesson 7:

dotnet run --project ukazure.llm.cli.csproj -- lesson7

If no key is configured, lesson 7 will show a message like:

No API key configured. Set OPENAI_API_KEY or update lesson7.config.json.
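The fallback described above can be sketched with System.Text.Json. The lookup order (config file first, then the environment variable) and the `ResolveApiKey` name are assumptions for illustration; the lesson may check the two sources in a different order.

```csharp
using System;
using System.IO;
using System.Text.Json;
#nullable enable

// Hypothetical helper: read OpenAiApiKey from the config file, then fall back to
// OPENAI_API_KEY. Returns null when neither is set, which is when lesson 7 uses
// its locally composed grounded answer.
string? ResolveApiKey(string configPath)
{
    if (File.Exists(configPath))
    {
        using var document = JsonDocument.Parse(File.ReadAllText(configPath));
        if (document.RootElement.ValueKind == JsonValueKind.Object &&
            document.RootElement.TryGetProperty("OpenAiApiKey", out var property) &&
            property.ValueKind == JsonValueKind.String &&
            !string.IsNullOrWhiteSpace(property.GetString()))
        {
            return property.GetString();
        }
    }
    var fromEnvironment = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
    return string.IsNullOrWhiteSpace(fromEnvironment) ? null : fromEnvironment;
}

var apiKey = ResolveApiKey("lesson7.config.json");
Console.WriteLine(apiKey is null
    ? "No API key configured. Set OPENAI_API_KEY or update lesson7.config.json."
    : "API key found; lesson 7 can make the live call.");
```

Treating a blank value as missing means an empty config entry does not block the environment-variable fallback.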

Notes

The lessons are intentionally tiny and simplified.
The goal is clarity, not scale or performance.
Later lessons reuse ideas from earlier ones, so they work best as a sequence.
