This repository contains my implementation of a GPT-style Transformer trained for causal language modeling on a public-domain text corpus ("Twenty Thousand Leagues Under the Sea"). The goal was to understand the inner workings of the Transformer architecture, sampling strategies, and inference optimizations such as KV caching.
- Custom GPT-style Transformer built in PyTorch
- Causal self-attention with rotary positional embeddings (RoPE); a minimal sketch follows this list
- Support for:
  - Temperature scaling
  - Top-k sampling (both sampling knobs are illustrated in a sketch below)
  - Key-Value (KV) caching for faster autoregressive generation (see the caching sketch below)
- Training on GPU using Slurm-compatible scripts
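The exact attention code in `my_gpt.py` is not reproduced in this README, but the RoPE idea looks roughly like the following minimal sketch. The tensor layout `(batch, seq, heads, head_dim)` and the function name `apply_rope` are assumptions for illustration, not the repo's actual API:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to a query or key tensor.

    x: (batch, seq_len, n_heads, head_dim); head_dim must be even.
    Each consecutive channel pair is rotated by a position-dependent angle.
    """
    batch, seq_len, n_heads, head_dim = x.shape
    # Per-pair rotation frequencies: theta_i = base^(-2i / head_dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)      # (seq_len, head_dim / 2)
    cos = angles.cos()[None, :, None, :]           # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]            # even/odd channel pairs
    # 2-D rotation of each (x1, x2) pair by its position's angle
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)                     # interleave pairs back to head_dim
```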
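How `generate.py` combines the two sampling knobs internally is not shown here; this is a hedged sketch of the standard recipe (scale logits by temperature, keep the k largest, sample from the renormalized distribution). `sample_next_token` is an illustrative name, not a function from the repo:

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0, top_k: int = 40) -> int:
    """Sample a token id from a (vocab_size,) logits vector.

    temperature < 1 sharpens the distribution (more deterministic output);
    temperature > 1 flattens it (more diverse output). Top-k keeps only
    the k highest-scoring tokens before sampling.
    """
    # Temperature scaling: divide logits before the softmax
    logits = logits / max(temperature, 1e-8)
    # Top-k filtering: mask out everything below the k-th largest logit
    if top_k is not None and top_k < logits.size(-1):
        kth = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```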
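Likewise, here is a minimal single-head sketch of what `--use_kv_cache` speeds up: the cached key/value tensors grow by one step per generated token, so projections for past tokens are never recomputed. The class and module names are hypothetical, not the repo's:

```python
import torch
import torch.nn.functional as F

class CachedAttentionHead(torch.nn.Module):
    """Single attention head that appends each step's keys/values to a
    cache, so every decoding step attends over all past tokens without
    re-projecting them."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = torch.nn.Linear(d_model, d_model)
        self.k_proj = torch.nn.Linear(d_model, d_model)
        self.v_proj = torch.nn.Linear(d_model, d_model)
        self.cache_k = None
        self.cache_v = None

    def forward(self, x_new: torch.Tensor) -> torch.Tensor:
        # x_new: (batch, new_tokens, d_model); during decoding new_tokens == 1
        q = self.q_proj(x_new)
        k = self.k_proj(x_new)
        v = self.v_proj(x_new)
        # Append this step's K/V to the cache instead of recomputing history
        if self.cache_k is not None:
            k = torch.cat([self.cache_k, k], dim=1)
            v = torch.cat([self.cache_v, v], dim=1)
        self.cache_k, self.cache_v = k, v
        # Causal mask omitted: with one query per step, every cached key
        # is a past token, so all positions are legal to attend to
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        return torch.softmax(scores, dim=-1) @ v
```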
Train the model:

```bash
python my_gpt.py
```
Generate text with different sampling settings:

```bash
# Higher temperature and a wider top-k give more varied output
python generate.py --temperature 1.0 --top_k 40

# Lower temperature and a narrower top-k give more predictable output
python generate.py --temperature 0.1 --top_k 5

# The same settings with KV caching enabled for faster generation
python generate.py --use_kv_cache --temperature 1.0 --top_k 40
python generate.py --use_kv_cache --temperature 0.1 --top_k 5
```
By Zoha Khan