
Transformer Language Modeling with KV Caching and RoPE

This repository contains my implementation of a GPT-style Transformer trained for causal language modeling on a public-domain text corpus ("Twenty Thousand Leagues Under the Sea"). The goal was to understand the inner workings of the Transformer architecture, sampling strategies, and inference optimizations such as KV caching.

🔧 Features

  • Custom GPT-style Transformer built in PyTorch
  • Causal self-attention with rotary positional embeddings (RoPE); see the sketch after this list
  • Support for:
    • Temperature scaling
    • Top-k sampling
    • Key-Value (KV) caching for faster autoregressive generation
  • Training on GPU using Slurm-compatible scripts
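
For reference, the following is a minimal sketch of causal self-attention with RoPE. It assumes a (batch, heads, seq, head_dim) tensor layout and the rotate-half RoPE variant; the actual code in my_gpt.py may differ in these details.

```python
import torch
import torch.nn.functional as F

def rope(x, base=10000.0):
    # x: (batch, heads, seq, head_dim). Rotate channel pairs by a position-
    # dependent angle so query/key dot products encode relative position.
    *_, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=x.dtype, device=x.device) / half)
    angles = torch.arange(t, dtype=x.dtype, device=x.device)[:, None] * freqs  # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def causal_attention(q, k, v):
    # q, k, v: (batch, heads, seq, head_dim); RoPE is applied to q and k only.
    q, k = rope(q), rope(k)
    t, d = q.shape[-2], q.shape[-1]
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5
    mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=q.device), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))  # block attention to future tokens
    return F.softmax(scores, dim=-1) @ v
```

Because RoPE rotates queries and keys rather than adding a position vector to the embeddings, the attention scores depend only on the relative distance between positions.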

🚀 Example Commands: Training and Generation

Train the model:

python my_gpt.py

Generate with standard sampling:

python generate.py --temperature 1.0 --top_k 40

python generate.py --temperature 0.1 --top_k 5
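
The --temperature and --top_k flags above control the per-step decoding rule: temperature rescales the logits (low values approach greedy decoding), and top-k restricts sampling to the k most likely tokens. A minimal sketch of such a step (the function name sample_next_token is illustrative, not the repository's actual API):

```python
import torch

def sample_next_token(logits, temperature=1.0, top_k=40):
    # logits: (vocab_size,) scores for the next token.
    logits = logits / max(temperature, 1e-8)   # low temperature -> nearly greedy
    if top_k is not None:
        kth = torch.topk(logits, top_k).values[-1]                # k-th largest logit
        logits = logits.masked_fill(logits < kth, float("-inf"))  # keep top-k only
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```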

Generate with KV caching enabled:

python generate.py --use_kv_cache --temperature 1.0 --top_k 40

python generate.py --use_kv_cache --temperature 0.1 --top_k 5
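
With --use_kv_cache, each attention layer reuses the keys and values computed for earlier positions, so every decode step feeds the model only the newest token instead of re-encoding the whole prefix. Below is a minimal sketch of that loop, assuming a hypothetical model(ids, kv_cache=...) interface that returns logits and an updated cache, and reusing sample_next_token from the sketch above; generate.py's real interface may differ.

```python
import torch

@torch.no_grad()
def generate_with_cache(model, ids, max_new_tokens, **sample_kwargs):
    # ids: (1, prompt_len) token IDs. Hypothetical interface:
    # model(ids, kv_cache=...) -> (logits, updated_cache).
    logits, cache = model(ids, kv_cache=None)              # prefill: encode prompt once
    for _ in range(max_new_tokens):
        next_id = sample_next_token(logits[0, -1], **sample_kwargs)
        ids = torch.cat([ids, torch.tensor([[next_id]], device=ids.device)], dim=1)
        logits, cache = model(ids[:, -1:], kv_cache=cache)  # decode one token per step
    return ids
```

Without the cache, every step re-runs attention over the full sequence, so generation cost grows quadratically with output length; with the cache, each step is roughly linear in the current length.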

By Zoha Khan
