Add Parameter Golf submission: Vocab768_LinearPhaseInit_GatedXSA_EMA_… #1015

Closed
shram86 wants to merge 2 commits into openai:main from shram86:my-submission

shram86 commented Mar 28, 2026

Adds a new track_10min_16mb submission record based on a custom sp768 tokenizer export.

Result:

  • final_int6_roundtrip val_loss: 1.87181421
  • final_int6_roundtrip val_bpb: 1.21149167
  • total submission size int6+zstd: 15082805 bytes
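The two validation figures can be cross-checked against each other. The sketch below assumes that val_loss is mean cross-entropy in nats per token and that val_bpb is bits per byte of validation text; the PR does not define either, so both readings are assumptions. It also assumes that the "16mb" in the track name means 16,000,000 bytes. Under those assumptions it backs out the implied average bytes per token and the remaining size headroom:

```python
import math

# Figures reported in this PR.
VAL_LOSS = 1.87181421        # assumed: mean cross-entropy, nats per token
VAL_BPB = 1.21149167         # assumed: bits per byte of validation text
SUBMISSION_BYTES = 15082805  # int6 + zstd artifact size

# Assumption: the "16mb" in the track name means 16,000,000 bytes.
ASSUMED_CAP_BYTES = 16_000_000


def implied_bytes_per_token(val_loss_nats: float, val_bpb: float) -> float:
    """Back out bytes per token from bpb = (loss / ln 2) / bytes_per_token."""
    bits_per_token = val_loss_nats / math.log(2)
    return bits_per_token / val_bpb


bytes_per_token = implied_bytes_per_token(VAL_LOSS, VAL_BPB)
headroom = ASSUMED_CAP_BYTES - SUBMISSION_BYTES

print(f"implied bytes per token: {bytes_per_token:.3f}")
print(f"headroom under assumed cap: {headroom} bytes")
```

If those assumptions hold, the reported numbers imply a little over 2.2 bytes per token for the sp768 tokenizer, which is a quick sanity check that the loss and bpb values belong to the same evaluation.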

Included:

  • record README
  • submission.json
  • native train_gpt.py
  • run log

Main ingredients:

  • vocab_size=768
  • phase-mix init
  • gated XSA on last 2 layers
  • EMA
  • late matrix-only QAT
  • FlashAttention 3
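Two of these ingredients are easy to illustrate in isolation. The sketch below is not taken from the submission's train_gpt.py: it shows a generic exponential moving average of weights and a generic symmetric int6 quantize/dequantize round trip of the kind that quantization-aware training usually applies in the forward pass. The decay value, the per-row scale, the [-31, 31] level range, and both function names are assumptions for illustration; the PR does not state the actual scheme.

```python
def ema_update(ema: list[float], weights: list[float], decay: float = 0.999) -> list[float]:
    """Return the new EMA of the weights (decay is an assumed value)."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]


def int6_roundtrip(row: list[float]) -> list[float]:
    """Quantize one weight row to signed 6-bit levels, then dequantize it."""
    qmax = 31  # assumed symmetric range [-31, 31] out of the 64 int6 codes
    max_abs = max(abs(w) for w in row)
    if max_abs == 0.0:
        return [0.0 for _ in row]
    scale = max_abs / qmax  # assumed: one scale per row
    out = []
    for w in row:
        q = max(-qmax, min(qmax, round(w / scale)))  # nearest level, clamped
        out.append(q * scale)
    return out


row = [0.5, -1.0, 0.26, 0.0]
restored = int6_roundtrip(row)
step = max(abs(w) for w in row) / 31
worst_error = max(abs(a - b) for a, b in zip(row, restored))
print(f"step={step:.5f} worst_error={worst_error:.5f}")
```

With round-to-nearest, the per-weight error is bounded by half a quantization step, which is why a metric labelled "int6 roundtrip" can stay close to the unquantized one when the weights have been trained with the same rounding in the loop.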

Tokenizer/data export:

  • the sp768 tokenizer artifacts and retokenized dataset export were published to the Hugging Face dataset repo shramoff/golf768
  • locally, I used a small extension of cached_challenge_fineweb.py to support a custom repo id and root-level remote layout for sp768
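A minimal sketch of what "custom repo id and root-level remote layout" could look like follows. It is not the actual extension to cached_challenge_fineweb.py, which is not shown in this PR. The helper names, the `root_level` flag, and the nested `sp768/<file>` alternative are hypothetical; only the repo id and the `hf_hub_download` call from `huggingface_hub` are real.

```python
from pathlib import PurePosixPath

DEFAULT_REPO_ID = "shramoff/golf768"  # dataset repo named in this PR


def remote_path(filename: str, variant: str = "sp768", root_level: bool = True) -> str:
    """Build the in-repo path for a file (hypothetical layout helper).

    root_level=True  -> file sits at the repo root
    root_level=False -> file sits under a per-variant folder (assumed layout)
    """
    if root_level:
        return filename
    return str(PurePosixPath(variant) / filename)


def fetch(filename: str, repo_id: str = DEFAULT_REPO_ID, root_level: bool = True) -> str:
    """Download one file from the dataset repo and return its local cache path."""
    # Imported lazily so remote_path works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=repo_id,
        filename=remote_path(filename, root_level=root_level),
        repo_type="dataset",
    )
```

Calling `fetch("<file name>")` with a file name that exists in the repo would return the cached local path; the file names themselves are not listed in this PR, so none are assumed here.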

shram86 closed this Apr 7, 2026