Amaan9136/code-llm-training

CodeLLM Training System

A production-ready, end-to-end system for training LLMs on source code collected from GitHub repositories.

Architecture

GitHub Repos → Ingestion → Preprocessing → Dataset → Training → Fine-tuning → Inference API
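The arrow diagram describes a linear chain of stages, each consuming the previous stage's output. Below is a minimal sketch of that composition. It assumes each stage is a plain function over a list of records. `CodeFile`, `run_pipeline`, `drop_blank`, and `dedupe_exact` are hypothetical names used for illustration, not this repository's API.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CodeFile:
    """Hypothetical record: one source file taken from a repository."""
    repo: str
    path: str
    text: str


# A stage takes the current list of records and returns a new list.
Stage = Callable[[List[CodeFile]], List[CodeFile]]


def run_pipeline(files: List[CodeFile], stages: Iterable[Stage]) -> List[CodeFile]:
    """Apply the stages in order, feeding each stage's output to the next."""
    for stage in stages:
        files = stage(files)
    return files


def drop_blank(files: List[CodeFile]) -> List[CodeFile]:
    """Example preprocessing stage: discard empty or whitespace-only files."""
    return [f for f in files if f.text.strip()]


def dedupe_exact(files: List[CodeFile]) -> List[CodeFile]:
    """Example preprocessing stage: keep the first file for each distinct content hash."""
    seen = set()
    kept = []
    for f in files:
        digest = hashlib.sha256(f.text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(f)
    return kept


files = [
    CodeFile("demo", "a.py", "print('hi')\n"),
    CodeFile("demo", "copy_of_a.py", "print('hi')\n"),
    CodeFile("demo", "empty.py", "   \n"),
]
cleaned = run_pipeline(files, [drop_blank, dedupe_exact])
print([f.path for f in cleaned])  # ['a.py']
```

Keeping every stage as a standalone function makes each step testable in isolation. A system handling many repositories would more likely stream records than hold them all in a list.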

Quick Start

pip install -r requirements.txt
python scripts/ingest.py --repo https://github.com/Amaan9136/devlabs
python scripts/train.py --config config/training.yaml
python scripts/serve.py --model-path outputs/model
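`scripts/ingest.py` takes a repository URL, but its internals are not documented here. The sketch below covers only the code-extraction half. It assumes the repository has already been cloned to a local directory (for example with `git clone`). The extension allowlist, the skipped directories, and the size limit are illustrative guesses, not the project's settings. `iter_source_files` is a hypothetical name.

```python
from pathlib import Path
from typing import Iterator, Set, Tuple

# Illustrative allowlist; the project's real filter is not documented.
DEFAULT_EXTENSIONS: Set[str] = {".py", ".js", ".ts", ".java", ".go", ".rs", ".c", ".cpp", ".h"}
# Directories that rarely contain hand-written source.
SKIP_DIRS: Set[str] = {".git", "node_modules", "__pycache__", "venv", ".venv", "dist", "build"}


def iter_source_files(
    root, extensions: Set[str] = DEFAULT_EXTENSIONS, max_bytes: int = 1_000_000
) -> Iterator[Tuple[str, str]]:
    """Yield (relative_posix_path, text) for each source file under root."""
    root = Path(root)
    for path in sorted(root.rglob("*")):  # sorted for deterministic order
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Skip anything nested under an excluded directory.
        if any(part in SKIP_DIRS for part in rel.parts[:-1]):
            continue
        if path.suffix.lower() not in extensions:
            continue
        if path.stat().st_size > max_bytes:
            continue  # very large files are often generated or vendored
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary or non-UTF-8 files
        yield rel.as_posix(), text
```

For example, `dict(iter_source_files("devlabs"))` would map each kept path in a local clone to its contents.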

Components

  • core/ — Repository ingestion, code extraction, tokenization
  • pipeline/ — Data pipeline, dataset creation, preprocessing
  • training/ — Training loop, fine-tuning, checkpointing
  • inference/ — Model serving, inference engine
  • api/ — REST API for inference and management
  • ui/ — Web dashboard
  • config/ — Configuration files
  • scripts/ — CLI entry points
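For causal language modeling, dataset creation commonly concatenates tokenized files and cuts the resulting stream into fixed-length training blocks. The README does not say whether `pipeline/` works exactly this way. The sketch assumes documents have already been turned into integer token IDs by some tokenizer, and that `eos_id` is that tokenizer's end-of-sequence ID. `pack_sequences` is a hypothetical name.

```python
from typing import Iterable, List


def pack_sequences(token_docs: Iterable[List[int]], block_size: int, eos_id: int) -> List[List[int]]:
    """Concatenate tokenized documents, each followed by eos_id, and cut the
    stream into blocks of exactly block_size tokens. A trailing partial block
    is dropped."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    buffer: List[int] = []
    blocks: List[List[int]] = []
    for doc in token_docs:
        buffer.extend(doc)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    return blocks


print(pack_sequences([[1, 2, 3], [4, 5], [6]], block_size=4, eos_id=0))
# [[1, 2, 3, 0], [4, 5, 0, 6]]
```

Packing avoids wasting compute on padding tokens. The trade-off is that a block can span two files. Dropping the final partial block is one choice; padding it is another.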

About

Trains and fine-tunes LLMs on code from GitHub repositories, with automated ingestion, preprocessing, dataset building, checkpointing, and an inference pipeline.
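Checkpointing usually pairs periodic saves with a retention policy. This keeps disk use bounded and lets training resume from the newest save. Below is a minimal sketch of that bookkeeping. It assumes checkpoints are directories named `checkpoint-<step>`, which is a hypothetical convention because the repository's actual layout is not described here. `latest_checkpoint` and `checkpoints_to_prune` are illustrative names.

```python
import re
from typing import List, Optional

# Hypothetical naming scheme: one directory per save, e.g. "checkpoint-1500".
_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def _by_step(names: List[str]) -> List[str]:
    """Return checkpoint names sorted by numeric step, ignoring other entries."""
    steps = []
    for name in names:
        match = _CKPT_RE.match(name)
        if match:
            steps.append((int(match.group(1)), name))
    steps.sort()
    return [name for _, name in steps]


def latest_checkpoint(names: List[str]) -> Optional[str]:
    """Name of the checkpoint with the highest step, or None if there is none."""
    ordered = _by_step(names)
    return ordered[-1] if ordered else None


def checkpoints_to_prune(names: List[str], keep_last: int) -> List[str]:
    """Names of all but the newest keep_last checkpoints."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    return _by_step(names)[:-keep_last]


entries = ["checkpoint-100", "checkpoint-20", "logs", "checkpoint-300"]
print(latest_checkpoint(entries))        # checkpoint-300
print(checkpoints_to_prune(entries, 2))  # ['checkpoint-20']
```

Steps are compared as integers on purpose. A plain string sort would place `checkpoint-100` before `checkpoint-20` and prune the wrong directory.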
