Machine Learning Algorithms in C

A collection of popular machine learning algorithms implemented in C, featuring robust error handling, performance optimizations, and idiomatic C practices.

🚀 Features

Implemented Algorithms

  1. Linear Regression - Gradient descent with early stopping
  2. Ridge Regression - L2 regularization with gradient descent
  3. K-Nearest Neighbors - Optimized for both regression and classification
  4. Random Forest - Ensemble of decision trees with bootstrap sampling
  5. XGBoost - Simplified gradient boosting implementation

📁 Project Structure

faster_ml/
├── algorithms.c      # Main implementation file
├── Makefile         # Build configuration
├── README.md        # This file
└── .gitignore       # Git ignore patterns

🛠️ Building and Running

Prerequisites

  • GCC compiler (version 4.9 or higher)
  • Make utility
  • Math library (linked automatically)

Quick Start

# Build release version
make

# Run the program
make test

# Or run directly
./ml_algorithms

Build Options

# Debug build with extra warnings
make debug
make test-debug

# Build with memory sanitizer (for debugging)
make sanitize
make test-sanitize

# Memory leak checking with valgrind
make memcheck

# Performance profiling
make profile

# Clean build artifacts
make clean

# Show all available targets
make help

📊 Algorithm Details

1. Linear Regression

  • Method: Gradient descent with early stopping
  • Features: Automatic learning rate, progress monitoring
  • Optimizations: Loop unrolling, early stopping with patience

2. Ridge Regression

  • Method: L2-regularized gradient descent
  • Features: Configurable regularization strength
  • Optimizations: Same as linear regression plus regularization

3. K-Nearest Neighbors

  • Method: Lazy learning with optimized distance calculation
  • Features: Supports both regression and classification
  • Optimizations:
    • Loop unrolling in distance calculation
    • Partial sorting for finding k nearest neighbors
    • Configurable sorting threshold

4. Random Forest

  • Method: Ensemble of decision trees with bootstrap sampling
  • Features:
    • Feature subsampling for diversity
    • Configurable tree depth and minimum samples
    • Progress reporting during training
  • Optimizations: Limited feature search for better performance

5. XGBoost

  • Method: Gradient boosting with decision trees
  • Features:
    • Configurable learning rate
    • Residual-based tree building
    • Progress monitoring
  • Optimizations: Efficient residual calculation and tree building

🔧 API Usage

Basic Usage Example

#include <stdio.h>
#include <stdlib.h>

/* Dataset, LinearRegression, and the functions used below are
   defined in algorithms.c */
// Create dataset
int n_samples = 100;
int n_features = 2;
Dataset *ds = create_dataset(n_samples, n_features);

// Fill dataset with your data
// ... populate ds->data and ds->target ...

// Train Linear Regression
LinearRegression *lr = create_linear_regression(n_features);
train_linear_regression(lr, ds, 0.01, 1000);

// Make prediction
double test_sample[2] = {5.0, 3.0};
double prediction = predict_linear_regression(lr, test_sample);
printf("Prediction: %.2f\n", prediction);

// Cleanup
free_linear_regression(lr);
free_dataset(ds);

Error Handling

All functions include comprehensive error checking:

// Functions return NULL on error and print error messages
LinearRegression *lr = create_linear_regression(-1);  // Will return NULL
if (lr == NULL) {
    // Handle error
    return -1;
}
