A comprehensive, production-ready implementation of popular machine learning algorithms in C, featuring robust error handling, performance optimizations, and best practices.
- Linear Regression - Gradient descent with early stopping
- Ridge Regression - L2 regularization with gradient descent
- K-Nearest Neighbors - Optimized for both regression and classification
- Random Forest - Ensemble of decision trees with bootstrap sampling
- XGBoost - Simplified gradient boosting implementation
```
faster_ml/
├── algorithms.c   # Main implementation file
├── Makefile       # Build configuration
├── README.md      # This file
└── .gitignore     # Git ignore patterns
```
- GCC compiler (version 4.9 or higher)
- Make utility
- Math library (linked automatically)
```bash
# Build release version
make

# Run the program
make test

# Or run directly
./ml_algorithms

# Debug build with extra warnings
make debug
make test-debug

# Build with memory sanitizer (for debugging)
make sanitize
make test-sanitize

# Memory leak checking with valgrind
make memcheck

# Performance profiling
make profile

# Clean build artifacts
make clean

# Show all available targets
make help
```

### Linear Regression

- Method: Gradient descent with early stopping
- Features: Automatic learning rate, progress monitoring
- Optimizations: Loop unrolling, early stopping with patience
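The early-stopping idea can be sketched as follows. This is an illustrative helper, not the library's actual API: training halts once the loss has failed to improve for `patience` consecutive iterations.

```c
#include <stddef.h>

/* Sketch of early stopping with patience: given the per-iteration
 * training losses, return the iteration at which training would stop
 * because the loss has not improved (by more than `tol`) for
 * `patience` consecutive iterations. Names are illustrative. */
static int early_stop_iteration(const double *losses, int n,
                                int patience, double tol) {
    double best = losses[0];
    int since_improve = 0;
    for (int i = 1; i < n; i++) {
        if (losses[i] < best - tol) {
            best = losses[i];      /* new best loss: reset the counter */
            since_improve = 0;
        } else if (++since_improve >= patience) {
            return i;              /* patience exhausted: stop here */
        }
    }
    return n - 1;                  /* ran to completion */
}
```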
### Ridge Regression

- Method: L2-regularized gradient descent
- Features: Configurable regularization strength
- Optimizations: Same as linear regression plus regularization
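The only change relative to plain linear regression is the regularization term in the weight update. A minimal sketch of one gradient-descent step (the function name and signature are illustrative, not the library's API):

```c
/* One L2-regularized gradient step: w[j] -= lr * (dL/dw[j] + lambda * w[j]).
 * `grad` holds the unregularized loss gradient; `lambda` is the
 * regularization strength. */
static void ridge_step(double *w, const double *grad, int n_features,
                       double lr, double lambda) {
    for (int j = 0; j < n_features; j++)
        w[j] -= lr * (grad[j] + lambda * w[j]);
}
```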
### K-Nearest Neighbors

- Method: Lazy learning with optimized distance calculation
- Features: Supports both regression and classification
- Optimizations:
- Loop unrolling in distance calculation
- Partial sorting for finding k nearest neighbors
- Configurable sorting threshold
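The partial-sorting optimization can be sketched as maintaining only the k smallest distances in a small sorted buffer, which costs O(n·k) instead of the O(n log n) of a full sort. This is an illustrative helper, not the library's actual function:

```c
#include <float.h>

/* Keep the k smallest values of dist[0..n) in `out`, sorted ascending.
 * Each candidate is compared against the current k-th smallest and
 * inserted in place only if it qualifies. */
static void k_smallest(const double *dist, int n, int k, double *out) {
    for (int i = 0; i < k; i++)
        out[i] = DBL_MAX;
    for (int i = 0; i < n; i++) {
        if (dist[i] >= out[k - 1])
            continue;              /* not among the k nearest */
        int j = k - 1;
        while (j > 0 && out[j - 1] > dist[i]) {
            out[j] = out[j - 1];   /* shift larger entries right */
            j--;
        }
        out[j] = dist[i];
    }
}
```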
### Random Forest

- Method: Ensemble of decision trees with bootstrap sampling
- Features:
- Feature subsampling for diversity
- Configurable tree depth and minimum samples
- Progress reporting during training
- Optimizations: Limited feature search for better performance
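Bootstrap sampling means each tree trains on n sample indices drawn with replacement, so trees see different (overlapping) subsets of the data. A minimal sketch, using `rand()` for brevity (the real implementation may use a different RNG):

```c
#include <stdlib.h>

/* Fill idx[0..n) with indices drawn uniformly at random, with
 * replacement, from [0, n). Each tree in the forest is trained on
 * one such resample of the original dataset. */
static void bootstrap_indices(int *idx, int n) {
    for (int i = 0; i < n; i++)
        idx[i] = rand() % n;
}
```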
### XGBoost

- Method: Gradient boosting with decision trees
- Features:
- Configurable learning rate
- Residual-based tree building
- Progress monitoring
- Optimizations: Efficient residual calculation and tree building
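The residual update at the heart of squared-error gradient boosting can be sketched as: after each tree is fit, shrink its predictions by the learning rate, add them to the ensemble prediction, and recompute the residuals the next tree will fit. Names here are illustrative of the idea, not the actual API:

```c
/* After fitting one tree, fold its (shrunken) predictions into the
 * ensemble and recompute residuals for the next boosting round. */
static void boosting_update(double *pred, double *residual,
                            const double *tree_pred, const double *y,
                            int n, double learning_rate) {
    for (int i = 0; i < n; i++) {
        pred[i] += learning_rate * tree_pred[i];
        residual[i] = y[i] - pred[i];  /* next tree fits these */
    }
}
```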
```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // Create dataset
    int n_samples = 100;
    int n_features = 2;
    Dataset *ds = create_dataset(n_samples, n_features);

    // Fill dataset with your data
    // ... populate ds->data and ds->target ...

    // Train Linear Regression
    LinearRegression *lr = create_linear_regression(n_features);
    train_linear_regression(lr, ds, 0.01, 1000);

    // Make prediction
    double test_sample[2] = {5.0, 3.0};
    double prediction = predict_linear_regression(lr, test_sample);
    printf("Prediction: %.2f\n", prediction);

    // Cleanup
    free_linear_regression(lr);
    free_dataset(ds);
    return 0;
}
```

All functions include comprehensive error checking:
```c
// Functions return NULL on error and print error messages
LinearRegression *lr = create_linear_regression(-1); // Will return NULL
if (lr == NULL) {
    // Handle error
    return -1;
}
```