This repository contains a simple implementation of a Linear Regression model, designed to predict car prices based on their mileage. The model is trained using the Gradient Descent optimization algorithm and includes important techniques like feature scaling to ensure stable and efficient learning.
The project is organized into the following directories and files:
- `callmemehdy-linear_regression/`
  - `README.md`: this file.
  - `data.csv`: the dataset used for training, containing `km` and `price` data.
  - `Makefile`: convenient commands for setting up the environment and running the training script.
  - `myreadme.md`: a supplementary Markdown file explaining basic calculus differentiation rules.
  - `requirements.txt`: lists all Python dependencies required for this project.
  - `src/`
    - `linear_regression.py`: the core Python class that implements the Linear Regression model, including gradient descent and plotting functions.
    - `main.py`: the main script used to parse arguments, load data, train the model, and make predictions.
- Linear Regression Implementation: A foundational machine learning algorithm for modeling the linear relationship between car mileage and price.
- Gradient Descent Optimization: Employs gradient descent to iteratively adjust the model's parameters (slope and intercept) to minimize prediction errors.
- Feature Scaling (Normalization): Automatically scales the mileage and price data to a common range. This helps gradient descent converge quickly and keeps large input values from destabilizing the training process.
- Parameter Denormalization: After training on scaled data, the model's parameters are transformed back to the original data scale, allowing for straightforward predictions with unscaled mileage inputs.
- Divergence Detection: The training process includes checks to detect if the model's parameters are diverging (e.g., due to an overly aggressive learning rate). If divergence occurs, training is halted with a helpful message.
- Data Visualization: Upon completion of training, the model generates plots to visualize:
  - The original data points along with the fitted regression line.
  - The cost function's value over each training iteration, providing insight into the learning process.
- Model Persistence: The learned model parameters (slope and intercept) are saved to a `theta.csv` file, so the model can be reused for predictions without retraining.
- Command-Line Interface: The `main.py` script supports command-line arguments to customize key training parameters such as the learning rate and the number of iterations. It also offers an `--info` flag to display dataset statistics.
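The gradient descent, feature scaling, and denormalization steps listed above can be summarized in a few equations. This is a sketch under assumptions this README does not state: min-max scaling to [0, 1], a mean-squared-error cost with the conventional $\frac{1}{2m}$ factor, and a hypothesis $\hat{y} = \theta_0 + \theta_1 x$ with $x$ the mileage and $y$ the price. Primed symbols are the parameters learned on the scaled data, $m$ is the number of rows, and $\alpha$ is the learning rate; `src/linear_regression.py` may differ in the details.

```latex
\begin{aligned}
x'_i &= \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \qquad
y'_i = \frac{y_i - y_{\min}}{y_{\max} - y_{\min}} \\
J(\theta'_0, \theta'_1) &= \frac{1}{2m} \sum_{i=1}^{m} \left(\theta'_0 + \theta'_1 x'_i - y'_i\right)^2 \\
\theta'_0 &\leftarrow \theta'_0 - \frac{\alpha}{m} \sum_{i=1}^{m} \left(\theta'_0 + \theta'_1 x'_i - y'_i\right) \\
\theta'_1 &\leftarrow \theta'_1 - \frac{\alpha}{m} \sum_{i=1}^{m} \left(\theta'_0 + \theta'_1 x'_i - y'_i\right) x'_i \\
\theta_1 &= \theta'_1 \, \frac{y_{\max} - y_{\min}}{x_{\max} - x_{\min}}, \qquad
\theta_0 = y_{\min} + (y_{\max} - y_{\min})\,\theta'_0 - \theta_1 x_{\min}
\end{aligned}
```

Both updates use the parameter values from the previous iteration (a simultaneous update). The last line follows from substituting the scaling formulas into $y' = \theta'_0 + \theta'_1 x'$ and solving for $y$, which gives a model that accepts raw mileage directly.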
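A minimal, dependency-free sketch of such a training loop follows. `min_max_scale` and `train` are hypothetical names, not the API of `src/linear_regression.py`; the min-max scaling scheme, the default hyperparameters, and the use of a non-finite check for divergence detection are assumptions chosen for illustration.

```python
import math


def min_max_scale(values):
    """Scale values to [0, 1]. Returns (scaled, min, max)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant column")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def train(km, price, learning_rate=0.1, iterations=1000):
    """Fit price ~ theta0 + theta1 * km with batch gradient descent.

    Hypothetical helper (assumed names and defaults). Training happens on
    min-max scaled data; the returned parameters are on the ORIGINAL scale,
    together with the cost recorded at each iteration.
    """
    x, x_min, x_max = min_max_scale(km)
    y, y_min, y_max = min_max_scale(price)
    m = len(x)
    t0, t1 = 0.0, 0.0  # parameters on the scaled data
    history = []
    for _ in range(iterations):
        errors = [t0 + t1 * xi - yi for xi, yi in zip(x, y)]
        # Cost J = 1/(2m) * sum(error^2) for the parameters before this step
        cost = sum(e * e for e in errors) / (2 * m)
        history.append(cost)
        # Partial derivatives of J; both use the same (old) errors
        grad0 = sum(errors) / m
        grad1 = sum(e * xi for e, xi in zip(errors, x)) / m
        t0 -= learning_rate * grad0
        t1 -= learning_rate * grad1
        # Assumed divergence check: stop once values overflow to inf/NaN
        if not (math.isfinite(t0) and math.isfinite(t1) and math.isfinite(cost)):
            raise RuntimeError("Training diverged; try a smaller learning rate.")
    # Denormalize so predictions can use raw mileage
    x_range, y_range = x_max - x_min, y_max - y_min
    theta1 = t1 * y_range / x_range
    theta0 = y_min + y_range * t0 - theta1 * x_min
    return theta0, theta1, history
```

For example, `train([0, 10, 20, 30, 40], [100, 80, 60, 40, 20], learning_rate=0.5, iterations=5000)` recovers an intercept of about 100 and a slope of about -2 on this toy, perfectly linear data, while a learning rate of 10 on the same data diverges and raises `RuntimeError`. Scaling both columns keeps the gradients of similar magnitude, which is why a single learning rate works for both parameters.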
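Model persistence could be handled with Python's standard `csv` module, as in the sketch below. The layout of `theta.csv` (a `theta0,theta1` header followed by one data row) and the helper names `save_theta`, `load_theta`, and `estimate_price` are hypothetical; the format the project actually writes is defined in `src/`.

```python
import csv


def save_theta(theta0, theta1, path="theta.csv"):
    """Write the intercept and slope (assumed layout: header + one row)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["theta0", "theta1"])
        writer.writerow([theta0, theta1])


def load_theta(path="theta.csv"):
    """Read back the parameters written by save_theta."""
    with open(path, newline="") as f:
        row = next(csv.DictReader(f))
    return float(row["theta0"]), float(row["theta1"])


def estimate_price(km, theta0, theta1):
    """Predict a price from raw (unscaled) mileage."""
    return theta0 + theta1 * km
```

Because the saved parameters are already denormalized, a prediction needs only the two stored numbers and the raw mileage; no scaling statistics have to be persisted.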
To get this project up and running on your local machine, follow these steps.
You will need Python 3 installed.
- Clone the repository to your local machine with `git clone https://github.com/your-username/callmemehdy-linear_regression.git`, then change into it with `cd callmemehdy-linear_regression`.
- Create a Python virtual environment (recommended to manage dependencies) with `python3 -m venv env`, then activate it with `source env/bin/activate` on Linux/macOS or `.\env\Scripts\activate` on Windows.
- Install the required Python packages using pip: `pip install -r requirements.txt`.
The data.csv file included in this repository serves as a sample dataset. It contains two columns:
- `km`: the mileage of a car in kilometers.
- `price`: the price of the car.
Feel free to replace data.csv with your own dataset, ensuring it has these two columns for compatibility with the model.
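One way to check that a replacement dataset is compatible is to read it with the standard `csv` module and confirm that both columns are present. This sketch assumes a comma-separated file with a header row; `load_dataset` is a hypothetical helper, not a function from `src/main.py`.

```python
import csv


def load_dataset(path="data.csv"):
    """Return (km, price) as lists of floats; raise if a column is missing."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file, so fall back to an empty list
        missing = {"km", "price"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path} is missing column(s): {', '.join(sorted(missing))}")
        km, price = [], []
        for row in reader:
            km.append(float(row["km"]))
            price.append(float(row["price"]))
    return km, price
```

Failing early with a clear message is friendlier than letting a `KeyError` surface halfway through training.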
The main.py script handles the training and evaluation of the linear regression model.
You can initiate the training process either through the provided Makefile or by directly executing main.py.
Using the Makefile:
The Makefile includes a default target, `all`, which runs the training with a set of predefined parameters (a learning rate of 0.001 and 3000 iterations).