GUI-data-cleaning-automation-tool-python

  1. Project Overview

This project is a GUI-based Data Cleaning Automation Tool built in Python to simplify and automate common preprocessing tasks for Excel and CSV files.

The tool lets non-technical users upload a raw dataset, select cleaning options via checkboxes, and generate a cleaned output file without writing any code.

The project mirrors how data analysts automate repetitive data-cleaning workflows to improve efficiency and data quality.

  2. Problem Statement

Raw datasets often contain:

~ Duplicate rows

~ Blank rows

~ Inconsistent text formatting

~ Leading and trailing spaces

~ Missing or null values

~ Data inconsistencies and errors

Manual cleaning in Excel is:

~ Time-consuming

~ Error-prone

~ Not scalable

  3. Solution

This tool provides a Graphical User Interface (GUI) that enables users to:

✔ Browse and upload Excel/CSV files

✔ Remove duplicate rows

✔ Remove blank rows

✔ Trim leading & trailing spaces (text columns only)

✔ Convert text columns to Title Case

✔ Handle null and missing values

✔ Debug common data errors

All through a simple, user-friendly interface.
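The cleaning options listed above can be sketched as small, independent functions. This is a minimal illustration using only the standard library, not the project's actual code: the README does not name the library the tool uses, and the `Row` representation, the function names, and the `"N/A"` placeholder are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical representation: one record maps column name -> cell text (None if missing).
Row = Dict[str, Optional[str]]


def remove_duplicates(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def remove_blank_rows(rows: List[Row]) -> List[Row]:
    """Drop rows in which every cell is missing, empty, or whitespace only."""
    return [row for row in rows if any((value or "").strip() for value in row.values())]


def trim_spaces(rows: List[Row]) -> List[Row]:
    """Strip leading and trailing spaces from text cells; leave missing cells alone."""
    return [
        {col: value.strip() if isinstance(value, str) else value for col, value in row.items()}
        for row in rows
    ]


def to_title_case(rows: List[Row], columns: Iterable[str]) -> List[Row]:
    """Convert the named text columns to Title Case."""
    targets = set(columns)
    return [
        {
            col: value.title() if col in targets and isinstance(value, str) else value
            for col, value in row.items()
        }
        for row in rows
    ]


def fill_missing(rows: List[Row], placeholder: str = "N/A") -> List[Row]:
    """Replace missing or empty cells with a placeholder (assumed strategy)."""
    return [
        {col: placeholder if value is None or not value.strip() else value for col, value in row.items()}
        for row in rows
    ]
```

Each function returns a new list rather than modifying its input, so the steps can be applied in any combination and the raw data stays untouched until the cleaned file is written.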

  4. How It Works

4.1 The user selects a raw Excel/CSV file

4.2 The user chooses cleaning options via checkboxes

4.3 The system applies selected preprocessing steps

4.4 A cleaned output file is generated automatically

The tool applies cleaning logic programmatically while preserving dataset integrity.
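One way to picture this flow is a function that reads the file, applies only the steps whose option is switched on, and writes the result. The sketch below is an assumption-laden illustration for CSV input only: the option names, the `STEPS` registry, and `clean_csv` are hypothetical, and it assumes every row has the same number of fields as the header.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def drop_blank_rows(rows: List[Row]) -> List[Row]:
    """Remove rows whose cells are all empty or whitespace."""
    return [row for row in rows if any((value or "").strip() for value in row.values())]


def trim_cells(rows: List[Row]) -> List[Row]:
    """Strip leading and trailing spaces from every text cell."""
    return [
        {col: value.strip() if isinstance(value, str) else value for col, value in row.items()}
        for row in rows
    ]


def drop_duplicates(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical registry: option name (one per checkbox) -> step, in execution order.
# Trimming runs before de-duplication so that " alice " and "alice" count as the same value.
STEPS: List[Tuple[str, Step]] = [
    ("remove_blank_rows", drop_blank_rows),
    ("trim_spaces", trim_cells),
    ("remove_duplicates", drop_duplicates),
]


def clean_csv(raw_text: str, options: Dict[str, bool]) -> str:
    """Apply the selected steps to CSV text and return the cleaned CSV text."""
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = list(reader)
    fieldnames = reader.fieldnames or []

    for name, step in STEPS:
        if options.get(name, False):
            rows = step(rows)

    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return output.getvalue()
```

Keeping the execution order in one list means the result does not depend on the order in which the user ticks the boxes.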

  5. Key Concepts & Skills Applied

~ Data preprocessing principles

~ Automation of repetitive workflows

~ GUI-based user interaction

~ Conditional logic & validation

~ Error handling

~ Data quality improvement strategies
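As an illustration of the validation and error-handling ideas above, a tool like this might check the selected file before any cleaning starts. The following is a hedged sketch, not the project's code: `InputFileError`, `validate_input_path`, and the set of accepted extensions are assumptions.

```python
from pathlib import Path

# Assumed set of accepted extensions; the README only says "Excel/CSV".
SUPPORTED_EXTENSIONS = {".csv", ".xlsx", ".xls"}


class InputFileError(Exception):
    """Raised when the selected file cannot be used as input (hypothetical)."""


def validate_input_path(path_text: str) -> Path:
    """Return the path if it points to an existing, supported file; raise otherwise."""
    path = Path(path_text)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise InputFileError(f"Unsupported file type: {path.suffix or '(none)'}")
    if not path.is_file():
        raise InputFileError(f"File not found: {path}")
    return path
```

Raising one dedicated exception type lets the GUI layer catch it in a single place and show the message to the user instead of crashing.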

  6. Technologies Used

6.1 Python

6.2 File handling (Excel/CSV)

6.3 Data cleaning logic

6.4 GUI framework

6.5 Automation workflow design
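The README does not say which GUI framework the tool uses. Assuming Python's built-in `tkinter`, a file picker with one checkbox per cleaning option could look roughly like the sketch below. The option names, labels, and the `build_window` and `selected_options` helpers are hypothetical, and the "Clean" button only prints the selection instead of running any cleaning.

```python
from typing import Dict, List

# Hypothetical option names and the checkbox labels shown to the user.
OPTION_LABELS = {
    "remove_duplicates": "Remove duplicate rows",
    "remove_blank_rows": "Remove blank rows",
    "trim_spaces": "Trim leading & trailing spaces",
    "title_case": "Convert text columns to Title Case",
}


def selected_options(states: Dict[str, bool]) -> List[str]:
    """Return the names of the ticked options, in display order."""
    return [name for name in OPTION_LABELS if states.get(name)]


def build_window():
    """Build the main window (tkinter is an assumed choice of framework)."""
    # Imported here so selected_options() can be used without a display.
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.title("Data Cleaning Tool")
    chosen_file = tk.StringVar(master=root, value="No file selected")
    variables = {}

    def browse():
        path = filedialog.askopenfilename(
            filetypes=[("Data files", "*.csv *.xlsx"), ("All files", "*.*")]
        )
        if path:
            chosen_file.set(path)

    tk.Button(root, text="Browse...", command=browse).pack(anchor="w")
    tk.Label(root, textvariable=chosen_file).pack(anchor="w")

    # One BooleanVar per checkbox holds its ticked/unticked state.
    for name, label in OPTION_LABELS.items():
        var = tk.BooleanVar(master=root, value=False)
        tk.Checkbutton(root, text=label, variable=var).pack(anchor="w")
        variables[name] = var

    def run():
        states = {name: var.get() for name, var in variables.items()}
        # Placeholder action: a real tool would start the cleaning steps here.
        print(chosen_file.get(), selected_options(states))

    tk.Button(root, text="Clean", command=run).pack(anchor="w")
    return root
```

Calling `build_window().mainloop()` on a machine with a display would open the window. Keeping `selected_options` separate from the widgets makes the selection logic testable without starting the GUI.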

  7. Project Outcome

~ Reduced manual cleaning time

~ Improved dataset consistency

~ Automated repetitive preprocessing tasks

~ Built a reusable analyst-focused cleaning tool

  8. Future Enhancements

~ Add preview window before export

~ Add data profiling summary (basic statistics)

~ Integrate logging system

~ Add automated report generation

~ Convert into standalone executable (.exe)