hersheys-ITA/Predictive-Maintenance

🚀 Predictive Maintenance of Aircraft Engines using LSTM for RUL Prediction

📋 Table of Contents

  1. Project Overview
  2. Dataset Explanation
  3. Task Breakdown
  4. Getting Started
  5. Detailed Explanations
  6. Results & Evaluation

🎯 Project Overview

What is This Project?

This project implements a Long Short-Term Memory (LSTM) neural network to predict the Remaining Useful Life (RUL) of aircraft turbofan engines before they fail.

Real-world Application:

  • Airlines operate thousands of engines across their fleet
  • Unexpected engine failures → expensive maintenance + flight delays
  • Predictive maintenance → schedule repairs proactively
  • Your model predicts: "In how many operational cycles will this engine fail?"

Why Predictive Maintenance?

Scenario            |  Cost                    |  Impact
--------------------|--------------------------|----------------------------------
Unplanned Failure   |  $500K-$2M per incident  |  Safety hazard, customer distrust
Predictive Schedule |  $100K-$300K (planned)   |  Safety, revenue protection
Savings             |  50-70% cost reduction   |  Better fleet utilization

📊 Dataset Explanation: CMAPSS (Commercial Modular Aero-Propulsion System Simulation)

Dataset Overview

NASA provides 4 increasingly complex datasets (FD001, FD002, FD003, FD004):

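Aspect         |  FD001          |  FD002         |  FD003          |  FD004
---------------|-----------------|----------------|-----------------|----------------
Train Engines  |  100            |  260           |  100            |  248
Test Engines   |  100            |  259           |  100            |  249
Op. Conditions |  1 (Sea Level)  |  6 (Variable)  |  1 (Sea Level)  |  6 (Variable)
Fault Modes    |  1 (HPC)        |  1 (HPC)       |  2 (HPC + Fan)  |  2 (HPC + Fan)
Difficulty     |  🟢 Easy        |  🟡 Medium     |  🟡 Medium      |  🔴 Hard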

Data Structure: What Each Column Means

Each row represents one engine snapshot during one operational cycle:

Column  |  Description
--------|--------------------------------------------------
1       |  Unit/Engine ID (1-100 for FD001, 1-260 for FD002)
2       |  Time in cycles (how many cycles this engine has run)
3-5     |  Operational Settings (altitude, Mach number, throttle resolver angle)
6-26    |  Sensor Measurements (21 different sensor readings)

Example Data

Engine 1, Cycle 1:
1 1 -0.0007 -0.0004 100.00 518.67 641.82 ... [19 more sensor values]

Engine 1, Cycle 2:
1 2 -0.0007 -0.0004 100.00 518.67 642.15 ...

... (cycles 3 to 191 for Engine 1)

Engine 1, Cycle 192 (FAILURE):
1 192 0.0411 0.0440 100.00 518.67 2388.04 ...
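
The 26-column layout described above maps naturally onto a pandas DataFrame. The following is a minimal sketch, not the notebook's loader: the column names (`unit_id`, `cycle`, `setting_1` to `setting_3`, `s_1` to `s_21`) are an assumed naming scheme chosen to match the API payload shown later in this README, and the inline rows are placeholders rather than real data.

```python
import io
import pandas as pd

# Assumed column names: 2 index columns, 3 operational settings, 21 sensors.
COLUMNS = (
    ["unit_id", "cycle"]
    + [f"setting_{i}" for i in range(1, 4)]
    + [f"s_{i}" for i in range(1, 22)]
)

def load_cmapss(path_or_buffer):
    """Read a whitespace-separated C-MAPSS file into a 26-column DataFrame."""
    return pd.read_csv(path_or_buffer, sep=r"\s+", header=None, names=COLUMNS)

# Placeholder rows (not real data): two cycles of engine 1, 26 values each.
sample = "\n".join(
    " ".join(["1", str(cycle)] + ["0.0"] * 24) for cycle in (1, 2)
)
df = load_cmapss(io.StringIO(sample))
print(df.shape)  # (2, 26)

# For the real data: df = load_cmapss("CMaps/train_FD001.txt")
```

Splitting on runs of whitespace keeps the column count at 26 even when lines carry trailing spaces.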

Key Data Characteristics

1. Time-Series Nature

  • Each engine generates a sequence of sensor readings
  • Length varies: Engine A fails after 100 cycles, Engine B after 300 cycles
  • Challenge: Model must handle variable-length sequences

2. Multivariate (Multiple Sensors)

Each cycle has 21 sensor readings:
s_1, s_2, ..., s_21

This is NOT a univariate problem (single time series).
It's a multivariate problem (multiple interconnected time series).

3. Degradation Pattern

Cycle:   1     50    100   150   192 (failure)
Sensor: 100.0  100.5 101.2 102.1 103.8

Pattern: Slow drift over time, accelerates near failure

4. Operating Conditions Matter

  • FD001: Only sea-level (simple)
  • FD002: 6 different altitude, Mach number, and throttle combinations (complex)

Example:

Setting 1 = -0.0007 → Altitude close to 0 (sea level, as in FD001)
Setting 2 = -0.0004 → Mach number close to 0
Same engine at different conditions = different sensor readings and degradation rates

📈 RUL (Remaining Useful Life) Concept

What is RUL?

RUL = Number of cycles remaining before engine failure

How It's Calculated

Training Data (we know when each engine fails):

Engine 1: Fails at cycle 192

At cycle 1:   RUL = 192 - 1 = 191 cycles remaining
At cycle 100: RUL = 192 - 100 = 92 cycles remaining
At cycle 192: RUL = 192 - 192 = 0 cycles remaining (FAILURE)

General Formula:

RUL(t) = max_cycle - current_cycle

RUL Capping (Important!)

Why cap RUL at 130 cycles?

Problem: Some engines live very long (250+ cycles)
         Early in life the sensors show almost no degradation,
         so the data cannot distinguish "RUL = 250" from "RUL = 180"
         Without capping, these large, unlearnable targets dominate the MSE loss

Solution: Cap RUL at 130
          - Cycle with raw RUL 200 → label as 130
          - Cycle with raw RUL 100 → label as 100
          - Prevents extreme values, improves generalization

Effect:  ~40% of training samples hit the cap in both domains

Visualization Example

Engine Lifecycle:
|---Healthy---|---Degrading---|---Near Failure---|
RUL:   130 -> 130 -> 92  -> 42  -> 12  -> 0
Cycle: 1   -> 50  -> 100 -> 150 -> 180 -> 192
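
The label rule from this section (raw RUL = last cycle minus current cycle, then capped at 130) can be sketched in a few lines of pandas. The column names `unit_id`, `cycle`, `rul_raw`, and `rul` are assumptions for illustration; the synthetic engine below fails at cycle 192 like the Engine 1 example.

```python
import pandas as pd

RUL_CAP = 130  # cap used in this README

def add_rul_labels(df, cap=RUL_CAP):
    """Add raw and capped RUL columns; assumes 'unit_id' and 'cycle' columns."""
    out = df.copy()
    # Last observed cycle per engine is treated as the failure cycle.
    max_cycle = out.groupby("unit_id")["cycle"].transform("max")
    out["rul_raw"] = max_cycle - out["cycle"]
    out["rul"] = out["rul_raw"].clip(upper=cap)
    return out

# One synthetic engine that fails at cycle 192.
engine = pd.DataFrame({"unit_id": [1] * 192, "cycle": list(range(1, 193))})
labeled = add_rul_labels(engine).set_index("cycle")
print(labeled.loc[[1, 50, 100, 150, 180, 192], "rul"].tolist())
# [130, 130, 92, 42, 12, 0]
```

The first two values sit on the cap because the raw RUL at cycles 1 and 50 (191 and 142) exceeds 130.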

🔧 Task Breakdown & Rubrics

TASK 1: Problem Understanding & Dataset Analysis (5 points)

Deliverables:

  • ✅ Load CMAPSS data (FD001, FD002)
  • ✅ Exploratory Data Analysis with 5+ visualizations
  • ✅ Sensor trend analysis (degradation curves)
  • ✅ RUL label generation and visualization
  • ✅ Domain shift analysis (FD001 vs FD002)
  • ✅ Summary of key insights

Key Questions to Answer:

  1. How many cycles does a typical engine run?
  2. Which sensors show the clearest degradation patterns?
  3. What's the difference between FD001 and FD002?
  4. Why is RUL capping necessary?

TASK 2: Model Design & Justification (5 points)

What to do:

  • Design LSTM architecture with:
    • Embedding or input layer
    • 2 LSTM layers with dropout
    • Dense output layer (regression head)
  • Justify each design choice
  • Explain why LSTM > Feedforward > Linear Regression

Example Architecture:

Input (batch, 30, 24)
  ↓
LSTM Layer 1 (96 units, dropout=0.3)
  ↓
LSTM Layer 2 (96 units, dropout=0.3)
  ↓
Dense Layer (1 unit) → RUL prediction

TASK 3: Data Preprocessing & Feature Engineering (5 points)

Steps:

  1. Imputation: Handle missing values (forward-fill, backfill)
  2. Scaling: StandardScaler (zero mean, unit variance)
  3. Sequence Windowing: Create fixed-size time windows
  4. Train/Validation Split: Temporal split (respect causality)

Example Windowing:

Original sequence: [s1, s2, s3, s4, s5, ..., s192]

Window size = 30, Stride = 1:
Window 1: [s1:s30]   → RUL at s30
Window 2: [s2:s31]   → RUL at s31
Window 3: [s3:s32]   → RUL at s32
...
Window 163: [s163:s192] → RUL at s192 (failure)

TASK 4: Model Training & Evaluation (5 points)

Training:

  • Loss function: Mean Squared Error (MSE)
  • Optimizer: Adam
  • Early stopping (patience=10)
  • Learning rate scheduling

Evaluation Metrics:

RMSE = sqrt(mean((predicted_RUL - actual_RUL)^2))
MAE = mean(|predicted_RUL - actual_RUL|)

Good RMSE: < 15 cycles
Great RMSE: < 10 cycles
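
As a quick check of the two formulas above, here is a small NumPy sketch. The `actual` and `predicted` values are made up for illustration and are not results from the notebook.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in cycles."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error, in cycles."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))

# Made-up example values (errors of -10, +10, +15, +5 cycles).
actual = [100, 50, 10, 0]
predicted = [90, 60, 25, 5]
print(round(rmse(actual, predicted), 2), mae(actual, predicted))  # 10.61 10.0
```

RMSE comes out above MAE because squaring gives the single 15-cycle miss more weight than the smaller errors.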

TASK 5: Interpretability & Visualization (5 points)

Visualizations:

  1. Loss curves (training vs validation)
  2. Prediction vs actual scatter plots
  3. Residual analysis
  4. Sensor importance (permutation feature importance)
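
Permutation feature importance (item 4) can be sketched independently of the model: shuffle one feature across samples, re-run the predictions, and record how much RMSE rises. The function name, the `predict` callable, and the toy data below are assumptions for illustration; the notebook's own implementation may differ.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Return the mean RMSE increase when each feature is shuffled.

    predict: callable mapping (n_samples, seq_len, n_features) -> (n_samples,)
    """
    rng = np.random.default_rng(seed)

    def rmse(pred):
        return np.sqrt(np.mean((pred - y) ** 2))

    baseline = rmse(predict(X))
    importances = np.zeros(X.shape[-1])
    for j in range(X.shape[-1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle feature j across samples, keeping each window's time order.
            X_perm[:, :, j] = X[rng.permutation(len(X)), :, j]
            increases.append(rmse(predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances

# Toy stand-in for the trained model: only feature 0 drives the prediction.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30, 3))
y = X[:, -1, 0] * 10.0
toy_predict = lambda batch: batch[:, -1, 0] * 10.0

scores = permutation_importance(toy_predict, X, y)
print(scores.round(2))  # large score for feature 0, zero for features 1 and 2
```

Shuffling whole windows between samples, rather than shuffling time steps within a window, leaves each sensor's temporal shape intact and only breaks its link to the label.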

TASK 6: Deployment & Application Insight (5 points)

Real-world aspects:

  • How to use the model in production?
  • Edge computing considerations
  • Maintenance scheduling decisions
  • Cost-benefit analysis

🚀 Getting Started

Environment Setup

# Activate the conda environment
conda activate pythonenv

# Install additional packages if needed
pip install numpy pandas scikit-learn torch matplotlib seaborn

Running the Notebook

# Navigate to project directory
cd d:\Piyush\College\AI\EST_Project

# Open Jupyter Notebook
jupyter notebook Predictive_Maintenance_RUL_LSTM.ipynb

File Structure

EST_Project/
├── Predictive_Maintenance_RUL_LSTM.ipynb   (Main notebook, fully executed)
├── dashboard.html                          (Interactive visualization dashboard)
├── README.md                               (This file)
├── CMaps/                                  (NASA C-MAPSS dataset)
│   ├── train_FD001.txt / test_FD001.txt / RUL_FD001.txt
│   ├── train_FD002.txt / test_FD002.txt / RUL_FD002.txt
│   ├── train_FD003.txt / test_FD003.txt / RUL_FD003.txt
│   ├── train_FD004.txt / test_FD004.txt / RUL_FD004.txt
│   └── readme.txt / Damage Propagation Modeling.pdf
├── api/
│   ├── rul_service.py                      (FastAPI REST API for real-time RUL prediction)
│   └── simulator.py                        (Sensor data streaming simulator)
├── artifacts/deployment/
│   ├── rul_lstm_fd001.pt                   (Trained LSTM model weights)
│   ├── feature_scaler.pkl                  (StandardScaler for feature normalization)
│   ├── deployment_metadata.json            (Model metadata for production)
│   ├── maintenance_plan_top20.csv          (Top 20 maintenance priority engines)
│   ├── optimization_eval_metrics.csv       (Pruning/quantization evaluation)
│   └── optimization_eval_comparison.png    (Optimization comparison chart)
└── tools/
    └── evaluate_optimizations.py           (Model optimization evaluation script)

📚 Detailed Explanations

Why LSTM for RUL Prediction?

1. Temporal Dependencies

Traditional ML (Random Forest, SVM):
Input: [s1, s2, ..., s21]  (single cycle)
Problem: Ignores time order

LSTM:
Input: [Cycle1=[s1...], Cycle2=[s1...], ..., Cycle30=[s1...]]
Advantage: Captures "how sensors are trending"

2. Varying Sequence Lengths

Engine A: 192 cycles
Engine B: 245 cycles
Engine C: 156 cycles

LSTM: Handles with padding/windowing ✓
Feedforward: Requires fixed input size ✗

3. Long-Range Dependencies

Is the engine about to fail?
Answer depends on:
- Recent sensor trends (last 10 cycles)
- Overall degradation pattern (all 192 cycles)
- Rate of change (comparing cycle 50 vs cycle 190)

LSTM carries long-range context forward through its cell state and hidden state

Data Preprocessing Deep Dive

Standardization (Z-score Normalization)

# Before standardization:
sensor_s2 = [100.5, 101.2, 102.1, ..., 650.3]
# Range: 100 to 650

# After standardization:
sensor_s2 = [-1.5, -1.3, -1.1, ..., 2.8]
# Mean: 0, Std: 1

Why?
1. All sensors on same scale
2. Prevents numerical instability
3. Improves gradient flow during backprop
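
A minimal NumPy sketch of the z-score step follows. It computes the same per-feature mean and population standard deviation that scikit-learn's `StandardScaler` uses by default; the helper names and the synthetic columns are assumptions for illustration. Statistics are fitted on the training rows only and then reused for validation data, and constant columns are left at scale 1 so they do not cause division by zero.

```python
import numpy as np

def fit_scaler(train):
    """Per-feature mean and std from training rows, shape (n_rows, n_features)."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std < 1e-8] = 1.0  # constant columns: avoid division by zero
    return mean, std

def transform(x, mean, std):
    """Apply previously fitted statistics to any split."""
    return (x - mean) / std

# Synthetic data: one varying sensor and one constant sensor.
rng = np.random.default_rng(0)
train = np.column_stack([rng.normal(640.0, 0.5, 1000), np.full(1000, 100.0)])
val = np.column_stack([rng.normal(640.0, 0.5, 200), np.full(200, 100.0)])

mean, std = fit_scaler(train)
train_z = transform(train, mean, std)
val_z = transform(val, mean, std)  # reuse training statistics, do not refit
print(train_z[:, 0].mean().round(6), train_z[:, 0].std().round(6))
```

Refitting the scaler on validation or test data would leak information about those splits into preprocessing, which is why `fit_scaler` is called once.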

Sequence Windowing Example

Engine 1 runs 192 cycles with 24 features per cycle

Raw shape: (192, 24)

Window size = 30, Stride = 1:
  Sample 1: X[0:30], y[RUL at cycle 30]
  Sample 2: X[1:31], y[RUL at cycle 31]
  ...
  Sample 163: X[162:192], y[RUL at cycle 192]

Result: 163 training samples from 1 engine
100 engines → roughly 16,300 training samples if every engine ran 192 cycles (the actual total depends on each engine's length)
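
The windowing arithmetic above can be reproduced with a short NumPy sketch. `make_windows` is a hypothetical helper, and the zero-filled features stand in for one engine's scaled data.

```python
import numpy as np

def make_windows(features, rul, window=30, stride=1):
    """Slice one engine's (n_cycles, n_features) array into overlapping windows.

    Each window is labeled with the RUL at its last cycle. Engines shorter
    than `window` yield no samples in this sketch.
    """
    n_cycles = len(features)
    if n_cycles < window:
        return np.empty((0, window, features.shape[1])), np.empty((0,))
    starts = range(0, n_cycles - window + 1, stride)
    X = np.stack([features[s:s + window] for s in starts])
    y = np.array([rul[s + window - 1] for s in starts])
    return X, y

# Placeholder engine: 192 cycles, 24 features, RUL capped at 130.
features = np.zeros((192, 24))
rul = np.minimum(192 - np.arange(1, 193), 130)

X, y = make_windows(features, rul)
print(X.shape, y.shape)  # (163, 30, 24) (163,)
print(y[0], y[-1])       # 130 0
```

Windows should be built per engine and then concatenated, so that no window spans the boundary between two engines.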

LSTM Architecture Explanation

Input: (batch_size=256, seq_len=30, n_features=24)

LSTM Cell:
  - Cell State (memory): captures long-term dependencies
  - Hidden State: short-term memory
  - Gates:
    * Input gate: what to add to memory?
    * Forget gate: what to forget from memory?
    * Output gate: what to output?

Layer 1 LSTM (96 units):
  Input: (256, 30, 24) → Output: (256, 30, 96)
  
Dropout (0.3):
  Randomly zeroes 30% of outputs
  Prevents overfitting

Layer 2 LSTM (96 units):
  Input: (256, 30, 96) → Output: (256, 30, 96)

Global Average Pooling:
  (256, 30, 96) → (256, 96)
  Summarize the whole sequence

Dense Layer (regression head):
  (256, 96) → (256, 1)
  Final RUL prediction
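
The shape flow above corresponds to a PyTorch module along the following lines. This is a sketch under stated assumptions, not the notebook's model: the class name `RulLstm` and the exact layer arrangement (two separate `nn.LSTM` layers with an explicit `nn.Dropout` between them) are chosen here to mirror the description.

```python
import torch
from torch import nn

class RulLstm(nn.Module):
    """Two LSTM layers, dropout, mean pooling over time, linear head."""

    def __init__(self, n_features=24, hidden=96, dropout=0.3):
        super().__init__()
        self.lstm1 = nn.LSTM(n_features, hidden, batch_first=True)
        self.lstm2 = nn.LSTM(hidden, hidden, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):          # x: (batch, seq_len, n_features)
        out, _ = self.lstm1(x)     # (batch, seq_len, hidden)
        out = self.dropout(out)    # active only in training mode
        out, _ = self.lstm2(out)   # (batch, seq_len, hidden)
        out = out.mean(dim=1)      # global average pooling -> (batch, hidden)
        return self.head(out)      # (batch, 1)

model = RulLstm()
dummy = torch.randn(256, 30, 24)
print(model(dummy).shape)  # torch.Size([256, 1])
```

Averaging over the time axis lets every step of the window contribute to the prediction, whereas taking only the last hidden state would rely on the LSTM having already summarized the earlier steps.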

Training Loop Explanation

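import torch

# Assumes model, criterion (nn.MSELoss), optimizer (Adam), train_loader and
# val_loader are already defined, and that y has shape (batch,).
best_val_loss = float("inf")
patience_counter = 0

for epoch in range(45):
    # Training phase
    model.train()  # enable dropout
    for X, y in train_loader:
        # Forward pass
        pred = model(X).squeeze(-1)  # (256, 1) -> (256,) to match y
        loss = criterion(pred, y)    # MSE loss

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Validation phase: average the loss over all validation batches
    model.eval()  # disable dropout
    val_loss_sum, n_val = 0.0, 0
    with torch.no_grad():
        for X_val, y_val in val_loader:
            pred_val = model(X_val).squeeze(-1)
            val_loss_sum += criterion(pred_val, y_val).item() * len(y_val)
            n_val += len(y_val)
    val_loss = val_loss_sum / n_val

    # Early stopping: stop if validation loss has not improved for 10 epochs
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= 10:
            break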

📊 Actual Results & Evaluation

Performance Metrics (Our Model)

Dataset              |  RMSE (cycles)  |  MAE (cycles)  |  Test Engines
---------------------|-----------------|----------------|---------------
FD001 (In-Domain)    |  15.87          |  11.46         |  100
FD002 (Cross-Domain) |  45.37          |  39.02         |  259

Benchmark Comparison

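Model                         |  RMSE (FD001)  |  MAE (FD001)  |  Notes
------------------------------|----------------|---------------|-----------------------------------
Linear Baseline               |  ~35           |  ~28          |  No temporal modeling
Random Forest                 |  ~18           |  ~12          |  No sequence awareness
Our LSTM                      |  15.87         |  11.46        |  Sequence modeling + dropout
State-of-the-art (literature) |  ~10           |  ~7           |  Complex architectures + attention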

Note: Our RMSE of 15.87 is competitive for a 2-layer LSTM with early stopping at epoch 12. State-of-the-art results (~10 RMSE) typically require deeper architectures, attention mechanisms, or ensemble methods.

Interpreting Results

RMSE = 15.87 means:
- A typical prediction is off by about ±16 cycles (RMSE weights large errors more heavily than MAE, which is 11.46)
- If actual RUL = 50, prediction ≈ 34-66 (acceptable for scheduling)
- If actual RUL = 10, prediction ≈ 0-26 (critical: use MC Dropout confidence)

This is why uncertainty estimation (MC Dropout) matters!
MC Dropout RMSE: 15.84 (slightly better due to ensemble effect)
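
MC Dropout keeps dropout active at inference time and repeats the forward pass, so the spread of the predictions serves as an uncertainty estimate. The sketch below assumes the model uses `nn.Dropout` modules; `mc_dropout_predict` and the small stand-in network are hypothetical, and the 1.96-sigma interval assumes roughly Gaussian predictions.

```python
import torch
from torch import nn

def mc_dropout_predict(model, x, n_passes=50):
    """Return per-sample mean and std over stochastic forward passes."""
    model.eval()
    # Re-enable only the dropout layers so they keep sampling new masks.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()
    with torch.no_grad():
        samples = torch.stack([model(x).squeeze(-1) for _ in range(n_passes)])
    model.eval()  # restore deterministic inference mode
    return samples.mean(dim=0), samples.std(dim=0)

# Stand-in network (not the README's LSTM) just to exercise the function.
torch.manual_seed(0)
toy = nn.Sequential(nn.Flatten(), nn.Linear(30 * 24, 16), nn.ReLU(),
                    nn.Dropout(0.3), nn.Linear(16, 1))
mean, std = mc_dropout_predict(toy, torch.randn(8, 30, 24))
lower, upper = mean - 1.96 * std, mean + 1.96 * std  # approx. 95% interval
print(mean.shape, std.shape)  # torch.Size([8]) torch.Size([8])
```

Dropout configured through the `dropout=` argument of `nn.LSTM` is not an `nn.Dropout` module, so a model built that way would need a different switch to stay stochastic at inference.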

Domain Adaptation Performance

Model trained on FD001 (single operating condition):
- FD001 → FD001 test RMSE: 15.87 (in-domain, good)
- FD001 → FD002 test RMSE: 45.37 (cross-domain, significant drop)
- FD001 → FD002 (mean-std adaptation) RMSE: 51.48 (worse than no adaptation, so simple alignment is insufficient)

Reason: FD002 has 6 different operating conditions vs FD001's 1
Key insight: Domain shift causes ~3x RMSE increase
Solution needed: Fine-tuning on FD002 data or advanced domain adaptation

🎓 Key Concepts Summary

Why Different Datasets Have Different Difficulties

FD001 (Easy):

Same engine model, one operating condition (sea level)
Degradation pattern: Consistent, predictable
Model task: Learn one degradation curve pattern

FD002 (Medium, harder than FD001):

Same engine model, 6 different operating conditions
Degradation pattern: Varies by condition
Model task: Learn 6 different degradation patterns simultaneously

Degradation Dynamics

Healthy phase:  Constant sensor values
Degradation phase: Slow drift, then acceleration
Failure phase:   Sudden changes, system shutdown

LSTM captures all three phases differently

Why Sensor Selection Matters

High Information Gain Sensors:

  • s_2, s_11, s_12, s_15, s_20, s_21
  • Show clear degradation trends
  • High variance across engine lifetime

Low Information Gain Sensors:

  • s_1, s_5, s_10, s_16, s_18, s_19
  • Remain essentially constant in FD001
  • Don't correlate with failure

Feature selection could improve the model, but all 21 sensors are included for completeness.


🔍 Domain Shift Challenge

FD001 vs FD002: What's Different?

Operating Settings (altitude, Mach number, throttle resolver angle):

FD001:

Settings 1, 2, and 3 are nearly constant
Engines always operate at the same (sea-level) condition

FD002:

Six different condition combinations (approximate altitude / Mach number / throttle resolver angle):
  - Sea level, Mach 0, TRA 100
  - 10K ft, Mach 0.25, TRA 100
  - 20K ft, Mach 0.70, TRA 100
  - 25K ft, Mach 0.62, TRA 60
  - 35K ft, Mach 0.84, TRA 100
  - 42K ft, Mach 0.84, TRA 100

Same engine at different conditions = different sensor readings!

Domain Shift Impact

Sensor s_2 (illustrative values):
  FD001 mean: 100.5
  FD002 mean: 98.2
  Difference: 2.3 units (2.3% shift)

Comparable shifts appear in the other sensors.

Model trained on FD001 sees values like 100.5
Model tested on FD002 sees values like 98.2
Mismatch → Prediction errors

Solution: Domain adaptation techniques (TBD in advanced tasks)
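
One simple form of domain adaptation is mean-std feature alignment: rescale each target-domain feature so that its mean and standard deviation match the source domain's. The sketch below is one plausible reading of that idea, not necessarily what the notebook does; the function name and the random placeholder features are assumptions.

```python
import numpy as np

def align_mean_std(target, source_mean, source_std, eps=1e-8):
    """Rescale target-domain features to the source domain's mean and std."""
    t_mean = target.mean(axis=0)
    t_std = target.std(axis=0)
    return (target - t_mean) / (t_std + eps) * source_std + source_mean

rng = np.random.default_rng(0)
fd001_like = rng.normal(100.5, 1.0, size=(500, 3))  # placeholder source features
fd002_like = rng.normal(98.2, 4.0, size=(500, 3))   # placeholder target features

aligned = align_mean_std(
    fd002_like, fd001_like.mean(axis=0), fd001_like.std(axis=0)
)
print(np.allclose(aligned.mean(axis=0), fd001_like.mean(axis=0)))  # True
print(np.allclose(aligned.std(axis=0), fd001_like.std(axis=0)))    # True
```

Matching only global first and second moments cannot separate six distinct operating regimes, which is consistent with the limited benefit of this kind of alignment on FD002.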

📝 Notebook Structure

 1. Setup & Imports (device, seeds, libraries)
 2. Configuration (hyperparameters, window size, RUL cap)
 3. Data Loading & Domain Setup (FD001 + FD002)
 4. ============= TASK 1: EDA =============
    - Data loading & schema validation
    - Sensor trend analysis (6 key sensors)
    - Distribution analysis & correlation heatmap
    - Degradation trajectories (multi-engine overlay)
    - RUL label generation & visualization
    - Domain comparison (FD001 vs FD002 shift analysis)
    - EDA key insights summary
 5. ============= TASK 2: MODEL DESIGN =============
    - LSTM architecture justification (why LSTM > FF > LR)
    - Design decisions (layers, hidden size, dropout)
    - Model verification & forward pass test
    - Data flow visualization
 6. ============= TASK 3: PREPROCESSING =============
    - Feature scaling (StandardScaler)
    - Sequence windowing (window=30, stride=1)
    - Train/validation/test split (80/20)
    - Sanity checks
 7. ============= TASK 4: TRAINING =============
    - Training loop (MSE loss, Adam optimizer)
    - Early stopping (patience=10) & LR scheduling
    - Loss curves (training vs validation)
    - Evaluation: RMSE & MAE on FD001 + FD002
    - Predicted vs Actual scatter plots
 8. ============= TASK 5: INTERPRETABILITY =============
    - Permutation feature importance
    - Residual analysis & error distribution
 9. ============= TASK 6: DEPLOYMENT =============
    - Deployment plan (edge/cloud/API)
    - FastAPI REST service demo
    - Maintenance scheduling (top 20 priority engines)
    - Ethical considerations
10. ============= BONUS: SHAP XAI =============
    - SHAP KernelExplainer (global + local explanations)
11. ============= BONUS: MC DROPOUT =============
    - Uncertainty estimation (50 forward passes)
    - 95% confidence intervals
12. ============= BONUS: DOMAIN ADAPTATION =============
    - FD001 -> FD002 transfer analysis
    - Mean-std feature alignment
13. ============= BONUS: DASHBOARD =============
    - Interactive Plotly visualization
14. ============= BONUS: OPTIMIZATION =============
    - Pruning (L1 unstructured)
    - Dynamic quantization (int8)
    - Optimization evaluation harness
15. Final Conclusions

🎓 Learning Outcomes

After completing this project, you'll understand:

  • ✅ Predictive maintenance use cases and real-world impact
  • ✅ CMAPSS dataset structure and characteristics
  • ✅ RUL concept and label generation
  • ✅ LSTM architecture and why it's suitable for time-series
  • ✅ Data preprocessing for sequence modeling
  • ✅ Training, validation, and early stopping
  • ✅ Model evaluation with domain shift considerations
  • ✅ Interpretability and feature importance
  • ✅ Deployment readiness assessment


📚 References

  • Dataset Paper: Saxena, A., Goebel, K., Simon, D., & Eklund, N. (2008). "Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation"
  • CMAPSS Homepage: https://www.nasa.gov/intelligent-systems-division/
  • LSTM Paper: Hochreiter & Schmidhuber (1997). "Long Short-Term Memory"

💡 Tips for Success

  1. Start with visualization: Before coding, understand your data visually
  2. Check data quality: Missing values, outliers, scaling issues
  3. Validate domain differences: FD001 and FD002 are NOT interchangeable
  4. Monitor metrics carefully: RMSE alone isn't enough; plot predictions
  5. Document assumptions: Why window size = 30? Why RUL_CAP = 130?
  6. Test gradually: Build and test each component independently

📊 Interactive Visualization Dashboard

The project includes a self-contained interactive dashboard (dashboard.html) built with HTML, CSS, and Chart.js. Simply open it in any browser; no server is required.

Dashboard Tabs

Tab               |  Description
------------------|--------------------------------------------------
Overview          |  KPI cards (RMSE, MAE, parameters), scatter plot, error distribution, lifecycle histogram, model config
Predictions       |  Per-engine bar charts comparing predicted vs actual RUL for FD001 (100 engines) and FD002 (60 engines)
Uncertainty       |  MC Dropout confidence intervals (95% CI) for all test engines with actual values overlay
Feature Analysis  |  Horizontal bar chart of permutation feature importance: shows RMSE increase when each feature is shuffled
Sensor Trends     |  Multi-line degradation curves for 6 key sensors across the longest-lived engine's full lifecycle
Maintenance Plan  |  Priority queue table of the 20 highest-risk engines with color-coded risk badges (CRITICAL/HIGH/MEDIUM/LOW)
Domain Adaptation |  Side-by-side RMSE/MAE comparison between FD001 (in-domain) and FD002 (cross-domain) performance

How to View

# Simply open in any browser
start dashboard.html

❓ FAQ

Q: Why are engine lifespans different?
A: Manufacturing variations, initial wear, operating history, and maintenance history all affect how long an engine lasts.

Q: Can we predict the exact failure cycle?
A: No, degradation is stochastic. We estimate RUL with uncertainty bounds.

Q: Why not use all 26 columns as features?
A: Unit_id and Cycle are indices, not features. We use columns 3-26 (settings + sensors).

Q: Is domain adaptation necessary?
A: For production deployment from FD001 → FD002, yes. Without it, performance degrades significantly.

Q: What's a good RMSE for RUL prediction?
A: < 15 cycles is decent and < 10 is excellent; the literature figure quoted in the benchmark comparison for FD001 is around 10.


🌐 Real Sensors/API Integration

This project now includes a runnable API demo for real-time integration.

Files Added

  • api/rul_service.py: FastAPI service that loads model + scaler from artifacts/deployment/
  • api/simulator.py: Streams CMAPSS cycles as pseudo live sensor packets

API Endpoints

  • GET /health → service status
  • GET /schema/features → expected feature list and payload shape
  • POST /ingest → ingest one sensor packet and (if window ready) return prediction
  • GET /predict/latest?engine_id=<id> → latest prediction for an engine
  • POST /predict/batch → ingest/predict for multiple packets

Start API Server

D:\Downloads\miniconda3\envs\pythonenv\python.exe -m uvicorn api.rul_service:app --host 127.0.0.1 --port 8000

Run Streaming Simulator

D:\Downloads\miniconda3\envs\pythonenv\python.exe api\simulator.py --fd FD001 --engine-id 1 --max-cycles 40 --sleep-sec 0.01

Example Ingest Payload

{
  "engine_id": 34,
  "timestamp": "2026-04-29T12:30:00Z",
  "data": {
    "setting_1": -0.0012,
    "setting_2": 0.0003,
    "setting_3": 100.0,
    "s_1": 518.67,
    "s_2": 641.90,
    "s_3": 1585.4,
    "s_4": 1400.2,
    "s_5": 14.62,
    "s_6": 21.61,
    "s_7": 553.2,
    "s_8": 2388.1,
    "s_9": 9047.1,
    "s_10": 1.3,
    "s_11": 47.3,
    "s_12": 521.0,
    "s_13": 2388.2,
    "s_14": 8125.2,
    "s_15": 8.40,
    "s_16": 0.03,
    "s_17": 392,
    "s_18": 2388,
    "s_19": 100.0,
    "s_20": 38.98,
    "s_21": 23.35
  }
}
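
A client can assemble the payload above from a 24-value feature vector and send it to `POST /ingest`. The sketch below uses only the standard library; `build_packet` and `post_packet` are hypothetical helpers, the URL assumes the server is started on 127.0.0.1:8000 as shown earlier, and the shape of the response is not specified in this README, so it is returned as decoded JSON without further interpretation.

```python
import json
import urllib.request
from datetime import datetime, timezone

FEATURES = [f"setting_{i}" for i in range(1, 4)] + [f"s_{i}" for i in range(1, 22)]

def build_packet(engine_id, values):
    """Pair 24 readings with the feature names used in the payload above."""
    if len(values) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(values)}")
    return {
        "engine_id": engine_id,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": dict(zip(FEATURES, values)),
    }

def post_packet(packet, url="http://127.0.0.1:8000/ingest"):
    """POST one packet as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(packet).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read())

packet = build_packet(34, [0.0] * 24)  # placeholder readings
print(len(packet["data"]))  # 24
# With the API server running: print(post_packet(packet))
```

Validating the number of readings before sending catches a misaligned feature vector on the client side instead of relying on the server to reject it.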

Integration Notes

  • The API keeps a rolling 30-cycle buffer per engine.
  • Predictions start after the buffer is full (status=warming_up before that).
  • Output includes uncertainty (uncertainty_std) using MC Dropout and a recommended maintenance action.
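
The rolling-buffer behavior described in these notes can be sketched with a `deque` per engine. This is an illustration, not the code in api/rul_service.py: the class name, the `"ready"` status, and the `cycles_buffered` field are assumptions; only `warming_up` and the 30-cycle window come from the notes above.

```python
from collections import defaultdict, deque

WINDOW = 30  # cycles per prediction window, as stated above

class RollingBuffers:
    """Keep the most recent WINDOW feature vectors per engine."""

    def __init__(self, window=WINDOW):
        self.window = window
        self._buffers = defaultdict(lambda: deque(maxlen=window))

    def ingest(self, engine_id, features):
        """Store one cycle; return the full window once enough cycles arrived."""
        buf = self._buffers[engine_id]
        buf.append(list(features))  # oldest cycle is dropped automatically
        if len(buf) < self.window:
            return {"status": "warming_up", "cycles_buffered": len(buf)}
        return {"status": "ready", "window": list(buf)}

buffers = RollingBuffers()
for cycle in range(1, 32):
    result = buffers.ingest(engine_id=1, features=[float(cycle)] * 24)
    if cycle in (29, 30, 31):
        print(cycle, result["status"])
# 29 warming_up
# 30 ready
# 31 ready
```

After cycle 31 the buffer holds cycles 2 through 31, so each new packet yields a prediction window that advances by one cycle.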

Last Updated: April 29, 2026
Author: Piyush
Status: All Tasks Complete (Tasks 1-6 + All Bonus Tasks) ✅


🔧 Model Optimization: Pruning & Quantization

This project includes lightweight helpers and guidance for compressing the trained LSTM model used for RUL inference. Use these to reduce disk size and improve CPU inference latency.

  1. Pruning (unstructured L1)
  • What: Remove a fraction of smallest-magnitude weights from nn.Linear / nn.Conv2d layers.
  • When to use: Quick parameter reduction when you need smaller checkpoints. Expect to fine-tune afterwards for recovery.
  • How to run (notebook cells): open Predictive_Maintenance_RUL_LSTM.ipynb and run the "Pruning helper" cell. Example usage is commented in the cell.
  2. Dynamic Quantization
  • What: Convert selected layers (LSTM, Linear) to use int8 weights at inference using PyTorch quantize_dynamic.
  • Benefit: Smaller on-disk model and faster CPU inference with minimal accuracy loss in many cases.
  • How to run: run the "Dynamic quantization helper" cell in the notebook. Example usage is commented in the cell.
  3. Smoke Tests
  • A smoke-test cell tries to torch.load artifacts/deployment/rul_lstm_fd001.pt and run a dummy forward pass if the file contains a full nn.Module. If the file is a state_dict, load it into your model class instead before testing.
  4. Recommended workflow
  • Save a copy of the original model: rul_lstm_fd001_orig.pt
  • Run pruning with amount=0.1..0.3, then evaluate validation RMSE/MAE.
  • If pruning reduces accuracy beyond acceptable limits, fine-tune the pruned model for a few epochs.
  • Apply dynamic quantization to the (fine-tuned) model and re-evaluate.
  • Keep both *_pruned.pt and *_quantized.pt for comparison.
  5. Commands (examples)
# Prune (run in python REPL or notebook cell)
# from the notebook: uncomment example_prune(...) call and run the cell

# Quantize (run in python REPL or notebook cell)
# from the notebook: uncomment example_quantize(...) call and run the cell

# Evaluate baseline vs optimized models (FD001 + FD002 combined RMSE/MAE comparison)
D:\Downloads\miniconda3\envs\pythonenv\python.exe tools\evaluate_optimizations.py --subsets FD001 FD002 --baseline artifacts\deployment\rul_lstm_fd001.pt --pruned artifacts\deployment\rul_lstm_fd001_pruned.pt --quantized artifacts\deployment\rul_lstm_fd001_quantized.pt

Evaluation output:

  • Console table with RMSE/MAE and deltas versus baseline
  • CSV file (long format): artifacts/deployment/optimization_eval_metrics.csv
  • CSV file (combined summary): artifacts/deployment/optimization_eval_summary.csv
  • Comparison chart (before vs after): artifacts/deployment/optimization_eval_comparison.png

Notes:

  • If pruned/quantized files do not exist yet, omit those flags; baseline-only evaluation still works.
  • The evaluator supports both saved state_dict checkpoints and full nn.Module objects.
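
For readers who want to see the two optimization steps outside the notebook, here is a minimal sketch using PyTorch's built-in utilities. The helper names and the stand-in model are assumptions and do not reproduce the notebook's "Pruning helper" or "Dynamic quantization helper" cells; recent PyTorch versions also expose `quantize_dynamic` under `torch.ao.quantization`.

```python
import torch
from torch import nn
from torch.nn.utils import prune

def prune_linear_layers(model, amount=0.2):
    """Zero the smallest-magnitude weights (L1, unstructured) in each nn.Linear."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # make the pruning permanent
    return model

def quantize_for_cpu(model):
    """Dynamic int8 quantization of LSTM and Linear layers for CPU inference."""
    return torch.quantization.quantize_dynamic(
        model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
    )

# Stand-in model (not the trained checkpoint) to show the effect of pruning.
torch.manual_seed(0)
toy = nn.Sequential(nn.Linear(96, 32), nn.ReLU(), nn.Linear(32, 1))
prune_linear_layers(toy, amount=0.2)
sparsity = (toy[0].weight == 0).float().mean().item()
print(f"sparsity of first layer: {sparsity:.2f}")  # about 0.20

# Next step in the workflow (CPU only):
# quantized = quantize_for_cpu(toy)  # then re-evaluate RMSE/MAE
```

Unstructured pruning only zeroes weights, so the checkpoint does not shrink unless it is stored in a sparse or compressed format; dynamic quantization is the step that reduces on-disk size directly.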
