PACELab/carma_framework

CARMA Framework

A framework for running GPU workloads with real-time power monitoring, per-workload analysis, and carbon footprint estimation.

Directory Structure

carma_framework/
├── carma.sh                    # Main entry point script
├── env                          # Configuration file
├── analysis.sh                  # Real-time analysis script
├── pstree_logger.sh            # Process tree logger
├── cleanup_all.sh              # Cleanup script
├── sample_experiment_ft_small_ft_large.json  # Example experiment config
├── monitors/                    # Monitoring scripts
│   ├── nvidia_smi_power.sh     # NVIDIA GPU power monitor
│   └── perf_monitor.sh         # CPU performance monitor
├── wattsup_logger/              # Power meter logger
│   ├── wattsup_logger.sh       # Main logger script
│   └── pm.py                    # Python power meter interface
├── workloads/                   # Workload execution scripts
│   ├── run_workloads.py         # Workload orchestrator
│   ├── finetune_model.py        # Model finetuning workload
│   ├── inference_workload.sh    # Inference workload
│   ├── requirements.txt         # Python dependencies
│   └── configs/                 # Workload configuration files
├── models/                      # ML models for power prediction
│   ├── baseline/                # Baseline power prediction model
│   │   └── baseline.joblib
│   └── delta/                   # Delta power prediction model
│       └── delta.joblib
└── runs/                        # Output directory (created automatically)

Usage

Running an Experiment

cd /path/to/carma_framework    # replace with the location of your checkout
./carma.sh sample_experiment_ft_small_ft_large.json

Experiment Configuration Format

The experiment JSON file should have the following structure:

{
  "experiment_name": "experiment_name",
  "description": "Description of the experiment",
  "workloads": [
    {
      "name": "workload_name",
      "type": "finetune",
      "gpu": true,
      "script": "python3",
      "args": [
        "workloads/finetune_model.py",
        "--model_name", "model_name",
        "--max_steps", "60"
      ],
      "cwd": ".",
      "env": {"CUDA_VISIBLE_DEVICES": "0"}
    }
  ]
}
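
Before launching a run, it can help to sanity-check the experiment file. The following is a minimal sketch, not part of the framework: the helper names `check_experiment_config` and `build_command` are hypothetical, the set of required keys is inferred from the example above rather than from carma.sh itself, and the idea that "script" and "args" are joined into a single command line is an assumption.

```python
import json

# Keys taken from the example config above. Treating them as required is an
# assumption of this sketch, not a documented rule of the framework.
REQUIRED_TOP_LEVEL = ("experiment_name", "workloads")
REQUIRED_WORKLOAD = ("name", "type", "script", "args")


def check_experiment_config(config):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in config:
            problems.append(f"missing top-level key: {key}")
    workloads = config.get("workloads", [])
    if not isinstance(workloads, list):
        return problems + ["'workloads' must be a list"]
    for i, workload in enumerate(workloads):
        for key in REQUIRED_WORKLOAD:
            if key not in workload:
                problems.append(f"workloads[{i}] is missing key: {key}")
        if not isinstance(workload.get("args", []), list):
            problems.append(f"workloads[{i}].args must be a list")
        # Environment variables are passed as strings, e.g. "0" rather than 0.
        env = workload.get("env", {})
        if not all(isinstance(value, str) for value in env.values()):
            problems.append(f"workloads[{i}].env values must be strings")
    return problems


def build_command(workload):
    """Join 'script' and 'args' into an argv list (assumed launch convention)."""
    return [workload["script"], *workload["args"]]


example = json.loads("""
{
  "experiment_name": "experiment_name",
  "workloads": [
    {
      "name": "workload_name",
      "type": "finetune",
      "gpu": true,
      "script": "python3",
      "args": ["workloads/finetune_model.py", "--max_steps", "60"],
      "cwd": ".",
      "env": {"CUDA_VISIBLE_DEVICES": "0"}
    }
  ]
}
""")

if __name__ == "__main__":
    print(check_experiment_config(example))
    print(build_command(example["workloads"][0]))
```

To check a real file, load it with `json.load(open(path))` and pass the result to `check_experiment_config`.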

How It Works

  1. carma.sh is the main orchestrator that:

    • Starts monitoring scripts (nvidia-smi, perf, wattsup)
    • Launches workloads via run_workloads.py
    • Starts process tree logging
    • Runs real-time analysis during experiment execution
    • After workloads complete, runs post-processing pipeline:
      • create_dataset.sh: Processes nsys profiles and GPU data
      • predict_power.py: Predicts power consumption from processed data
      • analysis.sh: Calculates final carbon emissions for each job
    • Handles cleanup on exit
  2. Monitors collect data in parallel:

    • nvidia_smi_power.sh: GPU power consumption
    • perf_monitor.sh: CPU performance counters
    • wattsup_logger.sh: System-wide power consumption
  3. analysis.sh processes data in real time during the experiment:

    • Correlates process trees with performance counters
    • Apportions power consumption to individual workloads
    • Calculates operational and embodied carbon in real-time
  4. Post-processing pipeline (runs automatically after experiment):

    • create_dataset.sh: Extracts GPU metrics from nsys profiles (.sqlite files) and processes nvidia-smi data
    • predict_power.py: Uses trained ML models (from models/ directory) to predict power consumption per job
    • analysis.sh: Calculates final carbon emissions using predicted power data
  5. Output is written to runs/<experiment_name>_<timestamp>/:

    • workload_pids.json: Mapping of workload names to PIDs
    • pstree.log: Process tree snapshots
    • perf.log: CPU performance counters
    • nvidia_smi_power.csv: GPU power data
    • wattsup_log.csv: System power data
    • event_summaries.jsonl: Real-time analysis results
    • event_counts/: Processed GPU metrics from nsys profiles
    • Job-specific CSV files with predicted power and final carbon calculations
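
The apportioning and carbon steps described above can be illustrated with a small sketch. This is not the logic of analysis.sh, which is not shown here: it assumes a simple proportional split of each power sample by a per-workload activity measure, operational carbon computed as energy times grid carbon intensity, and embodied carbon amortized linearly over a device lifetime. All function names, workload labels, and numbers are hypothetical.

```python
def apportion_power(total_watts, activity):
    """Split one power sample across workloads in proportion to their activity.

    `activity` maps a workload name to any non-negative activity measure
    (for example a performance-counter delta). Assumed model, for illustration.
    """
    total_activity = sum(activity.values())
    if total_activity == 0:
        return {name: 0.0 for name in activity}
    return {
        name: total_watts * value / total_activity
        for name, value in activity.items()
    }


def operational_carbon_g(power_samples_w, interval_s, intensity_g_per_kwh):
    """Operational carbon: energy in kWh times grid intensity in gCO2e/kWh."""
    energy_joules = sum(power_samples_w) * interval_s
    energy_kwh = energy_joules / 3.6e6  # 1 kWh = 3.6e6 J
    return energy_kwh * intensity_g_per_kwh


def embodied_carbon_g(total_embodied_g, runtime_s, lifetime_s):
    """Embodied carbon: share of manufacturing emissions for the time used."""
    return total_embodied_g * runtime_s / lifetime_s


if __name__ == "__main__":
    # Hypothetical sample: 300 W measured, two workloads with a 1:3 activity ratio.
    shares = apportion_power(300.0, {"ft_small": 1.0, "ft_large": 3.0})
    print(shares)

    # One hour of 1-second samples at the larger workload's share, with an
    # assumed grid intensity of 400 gCO2e/kWh.
    samples = [shares["ft_large"]] * 3600
    print(operational_carbon_g(samples, 1.0, 400.0))
```

A proportional split is the simplest possible model. It ignores idle power and any nonlinearity between activity and power draw, which is presumably part of what the baseline and delta models in models/ are meant to capture.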

Configuration

Edit env to configure:

  • Monitor scripts to run
  • NSYS profiler settings
  • Sudo password file location (optional)

Dependencies

  • Python 3 with required packages (see workloads/requirements.txt)
  • NVIDIA drivers and nvidia-smi
  • perf (Linux performance monitoring)
  • nsys (NVIDIA Nsight Systems, optional)
  • jq (JSON processor)
  • wattsup power meter (optional, for physical power measurement)
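
A quick way to confirm that the command-line dependencies are installed is to look them up on PATH. The sketch below is an illustration rather than part of the framework. It uses the standard-library `shutil.which`, and the split into required and optional tools follows the list above. The wattsup meter is left out because the name of its executable is not given here.

```python
import shutil

# Executable names for the dependencies listed above.
REQUIRED_TOOLS = ("python3", "nvidia-smi", "perf", "jq")
OPTIONAL_TOOLS = ("nsys",)


def find_missing(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the function can be exercised without the
    real tools installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing_required = find_missing(REQUIRED_TOOLS)
    missing_optional = find_missing(OPTIONAL_TOOLS)
    if missing_required:
        print("Missing required tools:", ", ".join(missing_required))
    if missing_optional:
        print("Missing optional tools:", ", ".join(missing_optional))
    if not missing_required and not missing_optional:
        print("All listed tools were found on PATH.")
```

This only checks that the executables exist. It does not verify versions, GPU driver state, or the Python packages in workloads/requirements.txt.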

Notes

  • All paths in the framework are relative to the carma_framework directory, so run the tool from that directory
  • Workload scripts in JSON configs should use paths relative to the framework root (e.g., workloads/finetune_model.py)
