DEEPMETAL

Transforming AI into a Seamless Embedded Powerhouse

A compiler for high-level ML libraries that runs your models on the edge

Complete pipeline for converting PyTorch neural networks to optimized C, C++, and LLVM code for deployment on embedded systems.

Quick Start

# 1. create and train a model
python export_model.py --model-type hybrid --epochs 5

# 2. test all converters
./test_conversion.sh

# 3. validate entire workflow
python test_complete_workflow.py

Components

1. Model Export (`export_model.py`)

Creates PyTorch models compatible with the conversion pipeline.

Supported architectures:

linear: Fully connected layers only (784→128→64→10)
conv: Convolutional layers + linear classifier
hybrid: Mixed conv + linear layers (recommended)

Usage:

# train a hybrid model for 5 epochs
python export_model.py --model-type hybrid --epochs 5 --batch-size 64

# create model without training (for testing)
python export_model.py --model-type linear --no-train

# train on gpu if available
python export_model.py --model-type conv --device cuda --epochs 10

Output:

models/mnist_hybrid_model.pth - Complete model
models/mnist_hybrid_model_state_dict.pth - State dict only
test_conversion.sh - Script to test all converters

2. Dynamic C Converter (`converter.py`)

Generates pure C code optimized for ARM Cortex-M4 microcontrollers.

Features:

Static memory allocation
Ping-pong buffer optimization
ARM Cortex-M4 compilation
Minimal dependencies

Usage:

python converter.py models/mnist_hybrid_model.pth

Output:

output/
├── model.h          # header with declarations
├── model.c          # implementation
└── model.o          # compiled ARM object file

Generated API:

int predict(const float *input, int input_h, int input_w, int input_ch);

3. LLVM IR Converter (`llvm.py`)

Generates LLVM intermediate representation with advanced optimizations.

Features:

Cross-platform target support
Advanced optimization passes
Loop unrolling and vectorization
Multiple architecture support

Usage:

python llvm.py models/mnist_hybrid_model.pth

Output:

output/
├── model.ll         # llvm ir code
└── model_llvm.o     # optimized object file

4. C++ Template Converter (`pytoc.py`)

Generates modern C++ code with STL containers and type safety.

Features:

Template-based architecture
STL containers for safety
Easy debugging and modification
JSON configuration export

Usage:

python pytoc.py models/mnist_hybrid_model.pth

Output:

output/
├── dynamic_model.cpp     # complete c++ implementation
├── dynamic_model         # compiled executable
└── model_config.json     # architecture metadata

Supported Layer Types

Linear Layers (`torch.nn.Linear`)

nn.Linear(in_features, out_features, bias=True)

Fully connected transformation
Optional bias terms
Efficient matrix multiplication
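
As a rough illustration of what a fully connected layer computes, the sketch below implements y = Wx + b in plain Python with weights laid out as [out_features][in_features]. The function name `linear_forward` is made up for this example and is not part of the converters; the generated C code may structure its loops differently.

```python
def linear_forward(weights, bias, x):
    """Compute y = W x + b for one input vector.

    weights: list of out_features rows, each holding in_features values
    bias:    list of out_features values, or None when bias=False
    x:       list of in_features values
    """
    out = []
    for row_index, row in enumerate(weights):
        # start from the bias term (or zero when the layer has no bias)
        acc = bias[row_index] if bias is not None else 0.0
        for w, value in zip(row, x):
            acc += w * value
        out.append(acc)
    return out


# toy layer: 3 inputs, 2 outputs
W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
b = [0.0, 1.0]
print(linear_forward(W, b, [2.0, 4.0, 6.0]))  # [-4.0, 7.0]
```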

Convolutional Layers (`torch.nn.Conv2d`)

nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)

2D convolution with configurable parameters
Stride and padding support
Boundary condition handling
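
Stride and padding determine the spatial size of each layer's output, which in turn fixes the `in_features` of the first Linear layer after flattening. A minimal sketch of the standard output-size formula (dilation of 1 assumed; `conv2d_output_size` is a hypothetical helper name):

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0):
    """Output length along one spatial axis, assuming dilation of 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


# 28-pixel axis with a 3x3 kernel under different settings
print(conv2d_output_size(28, 3, stride=1, padding=1))  # 28
print(conv2d_output_size(28, 3, stride=2, padding=0))  # 13
print(conv2d_output_size(28, 3, stride=2, padding=1))  # 14
```

Note that a stride-2 convolution without padding yields 13, not 14, on a 28-pixel axis, so it is worth checking these numbers before sizing a following Linear layer.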

ReLU Activation (`torch.nn.ReLU`)

nn.ReLU()

Element-wise max(0, x) operation
Hardware-optimized implementation

Memory Layout and Optimization

Data Organization

// weights stored as [output_neurons][input_features]
const float w0[128][784] = {...};

// ping-pong buffers for layer outputs
float buf1[MAX_BUFFER_SIZE], buf2[MAX_BUFFER_SIZE];

Convolution Memory Access

// input indexed as [channel][height][width]
int input_idx = ic * input_h * input_w + ih * input_w + iw;

// output organized as [channel][height][width] 
int output_idx = oc * out_h * out_w + oh * out_w + ow;
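
To show how these index formulas fit into a full convolution loop, here is a naive reference sketch in Python over flat lists. It is not the generated code: the function name, the flat [oc][ic][kh][kw] weight layout, and the way padding is skipped are assumptions made for illustration.

```python
def conv2d_flat(inp, weights, bias, in_ch, in_h, in_w, out_ch, k,
                stride=1, padding=0):
    """Naive 2D convolution over flat [channel][height][width] buffers."""
    out_h = (in_h + 2 * padding - k) // stride + 1
    out_w = (in_w + 2 * padding - k) // stride + 1
    out = [0.0] * (out_ch * out_h * out_w)
    for oc in range(out_ch):
        for oh in range(out_h):
            for ow in range(out_w):
                acc = bias[oc]
                for ic in range(in_ch):
                    for kh in range(k):
                        for kw in range(k):
                            ih = oh * stride + kh - padding
                            iw = ow * stride + kw - padding
                            # boundary handling: zero padding adds nothing
                            if ih < 0 or ih >= in_h or iw < 0 or iw >= in_w:
                                continue
                            input_idx = ic * in_h * in_w + ih * in_w + iw
                            # assumed weight layout: [oc][ic][kh][kw]
                            weight_idx = ((oc * in_ch + ic) * k + kh) * k + kw
                            acc += inp[input_idx] * weights[weight_idx]
                output_idx = oc * out_h * out_w + oh * out_w + ow
                out[output_idx] = acc
    return out, out_h, out_w


# 1x3x3 input, one 2x2 kernel of ones, no bias
image = [1.0, 2.0, 3.0,
         4.0, 5.0, 6.0,
         7.0, 8.0, 9.0]
print(conv2d_flat(image, [1.0] * 4, [0.0], 1, 3, 3, 1, 2))
# ([12.0, 16.0, 24.0, 28.0], 2, 2)
```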

Buffer Management

Ping-pong buffers: Alternate between buf1 and buf2 for each layer
Static allocation: No dynamic memory allocation for embedded safety
Size optimization: Reuse buffers across layers
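
The ping-pong scheme can be sketched in a few lines of Python. Two fixed-size buffers are allocated once, each layer reads from one and writes to the other, and the roles swap after every layer. The toy layers and the `run` helper are hypothetical and only stand in for the generated C functions.

```python
MAX_BUFFER_SIZE = 8  # illustrative capacity, in elements

# allocated once, never resized (mirrors static allocation in C)
buf1 = [0.0] * MAX_BUFFER_SIZE
buf2 = [0.0] * MAX_BUFFER_SIZE


def relu_layer(src, n, dst):
    """Element-wise max(0, x); returns the number of outputs written."""
    for i in range(n):
        dst[i] = src[i] if src[i] > 0.0 else 0.0
    return n


def pairwise_sum_layer(src, n, dst):
    """Toy layer that halves the length by summing neighbours."""
    for i in range(n // 2):
        dst[i] = src[2 * i] + src[2 * i + 1]
    return n // 2


def run(layers, inputs):
    n = len(inputs)
    if n > MAX_BUFFER_SIZE:
        raise ValueError("input exceeds buffer capacity")
    src, dst = buf1, buf2
    src[:n] = inputs  # copy the input into the first buffer
    for layer in layers:
        n = layer(src, n, dst)
        src, dst = dst, src  # swap roles: last output becomes next input
    return src[:n]


print(run([relu_layer, pairwise_sum_layer], [-1.0, 2.0, 3.0, -4.0]))  # [2.0, 3.0]
```

Because only two buffers exist, peak RAM for activations is bounded by twice the largest layer output regardless of network depth.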

Target Platform Configuration

ARM Cortex-M4 (Default)

clang --target=armv7em-none-eabi \
      -mcpu=cortex-m4 \
      -mthumb \
      -mfloat-abi=hard \
      -mfpu=fpv4-sp-d16 \
      -O3

Custom Targets

Modify compiler flags in each converter for different targets:

x86: --target=x86_64-linux-gnu
ARM64: --target=aarch64-linux-gnu
RISC-V: --target=riscv32-unknown-elf
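
One way to keep target settings in a single place is a small lookup table, sketched below. The dictionary keys, the `build_clang_command` helper, and the default file names are hypothetical; the converters may assemble their compiler invocations differently. The flags themselves are the ones listed above, plus clang's standard `-c` and `-o`.

```python
# flags per target, taken from the lists in this README
TARGET_FLAGS = {
    "cortex-m4": ["--target=armv7em-none-eabi", "-mcpu=cortex-m4", "-mthumb",
                  "-mfloat-abi=hard", "-mfpu=fpv4-sp-d16"],
    "x86": ["--target=x86_64-linux-gnu"],
    "arm64": ["--target=aarch64-linux-gnu"],
    "riscv": ["--target=riscv32-unknown-elf"],
}


def build_clang_command(target, source="model.c", output="model.o"):
    """Return the argument list for compiling one source file to an object."""
    if target not in TARGET_FLAGS:
        raise ValueError(f"unknown target: {target}")
    return ["clang", *TARGET_FLAGS[target], "-O3", "-c", source, "-o", output]


print(" ".join(build_clang_command("riscv")))
# clang --target=riscv32-unknown-elf -O3 -c model.c -o model.o
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues.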

Performance Characteristics

Model Size vs. Accuracy Trade-offs

Linear (784→128→64→10):  ~109K parameters, ~95% accuracy
Conv (1×3×3→16×3×3→32): ~23K parameters, ~98% accuracy  
Hybrid (optimized):      ~15K parameters, ~97% accuracy
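
The parameter count for the linear model can be checked by summing weights and biases per layer, which gives about 109K for 784→128→64→10. The helper names below are made up for this sketch; the conv and hybrid layer lists are not spelled out here, so only the linear total is computed.

```python
def linear_params(in_features, out_features, bias=True):
    """Weights plus optional biases of one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Weights plus optional biases of one square-kernel Conv2d layer."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


sizes = [784, 128, 64, 10]
total = sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))
print(total)  # 109386
```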

Memory Requirements

Flash (weights):  15KB - 400KB depending on architecture
RAM (buffers):    2KB - 64KB for intermediate computations
Stack:           <1KB for local variables

Inference Speed (Cortex-M4 @ 80MHz)

Linear model:    ~2ms per inference
Conv model:      ~8ms per inference
Hybrid model:    ~5ms per inference
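
To relate these latencies to a cycle budget, multiply by the clock rate. The sketch below only does that arithmetic on the approximate figures above; it is not a measurement.

```python
def cycles_per_inference(latency_ms, clock_mhz):
    """Clock cycles spent in one inference (1 MHz = 1000 cycles per ms)."""
    return latency_ms * clock_mhz * 1000


for name, ms in [("linear", 2), ("conv", 8), ("hybrid", 5)]:
    print(name, cycles_per_inference(ms, 80))
# linear 160000
# conv 640000
# hybrid 400000
```

If the ~2ms figure holds, the linear model's 109,184 multiply-accumulates (784×128 + 128×64 + 64×10) work out to roughly 1.5 cycles each.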

Advanced Usage

Custom Model Architecture

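# create your own sequential model (assumes 1×28×28 MNIST-style input)
import torch
import torch.nn as nn

custom_model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1),             # 1×28×28 → 16×28×28
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 16×28×28 → 32×14×14
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 14 * 14, 64),                # must match the flattened conv output
    nn.ReLU(),
    nn.Linear(64, 10)
)

# export the complete model (not just the state dict) for conversion
torch.save(custom_model, 'custom_model.pth')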

Integration with Embedded Systems

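// example main.c for microcontroller
#include "model.h"

// application-provided hooks (not generated by the converter)
void read_sensors(float *buffer);
void handle_prediction(int prediction);

static float sensor_data[784];  // 28×28×1 input from sensors

int main(void) {
    // collect sensor data
    read_sensors(sensor_data);

    // run inference on a 28×28 single-channel input
    int prediction = predict(sensor_data, 28, 28, 1);

    // act on prediction
    handle_prediction(prediction);

    return 0;
}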

Debugging and Validation

# validate complete workflow
python test_complete_workflow.py

# compare outputs between pytorch and c
python validate_conversion.py model.pth

# profile performance
python benchmark_inference.py model.pth
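
When comparing PyTorch and C outputs by hand, two checks are usually enough: the largest absolute difference between the output vectors, and whether both pick the same class. The sketch below shows that comparison on plain lists. It is not the contents of `validate_conversion.py`; the function names, the tolerance, and the sample values are assumptions.

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


def compare_outputs(reference, candidate, atol=1e-4):
    """Compare two output vectors element by element."""
    if len(reference) != len(candidate):
        raise ValueError("output lengths differ")
    max_abs_diff = max(abs(r - c) for r, c in zip(reference, candidate))
    return {
        "max_abs_diff": max_abs_diff,
        "within_tolerance": max_abs_diff <= atol,
        "same_prediction": argmax(reference) == argmax(candidate),
    }


# made-up sample values standing in for real model outputs
pytorch_logits = [0.10, 2.50, -1.20]
c_logits = [0.10002, 2.49997, -1.20001]
report = compare_outputs(pytorch_logits, c_logits)
print(report["within_tolerance"], report["same_prediction"])  # True True
```

Small float32 differences are expected because the C code may accumulate sums in a different order than PyTorch.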

Troubleshooting

Common Issues

1. Unsupported layer types

Error: Layer type 'BatchNorm2d' not supported

Solution: Remove or replace unsupported layers. Currently supported: Linear, Conv2d, ReLU.

2. Memory buffer overflow

Error: Layer output size exceeds buffer capacity

Solution: Increase MAX_BUFFER_SIZE in converter or reduce model size.

3. LLVM compilation failures

Error: llvmlite not installed

Solution: pip install llvmlite or use C/C++ converters instead.

4. ARM toolchain missing

Error: clang: command not found

Solution: Install clang along with the ARM GCC toolchain (see Dependencies), or use x86 targets for testing.
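
For the buffer overflow case (issue 2), a quick way to pick a safe MAX_BUFFER_SIZE is to take the largest element count over all layer outputs. The sketch below assumes each ping-pong buffer must hold one full layer output and that activations are 4-byte floats; the helper name and the example shapes are hypothetical.

```python
def required_buffer_elements(layer_output_shapes):
    """Largest element count over all layer outputs."""
    largest = 0
    for shape in layer_output_shapes:
        count = 1
        for dim in shape:
            count *= dim
        largest = max(largest, count)
    return largest


# output shapes of a hypothetical conv stack on a 28x28 input
shapes = [(16, 28, 28), (32, 14, 14), (64,), (10,)]
elements = required_buffer_elements(shapes)
print(elements, elements * 4)  # 12544 50176
```

Here the first convolution dominates, so reducing its channel count or adding stride shrinks both buffers at once.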

Model Architecture Guidelines

For embedded deployment:

Keep total parameters under 100K
Avoid large convolutional layers
Use stride > 1 to reduce spatial dimensions quickly
Prefer ReLU over other activations

For best converter compatibility:

Use nn.Sequential models
Avoid custom layers or complex control flow
Keep all operations differentiable
Save complete models, not just state dicts

Dependencies

Python packages:

pip install torch torchvision numpy llvmlite

System tools:

# ubuntu/debian
sudo apt install clang gcc-arm-none-eabi

# macos
brew install llvm arm-none-eabi-gcc

# or use conda
conda install pytorch torchvision llvmlite

Next Steps

1. Create your model: python export_model.py --model-type hybrid
2. Test conversion: ./test_conversion.sh
3. Integrate with firmware: Use generated .o files in your embedded project
4. Optimize further: Profile and adjust model architecture for your constraints
5. Deploy: Flash to target hardware and validate real-world performance

For more advanced use cases, see the individual converter documentation and consider extending the layer support for your specific requirements.

Luthiraa/DeepMetal
