A simulation of a GPU (Graphics Processing Unit) architecture developed in Python. This project emulates the execution flow of parallel kernels across multiple Streaming Multiprocessors (SMs) and cores.
The simulation replicates the hardware-software interface of modern GPUs:
- Streaming Multiprocessors (SM): Implemented using `multiprocessing.Process` to simulate independent hardware units.
- CUDA Cores: Emulated via `threading.Thread` within each SM to handle concurrent execution of threads.
- Memory Hierarchy:
  - Global Memory (VRAM): Shared across all SMs through the `GPUMemory` class.
  - Shared/Local Memory: Fast, per-SM memory implemented in `SMMemory`.
- Synchronization: Uses `threading.Barrier` to coordinate SIMT (Single Instruction, Multiple Threads) execution and prevent race conditions.
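The process/thread layering above can be sketched as follows. This is a minimal illustration of the pattern, not the project's actual code: the names `SM_COUNT`, `CORES_PER_SM`, `run_sm`, and `core_worker` are assumptions for this example.

```python
import multiprocessing
import threading

SM_COUNT = 2      # simulated Streaming Multiprocessors (one process each)
CORES_PER_SM = 4  # simulated CUDA cores per SM (one thread each)

def core_worker(sm_id, core_id, barrier, results):
    # Each "core" is a thread inside its SM process.
    results[core_id] = sm_id * CORES_PER_SM + core_id
    barrier.wait()  # SIMT-style synchronization point across the SM's cores

def run_sm(sm_id, queue):
    # One SM: spawn its cores, wait for all of them, report results.
    barrier = threading.Barrier(CORES_PER_SM)
    results = [None] * CORES_PER_SM
    threads = [threading.Thread(target=core_worker,
                                args=(sm_id, c, barrier, results))
               for c in range(CORES_PER_SM)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    queue.put((sm_id, results))

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    sms = [multiprocessing.Process(target=run_sm, args=(s, queue))
           for s in range(SM_COUNT)]
    for p in sms:
        p.start()
    for p in sms:
        p.join()
    for _ in range(SM_COUNT):
        print(queue.get())
```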
The simulator includes several parallel algorithms to demonstrate different memory access patterns:
- Arithmetic Kernels: Vector Increment and Vector Sum.
- Generalized Blur: A stencil-based operation using a configurable radius and shared memory optimization.
- Dot Product (Reduction): Uses atomic-style locking (`multiprocessing.Value.get_lock()`) to aggregate partial sums into a global result.
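The reduction pattern used by the dot-product kernel can be illustrated as below: each worker computes a partial sum over its slice, then adds it to a shared total under the `multiprocessing.Value` lock. The function and variable names (`partial_dot`, `WORKERS`) are assumptions for this sketch, not identifiers from the project.

```python
import multiprocessing

def partial_dot(a, b, start, end, total):
    # Each worker computes the dot product over its own slice...
    partial = sum(a[i] * b[i] for i in range(start, end))
    # ...then aggregates it into the shared result under the Value's lock,
    # mimicking an atomic add on the GPU.
    with total.get_lock():
        total.value += partial

if __name__ == "__main__":
    a = list(range(8))
    b = list(range(8))
    total = multiprocessing.Value('d', 0.0)  # shared "global memory" scalar
    WORKERS = 2
    chunk = len(a) // WORKERS
    procs = [multiprocessing.Process(
                 target=partial_dot,
                 args=(a, b, w * chunk, (w + 1) * chunk, total))
             for w in range(WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(total.value)  # sum of i*i for i in 0..7 = 140.0
```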
To run the simulation and test different kernels, execute the main entry point:
```shell
python src/gpu.py
```

Note: You can switch between kernels (`INCR`, `SUMAR`, `DIFUMINAR`, `ESCALAR`) by modifying the `KERNEL_A_PROBAR` variable in `gpu.py`.
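As a hypothetical sketch of how such a selector might dispatch kernels: the dispatch table and the kernel bodies below are illustrative assumptions, not the project's actual implementation (only the variable name `KERNEL_A_PROBAR` and the kernel names come from the README).

```python
KERNEL_A_PROBAR = "INCR"  # also: SUMAR, DIFUMINAR, ESCALAR

def incr(vec):
    # Vector Increment: each simulated thread adds 1 to one element.
    return [x + 1 for x in vec]

def sumar(a, b):
    # Vector Sum (SUMAR): element-wise addition of two vectors.
    return [x + y for x, y in zip(a, b)]

# Hypothetical dispatch table mapping kernel names to implementations.
KERNELS = {"INCR": incr, "SUMAR": sumar}

if __name__ == "__main__":
    print(KERNELS[KERNEL_A_PROBAR]([1, 2, 3]))  # [2, 3, 4]
```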
Developed by Alberto Peña and Fabio Torres (March 2026).