Performance of phasicFlow
The performance benchmark is the simulation of a drum rotating at 12 rpm. Because the particles move continuously and part of the drum remains empty, this benchmark resembles most typical granular flows. The number of particles was varied between 100K and 11M. In all simulations, around 30-45 vol% of the drum was filled with particles. The integration time step was 0.00001 s, so each second of simulation took 100K iterations to complete. In all evaluations, we excluded the time spent writing results to disk.
The execution time for one second of simulation was measured for various numbers of particles. Both the CPU and GPU builds showed almost linear behavior with respect to the number of particles, which is ideal for DEM simulations.
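The relation between time step and iteration count stated above can be checked with a short sketch. The function name `n_steps` is only illustrative and is not part of phasicFlow.

```python
def n_steps(sim_time_s: float, dt_s: float) -> int:
    """Number of integration steps needed to cover sim_time_s with a fixed time step."""
    # round() guards against floating-point error in the division
    return round(sim_time_s / dt_s)


DT = 1.0e-5  # time step used in the benchmark, in seconds

steps_per_simulated_second = n_steps(1.0, DT)
print(steps_per_simulated_second)
```

With a time step of 1.0e-5 s, this gives 100,000 steps per simulated second, matching the 100K iterations quoted above.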

For three different problem sizes, we tested the performance of PhasicFlow with respect to the number of CPU cores. The execution time and the number of cores follow a linear relation on a log-log plot.
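A straight line on a log-log plot corresponds to a power law, with the slope of the line as the exponent. The sketch below shows one way to estimate that slope with an ordinary least-squares fit. The timings in it are synthetic values for ideal scaling, used only to exercise the function; they are not measurements from this benchmark, and `loglog_slope` is a hypothetical helper name.

```python
import math


def loglog_slope(cores, times):
    """Least-squares slope of log(time) versus log(cores)."""
    xs = [math.log(c) for c in cores]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic example (NOT measured data): ideal scaling, time = T1 / cores
cores = [1, 2, 4, 8, 16]
ideal_times = [100.0 / p for p in cores]

slope = loglog_slope(cores, ideal_times)
print(f"log-log slope: {slope:.3f}")
```

A slope close to -1 indicates near-ideal strong scaling. A slope between -1 and 0 means adding cores still helps, but with diminishing returns.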

It is important for us to know at which problem size we are using the full computational capacity of the system. The figure below shows the execution time per particle (for one second of simulation). For small problems, this value is high, since each kernel launch (entering a parallel zone) has some overhead and the amount of computation in the parallel zone is small relative to this overhead. For larger problems, the overhead becomes negligible relative to the amount of computation, and we see an almost constant execution time per particle.
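The trend described above can be illustrated with a minimal cost model, assuming the time of one parallel step is a fixed launch overhead plus a cost proportional to the number of particles. Both constants below are hypothetical placeholders chosen for illustration, not values measured on any system.

```python
def time_per_particle(n_particles, launch_overhead_s, cost_per_particle_s):
    """Per-particle time for one parallel step under a fixed-overhead model."""
    total_s = launch_overhead_s + cost_per_particle_s * n_particles
    return total_s / n_particles


# Hypothetical constants (assumptions for illustration only)
LAUNCH_OVERHEAD_S = 1.0e-4    # fixed cost of entering a parallel zone
COST_PER_PARTICLE_S = 1.0e-8  # work per particle inside the zone

for n in (10**3, 10**5, 10**7):
    t = time_per_particle(n, LAUNCH_OVERHEAD_S, COST_PER_PARTICLE_S)
    print(f"{n:>10d} particles: {t:.3e} s per particle")
```

In this model the per-particle time is `overhead / N + cost`, so it falls as the particle count grows and levels off at the per-particle cost once the overhead term becomes negligible.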

This benchmark compares the performance of phasicFlow with a well-established commercial DEM software package for simulating a rotating drum with varying particle counts (250k to 8M particles). The benchmark measures both computational efficiency and memory usage across different hardware configurations.
Figure 1. Commercial DEM simulation snapshot
Figure 2. phasicFlow simulation snapshot
| System | CPU | GPU | Operating System |
|---|---|---|---|
| Laptop | Intel i9-13900HX 2.2 GHz | NVIDIA GeForce RTX 4050Ti 6G | Windows 11 24H2 |
| Workstation | Intel Xeon 4210 2.2 GHz | NVIDIA RTX A4000 16G | Ubuntu 22.04 |
| Case | Particle Diameter | Particle Count | Drum Length | Drum Radius |
|---|---|---|---|---|
| 250k | 6 mm | 250,000 | 0.8 m | 0.2 m |
| 500k | 5 mm | 500,000 | 0.8 m | 0.2 m |
| 1M | 4 mm | 1,000,000 | 0.8 m | 0.2 m |
| 2M | 3 mm | 2,000,000 | 1.2 m | 0.2 m |
| 4M | 3 mm | 4,000,000 | 1.6 m | 0.2 m |
| 8M | 2 mm | 8,000,000 | 1.6 m | 0.2 m |
The time step for all simulations was set to 1.0e-5 seconds and the simulation ran for 4 seconds.
| Software | 250k | 500k | 1M | 2M | 4M | 8M |
|---|---|---|---|---|---|---|
| phasicFlow-4050Ti | 54 min | 111 min | 216 min | 432 min | - | - |
| Commercial DEM-4050Ti | 68 min | 136 min | 275 min | 570 min | - | - |
| phasicFlow-A4000 | 38 min | 73 min | 146 min | 293 min | 589 min | 1188 min |
The execution time scales linearly with particle count. phasicFlow demonstrates approximately:
- 20% faster calculation than the well-established commercial DEM software on the same hardware
- 30% performance improvement when using the NVIDIA RTX A4000 compared to the RTX 4050Ti
Figure 3. Calculation time comparison between phasicFlow and the well-established commercial DEM software.
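The summary percentages can be cross-checked against the timing table with a short script. The numbers are copied from the table; the helper names (`percent_less`, `compare`) are illustrative only.

```python
# Wall-clock times in minutes, copied from the timing table above.
times_min = {
    "phasicFlow-4050Ti": {"250k": 54, "500k": 111, "1M": 216, "2M": 432},
    "Commercial DEM-4050Ti": {"250k": 68, "500k": 136, "1M": 275, "2M": 570},
    "phasicFlow-A4000": {"250k": 38, "500k": 73, "1M": 146, "2M": 293,
                         "4M": 589, "8M": 1188},
}

# Particle count of each case, in millions.
millions = {"250k": 0.25, "500k": 0.5, "1M": 1.0, "2M": 2.0, "4M": 4.0, "8M": 8.0}


def percent_less(baseline, candidate):
    """Percent reduction of candidate relative to baseline."""
    return 100.0 * (1.0 - candidate / baseline)


def compare(baseline_name, candidate_name):
    """Per-case time reduction for the cases both configurations ran."""
    base, cand = times_min[baseline_name], times_min[candidate_name]
    return {case: percent_less(base[case], cand[case]) for case in base if case in cand}


vs_commercial = compare("Commercial DEM-4050Ti", "phasicFlow-4050Ti")
vs_laptop_gpu = compare("phasicFlow-4050Ti", "phasicFlow-A4000")

# Linearity check: minutes per million particles on the A4000.
min_per_million = {case: t / millions[case]
                   for case, t in times_min["phasicFlow-A4000"].items()}

for case in vs_commercial:
    print(f"{case}: {vs_commercial[case]:.1f}% less time than commercial, "
          f"{vs_laptop_gpu[case]:.1f}% less time on A4000 than on 4050Ti")
for case, value in min_per_million.items():
    print(f"{case}: {value:.1f} min per million particles (A4000)")
```

With the tabulated values, the time reduction falls between roughly 18% and 24% against the commercial software and between roughly 30% and 34% for the A4000 against the 4050Ti. The A4000 time per million particles stays between about 146 and 152 minutes, which is consistent with linear scaling.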
| Software | 250k | 500k | 1M | 2M | 4M | 8M |
|---|---|---|---|---|---|---|
| phasicFlow-4050Ti | 252 MB | 412 MB | 710 MB | 1292 MB | - | - |
| Commercial DEM-4050Ti | 485 MB | 897 MB | 1525 MB | 2724 MB | - | - |
| phasicFlow-A4000 | 344 MB | 480 MB | 802 MB | 1386 MB | 2590 MB | 4966 MB |
Memory efficiency comparison:
- phasicFlow uses approximately 0.7 GB of memory per million particles
- Commercial DEM software uses approximately 1.2 GB of memory per million particles
- phasicFlow shows ~42% lower memory consumption compared to the commercial alternative
- Memory usage scales linearly with particle count in both software packages. Because GPU memory is limited, the lower memory footprint of phasicFlow makes it possible to run larger simulations on a given GPU.
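The memory figures can be examined case by case in the same way. The sketch below uses only the values from the memory table for the two configurations that ran on the same GPU; `marginal_mb_per_million` is a hypothetical helper that estimates the incremental memory cost between two cases.

```python
# Memory usage in MB, copied from the memory table above (same GPU: RTX 4050Ti).
mem_mb = {
    "phasicFlow-4050Ti": {"250k": 252, "500k": 412, "1M": 710, "2M": 1292},
    "Commercial DEM-4050Ti": {"250k": 485, "500k": 897, "1M": 1525, "2M": 2724},
}

# Particle count of each case, in millions.
millions = {"250k": 0.25, "500k": 0.5, "1M": 1.0, "2M": 2.0}


def marginal_mb_per_million(row, small_case, large_case):
    """Extra memory per extra million particles between two cases."""
    d_mem = row[large_case] - row[small_case]
    d_particles = millions[large_case] - millions[small_case]
    return d_mem / d_particles


# Per-case memory reduction of phasicFlow relative to the commercial software.
reduction = {
    case: 100.0 * (1.0 - mem_mb["phasicFlow-4050Ti"][case]
                   / mem_mb["Commercial DEM-4050Ti"][case])
    for case in millions
}

slope_phasicflow = marginal_mb_per_million(mem_mb["phasicFlow-4050Ti"], "1M", "2M")
slope_commercial = marginal_mb_per_million(mem_mb["Commercial DEM-4050Ti"], "1M", "2M")

for case, value in reduction.items():
    print(f"{case}: phasicFlow uses {value:.1f}% less memory")
print(f"marginal cost, 1M to 2M: phasicFlow {slope_phasicflow:.0f} MB, "
      f"commercial {slope_commercial:.0f} MB per million particles")
```

Computed directly from the table, the per-case reduction is roughly 48% to 54%. The ~42% summary figure matches the ratio of the rounded per-million values (1 - 0.7/1.2), so the two numbers describe the same data at different levels of rounding.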
The simulation case setup files are available in this folder for users interested in performing similar benchmarks on their own hardware. These files can be used to reproduce the tests and compare performance across different systems.