Merzbild.jl benchmarks

Benchmarks and comparisons with other open-source codes are provided here for reference test cases.

Couette flow, serial

A comparison with SPARTA is provided for a single-species (argon) Couette flow test case with 50000 particles and 50 cells (averaging over 36k timesteps after t>14000). The computation is serial. Timing in Merzbild.jl is provided by TimerOutputs.jl; timing in SPARTA is provided by its built-in timers.
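The run size implied by these numbers can be checked with a few lines of arithmetic. The Python sketch below uses variable names chosen here purely for illustration. It assumes that "t" counts timesteps, so that averaging starts after step 14000, and that the collision routine is timed once per cell per timestep. The second assumption is inferred from the ncalls column of the TimerOutputs listings rather than stated explicitly.

```python
# Run parameters quoted in the text above.
n_particles = 50_000
n_cells = 50
transient_steps = 14_000   # steps before averaging starts (assumed: "t" counts timesteps)
sampling_steps = 36_000    # steps over which flow properties are averaged

total_steps = transient_steps + sampling_steps
particles_per_cell = n_particles // n_cells   # average occupancy of a cell

# Assumption: one timed "collide" call per cell per timestep.
collide_calls = n_cells * total_steps

print(f"total timesteps:            {total_steps}")
print(f"average particles per cell: {particles_per_cell}")
print(f"estimated collide calls:    {collide_calls}")
```

Under these assumptions the totals agree with the 50000 steps reported by SPARTA and with the 50.0k, 2.50M and 36.0k entries in the ncalls column of the Merzbild.jl timer output.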

Merzbild.jl version 0.6.2, run with --check-bounds=no -O3.

SPARTA version 4Sep2024, compiled with -O3.

Intel Core i9-13900K, 128 GB RAM

Ubuntu 22.04.5, Julia version 1.11.2, gcc version 11.4.0.

Merzbild.jl

 ──────────────────────────────────────────────────────────────────────────
                                  Time                    Allocations      
                         ───────────────────────   ────────────────────────
    Tot / % measured:         30.1s /  95.8%            697MiB /   0.4%    

 Section         ncalls     time    %tot     avg     alloc    %tot      avg
 ──────────────────────────────────────────────────────────────────────────
 sort             50.0k    10.7s   37.0%   213μs     0.00B    0.0%    0.00B
 convect          50.0k    6.61s   22.9%   132μs     0.00B    0.0%    0.00B
 collide          2.50M    6.33s   22.0%  2.53μs     0.00B    0.0%    0.00B
 props compute    36.0k    5.19s   18.0%   144μs     0.00B    0.0%    0.00B
 sampling             1   2.61ms    0.0%  2.61ms   3.05MiB  100.0%  3.05MiB
 I/O                  1    369μs    0.0%   369μs      240B    0.0%     240B
 ──────────────────────────────────────────────────────────────────────────

SPARTA

Loop time of 31.1462 on 1 procs for 50000 steps with 50000 particles

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Move    | 8.1848     | 8.1848     | 8.1848     |   0.0 | 26.28
Coll    | 11.155     | 11.155     | 11.155     |   0.0 | 35.81
Sort    | 2.8435     | 2.8435     | 2.8435     |   0.0 |  9.13
Comm    | 0.0032952  | 0.0032952  | 0.0032952  |   0.0 |  0.01
Modify  | 8.9578     | 8.9578     | 8.9578     |   0.0 | 28.76
Output  | 0.00046062 | 0.00046062 | 0.00046062 |   0.0 |  0.00
Other   |            | 0.001451   |            |       |  0.00

M1 Pro (MacBook Pro), 32 GB RAM

macOS 12.7.5, Julia version 1.11.2, Apple clang version 14.0.0.

Merzbild.jl

──────────────────────────────────────────────────────────────────────────
                                 Time                    Allocations      
                        ───────────────────────   ────────────────────────
   Tot / % measured:         33.8s /  96.3%            693MiB /   0.4%    

Section         ncalls     time    %tot     avg     alloc    %tot      avg
──────────────────────────────────────────────────────────────────────────
sort             50.0k    11.6s   35.7%   233μs     0.00B    0.0%    0.00B
collide          2.50M    8.57s   26.3%  3.43μs     0.00B    0.0%    0.00B
convect          50.0k    6.86s   21.0%   137μs     0.00B    0.0%    0.00B
props compute    36.0k    5.51s   16.9%   153μs     0.00B    0.0%    0.00B
sampling             1   2.64ms    0.0%  2.64ms   3.05MiB  100.0%  3.05MiB
I/O                  1    139μs    0.0%   139μs      240B    0.0%     240B
──────────────────────────────────────────────────────────────────────────

SPARTA

Loop time of 51.1924 on 1 procs for 50000 steps with 50000 particles

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Move    | 13.622     | 13.622     | 13.622     |   0.0 | 26.61
Coll    | 13.132     | 13.132     | 13.132     |   0.0 | 25.65
Sort    | 2.8252     | 2.8252     | 2.8252     |   0.0 |  5.52
Comm    | 0.002234   | 0.002234   | 0.002234   |   0.0 |  0.00
Modify  | 21.607     | 21.607     | 21.607     |   0.0 | 42.21
Output  | 0.0025806  | 0.0025806  | 0.0025806  |   0.0 |  0.01
Other   |            | 0.0009735  |            |       |  0.00
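To put the totals side by side, the following Python sketch divides the SPARTA loop time by the Merzbild.jl total wall time on each machine and also converts each total to time per particle per timestep. The numbers are copied from the listings above; the dictionary layout and the function name are illustrative only. The two timers may not cover exactly the same work (setup and output, for example), so the ratios are a rough indication rather than a precise speedup.

```python
# Total wall-clock times in seconds, copied from the listings above.
timings = {
    "Intel Core i9-13900K": {"merzbild": 30.1, "sparta": 31.1462},
    "M1 Pro": {"merzbild": 33.8, "sparta": 51.1924},
}

n_particles = 50_000
n_steps = 50_000
particle_steps = n_particles * n_steps  # particle updates over the whole run


def summarize(times):
    """Return the SPARTA / Merzbild.jl time ratio and ns per particle-step for each code."""
    ratio = times["sparta"] / times["merzbild"]
    ns_per_particle_step = {
        code: seconds / particle_steps * 1e9 for code, seconds in times.items()
    }
    return ratio, ns_per_particle_step


for machine, times in timings.items():
    ratio, ns = summarize(times)
    print(
        f"{machine}: SPARTA / Merzbild.jl = {ratio:.2f}, "
        f"Merzbild.jl {ns['merzbild']:.1f} ns, "
        f"SPARTA {ns['sparta']:.1f} ns per particle-step"
    )
```

With these inputs the ratio is about 1.03 on the Intel machine and about 1.51 on the M1 Pro.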