Tiger Team: tt-palabos-multinode
Scalability of Lattice Boltzmann simulations of electrochemical systems using Palabos
The aim of this project was to enable massively parallel execution of Palabos on the bwForCluster JUSTUS 2 in order to study complex electrochemical systems with the well-known Lattice Boltzmann (LB) method. Palabos is a software library for general-purpose computational fluid dynamics (CFD) with a kernel based on the LB method. The library is written in C++, uses MPI for parallel execution, and relies on C++ templates and object-oriented polymorphism to support a broad range of LB models (Latt et al., 2021). Its development team showed that Palabos scales well (70%) up to 16,000 cores on recent high-performance computing (HPC) clusters. We therefore selected Palabos for microstructure-resolved LB simulations of transport processes and multi-component flow in innovative battery technologies and fuel cells (Danner et al., 2016). Thanks to its extensive parallelisation, large simulation domains and computationally demanding couplings of multiple transport equations, both needed for sound and physically realistic results, should now be feasible. However, preliminary performance tests of Palabos on JUSTUS 2 showed poor scaling as soon as more than one compute node was used. To fix this issue, MPI communication between nodes must use the PSM2 library, because JUSTUS 2 relies on Intel's Omni-Path as its high-performance communication architecture. This finding led us to adjust our Open MPI settings by setting two environment variables. Furthermore, we studied how domain decomposition, compile options, CPU pinning, and cache size affect the scalability and performance of Palabos. A well-chosen domain decomposition significantly increases the performance of Palabos and is key to the efficient use of HPC resources for CFD applications. CPU pinning may further increase performance, whereas compile options had a negligible effect.
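The Open MPI adjustment and the CPU pinning described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not name the two environment variables, so the sketch uses the MCA parameters `pml=cm` and `mtl=psm2`, a common way to select PSM2 in Open MPI, exported through the `OMPI_MCA_` prefix. The executable name `./palabos_app` and the rank count are hypothetical, and the actual settings used on JUSTUS 2 may differ.

```python
import os


def psm2_environment(base_env=None):
    """Return a copy of the environment that steers Open MPI towards PSM2."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: these two MCA parameters stand in for the "two environment
    # variables" mentioned in the text; the source does not name them.
    env["OMPI_MCA_pml"] = "cm"    # point-to-point layer that drives an MTL
    env["OMPI_MCA_mtl"] = "psm2"  # matching transport layer for Omni-Path
    return env


def launch_command(executable, n_ranks, pin_to_core=True):
    """Build an mpirun command line, optionally binding each rank to a core."""
    cmd = ["mpirun", "-np", str(n_ranks)]
    if pin_to_core:
        # Open MPI option that pins every MPI rank to one CPU core.
        cmd += ["--bind-to", "core"]
    cmd.append(executable)
    return cmd


if __name__ == "__main__":
    env = psm2_environment()
    cmd = launch_command("./palabos_app", 96)  # hypothetical binary and size
    print("OMPI_MCA_pml=" + env["OMPI_MCA_pml"],
          "OMPI_MCA_mtl=" + env["OMPI_MCA_mtl"],
          " ".join(cmd))
    # To actually launch: subprocess.run(cmd, env=env, check=True)
```

Building the command and environment separately from the launch keeps the sketch runnable on machines without MPI; in a batch job the same settings would typically be exported in the job script instead.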
Degraded performance of Palabos due to cache-size effects, as observed on older CPU generations (Kopta et al., 2011), could be ruled out on JUSTUS 2.
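To illustrate why the choice of domain decomposition matters, the following minimal sketch compares block decompositions of a regular lattice by the number of cell faces that lie on internal cut planes, a simple proxy for the data exchanged between MPI ranks per time step. It is not Palabos' own decomposition algorithm. The function names, the non-periodic box, and the example lattice sizes are assumptions made for illustration, and the proxy ignores load imbalance from porous microstructures and whether the lattice extents divide evenly.

```python
def factor_triples(n):
    """Return all ordered triples (px, py, pz) with px * py * pz == n."""
    triples = []
    for px in range(1, n + 1):
        if n % px:
            continue
        rest = n // px
        for py in range(1, rest + 1):
            if rest % py == 0:
                triples.append((px, py, rest // py))
    return triples


def cut_faces(shape, procs):
    """Count cell faces on internal cut planes of a non-periodic box.

    Cutting the x axis into px blocks creates (px - 1) planes of ny * nz
    faces each; the y and z axes are treated the same way.
    """
    nx, ny, nz = shape
    px, py, pz = procs
    return (px - 1) * ny * nz + (py - 1) * nx * nz + (pz - 1) * nx * ny


def best_decomposition(shape, n_ranks):
    """Pick the process grid with the smallest communication proxy."""
    return min(factor_triples(n_ranks),
               key=lambda procs: (cut_faces(shape, procs), procs))


if __name__ == "__main__":
    cube = (256, 256, 256)   # hypothetical cubic lattice
    duct = (1024, 64, 64)    # hypothetical elongated lattice
    print(best_decomposition(cube, 64), cut_faces(cube, (64, 1, 1)))
    print(best_decomposition(duct, 16), cut_faces(duct, (4, 2, 2)))
```

For the cubic lattice the proxy favours a 4 x 4 x 4 process grid over 64 slabs, while for the elongated lattice slabs along the long axis come out ahead, so the best layout depends on the domain shape.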
 J. Latt et al. (2021): Palabos: Parallel Lattice Boltzmann Solver, Comput. Math. with Appl., vol. 81, pp. 334–350, doi: 10.1016/j.camwa.2020.03.022.
 T. Danner et al. (2016): Characterization of gas diffusion electrodes for metal-air batteries, J. Power Sources, vol. 324, pp. 646–656, doi: 10.1016/j.jpowsour.2016.05.108.
 P. Kopta et al. (2011): Parallel application benchmarks and performance evaluation of the Intel Xeon 7500 family processors, Procedia Comput. Sci., vol. 4, pp. 372–381, doi: 10.1016/j.procs.2011.04.039.
Members of the Tiger Team:
DLR Institute of Engineering Thermodynamics, Helmholtz Institute Ulm (HIU) for Electrochemical Energy Storage; HPC Competence Center for Computational Chemistry and Quantum Sciences, Ulm University