Literature by the same author
plus at Google Scholar

Bibliografische Daten exportieren
 

MPI-based multi-GPU extension of the Lattice Boltzmann Method

Title data

Häusl, Fabian:
MPI-based multi-GPU extension of the Lattice Boltzmann Method.
Bayreuth , 2019 . - 60 p.
( Bachelor thesis, 2019 , Universität Bayreuth, Fakultät für Mathematik, Physik und Informatik)
DOI: https://doi.org/10.15495/EPub_UBT_00005689

Official URL: Volltext

Abstract in another language

This bachelor thesis presents a method to parallelise the Lattice Boltzmann method on several graphics processing units by coupling it with the Message Passing Interface (MPI). This task is mainly related to the limited on-board memory of a single graphics processing unit and the memory intensity of the Lattice Boltzmann method. A concrete algorithm for the simulation software FluidX3D is shown and validated. This has the flexibility to work with different extensions of the Lattice Boltzmann method: besides complex geometric boundary conditions, heat flux and condensation processes, the simulation of free surfaces is also possible. A special challenge is to combine the function-oriented MPI communication with the object-oriented approach of FluidX3D.
This thesis wil explain various optimizations of the multi-GPU extension: on the one hand, they rely on knowledge about the programming language OpenCL and the hardware of GPUs, on the other hand, the algorithm itself is extended in such a way that an overlap of calculation and memory transfer can take place. The optimizations will be confirmed by runtime measurements on two different clusters with up to 4 GPUs at the same time. The multi-GPU algorithm reaches 95% of its theoretical optimum in weak-scaling almost independent of the number of GPUs used. In strong-scaling the efficiency of 4 GPUs is 77%. Up to 13600 MLUPs when using 4 Radeon VII GPUs were achieved for a cubic benchmark setup.

Abstract in another language

In dieser Bachelorarbeit wird eine Methode vorgestellt, um die Lattice Boltzmann Methode durch Kopplung mit dem Message Passing Interface (MPI) auf mehreren Grafikprozessoren zu parallelisieren. Diese Aufgabe stellt sich v.a. im Hinblick auf den begrenzten Speicherplatz eines einzelnen Grafikprozessors und der Speicherintensität der Lattice Boltzmann Methode. Es wird ein konkreter Algorithmus für die Simulationssoftware FluidX3D gezeigt und validiert. Dieser weist die Flexibilität auf, mit verschiedenen Erweiterungen der Lattice Boltzmann Methode zu funktionieren: neben komplexen geometrischen Randbedingungen, Wärmeflüssen und Kondensationsprozessen ist auch die Simulation von freien Oberflächen möglich. Eine besondere Herausforderung ist dabei, die funktionsorientierte MPI-Kommunikation mit dem objektorientierten Ansatz von FluidX3D sinnvoll zu vereinen.
Im Verlauf der Arbeit werden verschiedene Optimierungen der multi-GPU Erweiterung erläutert: einerseits greifen diese auf Wissen über die Programmiersprache OpenCL und die Hardware von GPUs zurück, andererseits wird der Algorithmus selbst dahingehend erweitert, dass ein Überlapp von Berechnung und Speichertransfer stattfinden kann. Die Optimierungen werden dabei durch Laufzeitmessungen auf zwei verschiedenen Clustern mit bis zu 4 GPUs gleichzeitig bestätigt. Der multi-GPU Algrithmus erreicht fast unabhängig von der Anzahl verwendeter GPUs 95% seines theoretischen Optimums im weak-scaling, im strong-scaling ergibt sich bei 4 GPUs einer Effizienz von 77%. Es wurden bis zu 13600 MLUPs mittels 4 Radeon VII GPUs für ein würfelförmiges benchmark setup erreicht.

Further data

Item Type: Bachelor thesis
Additional notes: The Lattice Bolzmann Method (LBM) is a quite novel approach in computational fluid dynamics (CFD), that is gaining increasing popularity [1]. This is especially due to its properties: as opposed to classical Navier-Stokes Solvers, the LBM can easily handle complex boundary conditions and is compatible with a large amount of application-oriented extensions (e.g. heat diffusion, multiphase and multicomponent flows, immersed boundary methods for simulation of deformable boundaries [2]). Furthermore, its algorithm is highly parallelizable, so for large and runtime intensive setups speedup can easily be reached by computing on multiple CPU (central processing unit) cores or by using GPUs (graphics processing units). Because GPUs were originally developed to process geometric data in parallel and quickly, their architecture superior to CPUs in computing power and memory bandwidth. However, since accuracy does not play such a major role in graphics processing, in contrast to CPUs they are only optimized for 32-bit floating point values. Fortunately, float accuracy is usually sufficient for the LBM. Another difference to CPUs is the amount of available memory: while the RAM available toCPU can be several hundred gigabytes, the memory space on GPUs is very limited. The GPUs used for this thesis had 16 GB (Radeon VII) or 12 GB (Nvidia Tesla P100) available. Because the LBM is memory-intensive while at the same time the memory on GPUs is quite limited, this directly restricts the size of the simulation box. Multi-GPU implementations of the LBM are able to widen this limitation as well as to gain even bigger speedup.

Many multi-GPU implementations of the LBM have been realized, trying to reduce its communication overhead to a negligible size. In the weak scaling, the efficiency reaches from 69.9 % using 32 GPUs with a total performance of about 2000 MLUPs (see [3]) to 98.5 % using 64 GPUs with a total performance of about 35000 MLUPs (see [4]). However, the latter implementation - as well as [1], [5], [6], [7], [8] - is limited to a subdivision of the domain along a preferred axis. [4] does not provide a strong scalability test for cubic domains, while [1] reaches an efficiency of 91 % with 8 GPUs and a total performance of about 3000 MLUPs here. [3] is the only one of the references mentioned above that can subdivide the domain along all three spatial directions. All of them use halo nodes to organize the data to be transferred between the GPUs. While [4] runs with CUDA as well as OpenCL via the wrappers PyCUDA and PyOpenCL, all other references only work with CUDA. In order to manage the different computational devices, POSIX, MPI or ZeroMQ is used in the different implementations.

For Bayreuths new LBM software - FluidX3D - multi-GPU support is provided and evaluated together with this thesis. That therefore larger systems can be simulated is particularly important for a project in the context of the Collaborative Research Centre 1357 Microplastics (german: Sonderforschungsbereich Mikroplastik). In this project, the exchange of microplastic particles at the air-water interface is studied. Because of the buoyancy many microplastic particles might swim directly under the water surface. Processes such as the impact of a raindrop on the surface of the water cause small droplets to be thrown into the air. These could serve as carriers for the micropastic particles and thus explain a potential migration path from the hydro- to the atmosphere. In order to understand the properties of this complicated fluid mechanical mechanism, the LBM is right now extended by the immersed boundary method and a module for the simulation of free surfaces.
Keywords: Lattice Boltzmann; High-performance Multi-GPU; MPI-communication; Fluid Dynamics
Institutions of the University: Faculties > Faculty of Mathematics, Physics und Computer Science > Department of Physics > Professor Theoretical Physics VI - Simulation and Modelling of Biofluids > Professor Theoretical Physics VI - Simulation and Modelling of Biofluids - Univ.-Prof. Dr. Stephan Gekle
Faculties
Faculties > Faculty of Mathematics, Physics und Computer Science
Faculties > Faculty of Mathematics, Physics und Computer Science > Department of Physics
Faculties > Faculty of Mathematics, Physics und Computer Science > Department of Physics > Professor Theoretical Physics VI - Simulation and Modelling of Biofluids
Result of work at the UBT: Yes
DDC Subjects: 000 Computer Science, information, general works > 004 Computer science
500 Science > 530 Physics
Date Deposited: 03 Jul 2021 21:00
Last Modified: 22 Feb 2022 08:37
URI: https://eref.uni-bayreuth.de/id/eprint/66430