Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines

Title data

Isenko, Alexander ; Mayer, Ruben ; Jedele, Jeffrey ; Jacobsen, Hans-Arno:
Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines.
In: Proceedings of the 2022 International Conference on Management of Data. New York: Association for Computing Machinery, 2022. pp. 1825-1839
ISBN 978-1-4503-9249-5
DOI: https://doi.org/10.1145/3514221.3517848

Abstract

Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and inter-connects) and advanced parallelization techniques that yield better scalability. At the same time, the amount of training data needed in order to train increasingly complex models is growing. As a consequence of this development, data preprocessing and provisioning are becoming a severe bottleneck in end-to-end deep learning pipelines.

In this paper, we provide an in-depth analysis of data preprocessing pipelines from four different machine learning domains. We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines and extract individual trade-offs to optimize throughput, preprocessing time, and storage consumption. Additionally, we provide an open-source profiling library that can automatically decide on a suitable preprocessing strategy to maximize throughput. By applying our generated insights to real-world use-cases, we obtain an increased throughput of 3x to 13x compared to an untuned system while keeping the pipeline functionally identical. These findings show the enormous potential of data pipeline tuning.

Further details

Publication type: Article in a book
Peer-reviewed contribution: Yes
University institutions: Faculties
Faculties > Faculty of Mathematics, Physics and Computer Science
Faculties > Faculty of Mathematics, Physics and Computer Science > Institute of Computer Science > Chair of Data Systems
Faculties > Faculty of Mathematics, Physics and Computer Science > Institute of Computer Science > Chair of Data Systems > Chair of Data Systems - Univ.-Prof. Dr. Ruben Mayer
Faculties > Faculty of Mathematics, Physics and Computer Science > Institute of Computer Science
Result of work at the UBT: No
DDC subjects: 000 Computer science, information science, general works > 004 Computer science
Date deposited: 26 Apr 2023 11:38
Last modified: 05 Feb 2024 07:32
URI: https://eref.uni-bayreuth.de/id/eprint/76044