Titelangaben
Mayer, Ruben ; Orujzade, Kamil ; Jacobsen, Hans-Arno:
2ps : High-quality edge partitioning with two-phase streaming.
arXiv
,
2020
DOI: https://doi.org/10.48550/arXiv.2001.07086
Abstract
Graph partitioning is an important preprocessing step to distributed graph processing. In edge partitioning, the edge set of a given graph is split into k equally-sized partitions, such that the replication of vertices across partitions is minimized. Streaming is a viable approach to partition graphs that exceed the memory capacities of a single server. The graph is ingested as a stream of edges, and one edge at a time is immediately and irrevocably assigned to a partition based on a scoring function. However, streaming partitioning suffers from the uninformed assignment problem: At the time of partitioning early edges in the stream, there is no information available about the rest of the edges. As a consequence, edge assignments are often driven by balancing considerations, and the achieved replication factor is comparably high. In this paper, we propose 2PS, a novel two-phase streaming algorithm for high-quality edge partitioning. In the first phase, vertices are separated into clusters by a lightweight streaming clustering algorithm. In the second phase, the graph is re-streamed and edge partitioning is performed while taking into account the clustering of the vertices from the first phase. Our evaluations show that 2PS can achieve a replication factor that is comparable to heavy-weight random access partitioners while inducing orders of magnitude lower memory overhead.
Weitere Angaben
Publikationsform: | Preprint, Postprint |
---|---|
Institutionen der Universität: | Fakultäten Fakultäten > Fakultät für Mathematik, Physik und Informatik Fakultäten > Fakultät für Mathematik, Physik und Informatik > Institut für Informatik > Lehrstuhl Data Systems Fakultäten > Fakultät für Mathematik, Physik und Informatik > Institut für Informatik > Lehrstuhl Data Systems > Lehrstuhl Data Systems - Univ.-Prof. Dr. Ruben Mayer Fakultäten > Fakultät für Mathematik, Physik und Informatik > Institut für Informatik |
Titel an der UBT entstanden: | Nein |
Themengebiete aus DDC: | 000 Informatik,Informationswissenschaft, allgemeine Werke > 004 Informatik |
Eingestellt am: | 25 Apr 2023 10:15 |
Letzte Änderung: | 05 Feb 2024 07:48 |
URI: | https://eref.uni-bayreuth.de/id/eprint/76037 |