Embedding Semantic Anchors to Guide Topic Models on Short Text Corpora

Title details

Steuber, Florian ; Schneider, Sinclair ; Schönfeld, Mirco:
Embedding Semantic Anchors to Guide Topic Models on Short Text Corpora.
In: Big Data Research. Vol. 27 (2022). - 100293.
ISSN 2214-5796
DOI: https://doi.org/10.1016/j.bdr.2021.100293

Project information

Project title:
Official project title: Africa Multiple Cluster of Excellence at the University of Bayreuth
Project ID: EXC 2052/1 – 390713894

Project funding: Deutsche Forschungsgemeinschaft

Abstract

Documents on the social media platform Twitter are written in a short and simple style rather than extensively and elaborately. Further, the core message of a post is often encoded into characteristic phrases called hashtags. These hashtags illustrate the semantics of a post or tie it to a specific topic. In this paper, we propose multiple approaches that use hashtags and their surrounding texts to improve topic modeling of short texts. We use transfer learning by applying a pre-trained word embedding of hashtags to derive preliminary topics. These function as supervising information, or seed topics, and are passed to Archetypal LDA (A-LDA), a recent variant of Latent Dirichlet Allocation. We demonstrate the effectiveness of our approach using a large corpus of Twitter posts as an example. Our approaches improve the topic model's quality in terms of various quantitative metrics. Moreover, the presented algorithms for extracting seed topics can themselves be used as a form of lightweight topic model. Hence, our approaches create additional analytical opportunities and can help to gain a more detailed understanding of what people are talking about on social media. By using big data, in the form of millions of tweets, for preprocessing and fine-tuning, we enable the classification algorithm to produce topics that are very coherent to the reader.
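
The idea of deriving preliminary topics from a hashtag embedding can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not say how seed topics are extracted, so the greedy cosine-similarity grouping, the function names, the threshold, and the three-dimensional toy vectors below are all assumptions made purely for illustration. In practice the vectors would come from a pre-trained embedding, and the resulting groups would be handed to the topic model as seed topics.

```python
import math

# Hypothetical toy "hashtag embeddings"; real ones would come from a
# pre-trained word embedding and have far more dimensions.
HASHTAG_VECTORS = {
    "#football": [0.9, 0.1, 0.0],
    "#soccer": [0.8, 0.2, 0.1],
    "#election": [0.1, 0.9, 0.1],
    "#vote": [0.0, 0.8, 0.2],
    "#climate": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def seed_topics(vectors, threshold=0.8):
    """Greedily group hashtags into seed topics (illustrative assumption).

    Hashtags are visited in sorted order. Each one joins the first group
    whose first member is at least `threshold` similar to it; otherwise it
    starts a new group.
    """
    topics = []
    for tag in sorted(vectors):
        for topic in topics:
            if cosine(vectors[tag], vectors[topic[0]]) >= threshold:
                topic.append(tag)
                break
        else:
            # No existing group was similar enough.
            topics.append([tag])
    return topics


topics = seed_topics(HASHTAG_VECTORS)
for number, topic in enumerate(topics, start=1):
    print(f"seed topic {number}: {', '.join(topic)}")
```

With these toy vectors, the sports, politics, and climate hashtags end up in three separate groups. A greedy grouping is used here only because it is short and deterministic; any clustering method over the embedding space could play the same role.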

Further details

Publication type: Journal article
Peer-reviewed: Yes
Keywords: Topic modeling; Short text; Word embedding; Transfer learning; Big data
University institutions: Faculties > Faculty of Languages and Literature
Faculties > Faculty of Languages and Literature > Junior Professorship of Data Modelling and Interdisciplinary Knowledge Generation
Faculties > Faculty of Languages and Literature > Junior Professorship of Data Modelling and Interdisciplinary Knowledge Generation > Junior Professorship of Data Modelling and Interdisciplinary Knowledge Generation - Juniorprof. Dr. Mirco Schönfeld
Faculties
Result of work at the UBT: Yes
DDC subjects: 000 Computer science, information science, general works > 004 Computer science
Date deposited: 18 Nov 2021 09:38
Last modified: 30 May 2023 11:01
URI: https://eref.uni-bayreuth.de/id/eprint/67879