Topic Modeling of Short Texts Using Anchor Words

Title data

Steuber, Florian ; Schönfeld, Mirco ; Dreo Rodosek, Gabi:
Topic Modeling of Short Texts Using Anchor Words.
In: Proceedings of 10th International Conference on Web Intelligence, Mining and Semantics (WIMS 2020). - New York: Association for Computing Machinery, 2020. - pp. 210-219
ISBN 978-1-4503-7542-9
DOI: https://doi.org/10.1145/3405962.3405968

Abstract

We present Archetypal LDA, or A-LDA for short, a topic model tailored to short texts that contain semantic anchors: words that convey a certain meaning or implicitly build on discussions beyond their mere presence. A-LDA extends Latent Dirichlet Allocation (LDA) by using these semantic anchors as seed words that guide topic inference. We identify the seed words from the documents in an unsupervised manner and evaluate their co-occurrences using archetypal analysis, a geometric approximation problem that seeks k points that best approximate the convex hull of the data set. These so-called archetypes are treated as latent topics and used to guide the LDA. We demonstrate the effectiveness of our approach on Twitter data, where the semantic anchor words are the hashtags that users assign to their tweets. In direct comparison to LDA, A-LDA achieves 10-13% better results. We find that representing topics solely by the hashtags corresponding to the calculated archetypes already yields interpretable topics, and that the model's performance peaks for seed confidence values between 0.7 and 0.9.
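
The following is a minimal sketch of two ingredients named in the abstract, hashtag co-occurrence counting and seed-guided topic initialization. It is not the authors' implementation. The archetypal analysis step is omitted, so the hashtag-to-topic mapping `seed_topics` is a hypothetical stand-in for the computed archetypes. The function names and toy tweets are likewise illustrative. Reading "seed confidence" as the probability that a seed word starts in its seed topic is an assumption borrowed from seeded LDA variants in general.

```python
import random
from collections import Counter
from itertools import combinations


def hashtag_cooccurrence(tweets):
    """Count how often each unordered pair of hashtags shares a tweet."""
    counts = Counter()
    for text in tweets:
        tags = sorted({tok.lower() for tok in text.split()
                       if tok.startswith("#") and len(tok) > 1})
        counts.update(combinations(tags, 2))
    return counts


def seeded_initial_topics(docs, seed_topics, n_topics, seed_confidence, rng):
    """Draw an initial topic for every token of every document.

    A seed word is placed in its seed topic with probability
    `seed_confidence` (assumed interpretation); otherwise, and for all
    non-seed words, the topic is drawn uniformly at random.
    """
    assignments = []
    for doc in docs:
        doc_topics = []
        for word in doc:
            topic = seed_topics.get(word)
            if topic is None or rng.random() >= seed_confidence:
                topic = rng.randrange(n_topics)
            doc_topics.append(topic)
        assignments.append(doc_topics)
    return assignments


# Toy data (invented for illustration only).
tweets = [
    "vote today #election #politics",
    "great match #football #sports",
    "#politics debate #election tonight",
]
cooc = hashtag_cooccurrence(tweets)

# Hypothetical result of the archetype step: each hashtag tied to one topic.
seed_topics = {"#election": 0, "#politics": 0, "#football": 1, "#sports": 1}
docs = [t.lower().split() for t in tweets]
init = seeded_initial_topics(docs, seed_topics, n_topics=2,
                             seed_confidence=0.8, rng=random.Random(0))
```

The resulting `init` could serve as the starting state of a collapsed Gibbs sampler for LDA. The co-occurrence counts in `cooc` are the kind of input an archetypal analysis would consume.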

Further data

Item type: Article in a book
Refereed: Yes
Institutions of the University: Faculties > Faculty of Languages and Literatures
Faculties > Faculty of Languages and Literatures > Junior Professorship for Data Modelling and Interdisciplinary Knowledge Generation > Junior Professorship for Data Modelling and Interdisciplinary Knowledge Generation - Juniorprof. Dr. Mirco Schönfeld
Faculties
Faculties > Faculty of Languages and Literatures > Junior Professorship for Data Modelling and Interdisciplinary Knowledge Generation
Result of work at the UBT: No
DDC subjects: 000 Computer science, information, general works > 004 Computer science
Date deposited: 26 Aug 2020 09:44
Last modified: 26 Aug 2020 09:44
URI: https://eref.uni-bayreuth.de/id/eprint/56607