105. A Scalable Approach for Topic Modeling with R
Authors: Tiffany A. Connors (Texas State University), Ritu Arora (University of Texas at Austin)
Abstract: Topic Modeling (TM) automatically classifies documents under different topics and is especially useful for exploring a large corpus of documents to discover new relationships. The R programming language has a TM library that is easy to install and use. However, because R is interpreted, TM code written in R performs poorly compared with the same code rewritten in C/C++/Fortran. Despite this, R is a high-productivity language that non-traditional HPC users commonly use for TM and similar data analyses. Many such users have large datasets to analyze but lack access to the expertise needed to rewrite their R code in C/C++/Fortran. Using TM as an example, we demonstrate that such end users can reduce time-to-results, in some cases by up to a factor of 23, by running their R code in High-Throughput Computing (HTC) mode on HPC resources.
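As a rough illustration of what running in HTC mode can look like, the sketch below splits a corpus into independent groups and emits one serial Rscript command per group, which a job launcher on an HPC system could then run concurrently. It is a minimal sketch under stated assumptions, not the authors' workflow: the script name `topic_model.R`, the file paths, and the round-robin partitioning are all hypothetical, and the abstract does not say how the work was actually divided.

```python
from typing import List


def make_task_list(doc_files: List[str], n_tasks: int,
                   script: str = "topic_model.R") -> List[str]:
    """Build one independent Rscript command per group of input files.

    The script name is a hypothetical placeholder for a user's existing
    R topic-modeling script that accepts input files as arguments.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    # Round-robin partition: task i gets files i, i + n_tasks, i + 2*n_tasks, ...
    groups = [doc_files[i::n_tasks] for i in range(n_tasks)]
    # Skip empty groups (more tasks requested than files available).
    return [f"Rscript {script} " + " ".join(g) for g in groups if g]


if __name__ == "__main__":
    # Hypothetical corpus of ten files split across four independent tasks.
    files = [f"docs/part_{i:03d}.txt" for i in range(10)]
    for line in make_task_list(files, 4):
        print(line)
```

Each generated line is a self-contained serial job, so the R code itself needs no changes and no parallel programming. Note that in this sketch every task would fit its own model on its own subset of documents; whether that matches the setup behind the reported speedup is not stated in the abstract.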
Two-page extended abstract: pdf