SC16 Salt Lake City, UT

25. Big Data Helps Particle Physicists to Concentrate on Science

Authors: Saba Sehrish (Fermi National Laboratory), Jim Kowalkowski (Fermi National Laboratory), Oliver Gutsche (Fermi National Laboratory), Matteo Cremonesi (Fermi National Laboratory), Bo Jayatilaka (Fermi National Laboratory), Cristina Mantilla (Fermi National Laboratory), Jim Pivarski (Princeton University), Alexy Svyatkovskiy (Princeton University)

Abstract: In this poster, we evaluate Apache Spark for High Energy Physics (HEP) analyses using an example from the CMS experiment at the Large Hadron Collider (LHC) in Geneva, Switzerland. HEP seeks to understand fundamental particles and the interactions between them, and it is a very compute- and data-intensive statistical science. Our goal is to understand how well this technology performs for HEP-like analyses. Our use case focuses on searching for new types of elementary particles that could explain Dark Matter in the universe. We provide two implementations of this analysis workflow: one using Spark on the Hadoop ecosystem, and the other using Spark on high performance computing platforms. The analysis workflow uses official experiment data formats as input and produces publication-level physics plots. We compare the performance and productivity of the current analysis with these two approaches and discuss their respective advantages and disadvantages.

Poster: pdf
Two-page extended abstract: pdf
