

ACM Student Research Competition

Poster: SRC18. Analysis of Variable Selection Methods on Scientific Cluster Measurement Data


Author

Jonathan Wang

Event Type

ACM Student Research Competition

Poster

Time: Tuesday, November 15th, 10am - 5:15pm

Location: Exhibit Hall E, Booth #104

Description: The goal of this project was to use parallelized variable selection methods to improve the performance of machine learning models on the PTF astrophysics dataset by reducing model training time and removing disruptive variables. Several methods were implemented in Spark to take advantage of high-performance computing and were tested on the PTF data. In these tests, Sequential Backward Selection approximated the optimal subset relatively quickly. Models trained on this subset took significantly less time to train and were more accurate than models trained on the full feature set. We also experimented with correlation-based grouping, which exploits feature correlations in the PTF data and allows large correlated datasets to be handled more efficiently. With this grouping, we further improved the performance of Sequential Backward Selection on this dataset without significant loss in accuracy.
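
For readers unfamiliar with Sequential Backward Selection, the following is a minimal single-machine sketch of the general greedy technique, not the poster's Spark implementation. The function name, the `score` callback, and the toy feature names are all hypothetical and chosen for illustration. In practice `score` would train and validate a model on the given feature subset.

```python
from typing import Callable, FrozenSet, Sequence, Tuple


def sequential_backward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_features: int = 1,
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination (illustrative sketch).

    Start from the full feature set and repeatedly drop the single
    feature whose removal gives the highest score. Return the best
    subset seen at any size, together with its score.
    """
    current = frozenset(features)
    best_subset, best_score = current, score(current)
    while len(current) > min_features:
        # Score every subset obtained by removing exactly one feature.
        candidates = [(score(current - {f}), f) for f in sorted(current)]
        step_score, dropped = max(candidates)
        current = current - {dropped}
        # On ties, prefer the smaller subset (fewer features to train on).
        if step_score >= best_score:
            best_subset, best_score = current, step_score
    return best_subset, best_score


def toy_score(subset: FrozenSet[str]) -> float:
    # Hypothetical stand-in for model accuracy: features "a" and "b"
    # help, every other feature is disruptive noise.
    useful = {"a", "b"}
    return len(subset & useful) - 0.5 * len(subset - useful)


subset, value = sequential_backward_selection(["a", "b", "n1", "n2"], toy_score)
print(sorted(subset), value)
```

Each round evaluates one candidate subset per remaining feature, and those evaluations are independent of one another, which is what makes the method a natural fit for parallel execution.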
