Mathematics | Computer Science

Isabelle Reek, 2006 | Aarau, AG

The Sloan Digital Sky Survey collects multiband photometric data on celestial objects, which then have to be classified as stars, galaxies, or quasars. The Stellar Classification Dataset SDSS17 contains these objects with class labels, measurements in five photometric filters, and the spectroscopic redshift. Spectroscopic redshift measurements are very reliable but time-intensive; predicting the redshift from the photometric data would be far more efficient. Thus, I first predicted the photometric redshifts for galaxies and quasars separately.
Secondly, I compared the performance of different state-of-the-art machine learning algorithms at classifying the objects. The highest accuracy of 98% was reached with a Random Forest. An important predictor in the classification is redshift. Therefore, I predicted the photometric redshifts of the unclassified dataset and used these to classify the objects. Here, I was able to classify 87% of the dataset with an accuracy of 96%.
This project gives a thorough overview of how different machine learning algorithms can classify three frequently observed object types in our night skies. Furthermore, it suggests a way to reduce the time requirement of calculating spectroscopic redshifts for the purpose of classification by making use of photometric redshift predictions.

Introduction

The dataset I used looks at three types of objects: stars, galaxies, and quasars. These must initially be accurately described and classified. Firstly, I wanted to see whether I could predict the redshift of these objects (the shift of their light toward longer wavelengths, which indicates how fast an object is moving away from us) photometrically. Secondly, I compared how well different machine learning algorithms could classify the three classes. Thirdly, I wanted to see whether the more time-efficient photometric redshifts could be used for classification instead of the usual spectroscopic ones.

Methods

I wrote all models in Python, using scikit-learn as the machine learning library. Firstly, I predicted photometric redshifts from the photometric filters u, g, r, i, z (filters at different wavelengths) using a Random Forest, separately for galaxies and quasars (stars have too little variance).
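
A minimal sketch of this regression step, not the project's actual code: it fits scikit-learn's `RandomForestRegressor` on five magnitude columns. The synthetic data, the helper name `fit_photo_z`, the train/test split, and all hyperparameters are assumptions made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def fit_photo_z(magnitudes, redshift, seed=0):
    """Fit a Random Forest mapping u, g, r, i, z magnitudes to redshift.

    Returns the fitted model and its mean absolute error on a held-out split.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        magnitudes, redshift, test_size=0.25, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    return model, mae


# Synthetic stand-in for one object class (hypothetical data, not SDSS17):
# five magnitudes that depend linearly on the redshift, plus noise.
rng = np.random.default_rng(0)
n = 500
z_true = rng.uniform(0.0, 1.0, n)
mags = 18.0 + np.outer(z_true, [2.0, 1.5, 1.0, 0.7, 0.5]) + rng.normal(0.0, 0.1, (n, 5))

model, mae = fit_photo_z(mags, z_true)
print(f"held-out MAE: {mae:.4f}")
```

With real data, the same function would be called once on the galaxy rows and once on the quasar rows, giving one model per class.
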
Secondly, I compared different machine learning algorithms for the classification of stars, galaxies, and quasars using the photometric features and the spectroscopic redshift. In addition to the different algorithms, I also compared the effects of different preprocessing methods.
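
Such a comparison can be sketched with cross-validation. The source does not list which algorithms or preprocessing methods were compared, so the candidates below (Random Forest, k-nearest neighbours with and without standardisation, logistic regression) are illustrative assumptions, as are the synthetic data and the helper name `compare_classifiers`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_classifiers(X, y, cv=5):
    """Return the mean cross-validated accuracy for a few candidate models."""
    candidates = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "knn_raw": KNeighborsClassifier(n_neighbors=5),
        "knn_scaled": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "logreg_scaled": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }
    return {
        name: cross_val_score(estimator, X, y, cv=cv, scoring="accuracy").mean()
        for name, estimator in candidates.items()
    }


# Synthetic stand-in (hypothetical): 0 = star, 1 = galaxy, 2 = quasar.
# The magnitudes are pure noise here; only the redshift column separates the classes.
rng = np.random.default_rng(1)
n = 600
y = rng.integers(0, 3, n)
redshift = np.abs(rng.normal(np.array([0.0, 0.4, 1.8])[y], 0.1))
mags = rng.normal(19.0, 1.5, (n, 5))
X = np.column_stack([mags, redshift])

scores = compare_classifiers(X, y)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

Wrapping the scaler and the model in one pipeline means the scaling is fitted only on the training folds, so the comparison between raw and preprocessed features is not biased by information from the validation fold.
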
Lastly, I predicted the photometric redshift for unclassified objects and then used those values as a redshift basis for classifying the objects.
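
The two-stage idea can be sketched as follows. This is an assumed setup, not the project's code: a single regressor predicts the redshift for all objects, `cross_val_predict` is used so that each photometric redshift comes from a model that never saw that object, and the data are synthetic. The helper name `classify_with_photo_z` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_predict, train_test_split


def classify_with_photo_z(mags, spec_z, labels, seed=0):
    """Stage 1: predict redshift from magnitudes.
    Stage 2: classify on magnitudes plus the predicted redshift."""
    regressor = RandomForestRegressor(n_estimators=50, random_state=seed)
    # Out-of-fold predictions: every object gets a photo-z from a model
    # that was trained without it.
    photo_z = cross_val_predict(regressor, mags, spec_z, cv=5)

    features = np.column_stack([mags, photo_z])
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    classifier = RandomForestClassifier(n_estimators=100, random_state=seed)
    classifier.fit(X_train, y_train)
    return photo_z, classifier.score(X_test, y_test)


# Synthetic stand-in (hypothetical): class-dependent redshift, and
# magnitudes that depend on the redshift plus noise.
rng = np.random.default_rng(2)
n = 600
labels = rng.integers(0, 3, n)
spec_z = np.abs(rng.normal(np.array([0.0, 0.4, 1.8])[labels], 0.1))
mags = 18.0 + np.outer(spec_z, [2.0, 1.5, 1.0, 0.7, 0.5]) + rng.normal(0.0, 0.1, (n, 5))

photo_z, accuracy = classify_with_photo_z(mags, spec_z, labels)
print(f"accuracy with photometric redshift: {accuracy:.3f}")
```
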

Results

For galaxies and quasars, the photometric redshift prediction had a mean absolute error (MAE) of 0.0576 and 0.458 and a mean absolute scaled error (MASE) of 19.6% and 48.6%, respectively.
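
For reference, the two error measures can be computed as in the sketch below. The source does not state which baseline was used for the MASE; here it is assumed to be a naive predictor that always outputs the mean of the true values, so that a MASE of 100% would mean "no better than guessing the mean". The function names `mae` and `mase` and the example values are illustrative.

```python
def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mase(y_true, y_pred):
    """MAE of the model divided by the MAE of a naive baseline.

    Assumption: the baseline always predicts the mean of y_true.
    """
    mean_true = sum(y_true) / len(y_true)
    baseline = [mean_true] * len(y_true)
    return mae(y_true, y_pred) / mae(y_true, baseline)


# Tiny hand-checkable example (made-up numbers).
z_true = [0.1, 0.2, 0.3, 0.4]
z_pred = [0.15, 0.2, 0.25, 0.4]

example_mae = mae(z_true, z_pred)
example_mase = mase(z_true, z_pred)
print(f"MAE = {example_mae:.3f}, MASE = {example_mase:.1%}")
```

In this example the model's MAE is approximately 0.025 and the baseline's MAE approximately 0.1, so the MASE is about 25%.
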
Random Forests performed best at classifying stars, galaxies and quasars with an accuracy of 98%. For all objects the redshift was the most relevant feature.
Finally, the objects where the photometric redshift prediction had an uncertainty below 0.7 (87% of the objects) could be classified (using the photometric redshift) with an accuracy of 96%.
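
The source does not say how the uncertainty of a prediction was computed. One common choice, assumed here purely for illustration, is the standard deviation of the individual trees' predictions in the Random Forest. The sketch below applies a threshold to that spread; the helper names and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def predict_with_uncertainty(forest, X):
    """Return the forest's mean prediction and the spread across its trees."""
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)


def split_by_uncertainty(forest, X, threshold=0.7):
    """Flag the objects whose prediction spread is below the threshold."""
    photo_z, sigma = predict_with_uncertainty(forest, X)
    confident = sigma < threshold
    return photo_z, sigma, confident


# Synthetic stand-in (hypothetical data, not SDSS17).
rng = np.random.default_rng(3)
z_train = rng.uniform(0.0, 1.0, 400)
X_train = 18.0 + np.outer(z_train, [2.0, 1.5, 1.0, 0.7, 0.5]) + rng.normal(0.0, 0.1, (400, 5))
X_new = 18.0 + np.outer(rng.uniform(0.0, 1.0, 100), [2.0, 1.5, 1.0, 0.7, 0.5])

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, z_train)
photo_z, sigma, confident = split_by_uncertainty(forest, X_new, threshold=0.7)
fraction_kept = confident.mean()
print(f"fraction classified via photo-z: {fraction_kept:.0%}")
```

Objects with `confident == True` would go on to the classifier with their photometric redshift; the remaining ones would be set aside for follow-up.
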

Discussion

Using only photometric data from the Sloan Digital Sky Survey, I was able to reach a high classification accuracy relative to the labels created by the original ML classification on a wider range of features. The redshift value emerged as the most important predictor of object class. This can be explained physically: redshift reflects the speed at which an object moves away from us, which is correlated with its distance. This distance is a strong differentiator between observable stars and objects outside the Milky Way (galaxies and quasars). The redshift can be accurately determined using the spectra; however, this is a time-consuming process. Thus, classifying the majority of the objects using the photometric redshift presents a very time-efficient alternative for classification. The rest (around 13%) could then be further analysed, for example using a spectroscopic analysis.
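
The physical argument can be made explicit with the standard textbook relations (the symbols are not from the original text): the redshift z is the fractional change in wavelength, for small z the recession velocity is approximately cz, and Hubble's law links that velocity to the distance d through the Hubble constant.

```latex
z = \frac{\lambda_{\text{obs}} - \lambda_{\text{emit}}}{\lambda_{\text{emit}}},
\qquad
v \approx c\,z \quad (z \ll 1),
\qquad
v = H_0\, d
```

Stars inside the Milky Way do not take part in the cosmic expansion, so their redshifts stay close to zero, while galaxies and quasars carry a redshift that grows with distance.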

Conclusions

I was able to gain a thorough overview of different machine learning algorithms, how they compare with each other, and their potential uses in astronomy. My finding that many objects can be classified accurately using photometric redshifts suggests options for accelerating the classification of astronomical objects. In the future, I would be interested in including data from different telescopes and seeing which features allow for the best classification while maintaining efficiency.

Expert's Appraisal

Dr. Richard Stotz

The project analyses two interesting applications of machine learning in astrophysics. First, it works out the strengths and weaknesses of different machine learning techniques for the classification of celestial objects. These results are also discussed in the context of astrophysical fundamentals. Furthermore, it successfully demonstrates that the more laborious spectroscopic measurement of redshift can often be replaced by photometric data.

Rating:

very good

Special Prize of the Swiss Physical Society (Schweizerische Physikalische Gesellschaft, SPG)

Alte Kantonsschule Aarau
Teacher: Nina Fernandez