Page 248 - Special Topic Session (STS) - Volume 1

STS426 Didier Fraix-Burnet



                              Unsupervised classification of galaxy spectra and
                                                interpretability
                                              Didier Fraix-Burnet
                                  Univ. Grenoble Alpes / CNRS / IPAG, Grenoble, France

                  Abstract
                  Dealing with large amounts of data is a new challenge in astrophysics.
                  One may distinguish between the management of these data (astroinformatics)
                  and their scientific use (astrostatistics), even if the border is rather fuzzy.
                  Dimensionality reduction in both the number of observations and the number
                  of variables (observables) is necessary for an easier physical understanding.
                  This is the purpose of classification, which has traditionally been eye-based
                  and essentially still is, but this is no longer feasible. In this talk, I present
                  an  unsupervised  classification  of  700  000  spectra  of  galaxies  of  1500
                  wavelengths each, with a model-based subspace clustering algorithm (Fisher-
                  EM). I also show some preliminary results on the interpretation of the classes
                  using databases of modelled spectra.

                  Keywords
                  Unsupervised classification; machine learning; spectra; astrophysics; galaxies

                  1.  Introduction
                      Astrophysics has now entered the era of Big Data, and the new telescopes
                  and instruments that will come into operation in the next few years (EUCLID,
                  VLT/MOONS, LSST, SKA...) face technological challenges for the management
                  and the analysis of the data. Spectra are a particularly striking case: each
                  contains several thousand wavelengths, yielding matrices of about a million
                  observations described by thousands of parameters.
                      These spectra contain all the astrophysical information that an astronomer
                  can dream of, apart from the morphological structure: the composition of the
                  stellar populations, the history of the stellar formation events, the content in
                  gas and its physical conditions, the presence of hot regions such as star-forming
                  regions, hot nebulae, active galactic nuclei hosting black holes, and the global
                  kinematics of the galaxy. Basically, the spectrum of a galaxy is made of a
                  continuum due to the thermal emission from the stars, plus some absorption
                  features due to the cold gas, and emission lines due to hot gas. An atlas of
                  typical galaxy spectra is provided in Kennicutt (1992) or Dobos et al. (2012).
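
                      The three components described above (continuum, absorption features,
                  emission lines) can be illustrated with a toy spectrum. The sketch below is
                  only a schematic under stated assumptions, not a physical model of a galaxy:
                  the continuum is approximated by a single blackbody, the lines are Gaussian
                  profiles, and the temperature, line centres, widths and depths are
                  illustrative values chosen for the example. The function names are
                  hypothetical.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
C = 2.99792458e8     # speed of light (m / s)
K_B = 1.380649e-23   # Boltzmann constant (J / K)


def planck(wavelength_m, temperature_k):
    """Blackbody spectral radiance B_lambda(T), up to the usual SI units."""
    x = H * C / (wavelength_m * K_B * temperature_k)
    return 2.0 * H * C ** 2 / wavelength_m ** 5 / math.expm1(x)


def gaussian(wavelength, centre, sigma):
    """Unit-height Gaussian line profile (all arguments in angstroms)."""
    return math.exp(-0.5 * ((wavelength - centre) / sigma) ** 2)


def toy_spectrum(wavelengths_angstrom, temperature_k=5500.0):
    """Continuum times an absorption feature, plus an emission line.

    Assumptions (illustrative only): one blackbody for the stellar
    continuum, one absorption feature near 3934 A, and one emission
    line near 6563 A (H-alpha).
    """
    # Normalise the continuum so that its maximum on the grid equals 1.
    peak = max(planck(w * 1e-10, temperature_k) for w in wavelengths_angstrom)
    flux = []
    for w in wavelengths_angstrom:
        continuum = planck(w * 1e-10, temperature_k) / peak
        absorption = 1.0 - 0.4 * gaussian(w, 3934.0, 5.0)   # removes light
        emission = 0.8 * gaussian(w, 6563.0, 4.0)           # adds light
        flux.append(continuum * absorption + emission)
    return flux


# 1500 wavelength bins, matching the number of wavelengths per spectrum
# quoted in the abstract; the range and step are assumptions.
wavelengths = [3500.0 + 2.5 * i for i in range(1500)]
spectrum = toy_spectrum(wavelengths)
```

                      Each such spectrum is one row of the data matrix, so a sample of
                  galaxies gives a table with one column per wavelength bin.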
                      Classification in astrophysics traditionally uses an eye-based approach,
                  which also serves as the basis for supervised learning studies. In contrast,
                  unsupervised learning is not common (Fraix-Burnet et al. 2015). In the case of


                                                                     237 | I S I   W S C   2 0 1 9