Page 317 - Invited Paper Session (IPS) - Volume 2

IPS254 Thaddeus Tarpey et al.
            we “learn” particular linear transformations of the data as a pre-conditioning
            step before running the clustering algorithm; this approach leads to an
            iterative algorithm. The underlying idea is that if two diagnostic categories
            exist for a particular mental disorder, they will typically overlap strongly
            in feature space. To illustrate this point, we introduce another index of
            cluster quality: the variation of information (VI) (Meilă, 2007), which
            measures how much two clusterings of the same data set disagree and equals
            zero exactly when they coincide. This measure is particularly useful in
            simulations, where the true cluster memberships of the data points are known
            and VI can be used to determine how well a clustering result agrees with the
            true clustering. The idea is described by the following algorithm:
                0.  Form an initial clustering of the features using k-means clustering.
                1.  Compute the VI measuring agreement between the k-means clustering
                    and the clinician-based diagnoses.
                2.  Use Newton’s method to estimate an optimal direction to “stretch” (via
                    a linear transformation) the features.
                3.  Re-run the clustering algorithm on this pre-conditioned data.
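            The steps above can be sketched in Python under several assumptions that go
            beyond the text. The sketch handles only two features, uses plain Lloyd's
            k-means with a deterministic farthest-first start, and computes VI as
            VI(C, C') = H(C | C') + H(C' | C) with natural logarithms (Meilă, 2007). A
            "stretch" along a unit direction u is modelled as the linear map
            I + (s - 1) u u^T. The stretch factor s and the names stretch,
            preconditioned_clustering, factor, n_angles and max_iter are hypothetical.
            The Newton step of step 2 is not reproduced, because the text does not state
            the smooth criterion that Newton's method is applied to. VI itself changes
            only when cluster memberships change, so it has no useful derivatives. The
            sketch therefore substitutes a coarse grid search over directions and keeps
            the direction whose re-clustering agrees best with the diagnoses.

```python
import math
from collections import Counter


def variation_of_information(a, b):
    """VI(A, B) = H(A | B) + H(B | A) for two label sequences (natural log)."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    vi = 0.0
    for (x, y), c in joint.items():
        # -p(x, y) * [log p(x, y)/p(x) + log p(x, y)/p(y)], summed over the cross-table
        vi -= (c / n) * (math.log(c / count_a[x]) + math.log(c / count_b[y]))
    return vi


def _dist2(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    dim, n = len(points[0]), len(points)
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centers = [max(points, key=lambda p: _dist2(p, mean))]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
        if new == labels:  # assignments are stable: converged
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return labels


def stretch(points, theta, factor):
    """Hypothetical 2-D stretch: apply I + (factor - 1) u u^T, u = (cos theta, sin theta)."""
    ux, uy = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points:
        shift = (factor - 1.0) * (ux * x + uy * y)  # scaled projection onto u
        out.append((x + shift * ux, y + shift * uy))
    return out


def preconditioned_clustering(points, diagnoses, k=2, factor=10.0, n_angles=8, max_iter=5):
    """Steps 0-3, with a grid search over directions standing in for Newton's method."""
    data = list(points)
    labels = kmeans(data, k)                          # step 0: initial k-means
    vi = variation_of_information(labels, diagnoses)  # step 1: agreement with diagnoses
    history = [vi]
    for _ in range(max_iter):
        best = None
        for j in range(n_angles):                     # step 2 (substitute): scan directions
            cand = stretch(data, math.pi * j / n_angles, factor)
            cand_labels = kmeans(cand, k)             # step 3: re-cluster stretched data
            cand_vi = variation_of_information(cand_labels, diagnoses)
            if best is None or cand_vi < best[0]:
                best = (cand_vi, cand, cand_labels)
        if best[0] >= vi - 1e-12:                     # no direction lowers VI: stop
            break
        vi, data, labels = best
        history.append(vi)
    return labels, history


if __name__ == "__main__":
    # Hypothetical toy data: two "diagnoses" that differ only slightly in the
    # second feature while spreading widely along the first feature.
    pts = [(float(i), -1.0) for i in range(-10, 11)] + [(float(i), 1.0) for i in range(-10, 11)]
    dx = [0] * 21 + [1] * 21
    labels, history = preconditioned_clustering(pts, dx)
    print([round(v, 3) for v in history])
```

            On this hypothetical toy data, raw k-means splits along the high-variance
            first feature, so the first VI is close to 2 log 2. Stretching along the
            second feature recovers the two groups and drives VI to 0. The example only
            illustrates the mechanism of steps 0-3.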

            3.  Results
            The results will follow soon.

            4.  Discussion and Conclusion
                We have proposed an unsupervised learning approach to the problem of
            psychiatric nosology that implements a semi-supervised clustering algorithm.
            The supervision comes from clinician-informed diagnostic decisions, which
            lead to linear transformations that are then used to optimize clustering
            criteria.

            References
            1.  American Psychiatric Association. (2013). Diagnostic and Statistical
                 Manual of Mental Disorders. American Psychiatric Publishing, Arlington,
                 VA, fifth edition.
            2.  Bruni, C. and Koch, G. (1985). Identifiability of continuous mixtures of
                 unknown Gaussian distributions. Annals of Probability 13:1341–1357.
            3.  Clementz, B. A., Sweeney, J. A., Hamm, J. P., Ivleva, E. I., Ethridge, L. E.,
                 Pearlson, G. D., Keshavan, M. S., and Tamminga, C. A. (2016).
                 Identification of distinct psychosis biotypes using brain-based
                 biomarkers. American Journal of Psychiatry 173:373–383.
            4.  Diaconis, P. and Freedman, D. (1984). Asymptotics of graphical
                 projection pursuit. Annals of Statistics 12:793–815.
            5.  Grzadzinski, R., Martino, A. D., Brady, A., Mairena, M. A., O’Neale, M.,
                 Petkova, E., Lord, C., and Castellanos, F. X. (2011). Examining autistic

                                                               304 | ISI WSC 2019