Page 186 - Contributed Paper Session (CPS) - Volume 2

CPS1820 Shuichi S.

                                 High-dimensional microarray data analysis
                              - First success of cancer gene analysis and cancer
                                               gene diagnosis -
                                               Shuichi Shinmura
                                         Emeritus Professor, Seikei University

                  Abstract
                  From October 28 to December 20, 2015, we discriminated six microarrays with
                  Revised IP-OLDF (RIP), which is based on the minimum number of
                  misclassifications (Minimum NM, MNM) criterion. All six are linearly
                  separable data (LSD, MNM=0). We call a linearly separable space and its
                  linearly separable subspaces Matryoshkas. LSD has a Matryoshka structure:
                  each Matryoshka contains smaller Matryoshkas. We developed a Matryoshka
                  feature selection method (Method2) that decomposes each microarray into
                  many small Matryoshkas (SMs) and a noise subspace (MNM>=1). Because all
                  SMs are small samples, statistical methods can analyze them. However, those
                  methods do not reveal that the SMs are linearly separable. Thus, we build
                  new signal data from RIP discriminant scores (RipDSs) instead of genes.
                  We regard RipDSs as malignancy indexes for cancer gene diagnosis. In this
                  paper, we explain cancer gene diagnosis using Alon’s microarray, which
                  consists of 62 patients and 2,000 genes. Method2 decomposes Alon’s
                  microarray into 64 SMs. Thus, we obtain new signal data consisting of 62
                  patients and 64 RipDSs instead of 2,000 high-dimensional genes. When we
                  analyze this signal data with Ward’s clustering method, the two classes
                  become two clusters. Moreover, we can divide the cancer class into more
                  than two clusters. Next, if Principal Component Analysis (PCA) analyses the
                  signal data, we can examine several clusters that explain the new subclasses
                  of cancer pointed out by Golub et al. Many researchers could not solve the
                  high-dimensional microarray problem for several reasons. 1) They could not
                  find that the six microarrays are LSD (Fact3). However, RIP and a
                  hard-margin SVM (H-SVM) find Fact3 easily. 2) Method2 with RIP decomposes
                  each microarray into many SMs and a noise subspace (Fact4). However,
                  Method2 with H-SVM cannot find SMs. Although all SMs are small samples,
                  statistical methods cannot show that they are linearly separable. Thus, we
                  make signal data using RipDSs instead of genes. Cluster analysis shows that
                  the two classes become two clear clusters, and PCA shows that the two
                  classes are separated on the first principal component (Prin1), which
                  becomes another malignancy index in addition to the many RipDSs. Our
                  approach is beneficial for cancer gene diagnosis by microarray.
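
The data flow described in the abstract (genes, then one RipDS per SM, then cluster analysis of the RipDS signal data) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The toy expression values, the two SMs, and their coefficients are hypothetical and hand-picked, whereas in the paper the coefficients come from RIP. The helper names (discriminant_scores, count_misclassified, ward_clusters) are illustrative. A score of exactly zero is counted as a misclassification here, which is also an assumption.

```python
# Hypothetical toy data: 6 patients x 4 genes; labels -1 = normal, +1 = cancer.
genes = [
    [1, 1, 0, 2], [2, 1, 1, 3], [1, 2, 0, 3],
    [4, 5, 5, 0], [5, 4, 6, 1], [5, 5, 5, 1],
]
labels = [-1, -1, -1, 1, 1, 1]

# Hypothetical SMs as (gene indices, coefficients, intercept).
# In the paper these discriminant functions are produced by RIP; here they are hand-picked.
sms = [
    ((0, 1), (1.0, 1.0), -6.0),
    ((2, 3), (1.0, -1.0), -1.0),
]


def discriminant_scores(rows, sm):
    """Linear discriminant score of every patient for one SM."""
    idx, coef, intercept = sm
    return [sum(c * row[j] for c, j in zip(coef, idx)) + intercept for row in rows]


def count_misclassified(scores, y):
    """NM: number of patients whose score sign disagrees with the label (zero counts as an error)."""
    return sum(1 for s, yi in zip(scores, y) if yi * s <= 0)


def ward_clusters(rows, k):
    """Agglomerative clustering with Ward's criterion; returns k sorted lists of row indices."""
    clusters = [[i] for i in range(len(rows))]

    def centroid(members):
        n = len(members)
        return [sum(rows[i][j] for i in members) / n for j in range(len(rows[0]))]

    def merge_cost(a, b):
        # Increase in the within-cluster sum of squares caused by merging a and b.
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > k:
        pairs = [(merge_cost(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters)) for q in range(p + 1, len(clusters))]
        _, p, q = min(pairs)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# One score column per SM, then transpose to "patients x scores" signal data.
columns = [discriminant_scores(genes, sm) for sm in sms]
signal = [list(row) for row in zip(*columns)]
nm_per_sm = [count_misclassified(col, labels) for col in columns]
clusters = ward_clusters(signal, 2)

print("NM per SM:", nm_per_sm)
print("Ward clusters:", clusters)
```

On this toy data both hypothetical SMs give NM = 0, and Ward's criterion recovers the two classes as two clusters. For Alon's microarray the corresponding signal data would be 62 patients by 64 RipDSs.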

                  Keywords
                  Linearly Separable Data (LSD); Small Matryoshka (SM); Revised IP-OLDF (RIP);
                  Malignancy Index; Matryoshka Feature Selection Method (Method2)


                                                                     175 | I S I   W S C   2 0 1 9