Page 192 - Contributed Paper Session (CPS) - Volume 2

CPS1820 Shuichi S.
                      Table 2 shows that RIP performs the 64 discriminations very easily.
                  However, statistical methods have difficulty finding that the data are linearly
                  separable (Problem6). This fact explains why cancer gene analysis has been
                  difficult until now and why researchers have not succeeded in it since 1970:
                  these methods are useless for microarrays and their SMs. We must choose
                  proper methods for cancer gene diagnosis, just as for cancer gene analysis.

                  b.   Summary of Six Microarrays
                     Table 3 summarizes the six microarrays. RatioS is the ratio (number of
                  genes included in all SMs / total number of genes). ">=5%" is the number of
                  SMs whose RatioSV is at least 5%. The corresponding shares of SMs are 98.4%,
                  18.8%, 99.2%, 21.2%, 16.9% and 100%, respectively. For the Alon, Shipp, and
                  Chiaretti microarrays, the share is 98.4% or more. The RatioSV of PCA is
                  greater than the maximum RatioSV of the RipDSs. The last three columns are
                  the numbers of linearly separable SMs found by QDF, LDF1, and LDF2. These
                  numbers indicate that the two classes are well separated in all SMs, following
                  the same trend as ">=5%." The last row shows that LINGO Program4 finds
                  130 BGSs of Alon. Because all their RatioSVs are less than 1%, we judge that
                  BGSs are not useful for cancer gene diagnosis.
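The percentages in the ">=5%" column can be rechecked from the counts in Table 3. The sketch below is only an illustration: the dictionary `table3` and the helper `share_percent` are hypothetical names, and truncating to one decimal (rather than rounding) is an assumption made because it reproduces the printed 16.9% for Tien.

```python
# (number of SMs with RatioSV >= 5%, total number of SMs), read from Table 3.
table3 = {
    "Alon": (63, 64),
    "Golb": (13, 69),
    "Shipp": (129, 130),
    "Singh": (38, 179),
    "Tien": (27, 159),
    "Chiaretti": (95, 95),
}


def share_percent(count, total):
    """Return count/total as a percentage, truncated to one decimal.

    Truncation is an assumption; it matches the values printed in Table 3.
    """
    return (count * 1000 // total) / 10


shares = {name: share_percent(c, n) for name, (c, n) in table3.items()}

# Microarrays whose share is 98.4% or more.
high = [name for name, pct in shares.items() if pct >= 98.4]

for name, pct in shares.items():
    print(f"{name:10s} {pct:5.1f}%")
print("98.4% or more:", ", ".join(high))
```

Under these assumptions the recomputed shares agree with the six percentages quoted in the text, and the microarrays at 98.4% or more are Alon, Shipp, and Chiaretti.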

                                      Table 3 Summary of Six Microarrays

                              SM:Gene     RatioS   RatioSV         >=5%         RatioSV of PCA   QDF=0   LDF1[1]   LDF2[2]
                   Alon       64:1,999    99.950   [2.4,26.8]      63(98.4%)    30.40%           64      13        12
                   Golb       69:1,238    17.366   [0.004,15.69]   13(18.8%)    34.88%           16      1         1
                   Shipp      130:3,827   53.682   [4.99,30.67]    129(99.2%)   31.69%           121     53        46
                   Singh      179:1,238   9.805    [0.28,11.67]    38(21.2%)    14.35%           26      0         0
                   Tien       159:7,222   57.204   [0.63,19.13]    27(16.9%)    24%              158     1         0
                   Chiaretti  95:5,162    72.422   [10.73,38.93]   95(100%)     51.46%           95      94        92

                              BGS:Gene    RatioS   RatioSV         >=5%         RatioSV of PCA   QDF=0   LDF1[1]   LDF2[2]
                   Alon       130:1,995   99.750   [0.001,0.901]   0            4.50%            58      0         3

                  [1] Prior probabilities are 0.5 vs. 0.5.    [2] Prior probabilities are proportional to the case number.

                  c.   Ward Cluster Analysis and PCA of Signal Data made by RipDSs
                     The medical research group used a self-organizing map (SOM), which
                  requires us to define an appropriate number of clusters. In contrast, when
                  applied to the signal data, Ward cluster analysis easily separates the two
                  classes into two clusters.
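To illustrate how Ward clustering can recover two classes from signal data, here is a minimal pure-Python sketch. It is not the software used in the paper: the function `ward_two_clusters` is a hypothetical helper, and the `signal` matrix (rows are cases, columns are discriminant scores from three SMs) holds made-up toy values, not data from any of the six microarrays. At each step the sketch merges the pair of clusters with the smallest Ward cost, that is, the smallest increase in the within-cluster sum of squares.

```python
from itertools import combinations


def ward_two_clusters(points):
    """Agglomerative clustering with Ward's criterion, stopped at two clusters.

    Returns two lists of row indices. Naive implementation for small inputs.
    """
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > 2:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            na, nb = len(ma), len(mb)
            dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            # Ward cost: increase in within-cluster sum of squares after merging.
            cost = na * nb / (na + nb) * dist2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        rest = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters = rest + [(ma + mb, centroid)]
    return [sorted(members) for members, _ in clusters]


# Toy signal data (made up): cases 0-2 belong to class 1, cases 3-5 to class 2.
signal = [
    [-1.0, -1.4, -1.1],
    [-1.8, -1.0, -2.3],
    [-1.2, -2.0, -1.5],
    [1.0, 1.6, 1.2],
    [2.1, 1.0, 1.9],
    [1.3, 2.2, 1.0],
]

clusters = sorted(ward_two_clusters(signal))
print(clusters)
```

On this toy input the two resulting clusters coincide with the two classes. Real signal data would have one column per SM and many more cases, for which a library implementation of Ward linkage would be the more practical choice.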


                                                                     181 | I S I   W S C   2 0 1 9