Page 187 - Contributed Paper Session (CPS) - Volume 2

CPS1820 Shuichi S.
            1.  Introduction
                Golub et al. (1999) published their paper in Science and made their
            microarray data available on the Internet. By 2004, Alon et al. (1999),
            Shipp et al. (2002), Singh et al. (2002), Tian et al. (2003) and Chiaretti
            et al. (2004) had also published papers and released their microarrays.
            Below is the abstract of Golub et al. (1999):
            “Although cancer classification has improved over the past 30 years, there has
            been no general approach for identifying new cancer classes (class discovery)
            or for assigning tumors to known classes (class prediction). Here, a generic
            approach to cancer classification based on gene expression monitoring by
            DNA microarrays is described and applied to human acute leukemias as a test
            case.  A  class  discovery  procedure  automatically  discovered  the  distinction
            between acute myeloid leukemia  (AML)  and acute lymphoblastic leukemia
            (ALL) without previous knowledge of these classes. An automatically derived
            class predictor was able to determine the class of new leukemia cases. The
            results  demonstrate  the  feasibility  of  cancer  classification  based  solely  on
            gene expression monitoring and suggest a general strategy for discovering
            and  predicting  cancer  classes  for  other  types  of  cancer,  independent  of
            previous biological knowledge.” We can understand their research theme as
            follows.
            1)  Their research theme is to identify oncogenes from microarrays and to
                predict new subclasses of cancer.
            2)  Alon's and Singh's microarrays consist of two classes: normal
                subjects and cancer patients. The other four microarrays consist of
                two different types of cancer patients. Therefore, two-class
                discrimination is the most appropriate method for this theme.
                The efforts of medical researchers were nevertheless in vain: the NIH
            stopped funding this research, and these types of studies ended. That is,
            they could not solve cancer gene analysis using microarrays, which had been
            studied since around 1970. This fact indicates that statistical discriminant
            functions were entirely useless for this problem and that RIP is the best
            LDF for high-dimensional microarray data analysis (Problem5).
                On the other hand, statistical and machine-learning researchers
            approached this research theme as big data analysis or high-dimensional
            data analysis, because the medical projects made high-quality microarrays
            available after they ended. Many papers were published, and they pointed
            out three difficulties, or excuses, concerning Problem5: 1) high-dimensional
            data (small n and large p) are difficult to analyze; 2) statistical feature
            selection is NP-hard; 3) the signal is buried in noise.
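
                The small n and large p setting in difficulty 1) can be illustrated
            with a minimal sketch. It is not the method of this paper: it uses
            synthetic Gaussian noise in place of the six microarrays and an ordinary
            minimum-norm least-squares discriminant in place of RIP. The names
            fit_min_norm_ldf and count_misclassified and the sizes n = 20, p = 2000
            are assumptions made only for illustration.

```python
import numpy as np


def fit_min_norm_ldf(X, y):
    """Fit a linear discriminant f(x) = x.w + b by minimum-norm least squares.

    Illustrative stand-in only; this is NOT the RIP discriminant of the paper.
    The intercept b is stored as the last coefficient.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    # For n < p the system is underdetermined; lstsq returns the
    # minimum-norm solution that reproduces the labels on the training cases.
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w


def count_misclassified(X, y, w):
    """Count training cases whose discriminant score has the wrong sign."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return int(np.sum(np.sign(Xb @ w) != y))


# Hypothetical sizes: far fewer cases (n) than genes (p).
rng = np.random.default_rng(0)
n, p = 20, 2000
X = rng.normal(size=(n, p))                      # pure noise, no real signal
y = np.where(np.arange(n) < n // 2, 1.0, -1.0)   # two classes coded +1 / -1

w = fit_min_norm_ldf(X, y)
train_errors = count_misclassified(X, y, w)
print("training misclassifications:", train_errors)
```

                Because n is smaller than p, the fitted function separates the two
            classes of the training cases even though the features carry no signal,
            so the training error alone says little about which variables matter.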
                We downloaded these six microarrays on October 28, 2015 [4], and solved
            Problem5 entirely on December 20, 2015. We had already developed a new
            theory of discriminant analysis [9] and solved four problems of discriminant
            analysis. Thus, Problem5 was solved as an applied problem of the theory. RIP

                                                               176 | I S I   W S C   2 0 1 9