Page 187 - Contributed Paper Session (CPS) - Volume 2
CPS1820 Shuichi S.
1. Introduction
Golub et al. (1999) published their paper in Science and released their
microarray data on the Internet. By 2004, Alon et al. (1999), Shipp et al. (2002),
Singh et al. (2002), Tian et al. (2003), and Chiaretti et al. (2004) had also
published their papers and released their microarrays. The abstract of Golub et al. reads:
“Although cancer classification has improved over the past 30 years, there has
been no general approach for identifying new cancer classes (class discovery)
or for assigning tumors to known classes (class prediction). Here, a generic
approach to cancer classification based on gene expression monitoring by
DNA microarrays is described and applied to human acute leukemias as a test
case. A class discovery procedure automatically discovered the distinction
between acute myeloid leukemia (AML) and acute lymphoblastic leukemia
(ALL) without previous knowledge of these classes. An automatically derived
class predictor was able to determine the class of new leukemia cases. The
results demonstrate the feasibility of cancer classification based solely on
gene expression monitoring and suggest a general strategy for discovering
and predicting cancer classes for other types of cancer, independent of
previous biological knowledge.” We summarize their research theme as
follows.
1) Their research theme is to identify oncogenes from microarrays and to
predict new subclasses of cancer.
2) Alon's and Singh's microarrays each consist of two classes: normal
subjects and cancer patients. The other four microarrays consist of patients
with two different types of cancer. Therefore, two-class discrimination is
the most appropriate method for this theme.
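To make the two-class setting concrete, the following is a minimal illustrative sketch, not the RIP algorithm: it evaluates a linear discriminant function f(x) = b·x + b0 and counts the number of misclassifications (NM). The ±1 label coding, the rule that a case lying exactly on f(x) = 0 counts as misclassified, the function names `ldf_score` and `count_misclassifications`, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def ldf_score(x: Sequence[float], b: Sequence[float], b0: float) -> float:
    """Value of the linear discriminant function f(x) = b.x + b0."""
    return sum(bi * xi for bi, xi in zip(b, x)) + b0


def count_misclassifications(
    X: Sequence[Sequence[float]], y: Sequence[int], b: Sequence[float], b0: float
) -> int:
    """Number of cases with y_i * f(x_i) <= 0.

    Labels are coded +1 / -1 (assumption). A case lying exactly on the
    hyperplane f(x) = 0 is counted as misclassified (a conservative choice).
    """
    return sum(1 for xi, yi in zip(X, y) if yi * ldf_score(xi, b, b0) <= 0)


# Hypothetical toy data: two "genes" per case, +1 = cancer, -1 = normal.
X = [[2.0, 1.0], [3.0, 2.5], [0.5, 0.2], [1.5, 1.5]]
y = [1, 1, -1, -1]
b, b0 = [1.0, 1.0], -2.5
print(count_misclassifications(X, y, b, b0))  # → 1 (the last case has f = 0.5)
```

By contrast, RIP in [9] is understood to choose b and b0 by integer programming so as to minimize this count (the minimum number of misclassifications, MNM); the sketch only scores a fixed function and performs no such optimization.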
However, these medical research efforts were in vain: the NIH stopped funding
this research, and such studies came to an end. That is, researchers could not
solve cancer gene analysis, a subject studied since around 1970, using
microarrays. This fact indicates that statistical discriminant functions
were entirely useless and that RIP is the best linear discriminant function (LDF)
for high-dimensional microarray data analysis (Problem5).
On the other hand, statistical and machine-learning researchers
approached this research theme as big data analysis or high-dimensional data
analysis, because the medical projects released high-quality microarrays after
they ended. Many papers were published, pointing out three difficulties (or
excuses) for Problem5:
1) The difficulty of high-dimensional (small n and large p) data.
2) Statistical feature selection is NP-hard.
3) The signal is buried in noise.
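The first difficulty can be made concrete with a small experiment that is not from the paper. When the number of genes p is at least the number of cases n, generic data are linearly separable for any labelling, so zero training error on its own says little. The sketch below assumes NumPy, uses hypothetical sizes (n = 20, p = 500), pure Gaussian noise as "expression" data and random ±1 labels, and fits f(x) = b·x + b0 by minimum-norm least squares, which reproduces the labels exactly when the augmented n × (p+1) matrix has full row rank. The helper name `interpolating_ldf` is hypothetical.

```python
import numpy as np


def interpolating_ldf(X: np.ndarray, y: np.ndarray):
    """Minimum-norm least-squares fit of f(x) = X @ b + b0 to labels y.

    When n <= p + 1 and the augmented matrix [X, 1] has full row rank,
    the fit reproduces y exactly, so every case falls on its correct side.
    """
    n = X.shape[0]
    A = np.hstack([X, np.ones((n, 1))])           # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimum-norm solution
    return coef[:-1], coef[-1]                    # (b, b0)


rng = np.random.default_rng(0)
n, p = 20, 500                       # small n, large p (hypothetical sizes)
X = rng.standard_normal((n, p))      # pure noise, no signal at all
y = rng.choice([-1.0, 1.0], size=n)  # random class labels
b, b0 = interpolating_ldf(X, y)
nm = int(np.sum(y * (X @ b + b0) <= 0))  # number of misclassifications
print(nm)  # 0: noise with random labels is perfectly separated
```

Least squares is used here only because it yields a separating function in closed form without an optimization solver; it is not meant to resemble any of the methods discussed in this paper.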
We downloaded these six microarrays on October 28, 2015 [4], and completely
solved Problem5 on December 20, 2015. We had already developed a new
theory of discriminant analysis [9] and had solved four problems of discriminant
analysis with it. Thus, Problem5 was solved as an application of this theory. RIP
176 | ISI WSC 2019