Page 188 - Contributed Paper Session (CPS) - Volume 2
CPS1820 Shuichi S.
discriminated six microarrays and found that they are LSD (Fact3). This new fact
is crucial for Problem5, and no previous paper has pointed out this vital signal.
Furthermore, Method2 decomposed each microarray into many linearly separable
gene spaces (SMs) and a noise gene subspace. In this paper, we introduce cancer
gene diagnosis, which analyzes all SMs by standard statistical methods.
2. Methodology
First, we introduce four problems [7][8] found through our research on
discriminant analysis. Next, we explain the new theory and Method2.
2.1 Four Problems of Discriminant Analysis
Problem1: The discrimination rule is straightforward. However, most
researchers believe in the wrong rule. The values of yi are 1 for class1 and -1
for class2. Let f(x) be the LDF and yi*f(xi) be the extended discriminant score
(DS) for xi. The following rule is correct.
1) If yi*f(xi) > 0, xi is correctly classified to class1/class2.
2) If yi*f(xi) < 0, xi is misclassified.
3) If xi lies on the discriminant hyperplane (f(xi) = 0), we cannot properly
discriminate it.
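The three-way rule above can be illustrated with a minimal Python sketch. The coefficients b and b0 and the four cases are hypothetical values chosen only for illustration; they are not taken from the microarray data, and the function names are our own.

```python
def ldf(x, b, b0):
    """Linear discriminant function f(x) = x'b + b0."""
    return sum(bj * xj for bj, xj in zip(b, x)) + b0


def judge(y, f_x):
    """Apply the rule to the extended discriminant score y * f(x).

    y is +1 for class1 and -1 for class2.
    """
    score = y * f_x
    if score > 0:
        return "correct"
    if score < 0:
        return "misclassified"
    return "undetermined"  # the case lies on the hyperplane f(x) = 0


# Hypothetical coefficients and cases (assumptions for illustration only).
b, b0 = [1.0, -1.0], 0.0
cases = [([2.0, 1.0], 1), ([1.0, 3.0], -1), ([0.0, 2.0], 1), ([1.0, 1.0], -1)]

results = [judge(y, ldf(x, b, b0)) for x, y in cases]
print(results)
```

The last case has f(x) = 0, so the sketch reports it as undetermined rather than assigning it to either class.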
Problem2: Only H-SVM [18] and RIP can recognize LSD theoretically. Other
LDFs cannot discriminate LSD correctly, and their error rates are high.
Problem3: This problem is a defect of the generalized inverse matrix
technique.
Problem4: Fisher never formulated equations for the standard errors (SEs) of
discriminant coefficients and error rates. We propose 100-fold cross validation
for small samples (Method1). It offers the 95% CIs of error rates and
discriminant coefficients.
2.2 New Theory of Discriminant Analysis
We developed IP-OLDF and RIP by LINGO [6]. Only RIP and H-SVM are
essential in this research. Integer programming (IP) defines RIP in (1). The ei
are 0/1 decision variables. If a case is classified correctly, ei = 0. Otherwise,
ei = 1. Thus, the objective function becomes the minimum number of
misclassifications (MNM). The n constraints define the feasible region.
\[
\min \sum_{i=1}^{n} e_i \quad \text{subject to} \quad y_i \left( x_i^{t} b + b_0 \right) \ge 1 - M e_i, \quad i = 1, \dots, n \tag{1}
\]
xi: n cases. b: p coefficients. b0: free decision variable.
ei: 0/1 decision variables. M: 10,000 (Big M constant).
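To make the role of ei and the Big M constant in (1) concrete, the following Python sketch evaluates the constraints for one fixed candidate (b, b0). It does not solve the integer program: an IP solver such as LINGO searches over b, b0, and ei simultaneously, so the value computed here is only an upper bound on MNM for this candidate. The data, the candidate coefficients, and the function names are hypothetical.

```python
BIG_M = 10_000  # Big M constant, as in (1)


def score(x, b, b0):
    """Discriminant score x'b + b0 for one case."""
    return sum(bj * xj for bj, xj in zip(b, x)) + b0


def minimal_e(X, y, b, b0):
    """Smallest 0/1 values e_i that satisfy
    y_i * (x_i'b + b0) >= 1 - M * e_i for a fixed candidate (b, b0)."""
    e = []
    for xi, yi in zip(X, y):
        margin = yi * score(xi, b, b0)
        if margin >= 1:
            e.append(0)  # constraint already holds with e_i = 0
        elif margin >= 1 - BIG_M:
            e.append(1)  # constraint holds only after relaxing by M
        else:
            raise ValueError("BIG_M is too small for this candidate")
    return e


# Hypothetical data: two variables, five cases (assumption for illustration).
X = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0], [3.0, 0.5]]
y = [1, 1, -1, -1, -1]
b, b0 = [1.0, -1.0], 0.0  # hypothetical candidate coefficients

e = minimal_e(X, y, b, b0)
objective = sum(e)  # objective of (1) at this candidate
print(e, objective)
```

Note that a correctly classified case whose margin lies between 0 and 1 also receives ei = 1 at a fixed scaling of (b, b0); the optimization over the coefficients removes this effect, which is one reason the sketch should not be read as a substitute for solving (1).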
Quadratic programming (QP) defines H-SVM in (2). Although QP finds only
one minimum value on the whole domain, we restrict the domain to the same
177 | ISI WSC 2019