Page 201 - Contributed Paper Session (CPS) - Volume 4
CPS2174 Septian R. et al.
developed to use logistic regression in the model averaging process when the
response variable is measured on a categorical scale [3].
This research focuses on constructing logistic regression model candidates
for predicting the class of the response variable. Each model candidate is
constructed by randomly selecting a subset of the predictor variables and is
used to predict the class of the response variable. This process is repeated
several times to obtain several predictions, which are then averaged using
specified weights. The predictions are averaged on the probability scale, that
is, the predicted probabilities of the response class are averaged.
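The candidate construction and probability averaging described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each candidate is fitted by plain gradient descent with a small ridge penalty (an assumption that keeps the fit finite when a candidate's classes are separable), the predictors are assumed to be standardized, and equal weights stand in for the paper's unspecified "specified weight". The toy data at the end are synthetic and only exercise the code.

```python
import numpy as np


def fit_logistic(X, y, n_iter=500, lr=0.1, l2=1e-2):
    """Logistic regression with intercept, fitted by gradient descent on the
    mean log-loss plus a small L2 penalty (assumed, not from the paper)."""
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))   # fitted probabilities
        grad = Xb.T @ (p - y) / len(y)           # gradient of mean log-loss
        grad[1:] += l2 * beta[1:]                # intercept is not penalized
        beta -= lr * grad
    return beta


def predict_proba(beta, X):
    """P(y = 1 | x) under one fitted candidate model."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-(Xb @ beta)))


def model_average_proba(X_train, y_train, X_test, n_models=50, n_genes=50,
                        weights=None, seed=0):
    """Fit n_models candidates, each on n_genes randomly chosen predictors,
    and return the weighted average of their predicted probabilities."""
    rng = np.random.default_rng(seed)
    probs = np.empty((n_models, X_test.shape[0]))
    for k in range(n_models):
        cols = rng.choice(X_train.shape[1], size=n_genes, replace=False)
        beta = fit_logistic(X_train[:, cols], y_train)
        probs[k] = predict_proba(beta, X_test[:, cols])
    # Equal weights are only a placeholder for the paper's specified weights.
    if weights is None:
        w = np.full(n_models, 1.0 / n_models)
    else:
        w = np.asarray(weights, dtype=float)
    w = w / w.sum()                              # normalize to sum to one
    return w @ probs


# Synthetic toy data (not the TCGA data): 80 samples, 300 standardized features.
rng = np.random.default_rng(1)
X = rng.standard_normal((80, 300))
y = (X[:, :150].sum(axis=1) > 0).astype(float)
p_avg = model_average_proba(X[:60], y[:60], X[60:], n_models=10, n_genes=50)
y_hat = (p_avg >= 0.5).astype(int)               # 0.5 cut-off is an assumption
```

Passing a `weights` vector replaces the equal-weight default; the vector is normalized to sum to one, so only the relative weights matter.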
The data used in this research is an RNA-seq data set obtained as a
random extraction of gene expression from patients with different types of
tumor: KIRC (Kidney Renal Clear-Cell Carcinoma) and LUAD (Lung
Adenocarcinoma). This data set contains 20532 genes measured on 287 patients
[4]. Three settings are considered for the number of genes selected in each
model candidate (50, 100, and 150 genes), each with 50 model candidates. In
practice, about 40% of the patients are selected as testing data to evaluate
the accuracy, sensitivity, and specificity of the prediction.
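The 40% hold-out and the three evaluation measures can be made concrete with a short NumPy sketch. The function names are hypothetical, the random permutation split is an assumption about how the testing patients are chosen, and class 1 (KIRC, following the coding in Section 2.1) is treated as the positive class, so sensitivity is the proportion of KIRC patients predicted correctly and specificity the proportion of LUAD patients predicted correctly.

```python
import numpy as np


def holdout_split(n, test_frac=0.4, seed=0):
    """Randomly assign about test_frac of n patients to the testing data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(test_frac * n))
    return idx[n_test:], idx[:n_test]            # (train indices, test indices)


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity and specificity with class 1 as positive."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # KIRC predicted as KIRC
    tn = np.sum((y_true == 0) & (y_pred == 0))   # LUAD predicted as LUAD
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


train_idx, test_idx = holdout_split(287)         # 172 training, 115 testing
metrics = evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# accuracy 0.6, sensitivity 2/3, specificity 0.5
```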
2. Methodology
This section describes the data set used in this research, the model
averaging concept in the logistic regression approach, and the evaluation of
the prediction.
2.1 Data
The data set used in this research is part of The Cancer Genome Atlas
(TCGA) research program, which profiles and analyzes large numbers of human
tumors to discover molecular aberrations [4]. This research uses the subset of
these data on patients with KIRC or LUAD, based on their RNA-seq profiles.
Therefore, the response variable is the class of each patient according to the
tumor type they suffer from. The data contain n = 287 patients and p = 20532
genes as predictor variables, so they are high-dimensional data (p ≫ n). For
the analysis, class KIRC is coded as 1 and LUAD as 0.
2.2 Model Averaging
Let $X_{n \times p}$ be high-dimensional data with $n$ observations and $p$
predictors ($p \gg n$), and let $X^{*}_{n \times q}$ be a subset of $X$ with $q$
predictors ($q < p$). Let $y_{n \times 1}$ be the response variable. Assume the
regression model on the subset of predictors is $y = f(X^{*}) + \varepsilon$.
The model averaging concept creates several model candidates, that is, models
on subsets of the predictors, and combines them into a representative final
model. The number of model candidates is $K$, each containing $q$ predictors.
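For the logistic case, the averaging step can be written as follows, writing $K$ for the number of model candidates, $x^{*}_k$ for the $q$ predictors randomly selected in candidate $k$, and $w_k$ for the weight of candidate $k$. This is a sketch consistent with averaging on the probability scale; the specific weighting scheme is not stated here, so $w_k$ is left general (equal weights $w_k = 1/K$ give the simple mean), and the 0.5 classification cut-off is an assumption.

```latex
\hat{\pi}_k(x) = \frac{\exp\left(\beta_{k0} + x_k^{*\top}\beta_k\right)}
                      {1 + \exp\left(\beta_{k0} + x_k^{*\top}\beta_k\right)},
\qquad k = 1, \dots, K,

\hat{\pi}(x) = \sum_{k=1}^{K} w_k \, \hat{\pi}_k(x),
\qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1,

\hat{y} =
\begin{cases}
  1 \ (\text{KIRC}), & \hat{\pi}(x) \ge 0.5, \\
  0 \ (\text{LUAD}), & \text{otherwise.}
\end{cases}
```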
190 | ISI WSC 2019