Page 200 - Contributed Paper Session (CPS) - Volume 4
CPS2174 Septian R. et al.
Logistic model averaging for predicting type of
tumor in high dimensional data: A randomize
approach
Septian Rahardiantoro, Anang Kurnia
Department of Statistics, IPB University, Indonesia
Abstract
The main idea of model averaging is to combine the predictions of several
candidate models into a final prediction using specified weights. It is
commonly used for high-dimensional data, in which the number of predictors
exceeds the number of observations. In practice, the model averaging concept
can also be used to predict the class of a response variable. This research
applied the model averaging concept with logistic regression models to predict
the class of patients having one of two types of tumor: KIRC and LUAD. The data
set is part of an RNA-seq collection: a random extraction of gene expression
measurements for 20532 genes from 287 patients. The candidate logistic
regression models were constructed by randomly selecting genes in subsets of
size 50, 100, and 150 to predict the class of patients. Based on the
evaluation criteria, logistic models built on smaller gene subsets could reach
higher accuracy, sensitivity, and specificity of prediction.
Keywords
classification; high dimensional data; logistic model averaging; model
candidate; predictive modelling
1. Introduction
High-dimensional data arise when the number of features (p) in the data
exceeds the number of observations (n). In recent years, high-dimensional
data have become common in many areas of life. They can be found in social
media, genomic, econometric, and satellite data sets. Because such data are
very large, the main challenge in dealing with them lies in predicting the
response. Several methods are often used to handle this case, such as best
subset selection, lasso regression, and model averaging.
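The randomized logistic model averaging outlined in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function names, the gradient-descent fitting routine, the equal weights given to the candidate models, the subset size, and the synthetic data (in which every feature carries class signal). The sketch uses only the Python standard library.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_data(n=40, p=200, shift=2.0, seed=1):
    # Hypothetical p > n data: Gaussian noise plus a class-dependent shift.
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) + shift * label for _ in range(p)] for label in y]
    return X, y


def fit_logistic(X, y, cols, lr=0.1, epochs=200):
    # Logistic regression on the feature subset `cols`,
    # fitted by plain batch gradient descent (an assumed fitting method).
    w = [0.0] * len(cols)
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(cols)
        grad_b = 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * row[c] for wj, c in zip(w, cols))
            err = sigmoid(z) - target
            grad_b += err
            for j, c in enumerate(cols):
                grad_w[j] += err * row[c]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return cols, w, b


def predict_proba(model, row):
    cols, w, b = model
    return sigmoid(b + sum(wj * row[c] for wj, c in zip(w, cols)))


def fit_candidates(X, y, n_models=20, subset_size=5, seed=0):
    # Each candidate model sees its own random subset of features.
    rng = random.Random(seed)
    p = len(X[0])
    return [
        fit_logistic(X, y, rng.sample(range(p), subset_size))
        for _ in range(n_models)
    ]


def predict_class(models, row, threshold=0.5):
    # Equal weights (an assumption): average the candidate probabilities.
    avg = sum(predict_proba(m, row) for m in models) / len(models)
    return 1 if avg >= threshold else 0


if __name__ == "__main__":
    X, y = make_toy_data()
    models = fit_candidates(X, y)
    preds = [predict_class(models, row) for row in X]
    accuracy = sum(int(a == b) for a, b in zip(preds, y)) / len(y)
    print("training accuracy:", accuracy)
```

Each candidate model uses far fewer features than observations, so it can be fitted even though p exceeds n. Other weighting schemes could replace the simple mean in `predict_class` without changing the rest of the sketch.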
This research applies the model averaging approach to the prediction
problem in a high-dimensional data set. The main principle of model
averaging is to construct several candidate models that are then averaged to
form the final model [1]. The candidate models are commonly linear
regressions, a choice that depends on the scale of the response variable.
Model averaging with linear regression has been applied to predict the
response variable in a genomic data set [2]. Furthermore, the development of
this method is well
189 | ISI WSC 2019