Page 200 - Contributed Paper Session (CPS) - Volume 4

CPS2174 Septian R. et al.

                               Logistic model averaging for predicting type of
                                tumor in high dimensional data: A randomized
                                                   approach
                                     Septian Rahardiantoro, Anang Kurnia
                                    Department of Statistics, IPB University, Indonesia

                  Abstract
                  The main idea of model averaging is to combine the predictions of several
                  candidate models into a final prediction using specified weights. It is
                  commonly used for high dimensional data, in which the number of predictors
                  exceeds the number of observations. The model averaging concept can also
                  be used to predict the class of a response variable. This research applied
                  model averaging with logistic regression models to predict the class of
                  patients having one of two types of tumor: KIRC and LUAD. The data set is
                  part of an RNA-seq data set, consisting of a random extraction of gene
                  expressions, with 20532 genes measured on 287 patients. The candidate
                  logistic regression models were constructed by randomly selecting genes in
                  subsets of size 50, 100, and 150 to predict the class of patients. Based on
                  the evaluation criteria, a smaller number of genes in the logistic model
                  could achieve higher accuracy, sensitivity, and specificity of prediction.
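
                  The procedure summarized in the abstract can be illustrated with a minimal
                  sketch. It is not the authors' implementation: the equal weights, the
                  gradient-descent fit with a small ridge penalty, the synthetic data, the
                  scaled-down subset sizes, and all function names (fit_logistic,
                  logistic_model_averaging, make_toy_data, evaluate) are assumptions made
                  here for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, n_iter=100, lr=0.1, ridge=0.01):
    """Fit one candidate logistic model by gradient descent (assumed fitting method)."""
    n, m = len(X), len(X[0])
    b0, b = 0.0, [0.0] * m
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * m
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += err
            for j in range(m):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        # The small ridge term keeps coefficients finite if classes are separable.
        b = [bj - lr * (gj / n + ridge * bj) for bj, gj in zip(b, g)]
    return b0, b


def predict_proba(model, X):
    b0, b = model
    return [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) for xi in X]


def logistic_model_averaging(X_train, y_train, X_new, subset_size, n_models, rng):
    """Average class-1 probabilities over candidates built on random feature subsets."""
    p = len(X_train[0])
    avg = [0.0] * len(X_new)
    for _ in range(n_models):
        cols = rng.sample(range(p), subset_size)  # random selection of "genes"
        Xs = [[row[j] for j in cols] for row in X_train]
        Xn = [[row[j] for j in cols] for row in X_new]
        model = fit_logistic(Xs, y_train)
        for i, pr in enumerate(predict_proba(model, Xn)):
            avg[i] += pr / n_models  # equal weights: an assumption, not from the paper
    return avg


def evaluate(y_true, probs, threshold=0.5):
    """Return (accuracy, sensitivity, specificity); assumes both classes are present."""
    pred = [1 if pr >= threshold else 0 for pr in probs]
    tp = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 1)
    tn = sum(1 for t, q in zip(y_true, pred) if t == 0 and q == 0)
    fp = sum(1 for t, q in zip(y_true, pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 0)
    return (tp + tn) / len(y_true), tp / (tp + fn), tn / (tn + fp)


def make_toy_data(n, p, rng, n_informative=20):
    """Synthetic stand-in for gene expression data; labels alternate 0/1."""
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(1.5 * yi if j < n_informative else 0.0, 1.0) for j in range(p)]
         for yi in y]
    return X, y


if __name__ == "__main__":
    rng = random.Random(2019)
    X, y = make_toy_data(n=60, p=200, rng=rng)  # p > n, as in the high dimensional case
    X_tr, y_tr, X_te, y_te = X[:40], y[:40], X[40:], y[40:]
    for size in (5, 10, 20):  # scaled-down analogues of the sizes 50, 100, 150
        probs = logistic_model_averaging(X_tr, y_tr, X_te, size, n_models=15, rng=rng)
        acc, sens, spec = evaluate(y_te, probs)
        print(f"size={size}: accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

                  In this sketch the label 1 could stand for one tumor type and 0 for the
                  other; which of KIRC and LUAD is coded as 1 is arbitrary. Averaging the
                  predicted probabilities rather than the fitted coefficients is a design
                  choice of the sketch, made because each candidate uses a different gene
                  subset.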

                  Keywords
                  classification;  high  dimensional  data;  logistic  model  averaging;  model
                  candidate; predictive modelling

                  1.  Introduction
                      High dimensional data arise when the number of features (p) in the data
                  exceeds the number of observations (n). In recent years, high dimensional
                  data have become very common in many areas of life. They can be found in
                  social media, genomic, econometric, and satellite data sets. Because such
                  data sets are very large, the main challenge in dealing with them lies in
                  predicting the response. Several methods are often used to handle this
                  case, such as best subset selection, lasso regression, and model
                  averaging.
                      This research applies the model averaging approach to the prediction
                  problem in a high dimensional data set. The main principle of model
                  averaging is to construct several candidate models that are then averaged
                  to form the final model [1]. The candidate model most commonly used is
                  linear regression, a choice that also depends on the scale of the response
                  variable. Model averaging with linear regression has been applied to
                  predict the response variable in a genomic data set [2]. Furthermore, the
                  development of this method is well


                                                                     189 | ISI WSC 2019