Page 175 - Contributed Paper Session (CPS) - Volume 4

CPS2166 Divo Dharma Silalahi et al.
            in MCUVE threshold, then the newly proposed cut-off criterion can be defined
            as
                \[ \text{cut-off value} = \operatorname{median}\big((c_j)_{\mathrm{artif}}\big) + k\,\operatorname{MAD}\big((c_j)_{\mathrm{artif}}\big) \tag{6} \]
            where $k$ can be calculated as
                \[ k = \frac{z_\gamma + \sqrt{z_\gamma^2 - ab}}{a} \tag{7} \]
            with constant parameters
                \[ a = 1 - \frac{z_\alpha^2}{2(r-1)}, \qquad b = z_\gamma^2 - \frac{z_\alpha^2}{r}, \]
            where $r$ is the number of MC random repetitions, $\alpha$ is the level of
            error, and $\gamma$ is the desired proportion. The wavelengths with reliability
            $c_j$ less than the cut-off threshold criterion in (6) are moved to the deleted
            set $D$, while the remaining wavelengths, which are the relevant wavelengths,
            are placed in the remaining set $R$. The total OPLS-VIP score in (3) is then
            updated using only the remaining set $R$, and the new scaled input variable
            in (4) for the PLSR model follows accordingly.
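The cut-off rule in (6) and (7) can be illustrated with a minimal Python sketch. Several points here are assumptions rather than statements from the paper: $z_\alpha$ and $z_\gamma$ are read as the standard-normal quantiles at $1-\alpha$ and $\gamma$, MAD is taken as the unscaled median absolute deviation, and the function names (`tolerance_factor`, `mad`, `split_by_cutoff`) and the sample reliability values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, median


def tolerance_factor(r, alpha, gamma):
    """k from Eq. (7). Assumption: z_alpha and z_gamma are the
    standard-normal quantiles at 1 - alpha and gamma, respectively."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_gamma = NormalDist().inv_cdf(gamma)
    a = 1 - z_alpha ** 2 / (2 * (r - 1))
    b = z_gamma ** 2 - z_alpha ** 2 / r
    return (z_gamma + sqrt(z_gamma ** 2 - a * b)) / a


def mad(values):
    """Unscaled median absolute deviation (assumed form of MAD)."""
    values = list(values)
    m = median(values)
    return median(abs(v - m) for v in values)


def split_by_cutoff(c_real, c_artif, r, alpha=0.05, gamma=0.99):
    """Apply Eq. (6): variables with reliability below the cut-off go to
    the deleted set D, the others to the remaining set R (index lists)."""
    k = tolerance_factor(r, alpha, gamma)
    cutoff = median(c_artif) + k * mad(c_artif)
    deleted = [j for j, c in enumerate(c_real) if c < cutoff]
    remaining = [j for j, c in enumerate(c_real) if c >= cutoff]
    return cutoff, deleted, remaining


# Hypothetical reliability values, for illustration only.
c_artif = [0.1, 0.2, 0.3, 0.4, 0.5]   # artificial (noise) variables
c_real = [0.2, 1.5, 0.6, 0.5]         # real wavelengths
cutoff, D, R = split_by_cutoff(c_real, c_artif, r=100)
print(cutoff, D, R)
```

Under these assumptions, a larger number of repetitions $r$ moves $a$ toward 1 and $b$ toward $z_\gamma^2$, so $k$ shrinks toward $z_\gamma$.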

            3.  Result
            3.1 Simulation Data
                 The training set consists of 150 samples and the testing set of 50
            samples, both generated randomly from a uniform distribution, with noise of
            0.03 also applied. The numbers of input variables and output variables are
            40 and 1, respectively. The formulation of this illustrative simulation is
            defined as follows
                \[
                \begin{aligned}
                c_j &\sim \mathrm{runif}(n, 1, 10) && (j = 1, 2, 3, \ldots, 40) \\
                e_j &\sim \mathrm{rnorm}(n) && (j = 0, 1, 2, \ldots, 40) \\
                x_j &= c_j + e_j \\
                y &= c_1 + 3c_5 + 0.85c_7 + 2c_{15} + 1.75c_{22} + 0.9c_{35} + e_0
                \end{aligned}
                \tag{8}
                \]
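The data-generating scheme in (8) can be sketched in Python using only the standard library. This is an illustrative reading, not the authors' code: the function name `simulate`, the `seed` argument, and the `noise_sd` parameter are assumptions. The default `noise_sd=1.0` mirrors `rnorm(n)` as written in (8); how the stated 0.03 noise level enters is not spelled out in the formulation, so it is left as a parameter.

```python
import random

# Coefficients of the six relevant latent variables in Eq. (8), keyed by index j.
COEFS = {1: 1.0, 5: 3.0, 7: 0.85, 15: 2.0, 22: 1.75, 35: 0.9}


def simulate(n, m=40, noise_sd=1.0, seed=0):
    """Generate n samples of (x_1..x_m, y) following Eq. (8).

    noise_sd is a hypothetical knob for the standard deviation of e_j.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        c = [rng.uniform(1, 10) for _ in range(m)]           # c_1..c_m, stored at index j-1
        e = [rng.gauss(0, noise_sd) for _ in range(m + 1)]   # e_0..e_m
        X.append([c[j - 1] + e[j] for j in range(1, m + 1)])  # x_j = c_j + e_j
        y.append(sum(w * c[j - 1] for j, w in COEFS.items()) + e[0])
    return X, y


X_train, y_train = simulate(150, seed=1)   # training set: 150 samples
X_test, y_test = simulate(50, seed=2)      # testing set: 50 samples
print(len(X_train), len(X_train[0]), len(y_test))
```

Only the six columns listed in `COEFS` carry information about `y`; the other 34 columns are pure distractors, which is what makes the example useful for checking a variable selection rule.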
            Here, $c_j$ and $e_j$ are independent of each other and are not measured
            variables, while $x_j$ and $y$ are treated as observable variables. As seen
            in (8), there were 6 input variables ($x_1$, $x_5$, $x_7$, $x_{15}$, $x_{22}$,
            $x_{35}$) related to the response variable, while the remaining 34 input
            variables were not used in the formulation and were assumed to be irrelevant
            variables. The different coefficient values in formulation (8) show the
            contribution level of each relevant variable to the response variable. It
            should be noted that these relevant variables were manually selected in the
            formulation; in practice, the importance of input variables is generally
            unknown. All these input variables are represented as an $n \times m$ matrix
            $X$ and used in the calculation for model construction. In the PLSR model,
            the number of latent variables (also called components) is a principal
            indicator in the modeling since it may always be
                                                               164 | ISI WSC 2019