subjective. In this study, the optimum number of latent variables in the PLSR model is selected using the result of a re-sampling procedure called cross-validation, taking the candidate with the lowest standard error as the overall best model.
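For illustration, this selection step can be sketched with scikit-learn, assuming a generic predictor matrix X and response y. This is not the authors' code: it picks the component count with the lowest mean cross-validated RMSEP rather than reproducing the paper's exact standard-error criterion or resampling scheme.

# Hedged sketch: cross-validated choice of the number of PLS latent variables.
# Assumes numpy arrays X (n x p) and y (n,); folds and selection rule are
# illustrative only and may differ from the paper's procedure.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

def select_n_latent_variables(X, y, max_lv=10, cv=10):
    """Return (chosen number of components, mean cross-validated RMSEP per count)."""
    rmsep = []
    for a in range(1, max_lv + 1):
        mse = -cross_val_score(PLSRegression(n_components=a), X, y,
                               scoring="neg_mean_squared_error", cv=cv)
        rmsep.append(np.sqrt(mse.mean()))  # cross-validated RMSEP for a components
    return int(np.argmin(rmsep)) + 1, rmsep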
As the number of latent variables used in the PLS model increases, the mean and standard error of the RMSEP also decrease. The optimum number of latent variables depends on how much the retained original variables contribute to the model. Using the dataset (see Table 1), it is clear that the proposed Filter-Wrapper method using mod-VIP-MCUVE needs five latent variables to achieve a lower RMSEP than the VIP scaling method and the classical PLS method with no input scaling applied. The MCUVE input scaling method uses the same number of latent variables as the Filter-Wrapper method but yields a higher RMSEP. With fewer variables used as predictors in the PLS model, faster computation is also attained. Thus, the proposed Filter-Wrapper method succeeds in reducing the RMSEP and improving the accuracy of the PLSR model. The prediction results on the training and testing datasets are summarized in Table 1.
Table 1. Statistical measures on prediction results using sine function
Dataset    Methods          LV   RMSEP    R²       RPD       Bias     SE
Training   PLS               9   0.1330   0.9999   82.3860   0.0049   0.1334
           VIP-PLS           9   0.1437   0.9998   76.2685   0.0052   0.1441
           MCUVE-PLS         5   0.1320   0.9999   83.5957   0.0044   0.1324
           mod-VIP-MCUVE     5   0.1266   0.9999   87.1402   0.0041   0.1270
Testing    PLS               9   0.1547   0.9998   75.8295   0.0075   0.1563
           VIP-PLS           9   0.1544   0.9998   75.9917   0.0223   0.1560
           MCUVE-PLS         5   0.1410   0.9999   83.2257   0.0308   0.1424
           mod-VIP-MCUVE     5   0.1311   0.9999   89.4751   0.0155   0.1325
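As background for the VIP-PLS rows, the classical VIP (Variable Importance in Projection) scores can be computed from a fitted PLS model as in the hedged sketch below. This is the standard VIP filter only, not the paper's proposed mod-VIP-MCUVE combination, and the attribute names follow scikit-learn's PLSRegression.

# Hedged sketch: classical VIP scores for a fitted sklearn PLSRegression model.
# Variables with VIP below 1 are commonly treated as candidates for removal.
import numpy as np

def vip_scores(pls):
    t = pls.x_scores_    # latent scores, shape (n, A)
    w = pls.x_weights_   # predictor weights, shape (p, A)
    q = pls.y_loadings_  # response loadings, shape (m, A)
    p, _ = w.shape
    s = np.diag(t.T @ t @ q.T @ q)           # y-variance explained per component
    w_norm = w / np.linalg.norm(w, axis=0)   # normalise each weight vector
    return np.sqrt(p * (w_norm ** 2 @ s) / s.sum())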
Comparing the SE and RMSEP values (Table 1), the proposed Filter-Wrapper method produced slightly better accuracy than the other methods on both the training and testing datasets, with a training SE and RMSEP of 0.127 and 0.126, respectively. The reliability of the methods was also examined using the RPD value; the proposed method yields the most reliable model compared with the others. It is worth noting that, by removing some irrelevant variables from the model, the trained model performs at least comparably on the testing dataset to the methods that involve the full set of variables. This shows that when the number of retained variables in the model is too large, the irrelevant variables it contains may influence the model and hence decrease its accuracy.
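The statistics reported in Table 1 can be reproduced, in outline, with the standard chemometric definitions sketched below. This is a hedged sketch only: the paper's exact formulas (for example, whether RPD uses the SE or the RMSEP in the denominator) may differ.

# Hedged sketch: common definitions of the Table 1 statistics for observed y
# and predicted y_hat; the authors' exact formulas may differ in detail.
import numpy as np

def prediction_stats(y, y_hat):
    resid = y_hat - y
    n = len(y)
    rmsep = np.sqrt(np.mean(resid ** 2))                 # root mean squared error of prediction
    bias = np.mean(resid)                                # mean prediction error
    se = np.sqrt(np.sum((resid - bias) ** 2) / (n - 1))  # bias-corrected standard error of prediction
    r2 = np.corrcoef(y, y_hat)[0, 1] ** 2                # squared correlation between y and y_hat
    rpd = np.std(y, ddof=1) / se                         # ratio of performance to deviation
    return {"RMSEP": rmsep, "R2": r2, "RPD": rpd, "Bias": bias, "SE": se}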