Page 175 - Contributed Paper Session (CPS) - Volume 4
CPS2166 Divo Dharma Silalahi et al.
in MCUVE threshold, then the new proposed cut-off criterion can be defined
as
$$\text{cut-off value} = \operatorname{median}\big((c_j)_{\text{artif}}\big) + k \cdot \operatorname{MAD}\big((c_j)_{\text{artif}}\big) \qquad (6)$$
where $k$ can be calculated as
$$k = \frac{z_\gamma + \sqrt{z_\gamma^2 - ab}}{a} \qquad (7)$$
with constant parameters $a = 1 - \dfrac{z_\alpha^2}{2(r-1)}$ and $b = z_\gamma^2 - \dfrac{z_\alpha^2}{r}$, where $r$ is the number of MC random repetitions, $\alpha$ is the error level, and $\gamma$ is the desired proportion. The wavelengths whose reliability $c_j$ is less than the cut-off criterion in (6), with $k$ from (7), are moved to the deleted set $D$, while the remaining wavelengths, which are the relevant ones, are placed in the remaining set $R$. The total OPLS-VIP score in (3) is then updated using only the remaining set $R$, and the new scaled input variable in (4) for the PLSR model follows.
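The cut-off rule in (6) and (7) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes $z_\gamma$ and $z_\alpha$ are standard normal quantiles (at $\gamma$ and $1-\alpha$ respectively), that MAD is the unscaled median absolute deviation, and that the reliabilities of the artificial (noise) variables are already available. The function names and the values of `r`, `alpha`, and `gamma` in the usage lines are hypothetical.

```python
from statistics import NormalDist, median


def tolerance_factor(r, alpha, gamma):
    """Factor k of Eq. (7). Assumes z_p denotes a standard normal quantile."""
    z_gamma = NormalDist().inv_cdf(gamma)        # assumed: quantile at proportion gamma
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # assumed: upper quantile at error level alpha
    a = 1.0 - z_alpha ** 2 / (2.0 * (r - 1))
    b = z_gamma ** 2 - z_alpha ** 2 / r
    return (z_gamma + (z_gamma ** 2 - a * b) ** 0.5) / a


def mad(values):
    """Median absolute deviation (assumption: no consistency constant applied)."""
    values = list(values)
    m = median(values)
    return median([abs(v - m) for v in values])


def cutoff_value(artificial_reliabilities, r, alpha, gamma):
    """Cut-off of Eq. (6): median + k * MAD over the artificial variables."""
    k = tolerance_factor(r, alpha, gamma)
    return median(artificial_reliabilities) + k * mad(artificial_reliabilities)


def split_wavelengths(reliabilities, cutoff):
    """Return index lists (D, R): deleted if c_j < cutoff, otherwise remaining."""
    deleted = [j for j, c in enumerate(reliabilities) if c < cutoff]
    remaining = [j for j, c in enumerate(reliabilities) if c >= cutoff]
    return deleted, remaining


# Illustrative values only; none of these numbers come from the paper.
k = tolerance_factor(r=100, alpha=0.05, gamma=0.99)
cut = cutoff_value([0.2, 0.5, 0.4, 0.9, 0.3], r=100, alpha=0.05, gamma=0.99)
D, R = split_wavelengths([0.1, 5.0, 12.0], cut)
```

Because the median and MAD are used instead of the mean and standard deviation, a few extreme reliabilities among the artificial variables have limited influence on the resulting threshold.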
3. Result
3.1 Simulation Data
The training set contains 150 samples and the testing set contains 50 samples; both were generated randomly using a uniform distribution, with noise of 0.03 also applied. The numbers of input and output variables are 40 and 1, respectively. This illustrative simulation is defined as follows
$$
\begin{aligned}
c_j &\sim \mathrm{runif}(n, 1, 10) \quad (j = 1, 2, 3, \ldots, 40)\\
e_j &\sim \mathrm{rnorm}(n) \quad (j = 0, 1, 2, \ldots, 40)\\
x_j &= c_j + e_j\\
y &= c_1 + 3c_5 + 0.85c_7 + 2c_{15} + 1.75c_{22} + 0.9c_{35} + e_0
\end{aligned}
\qquad (8)
$$
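A data set following the simulation design in (8) could be generated along the following lines. This is a minimal sketch rather than the authors' code: it uses Python's standard `random` module in place of the R-style `runif` and `rnorm`, it assumes the coefficients pair with the relevant indices in ascending order (1, 5, 7, 15, 22, 35), and it reads the stated "0.03 of noise" as the standard deviation of the normal noise terms. The function name `simulate`, its parameters, and the seeds are hypothetical.

```python
import random

# Assumption: coefficients are matched to the relevant indices in ascending order.
COEFFS = {1: 1.0, 5: 3.0, 7: 0.85, 15: 2.0, 22: 1.75, 35: 0.9}


def simulate(n, m=40, noise_sd=0.03, seed=0):
    """Generate n samples of (x_1..x_m, y) following the structure of Eq. (8)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        # Latent, unobserved variables c_1..c_m, uniform on [1, 10].
        c = [rng.uniform(1.0, 10.0) for _ in range(m)]
        # Noise terms e_0..e_m; noise_sd = 0.03 is an assumed reading of the text.
        e = [rng.gauss(0.0, noise_sd) for _ in range(m + 1)]
        # Observed inputs x_j = c_j + e_j for j = 1..m.
        X.append([c[j - 1] + e[j] for j in range(1, m + 1)])
        # Response depends on six latent variables plus e_0.
        y.append(sum(w * c[j - 1] for j, w in COEFFS.items()) + e[0])
    return X, y


X_train, y_train = simulate(150, seed=1)  # training set: 150 samples
X_test, y_test = simulate(50, seed=2)     # testing set: 50 samples
```

Each row of `X_train` holds the 40 observed inputs, so a variable selection method can be checked against the six indices listed in `COEFFS`.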
Here, $c_j$ and $e_j$ are independent of each other and are not measured, while $x_j$ and $y$ are the observable variables. As seen in (8), 6 input variables ($x_1$, $x_5$, $x_7$, $x_{15}$, $x_{22}$, $x_{35}$) are related to the response variable, while the remaining 34 input variables are not used in the formulation and are assumed to be irrelevant. The different coefficient values in (8) show the contribution level of each relevant variable to the response variable. Note that the relevant variables were manually selected in this formulation, whereas in practice the importance of the input variables is generally unknown. All input variables are represented as an $n \times m$ matrix $X$ and used in the calculation for model construction. In the PLSR model, the number of latent variables (also called components) is a principal indicator in the modeling since it may always be
164 | ISI WSC 2019