estimations. The increase in the mean squared error might be higher, though, so it would be good to know when this is going to happen. We do not have a definitive answer to this question, but we can offer some ideas that might help.
First of all, we discard one element (the out-of-bag element) of the sample for each model. This decreases the efficiency of the machine learning algorithm, because the training set is smaller. The larger the sample size, the smaller this effect, so the bias-corrected estimator might improve faster as the sample size increases.
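To fix ideas, a minimal sketch of this scheme (with placeholder names of our own and an arbitrary learner, not necessarily the one used in the simulations) trains each of the n models on the n - 1 remaining elements, so every training set loses exactly one unit:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def loo_predictions(X, y):
        """Train one model per sample element, holding that element out
        (its out-of-bag element), and predict it with its own model."""
        n = len(y)
        preds = np.empty(n)
        for i in range(n):
            mask = np.arange(n) != i              # training set of size n - 1
            model = DecisionTreeRegressor().fit(X[mask], y[mask])
            preds[i] = model.predict(X[i:i + 1])[0]
        return preds

As n grows, the relative loss of one training unit vanishes, which is the intuition behind the expected improvement described above.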
Since estimators for the variance of the second-stage-based estimator are known, that variance can be estimated. The variance of the synthetic estimator is more difficult to estimate, but a bootstrap approximation might be used. Even so, it is hard to say whether comparing the two estimated variances would solve the problem.
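A naive resampling sketch of that approximation could look as follows, where synthetic_estimate is a hypothetical placeholder for whatever function maps a sample to the synthetic estimate:

    import numpy as np

    def bootstrap_variance(X, y, synthetic_estimate, B=500, seed=None):
        """Approximate the variance of the synthetic estimator by
        recomputing it on B resamples drawn with replacement."""
        rng = np.random.default_rng(seed)
        n = len(y)
        estimates = np.empty(B)
        for b in range(B):
            idx = rng.integers(0, n, size=n)      # bootstrap resample
            estimates[b] = synthetic_estimate(X[idx], y[idx])
        return estimates.var(ddof=1)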
Observe that the first summand in Proposition C gives us an upper bound for the variance, because the subtrahend is always positive. The expectation of this first summand is the mean of the variances of the differences between the target variable and its predictions (excluding the training set). So, the better the algorithm models the data, the smaller the variance. Overfitting, even if it occurs only for a small set of samples, might lead to large variances, while the synthetic estimator could do slightly better because of error cancellation.
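Schematically, in our own notation (Proposition C is not reproduced here, so the exact normalisation may differ), the argument reads

    \operatorname{Var}(\hat{Y}) = S_1 - S_2, \quad S_2 \ge 0
        \;\Longrightarrow\; \operatorname{Var}(\hat{Y}) \le S_1,
    \qquad
    \mathbb{E}[S_1] = \frac{1}{|U \setminus s|} \sum_{k \in U \setminus s}
        \operatorname{Var}\bigl(y_k - \hat{y}_k\bigr),

where s denotes the training set and \hat{y}_k the prediction for a unit k outside it.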
Regarding the two-stage decomposition, the choice is difficult and further research is needed. If we want unbiased estimation of both the total and its variance, the second stage has to have a measurable sampling design. This will usually mean a second-stage sample of at least size two in each stratum. For cluster sampling, the decomposition is going to be a little more complicated, and probably a minimum of two clusters should be selected at the second stage. As has already been stated, the bigger the second-stage sample size, the smaller the effective training set of the model, so we should keep the second stage as small as possible.
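As an illustration only (a hypothetical helper, assuming the stratum labels of the first-stage sample are at hand), the smallest measurable stratified split could be sketched as:

    import numpy as np

    def split_second_stage(strata, seed=None, size_per_stratum=2):
        """Draw a second-stage subsample with at least two units per
        stratum (the smallest measurable design); the remaining units
        form the effective training set."""
        rng = np.random.default_rng(seed)
        strata = np.asarray(strata)
        second = []
        for h in np.unique(strata):
            units = np.flatnonzero(strata == h)
            second.extend(rng.choice(units, size=size_per_stratum,
                                     replace=False))
        second = np.sort(np.array(second))
        training = np.setdiff1d(np.arange(len(strata)), second)
        return second, training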
The decomposition might be chosen so that the weights are equal for the second-stage-based estimator [5], but there is no reason why this has to be better than any other choice (except perhaps for ease of calculation). Note that if we select the weights based on sample information, the estimator might become biased.
There is also the problem of computational cost, although it would not be so expensive in production. However, if we want to test other algorithms with simulations like these, fast specialized software has to be written first. Fortunately, the problem is embarrassingly parallel, so it should be easy to adapt the simulations to run on a cluster for greater speed.
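For instance, the replicates could be distributed over worker processes roughly as follows, where run_one_replicate is a hypothetical placeholder for the body of a single simulation:

    from concurrent.futures import ProcessPoolExecutor

    def run_one_replicate(seed):
        """Placeholder: draw one simulated sample, fit the models and
        return the estimates for this replicate."""
        ...

    def run_simulations(n_replicates, max_workers=None):
        """Replicates are independent (embarrassingly parallel), so each
        can run on a separate process or cluster node."""
        with ProcessPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(run_one_replicate, range(n_replicates)))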