
estimations. The increase of the mean squared error might be higher, though, so it would be useful to know when this is going to happen. We do not have a definitive answer to this question, but we can offer some ideas that might help.
    First of all, we discard one element of the sample (out of bag) for each model. This decreases the efficiency of the machine learning algorithm, because the training set is smaller. The larger the sample size, the smaller this effect, so the bias-corrected estimator might improve faster as the sample size increases.
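    As an illustration only, the leave-one-out (out-of-bag) scheme described above could be sketched in Python as follows; the data X, y and the choice of a random forest are placeholders, not the actual setup of the paper.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))                               # placeholder auxiliary variables
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)   # placeholder target variable

    # One model per sample element: unit i is left out ("out of bag")
    # and receives a prediction from the model trained on the rest.
    oob_pred = np.empty_like(y)
    for i in range(len(y)):
        mask = np.ones(len(y), dtype=bool)
        mask[i] = False                      # discard element i from the training set
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(X[mask], y[mask])
        oob_pred[i] = model.predict(X[i:i + 1])[0]

    # The smaller the sample, the more the loss of one training unit hurts.
    print("mean out-of-bag prediction error:", np.mean(y - oob_pred))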
    Since estimators of the variance of the second-stage-based estimator are known, that variance can be estimated. The variance of the synthetic estimator is more difficult to estimate, but a bootstrap approximation might be used. However, it is difficult to say whether a comparison of the estimated variances would solve the problem.
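    A bootstrap approximation to the variance of the synthetic estimator could look roughly like the following sketch; the naive resampling of the sample and the simple model-based total are assumptions made only for illustration, and a design-respecting bootstrap would be preferable for complex designs.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def synthetic_total(X_sample, y_sample, X_population):
        """Fit a model on the sample and sum its predictions over the population."""
        model = LinearRegression().fit(X_sample, y_sample)
        return model.predict(X_population).sum()

    def bootstrap_variance(X_sample, y_sample, X_population, B=200, seed=0):
        """Naive bootstrap: resample the sample with replacement B times."""
        rng = np.random.default_rng(seed)
        n = len(y_sample)
        totals = np.empty(B)
        for b in range(B):
            idx = rng.integers(0, n, size=n)
            totals[b] = synthetic_total(X_sample[idx], y_sample[idx], X_population)
        return totals.var(ddof=1)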
    Observe that the first summand in Proposition C gives an upper bound for the variance, because the subtrahend is always positive. The expectation of this first summand is the mean of the variances of the differences between the target variable and its predictions (excluding the training set). So the better the algorithm models the data, the smaller the variance. Even if it happens only for a small set of samples, overfitting might lead to large variances, whereas the synthetic estimator could fare slightly better because of error cancellation.
    Regarding the two-stage decomposition, it is a difficult choice and further research is needed. If we want unbiased estimation of the total and of its variance, the second stage has to have a measurable sampling design. This usually means that the second-stage sample must be of at least size two in each stratum. For cluster sampling the decomposition is a little more complicated, and probably a minimum of two clusters should be selected at the second stage. As already stated, the larger the second-stage sample size, the smaller the effective training set of the model, so we should keep the second stage as small as possible.
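    The constraint of at least two second-stage units per stratum could be enforced along the following lines; the simple random draw within strata used here is only one possible measurable design, chosen for illustration.

    import numpy as np

    def split_second_stage(strata, n2_per_stratum=2, seed=0):
        """Split sample indices into second-stage units (at least two per stratum)
        and the remaining units, which form the effective training set."""
        rng = np.random.default_rng(seed)
        strata = np.asarray(strata)
        second_stage = []
        for h in np.unique(strata):
            units = np.flatnonzero(strata == h)
            if len(units) < n2_per_stratum:
                raise ValueError(f"stratum {h} has fewer than {n2_per_stratum} units")
            second_stage.extend(rng.choice(units, size=n2_per_stratum, replace=False))
        second_stage = np.array(sorted(second_stage))
        training = np.setdiff1d(np.arange(len(strata)), second_stage)
        return second_stage, training

    # Example: three strata, two units per stratum reserved for the second stage.
    stage2, train = split_second_stage([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
    print(stage2, train)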
    The decomposition might be chosen so that the weights are equal for the second-stage-based estimator [5], but there is no reason why this has to be better than any other choice (except perhaps for the calculations). Note that if we choose the weights based on sample information, the estimator might become biased.
    There is also the problem of computational cost, although it would not be so expensive in production. However, if we want to test other algorithms with simulations like these, fast specialized software has to be written first. Fortunately, the problem is embarrassingly parallel, so it should be easy to adapt the simulations to run on a cluster for greater speed.
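    Because each simulation replicate is independent, the runs can be distributed with something as simple as a process pool (a single-machine stand-in for a cluster). In the sketch below, run_replicate is a hypothetical placeholder for one complete simulation replicate.

    import numpy as np
    from multiprocessing import Pool

    def run_replicate(seed):
        """Hypothetical placeholder: one full simulation replicate
        (draw a sample, fit the models, return the estimates)."""
        rng = np.random.default_rng(seed)
        return rng.normal()          # stands in for the replicate's estimates

    if __name__ == "__main__":
        with Pool() as pool:
            # Replicates are independent, so the loop parallelizes trivially;
            # on a real cluster the same map pattern applies.
            results = pool.map(run_replicate, range(1000))
        print(len(results), "replicates finished")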





