Page 132 - Contributed Paper Session (CPS) - Volume 4

CPS2156 Luis Sanguiao Sande
                  Definition 2: The second stage based estimator is given by
                  $$\hat{T}_2 = \sum_{s_1 \subset s} \hat{T}_{s_1}\,\frac{p_1(s_1)\,p_2(s \setminus s_1)}{p(s)}$$
                  The second stage based estimator can be used to build unbiased estimators
                  according to the following proposition.
                  Proposition A: If the estimator $\hat{T}_{s_1}$ is $p_2$ unbiased, then the second stage
                  based estimator is $p$ unbiased.
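                  Proof. We have to prove that $E_p(\hat{T}_2) = T$, so
                  $$E_p(\hat{T}_2) = \sum_{s} p(s) \sum_{s_1 \subset s} \hat{T}_{s_1}\,\frac{p_1(s_1)\,p_2(s \setminus s_1)}{p(s)} = \sum_{s} \sum_{s_1 \subset s} \hat{T}_{s_1}\,p_1(s_1)\,p_2(s \setminus s_1) = \sum_{s_1} \sum_{s_2} \hat{T}_{s_1}\,p_1(s_1)\,p_2(s_2),$$
                  where in the last equality we are just changing the order of the summands,
                  grouping by $s_1$ and writing $s_2 = s \setminus s_1$. But now, we can move the factor
                  $p_1(s_1)$ outside the second summation, so finally
                  $$E_p(\hat{T}_2) = \sum_{s_1} p_1(s_1) \sum_{s_2} \hat{T}_{s_1}\,p_2(s_2) = \sum_{s_1} p_1(s_1)\,T = T,$$
                  since $\hat{T}_{s_1}$ is $p_2$ unbiased and $\sum_{s_1} p_1(s_1) = 1$.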
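Proposition A can be checked numerically on a toy population. The following is a minimal sketch, not the author's code. It assumes simple random sampling without replacement, a decomposition in which the first stage sample is a simple random sample of fixed size `n1` and the second stage sample is a simple random sample from the remaining units, and a deliberately simple second stage estimator. The names `second_stage_based`, `expansion_estimator` and `design_expectation` are hypothetical.

```python
from itertools import combinations
from math import comb


def second_stage_based(s, y, N, n1, t_hat):
    """Second stage based estimator for a sample s (tuple of unit indices).

    Assumed decomposition (illustrative only): s1 is a simple random sample
    of size n1 from the population, and s2 = s minus s1 is a simple random
    sample of size len(s) - n1 from the units outside s1.
    """
    n = len(s)
    n2 = n - n1
    p_s = 1 / comb(N, n)          # design probability of s
    p1 = 1 / comb(N, n1)          # first stage probability of any s1
    p2 = 1 / comb(N - n1, n2)     # second stage probability of any s2
    total = 0.0
    for s1 in combinations(s, n1):
        s2 = tuple(k for k in s if k not in s1)
        total += t_hat(s1, s2, y, N) * p1 * p2 / p_s
    return total


def expansion_estimator(s1, s2, y, N):
    """Observed total on s1 plus the Horvitz-Thompson estimate of the
    total on the remaining N - len(s1) units, computed from s2."""
    rest = N - len(s1)
    return sum(y[k] for k in s1) + rest / len(s2) * sum(y[k] for k in s2)


def design_expectation(y, n, n1, t_hat):
    """Exact expectation of the estimator over all samples of size n."""
    N = len(y)
    p_s = 1 / comb(N, n)
    return sum(p_s * second_stage_based(s, y, N, n1, t_hat)
               for s in combinations(range(N), n))


y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]   # made-up population values
expectation = design_expectation(y, n=4, n1=2, t_hat=expansion_estimator)
print(expectation, sum(y))
```

Under these assumptions every first stage subsample of a given sample receives the same weight, 1 / C(n, n1), and the computed expectation agrees with the population total up to floating-point rounding.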
                     Now it is easy to build unbiased estimators. The sample $s_1$ can be used to
                  build the prediction function $\hat{y}(s_1)$ and we can use a second stage
                  Horvitz-Thompson estimator to get an unbiased estimate of the sum of the model
                  errors. Since the predictions are known for the whole population and the target
                  variable is known for $s_1$, it is trivial to get an unbiased estimator of $T$.
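The construction in the paragraph above can be sketched for a single split of the sample. This is an illustration under stated assumptions, not the paper's implementation: the model is an ordinary least squares line on one auxiliary variable `x` known for every unit, the second stage sample is assumed to be a simple random sample from the units outside the training set, and `fit_line` and `single_split_estimate` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Fall back to a constant model if all training x values coincide.
    b = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def single_split_estimate(x, y, s1, s2):
    """Estimate the population total of y from one (s1, s2) split.

    x is known for the whole population; y is only read on s1 and s2.
    s2 is assumed to be a simple random sample from the units outside s1.
    """
    N = len(x)
    model = fit_line([x[k] for k in s1], [y[k] for k in s1])
    in_s1 = set(s1)
    rest = [k for k in range(N) if k not in in_s1]
    training_total = sum(y[k] for k in s1)
    synthetic_total = sum(model(x[k]) for k in rest)
    # Horvitz-Thompson estimate of the total model error outside s1.
    ht_errors = len(rest) / len(s2) * sum(y[k] - model(x[k]) for k in s2)
    return training_total + synthetic_total + ht_errors


x = list(range(8))
y = [2 + 3 * k for k in x]          # made-up, exactly linear target
print(single_split_estimate(x, y, s1=(0, 3, 5), s2=(1, 6)), sum(y))
```

When the model fits perfectly, as in this made-up example, the error term vanishes and the estimate coincides with the true total. In general the Horvitz-Thompson term corrects the synthetic total on average over the second stage sampling.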
                     Under simple random sampling with the decomposition mentioned in the
                  previous section, the expression of the estimator would be



                     The first summand inside the parentheses is the sum over the training set, and
                  we have to add it because we want to estimate $T$ and we do not have those
                  elements at the second stage. The second summand is the synthetic estimator for
                  the total of $\hat{y}$ on the second stage population. The third one is the second
                  stage Horvitz-Thompson estimator of the difference between the totals of $y$
                  and $\hat{y}$, once again on the second stage population. Therefore, the three
                  summands together form an unbiased second stage estimator of $T$, and by
                  Proposition A, $\hat{T}_2$ is unbiased for the sampling design $p$.
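The three summands described above can be written out explicitly. The following is a hedged reconstruction, not a quotation of the paper's formula. It assumes a population $U$ of size $N$, a simple random sample $s$ of size $n$, training subsamples $s_1$ of fixed size $n_1$, and a decomposition in which $s \setminus s_1$ is a simple random sample from $U \setminus s_1$. The notation $\hat{y}_k^{(s_1)}$ for the prediction of unit $k$ from the model fitted on $s_1$ is introduced here for illustration.

```latex
\hat{T}_2 = \binom{n}{n_1}^{-1}
\sum_{\substack{s_1 \subset s \\ |s_1| = n_1}}
\left(
  \sum_{k \in s_1} y_k
  + \sum_{k \in U \setminus s_1} \hat{y}_k^{(s_1)}
  + \frac{N - n_1}{n - n_1} \sum_{k \in s \setminus s_1}
    \bigl( y_k - \hat{y}_k^{(s_1)} \bigr)
\right)
```

Under these assumptions the weight $p_1(s_1)\,p_2(s \setminus s_1)/p(s)$ reduces to $\binom{n}{n_1}^{-1}$ for every admissible $s_1$, which is why the estimator becomes a plain average over all training subsamples of size $n_1$.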
                     Note that we have to fit a lot of models to build these estimators: even if
                  $s_2$ contains just one element, we will have to fit $n$ models! So second stage
                  estimators are computationally expensive and we lack a fast specific software

                                                                     121 | ISI WSC 2019