Page 401 - Special Topic Session (STS) - Volume 2

STS507 Katherine Jenny T. et al.
            receipts are often weakly related.  In most industries, the frequently reported
            products are highly correlated with total receipts and generally make up the
            majority of the total value of receipts, whereas the remaining products are not.
            Thus,  the  best  predictors  of  an  establishment’s  products  are  the  industry
            assigned to the establishment from the sampling frame, which may change
            after collection, and the total receipts value (Ellis and Thompson 2015). Given
            the lack of predictors and the concerns about the consistency of the 2012 and
            2017  data  collections,  the  team  considered  four  candidate  imputation
            methods:
              Ratio imputation
              Sequential Regression Multivariate Imputation (SRMI), as described in
                Raghunathan et al. (2001)
              Two variations of hot deck imputation (random and nearest neighbor),
                both of which imputed the multivariate distribution of products from
                donor establishments.
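To make the two hot deck variations concrete, the following is a minimal sketch, not the team's implementation. It assumes that donors and recipients sit in the same imputation cell, that the nearest neighbor is chosen by distance on total receipts, and that the donor's product shares are scaled to the recipient's total receipts. The record layout and all function names are hypothetical.

```python
import random


def product_shares(products):
    """Convert a donor's product receipts into shares of their sum."""
    total = sum(products.values())
    return {name: value / total for name, value in products.items()}


def nearest_neighbor_donor(recipient_total, donors):
    """Assumed distance: absolute difference in total receipts."""
    return min(donors, key=lambda d: abs(d["total_receipts"] - recipient_total))


def random_donor(donors, rng):
    """Pick any responding establishment in the cell with equal probability."""
    return rng.choice(donors)


def hot_deck_impute(recipient_total, donor):
    """Give the recipient the donor's whole product distribution,
    scaled so the imputed products add up to the recipient's total receipts."""
    shares = product_shares(donor["products"])
    return {name: share * recipient_total for name, share in shares.items()}


# Illustrative (made-up) donors from a single industry cell.
donors = [
    {"total_receipts": 100.0, "products": {"A": 80.0, "B": 20.0}},
    {"total_receipts": 1000.0, "products": {"A": 500.0, "B": 250.0, "C": 250.0}},
]
recipient_total = 900.0

nn_imputed = hot_deck_impute(
    recipient_total, nearest_neighbor_donor(recipient_total, donors)
)
rd_imputed = hot_deck_impute(
    recipient_total, random_donor(donors, random.Random(0))
)
```

Because the full set of shares comes from one donor, the imputed products always sum to the recipient's total receipts, and the relationships among products seen in a real establishment are preserved.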
                The team decided on a simulation approach that created industry
            “populations” from historical sample data in 39 industries by applying each
            candidate imputation method to replace the missing data, as suggested by Dr.
            Trivellore Raghunathan (University of Michigan). Product nonresponse was
            then induced in 50 independent replicates of each completed population, and
            all four candidate methods were used to “complete” the datasets, with multiple
            imputation used to obtain the imputed estimates, standard errors, and
            evaluation statistics (imputation error and fraction of missing information).
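As an illustration of how multiply imputed results are usually combined, the sketch below applies the standard combining rules for multiple imputation (Rubin's rules) to one replicate. It is an assumption that the study used exactly this form; in particular, the fraction of missing information shown is the simple large-sample approximation, and the function name and input values are hypothetical.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Combine m completed-data estimates and their sampling variances.

    Returns (pooled estimate, standard error, approximate fraction of
    missing information).
    """
    m = len(estimates)
    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 divisor)
    t = u_bar + (1 + 1 / m) * b  # total variance
    fmi = (1 + 1 / m) * b / t    # large-sample approximation, no df correction
    return q_bar, t ** 0.5, fmi


# Made-up totals from m = 3 imputed datasets for one induced-nonresponse replicate.
estimate, std_error, fmi = rubin_combine([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
```

In a setup like the one described, a function of this kind would be called once per method, industry, and replicate, and the resulting statistics compared across methods.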
            By  design  and  necessity,  the  simulation  study  made  some  simplifying
            assumptions beyond those already mentioned. Small sample size effects were
            controlled by choice of estimate level (national) and the selection of study
            industries.  These  choices  sidestepped  issues  that  would  arise  from  small
            respondent sample sizes in imputation cells. The evaluation was restricted to
            the two best-reported broad products in each studied industry in terms of
            number of establishments that reported the product. Reducing the size of the
            problem in this way cut computation time and left more time for analysis,
            although at some cost to the study’s “representativeness.” Lastly, the evaluation
            used  rank-based  tests  within  industry  to  compare  the  procedures,  so  that
            substantive improvements or deficiencies in specific situations were largely
            ignored. The evaluation procedures identified common patterns among the
            methods on each evaluation criterion for the best-reported products, rather
            than using statistics for every product reported in an industry. The team
            recommended using hot deck imputation for broad products in all industries,
            allowing the hot deck variation to differ by industry. This recommendation was
            endorsed  by  the  project  stakeholders.  That  said,  the  recommendation  was
            incomplete. No guidance was provided in terms of optimal imputation cells,
            minimum cell size (or collapsing rules), backup imputation methods (in the

                                                               390 | ISI WSC 2019