Page 401 - Special Topic Session (STS) - Volume 2
STS507 Katherine Jenny T. et al.
receipts are often weakly related. In most industries, the frequently reported
products are highly correlated with total receipts and generally make up the
majority of the total value of receipts, whereas the remaining products are not.
Thus, the best predictors of an establishment’s products are the industry
assigned to the establishment from the sampling frame, which may change
after collection, and the total receipts value (Ellis and Thompson 2015). Given
the lack of predictors and the concerns about the consistency of the 2012 and
2017 data collections, the team considered four candidate imputation
methods:
- Ratio imputation
- Sequential Regression Multivariate Imputation (SRMI), as described in
  Raghunathan et al. (2001)
- Two variations of hot deck imputation (random and nearest neighbor),
  both of which imputed the multivariate distribution of products from
  donor establishments.
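To make the candidate methods listed above more concrete, the following is a minimal Python sketch of ratio imputation and the two hot deck variations. It is not the team's implementation. The record layout (`total`, `products`), the function names, and the toy data are all hypothetical. Two further assumptions are made: the nearest neighbor is the donor with the closest total receipts, and a donor's product distribution is carried over as shares of total receipts and rescaled to the recipient's total.

```python
import random


def nearest_neighbor_donor(recipient, donors):
    """Pick the donor whose total receipts are closest to the recipient's.

    Assumption: distance is the absolute difference in total receipts.
    """
    return min(donors, key=lambda d: abs(d["total"] - recipient["total"]))


def random_donor(donors, rng):
    """Pick a donor uniformly at random from the imputation cell."""
    return rng.choice(donors)


def impute_products(recipient, donor):
    """Copy the donor's product mix, rescaled to the recipient's total.

    Assumption: the donor's multivariate product distribution is applied
    as shares of total receipts.
    """
    return {
        name: recipient["total"] * value / donor["total"]
        for name, value in donor["products"].items()
    }


def ratio_impute(recipient, respondents, product):
    """Ratio imputation for one product within a cell.

    Uses the ratio of summed product receipts to summed total receipts
    among respondents, applied to the recipient's total receipts.
    """
    numerator = sum(r["products"].get(product, 0.0) for r in respondents)
    denominator = sum(r["total"] for r in respondents)
    return recipient["total"] * numerator / denominator


# Hypothetical toy cell: two responding establishments and one recipient
# that reported total receipts but no product detail.
donors = [
    {"total": 100.0, "products": {"A": 60.0, "B": 40.0}},
    {"total": 500.0, "products": {"A": 450.0, "B": 50.0}},
]
recipient = {"total": 200.0}

nn_imputed = impute_products(recipient, nearest_neighbor_donor(recipient, donors))
rd_imputed = impute_products(recipient, random_donor(donors, random.Random(0)))
ratio_a = ratio_impute(recipient, donors, "A")
```

Under these assumptions, both hot deck variants return product values that sum to the recipient's total receipts, whereas ratio imputation treats each product separately.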
Following a suggestion from Dr. Trivellore Raghunathan (University of
Michigan), the team adopted a simulation approach: industry “populations”
were created from historical sample data in 39 industries by applying each
candidate imputation method to replace the missing data. Product nonresponse was
induced in 50 independent replicates in each completed population, and all
four candidate methods were used to “complete” the datasets, using multiple
imputation to obtain the imputed estimates, standard errors, and evaluation
statistics (imputation error and fraction of missing information).
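One replicate of such a simulation could be sketched as follows: nonresponse is induced at random, and the multiply imputed estimates are pooled with the standard multiple imputation combining rules (Rubin's rules). This is an illustrative sketch only. The function names and the nonresponse mechanism (each unit missing independently with a fixed probability) are assumptions, and the fraction of missing information is given in its simple large-sample form, without the degrees-of-freedom correction.

```python
import random
import statistics


def induce_nonresponse(n_units, rate, rng):
    """Return one missingness flag per unit (True = product data missing).

    Assumption: units go missing independently with probability `rate`.
    """
    return [rng.random() < rate for _ in range(n_units)]


def combine_rubin(estimates, variances):
    """Pool m >= 2 completed-data estimates with Rubin's combining rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    t = u_bar + (1 + 1 / m) * b           # total variance
    # Simple approximation: share of total variance due to missing data.
    fmi = (1 + 1 / m) * b / t if t > 0 else 0.0
    return {"estimate": q_bar, "std_error": t ** 0.5, "fmi": fmi}


# Hypothetical replicate: 50 seeds stand in for the 50 independent replicates.
flags_by_replicate = [
    induce_nonresponse(n_units=20, rate=0.3, rng=random.Random(seed))
    for seed in range(50)
]

# Hypothetical completed-data estimates and variances from three imputations.
pooled = combine_rubin([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

A fraction of missing information close to 1 indicates that most of the pooled variance comes from disagreement between imputations rather than from sampling.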
By design and necessity, the simulation study made some simplifying
assumptions beyond those already mentioned. Small sample size effects were
controlled by choice of estimate level (national) and the selection of study
industries. These choices sidestepped issues that would arise from small
respondent sample sizes in imputation cells. The evaluation was restricted to
the two best-reported broad products in each studied industry in terms of
number of establishments that reported the product. Rescaling the size of the
problem reduced computation time and increased available time for analysis,
although it did reduce the study’s “representativeness.” Lastly, the evaluation
used rank-based tests within industry to compare the procedures, so that
substantive improvements or deficiencies in specific situations were largely
ignored. The evaluation looked for common patterns among the methods on
each evaluation criterion for the best-reported products, rather than
using statistics for every product reported in an industry. The team
recommended using hot deck imputation for broad products in all industries,
allowing different hot deck variations by industries. This recommendation was
endorsed by the project stakeholders. That said, the recommendation was
incomplete. No guidance was provided in terms of optimal imputation cells,
minimum cell size (or collapsing rules), backup imputation methods (in the
390 | ISI WSC 2019