Page 404 - Special Topic Session (STS) - Volume 2
STS507 Katherine Jenny T. et al.
this, category averages were computed for each detailed product within a
broad product for each potential imputation cell (generally industry-by-state-
by-unit type, industry-by-state, industry) with a required minimum of one
establishment in the cell reporting the detailed products. Designated missing
detailed products were imputed from their associated broad product total by
using the appropriate category averages. Once this process was finished and
all donors were made “Complete,” the hot-deck process was performed to
impute products for all “Full” recipients. This approach of completing the
partial and minimal donors maximized the use of reported data in the hot-deck
imputation procedures, but it later complicated the variance estimation of
detailed products because of the partial donor/recipient establishments.
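The category-average step described above might be sketched as follows. This is a minimal illustration, not the production system: the record layout, the function names, and the `(broad, detailed)` keying are all hypothetical, and the sketch uses a single cell level rather than the collapsing cell hierarchy (industry-by-state-by-unit type, industry-by-state, industry) used in practice.

```python
from collections import defaultdict

def category_average_shares(donors):
    """Average share of each detailed product within its broad product,
    computed per imputation cell from reporting donor establishments.
    (Hypothetical record layout: each donor is a dict with a "cell" id
    and a "products" map keyed by (broad, detailed) code pairs.)"""
    detail_sum = defaultdict(float)   # (cell, broad, detailed) -> total value
    broad_sum = defaultdict(float)    # (cell, broad) -> total value
    for est in donors:
        for (broad, detailed), value in est["products"].items():
            detail_sum[(est["cell"], broad, detailed)] += value
            broad_sum[(est["cell"], broad)] += value
    # Share of each detailed product within its broad product, per cell.
    return {key: value / broad_sum[key[:2]]
            for key, value in detail_sum.items()}

def impute_details(cell, broad, broad_total, shares):
    """Distribute a reported broad-product total across the missing
    detailed products using the cell's category-average shares."""
    return {detailed: broad_total * share
            for (c, b, detailed), share in shares.items()
            if (c, b) == (cell, broad)}
```

Because the imputed details are proportional shares of the reported broad total, they sum back to that total by construction, which preserves the establishment's broad-product value.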
The implementation team met regularly over a two-year period. During
this collaborative period, methodologists met separately each week (along
with the team leader) to develop the missing data procedures and treatments
that were not addressed by the research team. Specifications were reviewed
first by this subgroup, then by the entire team. Testing proved a larger
challenge. Using small, single-industry test decks, we verified that the
category-average and hot-deck processes were working correctly. However, one of the
concerns about hot-deck imputation was the time it would take to run the
process to impute missing products for the entire EC. To determine estimated
run-times, as well as test more scenarios, we created a full-size test deck with
roughly 2.4 million donors (with over 20 million products) and 1.1 million “Full”
recipients covering all NAICS sectors in scope for the EC. Using a concordance
that mapped 2012 product codes to 2017 NAPCS codes, we converted the
2012 EC product data to a 2017 NAPCS basis, again making simplifying
assumptions, while ensuring that certain specific scenarios were included in
the test data. The performance testing using this test deck took approximately
80 minutes. This was a reassuring result, although it might not directly
translate to run times using actual 2017 EC production data and systems.
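The concordance conversion used to build the test deck could be sketched as below, under the simplifying assumption that each 2012 product code maps to at most one 2017 NAPCS code. The function name, record layout, and example codes are hypothetical; the actual concordance and the simplifying assumptions applied were more involved.

```python
from collections import defaultdict

def convert_to_napcs(records, concordance):
    """Restate legacy product values on a new classification basis.

    records: iterable of (old_code, value) pairs (hypothetical layout).
    concordance: dict mapping old codes to new codes (many-to-one allowed).
    Returns the converted totals plus any records whose code has no mapping,
    so they can be reviewed rather than silently dropped."""
    converted = defaultdict(float)
    unmapped = []
    for old_code, value in records:
        new_code = concordance.get(old_code)
        if new_code is None:
            unmapped.append((old_code, value))
        else:
            converted[new_code] += value
    return dict(converted), unmapped
```

A many-to-one mapping aggregates several legacy codes into one new code; one-to-many splits, which a real concordance may also contain, would require allocation factors not shown here.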
4. Discussion and Conclusion
When developing a research plan that applies to an ongoing survey,
finding the right balance is difficult. On the one hand, making the scenario as
simple as possible reduces the probability that treatment effects (solutions)
will be confounded by factors such as sample size or random noise. On the other
hand, oversimplification can lead to impractical solutions. Of course, it is
crucial to limit the scope so that the research can be timely enough to be
relevant when completed. However, it should be acknowledged that
compressing the scope can force hasty decisions later in the implementation
process, when no time remains for careful further investigation.
There are real advantages in establishing (almost) separate research and
implementation teams as discussed in this paper. Having two teams approach
393 | ISI WSC 2019