Page 404 - Special Topic Session (STS) - Volume 2

STS507 Katherine Jenny T. et al.
                  this,  category  averages were  computed  for  each  detailed  product  within  a
                  broad product for each potential imputation cell (generally industry-by-state-
                  by-unit  type,  industry-by-state,  industry)  with  a  required  minimum  of  one
                  establishment in the cell reporting the detailed products. Designated missing
                  detailed products were imputed from their associated broad product total by
                  using the appropriate category averages. Once this process was finished and
                  all donors were made “Complete,” the hot deck process was performed to
                  impute products for all “Full” recipients.  This approach of completing the
                  partial and minimal donors maximized the use of reported data in the hot deck
                  imputation  procedures,  but  later  complicated  the  variance  estimation  of
                  detailed products due to the partial donor/recipient establishments.
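
The category-average step described above can be illustrated with a minimal sketch. It is not the production specification: it assumes that a "category average" is each detailed product's share of the broad product total among reporting establishments in the cell, and the record layout (`industry`, `state`, `unit_type`, `detailed`, `broad_total`) and function names are hypothetical. Cells are tried from most to least specific, in the order given in the text, and a cell is usable once at least one establishment in it reports detailed products.

```python
from collections import defaultdict

# Imputation cell definitions, most specific first (fallback order from the text).
CELL_LEVELS = [
    ("industry", "state", "unit_type"),
    ("industry", "state"),
    ("industry",),
]


def category_averages(donors):
    """Return {cell_key: {detailed_code: share of broad total}} for all cell levels.

    Assumption: the category average is a ratio of sums within the cell.
    A cell appears in the result only if at least one donor in it reported
    detailed products with a positive total.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for donor in donors:
        for level in CELL_LEVELS:
            key = tuple(donor[field] for field in level)
            for code, value in donor["detailed"].items():
                sums[key][code] += value

    shares = {}
    for key, by_code in sums.items():
        total = sum(by_code.values())
        if total > 0:
            shares[key] = {code: value / total for code, value in by_code.items()}
    return shares


def impute_detail(recipient, shares):
    """Split the recipient's broad product total using the first usable cell.

    Returns {detailed_code: imputed value}, or None if no cell qualifies.
    """
    for level in CELL_LEVELS:
        key = tuple(recipient[field] for field in level)
        if key in shares:
            return {
                code: recipient["broad_total"] * share
                for code, share in shares[key].items()
            }
    return None


# Hypothetical example data: two donors reporting detail for one broad product.
donors = [
    {"industry": "311", "state": "OH", "unit_type": "single",
     "detailed": {"P1": 60.0, "P2": 40.0}},
    {"industry": "311", "state": "TX", "unit_type": "multi",
     "detailed": {"P1": 20.0, "P2": 80.0}},
]
shares = category_averages(donors)
recipient = {"industry": "311", "state": "CA", "unit_type": "single",
             "broad_total": 100.0}
print(impute_detail(recipient, shares))
```

In this example the recipient has no donor in its industry-by-state-by-unit type or industry-by-state cell, so the split falls back to the industry-level shares.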
                      The implementation team met regularly over a two-year period. During
                  this  collaborative  period,  methodologists  met  separately  each  week  (along
                  with the team leader) to develop the missing data procedures and treatments
                  that were not addressed by the research team. Specifications were reviewed
                  first by this subgroup, then by the entire team. Testing posed a greater challenge.
                  Using small, single-industry test decks, we were able to verify that the category
                  average and hot-deck processes were working correctly. However, one of the
                  concerns about hot-deck imputation was the time it would take to run the
                  process to impute missing products for the entire EC. To estimate run times
                  and to test more scenarios, we created a full-size test deck with roughly
                  2.4 million donors (with over 20 million products) and 1.1 million full
                  recipients covering all NAICS sectors in scope for the EC. Using a concordance
                  that mapped 2012 product codes to 2017 NAPCS codes, we converted the
                  2012  EC  product  data  to  a  2017  NAPCS  basis,  again  making  simplifying
                  assumptions, while ensuring that certain specific scenarios were included in
                  the test data. The performance test using this test deck ran in approximately
                  80 minutes. This was a reassuring result, although it might not directly
                  translate to run times using actual 2017 EC production data and systems.
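
As an illustration of the conversion step, the sketch below applies a concordance to one establishment's product values. The paper does not state the simplifying assumptions that were used, so the allocation rule here (fixed weights that split an old code across its new codes) is an assumption, and the codes, weights, and the function name `convert_to_napcs` are hypothetical.

```python
from collections import defaultdict


def convert_to_napcs(products, concordance):
    """Re-express product values on the new code basis.

    products:    {old_code: value}
    concordance: {old_code: [(new_code, weight), ...]}, where the weights
                 for each old code are assumed to sum to 1.
    Returns (converted, unmapped): values by new code, plus any old codes
    that have no concordance entry, so that they can be reviewed.
    """
    converted = defaultdict(float)
    unmapped = {}
    for old_code, value in products.items():
        targets = concordance.get(old_code)
        if not targets:
            unmapped[old_code] = value
            continue
        for new_code, weight in targets:
            converted[new_code] += value * weight
    return dict(converted), unmapped


# Hypothetical codes: "A" maps one-to-one, "B" splits evenly, "C" is unmapped.
concordance = {
    "A": [("N1", 1.0)],
    "B": [("N1", 0.5), ("N2", 0.5)],
}
converted, unmapped = convert_to_napcs({"A": 100.0, "B": 40.0, "C": 7.0}, concordance)
print(converted, unmapped)
```

Returning the unmapped codes separately, rather than dropping them, keeps the converted total reconcilable with the original total.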

                  4.  Discussion and Conclusion
                      When developing a research plan that applies to an ongoing survey,
                  finding the right balance is difficult. On one hand, making the scenario as
                  simple as possible reduces the probability of treatment effects (solutions)
                  being confounded by factors such as sample size or random noise. On the
                  other hand, oversimplification can lead to impractical solutions. Of course, it is
                  crucial to limit the scope so that the research can be timely enough to be
                  relevant  when  completed.  However,  it  should  be  acknowledged  that
                  compressing the scope can lead to hasty decisions later in the implementation
                  process, when there is no time left for careful further investigation.
                      There are real advantages in establishing (almost) separate research and
                  implementation teams as discussed in this paper. Having two teams approach

                                                                     393 | ISI WSC 2019