Page 352 - Special Topic Session (STS) - Volume 2

STS498 Wei-Yin L.
                  12 months. Each variable with missing values is associated with a “missing
                  value flag variable” that takes the values given in Table 2. Flag variables have
                  underscores in their names; e.g., INTRDVX_ is the flag variable associated with
                  INTRDVX. There are 587 X variables that are neither constant nor completely
                  missing and that may be used to estimate the population mean of INTRDVX. About
                  20% of these variables have missing values; 67 of them have more than 95%
                  missing values. No CU has complete responses on all 587 variables.

                     Table 1: Variables and percents of missing values in consumer expenditure data
                   Variable      Definition                                                % missing
                   AGE_REF       Age of reference person                                           0
                   FFTAXOWE      Weighted estimate for federal tax liabilities                     0
                   INTRDVX       Interest or dividend received past 12 mos.                        0
                   PERINSPQ      Personal insurance and pensions last quarter
                   RENTEQVX      Monthly rent if home rented                                    15.6
                   RETSURVX      Retirement, survivor or disability pensions past 12 mos.          0
                   RETS_RVX      Flag variable for RETSURVX                                        0
                   STATE         State (39 categories)                                          11.1
                   STOCKX        Value of directly-held stocks, bonds, mutual funds             92.0
                   TOTXEST       Estimated total taxes paid                                        0

                             Table 2: Codes and definitions of missing value flag variables
                              A    valid nonresponse: a response is not anticipated
                              C    “don’t know”, refusal or other type of nonresponse
                              D    valid data value
                              T    topcoding applied to value
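
The flag codes in Table 2 can be used to tabulate missingness for a variable. The following is a minimal sketch, not the authors' procedure: it assumes that codes A and C mark a missing value while D and T mark an observed one, and the helper name percent_missing and the example flags are hypothetical.

```python
# Codes and definitions copied from Table 2.
FLAG_DEFINITIONS = {
    "A": "valid nonresponse: a response is not anticipated",
    "C": "don't know, refusal or other type of nonresponse",
    "D": "valid data value",
    "T": "topcoding applied to value",
}

# Assumption (not stated in the text): A and C indicate a missing value,
# whereas D and T indicate that a value is present.
ASSUMED_MISSING_CODES = {"A", "C"}


def percent_missing(flags):
    """Return the percent of records whose flag code is assumed to mean missing."""
    if not flags:
        raise ValueError("flags must be non-empty")
    unknown = set(flags) - set(FLAG_DEFINITIONS)
    if unknown:
        raise ValueError(f"unknown flag codes: {sorted(unknown)}")
    n_missing = sum(1 for code in flags if code in ASSUMED_MISSING_CODES)
    return 100.0 * n_missing / len(flags)


if __name__ == "__main__":
    # Made-up flags for eight records; not taken from the survey data.
    example_flags = ["D", "D", "A", "C", "T", "D", "D", "D"]
    print(percent_missing(example_flags))
```

Whether topcoded values (T) should count as observed depends on the analysis; they are counted as observed here only for illustration.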

                     Figure 1 shows the GUIDE piecewise-constant regression tree for
                  estimating the mean of INTRDVX. A condition is printed on the left side of
                  each intermediate node of the tree. A respondent goes to the left branch if
                  and only if the condition is satisfied. The sample size and sample mean of
                  INTRDVX are printed below each terminal node. For example, at the root node,
                  the 803 respondents who are 57 years or younger go to the left subnode, which
                  has a mean INTRDVX of $803. The other respondents go to the right subnode.
                  The symbol “<∗” is an abbreviation for “< or missing.” For example, the right
                  node immediately below the root node is split on STOCKX. Respondents with
                  STOCKX < $191,160 or STOCKX = missing go to the left subnode. The node in
                  black shows a special case where respondents go to the left branch if and only
                  if RETSURVX < $11,342 or the flag variable RETS_RVX = A. See Loh et al. (2019)
                  for a deeper analysis of the data.
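
The routing rules described above can be sketched in code. This is only an illustration of the three kinds of split mentioned in the text (plain "<", "< or missing", and "< or flag = A"); it is not the GUIDE implementation, and the function name goes_left and its parameters are hypothetical. It also assumes that a missing value goes right unless the split says otherwise, which the text does not state.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def goes_left(value, threshold, missing_goes_left=False,
              flag=None, left_flag_codes=()):
    """Return True if a respondent is sent to the left branch.

    value             -- value of the split variable (None or NaN if missing)
    threshold         -- split point c in the condition "value < c"
    missing_goes_left -- True for a "< or missing" split
    flag              -- code of the associated flag variable, if any
    left_flag_codes   -- flag codes that send the respondent left by themselves
    """
    if flag is not None and flag in left_flag_codes:
        return True
    if is_missing(value):
        # Assumption: without a "< or missing" rule, missing values go right.
        return missing_goes_left
    return value < threshold


if __name__ == "__main__":
    # "STOCKX < 191,160 or missing": a missing value goes left.
    print(goes_left(None, 191160, missing_goes_left=True))
    print(goes_left(250000, 191160, missing_goes_left=True))
    # "RETSURVX < 11,342 or flag = A": flag code A alone sends the respondent left.
    print(goes_left(None, 11342, flag="A", left_flag_codes=("A",)))
    print(goes_left(5000, 11342, flag="D", left_flag_codes=("A",)))
```

Separating the flag test from the missing-value test keeps the special case distinct from an ordinary "< or missing" split, since only one kind of nonresponse is routed left there.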








                                                                     341 | ISI WSC 2019