Page 107 - Invited Paper Session (IPS) - Volume 1

IPS102 Sigita G. et al.
            (national accounts and household surveys). The method for allocating the
            gap depends on the previously mentioned factors. If the initial micro or macro
            data quality is low, the quality of the distributional result will be low. Next, the
            consistency of concepts and of the actual micro and macro data needs to be
            assessed. If the consistency of concepts and data is high, the distributional
            results can be expected to be good; if the consistency of concepts or data is
            not satisfactory, the quality of the distributional results will be medium
            or low.

            2.2. Joint distribution of ICW
                Few countries run integrated surveys that collect data on income,
            consumption and/or wealth at once, because such a survey would be
            excessively long and households would be reluctant to answer. Thus, in most
            countries separate income, consumption and wealth surveys collect data from
            different households. As a consequence, the records of these surveys cannot
            be linked directly, and statistical matching methods are needed to join the
            data from the different sources into a single data set using the categorical
            variables they have in common.
                In previous experiments, results produced by different matching methods
            (random hot-deck, rank hot-deck, distance hot-deck, conditional mean, mixed
            approach), as described by D’Orazio et al. (2006), were compared. The random
            hot-deck method turned out to be well suited to matching EU-SILC and HBS
            data. To join HFCS to the matched EU-SILC-HBS data set, we apply the rank
            hot-deck method, using the gross income variable available in both EU-SILC
            and HFCS. It should be kept in mind, though, that both methods rely on the
            Conditional Independence Assumption (CIA): the variables of interest (total
            disposable income, total consumption expenditure and total assets) are
            assumed to be fully explained by the matching variables and, conditional on
            them, independent of each other. Since the CIA might be challenged,
            indicators based on the joint ICW micro data set are purely experimental at
            this stage.
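
                The two matching steps named above can be illustrated with a minimal
            sketch. It is not the authors' implementation: the function names, the
            variable names (tenure, gross_income, consumption, assets) and the toy
            records are hypothetical, survey weights are ignored, and the rank hot-deck
            is simplified to a nearest match on the empirical distribution function of
            the common variable.

```python
import bisect
import random
from collections import defaultdict


def random_hot_deck(recipients, donors, match_vars, donate_var, seed=0):
    """Copy donate_var from a donor drawn at random within the same stratum."""
    rng = random.Random(seed)
    # Group donors into strata defined by the common categorical variables.
    pools = defaultdict(list)
    for d in donors:
        pools[tuple(d[v] for v in match_vars)].append(d)
    matched = []
    for r in recipients:
        pool = pools.get(tuple(r[v] for v in match_vars))
        out = dict(r)
        # A recipient whose stratum has no donor is left without a value.
        out[donate_var] = rng.choice(pool)[donate_var] if pool else None
        matched.append(out)
    return matched


def ecdf_positions(values):
    """Empirical CDF value of each observation (share of values <= it)."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect.bisect_right(ordered, v) / n for v in values]


def rank_hot_deck(recipients, donors, rank_var, donate_var):
    """Copy donate_var from the donor whose relative position in the
    distribution of rank_var is closest to the recipient's position."""
    rec_pos = ecdf_positions([r[rank_var] for r in recipients])
    don_pos = ecdf_positions([d[rank_var] for d in donors])
    matched = []
    for r, p in zip(recipients, rec_pos):
        j = min(range(len(donors)), key=lambda k: abs(don_pos[k] - p))
        out = dict(r)
        out[donate_var] = donors[j][donate_var]
        matched.append(out)
    return matched


# Toy records, invented for illustration only.
silc = [
    {"id": 1, "tenure": "owner", "gross_income": 20000},
    {"id": 2, "tenure": "tenant", "gross_income": 35000},
    {"id": 3, "tenure": "owner", "gross_income": 60000},
]
hbs = [
    {"tenure": "owner", "consumption": 18000},
    {"tenure": "owner", "consumption": 30000},
    {"tenure": "tenant", "consumption": 22000},
]
hfcs = [
    {"gross_income": 25000, "assets": 40000},
    {"gross_income": 50000, "assets": 200000},
    {"gross_income": 90000, "assets": 650000},
]

# Step 1: impute consumption from HBS into EU-SILC (random hot-deck).
silc_hbs = random_hot_deck(silc, hbs, ["tenure"], "consumption", seed=1)
# Step 2: impute assets from HFCS using gross income ranks (rank hot-deck).
joint = rank_hot_deck(silc_hbs, hfcs, "gross_income", "assets")
```

                Matching on relative positions rather than on raw income values keeps
            the second step usable even when gross income is measured on somewhat
            different levels in the two surveys. In both steps the imputed variable is
            linked to the other variables of interest only through the matching
            variables, which is where the CIA enters.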
                A prerequisite of statistical matching methods is the comparability of
            potential matching variables. Therefore, we first define the reference person
            of each household (following the definition adopted by the Canberra Group on
            Household Income Statistics, UNECE 2011) and harmonise the common
            categorical variables. The distributions of these potential matching variables
            in the different data sets are then compared using the Hellinger Distance. We
            consider a variable “equally distributed” across data sets if the Hellinger
            Distance is below 0.05. Subsequently, we run a backward regression to select
            the matching variables with the highest explanatory power for variations in
            consumption. Both EU-SILC and HBS data are stratified according to these
            matching variables. Within each stratum, HBS donor observations are randomly
            selected to match EU-SILC recipient

                                                               96 | ISI WSC 2019