Page 107 - Invited Paper Session (IPS) - Volume 1
IPS102 Sigita G. et al.
(national accounts and household surveys). The method for allocating the
gap depends on the previously mentioned factors. If the initial micro or macro
data quality is low, the quality of the distributional result will be low. Next, the
consistency of concepts and actual micro and macro data needs to be
compared. If the consistency of concepts and data is high, the quality of the
distributional results can be expected to be good; if the consistency of either
concepts or data is unsatisfactory, the quality of the distributional results
will be medium or low.
2.2. Joint distribution of ICW
Few countries run integrated surveys for collecting data on income,
consumption and/or wealth at once. This is because such a survey would be
excessively long and households would be reluctant to respond. Thus, in most
countries separate income, consumption and wealth surveys collect data from different
households. As a consequence, there is no way to directly link the records of
these surveys and we need statistical matching methods to join the data from
the different sources together into a single data set using the categorical
variables they have in common.
In previous experiments, results produced by different matching methods
(random hot-deck, rank hot-deck, distance hot-deck, conditional mean, mixed
approach), as described by D’Orazio et al. (2006), were compared. The
random hot-deck method turned out to be well suited to
match EU-SILC and HBS data. For joining HFCS to the matched EU-SILC-HBS
data set, we make use of the gross income variable available in both EU-SILC
and HFCS to apply the rank hot-deck method. It should be kept in mind, though,
that both of these methods rely on the Conditional Independence Assumption
(CIA), i.e. that the variables of interest (total disposable income, total
consumption expenditure and total assets) are fully explained by the matching
variables and are independent of each other conditional on them. Since the CIA might be
challenged, indicators based on the joint ICW micro data set are purely
experimental at this stage.
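
To make the rank hot-deck idea more concrete, the following is a minimal sketch, not the procedure actually used for the results in this paper. It assumes that each recipient (e.g. an EU-SILC household) receives the values of the donor (e.g. an HFCS household) whose position in the donor file's empirical distribution of gross income is closest to the recipient's position in the recipient file's distribution. The function names, the unweighted empirical distribution function and the toy values are illustrative assumptions; survey weights and donation classes are ignored here.

```python
from bisect import bisect_left, bisect_right


def ecdf_values(values):
    """Return, for each x, the share of observations <= x (unweighted ECDF)."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def rank_hot_deck(recipient_income, donor_income):
    """For each recipient, return the index of the donor with the closest ECDF value.

    Assumes a non-empty donor list. Ties are resolved in favour of the
    donor with the lower rank (an arbitrary choice made for this sketch).
    """
    rec_ranks = ecdf_values(recipient_income)
    don_ranks = ecdf_values(donor_income)
    # Sort donors by rank once so that each lookup is a binary search.
    order = sorted(range(len(donor_income)), key=lambda j: don_ranks[j])
    sorted_ranks = [don_ranks[j] for j in order]
    matches = []
    for r in rec_ranks:
        pos = bisect_left(sorted_ranks, r)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = [p for p in (pos - 1, pos) if 0 <= p < len(order)]
        best = min(candidates, key=lambda p: abs(sorted_ranks[p] - r))
        matches.append(order[best])
    return matches


# Hypothetical toy data: gross income is measured in both files,
# total assets only in the donor file.
eu_silc_gross_income = [10, 20, 30, 40]
hfcs_gross_income = [100, 400, 200, 300]
hfcs_total_assets = [5, 80, 20, 40]

matches = rank_hot_deck(eu_silc_gross_income, hfcs_gross_income)
imputed_assets = [hfcs_total_assets[j] for j in matches]
print(matches, imputed_assets)
```

Because matching is done on ranks rather than on income levels, the two gross income variables do not need to be on exactly the same scale. The imputed assets are still linked to the other variables only through the matching variable, which is where the CIA enters.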
A prerequisite of statistical matching methods is the comparability of
potential matching variables. Therefore, we first define the reference person
of households (following the definition adopted by the Canberra Group on
household income statistics, UNECE 2011) and harmonise common
categorical variables. These potential matching variables are then compared
using the Hellinger Distance. We consider variables of the different data sets
“equally distributed” if the Hellinger Distance is below 0.05. Subsequently, we
run a backward regression to select those matching variables with the highest
explanatory power predicting consumption variations. Both EU-SILC and HBS
data are stratified according to these matching variables. Within each stratum,
HBS donor observations are randomly selected to match EU-SILC recipient
96 | ISI WSC 2019