Page 270 - Special Topic Session (STS) - Volume 2
STS493 Irene S.
missing data needs to be imputed, then preferably in such a way that the
outcome is consistent with previously published outcomes; and when similar
aggregates are published in various tables, one strives for micro-meso-macro
consistency. The real challenge, and the paradigm shift, will be an approach in
which the information already available (registers, registrations, Big Data,
etc.) is taken into account before the observation and survey design are drawn
up: integration by design. Instead of enriching surveys with administrative
data, the (combined) administrative and Big Data sets are completed with survey
data, and only where necessary. The challenge can even be taken one step
further. What about the knowledge we have of events that have taken place and
are known to us before we have collected the data? Event-driven processing
opens up a completely new area of possibilities and challenges.
2.6 Technology
For researchers, there is a need for easier and faster access to more data,
with better tools. The reality is that datasets are becoming too big to copy,
are not allowed to “leave the building”, need to be matched across multiple
sources, require knowledge of the source and its metadata, etc. How to deal
with these different challenges may depend on the type of “data sharing”. In
its data architecture, CBS defined four patterns: 1) external data comes to CBS
to be matched with CBS data; 2) CBS data is brought to an allocated environment
to be matched with other data; 3) both external and CBS data are kept on
premises, and the data is matched through a virtual connection; and 4) the
algorithm (and data) is sent to the other data source to execute the matching.
These patterns come with certain (distinct) capabilities that need further
investigation. For example,
Privacy Preserving Analytics enables the analysis of privacy-sensitive data
from various sources without the risk of parties looking into each other’s
microdata, while still producing new statistics and insights. It is also
possible to apply special encryption methods that enable various parties to
perform computations on each other’s privacy-sensitive data (secure multi-party
computation). The content of the records stays hidden, and no third party is
needed to keep the encryption keys. Datasets often differ in format and size,
which makes it impractical or undesirable to physically copy them to one
location. Data virtualisation then makes it possible to simplify access to data
regardless of where and how the data is stored. An important prerequisite
to almost all techniques is the presence of metadata (conceptual, technical,
process, origin). In fact, metadata is the central hub and forms the basis for
searching for and finding data. To help users find the right data, understand
its semantic characteristics and use this information in data integration,
data analysis and other statistical activities, CBS developed a metadata model.
The model is based on a graph representation of the characteristics that
describe statistical datasets, as well as of the relationships between datasets.
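The idea of metadata as a graph can be sketched in a few lines of Python. This is a toy model built on our own assumptions, not the CBS metadata model, whose structure the text does not spell out. Nodes are dataset and variable names, edges carry a relation label (`identified_by`, `contains` and `derived_from` are hypothetical labels), and the dataset names are invented. A simple traversal answers a typical "finding" question: which datasets share an identifying variable with a given dataset and could therefore be matched with it?

```python
from collections import defaultdict


class MetadataGraph:
    """Minimal labelled graph: nodes are datasets or variables, edges are relations."""

    def __init__(self) -> None:
        # Edges are indexed in both directions so either side can be followed cheaply.
        self.out_edges: dict[str, set[tuple[str, str]]] = defaultdict(set)
        self.in_edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def relate(self, source: str, relation: str, target: str) -> None:
        self.out_edges[source].add((relation, target))
        self.in_edges[target].add((relation, source))

    def targets(self, source: str, relation: str) -> set[str]:
        """Nodes reached from `source` via edges labelled `relation`."""
        return {t for r, t in self.out_edges[source] if r == relation}

    def sources(self, target: str, relation: str) -> set[str]:
        """Nodes pointing at `target` via edges labelled `relation`."""
        return {s for r, s in self.in_edges[target] if r == relation}

    def linkable_datasets(self, dataset: str) -> set[str]:
        """Other datasets sharing at least one identifying variable with `dataset`."""
        result: set[str] = set()
        for key in self.targets(dataset, "identified_by"):
            result |= self.sources(key, "identified_by")
        result.discard(dataset)
        return result


g = MetadataGraph()
# Hypothetical datasets and variables, for illustration only.
g.relate("PersonRegister", "identified_by", "person_id")
g.relate("IncomeSurvey2018", "identified_by", "person_id")
g.relate("BusinessRegister", "identified_by", "company_id")
g.relate("IncomeSurvey2018", "contains", "income")
g.relate("TaxData", "contains", "income")
g.relate("IncomeSurvey2018", "derived_from", "PersonRegister")

print(g.linkable_datasets("PersonRegister"))  # {'IncomeSurvey2018'}
print(sorted(g.sources("income", "contains")))  # ['IncomeSurvey2018', 'TaxData']
```

A production system would more likely sit on a graph database or an RDF store, but the traversal idea (follow typed relationships from a dataset to the variables it shares with others) would stay the same.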
259 | I S I W S C 2 0 1 9
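The secure multi-party computation mentioned in Section 2.6 can be illustrated with one of its simplest building blocks: a secure sum based on additive secret sharing. This is a standard textbook technique chosen for illustration, not necessarily the method CBS applies. The modulus, the function names `share` and `secure_sum` and the example counts are assumptions, and all parties are simulated in a single process. Each party splits its private value into random shares, so no party sees another's input, no trusted third party has to hold any keys, and only the total is revealed.

```python
import secrets

# Public modulus; all shares and partial sums are computed modulo this value.
# Assumption: inputs are non-negative and their total stays below MODULUS.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n random additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs: list[int]) -> int:
    """Simulate n parties computing the sum of their inputs without revealing them."""
    n = len(private_inputs)
    # Party i splits its input and (conceptually) sends share j to party j.
    all_shares = [share(x, n) for x in private_inputs]
    # Party j adds up the shares it received; any n-1 shares of a value look random.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the partial sums are published; combining them reveals just the total.
    return sum(partial_sums) % MODULUS


# Hypothetical counts held by three organisations.
print(secure_sum([1200, 845, 310]))  # 2355
```

Secure sums like this can be composed into counts, means and frequency tables; more complex statistics need heavier protocols, which is one reason the capabilities of each data-sharing pattern still need investigation.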