Page 270 - Special Topic Session (STS) - Volume 2

STS493 Irene S.
                  missing data needs to be imputed, then preferably in such a way that the
                  outcome is consistent with previously published outcomes, and when similar
                  aggregates are published in various tables one strives for micro-meso-macro
                  consistency. The real challenge and paradigm shift will be the approach where,
                  in advance of making the observation and survey design, the information
                  already available (registers, registrations, Big Data, etc.) is taken into account:
                  integration by design. Instead of enriching surveys with administrative data, the
                  various (combined) administrative and Big Data sets are completed with survey
                  data (only when necessary). In addition, the challenge can even be taken one
                  step further. What about the knowledge we have of events that have taken
                  place and are known to us before we have collected the data? Event-driven
                  processing opens up a completely new area of possibilities and challenges.

                  2.6 Technology
                     Researchers need easier and faster access to more data with better
                  tools. In reality, datasets become too big to copy, are
                  not allowed to “leave the building”, need matching between multiple sources,
                  require knowledge of the source and its metadata, etc. How to deal with these
                  different challenges may depend on the type of “data sharing”. In its data
                  architecture, CBS defined four patterns: 1) external data comes to CBS to be
                  matched with CBS data, 2) CBS data is brought to an allocated environment to
                  be matched with other data, 3) both external and CBS data are kept on premises
                  and matched through a virtual connection, and 4) the algorithm (and data) is sent
                  to the other data source to execute the matching. These patterns come with
                  certain (distinct) capabilities that need further investigation. For example,
                  Privacy Preserving Analytics enables analysis of privacy-sensitive data from
                  various sources without the parties being able to look into each other’s
                  microdata, while still producing new statistics and insights. It is also possible
                  to apply special encryption methods that enable various parties to execute
                  computations on each other’s privacy-sensitive data (secure multi-party
                  computation). The content of the records stays hidden and no third party is
                  needed to keep the encryption keys. Datasets often differ in format and size,
                  which makes it impractical or undesirable to physically copy the data to one
                  location. The technique of data virtualisation then offers the possibility to
                  simplify access to data regardless of where and how the data is stored. An
                  important prerequisite
                  for almost all techniques is the presence of metadata (conceptual, technical,
                  process, origin). In fact, metadata is the hub of the action and forms the basis
                  for seeking and finding data. To help users find the right data,
                  understand its semantic characteristics and use this information in data
                  integration, data analysis and other statistical activities, CBS developed a
                  metadata model. The model is based on a graph representation of characteristics
                  that describe statistical datasets as well as relationships between datasets.
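
                  To make the idea of a graph-based metadata model more concrete, the
                  following is a minimal sketch in Python. It is not the CBS model itself:
                  the class MetadataGraph, the relation name "has_variable" and all dataset
                  and variable names are hypothetical and chosen only for illustration. The
                  sketch assumes that datasets and their characteristics are nodes, and that
                  two datasets are candidates for integration when they share a variable.

```python
from collections import defaultdict


class MetadataGraph:
    """Tiny labelled graph; node and relation names are illustrative assumptions."""

    def __init__(self):
        self.node_kind = {}            # node name -> "dataset" or "variable"
        self.edges = defaultdict(set)  # node name -> {(relation, other node)}

    def add_node(self, name, kind):
        self.node_kind[name] = kind

    def add_edge(self, source, relation, target):
        # Store the edge in both directions so it can be followed from either end.
        self.edges[source].add((relation, target))
        self.edges[target].add((relation, source))

    def neighbours(self, name, relation):
        # .get avoids creating empty entries for unknown nodes.
        return {other for rel, other in self.edges.get(name, set()) if rel == relation}

    def linkable_datasets(self, dataset):
        """Return the datasets that share at least one variable with `dataset`."""
        result = set()
        for variable in self.neighbours(dataset, "has_variable"):
            result |= self.neighbours(variable, "has_variable")
        result.discard(dataset)
        return result


# Hypothetical metadata: three datasets described by two identifying variables.
graph = MetadataGraph()
for dataset in ("population_register", "labour_survey", "energy_use"):
    graph.add_node(dataset, "dataset")
for variable in ("person_id", "address_id"):
    graph.add_node(variable, "variable")

graph.add_edge("population_register", "has_variable", "person_id")
graph.add_edge("population_register", "has_variable", "address_id")
graph.add_edge("labour_survey", "has_variable", "person_id")
graph.add_edge("energy_use", "has_variable", "address_id")

print(sorted(graph.linkable_datasets("population_register")))
```

                  In this sketch, finding data that can be integrated becomes a short graph
                  traversal: from a dataset to its variables and on to the other datasets that
                  carry the same variables. A fuller model would presumably also store the
                  conceptual, technical, process and origin metadata as further node and
                  edge types.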

                                                                     259 | ISI WSC 2019