Page 269 - Special Topic Session (STS) - Volume 2

STS493 Irene S.
            methodologies are developed to use several (new) sources to distinguish, for
            example, enterprises involved in the internet economy (e.g. Oostrom et al., 2016).
            In order to be useful for statistical purposes, as in many other Big Data
            investigations, the data needs to be linked to statistical information. In this case,
            the characteristics of websites needed to be linked to the businesses behind
            the websites. Therefore, two key pieces of information were used: the websites
            as recorded in the SBR, and the businesses’ CoC registration number as
            published on the website. These identifiers provide the basis upon which
            websites can be linked to the respective businesses. Subsequently, when
            successfully linked, the SBR facilitates further links to a variety of data sources
            available at CBS (see 2.2). Clearly, the future lies in extensively linking a
            multitude of data sources, where the international dimension in characterizing
            enterprises (including small and medium-sized ones) remains as important as ever. As Timothy
            Sturgeon (2013) stated: “Clearly, the assumptions behind current data regimes
            have changed and statistical systems are struggling to catch up. While it will be
            exceedingly difficult to fill data gaps without new data, and progress that relies
            only on existing data resources will always be limited, the most efficient
            approach will be to develop systematic links between key existing data,
            supplemented with a few additional variables, with data on enterprise
            characteristics drawn from administrative sources, all tied together by
            enterprise identifiers that make ownership clear, even when it extends across
            borders.”
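
            The linking step described above, which uses the website recorded in the SBR
            and the CoC number published on the website, can be illustrated with a minimal
            sketch. It is not the procedure used at CBS. The record layout (unit_id, coc,
            website), the function names, and the assumption that a CoC number appears on
            a page as eight digits close to a "KvK" or "CoC" label are hypothetical and
            serve only to show deterministic linking on two identifiers.

```python
import re
from urllib.parse import urlparse

# Assumption (hypothetical): a CoC number is printed on the page as
# 8 digits shortly after a "KvK" or "CoC" label.
COC_PATTERN = re.compile(r"(?:kvk|coc)\D{0,20}(\d{8})\b", re.IGNORECASE)


def normalize_domain(url):
    """Reduce a URL to a lower-case host name without a leading 'www.'."""
    if "//" not in url:
        url = "//" + url  # lets urlparse read bare host names as a netloc
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def extract_coc(page_text):
    """Return the first CoC-like number found in the page text, or None."""
    match = COC_PATTERN.search(page_text)
    return match.group(1) if match else None


def link_websites(sbr_records, scraped_pages):
    """Link scraped pages to register units, first by CoC number, then by domain.

    sbr_records: dicts with hypothetical keys 'unit_id', 'coc', 'website'.
    scraped_pages: dicts with hypothetical keys 'url', 'text'.
    Returns {url: (unit_id or None, link_type)}.
    """
    by_coc = {}
    by_domain = {}
    for record in sbr_records:
        if record.get("coc"):
            by_coc[record["coc"]] = record["unit_id"]
        domain = normalize_domain(record.get("website") or "")
        if domain:
            by_domain[domain] = record["unit_id"]

    links = {}
    for page in scraped_pages:
        coc = extract_coc(page["text"])
        domain = normalize_domain(page["url"])
        if coc in by_coc:
            links[page["url"]] = (by_coc[coc], "coc")
        elif domain and domain in by_domain:
            links[page["url"]] = (by_domain[domain], "domain")
        else:
            links[page["url"]] = (None, "unlinked")
    return links


sbr = [
    {"unit_id": "U1", "coc": "12345678", "website": "https://www.bakery.example"},
    {"unit_id": "U2", "coc": "87654321", "website": "shop.example"},
]
pages = [
    {"url": "http://bakery.example/contact", "text": "KvK-nummer: 12345678"},
    {"url": "https://www.shop.example/", "text": "no number here"},
    {"url": "https://other.example", "text": "CoC 99999999"},
]
links = link_websites(sbr, pages)
```

            In this sketch the CoC number takes precedence over the domain name, on the
            assumption that a registration number is the less ambiguous identifier. Pages
            that match on neither identifier are kept and flagged as unlinked, so that the
            linkage rate can be monitored.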

            2.5 Methodology on combining data
                In our ambition to increase our statistical output, an important prerequisite
            for CBS is to make as much, and as diverse, data available as possible. CBS does
            this by combining administrative data from registers, registrations, Big Data (e.g.
            sensor data), private data and survey data. An adequate combination of
            sources can be decisive for the outcome, which implies that the approach and
            way of working need to be adjusted. In fact, when combining multiple data
            sources from multiple modes, the challenge is to develop methodology that
            helps to deal with issues concerning the matching of data sources. Specific issues
            that can occur when matching various sources are: the units to be matched do not
            equal the source units (persons, businesses); the sources do not contain overlapping
            units, yet one wants to estimate the correlation between variables
            occurring in both sources; matching errors result in bias of an estimator, which
            one wants to correct; and the coherence between statistics needs to be assured. General
            techniques like probabilistic matching, matching with supervised machine
            learning and synthetic matching are extended and/or combined to solve these
            issues. In addition, combining registers and survey data comes with its own specific
            challenges: variables can occur in multiple sources with different measurement
            errors, for which methods are developed to arrive at consistent estimates. When

                                                               258 | I S I   W S C   2 0 1 9