Page 332 - Contributed Paper Session (CPS) - Volume 4

CPS2249 Azrin A. et al.
2.3 Statistical Disclosure Limitation (SDL)
Nowadays, the emergence of new technologies for data dissemination and access has put data producers and disseminators in a difficult position. They are pressured by users who want access to everything in the data, while at the same time being constrained by limits and restrictions on what may be released. The conflict between the accuracy of what is disseminated and the risk of disclosing respondent information must also be considered and resolved through the most appropriate disclosure limitation procedure. The key issue is how to trade off these two elements while still ensuring that users' requirements are met. Statistical disclosure limitation strategies fall into two groups: those based on restricted data and those based on restricted access.
Restricted data SDL strategies mask or modify the data in ways that limit the potential for disclosure. The modifications include simple steps such as removing variables and records. In most cases, however, this is not enough, and more complex alterations are required, such as swapping (Dalenius and Reiss, 1982; Gomatam, Karr and Sanil, 2003), adding random noise to units' values (Fuller, 1993), microaggregation, and other forms of data perturbation (Gomatam et al., 2016). For example, the first step in preventing identity disclosure is to remove explicit identifiers such as HIV status, address and identification card number, as well as implicit identifiers such as "Occupation = Chief Statistician of Malaysia." Another example of this strategy is the protection of units with high incomes: income is frequently "top coded," so that the highest category is "More than $X." Restricted data SDL strategies can be applied with varying intensity. Generally, the higher the SDL intensity, the greater the protection against disclosure risk, but the lower the utility of the released data. At least implicitly, agencies choose SDL strategies by balancing confidentiality protection against the utility of the released information.
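The restricted data operations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any cited author or agency: records are assumed to be plain dictionaries, all function names are hypothetical, `swap_attribute` is a simplified random pairwise swap (not the Dalenius and Reiss formulation), and `microaggregate` handles a single variable using fixed-size groups of sorted values.

```python
import random
import statistics
from typing import Dict, List, Sequence

# Hypothetical record layout: each unit is a dict of variable name -> value.
Record = Dict[str, object]


def drop_identifiers(records: List[Record], identifiers: Sequence[str]) -> List[Record]:
    """Remove the listed identifier fields from every record (returns copies)."""
    return [{k: v for k, v in r.items() if k not in identifiers} for r in records]


def top_code(values: Sequence[float], threshold: float) -> List[object]:
    """Replace values above `threshold` with the open category 'More than X'."""
    return [v if v <= threshold else f"More than {threshold}" for v in values]


def add_noise(values: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Additive noise: add an independent N(0, sigma^2) draw to each value."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def swap_attribute(records: List[Record], attr: str, rate: float,
                   rng: random.Random) -> List[Record]:
    """Simplified random swapping: exchange `attr` between random pairs of units.

    Roughly `rate` * n records (rate in [0, 1]) take part. The marginal
    distribution of `attr` is unchanged, but for swapped units its link to
    the other variables is broken.
    """
    out = [dict(r) for r in records]  # copy so the input is not modified
    n_pairs = int(len(out) * rate) // 2
    chosen = rng.sample(range(len(out)), 2 * n_pairs)
    for a, b in zip(chosen[::2], chosen[1::2]):
        out[a][attr], out[b][attr] = out[b][attr], out[a][attr]
    return out


def microaggregate(values: Sequence[float], k: int) -> List[float]:
    """Univariate microaggregation: replace each value by its group mean.

    Values are sorted and split into groups of k consecutive units; a short
    final group is merged into the previous one so every group has >= k units.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[s:s + k] for s in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    out = [0.0] * len(values)
    for group in groups:
        mean = statistics.fmean(values[i] for i in group)
        for i in group:
            out[i] = mean
    return out


if __name__ == "__main__":
    # Intensity trade-off: a larger noise scale distorts the released data more.
    rng = random.Random(2019)
    incomes = [rng.lognormvariate(8, 1) for _ in range(1000)]
    for sigma in (100, 1000, 10000):
        noisy = add_noise(incomes, sigma, rng)
        err = statistics.fmean(abs(a - b) for a, b in zip(incomes, noisy))
        print(f"sigma={sigma}: mean absolute distortion {err:.0f}")
```

Running the demo with increasing `sigma` shows the average distortion of the simulated incomes growing, which is one concrete way to see the intensity trade-off: stronger perturbation makes individual values harder to match to respondents but makes the released data less faithful to the original.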
Restricted access SDL strategies, by contrast, rely on mechanisms such as data centers, licensing, and vetting of researchers and their research plans. These strategies allow users to perform analyses directly on the underlying data. Specific analyses may be suppressed, either a priori, if the analysis is known to threaten confidentiality, or a posteriori, if its output reveals a threat. Depending on the level of confidentiality, there are four modes of dissemination: public use files, licensed files, data enclaves and remote data access. Data centers rely on the honesty of researchers to protect confidentiality, and they can be expensive for agencies and inconvenient for researchers.
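The a priori and a posteriori suppression described above can be illustrated with a toy remote-access query function. Everything here is hypothetical: the blocked analysis types, the single supported analysis (a frequency count) and the minimum cell count of 5 are assumptions chosen for illustration, since the text does not specify which rules an agency applies.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical policy values; real agencies define their own rules.
BLOCKED_ANALYSES = {"list_records", "max", "min"}  # assumed to expose single units
MIN_CELL_COUNT = 5  # assumed minimum frequency for any released cell


def remote_query(
    data: List[Dict[str, object]], analysis: str, by: str
) -> Optional[Dict[object, int]]:
    """Run an analysis on confidential data held by the agency.

    Returns the result table, or None when the request is suppressed.
    """
    # A priori: refuse analysis types known to threaten confidentiality.
    if analysis in BLOCKED_ANALYSES:
        return None
    if analysis != "count":
        raise ValueError(f"unsupported analysis: {analysis}")
    table = dict(Counter(row[by] for row in data))
    # A posteriori: inspect the output and withhold it if any cell is too small.
    if any(n < MIN_CELL_COUNT for n in table.values()):
        return None
    return table
```

The researcher never sees the microdata, only results that pass both checks. Suppressing the whole table when one cell is small is the simplest choice; suppressing only the offending cells is a common alternative, but it requires further checks so that the hidden cells cannot be recovered from the table margins.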



321 | ISI WSC 2019