Page 332 - Contributed Paper Session (CPS) - Volume 4
CPS2249 Azrin A. et al.
2.3 Statistical Disclosure Limitation (SDL)
The emergence of new technologies for data dissemination and access
has put data producers and disseminators in a difficult position. They
are pressured by users who want everything about the data, and at the
same time constrained by limits and restrictions on what may be
released. The conflict between the accuracy of the disseminated data
and the risk of disclosing respondent information must also be taken
into consideration and resolved with the most appropriate disclosure
limitation procedure. The key issue is how to trade off these two
elements while ensuring that users' requirements are fulfilled.
Statistical disclosure limitation strategies divide into those based on
restricted data and those based on restricted access.
Restricted data SDL strategies refer to masking and modifying the
data in ways that limit the potential for disclosure. The modifications
include simple steps such as removing variables and records. In most
cases, however, this is not enough, and more complex alterations are
required, such as swapping (Dalenius and Reiss, 1982; Gomatam, Karr
and Sanil, 2003), adding random noise to units' values (Fuller, 1993),
microaggregation, and other forms of data perturbation (Gomatam et al.,
2016). For example, the first step in preventing identity disclosure is
to remove explicit identifiers such as HIV status, address, identification
card number, as well as implicit identifiers, such as "Occupation = Chief
Statistician of Malaysia." Another example is top coding: to protect
units with high incomes, income is frequently "top coded," so that one
category is "More than $X." Restricted data SDL strategies can be
applied with varying intensity. Generally, the higher the SDL intensity,
the greater the protection against disclosure risk, but the lower the
utility of the released data. At least implicitly, agencies choose SDL
strategies by balancing confidentiality protection against the utility of
the released information.
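
The restricted data steps described above (removing explicit
identifiers, adding random noise, and top coding) can be illustrated
with a minimal sketch. The records, field names, cap and noise level
are all hypothetical and chosen only for illustration; they are not
taken from any agency's actual procedure.

```python
import random

# Hypothetical microdata; field names and values are illustrative only.
records = [
    {"id_card": "A001", "address": "12 Jalan Satu", "occupation": "Clerk", "income": 3200.0},
    {"id_card": "A002", "address": "7 Jalan Dua", "occupation": "Teacher", "income": 5400.0},
    {"id_card": "A003", "address": "3 Jalan Tiga", "occupation": "Manager", "income": 25000.0},
]

# Assumed list of explicit identifiers to strip before release.
EXPLICIT_IDENTIFIERS = {"id_card", "address"}


def drop_identifiers(record, identifiers=EXPLICIT_IDENTIFIERS):
    """Return a copy of the record without the explicit identifier fields."""
    return {k: v for k, v in record.items() if k not in identifiers}


def add_noise(value, sd, rng):
    """Add zero-mean Gaussian noise with standard deviation sd."""
    return value + rng.gauss(0.0, sd)


def top_code(value, cap):
    """Replace any value above the cap with the cap (the "More than $X" category)."""
    return min(value, cap)


def mask(records, cap, noise_sd, seed=0):
    """Drop identifiers, perturb income, then top code it.

    A larger noise_sd or a lower cap corresponds to a higher SDL intensity.
    """
    rng = random.Random(seed)
    masked = []
    for rec in records:
        out = drop_identifiers(rec)
        out["income"] = top_code(add_noise(out["income"], noise_sd, rng), cap)
        masked.append(out)
    return masked


released = mask(records, cap=10000.0, noise_sd=100.0)
```

Raising `noise_sd` or lowering `cap` strengthens protection but moves
the released incomes further from the true values, which is the
risk-utility trade-off described above.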
Restricted access SDL strategies, by contrast, rely on mechanisms such
as data centers, licensing, and vetting of researchers and their research
plans. This approach allows users to perform analyses directly on the
underlying data. A specific analysis may be suppressed a priori, if the
analysis is known to threaten confidentiality, or a posteriori, if its
output reveals a threat. According to the level of confidentiality, there
are four modes of dissemination: public use files, licensed files, data
enclaves, and remote data access. Data centers rely on the honesty of
researchers to protect confidentiality, and can be expensive for the
agencies and inconvenient for researchers.
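
The a priori and a posteriori suppression of analyses can be sketched as
follows. This is only an illustration under stated assumptions: the
blocked analysis name, the minimum cell count of 3, and the function
names are hypothetical, and the source does not specify any particular
vetting rule.

```python
from collections import Counter

MIN_CELL_COUNT = 3                   # assumed threshold, not from the source
BLOCKED_ANALYSES = {"list_records"}  # hypothetical analyses refused a priori


def run_tabulation(rows, variable):
    """Count how many confidential rows fall in each category of `variable`."""
    return Counter(row[variable] for row in rows)


def vet_output(table, min_count=MIN_CELL_COUNT):
    """A posteriori check: suppress (set to None) any cell below min_count."""
    return {cell: (n if n >= min_count else None) for cell, n in table.items()}


def submit(analysis, rows, variable):
    """Refuse blocked analyses a priori; otherwise run and vet the output."""
    if analysis in BLOCKED_ANALYSES:
        return None
    return vet_output(run_tabulation(rows, variable))


# Hypothetical confidential data held inside the data center.
rows = [{"state": "Selangor"}] * 5 + [{"state": "Perlis"}] * 2
result = submit("tabulate", rows, "state")
```

Here the researcher never sees `rows`; only the vetted `result` leaves
the restricted environment, with the small "Perlis" cell suppressed.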
321 | ISI WSC 2019