Page 354 - Special Topic Session (STS) - Volume 4
STS1080 Fionn M.
A repository entitled “Medical Information Mart for Intensive Care III”
(MIMIC-III), with data on over 40,000 patients, is described, together with
how to access it. An interesting statement is that the demographics of China
will lead to very high quality big data sources there. A description is provided
of an important release, at the beginning of 2017, of the “National scientific data
sharing platform for population and health (NSDSPPH)” in China, comprising
observed and recorded data with 280 million observations or records. This is
noted as a “historic leap in clinical research”.
2. Bias from Self-Selection of Behavioural Data
Keiding and Louis (2016), a most comprehensive survey (118 citations),
sets out contemporary issues of sampling and population distribution
estimation. An interesting conclusion is the following: “There is the potential
for big data to evaluate or calibrate survey findings ... to help to validate cohort
studies”. Examples are discussed of “how data ... tracks well with the official”,
far larger, repository or holdings.
Associated with such data calibration, and following also from the need to
integrate data sources, is the importance of bridging, and of shared patterns
and associations, in the data. Hence there is benefit to be drawn from the
methodology of the eminent social scientist Pierre Bourdieu.
Keiding and Louis (2016) point out clearly how one case study
discussed “shows the value of using big data to conduct research on surveys
(as distinct from survey research)”. The limitations, though, are clear: “Although
randomization in some form is very beneficial, it is by no means a panacea.
Trial participants are commonly very different from the external ... pool, in part
because of self-selection”. As the authors state, “One type of selection bias is
self-selection (which is our focus)”. Important points towards addressing these
contemporary issues include the following. “When informing policy, inference
to identified reference populations is key”. This is part of the bridge that is
needed between data analytics technology and deployment of outcomes.
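To make concrete what inference to an identified reference population can involve, the following is a minimal sketch of post-stratification, a standard survey adjustment. It is offered only as an illustration and is not a method taken from Keiding and Louis (2016). The function name, the strata and all numbers are hypothetical, and the population shares are assumed to be known from an external source such as a census.

```python
from collections import defaultdict


def poststratified_mean(sample, population_shares):
    """Reweight stratum means by known reference-population shares.

    sample: iterable of (stratum, value) pairs from a self-selected sample.
    population_shares: dict mapping stratum -> share in the reference
        population (assumed known and summing to 1).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in sample:
        totals[stratum] += value
        counts[stratum] += 1

    # A stratum with no sampled units cannot be adjusted by reweighting alone.
    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no sample units in strata: {sorted(missing)}")

    return sum(
        share * totals[stratum] / counts[stratum]
        for stratum, share in population_shares.items()
    )


# Hypothetical data: the "young" stratum self-selects into the sample
# four times as often as the "old" stratum.
sample = [("young", 1.0)] * 80 + [("old", 3.0)] * 20
population_shares = {"young": 0.5, "old": 0.5}  # assumed reference shares

naive_mean = sum(value for _, value in sample) / len(sample)
adjusted_mean = poststratified_mean(sample, population_shares)
```

In this toy case the naive sample mean is pulled towards the over-represented stratum, while the adjusted mean weights each stratum by its assumed share in the reference population. The adjustment only corrects for self-selection that is explained by the chosen strata; selection within a stratum remains unaddressed.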
“In all situations, modelling is needed to accommodate non-response,
dropouts and other forms of missing data”. While “Representativity should be
avoided”, the following is an essential, and fundamental, way of addressing what
needs to be addressed: “Assessment of external validity, i.e. generalization to the
population from which the study subjects originated or to other populations,
will in principle proceed via formulation of abstract laws of nature similar to
physical laws”.
In Lebaron (2009), Bourdieu’s analytics “amounted to the global (hence Big
Data) effects of a complex structure of interrelationships, which is not
reducible to the combination of the multiple (... effects) of independent
variables”. The concept of field, here, uses Geometric Data Analysis that is core
343 | ISI WSC 2019