Page 354 - Special Topic Session (STS) - Volume 4

STS1080 Fionn M.

                      A repository entitled “Medical Information Mart for Intensive Care III”
                  (MIMIC-III), holding data on over 40,000 patients, is described, together
                  with how to access it. An interesting statement is that the demographics of
                  China will lead to very high quality big data sources there. Also described
                  is an important release, at the beginning of 2017, of the “National
                  scientific data sharing platform for population and health (NSDSPPH)” in
                  China, comprising observed and recorded data with 280 million observations
                  or records. This is noted as an “historic leap in clinical research”.

                  2.   Bias from Self-Selection of Behavioural Data
                      Keiding and Louis (2016), a most comprehensive survey (118 citations),
                  sets out new contemporary issues of sampling and population distribution
                  estimation. One interesting conclusion is the following: “There is the
                  potential for big data to evaluate or calibrate survey findings ... to help
                  to validate cohort studies”. Examples are discussed of “how data ... tracks
                  well with the official”, far larger, repository or holdings.
                      Associated with such data calibration, and following also from the need
                  to integrate data sources, is the importance of bridging, and of shared
                  patterns and associations in the data. Hence there is benefit to be drawn
                  from the methodology of the eminent social scientist Pierre Bourdieu.
                      Keiding and Louis (2016) point out well how one case study discussed
                  “shows the value of using big data to conduct research on surveys (as
                  distinct from survey research)”. The limitations, though, are clear:
                  “Although randomization in some form is very beneficial, it is by no means
                  a panacea. Trial participants are commonly very different from the external
                  ... pool, in part because of self-selection”. The authors note that “One
                  type of selection bias is self-selection (which is our focus)”. Important
                  points towards addressing these contemporary issues include the following:
                  “When informing policy, inference to identified reference populations is
                  key”. This is part of the bridge that is needed between data analytics
                  technology and deployment of outcomes.
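                      The effect of self-selection on inference to a reference population
                  can be illustrated with a minimal sketch. It uses post-stratification, a
                  standard survey adjustment chosen here for illustration and not a method
                  attributed to Keiding and Louis (2016). The strata names, the sample values
                  and the population shares are all hypothetical, and the adjustment only
                  helps under the assumption that, within each stratum, those who selected
                  themselves resemble those who did not.

```python
from collections import defaultdict


def naive_mean(sample):
    """Unweighted mean of the outcome, ignoring how respondents were selected."""
    return sum(value for _, value in sample) / len(sample)


def poststratified_mean(sample, population_shares):
    """Weight each stratum's sample mean by its share in the reference population.

    sample: list of (stratum, value) pairs from the self-selected data.
    population_shares: dict mapping stratum -> known share in the reference
        population (shares are assumed to sum to 1).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in sample:
        totals[stratum] += value
        counts[stratum] += 1

    # A stratum with no respondents cannot be corrected by reweighting.
    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no respondents in strata: {sorted(missing)}")

    return sum(
        share * totals[stratum] / counts[stratum]
        for stratum, share in population_shares.items()
    )


# Hypothetical self-selected sample: the "young" stratum is over-represented
# (8 of 10 respondents) relative to its assumed 50% population share.
sample = [("young", 1.0)] * 8 + [("old", 3.0)] * 2
population_shares = {"young": 0.5, "old": 0.5}

naive = naive_mean(sample)
adjusted = poststratified_mean(sample, population_shares)
print(f"naive mean: {naive:.2f}, post-stratified mean: {adjusted:.2f}")
```

                      In this toy case the unweighted mean is pulled towards the
                  over-represented stratum, while the reweighted estimate targets the
                  identified reference population. If selection also depends on the outcome
                  within a stratum, reweighting of this kind does not remove the bias.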
                      “In all situations, modelling is needed to accommodate non-response,
                  dropouts and other forms of missing data”. While “Representativity should
                  be avoided”, the following is an essential and fundamental way to address
                  what needs to be addressed: “Assessment of external validity, i.e.
                  generalization to the population from which the study subjects originated
                  or to other populations, will in principle proceed via formulation of
                  abstract laws of nature similar to physical laws”.
                      In Lebaron (2009), Bourdieu’s analytics “amounted to the global (hence Big
                  Data) effects of a complex structure of interrelationships, which is not
                  reducible to the combination of the multiple (... effects) of independent
                  variables”. The concept of field, here, uses Geometric Data Analysis that is core




                                                                     ISI WSC 2019