Page 86 - Invited Paper Session (IPS) - Volume 1

IPS98 Luciana D. V. et al.



                              Data integration of unbalanced social media and
                                      survey information via calibration
                                        Luciana Dalla Valle¹, Ron Kenett²
                                        ¹ University of Plymouth, Plymouth, UK
                       ² KPA Ltd, Neaman Institute, Technion and Hebrew University, Raanana, Israel

                  Abstract
                  We live in an age of unprecedented amounts of information, generated in
                  every sector, including business, government and health care, delivered at
                  high speed and available in a wide variety of forms and formats.  This data
                  may come from many different sources, such as social media posts, digital
                  pictures and videos, cell phone GPS, purchase transaction records and signals
                  from sensors used to gather climate information.  This information - high volume,
                  diverse and fast - is what is called Big Data.  Social media are amongst the
                  most prolific generators of big data and allow billions of people all around the
                  world to interact, post and share content daily, and give spontaneous
                  feedback on specific topics.  As opposed to traditional media such as
                  newspapers, books and television, social media are freely accessible, which
                  means everyone can publish content and control how the information is
                  generated and shared.  Through social media, people express their opinions
                  and sentiments towards specific topics, products and services.  The ability to
                  harness big data and social media data is an opportunity to obtain more
                  accurate analyses and to improve decision-making in industry, government
                  and many other organizations.  However, handling big data may be
                  challenging and proper data integration is a key dimension in achieving high
                  information quality.  We propose a novel approach to data integration that
                  calibrates online generated big data with interview-based customer survey
                  data.  A common issue with customer surveys is that responses are often overly
                  positive, making it difficult to identify areas of weakness in organizations.
                  On the other hand, online reviews are often overly negative, hampering an
                  accurate evaluation of areas of excellence.  The proposed methodology
                  calibrates the levels of unbalanced responses in different data sources via
                  resampling and performs data integration using Bayesian Networks to
                  propagate the new re-balanced information.  We show, with a case study
                  example, how the novel data integration approach allows businesses and
                  organizations to get a bias-corrected appraisal of the level of satisfaction of
                  their customers.  The application is based on the integration of online data from
                  review blogs and customer satisfaction surveys from the San Francisco airport.
                  We illustrate how this integration enhances the information quality of the data




                                                                     75 | ISI WSC 2019