Page 88 - Invited Paper Session (IPS) - Volume 1

IPS98 Luciana D. V. et al.
                  schema mapping, record linkage and data fusion, and identify a range of open
                  problems in this research area. Chakraborty et al. (2015) propose a novel
                  approach for integrating diverse data types, such as historical data, survey
                  data, management planning data, expert knowledge and incomplete data, by
                  converting the data into Bayesian probability forms. Dalla Valle (2014, 2017a)
                  and Dalla Valle and Kenett (2015) introduced an innovative approach for
                  integrating survey data with official statistics, based on calibration using
                  copulas and nonparametric Bayesian Networks (BNs). For an overview of
                  copulas and their applications to finance, see Dalla Valle (2017b, 2017c)
                  and references therein. For an introduction to BNs, see, for example, Pearl
                  (2009), Jensen (2001), Ben-Gal (2007), Koski and Noble (2009) and Pourret et
                  al. (2008). In this paper, we propose a novel methodology that calibrates social
                  media information with online review data via resampling and performs the
                  integration using BNs. This approach allows businesses and organizations to
                  correctly analyze the sentiment of online users on social media, facilitating
                  an accurate evaluation of their customers' satisfaction. Such an integration,
                  combining different, overlapping data sources, enhances the information
                  quality of the data analysis in four dimensions: Data Structure, Data
                  Integration, Temporal Relevance, and Chronology of Data and Goal (Kenett
                  and Shmueli, 2016).

                  2.  Methodology
                      The methodology proposed in this paper aims to integrate traditional
                  customer satisfaction survey data with social media data via resampling and
                  BNs, extending the approach presented in Dalla Valle and Kenett (2015). We
                  perform the data integration with an emphasis on blog-type data, which
                  originate in a big data environment; however, the approach can be scaled to
                  other social media and big data sources. As mentioned above, properly
                  handling data integration is a key dimension in achieving high information
                  quality (Kenett and Shmueli, 2016). The proposed methodology combines
                  customer survey data with information extracted from social media by
                  calibrating one dataset against the other. The idea is in the same spirit as
                  the external benchmarking used in small area estimation (Pfeffermann, 2013).
                  In small area estimation, benchmarking robustifies the inference by forcing
                  the model-based predictors to agree with a design-based estimator. Similarly,
                  our methodology is based on the calibration of qualitative data via
                  resampling: the levels of the variables are re-balanced, so that the customer
                  survey estimates are updated to agree with the more timely social media
                  estimates. Calibration is implemented by altering the class distribution of
                  customers' reviews in one of the datasets to obtain a re-balanced sample
                  that reflects the class distribution of the second dataset.
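                  The calibration step can be illustrated with a minimal Python sketch. It
                  assumes that each survey record carries a categorical satisfaction level and
                  that the social media data have already been reduced to labels on the same
                  scale (for example, by a sentiment classifier); the class distribution of
                  those labels is taken as the calibration target. The function names
                  (target_class_counts, calibrate_by_resampling), the sampling with
                  replacement within each class and the largest-remainder rounding of the
                  target counts are illustrative assumptions, not necessarily the exact
                  resampling scheme used in this paper.

```python
import random
from collections import Counter


def target_class_counts(reference_labels, n):
    """Integer class counts that sum to n and reproduce the class
    distribution of `reference_labels` (largest-remainder rounding,
    computed with exact integer arithmetic)."""
    ref = Counter(reference_labels)
    total = sum(ref.values())
    if total == 0:
        raise ValueError("reference dataset has no labels")
    base = {c: k * n // total for c, k in ref.items()}
    rem = {c: k * n % total for c, k in ref.items()}
    shortfall = n - sum(base.values())
    # Give the leftover units to the classes with the largest remainders
    # (ties broken by label so the result is deterministic).
    for c in sorted(ref, key=lambda c: (-rem[c], str(c)))[:shortfall]:
        base[c] += 1
    return base


def calibrate_by_resampling(records, label_of, reference_labels, n=None, seed=0):
    """Re-balance `records` so that their class distribution matches that of
    `reference_labels`, drawing with replacement within each class.

    Classes that occur in `records` but not in the reference get no records.
    """
    rng = random.Random(seed)
    n = len(records) if n is None else n
    pools = {}
    for r in records:
        pools.setdefault(label_of(r), []).append(r)
    counts = target_class_counts(reference_labels, n)
    missing = sorted(str(c) for c, k in counts.items() if k > 0 and c not in pools)
    if missing:
        raise ValueError(f"no source records in classes: {missing}")
    sample = []
    for c, k in counts.items():
        if k > 0:
            # bootstrap draw of k records from class c
            sample.extend(rng.choices(pools[c], k=k))
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    # Hypothetical toy data: the survey over-represents satisfied customers
    # relative to the sentiment labels assumed to be extracted from blog posts.
    survey = ([{"id": i, "sat": "high"} for i in range(70)]
              + [{"id": i, "sat": "medium"} for i in range(70, 90)]
              + [{"id": i, "sat": "low"} for i in range(90, 100)])
    social = ["high"] * 40 + ["medium"] * 30 + ["low"] * 30
    calibrated = calibrate_by_resampling(survey, lambda r: r["sat"], social)
    print(dict(sorted(Counter(r["sat"] for r in calibrated).items())))
    # prints {'high': 40, 'low': 30, 'medium': 30}
```

                  Sampling with replacement lets a class that is rare in the survey be
                  brought up to its share in the social media data, while exact integer
                  rounding guarantees that the re-balanced sample has the requested size. In
                  the methodology described here, such a re-balanced sample would then be the
                  input to the BN-based integration.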



                                                                     77 | ISI WSC 2019