Page 90 - Invited Paper Session (IPS) - Volume 1

IPS98 Luciana D. V. et al.
                     customer satisfaction. However, the presence of unbalanced samples can
                     affect the correct assessment and evaluation of customer satisfaction and
                     may lead to misleading conclusions. Data integration, implemented by
                     rebalancing the unbalanced levels of $D_{SU}$ with the levels of $D_{SM}$ (or
                     vice versa), allows us to accurately analyze customer satisfaction.
                  2)  Identification of the calibration link. In the second phase a calibration link,
                     in the form of one or more unbalanced key variables, is identified between
                     customer survey and social media data. Denoting with $(x_{SU}, y_{SU})$ the
                     variables of $D_{SU}$ and $(x_{SM}, y_{SM})$ the variables of $D_{SM}$, then let
                     $y_{SU}$ be the calibration link of $D_{SU}$ and $y_{SM}$ be the calibration link
                     of $D_{SM}$. We suppose that calibration links are unbalanced variables, with
                     $y_{SU}$ taking values in the categorical domain
                     $Y_{SU} = \{Y^{min}_{SU}, Y^{maj}_{SU}\}$, with proportions
                     $p_{SU} = \{p^{min}_{SU}, p^{maj}_{SU}\}$, and $y_{SM}$ in
                     $Y_{SM} = \{Y^{min}_{SM}, Y^{maj}_{SM}\}$, with proportions
                     $p_{SM} = \{p^{min}_{SM}, p^{maj}_{SM}\}$, where $Y^{min}_{SU}$ and
                     $Y^{min}_{SM}$ are the minority classes and $Y^{maj}_{SU}$ and $Y^{maj}_{SM}$ the
                     majority classes of the interview- and blog-based surveys. Calibration links
                     can be target variables expressing overall satisfaction or can be other
                     variables influencing the overall satisfaction.
                  3)  Performing calibration. In the last phase calibration is performed by suitably
                     resampling the datasets, based on the distribution of the calibration link
                     variables. In this phase, one of the datasets, for example $D_{SU}$, is rebalanced
                     following the resampling approach described above, until
                     $p_{SU} \approx p_{SM}$. Therefore, a new rebalanced dataset $D_{SU}^{*}$ with
                     the desired proportions of the calibration link variable will be generated.
                     Similarly, calibration can be performed on $D_{SM}$, obtaining the new
                     rebalanced dataset $D_{SM}^{*}$. BNs are then updated for the rebalanced
                     datasets $D_{SU}^{*}$ or $D_{SM}^{*}$, allowing the
                     calibrated information to be propagated to achieve data integration. This
                     approach will allow us to properly analyze customer satisfaction surveys
                     and to achieve the goal of accurately identifying pockets of dissatisfaction
                     and areas of excellence within an organization.
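
The calibration phase can be illustrated with a short sketch. It is only an illustration under stated assumptions: plain random over- or undersampling of the minority class stands in for the resampling approach referred to in the text, which is not reproduced here. The function names (class_proportions, calibrate), the column name "overall", and the toy class counts are all hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def class_proportions(rows, link):
    """Share of each class of the calibration link `link` in `rows`."""
    counts = Counter(row[link] for row in rows)
    return {cls: n / len(rows) for cls, n in counts.items()}


def calibrate(rows, link, minority, target_p_min, seed=0):
    """Resample `rows` so the minority share of `link` is close to target_p_min.

    Assumption: simple random resampling of the minority class, with all
    majority rows kept unchanged.
    """
    if not 0 < target_p_min < 1:
        raise ValueError("target_p_min must lie strictly between 0 and 1")
    rng = random.Random(seed)
    min_rows = [r for r in rows if r[link] == minority]
    maj_rows = [r for r in rows if r[link] != minority]
    if not min_rows or not maj_rows:
        raise ValueError("both classes must be present")
    # Minority count that gives the target share with the majority rows fixed.
    n_min = max(1, round(target_p_min * len(maj_rows) / (1 - target_p_min)))
    if n_min >= len(min_rows):
        # Oversample: keep every minority row, draw the extras with replacement.
        new_min = min_rows + rng.choices(min_rows, k=n_min - len(min_rows))
    else:
        # Undersample: draw a subset of the minority rows without replacement.
        new_min = rng.sample(min_rows, n_min)
    out = maj_rows + new_min
    rng.shuffle(out)
    return out


# Toy data (made up): survey D_SU and social media D_SM.
d_su = [{"overall": "dissatisfied"}] * 10 + [{"overall": "satisfied"}] * 90
d_sm = [{"overall": "dissatisfied"}] * 40 + [{"overall": "satisfied"}] * 60

# Use the minority share of D_SM as the target for D_SU.
target = class_proportions(d_sm, "overall")["dissatisfied"]
d_su_star = calibrate(d_su, "overall", "dissatisfied", target)
```

Here d_su_star plays the role of the rebalanced dataset $D_{SU}^{*}$: its minority share of the calibration link matches that of the toy $D_{SM}$ up to rounding. Swapping the two datasets in the last two lines gives the analogous $D_{SM}^{*}$.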

                  3.  Result
                      We illustrate the application of the methodology by integrating airport
                  passengers’ data collected via an interview-based survey with data extracted from
                  an online review website. The context of this example is an analysis focused
                  on improving the Temporal Relevance of a customer satisfaction survey by
                  linking its results to online reviews that are continuously updated. The data
                  integration  methodology  described  here  provides  information  to  decision
                  makers that is both up to date and comprehensive. In this sense, the Data
                  Integration  supports  proper  Chronology  of  Data  and  Goal.  The  example
                  therefore enhances the information quality in four of the InfoQ dimensions:
                  Data Structure, Data Integration, Temporal Relevance and Chronology of Data
                  and Goal.
                                                                     79 | I S I   W S C   2 0 1 9