Page 92 - Invited Paper Session (IPS) - Volume 1

IPS98 Luciana D. V. et al.
                  3.1 Data Integration of BOT1+2 Datasets:
                     In the first phase of data integration, we analyzed the SFO customer
                  satisfaction survey data with BNs, implemented using the GeNIe software,
                  version 2.1 (University of Pittsburgh, Pittsburgh, USA).
                     The data modelling phase consisted of constructing BNs for both the
                  SFO and the Skytrax datasets.
                     Next, we selected the OVERALL variable as the calibration link for the
                  BOT1+2 dichotomized datasets. The percentage of dissatisfied passengers in the SFO
                  survey dataset is only 2%, while the same percentage in the Skytrax online
                  dataset is almost 50%. Therefore, the levels of OVERALL in the SFO survey
                  dataset need to be re-balanced by resampling, to make the distribution similar
                  to that of the Skytrax online dataset. The SFO customer survey dataset was
                  resampled, as explained in Section 2, using the R package ROSE (Lunardon et
                  al., 2014). The BN was updated via parameter learning and hence calibrated to
                  reflect the information contained in the online reviews. Figure 1 (left panel)
                  illustrates the BN of the BOT1+2 SFO customer satisfaction survey dataset,
                  after calibration of the OVERALL node via resampling. The distribution of
                  overall satisfaction is now balanced, with a higher proportion of dissatisfied
                  customers, as observed in the online reviews. This calibrated BN shows that the
                  percentages of passengers who are dissatisfied with cleanliness, walkways,
                  shopping areas and the free Wi-Fi are 19%, 23%, 33% and 14%, respectively.
                  These results highlight, much more clearly than those based on the original
                  unbalanced dataset, the weaknesses and corresponding areas of improvement
                  of the airport.
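
                  The effect of re-balancing the calibration variable can be illustrated with a
                  minimal sketch. It is not the authors' procedure: plain random resampling with
                  replacement stands in for the ROSE package, the helper names (rebalance, share)
                  are hypothetical, and the survey rows and the 50% target share are made up for
                  illustration only. The point is that, once OVERALL is re-balanced, the marginal
                  share of dissatisfied answers on a related item (here CLEANLINESS) rises as well,
                  because that item depends on OVERALL.

```python
import random


def rebalance(rows, key, level, target_share, seed=0):
    """Resample rows with replacement so that `level` makes up
    `target_share` of the values of `key` (same sample size as the input)."""
    rng = random.Random(seed)
    hit = [r for r in rows if r[key] == level]
    rest = [r for r in rows if r[key] != level]
    if not hit or not rest:
        raise ValueError("both groups must be non-empty")
    n_hit = round(len(rows) * target_share)
    sample = [rng.choice(hit) for _ in range(n_hit)]
    sample += [rng.choice(rest) for _ in range(len(rows) - n_hit)]
    rng.shuffle(sample)
    return sample


def share(rows, key, level):
    """Relative frequency of `level` among the values of `key`."""
    return sum(r[key] == level for r in rows) / len(rows)


# Hypothetical survey (assumption, not the SFO data): 1000 respondents,
# 2% dissatisfied overall. Cleanliness complaints are far more common
# among the dissatisfied (12 of 20) than among the satisfied (49 of 980).
rows = []
for i in range(1000):
    overall = "dissatisfied" if i < 20 else "satisfied"
    if overall == "dissatisfied":
        clean = "dissatisfied" if i < 12 else "satisfied"
    else:
        clean = "dissatisfied" if i < 69 else "satisfied"
    rows.append({"OVERALL": overall, "CLEANLINESS": clean})

# Re-balance OVERALL to an assumed 50% target share of dissatisfied answers.
balanced = rebalance(rows, "OVERALL", "dissatisfied", 0.5)

before = share(rows, "CLEANLINESS", "dissatisfied")
after = share(balanced, "CLEANLINESS", "dissatisfied")
print(f"CLEANLINESS dissatisfied: before={before:.3f}, after={after:.3f}")
```

                  In a BN, the same shift is obtained by re-estimating the conditional probability
                  tables from the resampled data (parameter learning) and reading off the updated
                  marginals of the child nodes.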
                     Skytrax dataset. However, there is an imbalance in its classes, since the
                  percentage of ‘excellent’ answers is only 24%. The corresponding variable
                  (PASSTHRU) appears to be well-balanced in the SFO survey dataset, where the
                  percentage of ‘excellent’ is close to 50%. Therefore, the Skytrax online
                  reviews dataset was resampled to re-balance the distribution of QUEUING,
                  so that it reflects the distribution of PASSTHRU in the SFO customer survey
                  dataset. Calibration between the two datasets was performed and the BN of
                  the TOP5 Skytrax dataset was updated via parameter learning. Figure 1 (right
                  panel) illustrates the BN of the TOP5 Skytrax online reviews dataset, after
                  calibration of the QUEUING node via resampling. The distribution of
                  passengers’ satisfaction with queuing is now balanced, with a higher
                  proportion of extremely satisfied passengers, as observed in the SFO customer
                  survey dataset. This calibrated BN shows that the percentages of passengers
                  who are extremely satisfied with cleanliness, restaurants, shopping and seating
                  areas have increased and are equal to 44%, 37%, 34% and 14%, respectively.
                  In addition, the percentage of very satisfied passengers overall is 34%. These

                                                                     81 | ISI WSC 2019