Page 92 - Invited Paper Session (IPS) - Volume 1
IPS98 Luciana D. V. et al.
3.1 Data Integration of BOT1+2 Datasets:
In the first phase of data integration, we analyzed the SFO customer
satisfaction survey data with BNs, implemented using the GeNIe software V
2.1 (University of Pittsburgh, Pittsburgh, USA).
The data modelling phase consists of the construction of BNs for both the
SFO and the Skytrax datasets.
Then, we selected the OVERALL variable as the calibration link for the BOT1+2
dichotomized datasets. The percentage of dissatisfied passengers in the SFO
survey dataset is only 2%, while the same percentage in the Skytrax online
dataset is almost 50%. Therefore, the levels of OVERALL in the SFO survey
dataset need to be re-balanced by resampling, to make the distribution similar
to that of the Skytrax online dataset. The SFO customer survey dataset was
resampled, as explained in Section 2, using the R package ROSE (Lunardon et
al., 2014). The BN was updated via parameter learning and hence calibrated to
reflect the information contained in the online reviews. Figure 1 (left panel)
illustrates the BN of the BOT1+2 SFO customer satisfaction survey dataset,
after calibration of the OVERALL node via resampling. The distribution of
overall satisfaction is now balanced, with a higher proportion of dissatisfied
customers, as observed in the online reviews. This calibrated BN shows that the
percentages of passengers who are dissatisfied with cleanliness, walkways,
shopping areas and the free Wi-Fi are 19%, 23%, 33% and 14%, respectively.
These results highlight, much more clearly than those based on the original
unbalanced dataset, the weaknesses and corresponding areas of improvement
of the airport.
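The re-balancing step described above can be illustrated with a minimal sketch. It uses plain random oversampling in Python, which is a simpler stand-in and not the ROSE procedure cited in the text. The function names, the toy records and the 0.5 target share are assumptions made for illustration only; only the 2% starting share comes from the text.

```python
import random

def oversample_to_target(records, key, minority, target_share, seed=0):
    """Randomly duplicate minority-class records until they make up
    roughly `target_share` of the returned dataset (hypothetical helper)."""
    if not 0 < target_share < 1:
        raise ValueError("target_share must be strictly between 0 and 1")
    rng = random.Random(seed)
    minority_rows = [r for r in records if r[key] == minority]
    majority_rows = [r for r in records if r[key] != minority]
    if not minority_rows:
        raise ValueError("no minority-class records to resample from")
    # Solve m / (m + M) = target_share for the required minority count m.
    needed = round(target_share * len(majority_rows) / (1 - target_share))
    extra = max(0, needed - len(minority_rows))
    # Draw the extra minority records with replacement and append them.
    resampled = records + rng.choices(minority_rows, k=extra)
    rng.shuffle(resampled)
    return resampled

def share(records, key, value):
    """Fraction of records whose `key` equals `value`."""
    return sum(r[key] == value for r in records) / len(records)

# Hypothetical toy data: 20 dissatisfied out of 1000 respondents (2%).
toy = [{"OVERALL": "dissatisfied"}] * 20 + [{"OVERALL": "satisfied"}] * 980
balanced = oversample_to_target(toy, "OVERALL", "dissatisfied", 0.5)
print(share(toy, "OVERALL", "dissatisfied"),
      share(balanced, "OVERALL", "dissatisfied"))
```

Because whole records are duplicated, the other survey variables travel with the resampled OVERALL value, which is why the remaining nodes of the BN change once its parameters are re-learned on the resampled data.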
Skytrax dataset. However, there is an imbalance in its classes, since the
percentage of ‘excellent’ answers is only 24%. The same variable appears to
be well-balanced in the SFO survey dataset, where the percentage of
‘excellent’ is close to 50%. Therefore, the Skytrax online reviews dataset
was resampled to re-balance the distribution of QUEUING, so that it reflects
the distribution of a similar variable (PASSTHRU) in the SFO customer survey
dataset. Calibration between the two datasets was performed and the BN of
the TOP5 Skytrax dataset was updated via parameter learning. Figure 1 (right
panel) illustrates the BN of the TOP5 Skytrax reviews social media dataset, after
calibration of the QUEUING node via resampling. The distribution of
passengers’ satisfaction with queuing is now balanced, with a higher
proportion of extremely satisfied passengers, as observed in the SFO customer
survey dataset. This calibrated BN shows that the percentages of passengers
who are extremely satisfied with cleanliness, restaurants, shopping and seating
areas have increased and are equal to 44%, 37%, 34% and 14%, respectively.
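The increase in the other nodes follows from the law of total probability: if resampling on QUEUING leaves the conditional distribution of a second variable given QUEUING roughly unchanged, then shifting the QUEUING distribution shifts that variable's marginal. The sketch below illustrates this in Python. The conditional probabilities are hypothetical and are not taken from the paper's BN; only the 24% and 50% shares of 'excellent' come from the text.

```python
def marginal(cpt, prior):
    """P(child = excellent) = sum over q of P(child = excellent | q) * P(q)."""
    if abs(sum(prior.values()) - 1.0) > 1e-9:
        raise ValueError("prior must sum to 1")
    return sum(cpt[q] * p for q, p in prior.items())

# Hypothetical values of P(CLEANLINESS = excellent | QUEUING = q).
cpt = {"excellent": 0.70, "other": 0.20}

# QUEUING distribution before and after resampling (shares from the text).
before = marginal(cpt, {"excellent": 0.24, "other": 0.76})
after = marginal(cpt, {"excellent": 0.50, "other": 0.50})
print(f"before: {before:.3f}, after: {after:.3f}")
```

Whenever the conditional probability of 'excellent' is higher for passengers who rate queuing as 'excellent', raising the share of such passengers raises the marginal as well, which is the direction of change reported above.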
In addition, the percentage of very satisfied passengers overall is 34%. These
81 | ISI WSC 2019