Page 91 - Invited Paper Session (IPS) - Volume 1
IPS98 Luciana D. V. et al.
The first dataset we analyze is a subset of the 2016 customer survey
administered to the passengers of San Francisco International Airport (SFO).¹
The passenger dataset contains information pertaining to customer
demographics and satisfaction with airport facilities, services, and initiatives.
The data were collected in May 2016 through interviews with 3,087 customers
in all of SFO’s terminals and boarding areas. Customers were asked to rate
the airport in several categories, including cleanliness. Additional data
collected include customers’ income, mode of arrival at the airport, travel
style, and various other characteristics. The SFO dataset comprises demographic and
satisfaction variables, including a variable expressing customers’ overall
satisfaction. The satisfaction variables included in the SFO dataset express the
passengers’ judgements on a five-point scale. For comparison purposes, we
transformed the customers’ original ratings into dichotomous variables,
following two different schemes. The first scheme, called BOT1+2, groups
customers who responded ‘1’ or ‘2’ (corresponding to extreme dissatisfaction
and dissatisfaction, respectively). The second scheme, called TOP5,
identifies customers who responded ‘5’ (corresponding to extreme satisfaction)
on the five-point scale. BOT1+2 is well suited to identifying pockets of
dissatisfaction and areas for improvement, while TOP5 highlights areas of
excellence. For more on statistical analyses using the two dichotomizing
schemes, see Kenett and Salini (2011).
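The two schemes can be read as indicator functions applied to each rating. The sketch below is only an illustration under stated assumptions: ratings are integers on the 1 to 5 scale, any other value (here the hypothetical code 0) is treated as a missing response, and the ratings themselves are made up rather than taken from the SFO data.

```python
from typing import Optional, Sequence

# Assumption: valid ratings are integers 1-5 (1 = extremely dissatisfied,
# 5 = extremely satisfied); any other value is treated as missing.
VALID_RATINGS = {1, 2, 3, 4, 5}


def bot12(rating: int) -> Optional[int]:
    """BOT1+2: 1 if the customer answered 1 or 2, 0 otherwise, None if missing."""
    if rating not in VALID_RATINGS:
        return None
    return 1 if rating <= 2 else 0


def top5(rating: int) -> Optional[int]:
    """TOP5: 1 if the customer answered 5, 0 otherwise, None if missing."""
    if rating not in VALID_RATINGS:
        return None
    return 1 if rating == 5 else 0


def share(indicators: Sequence[Optional[int]]) -> float:
    """Proportion of 1s among the non-missing indicators."""
    valid = [x for x in indicators if x is not None]
    return sum(valid) / len(valid) if valid else float("nan")


# Hypothetical ratings for one satisfaction item; 0 is a hypothetical missing code.
ratings = [5, 4, 2, 5, 3, 1, 5, 0]
bot = [bot12(r) for r in ratings]
top = [top5(r) for r in ratings]
print(bot)                   # [0, 0, 1, 0, 0, 1, 0, None]
print(top)                   # [1, 0, 0, 1, 0, 0, 1, None]
print(round(share(bot), 3))  # 0.286
print(round(share(top), 3))  # 0.429
```

Ratings of 3 and 4 are 0 under both schemes, which is why the BOT1+2 and TOP5 shares need not sum to one.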
The second dataset, which we call the Skytrax dataset, contains information
extracted from reviews of SFO published online by its passengers.²
For comparative purposes, only recent reviews by SFO passengers were
analyzed. The dataset includes demographic and satisfaction variables, with
judgements on individual airport features and on the airport as a whole. To
keep the two sources comparable, we applied the same BOT1+2 and TOP5
dichotomization schemes to the Skytrax satisfaction variables.
After transforming the original data, we applied the three phases of the
data integration methodology described in Section 2 to the SFO customer
survey and the Skytrax social media datasets under both the BOT1+2 and the
TOP5 dichotomizations. First, two new datasets were generated from each of
SFO and Skytrax, one per dichotomization scheme. The data integration
methodology was then applied twice, once to the pair of BOT1+2 datasets and
once to the pair of TOP5 datasets, to illustrate the use of different
calibration functions.
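The bookkeeping described above (two sources times two schemes, then one integration run per scheme) can be sketched as follows. This is a hypothetical illustration only: the ratings are invented, and `integrate_placeholder` is a made-up stand-in for the Section 2 methodology, which is not reproduced here; it merely reports the share of 1s in each source so that the loop produces something to inspect.

```python
from typing import Callable, Dict, List, Tuple

# The two dichotomization schemes described in the text.
SCHEMES: Dict[str, Callable[[int], int]] = {
    "BOT1+2": lambda r: int(r in (1, 2)),
    "TOP5": lambda r: int(r == 5),
}

# Hypothetical overall-satisfaction ratings (1-5) for each source.
SOURCES: Dict[str, List[int]] = {
    "SFO": [5, 4, 4, 3, 5, 2, 4],
    "Skytrax": [1, 5, 2, 4, 1, 3],
}


def build_datasets() -> Dict[Tuple[str, str], List[int]]:
    """Step 1: derive one dichotomized dataset per (source, scheme) pair."""
    return {
        (source, scheme): [f(r) for r in ratings]
        for source, ratings in SOURCES.items()
        for scheme, f in SCHEMES.items()
    }


def integrate_placeholder(survey: List[int], social: List[int]) -> Dict[str, float]:
    """Hypothetical stand-in for the Section 2 integration methodology
    (not implemented here): it only reports the share of 1s per source."""
    return {
        "survey_share": sum(survey) / len(survey),
        "social_share": sum(social) / len(social),
    }


datasets = build_datasets()  # four datasets: 2 sources x 2 schemes

# Step 2: run the integration twice, once per dichotomization scheme,
# pairing the SFO and Skytrax datasets built with the same scheme.
results = {
    scheme: integrate_placeholder(datasets[("SFO", scheme)], datasets[("Skytrax", scheme)])
    for scheme in SCHEMES
}
for scheme, summary in results.items():
    print(scheme, summary)
```

Keying the datasets by `(source, scheme)` makes it explicit that each integration run only ever pairs datasets built with the same scheme.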
1 The data are publicly available on the website http://www.flysfo.com/media/customer-survey-data
2 The data are publicly available on the website http://www.airlinequality.com
ISI WSC 2019