Invited Paper Session (IPS) - Volume 1

IPS98 Luciana D. V. et al.
    The first dataset we analyze is a subset of the 2016 customer survey
administered to passengers of San Francisco International Airport (SFO).¹
The passenger dataset contains information on customer demographics and
on satisfaction with airport facilities, services, and initiatives. The data were
collected in May 2016 through interviews with 3,087 customers, conducted in
each of SFO's terminals and boarding areas. Customers were asked to rate
the airport in several categories, including cleanliness. Additional data
include customers’ income, mode of arrival at the airport, travel style, and
other characteristics. The SFO dataset thus comprises demographic and
satisfaction variables, including one expressing customers’ overall
satisfaction; all satisfaction variables record passengers’ judgements on a
five-point scale. For comparison purposes, we transformed the original
ratings into dichotomous variables following two schemes. The first, called
BOT1+2, aggregates customers who responded ‘1’ or ‘2’ (extreme
dissatisfaction and dissatisfaction, respectively). The second, called TOP5,
identifies customers who responded ‘5’ (extreme satisfaction) on the
five-point scale. BOT1+2 is effective at identifying pockets of dissatisfaction
and areas for improvement, while TOP5 highlights areas of excellence. For
more on statistical analyses using these two dichotomization schemes, see
Kenett and Salini (2011).
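A minimal sketch of the two dichotomization schemes, assuming each rating is stored as an integer from 1 to 5 and a non-response as None. The paper does not say how non-responses are handled, so keeping them missing and excluding them from the proportion is our assumption; the function names `bot12`, `top5` and `share`, and the ratings below, are hypothetical and for illustration only.

```python
from typing import List, Optional, Sequence

SCALE = (1, 2, 3, 4, 5)  # five-point satisfaction scale


def _check(rating: int) -> None:
    """Reject values that are not on the five-point scale."""
    if rating not in SCALE:
        raise ValueError(f"rating {rating!r} is not on the 1-to-5 scale")


def bot12(rating: Optional[int]) -> Optional[int]:
    """BOT1+2 indicator: 1 for a '1' or '2' answer, 0 otherwise."""
    if rating is None:
        return None  # assumption: non-responses stay missing
    _check(rating)
    return 1 if rating <= 2 else 0


def top5(rating: Optional[int]) -> Optional[int]:
    """TOP5 indicator: 1 for a '5' answer, 0 otherwise."""
    if rating is None:
        return None  # assumption: non-responses stay missing
    _check(rating)
    return 1 if rating == 5 else 0


def share(indicators: Sequence[Optional[int]]) -> float:
    """Proportion of 1s among the non-missing indicators."""
    observed = [x for x in indicators if x is not None]
    return sum(observed) / len(observed) if observed else float("nan")


# Invented cleanliness ratings, for illustration only (not SFO data)
ratings: List[Optional[int]] = [5, 4, 2, 5, 3, 1, None, 4]
bot = [bot12(r) for r in ratings]
top = [top5(r) for r in ratings]
print(bot)                   # [0, 0, 1, 0, 0, 1, None, 0]
print(top)                   # [1, 0, 0, 1, 0, 0, None, 0]
print(round(share(bot), 3))  # 0.286
```

The two indicators are not complements: a rating of 3 or 4 scores 0 under both schemes, which is why BOT1+2 and TOP5 capture opposite ends of the scale rather than the same information twice.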
    The second dataset, which we named the Skytrax dataset, contains
information extracted from reviews published online by SFO passengers.²
For comparability, only recent reviews by SFO passengers were analyzed.
The dataset includes demographic and satisfaction variables, with
judgements on individual aspects of the airport and on the airport as a
whole. As with the SFO data, we applied the BOT1+2 and TOP5
dichotomization schemes to the Skytrax satisfaction variables.
    After transforming the original data, we applied the three phases of the
data integration methodology described in Section 2 to the SFO customer
survey and the Skytrax social media datasets, under both the BOT1+2 and
the TOP5 dichotomization. First, two new datasets were generated from
each of the SFO and Skytrax datasets, one per dichotomization scheme.
The data integration methodology was then applied twice, once to the two
BOT1+2 datasets and once to the two TOP5 datasets, to illustrate the use of
different calibration functions.
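The bookkeeping of this workflow (two sources, two schemes, one integration run per scheme) can be sketched as follows, assuming each source is a list of records mapping variable names to ratings from 1 to 5, with None for missing answers. The three phases of the Section 2 methodology are not reproduced here: `integrate` is a hypothetical placeholder parameter standing in for them, and the names `SCHEMES`, `dichotomize`, `run_by_scheme` and the toy records are illustrative only.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[int]]

# The two dichotomization rules; ratings are assumed to be already on 1..5
SCHEMES: Dict[str, Callable[[int], int]] = {
    "BOT1+2": lambda r: int(r <= 2),  # '1' or '2' -> 1
    "TOP5": lambda r: int(r == 5),    # '5' -> 1
}


def dichotomize(records: List[Record], rule: Callable[[int], int]) -> List[Record]:
    """Apply a rule to every rating, keeping missing values as None."""
    return [
        {k: (None if v is None else rule(v)) for k, v in rec.items()}
        for rec in records
    ]


def run_by_scheme(
    sources: Dict[str, List[Record]],
    integrate: Callable[[Dict[str, List[Record]]], object],
) -> Dict[str, object]:
    """Build one dichotomized dataset per (source, scheme) pair, then run
    the integration step once per scheme on the matching datasets."""
    results: Dict[str, object] = {}
    for scheme, rule in SCHEMES.items():
        paired = {name: dichotomize(recs, rule) for name, recs in sources.items()}
        results[scheme] = integrate(paired)  # hypothetical Section 2 step
    return results


# Toy records, invented for illustration only
sources = {
    "SFO": [{"overall": 5, "cleanliness": 2}, {"overall": 3, "cleanliness": None}],
    "Skytrax": [{"overall": 1, "cleanliness": 5}],
}
# Identity stand-in for the methodology: returns the paired datasets unchanged
out = run_by_scheme(sources, integrate=lambda paired: paired)
print(out["BOT1+2"]["SFO"])   # [{'overall': 0, 'cleanliness': 1}, {'overall': 0, 'cleanliness': None}]
print(out["TOP5"]["Skytrax"])  # [{'overall': 0, 'cleanliness': 1}]
```

Passing the integration step in as a function keeps the scheme loop separate from the methodology itself, so a different calibration function per scheme could be supplied without changing how the four dichotomized datasets are built.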



1  The data are publicly available on the website http://www.flysfo.com/media/customer-survey-data
2  The data are publicly available on the website http://www.airlinequality.com
80 | ISI WSC 2019