Page 90 - Invited Paper Session (IPS) - Volume 1
IPS98 Luciana D. V. et al.
customer satisfaction. However, the presence of unbalanced samples can
affect the correct assessment and evaluation of customer satisfaction and
may lead to misleading conclusions. Data integration, implemented by
rebalancing the unbalanced levels of $D^{SM}$ with the levels of $D^{SU}$ (or
vice versa), allows us to accurately analyze customer satisfaction.
2) Identification of the calibration link. In the second phase a calibration link,
in the form of one or more unbalanced key variables, is identified between
customer survey and social media data. Denoting with $(x^{SU}, y^{SU})$ the
variables of $D^{SU}$ and $(x^{SM}, y^{SM})$ the variables of $D^{SM}$, then let
$y^{SU}$ be the calibration link of $D^{SU}$ and $y^{SM}$ be the calibration
link of $D^{SM}$. We suppose that calibration links are unbalanced variables,
with $y^{SU}$ taking values in the categorical domain
$Y^{SU} = \{Y^{SU}_{min}, Y^{SU}_{maj}\}$, with proportions
$p^{SU} = \{p^{SU}_{min}, p^{SU}_{maj}\}$, and $y^{SM}$ in
$Y^{SM} = \{Y^{SM}_{min}, Y^{SM}_{maj}\}$, with proportions
$p^{SM} = \{p^{SM}_{min}, p^{SM}_{maj}\}$, where $Y^{SU}_{min}$ and
$Y^{SM}_{min}$ are the minority classes and $Y^{SU}_{maj}$ and $Y^{SM}_{maj}$ the
majority classes of the interview- and blog-based surveys. Calibration links
can be target variables expressing overall satisfaction or can be other
variables influencing the overall satisfaction.
3) Performing calibration. In the last phase calibration is performed by suitably
resampling the datasets, based on the distribution of the calibration link
variables. In this phase, one of the datasets, for example $D^{SU}$, is rebalanced
following the resampling approach described above, until $p^{SU}$ matches $p^{SM}$.
Therefore, a new rebalanced dataset $D^{SU*}$ with the desired proportions of
the calibration link variable will be generated. Similarly, calibration can be
performed on $D^{SM}$, obtaining the new rebalanced dataset $D^{SM*}$. BNs are
then updated for the rebalanced datasets $D^{SM*}$ or $D^{SU*}$, allowing the
calibrated information to be propagated to achieve data integration. This
approach will allow us to properly analyze customer satisfaction surveys
and to achieve the goal of accurately identifying pockets of dissatisfaction
and areas of excellence within an organization.
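
The calibration step can be illustrated with a minimal sketch. It assumes plain random resampling with replacement within each class of the calibration link, which is a stand-in and not necessarily the resampling approach used by the authors. The function names, the column name "y", the class labels and the toy data are all hypothetical. The Bayesian network update that follows calibration is not covered.

```python
import random
from collections import Counter


def class_proportions(labels):
    """Return {class: relative frequency} for a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def calibrate(rows, link, target, seed=0):
    """Resample `rows` so that the calibration link `link` follows `target`.

    Assumption: simple random resampling with replacement inside each
    class (over-sampling some classes, under-sampling others), keeping
    the dataset size approximately unchanged.
    """
    rng = random.Random(seed)

    # Group the records by the value of the calibration link.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[link], []).append(row)

    n = len(rows)
    resampled = []
    for cls, share in target.items():
        if cls not in by_class:
            raise ValueError(f"class {cls!r} is absent from the dataset")
        k = round(share * n)  # desired number of records in this class
        resampled.extend(rng.choices(by_class[cls], k=k))

    rng.shuffle(resampled)
    return resampled


# Hypothetical toy data: the survey is more unbalanced than the social media data.
d_su = [{"y": "dissatisfied"}] * 10 + [{"y": "satisfied"}] * 90
d_sm = [{"y": "dissatisfied"}] * 40 + [{"y": "satisfied"}] * 60

p_sm = class_proportions([r["y"] for r in d_sm])
d_su_star = calibrate(d_su, "y", p_sm)
p_su_star = class_proportions([r["y"] for r in d_su_star])
```

Here `d_su_star` plays the role of the rebalanced survey dataset, whose calibration link proportions `p_su_star` agree with `p_sm` up to rounding. Calibrating the social media dataset instead would amount to swapping the roles of the two datasets in the call.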
3. Result
We illustrate the application of the methodology by integrating airport
passengers’ data collected via interview-based survey with data extracted from
an online review website. The context of this example is an analysis focused
on improving the Temporal Relevance of a customer satisfaction survey by
linking its results to online reviews that are continuously updated. The data
integration methodology described here provides information to decision
makers that is both up to date and comprehensive. In this sense, the Data
Integration supports proper Chronology of Data and Goal. The example
therefore enhances the information quality in four of the InfoQ dimensions:
Data Structure, Data Integration, Temporal Relevance and Chronology of Data
and Goal.
79 | ISI WSC 2019