Page 86 - Invited Paper Session (IPS) - Volume 1
IPS98 Luciana D. V. et al.
Data integration of unbalanced social media and
survey information via calibration
Luciana Dalla Valle¹, Ron Kenett²
¹ University of Plymouth, Plymouth, UK
² KPA Ltd, Neaman Institute, Technion and Hebrew University, Raanana, Israel
Abstract
We live in an age of unprecedented amounts of information, generated in
every sector, including business, government and health care, delivered at
high speed and available in a wide variety of forms and formats. This data
may come from many different sources, such as social media posts, digital
pictures and videos, cell phone GPS, purchase transaction records and signals
from sensors used to gather climate information. This information - high volume,
diverse and fast - is what is called Big Data. Social media are amongst the
most prolific generators of big data and allow billions of people all around the
world to interact daily, post and share content and give spontaneous
feedback on specific topics. As opposed to traditional media such as
newspapers, books and television, social media are freely accessible, which
means everyone can publish content and control how the information is
generated and shared. Through social media, people express their opinions
and sentiments towards specific topics, products and services. The ability to
harness big data and social media data is an opportunity to obtain more
accurate analyses and to improve decision-making in industry, government
and many other organizations. However, handling big data may be
challenging and proper data integration is a key dimension in achieving high
information quality. We propose a novel approach to data integration that
calibrates online-generated big data with interview-based customer survey
data. A common issue with customer surveys is that responses are often overly
positive, making it difficult to identify areas of weakness in organizations.
On the other hand, online reviews are often overly negative, hampering an
accurate evaluation of areas of excellence. The proposed methodology
calibrates the levels of unbalanced responses in different data sources via
resampling and performs data integration using Bayesian Networks to
propagate the new re-balanced information. We show, with a case study
example, how the novel data integration approach allows businesses and
organizations to get a bias-corrected appraisal of the level of satisfaction of
their customers. The application is based on the integration of online data from
review blogs and customer satisfaction surveys for San Francisco airport.
We illustrate how this integration enhances the information quality of the data
75 | ISI WSC 2019