Page 212 - Special Topic Session (STS) - Volume 4
STS580 Vassilis P. P.
Healthcare fraud detection using machine learning approaches
Vassilis P. Plagianakos
Department of Computer Science and Biomedical Informatics University of Thessaly, Greece
Hellenic National Organization for the Provision of Health Services (E.O.P.Y.Y.)
Abstract
Biomedicine is undergoing a revolution driven by the explosion of biomedical
data. As a result, Big Data has shifted biomedical informatics research from
case-based to data-driven studies. Because hospitals and clinics submit data
every month, their records form a very large data set (Big Data), which poses
several challenges for Big Data mining and analysis. In parallel, fraud
detection in the healthcare domain is an important issue, since fraud has
considerably inflated losses for individuals, entities, and governments.
Hence, there is an imperative need for new computational tools able to
effectively detect fraud by exploiting the potential of Big Data. Machine
Learning (ML) approaches can shed more light on healthcare fraud since they
can cope with these challenges. In this study, we utilized clinical data from the
National Organization for the Provision of Health Services of Greece, focusing
on investigating the clinics' behavior with respect to their hospital expenditure.
The core of our analysis is based on t-SNE and Density Peak, two
well-established ML tools for data visualization and clustering, respectively.
Our results show that ML approaches can contribute to healthcare fraud
detection and interpretation.
Keywords
Fraud Detection; Machine Learning; Visualization; Big Data
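The Density Peak step named in the abstract can be illustrated with a minimal sketch. It assumes the records have already been embedded in two dimensions (for example by t-SNE, which is omitted here) and uses synthetic points in place of the E.O.P.Y.Y. data, which are not reproduced. The function name density_peaks, the Gaussian density kernel, the cutoff d_c, and the rule of taking the n_clusters points with the largest density-times-distance product as centers are illustrative assumptions, not the settings used in this study.

```python
import math
import random


def density_peaks(points, d_c, n_clusters):
    """Basic density-peak clustering of 2-D points (illustrative sketch).

    points: list of (x, y) tuples, assumed to be an existing 2-D embedding.
    d_c: cutoff distance of the density kernel (assumed value, must be tuned).
    n_clusters: number of cluster centers to select (assumed known here).
    Returns one integer cluster label per point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho: Gaussian-kernel count of neighbors within roughly d_c.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Point indices sorted from densest to sparsest.
    order = sorted(range(n), key=lambda i: -rho[i])

    # delta[i]: distance to the nearest point that precedes i in density order.
    # parent[i]: index of that point. The densest point gets its largest distance.
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Centers combine high density with a large distance to denser points.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Every other point inherits the label of its denser nearest neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Synthetic stand-in for a 2-D embedding: two well-separated groups.
rng = random.Random(0)
blob_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(40)]
blob_b = [(rng.gauss(10.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(40)]
points = blob_a + blob_b

labels = density_peaks(points, d_c=0.5, n_clusters=2)
print(sorted(labels.count(c) for c in set(labels)))  # cluster sizes
```

In this sketch the number of clusters is fixed in advance. In practice the centers are usually chosen by inspecting a plot of density against distance, and the cutoff d_c has to be tuned to the data.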
1. Introduction
We live in the “Big Data” era, where there is a great potential for
revolutionizing the entire healthcare domain [1]. Biomedical data generation
is constantly increasing through recent advancements in the field of
biomedicine, creating a large pool of heterogeneous data that grows at an
exponentially increasing rate. This data volume poses several challenges for
Big Data analysis and visualization. Given that these data have ultra-high
dimensionality and complexity, we need computational tools to cope with
them. Machine Learning (ML) techniques are among the best approaches to
tackle these challenges [2,3].
Nevertheless, data generated in the health domain are too big and too complex,
and are produced too fast, for healthcare providers to process and
interpret with existing tools. Hence, there is an imperative and urgent need
201 | ISI WSC 2019