Page 213 - Special Topic Session (STS) - Volume 4
STS580 Vassilis P. P.
for more accurate, fast and intelligent methodological frameworks from the
ML perspective. Furthermore, clinical data belong to the Big Data era and are
highly diverse, since they come from different subareas [4].
Moreover, most clinical datasets are high-dimensional relative to the limited
number of patients (small n, large p). However, most computational tools are
designed for data with large n and small p, since high-dimensional data suffer
from the “curse of dimensionality” phenomenon [5].
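To make the “curse of dimensionality” concrete, the following toy Python sketch (an illustration added here, not part of the study's analysis) shows one of its best-known symptoms, distance concentration: with the number of points n held fixed, the spread between the nearest and farthest neighbour of a query point shrinks relative to the nearest distance as the dimension p grows. The uniform random data and the function name `relative_contrast` are assumptions chosen for illustration.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for the Euclidean distances between a
    random query point and n_points uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so the toy result is reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Keep the sample size fixed (n = 50) and let the dimensionality p grow.
    for p in (2, 10, 100, 1000):
        print(f"p = {p:4d}: relative contrast = {relative_contrast(50, p):.3f}")
```

As p increases the printed contrast falls towards zero, so "nearest" and "farthest" neighbours become hard to tell apart, which is one reason distance-based tools degrade on small-n, large-p data.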
In addition, the integration of big biomedical data and advanced computational
tools can contribute to healthcare fraud detection. Healthcare fraud is
classified into three main pillars: health insurance fraud, drug fraud and medical fraud.
Healthcare fraud has a high impact because many parties suffer financially.
Indicative examples are insurance holders who pay higher expenses while
receiving reduced coverage, businesses that pay increasing amounts for
employer-sponsored healthcare and thus face a higher cost of doing business,
and patients whom clinics charge for their services, or for services that
should be covered by the state. For instance, the World Health Organization
(WHO) recently estimated that 7.3% of annual healthcare expenditure (around
$470 billion) is lost to healthcare fraud every year [6]. In this study, we used
clinical data from the National Organization for the Provision of Health
Services of Greece, focusing on the behavior of clinics with respect to their
hospital expenditure. Our analysis is based on t-SNE and Density Peak, two
well-established ML tools for data visualization and clustering, respectively.
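The clustering idea can be sketched as follows. This is a minimal, unoptimized Python illustration of the general density-peak approach under stated assumptions, not the implementation used in this study: the hard-cutoff kernel for the local density ρ, the cutoff `d_c`, the rule of taking the `n_clusters` points with the largest γ = ρ·δ as centres, and the function name `density_peaks` are all illustrative choices.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def density_peaks(points: List[Point], d_c: float, n_clusters: int) -> List[int]:
    """Toy density-peak clustering; returns one integer label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # 1. Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # 2. delta_i: distance to the nearest point of higher density;
    #    nearest_higher[i] stores that point for the assignment step.
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the density maximum
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    # 3. Centres: the density maximum has no higher-density neighbour, so this
    #    sketch always makes it a centre; the rest are the largest gamma = rho * delta.
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    centres = [order[0]] + [i for i in ranked if i != order[0]][: n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c

    # 4. In order of decreasing density, each remaining point inherits the
    #    label of its nearest higher-density neighbour (already labelled).
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


if __name__ == "__main__":
    # Two well-separated toy blobs (hypothetical data).
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
           (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
    print(density_peaks(pts, d_c=0.5, n_clusters=2))  # [1, 1, 1, 1, 0, 0, 0, 0, 0]
```

With the cutoff well below the gap between the blobs, each blob receives its own label. In practice the cutoff and the number of centres are data-dependent, and centres are often picked by inspecting a decision graph of ρ against δ rather than fixed in advance; the full distance matrix also makes this sketch O(n²), so it suits small datasets only.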
2. Machine Learning approaches in Healthcare Fraud detection
Machine learning (ML) approaches can tackle part of the complexity of
fraud detection, since the digitalization of healthcare information provides more
data, enabling robust data mining and ML frameworks [7]. These methods are
classified into three categories: supervised learning, unsupervised learning and
reinforcement learning. Briefly, in supervised learning the algorithm
constructs a function that maps given inputs (the training set) to known
desired outputs, with the ultimate goal of generalizing this function to inputs
whose output is unknown. It is used in real-world problems related to
classification, prediction and data interpretation. In unsupervised learning,
the algorithm constructs a model for a set of inputs in the form of
observations without knowing the desired outputs. Since the true labels of the
data are unknown, we cannot evaluate the model against them, as we can in
supervised learning. It is used in real-world problems related to data
clustering and association analysis. Finally, reinforcement learning concerns
methods that learn a strategy of actions through direct interaction with the
environment. It is used in planning problems such as robot motion control or
function optimization in industry.
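The difference between the first two categories can be illustrated with a toy Python sketch on hypothetical claim amounts (invented numbers, not data from this study). The supervised part learns from (input, output) pairs and then labels a new input; the unsupervised part receives inputs without labels and can only group them. The nearest-centroid classifier and one-dimensional 2-means used here are simple stand-ins chosen for brevity, and all names are hypothetical; reinforcement learning is omitted because it requires an interactive environment.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical toy data: claim amounts in thousands of euros (NOT real data).
labelled: List[Tuple[float, str]] = [
    (1.0, "normal"), (1.2, "normal"), (0.9, "normal"),
    (8.0, "suspicious"), (9.5, "suspicious"),
]
unlabelled: List[float] = [1.1, 0.8, 7.9, 9.0, 1.3, 8.7]


def fit_nearest_centroid(data: List[Tuple[float, str]]) -> Dict[str, float]:
    """Supervised: learn one centroid per known label from (input, output) pairs."""
    labels = {y for _, y in data}
    return {y: mean(x for x, lab in data if lab == y) for y in labels}


def predict(centroids: Dict[str, float], x: float) -> str:
    """Generalize to an input with unknown output: return the closest centroid's label."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def two_means(xs: List[float], iterations: int = 10) -> List[int]:
    """Unsupervised: split unlabelled inputs into 2 groups (1-D k-means, k = 2)."""
    centres = [min(xs), max(xs)]  # initial centres at the two extremes

    def assign() -> List[int]:
        # Each value goes to the group of its nearest centre.
        return [0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1 for x in xs]

    for _ in range(iterations):
        groups = assign()
        for k in (0, 1):
            members = [x for x, g in zip(xs, groups) if g == k]
            if members:
                centres[k] = mean(members)  # move the centre to its group mean
    return assign()


if __name__ == "__main__":
    model = fit_nearest_centroid(labelled)
    print(predict(model, 8.4))    # suspicious
    print(two_means(unlabelled))  # [0, 0, 1, 1, 0, 1]
```

The group indices returned by `two_means` are arbitrary: without true labels, nothing indicates which group (if either) corresponds to fraudulent claims, which is the evaluation difficulty described above.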
202 | ISI WSC 2019