Page 217 - Special Topic Session (STS) - Volume 4

STS580 Vassilis P. P.

Figure 4: Two-dimensional PCA visualization of the dataset constructed from the
summary statistics (left) and of the complete dataset (right).

deviation of the expenditure, the count of incidents and the total expenditure
(as a sum for this period). To visually investigate the structure of the resulting
dataset, we employ a two-dimensional visualization using Principal Component
Analysis (PCA).
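The projection of per-Clinic statistics onto two principal components can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the records, Clinic codes, and function names are hypothetical, the mean is included as an assumed fourth statistic, features are standardized before projection, and PCA is computed through a singular value decomposition with NumPy.

```python
import numpy as np

# Hypothetical records: (clinic_code, expenditure). All values are invented.
records = [
    ("C01", 1200.0), ("C01", 950.0), ("C01", 1100.0),
    ("C02", 4300.0), ("C02", 3900.0),
    ("C03", 700.0), ("C03", 820.0), ("C03", 760.0), ("C03", 690.0),
    ("C04", 2500.0), ("C04", 2650.0),
]


def clinic_statistics(records):
    """Return sorted clinic codes and a matrix of [mean, std, count, total]."""
    by_clinic = {}
    for code, amount in records:
        by_clinic.setdefault(code, []).append(amount)
    codes = sorted(by_clinic)
    rows = []
    for code in codes:
        x = np.asarray(by_clinic[code])
        # Sample standard deviation; needs at least two records per clinic.
        rows.append([x.mean(), x.std(ddof=1), x.size, x.sum()])
    return codes, np.asarray(rows)


def pca_2d(features):
    """Project standardized features onto the first two principal components."""
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return z @ vt[:2].T, explained[:2]


codes, stats = clinic_statistics(records)
coords, explained = pca_2d(stats)
```

Each row of `coords` is one Clinic in the plane; plotting the two columns against each other gives a scatter of the kind shown in Figure 4 (left).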
As shown in Figure 4 (left), although we can identify some outliers, no clear
pattern emerges. Critical information is still missing, such as how Clinics
are associated with different types of DRGs. We already know from
Figure 1 that the average expenditure varies between Clinics, an effect that
is probably caused by variation in the types of DRGs that each Clinic deals
with. To investigate this further, we need to take into account whether a
particular Clinic focuses on specific DRG types; for example, a Clinic with a
high expenditure rate may deal with DRGs that are significantly more costly
than others. We therefore reconstruct the data to incorporate the types of DRG
handled by each Clinic. The newly generated variables are dummy variables, one
per DRG, containing the count of that DRG for the corresponding Clinic Code;
they are combined with the aforementioned statistics and normalized
accordingly. Figure 4 (right) shows the two-dimensional PCA visualization of
this complete dataset, where we observe increased variability.
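The construction of the DRG count variables can be sketched with the Python standard library. This is an illustration only: the incident list, the Clinic and DRG codes, the `summary_stats` values, and the function names are all hypothetical, and column-wise z-scoring is an assumed choice, since the text does not say which normalization was applied.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical incidents: (clinic_code, drg_code). Codes are invented.
incidents = [
    ("C01", "D10"), ("C01", "D10"), ("C01", "D20"),
    ("C02", "D30"), ("C02", "D30"),
    ("C03", "D10"), ("C03", "D20"), ("C03", "D20"), ("C03", "D20"),
]

# Hypothetical per-clinic statistics (here: mean and standard deviation).
summary_stats = {
    "C01": [1000.0, 120.0],
    "C02": [4100.0, 280.0],
    "C03": [740.0, 60.0],
}


def drg_count_features(incidents):
    """Return clinics, DRG codes, and a clinic-by-DRG matrix of counts."""
    clinics = sorted({clinic for clinic, _ in incidents})
    drgs = sorted({drg for _, drg in incidents})
    counts = Counter(incidents)  # missing (clinic, drg) pairs count as 0
    matrix = [[counts[(c, d)] for d in drgs] for c in clinics]
    return clinics, drgs, matrix


def standardize_columns(matrix):
    """Z-score each column; constant columns are mapped to 0."""
    scaled_columns = []
    for column in zip(*matrix):
        mu, sigma = mean(column), pstdev(column)
        scaled_columns.append(
            [(v - mu) / sigma if sigma > 0 else 0.0 for v in column]
        )
    return [list(row) for row in zip(*scaled_columns)]


clinics, drgs, count_matrix = drg_count_features(incidents)
# One row per clinic: summary statistics followed by the DRG counts.
combined = [summary_stats[c] + row for c, row in zip(clinics, count_matrix)]
normalized = standardize_columns(combined)
```

Normalizing after concatenation puts the expenditure statistics and the DRG counts on a comparable scale, so that neither group dominates a subsequent PCA purely because of its units.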
Subsequently, we may employ more advanced visualization tools for
further investigation. In Figure 5, we employ the popular t-SNE methodology
for visualization. The two-dimensional embedding is presented along with
cluster labels, each denoted by a different color. The labels were obtained by
applying the k-means algorithm to the original input (before the
dimensionality reduction). The clustering result agrees very well with the
resulting visualization, revealing clear patterns in the dataset for different
values of k.
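The key design point, clustering on the original features and using t-SNE only for display, can be sketched with scikit-learn. This is a hedged illustration rather than the authors' code: the data are synthetic blobs standing in for the normalized Clinic features, and the perplexity, the values of k, and the random seeds are arbitrary assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic stand-in for the normalized clinic feature matrix (assumption).
X, _ = make_blobs(n_samples=90, n_features=12, centers=3, random_state=0)

# t-SNE is used only for plotting, so the 2-D embedding is computed once.
embedding = TSNE(
    n_components=2, perplexity=15, init="pca", random_state=0
).fit_transform(X)

# k-means runs on the original high-dimensional input, not on the embedding.
labels_by_k = {}
for k in (2, 3, 4):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels_by_k[k] = model.fit_predict(X)

# To draw one panel per k, color the embedding by the labels, for example:
#   plt.scatter(embedding[:, 0], embedding[:, 1], c=labels_by_k[3])
```

Because the labels never see the embedding, agreement between the colors and the visible groups in the plot is evidence that both methods recover the same structure, rather than an artifact of clustering the two-dimensional points.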

206 | ISI WSC 2019
