Page 130 - Special Topic Session (STS) - Volume 2
STS466 Md. Khadzir S.A. et al.
data. Unstructured data can take the form of free text, visual, audio and
machine-generated data. It does not have predetermined values and is not
stored in an organized manner that a conventional data warehouse can
analyse. Therefore, other techniques need to be applied. MyHarmony aims to
address this need and to be included as part of MyHDW.
There were three (3) major deliverables in the conceptual stage. The first
was the development and implementation of health terminology standards,
namely SNOMED CT, which would serve as the knowledge base for MyHarmony.
The second was the harmonisation of medical terminology to SNOMED CT terms
by way of mapping. The last was the development and implementation of
MyHarmony to show that the application can codify relevant terms in free
text using Natural Language Processing (NLP) techniques. The SNOMED CT
codified data can then be analysed for information generation.
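To make the codification step concrete, the following is a minimal sketch
of dictionary-based term matching in Python. It is not the MyHarmony
implementation, which the paper does not detail. The names
TERM_TO_CONCEPT and codify are hypothetical, the term list is a toy
example, and the concept identifiers are shown for illustration only and
should be verified against an actual SNOMED CT release.

```python
import re

# Hypothetical mini knowledge base: lower-cased term -> (concept id, preferred term).
# Identifiers are illustrative; verify them against an actual SNOMED CT release.
TERM_TO_CONCEPT = {
    "myocardial infarction": ("22298006", "Myocardial infarction"),
    "heart attack": ("22298006", "Myocardial infarction"),
    "chest pain": ("29857009", "Chest pain"),
}


def codify(text):
    """Return (start offset, matched text, concept id, preferred term) per hit."""
    # List longer terms first so a multi-word term wins over a shorter overlap.
    terms = sorted(TERM_TO_CONCEPT, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for match in pattern.finditer(text):
        concept_id, preferred = TERM_TO_CONCEPT[match.group(1).lower()]
        hits.append((match.start(), match.group(1), concept_id, preferred))
    return hits


note = "Patient presented with chest pain; history of heart attack in 2010."
for hit in codify(note):
    print(hit)
```

Both "heart attack" and "myocardial infarction" resolve to the same
concept, which is what allows codified free text to be aggregated for
analysis. A production NLP pipeline would also need to handle negation,
misspellings and context, none of which this sketch attempts.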
2. Methodology
Development began in 2014 with the Cardiology Refset, which served as the
terminology reference for the MyHarmony engine during the
harmonisation/mapping and codification process. Cardiology Refset
(version 1.0) was completed and released in 2014. It is a simple reference
set [1] containing about 600 terms related to the Cardiology specialty,
including signs and symptoms, diagnoses, procedures, body structures,
medical devices and medications. It was delivered in time to be tested on
the standalone MyHarmony system to generate National Cardiovascular
Disease (NCVD) registries.
The draft Refset and method were presented at successive IHTSDO meetings
and Expos in September 2013, October 2013 and April 2014 to gain feedback
from experts in the international community. The finalized method was
presented at the SNOMED CT Expo in October 2014 [2].
The Cardiology Refset was then expanded to include all cardiology-related
terms, and Cardiology Refset v1.1, containing more than 6,000 concepts, was
completed in July 2016. First, more than 300,000 SNOMED CT concepts (Fully
Specified Names) were extracted and reviewed by PIK by manual inspection
(eyeballing). About 12,000 concepts believed to be related to the
Cardiology specialty were given to the clinicians for review. The
clinicians reduced the number of concepts to about 6,000. Additionally, the
Refset included local terms and common abbreviations, which were mapped to
existing concepts.
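The structure described above, a set of member concepts plus local terms
and abbreviations that point to those concepts, can be sketched as a small
Python data structure. This is an illustrative assumption about how such a
Refset could be represented, not the format used by the authors. The class
SimpleRefset, its methods, and the sample abbreviation "MI" are
hypothetical, and the concept identifier should be checked against an
actual SNOMED CT release.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleRefset:
    """Hypothetical in-memory refset: member concepts plus local term mappings."""
    name: str
    version: str
    members: dict = field(default_factory=dict)      # concept id -> fully specified name
    local_terms: dict = field(default_factory=dict)  # lower-cased local term -> concept id

    def add_member(self, concept_id, fsn):
        self.members[concept_id] = fsn

    def map_local_term(self, term, concept_id):
        # A local term or abbreviation may only point at an existing member concept.
        if concept_id not in self.members:
            raise KeyError(f"{concept_id} is not a member of {self.name}")
        self.local_terms[term.lower()] = concept_id

    def lookup(self, term):
        """Return (concept id, fully specified name) for a local term, or None."""
        concept_id = self.local_terms.get(term.lower())
        if concept_id is None:
            return None
        return concept_id, self.members[concept_id]


refset = SimpleRefset(name="Cardiology Refset", version="1.1")
refset.add_member("22298006", "Myocardial infarction (disorder)")
refset.map_local_term("MI", "22298006")            # illustrative abbreviation
refset.map_local_term("heart attack", "22298006")  # illustrative local term
print(refset.lookup("mi"))
```

Rejecting mappings to non-member concepts mirrors the statement that local
terms were mapped to existing concepts rather than introduced as new ones.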
Next, the team utilised MyHarmony to generate the analysis. There were four
(4) main functions in MyHarmony:
119 | I S I W S C 2 0 1 9