2. Methodology
In principle, Big Data can be defined as an all-encompassing term for
any collection of data sets so large and complex that it becomes difficult to
process them using traditional data processing applications. Of course, this
definition has been updated from time to time in line with the development
of the technology supporting the Big Data ecosystem.
2.1 A Need for Massive Processing
The original motivation of Big Data research was the need for parallel
processing. In a conventional computer architecture, the components that
perform the processing are the CPU and RAM, while the data is stored on
disk. CPU and RAM technology has not developed as quickly as disk storage
technology; as a result, disk capacity has grown very fast compared with
CPU speed and RAM size.
A computing process starts by transferring some data from disk into
memory, after which the CPU performs the processing. This approach has a
performance issue: the amount of data that can be transferred into RAM is
limited by the size of the RAM itself. So, instead of performing fully
in-memory processing, the platform is only able to perform batch
processing. Some processes can still acceptably be performed in batches.
Unfortunately, this approach is not suitable for some analytics workloads,
such as Machine Learning as part of an Artificial Intelligence framework.
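As an illustration of this batch approach (not part of the original paper), the following Python sketch streams a data set that is assumed to be too large for RAM through memory in fixed-size chunks; the file name and chunk size are illustrative assumptions.

```python
# Minimal sketch of batch processing: the data set on disk is assumed to
# be larger than RAM, so it is streamed through memory in fixed-size
# chunks instead of being loaded all at once. "values.txt" (one number
# per line) and CHUNK_SIZE are hypothetical choices.

CHUNK_SIZE = 1_000_000  # number of values held in memory at a time

def batch_mean(path):
    total, count = 0.0, 0
    chunk = []
    with open(path) as f:
        for line in f:
            chunk.append(float(line))
            if len(chunk) == CHUNK_SIZE:
                total += sum(chunk)   # process one batch ...
                count += len(chunk)
                chunk = []            # ... then release the memory
        if chunk:                     # leftover partial batch
            total += sum(chunk)
            count += len(chunk)
    return total / count if count else 0.0

print(batch_mean("values.txt"))
```

Each batch requires a round trip to disk, which is why iterative workloads that revisit the same data many times perform poorly under this model.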
The rise of Big Data Analytics (BDA) has contributed significantly to the
development of Artificial Intelligence (AI) research. AI works on the basis of
learning algorithms, whereby a training data set is established for the
machine to learn from and to understand the context of the knowledge. One
of the most popular algorithms in AI is the neural network. This algorithm
works by loading all the data into memory and performing a repetitive
process in order to establish the model. The quality of the model depends on
the volume and variety of the data and on the number of repetitions.
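To make this pattern concrete, here is a minimal Python/NumPy sketch of such a repetitive, fully in-memory training process; the synthetic data, model, learning rate, and number of repetitions are illustrative assumptions rather than the paper's actual setup.

```python
import numpy as np

# Minimal sketch of iterative in-memory learning (illustrative only):
# the whole training set sits in RAM and is swept over repeatedly,
# which is why data volume and repetition count drive model quality.

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))                # synthetic training data
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])   # assumed ground truth
y = X @ true_w + rng.normal(scale=0.1, size=10_000)

w = np.zeros(5)                            # model parameters
lr = 0.1                                   # assumed learning rate
for epoch in range(200):                   # repetitive passes over the data
    grad = 2 * X.T @ (X @ w - y) / len(y)  # full-batch gradient
    w -= lr * grad

print(w.round(2))  # should approach true_w
```

If the array `X` did not fit in RAM, every one of the 200 passes would have to re-read the data from disk, which is exactly the cost that motivates distributed in-memory platforms.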
This methodology requires massive parallel processing to achieve the
most accurate model, and performing such processing with a traditional
technology framework and infrastructure is very costly. This also explains
why neural networks were not popular until Big Data Analytics reached
maturity from an implementation perspective.
2.2 Scalable Distributed Resources
BDA offers a scalable resource architecture: it allows users to aggregate
resources from multiple machines into a single processing cluster.
Scalability is the key to BDA implementation, which has been adopted
by many organizations, especially social media providers such as Google,