2.  Methodology
    In principle, Big Data can be defined as an all-encompassing term for any
collection of data sets so large and complex that they become difficult to
process using traditional data processing applications. Of course, the
definition has been updated from time to time in line with the development of
the technology supporting the Big Data ecosystem.

2.1 A Need for Massive Processing
    The original motivation of big data research was the need for parallel
processing. In a conventional computer architecture, the components that
perform processing are the CPU and RAM, while data is stored on disk. CPU and
RAM technology has not advanced as quickly as disk storage technology; as a
result, disk capacity has grown very fast compared with CPU speed and RAM
size.
    A computing process starts by transferring some data from disk into
memory, after which the CPU performs the processing. This approach has a
performance issue: the amount of data that can be moved into RAM is limited by
the size of the RAM itself. So, instead of performing total in-memory
processing, the platform is only able to perform batch processing. For some
workloads batch processing is still acceptable. Unfortunately, it is not
suitable for certain analytics processing, such as Machine Learning as part of
an Artificial Intelligence framework.
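    To make the contrast concrete, the following is a minimal Python sketch of
batch (out-of-core) processing: a file assumed to be far larger than the
available RAM is aggregated chunk by chunk instead of being loaded in one
pass. The file name, column name and chunk size are illustrative assumptions,
not taken from the paper.

    import csv

    CHUNK_ROWS = 100_000  # rows held in memory at any one time

    def batch_mean(path, column):
        """Mean of one column, computed without loading the whole file."""
        total, count = 0.0, 0
        buffer = []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                buffer.append(float(row[column]))
                if len(buffer) >= CHUNK_ROWS:
                    total += sum(buffer)   # process one batch ...
                    count += len(buffer)
                    buffer.clear()         # ... then release it from RAM
        total += sum(buffer)               # flush the final partial batch
        count += len(buffer)
        return total / count if count else float("nan")

    print(batch_mean("measurements.csv", "value"))  # hypothetical input file

    Each batch yields a partial result that is merged into a running total.
This pattern is exactly what breaks down when an algorithm needs repeated
access to the full data set at once.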
    The rise of Big Data Analytics (BDA) has contributed significantly to the
development of Artificial Intelligence (AI) research. AI works on the basis of
learning algorithms, whereby a training data set is established for the
machine to learn and to understand the context of the knowledge. One popular
family of AI algorithms is the neural network. It works by loading all data
into memory and performing a repetitive process in order to establish the
model. The quality of the model depends on the volume and variety of the data
and on the number of repetitions.
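    As a minimal Python sketch, not taken from the paper, the loop below holds
a tiny synthetic training set entirely in RAM and refines a simple linear
model (a single neuron without activation, standing in for the far larger
networks discussed in the text) over repeated passes, or epochs. The data, the
learning rate and the number of epochs are illustrative assumptions.

    import random

    random.seed(0)
    # synthetic training set: y = 2*x plus a little noise
    data = [(x, 2.0 * x + random.gauss(0, 0.1))
            for x in (i / 10 for i in range(100))]

    w, b, lr = 0.0, 0.0, 0.05
    for epoch in range(200):             # the "repetitive process"
        grad_w = grad_b = 0.0
        for x, y in data:                # the whole data set stays in memory
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)     # gradient-descent update
        b -= lr * grad_b / len(data)

    print(f"learned w ~ {w:.2f}, b ~ {b:.2f}")

    Scaling this loop to realistic data volumes and to many parameters is
precisely where the memory limit described above becomes the bottleneck, hence
the need for massive parallel processing discussed next.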
    This methodology requires massive parallel processing to achieve the most
accurate model, and performing such processing on a traditional technology
framework and infrastructure is very costly. This also explains why the neural
network was not popular until Big Data Analytics reached maturity from the
implementation perspective.

2.2 Scalable Distributed Resources
    BDA offers a scalable resource architecture. It allows users to aggregate
resources from multiple machines into a single processing cluster. Scalability
is the key to BDA implementation, which has been adopted by many
organizations, especially social media providers such as Google,


