Page 124 - Special Topic Session (STS) - Volume 2
STS466 Sasongko Y.
Facebook, Twitter, etc. It has proven to be a robust platform, able to
manage massive data processing.
Scalability is the primary benefit of BDA, from the storage, memory and
CPU processing perspectives alike. Aggregating resources into a single cluster
enables users to perform fully in-memory processing, since machines can easily
be added to the cluster. For example, to process a 5 TB dataset in memory, we
can deploy 20 machines with 256 GB of RAM each (20 × 256 GB = 5,120 GB, i.e.
5 TB). BDA virtually combines the RAM of those 20 machines into a single memory
pool for processing. The BDA software handles the physical distribution of data
across each machine's RAM, without complex settings or configuration by the user.
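The sizing arithmetic and the idea of spreading data across pooled RAM can be illustrated with a small Python sketch. It assumes binary units (1 TB = 1,024 GB) and, by default, that all of each machine's RAM is usable. Real deployments reserve memory for the operating system and the framework, which the hypothetical `usable_fraction` parameter models. The round-robin `assign_partitions` function is an illustrative placement rule, not the scheme any particular BDA engine uses.

```python
import math
from collections import Counter

GIB = 1024 ** 3  # binary gigabyte
TIB = 1024 ** 4  # binary terabyte


def machines_needed(data_bytes: int, ram_per_machine: int, usable_fraction: float = 1.0) -> int:
    """Smallest number of machines whose pooled RAM can hold data_bytes.

    usable_fraction (hypothetical) models RAM reserved for the OS and the
    framework; 1.0 reproduces the idealised figure used in the text.
    """
    usable_per_machine = ram_per_machine * usable_fraction
    return math.ceil(data_bytes / usable_per_machine)


def assign_partitions(num_partitions: int, num_machines: int) -> dict[int, int]:
    """Hypothetical round-robin placement: partition p goes to machine p mod n."""
    return {p: p % num_machines for p in range(num_partitions)}


if __name__ == "__main__":
    n = machines_needed(5 * TIB, 256 * GIB)
    print(n)  # 20
    placement = assign_partitions(num_partitions=40, num_machines=n)
    print(Counter(placement.values())[0])  # 2 partitions land on machine 0
```

With `usable_fraction=0.75` (an assumed 25% overhead), the same 5 TB dataset would need 27 machines instead of 20, which suggests leaving headroom when sizing a cluster.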
2.3 Open Source for Cost Efficiency
BDA also offers great benefits in terms of investment for organizations. The
main components of BDA are open source, developed within the Apache community.
They are license-free, which allows organizations to use them with no capacity
limit. Because the software is open source, the community continuously develops
new features and architectural improvements. Various forums are also active
where the community discusses issues and potential developments, and these
can be used as references.
3. Results
We have successfully established a BDA architecture focusing on two
domains: Media Intelligence and Marketplace Intelligence. Both case studies
leverage unstructured data captured from the internet using a crawling
methodology.
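The paper does not detail its crawler, so the following is only a minimal breadth-first sketch of the general crawling idea, using Python's standard `html.parser` and `urllib.parse` modules. The `fetch` callable is a hypothetical stand-in for an HTTP client. A production crawler would also honour robots.txt, apply rate limits, filter script and style text, and deduplicate content, none of which is shown here.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects href targets of <a> tags and the text content of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.text_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(seed: str, fetch: Callable[[str], str], max_pages: int = 100) -> dict[str, str]:
    """Breadth-first crawl starting at seed; returns {url: extracted text}.

    fetch is a hypothetical callable returning the HTML of a URL
    (in practice it would wrap an HTTP client).
    """
    seen = {seed}
    queue = deque([seed])
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        parser = LinkAndTextParser()
        parser.feed(fetch(url))
        parser.close()  # flush any buffered text
        pages[url] = " ".join(parser.text_parts)
        # Resolve relative links and enqueue only unseen URLs.
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing `fetch` in as a parameter keeps the traversal logic testable offline, for example with a dict of canned pages: `crawl("http://news.example/", site.__getitem__)`.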
3.1 Media Intelligence
3.1.1 Public Perception Matters
Media can shape the world by influencing public opinion, whether of people
on the street, government officials or decision makers. It is common for
financial investors evaluating a company's performance and credibility to be
swayed by positive or negative media perceptions of that company, which
directly affect its economic standing. Changes in market price are often
responses to events reported in the news, and investors may well act on the
perceptions they form from what they read in newspapers. In the long run,
perceptions of issues can cause massive chaos if they are not well managed.
3.1.2 Natural Language Processing
All data captured from the media are narrative text, written in human
language according to specific grammatical rules. We use Natural Language
Processing (NLP), a sub-field of Artificial Intelligence
113 | ISI WSC 2019