Page 125 - Contributed Paper Session (CPS) - Volume 6
CPS1847 Shariful I.
4.2 Open source Big Data Analysis Platforms and Tools
1 Hadoop
No discussion of big data is complete without Hadoop. The Apache distributed
data processing software is so pervasive that the terms ‘Hadoop’ and ‘big
data’ are often used synonymously. The Apache Foundation also sponsors a
number of related projects that extend the capabilities of Hadoop, and many of
them are mentioned below. In addition, numerous vendors offer supported
versions of Hadoop and related technologies.
Operating system: Windows, Linux, OS X.
2 MapReduce
Originally developed by Google, MapReduce is described on the MapReduce
website as “a programming model and software framework for writing
applications that rapidly process vast amounts of data in parallel on large
clusters of compute nodes.” It is used by Hadoop as well as by many other data
processing applications.
Operating system: OS independent.
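To make the programming model concrete, the classic word-count example can be sketched on a single machine in Python. This is not Hadoop's API: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, and the input splits are processed one after another rather than in parallel on cluster nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share the same key."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # On a real cluster each split would be mapped on a different node;
    # this hypothetical sketch simply maps the splits sequentially.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data big clusters", "Hadoop processes big data"]
    print(word_count(splits))
    # {'big': 3, 'data': 2, 'clusters': 1, 'hadoop': 1, 'processes': 1}
```

In Hadoop, the user supplies only the map and reduce logic; the framework takes care of splitting the input, moving intermediate pairs between nodes during the shuffle, and rerunning failed tasks. The sketch above only mimics that data flow.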
3 GridGain
GridGain offers an alternative to Hadoop’s MapReduce that is compatible with
the Hadoop Distributed File System (HDFS). It offers in-memory processing for
fast analysis of real-time data. The open source version can be downloaded
from GitHub, and a commercially supported version can also be purchased.
Operating system: Windows, Linux, OS X.
4 HPCC
Developed by LexisNexis Risk Solutions, HPCC (short for "High Performance
Computing Cluster") is claimed by its developer to outperform Hadoop. Both
free community versions and paid enterprise versions are available.
Operating system: Linux.
5 Storm
Open-sourced by Twitter and now an Apache Software Foundation project, Storm
offers distributed real-time computation capabilities and is often described
as the "Hadoop of real-time." It's highly
114 | ISI WSC 2019