Page 365 - Special Topic Session (STS) - Volume 2
STS500 Fauzana I. et al.
On the path from price acquisition to the Price Lake, one of the many
challenges of Big Data is structuring the data so that analysis can be
performed on it. Price data scraped from the internet come from several online
sources. The challenge is that some online sources provide data only at the
national level, and each source uses its own item categories. Another source of
data is the data processing system of the Consumer Price Index in DOSM.
This type of data is available down to the state level as well as the location
level. The latter means that data from the survey that was conducted are placed
on an FTP server and then loaded into the Price Lake. At the same time, data
from online (internet) sources are also loaded into the Price Lake. This is what
is meant by the acquisition of the data.
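The structuring step described above can be illustrated with a minimal sketch in Python. It is not DOSM's implementation: the source names, the raw record layout, the category labels and the mapping table are all hypothetical and serve only to show how records with different item categories and different geographic detail could be brought onto one common schema before they are stored.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from each source's own item category to a common one.
# Real sources and category labels will differ.
CATEGORY_MAP = {
    ("retailer_a", "Fresh Poultry"): "chicken",
    ("retailer_b", "Ayam & Itik"): "chicken",
    ("cpi_system", "0112-Chicken"): "chicken",
}


@dataclass
class PriceRecord:
    """Common (assumed) schema for one price observation."""
    source: str
    item_category: str              # harmonised category
    price: float
    state: Optional[str] = None     # None when the source is national-level only
    location: Optional[str] = None  # None when no location detail is available


def to_price_record(source: str, raw: dict) -> PriceRecord:
    """Map one raw record from a given source onto the common schema."""
    key = (source, raw["category"])
    if key not in CATEGORY_MAP:
        # Unmapped categories are surfaced rather than silently guessed.
        raise KeyError(f"unmapped category: {key}")
    return PriceRecord(
        source=source,
        item_category=CATEGORY_MAP[key],
        price=float(raw["price"]),
        state=raw.get("state"),
        location=raw.get("location"),
    )


# A plain list stands in for the Price Lake in this sketch.
price_lake = [
    # Online source: national level only, price arrives as text.
    to_price_record("retailer_a", {"category": "Fresh Poultry", "price": "8.90"}),
    # CPI processing system: state and location are available.
    to_price_record(
        "cpi_system",
        {"category": "0112-Chicken", "price": 9.2,
         "state": "Selangor", "location": "Shah Alam"},
    ),
]
```

Keeping `state` and `location` optional lets national-level online records and location-level survey records share one structure, so later analysis can filter on whichever level of detail a record actually carries.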
Online Price Data Acquisition
Web crawling and scraping are becoming a popular alternative to the
traditional way of collecting price data. As with any technology, however,
this approach comes at a price: not only in terms of the platform and the
robots, but also in the challenges posed by the dynamism of the online
sources themselves.
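The effect of this dynamism can be shown with a small sketch in Python, using only the standard library. The HTML snippets, the class names `price` and `product-price`, and the helper `extract_prices` are assumptions made for illustration; they do not describe any actual website or DOSM's robots. The sketch extracts prices from elements marked with a given class and shows that a simple change in the page layout is enough to make the same robot return nothing.

```python
import re
from html.parser import HTMLParser


class PriceExtractor(HTMLParser):
    """Collects numeric prices from elements carrying a given class name."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self._capture = False   # True while inside a matching element
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._capture = True

    def handle_endtag(self, tag):
        self._capture = False

    def handle_data(self, data):
        if not self._capture:
            return
        # Drop thousands separators, then take the first number in the text.
        match = re.search(r"\d+(?:\.\d+)?", data.replace(",", ""))
        if match:
            self.prices.append(float(match.group()))


def extract_prices(html: str, css_class: str) -> list:
    """Hypothetical helper: return all prices found under `css_class`."""
    parser = PriceExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical page before and after the retailer changes its layout.
OLD_LAYOUT = '<div><span class="price">RM 8.90</span></div>'
NEW_LAYOUT = '<div><span class="product-price">RM 8.90</span></div>'

old_result = extract_prices(OLD_LAYOUT, "price")      # price is found
new_result = extract_prices(NEW_LAYOUT, "price")      # robot silently finds nothing
fixed_result = extract_prices(NEW_LAYOUT, "product-price")  # after updating the robot
```

Because a layout change produces an empty result rather than an error, a production robot would typically also need a check that flags sources whose number of extracted prices drops unexpectedly.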
Online Price Data Acquisition Architecture
From a general architectural perspective, DOSM’s data acquisition
components are shown in the diagram below. The internet sources comprise
both online government agencies and online retailers.
Figure 1: Online Price Data Acquisition Architecture
The main function of this data acquisition is to collect and extract data from
various sources. DOSM is now crawling more than 50 websites, 22 of which
are e-commerce websites, and is the biggest web scraper among
354 | ISI WSC 2019