Page 365 - Special Topic Session (STS) - Volume 2

STS500 Fauzana I. et al.
                In moving from price acquisition to the Price Lake, one of the many
            challenges of Big Data is structuring the data so that analysis can be
            performed on it. Data scraped from the internet come from a number of
            online sources. The challenge is that some online sources provide data
            only at the national level, and each source uses a different item
            categorisation. Another source of data is the data processing system for
            the Consumer Price Index (CPI) in DOSM. This source can provide data down
            to the state level as well as the location level: data from the price
            survey are uploaded to an FTP server and then loaded into the Price Lake.
            At the same time, data collected online (from the internet) are also
            loaded into the Price Lake. This is what is meant by data acquisition.
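            To make the harmonisation problem concrete, the following is a minimal
            sketch (not DOSM's implementation) of how records from the two sources
            might be normalised into one Price Lake schema. The record fields, the
            retailer names `shop_a`/`shop_b`, the COICOP-style codes and the
            `CATEGORY_MAP` lookup are all hypothetical assumptions for illustration.

```python
"""Hypothetical sketch: loading online and CPI-survey records into a single
"Price Lake" table with a harmonised item category. Field names, site names
and category codes are illustrative assumptions, not DOSM's actual schema."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriceRecord:
    source: str                     # "online" or "cpi_survey"
    item_code: str                  # harmonised (hypothetical) category code
    price: float
    level: str                      # "national", "state" or "location"
    state: Optional[str] = None
    location: Optional[str] = None


# Each online source labels items differently; map (site, label) pairs onto
# one common classification before storing. Codes here are assumptions.
CATEGORY_MAP = {
    ("shop_a", "Rice & Grains"): "01.1.1",
    ("shop_b", "Beras"): "01.1.1",   # "beras" is Malay for rice
    ("shop_a", "Cooking Oil"): "01.1.5",
}


def from_online(site: str, category: str, price: float) -> Optional[PriceRecord]:
    """Online sources are assumed here to give national-level prices only."""
    code = CATEGORY_MAP.get((site, category))
    if code is None:
        return None                  # unmapped category: leave for manual review
    return PriceRecord("online", code, price, "national")


def from_cpi_survey(row: dict) -> PriceRecord:
    """Survey rows (e.g. parsed from files delivered via FTP) carry state and location."""
    return PriceRecord("cpi_survey", row["item_code"], float(row["price"]),
                       "location", row["state"], row["location"])


price_lake = []  # in-memory stand-in for the Price Lake
for site, label, p in [("shop_a", "Rice & Grains", 12.5),
                       ("shop_b", "Beras", 13.0),
                       ("shop_b", "Snacks", 4.2)]:
    rec = from_online(site, label, p)
    if rec is not None:
        price_lake.append(rec)

price_lake.append(from_cpi_survey({"item_code": "01.1.1", "price": "12.80",
                                   "state": "Selangor", "location": "Shah Alam"}))
```

            In this sketch the unmapped "Snacks" category is dropped rather than
            guessed, so all three stored records share one rice code even though
            the sources labelled them differently.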

            Online Price Data Acquisition
                Instead of the traditional way of collecting price data, web crawling
            and scraping are now becoming a popular approach to price data
            acquisition. However, technology always comes at a price: not only in
            terms of the platform and the robots (crawlers), but also in the
            challenges posed by the dynamic nature of the online sources themselves.
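            As an illustration of the scraping step, the sketch below extracts
            product names and prices from a retailer page using only Python's
            standard `html.parser`. The markup (`class="name"`, `class="price"`)
            and the sample page are hypothetical; real sites use different and
            changing layouts, which is the maintenance cost described above.

```python
"""Hypothetical sketch of price scraping with the standard library only.
The HTML layout below is an assumption; real retailer pages differ."""
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class PriceParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.items: List[Tuple[str, float]] = []
        self._field: Optional[str] = None   # "name" or "price" while inside that tag
        self._name: Optional[str] = None    # last product name seen

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in ("name", "price"):
            self._field = cls

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field == "name":
            self._name = data.strip()
        elif self._field == "price" and self._name is not None:
            # Pull the numeric part out of text such as "RM 23.90".
            m = re.search(r"\d+(?:\.\d+)?", data.replace(",", ""))
            if m:
                self.items.append((self._name, float(m.group())))
            self._name = None


SAMPLE_HTML = """
<div class="product"><span class="name">Rice 5kg</span>
  <span class="price">RM 23.90</span></div>
<div class="product"><span class="name">Cooking Oil 2kg</span>
  <span class="price">RM 12.50</span></div>
"""

parser = PriceParser()
parser.feed(SAMPLE_HTML)
print(parser.items)  # [('Rice 5kg', 23.9), ('Cooking Oil 2kg', 12.5)]
```

            If a site renames its `price` class, this parser silently returns
            fewer items, so in practice each extractor needs monitoring.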

            Online Price Data Acquisition Architecture
                From a general architectural perspective, DOSM's data acquisition
            components are described in the diagram below (Figure 1). The online
            sources include both government agency websites and online retailers.

                           Figure 1: Online Price Data Acquisition Architecture
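            The acquisition layer in Figure 1 can be sketched, under stated
            assumptions, as a registry of sources (government agencies and online
            retailers), each with its own extractor, plus a collector that keeps
            running when one source breaks. The `Source` class, `collect` function,
            `.example` URLs and the in-memory `FAKE_PAGES` stand-in for HTTP fetching
            are all hypothetical.

```python
"""Hypothetical sketch of a source registry and collector. URLs, names and
the page format are placeholders; a real system would fetch over HTTP."""
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Source:
    name: str
    kind: str                                          # "government" or "retailer"
    url: str
    extract: Callable[[str], List[Tuple[str, float]]]


def collect(sources, fetch):
    """Fetch and extract every source; record failures instead of stopping."""
    records, failed = [], []
    for src in sources:
        try:
            for item, price in src.extract(fetch(src.url)):
                records.append({"source": src.name, "kind": src.kind,
                                "item": item, "price": price})
        except Exception as exc:                       # e.g. the page layout changed
            failed.append((src.name, repr(exc)))
    return records, failed


def csv_extract(page: str) -> List[Tuple[str, float]]:
    # Assumed page format: one "item,price" line per product.
    return [(i, float(p)) for i, p in
            (line.split(",") for line in page.splitlines() if line)]


FAKE_PAGES = {  # stand-in for HTTP responses
    "https://gov.example/prices": "Rice 5kg,24.00\nSugar 1kg,2.85",
    "https://shop.example/grocery": "Rice 5kg,23.90",
    "https://broken.example/": "layout changed",
}

sources = [
    Source("gov_agency", "government", "https://gov.example/prices", csv_extract),
    Source("shop", "retailer", "https://shop.example/grocery", csv_extract),
    Source("broken_shop", "retailer", "https://broken.example/", csv_extract),
]
records, failed = collect(sources, FAKE_PAGES.__getitem__)
print(len(records), [name for name, _ in failed])  # 3 ['broken_shop']
```

            Isolating failures per source reflects the point above about the
            dynamism of online sources: one changed website should not halt the
            whole crawl, but it should be reported for repair.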
                The main function of this data acquisition layer is to collect and
            extract data from various sources. DOSM is now crawling more than 50
            websites, 22 of which are e-commerce websites, and is the biggest web
            scraper among


                                                               354 | ISI WSC 2019