Page 362 - Special Topic Session (STS) - Volume 2

STS500 Fauzana I. et al.



                                 Use of web-scraping for the compilation of
                                Consumer Price Index: Malaysia's experience
                         Fauzana Ismail, Fuziah Md Amin, Wan Mohd Haffiz Mohd Nasir
                                          Department of Statistics Malaysia

                  Abstract
                  This paper describes how Malaysia, like many other countries, is inevitably
                  involved in a phenomenon called big data. Big data involves a voluminous
                  amount of data, where the data types and structures are complex and new data
                  are created and grow at high speed. Nowadays, an increasing amount of digital
                  data is flooding in from various sources such as the web, email, videos and
                  social network communication. As such,
                  big data, with its main focus on unstructured data, requires a set of
                  techniques and technologies with new forms of integration to reveal insights
                  from datasets that are diverse, complex, and of a massive scale. The definition
                  of Big Data Analytics (BDA) and its importance are also discussed in the paper.
                  BDA is the process of examining large amounts of data to uncover hidden
                  patterns, correlations and other insights, and it is especially pertinent in
                  business. Most Big Data Analytics work at National Statistical Offices involves
                  price statistics. In Malaysia's case, it is found that most data are already online.
                  Therefore, in promoting e-commerce in Malaysia, the traditional collection
                  method, which is conducted on a face-to-face basis, is in need of
                  modernisation so that it is closer to real time and more efficient. In
                  addition, there will be a reduction in cost and in the burden on
                  respondents. Accordingly, one of the many StatsBDA initiatives currently
                  under way in the Department of Statistics Malaysia is the Price Intelligence
                  Module. The initiative is to create an internal portal for Price Intelligence
                  (PI). It involves the modernisation of data collection tools to improve the
                  quality of the Consumer Price Index (CPI) in Malaysia. The
                  modernisation of data collection mainly consists of the adoption of web
                  scraping techniques to collect price data from related websites for CPI compilation.
                  The idea is to crawl data from hypermarkets and collect it in the big data
                  project. As a result, analysis, data visualisation, data mining, reports,
                  dashboards and alerts can be produced. The Price Intelligence
                  Module in the Department of Statistics Malaysia encompasses Price
                  Frequency, Distribution of Average Price by Strata Category, Trend of Average
                  Price by Strata Category, Trend of Average Price by State, Price Distribution by
                  State, Average Price by State and Price Mode. The paper concludes with the
                  challenges and journey of Price Intelligence that Malaysia has experienced.
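
To make the scraping idea concrete, the following is a minimal sketch in Python and not the Department's actual implementation. It extracts prices from a made-up product listing and derives two of the indicators named above, Average Price by State and Price Mode. The HTML layout, the class names, the `data-state` attribute, the sample prices and all function names are assumptions introduced purely for illustration; real hypermarket pages will be structured differently.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser
from statistics import mean

# Hypothetical listing markup and prices; not taken from any real website.
SAMPLE_HTML = """
<ul>
  <li class="product" data-state="Selangor">
    <span class="name">Rice 5kg</span><span class="price">RM 25.90</span></li>
  <li class="product" data-state="Selangor">
    <span class="name">Rice 5kg</span><span class="price">RM 26.50</span></li>
  <li class="product" data-state="Johor">
    <span class="name">Rice 5kg</span><span class="price">RM 25.90</span></li>
  <li class="product" data-state="Johor">
    <span class="name">Rice 5kg</span><span class="price">RM 24.90</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collects one record (state, name, price) per <li class="product">."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being built, or None outside a product
        self._field = None    # "name" or "price" while inside that <span>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self._current = {"state": attrs.get("data-state")}
        elif tag == "span" and attrs.get("class") in ("name", "price"):
            self._field = attrs["class"]

    def handle_data(self, data):
        if self._current is not None and self._field and data.strip():
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            record = self._current
            self._current = None
            if "name" in record and "price" in record:
                # Strip the assumed "RM" currency prefix before converting.
                record["price"] = float(record["price"].replace("RM", "").strip())
                self.records.append(record)


def scrape_prices(html):
    """Parse listing HTML into a list of price records."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.records


def average_price_by_state(records):
    """Mean price per state, rounded to two decimal places."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["state"]].append(record["price"])
    return {state: round(mean(prices), 2) for state, prices in grouped.items()}


def price_mode(records):
    """Most frequently observed price across all records."""
    counts = Counter(record["price"] for record in records)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    records = scrape_prices(SAMPLE_HTML)
    print(average_price_by_state(records))
    print(price_mode(records))
```

In practice the HTML would be fetched from the retailer's website rather than embedded as a string, and records would be grouped by item before averaging. Both steps are omitted here to keep the sketch self-contained.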




                                                                     351 | ISI WSC 2019