Page 362 - Special Topic Session (STS) - Volume 2
STS500 Fauzana I. et al.
Use of web-scraping for the compilation of
Consumer Price Index: Malaysia's experience
Fauzana Ismail, Fuziah Md Amin, Wan Mohd Haffiz Mohd Nasir
Department of Statistics Malaysia
Abstract
This paper describes how Malaysia, like many other countries, is inevitably
involved in the phenomenon known as big data. Big data involves a
voluminous amount of data whose types and structures are complex and
which is created and grows at high speed. An increasing amount of digital
data now floods in from various sources such as the web, email, videos and
social network communication. As such, big data, with its main focus on
unstructured data, requires a set of techniques and technologies with new
forms of integration to reveal insights from datasets that are diverse,
complex, and of a massive scale. The definition of Big Data Analytics (BDA)
and its importance are also discussed in the paper.
BDA is the process of examining large amounts of data to uncover hidden
patterns, correlations and other insights, and it is especially pertinent to
business. Most Big Data Analytics work in National Statistical Offices
involves price statistics. In Malaysia's case, it is found that most data are
already online.
Therefore, in promoting E-Commerce in Malaysia, the traditional collection
method, which is conducted face to face, needs to be modernised so that it
is closer to real time and more efficient. In addition, modernisation will
reduce costs and the burden on respondents. One of the many StatsBDA
initiatives currently under way in the Department of Statistics Malaysia is
the Price Intelligence Module. The initiative is to create an internal portal
for Price Intelligence (PI). It involves modernising data collection tools to
improve the quality of the Consumer Price Index (CPI) in Malaysia. The
modernisation of data collection consists mainly of adopting web-scraping
techniques to scrape price data from related websites for CPI compilation.
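As an illustration only, and not the Department's actual implementation, the following minimal Python sketch shows the general idea of scraping price data from a product listing. The HTML fragment, the class names `name` and `price`, the products and the helper `scrape_prices` are all hypothetical; a real hypermarket website will have its own page structure and terms of use.

```python
from html.parser import HTMLParser

# Hypothetical product listing; real hypermarket pages will differ.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Rice 5kg</span><span class="price">RM 25.90</span></li>
  <li class="product"><span class="name">Cooking Oil 1kg</span><span class="price">RM 6.70</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collects (name, price) pairs from <span class="name"> / <span class="price">."""

    def __init__(self):
        super().__init__()
        self._field = None   # field that the next text chunk belongs to
        self._current = {}   # partially built record
        self.records = []    # completed (name, price) tuples

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field is None:
            return  # text outside the spans of interest
        self._current[self._field] = data.strip()
        self._field = None
        if "name" in self._current and "price" in self._current:
            # Assumed price format: "RM <amount>"
            amount = float(self._current["price"].replace("RM", "").strip())
            self.records.append((self._current["name"], amount))
            self._current = {}


def scrape_prices(html):
    """Return a list of (product name, price in RM) found in the HTML text."""
    parser = PriceParser()
    parser.feed(html)
    return parser.records


if __name__ == "__main__":
    for name, price in scrape_prices(SAMPLE_HTML):
        print(name, price)
```

In practice the HTML would be fetched from the retailer's website on a schedule, and the extracted records would be stored for later analysis.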
The idea is to crawl data from hypermarkets and collect it in the big data
project. As a result, analysis, data visualisation, data mining, reports,
dashboards and alerts can be produced. The Price Intelligence Module in the
Department of Statistics Malaysia encompasses Price Frequency, Distribution
of Average Price by Strata Category, Trend of Average Price by Strata
Category, Trend of Average Price by State, Price Distribution by State,
Average Price by State and Price Mode. The paper concludes with the
challenges and journey of Price Intelligence that Malaysia has experienced
351 | ISI WSC 2019