Page 368 - Special Topic Session (STS) - Volume 2
STS500 Fauzana I. et al.
strategies to cater for these matters. The robot is configured to declare itself
as a browser and to use Secure Sockets Layer (SSL). The characteristics of each
website are detected, such as whether it uses a CAPTCHA, redirection, etc.
Websites that have been crawled are accessed at random using multiple public
Internet Protocol (IP) addresses. The crawling period is also adjusted
accordingly. On top of that, DOSM is also implementing The Onion Router
(TOR), based on a Deep Web approach, and a proxy implementation. All these
strategies will be implemented as an integrated algorithm in the crawler
scheduler.
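The strategies above can be illustrated with a minimal sketch. It is not DOSM's actual implementation: it assumes Python's standard library, a hypothetical browser User-Agent string (`BROWSER_USER_AGENT`), a hypothetical pool of proxy addresses (`PROXY_POOL`) and illustrative delay bounds, none of which come from this paper. The function only plans one access (browser-like header, randomly chosen proxy, SSL context, randomised waiting time) and does not contact any website.

```python
import random
import ssl
import urllib.request

# Hypothetical values for illustration only; not taken from the paper.
BROWSER_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/68.0"
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def plan_request(url, rng, min_delay=5.0, max_delay=30.0):
    """Return (request, opener, proxy, delay) for one access; nothing is sent."""
    # Declare the robot as a browser through the User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": BROWSER_USER_AGENT})
    # Pick one proxy (public IP) at random so accesses are spread over the pool.
    proxy = rng.choice(PROXY_POOL)
    proxy_handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    # Use a certificate-verifying SSL/TLS context for HTTPS targets.
    https_handler = urllib.request.HTTPSHandler(context=ssl.create_default_context())
    opener = urllib.request.build_opener(proxy_handler, https_handler)
    # Randomise the waiting time before this access (the adjusted crawling period).
    delay = rng.uniform(min_delay, max_delay)
    return request, opener, proxy, delay
```

A scheduler could call `plan_request(url, random.Random())` for each target, sleep for `delay` seconds and then call `opener.open(request)`. Routing through TOR would, under the same assumptions, amount to placing a local TOR proxy address in the pool.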
Since there are risks arising from changes in the layout of target websites
and from the security deployed on them, a monitoring agent is used to ensure
that all components in the crawler engine run well. The monitoring agent sends
alerts and notifications to us when there are issues, such as no data being
captured from a specific website. Our developers then have to check and analyse
the cause of the issue and resolve it as soon as possible. It should be noted
that data crawling and scraping are now running heavily for all 50 sources.
There is no doubt that, because native programming is used to perform all the
tasks, a high level of maintenance by skilled and experienced technical
personnel is needed. DOSM's technical personnel, in collaboration with industry
personnel, monitor and maintain the crawling and scraping processes, with 10
persons working per shift.
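The monitoring rule described above (raise an alert when a source returns no data) can be sketched as follows. This is an illustrative assumption about how such a check might look, not the agent used by DOSM; the `CrawlReport` structure, the `find_alerts` function and the source names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CrawlReport:
    source: str            # hypothetical source identifier
    records_captured: int  # number of price records scraped in this run
    error: str = ""        # empty when the run finished without an exception


def find_alerts(reports, expected_sources):
    """Return one alert message per source that needs a developer's attention."""
    alerts = []
    seen = {report.source: report for report in reports}
    for source in expected_sources:
        report = seen.get(source)
        if report is None:
            # The crawler never reported on this source at all.
            alerts.append(f"{source}: no report received from crawler")
        elif report.error:
            alerts.append(f"{source}: crawler error: {report.error}")
        elif report.records_captured == 0:
            # Typical symptom of a layout change or a new security measure.
            alerts.append(f"{source}: no data captured")
    return alerts
```

For example, given reports `CrawlReport("shop-a", 120)` and `CrawlReport("shop-b", 0)` and the expected sources `["shop-a", "shop-b", "shop-c"]`, the function returns an alert for `shop-b` (no data captured) and one for `shop-c` (no report received).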
3. Results
All the processes above resulted in the development of the Price Intelligence
module in DOSM. A total of 100 selected items are included in the visualisation
of this module. The items are among those most highly consumed by households
in the country. Among the items taken into consideration are fruits,
vegetables, fish, seafood and food away from home. The expected outputs from
Price Intelligence include comparisons of prices and price changes between
online and existing data; average price for Malaysia, by state and by strata;
and the most expensive and cheapest prices by item and by location. The
suggested visualisations for survey data are price frequency, product price
distribution by CPI category, trend of average price by category, trend of
average price by state, price distribution by data type, average price by state
and price mode. The suggested visualisations for online data are price
frequency by item at the six-digit Classification of Individual Consumption
According to Purpose (COICOP) code, trend of average price by outlet category,
distribution of average price, trend of average price by area, price
distribution by state, average price by state and price mode. In the data
visualisation, data can be filtered at the 2-, 4- and 6-digit levels of the CPI.
The value created by this Price Intelligence platform is holistic information
on online as well as offline prices.
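Two of the outputs listed above, filtering by 2-, 4- or 6-digit code level and computing the average price and price mode by state, can be sketched as follows. This is a minimal illustration under stated assumptions: the records, codes, states and prices in `RECORDS` are invented for the example and are not DOSM data, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, multimode

# Hypothetical records: (six-digit COICOP-style code, state, price).
RECORDS = [
    ("011610", "Selangor", 4.50),
    ("011610", "Selangor", 4.50),
    ("011610", "Johor", 5.00),
    ("011710", "Selangor", 3.00),
    ("021110", "Johor", 12.00),
]


def filter_by_level(records, prefix):
    """Keep records whose code starts with a 2-, 4- or 6-digit prefix."""
    if len(prefix) not in (2, 4, 6):
        raise ValueError("prefix must have 2, 4 or 6 digits")
    return [record for record in records if record[0].startswith(prefix)]


def summarise_by_state(records):
    """Return {state: (average price, list of modal prices)}."""
    prices = defaultdict(list)
    for _code, state, price in records:
        prices[state].append(price)
    return {
        state: (round(mean(values), 2), multimode(values))
        for state, values in prices.items()
    }
```

Calling `summarise_by_state(filter_by_level(RECORDS, "01"))` on the invented data gives an average of 4.0 with mode 4.5 for Selangor and an average of 5.0 with mode 5.0 for Johor. `multimode` returns a list because several prices can be equally frequent.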
357 | ISI WSC 2019