Page 368 - Special Topic Session (STS) - Volume 2

STS500 Fauzana I. et al.
                  strategies to address these matters. The robot is configured to identify
                  itself as a browser and to use Secure Sockets Layer (SSL). The
                  characteristics of each website are detected, such as whether it uses a
                  captcha, redirection, etc. The crawled websites are accessed at random
                  using multiple public Internet Protocol (IP) addresses, and the crawling
                  period is adjusted accordingly. On top of that, DOSM is also implementing
                  The Onion Router (TOR), based on a Deep Web approach, together with a
                  proxy implementation. All these strategies will be implemented as an
                  integrated algorithm in the crawler scheduler.
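
The strategies listed above (browser identification, SSL, randomised access through several IP addresses, an adjusted crawling period and a TOR or proxy route) could be combined in a scheduler roughly as follows. This is only a minimal sketch and not DOSM's implementation: the names `FetchPlan`, `plan_fetch` and `plan_round`, the user-agent strings, the proxy addresses and the delay values are all hypothetical. The only external fact assumed is that a local Tor client listens for SOCKS connections on port 9050 by default. The sketch only plans requests and does not send any.

```python
import random
from dataclasses import dataclass

# Hypothetical values for illustration only; none of them come from the paper.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
PROXIES = [
    "http://203.0.113.10:8080",  # placeholder address from a documentation IP range
    "http://203.0.113.11:8080",  # placeholder address from a documentation IP range
    "socks5h://127.0.0.1:9050",  # default SOCKS port of a local Tor client
]


@dataclass
class FetchPlan:
    """Everything a downloader would need for one request."""
    url: str
    headers: dict
    proxy: str
    verify_tls: bool
    delay_seconds: float


def plan_fetch(url, base_delay=60.0, jitter=0.5, rng=random):
    """Plan one request: browser-like header, random exit IP, jittered wait."""
    delay = base_delay * rng.uniform(1 - jitter, 1 + jitter)
    return FetchPlan(
        url=url,
        headers={"User-Agent": rng.choice(USER_AGENTS)},  # declare as a browser
        proxy=rng.choice(PROXIES),                        # rotate the public IP
        verify_tls=True,                                  # keep SSL/TLS checks on
        delay_seconds=delay,                              # adjusted crawl period
    )


def plan_round(urls, rng=random, **kwargs):
    """Plan one crawling round, visiting the target sites in random order."""
    shuffled = list(urls)
    rng.shuffle(shuffled)
    return [plan_fetch(u, rng=rng, **kwargs) for u in shuffled]
```

Passing a seeded `random.Random` instance as `rng` makes a round reproducible for testing, while the default module-level generator gives a different order, proxy and delay on every run.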
                     Since there are risks arising from layout changes on the target websites
                  and from the security measures they deploy, a monitoring agent is used to
                  ensure that all components of the crawler engine run well. The monitoring
                  agent sends alerts and notifications to us when issues occur, such as no
                  data being captured from a specific website. Our developers then have to
                  check and analyse the cause of the issue and resolve it as soon as
                  possible. Data crawling and scraping are now running heavily for all 50
                  sources. Because all tasks are performed with native programming, a high
                  level of maintenance by skilled and experienced technical personnel is
                  needed. DOSM’s technical personnel, in collaboration with industry
                  personnel, monitor and maintain the crawling and scraping processes, with
                  10 persons working per shift.
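
The monitoring rule described above, raising an alert when a source returns no data, can be illustrated with a short sketch. It is not the agent used by DOSM: the record type `CrawlResult`, the function `find_alerts` and the message wording are assumptions made for illustration, and the delivery of the alert (e-mail, chat, dashboard) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlResult:
    """Outcome of one crawl of one source (hypothetical record layout)."""
    source: str
    records_captured: int
    error: Optional[str] = None


def find_alerts(results, expected_sources):
    """Return one alert message per source that needs a developer's attention."""
    latest = {r.source: r for r in results}
    alerts = []
    for source in expected_sources:
        result = latest.get(source)
        if result is None:
            # The crawler never reported back for this source.
            alerts.append(f"{source}: no crawl result reported")
        elif result.error:
            # The crawler ran but failed, e.g. it was blocked by the site.
            alerts.append(f"{source}: crawler error: {result.error}")
        elif result.records_captured == 0:
            # The crawler ran without error but scraped nothing, which is
            # the typical symptom of a changed page layout.
            alerts.append(f"{source}: no data captured")
    return alerts
```

Checking against the list of expected sources, rather than only the results received, means a source that silently drops out of the schedule is also reported.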

                  3.  Results
                     All the processes above resulted in the development of the Price
                  Intelligence module in DOSM. A total of 100 selected items are included
                  in the visualisation of this module. The items are among those most highly
                  consumed by households in the country. Among the items taken into
                  consideration are fruits, vegetables, fish, seafood and food away from
                  home. The expected outputs of Price Intelligence include comparisons of
                  prices and price changes between online and existing data; average price
                  for Malaysia, by state and by strata; and the most expensive and cheapest
                  price by item and by location. The suggested visualisations for survey
                  data are Price frequency, Product price distribution by CPI category,
                  Trend of average price by category, Trend of average price by state, Price
                  distribution by data type, Average price by state and Price mode. The
                  suggested visualisations for online data are Price frequency by item at
                  the six-digit Classification of Individual Consumption According to
                  Purpose (COICOP) code, Trend of average price by outlet category,
                  Distribution of average price, Trend of average price by area, Price
                  distribution by state, Average price by state and Price mode. In the data
                  visualisation, data can be filtered at the 2-, 4- and 6-digit levels of
                  the CPI classification. The value created by this Price Intelligence
                  platform is holistic information on both online and offline prices.
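
Three of the outputs named above (filtering at the 2-, 4- and 6-digit levels, average price by state and price mode) amount to simple aggregations, sketched below. The sketch assumes that a lower-level code is a prefix of the six-digit code, which is how hierarchical classification codes are commonly filtered. The records, codes and prices are invented for illustration and are not DOSM data, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, mode

# Invented records: (six-digit code, state, price). Not real codes or prices.
RECORDS = [
    ("011410", "Selangor", 4.50),
    ("011410", "Johor", 4.00),
    ("011410", "Selangor", 4.50),
    ("011610", "Selangor", 7.50),
    ("021100", "Johor", 12.00),
]


def filter_by_level(records, prefix):
    """Keep the records whose code starts with a 2-, 4- or 6-digit prefix."""
    if len(prefix) not in (2, 4, 6):
        raise ValueError("prefix must have 2, 4 or 6 digits")
    return [r for r in records if r[0].startswith(prefix)]


def average_price_by_state(records):
    """Map each state to the mean of its observed prices."""
    prices = defaultdict(list)
    for _code, state, price in records:
        prices[state].append(price)
    return {state: mean(values) for state, values in prices.items()}


def price_mode(records):
    """Return the most frequently observed price."""
    return mode(price for _code, _state, price in records)
```

For example, `average_price_by_state(filter_by_level(RECORDS, "0114"))` restricts the records to one four-digit group before averaging, so the same two functions serve every level of the filter.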



                                                                     357 | I S I   W S C   2 0 1 9