Page 47 - Contributed Paper Session (CPS) - Volume 6

CPS1490 Nehall Ahmed Farouk Mohamed



                           Big data predictive analytics using machine
                                   learning for official statistics
                                  Nehall Ahmed Farouk Mohamed
                      Central Agency for Public Mobilization and Statistics (CAPMAS), Egypt

            Abstract
            Big data is now one of the most debated topics in the statistical field.
            Researchers, data scientists, and statisticians have worked to define it and
            to evaluate its impact on the statistical work of national statistical offices
            (NSOs). Big data will not only improve the quality of the official statistics
            that NSOs produce, but also support predictive analytics in many sectors.
            These predictive analytics will contribute substantially to national,
            international, and global development, because the priority now is not only
            to know the current situation but to know what the future will be. Big data
            predictive analytics has two main families of techniques, distinguished by
            methodology: regression techniques and machine learning techniques. The
            paper starts by discussing big data projects, applications, and the challenges
            that arise from the nature of big data. It then focuses on big data analytics,
            especially predictive analytics: it illustrates the challenges that the
            characteristics of big data create for predictive analytics and proposes
            methods to overcome them. The use of machine learning methodologies in
            predictive analytics is an important topic, and the paper investigates its
            different techniques. The paper shows that deep learning is the most effective
            machine learning technique for big data predictive analytics and presents
            some of its features and challenges. After that, the paper presents
            suggestions for addressing machine learning challenges in big data predictive
            analytics. It also presents modified prediction models applied to real-life
            hospital data collected in central China from 2013 to 2015. Finally, it shows
            substantial progress in machine learning in this respect in dealing with
            incompleteness, missing values, parallel data and parallel models, the
            training data, and storage and central processing.

            Keywords
            Big data and official statistics; Machine learning and big data; Deep learning;
            Official statistics

            1. Introduction
               Defining the term big data has been in great demand in recent years.
            Big data is considered to be a very large, rapid stream of data with three
            main characteristics: volume, variety, and velocity. However, big data as


                                                                36 | I S I   W S C   2 0 1 9