Page 47 - Contributed Paper Session (CPS) - Volume 6
CPS1490 Nehall Ahmed Farouk Mohamed
Big data predictive analytics using machine
learning for official statistics
Nehall Ahmed Farouk Mohamed
Central Agency for Public Mobilization and Statistics (CAPMAS), Egypt
Abstract
Big data is now one of the most debated topics in the statistical field.
Researchers, data scientists, and statisticians have worked to define it and to
evaluate its effect on statistical work at the various national statistical
offices (NSOs). Big data will not only improve the quality of NSOs' official
statistics but also support predictive analytics in many sectors. These
predictive analytics will contribute greatly to national, international, and
even global development, because the aim now is not merely to know the current
situation but to know what the future will look like. Big data predictive
analytics has two main families of techniques, distinguished by methodology:
regression techniques and machine learning techniques. The paper starts by
discussing big data projects, applications, and the challenges that arise from
the nature of big data. It then focuses on big data analytics, especially
predictive analytics: it illustrates the challenges that the characteristics of
big data create for predictive analytics and proposes methods to overcome them.
Using machine learning methodologies in predictive analytics is an important
point to present, and the paper investigates its different techniques. The
paper shows that deep learning is the most effective machine learning technique
for big data predictive analytics and presents some of its features and
challenges. After that, the paper presents suggestions for dealing with and
solving machine learning challenges in big data predictive analytics. It also
presents modified prediction models applied to real-life hospital data
collected in central China from 2013 to 2015. Finally, it shows substantial
progress in machine learning in this respect through dealing with
incompleteness, missing values, parallel data and parallel models, the training
data, and storage and central processing.
Keywords
Big data and official statistics; Machine learning and big data; Deep learning;
Official statistics
1. Introduction
Defining the term big data has been in great demand in recent years. Big data
is considered to be a very large, rapid stream of data with three main
characteristics: volume, variety, and velocity. However, big data as
36 | ISI WSC 2019