Page 51 - Contributed Paper Session (CPS) - Volume 6
CPS1490 Nehall Ahmed Farouk Mohamed
of the outcome variables. The two main techniques based on the
methodology are regression techniques and machine learning (ML)
techniques. The techniques based on the type of the outcome variables
depend on whether the variables are continuous or discrete, for
example linear regression and random forest, respectively.
There are four main characteristics of big data that need investigation
when dealing with predictive analytics. The first is heterogeneity,
which comes as a result of different data sources or different
populations. This can be overcome by making use of the huge size of
the data, which might almost represent the population, through
sophisticated techniques. From this paper's author's point of view, one
way to overcome heterogeneity is to divide the huge data set into
strata and to create a pattern for each stratum. As the characteristics
of the target groups within each stratum will be similar, each stratum
will be homogeneous. The second characteristic is error accumulation,
which arises when patterns are estimated simultaneously. As a
consequence, some parameters that affect the model might be treated as
accumulated error. This might be overcome by dealing with each pattern
separately before the simultaneous estimation, in order to be able to
define the significant variables for each model, although this might be
quite difficult. The third characteristic is spurious correlation, where
independent variables appear to be correlated as the size of the data
increases, according to the study of (Fan and Lv 2008). However,
classifying each stream of data (granularity) and analysing it
separately might solve this. For example, the analysis of Traffic Loop
Detection Data worked on data for a certain time: a crowded hour in the
morning or evening was assumed to have the same characteristics and
analysis (Piet J.H. Daas et al. 2015). The fourth is incidental
endogeneity, which “refers to a genuine relationship between variables
and the error term” (Amir Gandomi and Murtaza Haider 2015).
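
The stratum-based approach to heterogeneity described above can be
sketched as follows. This is a minimal illustration, not the author's
implementation: the stratum labels ("urban", "rural"), the toy records,
and the helper names fit_line and fit_per_stratum are all hypothetical,
and a simple least-squares line stands in for whatever pattern would be
fitted to each stratum in practice.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_per_stratum(records):
    """Group (stratum, x, y) records by stratum and fit one line per group."""
    groups = defaultdict(list)
    for stratum, x, y in records:
        groups[stratum].append((x, y))
    return {name: fit_line(points) for name, points in groups.items()}


# Hypothetical toy data: two strata whose x-y relationships differ.
records = (
    [("urban", x, 10 + 2 * x) for x in range(5)]
    + [("rural", x, 30 - 1 * x) for x in range(5)]
)

# One model for the whole heterogeneous data set ...
pooled = fit_line([(x, y) for _, x, y in records])
# ... versus one model ("pattern") per homogeneous stratum.
by_stratum = fit_per_stratum(records)

print("pooled (intercept, slope):", pooled)
for name, (intercept, slope) in by_stratum.items():
    print(name, "(intercept, slope):", (intercept, slope))
```

In this toy data the pooled fit has a slope of 0.5, which matches
neither group, whereas the per-stratum fits recover each group's own
relationship (slopes of 2 and -1). Real strata would rarely be this
clean, so the sketch only shows the mechanics of the idea.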
2.3. Section 3: Machine learning and big data predictive analytics
These days NSOs and governments think not only about the current state of
affairs in the different fields; there is also a strong trend towards
predicting the future. Predicting the future is important, whether it is based
on structured or unstructured data, big data or data from statistics. The
previous section presented the main challenges that exist in big data
predictive analytics and mentioned different ways to overcome them. Briefly,
the ways of overcoming the challenges of big data predictive analytics are:
using many patterns, granularity, and parallelism. Going in depth in using
machine learning in big data
40 | ISI WSC 2019