Page 103 - Contributed Paper Session (CPS) - Volume 3
CPS1954 Vincent C. et al.
denotes a DP with concentration parameter $\alpha > 0$ and base distribution $G_0$.
Here, one possible choice of $G_0$ is the normal-inverse-Wishart distribution.
To obtain meaningful results in any classification problem, we often
require that $\theta_i = \theta_j$ for some $i \neq j$, so that we avoid the case in which every
observation in the dataset forms a cluster of its own, i.e. $K = N$. The DP prior exhibits such a
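clustering property. Integrating out $G$ from (2.4), Blackwell and MacQueen
(1973) show that the conditional prior distribution induced on $\theta_i$ follows a
Pólya urn scheme
$$\theta_i \mid \theta_1, \ldots, \theta_{i-1} \sim \frac{1}{\alpha + i - 1} \sum_{j=1}^{i-1} \delta_{\theta_j} + \frac{\alpha}{\alpha + i - 1}\, G_0, \qquad (2.5)$$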
where $\delta_{\theta_j}$ is the point measure at $\theta_j$. The first parameter $\theta_1$ is generated by
drawing a sample from the base distribution $G_0$. Subsequent samples are then
obtained by setting $\theta_i$ to be either a random draw from the current pool of
parameters $\{\theta_1, \ldots, \theta_{i-1}\}$ with probability proportional to $i - 1$ or a new
sample from $G_0$ with probability proportional to $\alpha$. Since random draws from
a continuous $G_0$ have zero probability of being identical, a large value of $\alpha$
gives rise to a larger set of unique parameters $\{\theta_1^*, \ldots, \theta_K^*\}$ in $\{\theta_1, \ldots, \theta_N\}$. Teh
(2011) shows that for $N, \alpha \gg 0$,
$$E[K \mid N] \simeq \alpha \log\left(1 + \frac{N}{\alpha}\right),$$
indicating that the mean of $K$ is data driven for a fixed concentration
parameter $\alpha$ and that it scales logarithmically with the size of the data $N$. The clustering
effect of the DP that results from the Pólya urn scheme in (2.5) makes it a popular
option for modelling multimodal distributions without having to specify the
number of components explicitly.
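As an illustration only (it is not part of the analysis in this paper), the urn scheme can be simulated directly to see how the number of unique parameters grows with the sample size. The minimal Python sketch below tracks cluster sizes rather than parameter values, so no base distribution needs to be specified. The function names and the values of the concentration parameter and sample size are hypothetical choices made for this example. The simulated mean is compared with the exact expectation, the sum over draws of the probability of opening a new cluster, and with the large-sample approximation $\alpha \log(1 + N/\alpha)$.

```python
import math
import random


def polya_urn_cluster_sizes(n, alpha, rng):
    """Simulate n draws from a Polya urn and return the cluster sizes.

    Only cluster membership is tracked; the parameter values themselves
    (draws from the base distribution) are not needed to count clusters.
    """
    sizes = []  # sizes[k] = number of draws that share unique value k
    for i in range(1, n + 1):
        # A new value is drawn with probability alpha / (alpha + i - 1).
        # For i = 1 this probability is 1, so the first draw is always new.
        if rng.random() < alpha / (alpha + i - 1):
            sizes.append(1)
        else:
            # Otherwise copy one of the i - 1 earlier draws uniformly,
            # i.e. join cluster k with probability sizes[k] / (i - 1).
            k = rng.choices(range(len(sizes)), weights=sizes)[0]
            sizes[k] += 1
    return sizes


def expected_clusters_exact(n, alpha):
    """Exact mean number of clusters: sum of new-cluster probabilities."""
    return sum(alpha / (alpha + i - 1) for i in range(1, n + 1))


def expected_clusters_approx(n, alpha):
    """Large-sample approximation alpha * log(1 + n / alpha)."""
    return alpha * math.log(1 + n / alpha)


if __name__ == "__main__":
    alpha, n, reps = 5.0, 1000, 200  # illustrative values only
    rng = random.Random(2019)
    mean_k = sum(
        len(polya_urn_cluster_sizes(n, alpha, rng)) for _ in range(reps)
    ) / reps
    print("simulated mean number of clusters:", round(mean_k, 2))
    print("exact expectation:", round(expected_clusters_exact(n, alpha), 2))
    print("approximation:", round(expected_clusters_approx(n, alpha), 2))
```

Rerunning the sketch with a larger concentration parameter should yield more clusters for the same sample size, while increasing the sample size for a fixed concentration parameter should increase the cluster count only slowly, in line with the logarithmic scaling described above.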
We have thus far treated the knot location vector as predetermined and
fixed across all children in the population. However, this is unrealistic in our
context, as different children react differently to treatment interventions, such
as the administration of vitamins, or to negative experiences, such as infections,
which will likely occur at different time points. The heterogeneity in the timing
of treatment interventions or the occurrence of insults will likely cause
individual trajectories to change course at different time points. Furthermore,
fixing the knot locations results in a biased estimate of the growth velocity in the broken
stick model, as the regression lines of two neighbouring segments are
connected at the internal knot. This would then affect the classification model
because we summarise the growth pattern of children by the child-specific broken stick coefficients. Therefore, a
sensible approach is to model the knot locations within the interval $[0, T]$ as
child-specific random effects $\xi_i = (\xi_{i1}, \ldots, \xi_{iq})$ whose distribution is
expressed by
92 | ISI WSC 2019