Page 141 - Invited Paper Session (IPS) - Volume 2
IPS192 Hukum C. et al.
we use a subscript $i$ to index those quantities associated with area $i$. In
particular, $n_i$ and $N_i$ are used to represent the sample and population sizes in
area $i$, respectively. We also assume that the underlying unit level variable of
interest $y$ is discrete, and in particular is either a binary value or a
non-negative integer, and the aim is to estimate the corresponding small area
population proportions or population totals (i.e. counts). Let the total of $y$ in
area $i$ be denoted $y_i$, and let $y_{si}$ and $y_{ri}$ denote the corresponding sample and
non-sample counts for area $i$, respectively. We shall assume that area level
auxiliary information from secondary data sources, e.g., Census and
Administrative records, is available. Let $\mathbf{x}_i$ be the p-vector of these covariates
for area $i$ from these sources. The area level version of the GLMM is then
defined as $\Pr(y_{si} \mid u_i) \propto \pi_i$, with
$$g(\pi_i) = \eta_i = \mathbf{x}_i^T \boldsymbol{\beta} + u_i, \qquad (1)$$
where g(·) is a known function, called the link function, $\pi_i = g^{-1}(\eta_i)$, $\boldsymbol{\beta}$ is the
p-vector of regression coefficients, often referred to as the fixed effect
parameter of the GLMM, and $u_i \sim N(0, \sigma_u^2)$. The model (1) can be used to relate
the area level direct survey estimates to area level covariates. This type of
model is often referred to as an 'area-level' model in SAE (Fay and Herriot, 1979).
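As a minimal numerical sketch of model (1), the Python snippet below builds the linear predictor $\eta_i = \mathbf{x}_i^T \boldsymbol{\beta} + u_i$ and maps it back to $\pi_i = g^{-1}(\eta_i)$. The logit link, the number of areas, the covariate values, and the values of $\boldsymbol{\beta}$ and $\sigma_u$ are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

def logit(p):
    """Assumed link function g: maps a probability to the real line."""
    return np.log(p / (1.0 - p))

def expit(eta):
    """Inverse of the logit link, g^{-1}: maps the real line to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

def area_probabilities(X, beta, u):
    """Return pi_i = g^{-1}(x_i^T beta + u_i) for every area i."""
    eta = X @ beta + u          # linear predictor, one entry per area
    return expit(eta)

# Hypothetical inputs, chosen only for illustration.
rng = np.random.default_rng(0)
n_areas = 5
X = np.column_stack([np.ones(n_areas), rng.normal(size=n_areas)])  # intercept + one covariate
beta = np.array([-1.0, 0.5])                                       # assumed fixed effects
sigma_u = 0.3                                                      # assumed random-effect s.d.
u = rng.normal(0.0, sigma_u, size=n_areas)                         # u_i ~ N(0, sigma_u^2)

pi = area_probabilities(X, beta, u)
```

Applying `logit` to `pi` recovers the linear predictor `X @ beta + u`, which is the sense in which g(·) links the area level probability to the covariates and the area random effect.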
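Stacking the area level models (1) over all areas, we can write them in matrix form as
$$g(\boldsymbol{\pi}) = \boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}, \qquad (2)$$
where $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_D)^T$, $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_D)^T$ is a $D \times p$ matrix and
$\mathbf{u} = (u_1, \ldots, u_D)^T$ is a $D \times 1$ vector of area random effects which is normally
distributed with mean zero and variance $\boldsymbol{\Sigma}_u = \sigma_u^2 \mathbf{I}_D$. Here, $\mathbf{I}_D$ is an identity
matrix of order $D$. When the variable of interest is binary, and unit level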
values in area $i$ are independently and identically distributed, the sample
count $y_{si}$ in area $i$ has a Binomial distribution with parameters $n_i$ and $\pi_i$,
denoted by $y_{si} \sim \mathrm{Binomial}(n_i, \pi_i)$, where $\pi_i$ is now the probability of
occurrence of an event or probability of prevalence in area $i$, often referred to
as the probability of a 'success'. Similarly, the non-sample count $y_{ri}$ in area $i$
is such that $y_{ri} \sim \mathrm{Binomial}(N_i - n_i, \pi_i)$. That is, the counts $y_{si}$ and $y_{ri}$ are
independent Binomial variables with $\pi_i$ corresponding to a common
success probability. In this case, the link function g(·) is usually taken to be the
logit of the probability $\pi_i$. The model (1) linking $\pi_i$ with the covariates $\mathbf{x}_i$ is
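then the GLMM with logistic link function given by
$\mathrm{logit}(\pi_i) = \ln\{\pi_i (1 - \pi_i)^{-1}\} = \eta_i = \mathbf{x}_i^T \boldsymbol{\beta} + u_i$,
with $\pi_i = \exp(\eta_i)\{1 + \exp(\eta_i)\}^{-1} = \mathrm{expit}(\eta_i) = \mathrm{expit}(\mathbf{x}_i^T \boldsymbol{\beta} + u_i)$
and $u_i \sim N(0, \sigma_u^2)$. Here, $y_{si} \mid u_i \sim \mathrm{Binomial}(n_i, \mathrm{expit}(\mathbf{x}_i^T \boldsymbol{\beta} + u_i))$
and $y_{ri} \mid u_i \sim \mathrm{Binomial}(N_i - n_i, \mathrm{expit}(\mathbf{x}_i^T \boldsymbol{\beta} + u_i))$. The expected values of
$y_{si}$ and $y_{ri}$ given $u_i$ are then $\mu_{si} = E(y_{si} \mid u_i) = n_i \, \mathrm{expit}(\mathbf{x}_i^T \boldsymbol{\beta} + u_i)$ and
$\mu_{ri} = E(y_{ri} \mid u_i) = (N_i - n_i) \, \mathrm{expit}(\mathbf{x}_i^T \boldsymbol{\beta} + u_i)$. The population count in area $i$ can be
expressed as $y_i = y_{si} + y_{ri}$, where the first term $y_{si}$, the sample count, is
known whereas the second term $y_{ri}$, the non-sample count, is unknown. A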
128 | ISI WSC 2019