Page 288 - Contributed Paper Session (CPS) - Volume 7

CPS2102 Iris Reinhard
In health services research it is common to encounter data characterized
by an abundance of zeros in combination with a continuous distribution for
positive values. Examples include health care utilization, health care
expenditures or food consumption in a dietary study. In the first case the point
mass at zero represents a population of ‘non-users’ who therefore produce no
costs, while the continuous distribution represents the level of costs for those
people who use health services. To identify factors that may influence
outcomes of this type, the two-part model is considered, because it
accommodates both the discrete and the continuous features of the data.
Its performance is investigated in a simulation study.

2. Methodology
Semicontinuous data can be regarded as a result of two stochastic
processes, one for the occurrence of zeros and the second for the observed
value given a non-zero response (Neelon et al., 2016). The two-part model
is a natural choice for the analysis of such data, because it is a
mixture model with two basic components:
- Component 1: models the probability that a positive outcome occurs
(binary outcome model).
- Component 2: regresses the intensity or amount of the non-zero
outcomes.
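
The two components can be illustrated with a minimal sketch, assuming no covariates and a normal distribution for the non-zero part. The function names `simulate_two_part` and `split_components` and all parameter values are hypothetical choices for illustration, not taken from the paper.

```python
import random
from statistics import mean


def simulate_two_part(n, p_positive, mu, sigma, seed=0):
    """Draw n semicontinuous outcomes (illustrative values only).

    Each outcome is 0 with probability 1 - p_positive, otherwise
    a draw from Normal(mu, sigma).
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n):
        if rng.random() < p_positive:
            outcomes.append(rng.gauss(mu, sigma))  # non-zero amount
        else:
            outcomes.append(0.0)  # point mass at zero
    return outcomes


def split_components(outcomes):
    """Separate the data used by each component of the two-part model."""
    # Component 1: binary indicator of a non-zero outcome.
    indicators = [1 if y != 0 else 0 for y in outcomes]
    # Component 2: the amounts, restricted to non-zero outcomes.
    amounts = [y for y in outcomes if y != 0]
    return indicators, amounts


y = simulate_two_part(n=5000, p_positive=0.3, mu=100.0, sigma=15.0)
d, positives = split_components(y)

p_hat = mean(d)            # estimated probability of a non-zero outcome
mu_hat = mean(positives)   # estimated mean amount among non-zero outcomes
print(round(p_hat, 3), round(mu_hat, 1))
```

Without covariates, the two estimates are simply the share of non-zero outcomes and the mean of the non-zero amounts, which is why the two parts can be handled separately in this simplest case.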
A normal distribution can be chosen to model the non-zero values, leading to

where Yij denotes the outcome for subject i at time tij (i = 1, ..., n;
j = 1, ..., m) from a finite mixture (e.g. the cost of inpatient stays). The
normal assumption can be relaxed by using alternative distributions for the
non-zero outcomes, e.g. the lognormal or gamma distribution.
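
For orientation, a two-part mixture with a normal non-zero part is commonly written in the following form. This is a sketch under stated assumptions rather than the paper's own display: the symbols \pi_{ij} (probability of a non-zero outcome), \mu_{ij} and \sigma^2 (mean and variance of the non-zero part) and the indicator d_{ij} are notation introduced here.

```latex
f(y_{ij}) = (1 - \pi_{ij})^{1 - d_{ij}}
            \left[ \pi_{ij}\, \mathcal{N}\!\left(y_{ij};\, \mu_{ij}, \sigma^{2}\right) \right]^{d_{ij}},
\qquad d_{ij} = \mathbf{1}(y_{ij} \neq 0)
```

Relaxing the normal assumption amounts to replacing the normal density in the bracket by, for example, a lognormal or gamma density.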
The logistic-normal two-part model can be written as

Extending the two-part model to the regression setting, two sets of
covariates/predictors can be modelled simultaneously. Let xij be a vector of
potential risk factors for the intensity of the outcome and zij another vector of
risk factors that are linked with the probability of having a positive response.
Then the model for longitudinal semicontinuous data is parameterized as
follows
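
One parameterization of this kind, given here only as a hedged sketch, uses a logit link for the occurrence part and an identity link for the intensity part. The coefficient vectors \alpha and \beta, the subject-level random intercepts a_i and b_i, and their covariance matrix \Sigma are assumptions introduced for illustration; the random intercepts are a common device for longitudinal data and are not confirmed by the text above.

```latex
\begin{aligned}
\operatorname{logit}\, \Pr(Y_{ij} \neq 0 \mid a_i) &= z_{ij}^{\top}\alpha + a_i, \\
\mathrm{E}(Y_{ij} \mid Y_{ij} \neq 0,\, b_i) &= x_{ij}^{\top}\beta + b_i, \\
(a_i, b_i)^{\top} &\sim \mathcal{N}_2(0, \Sigma).
\end{aligned}
```

In this form, zij enters only the occurrence equation and xij only the intensity equation, matching the roles of the two covariate vectors described above. A non-zero off-diagonal element of \Sigma would let the two parts be correlated within a subject.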

275 | ISI WSC 2019
