Page 288 - Contributed Paper Session (CPS) - Volume 7

CPS2102 Iris Reinhard
In health services research it is common to encounter data characterized by
an abundance of zeros combined with a continuous distribution for the
positive values. Examples include health care utilization, health care
expenditures and food consumption in a dietary study. In the case of health
care costs, the point mass at zero represents a population of ‘non-users’,
who therefore incur no costs, while the continuous distribution represents
the level of costs among people who use health services. To identify factors
that may influence outcomes of this type, the two-part model, which
accommodates both the discrete and the continuous features of the data, is
considered and investigated in a simulation study.

2. Methodology
Semicontinuous data can be regarded as the result of two stochastic
processes: one for the occurrence of zeros and a second for the observed
value given a non-zero response (Neelon et al., 2016). The two-part model is
therefore a natural choice for analysing such data, because it is a mixture
model with two components:
    -   Component 1: models the risk of a positive outcome (binary outcome
        model).
    -   Component 2: models the intensity, or amount, of the non-zero
        outcomes.
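The two-process view can be made concrete with a small simulation. The Python sketch below assumes a constant probability `p_nonzero` of a non-zero value (Component 1) and a normal distribution for the amount (Component 2). The function name and the parameter values are hypothetical and are not taken from the paper's simulation study.

```python
import random


def simulate_semicontinuous(n, p_nonzero, mu, sigma, seed=0):
    """Simulate n values from two processes (hypothetical helper).

    Component 1: a Bernoulli(p_nonzero) draw decides whether the value is non-zero.
    Component 2: a non-zero value is drawn from N(mu, sigma^2).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    values = []
    for _ in range(n):
        if rng.random() < p_nonzero:  # Component 1: a non-zero value occurs
            values.append(rng.gauss(mu, sigma))  # Component 2: its amount
        else:
            values.append(0.0)  # point mass at zero ("non-users")
    return values


costs = simulate_semicontinuous(n=5000, p_nonzero=0.6, mu=500.0, sigma=100.0)
share_zero = sum(1 for y in costs if y == 0.0) / len(costs)
print(f"share of zeros: {share_zero:.3f}")  # should be near 1 - p_nonzero = 0.4
```

Because the two draws are separate, the share of zeros depends only on `p_nonzero`, while the level of the non-zero values depends only on `mu` and `sigma`.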
               A normal distribution can be chosen to model the non-zero values, leading to




where $Y_{ij}$ denotes the outcome for subject $i$ at time $t_{ij}$
($i = 1, \dots, n$; $j = 1, \dots, m$) from a finite mixture (e.g. the cost of
inpatient stays). The normal assumption can be relaxed by using alternative
distributions for the non-zero outcomes, such as the lognormal or gamma
distribution.
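To illustrate how the choice of distribution for the non-zero values enters the model, the Python sketch below evaluates the log-likelihood of a two-part model with constant parameters. A zero contributes $\log(1-p)$ and a non-zero value contributes $\log p$ plus the log density of the chosen distribution. The function names, the constant (covariate-free) parameters and the sample values are assumptions made for illustration; only the normal and lognormal cases are shown.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) at y."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2.0 * sigma ** 2)


def lognormal_logpdf(y, mu, sigma):
    """Log density at y > 0 of Y when log(Y) ~ N(mu, sigma^2)."""
    return normal_logpdf(math.log(y), mu, sigma) - math.log(y)


def two_part_loglik(values, p_nonzero, mu, sigma, nonzero_logpdf=normal_logpdf):
    """Log-likelihood of a two-part model with constant parameters (hypothetical helper).

    A zero contributes log(1 - p_nonzero); a non-zero value y contributes
    log(p_nonzero) plus the log density of the chosen distribution at y.
    """
    total = 0.0
    for y in values:
        if y == 0.0:
            total += math.log(1.0 - p_nonzero)
        else:
            total += math.log(p_nonzero) + nonzero_logpdf(y, mu, sigma)
    return total


sample = [0.0, 0.0, 120.0, 95.0, 310.0]
ll_normal = two_part_loglik(sample, 0.6, 175.0, 100.0)
# For the lognormal version, mu and sigma are on the log scale.
ll_lognormal = two_part_loglik(sample, 0.6, 5.0, 0.6, nonzero_logpdf=lognormal_logpdf)
```

Since the log-likelihood splits into a term in `p_nonzero` and a term in the parameters of the non-zero distribution, swapping the normal for a lognormal (or, analogously, a gamma) density only changes the second term.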
               The logistic-normal two-part model can be written as




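For reference, one common way to write a logistic-normal two-part model is given below. The notation is assumed here rather than taken from the paper: $c_{ij} = I(Y_{ij} \neq 0)$ indicates a non-zero outcome, $p_{ij} = \Pr(Y_{ij} \neq 0)$, $\mu_{ij}$ and $\sigma^2$ are the mean and variance of the non-zero outcomes, and $\phi$ is the standard normal density.

$$
\Pr(Y_{ij} = 0) = 1 - p_{ij}, \qquad
Y_{ij} \mid Y_{ij} \neq 0 \sim N(\mu_{ij}, \sigma^2),
$$

$$
f(y_{ij}) = (1 - p_{ij})^{1 - c_{ij}}
\left[ \frac{p_{ij}}{\sigma}\, \phi\!\left( \frac{y_{ij} - \mu_{ij}}{\sigma} \right) \right]^{c_{ij}},
\qquad
\operatorname{logit}(p_{ij}) = \log \frac{p_{ij}}{1 - p_{ij}}.
$$

In this form, "logistic" refers to modelling $p_{ij}$ on the logit scale and "normal" to the distribution of the non-zero values.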
Extending the two-part model to the regression setting allows two sets of
covariates (predictors) to be modelled simultaneously. Let $\mathbf{x}_{ij}$ be
a vector of potential risk factors for the intensity of the outcome and
$\mathbf{z}_{ij}$ another vector of risk factors associated with the
probability of a positive response. The model for longitudinal
semicontinuous data is then parameterized as follows:
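When the two parts share no parameters (in particular, no subject-level random effects), the two-part likelihood factorizes. The logistic part can then be fitted to the zero/non-zero indicator and the linear part to the non-zero values alone. The Python sketch below illustrates this under strong simplifying assumptions: one covariate in each part, the parameterization $\operatorname{logit}(p_{ij}) = \mathbf{z}_{ij}^\top \boldsymbol{\gamma}$ and $\mu_{ij} = \mathbf{x}_{ij}^\top \boldsymbol{\beta}$ (assumed, not taken from the paper), and independent observations. It therefore ignores the within-subject correlation that a longitudinal model has to handle. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_two_part_regression(n, gamma, beta, sigma, seed=1):
    """Hypothetical generator: logit(p_i) = gamma[0] + gamma[1]*z_i and,
    given a non-zero response, Y_i ~ N(beta[0] + beta[1]*x_i, sigma^2)."""
    rng = random.Random(seed)
    x, z, y = [], [], []
    for _ in range(n):
        xi, zi = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(gamma[0] + gamma[1] * zi)))
        yi = rng.gauss(beta[0] + beta[1] * xi, sigma) if rng.random() < p else 0.0
        x.append(xi)
        z.append(zi)
        y.append(yi)
    return x, z, y


def fit_logistic(z, c, iterations=25):
    """Newton-Raphson for logit(p) = g0 + g1*z (Component 1)."""
    g0 = g1 = 0.0
    for _ in range(iterations):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for zi, ci in zip(z, c):
            p = 1.0 / (1.0 + math.exp(-(g0 + g1 * zi)))
            w = p * (1.0 - p)
            s0 += ci - p          # score for the intercept
            s1 += (ci - p) * zi   # score for the slope
            h00 += w              # entries of the information matrix
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * step = score for the 2x2 case
        g0 += (h11 * s0 - h01 * s1) / det
        g1 += (h00 * s1 - h01 * s0) / det
    return g0, g1


def fit_linear(x, y):
    """Ordinary least squares for mu = b0 + b1*x (Component 2)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def fit_two_part(x, z, y):
    """Fit both parts separately; valid only when they share no parameters."""
    c = [1 if yi != 0.0 else 0 for yi in y]
    gamma_hat = fit_logistic(z, c)
    nonzero = [(xi, yi) for xi, yi, ci in zip(x, y, c) if ci == 1]
    beta_hat = fit_linear([xi for xi, _ in nonzero], [yi for _, yi in nonzero])
    return gamma_hat, beta_hat


x, z, y = simulate_two_part_regression(4000, gamma=(0.5, 1.0), beta=(10.0, 2.0), sigma=1.0)
gamma_hat, beta_hat = fit_two_part(x, z, y)
print("gamma:", gamma_hat, "beta:", beta_hat)
```

Models that link the two parts through correlated subject-level random effects, as is common for longitudinal data, cannot be fitted by this separation and require joint estimation.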





275 | ISI WSC 2019