Page 287 - Contributed Paper Session (CPS) - Volume 7

CPS2102 Iris Reinhard

                             Too many zeros? Are two-part models a good
                             choice for the analysis of longitudinal data in
                                          health care research?
                                               Iris Reinhard
                  Biostatistics, Central Institute of Mental Health, Medical Faculty Mannheim / Heidelberg
                                             University, Germany

               Abstract
               In health care research it is common to encounter data characterized by a
               spike at zero followed by a continuous distribution for the positive values.
               Examples include health care utilization and health care expenditures, or food
               consumption in a dietary study. In the first case the point mass at zero
               represents a population of ‘non-users’ who therefore have no costs, while the
               continuous distribution represents the level of costs for those people who use
               health services. Statistical analyses that aim to understand the influence of
               therapies, programs, and demographic and disease-related variables therefore
               need alternative approaches that accommodate both the discrete and the
               continuous features of the data. To identify factors that may influence
               semicontinuous longitudinal data, a two-part model based on a two-stage
               design is considered. The first stage models the risk that a positive outcome
               occurs, and the second stage models the intensity or amount of the nonzero
               outcomes. Within this model, two sets of covariates or factors, each
               contributing to a separate stage, can be modelled simultaneously.
               The hierarchical structure of the data is accounted for by including random
               effects. In a simulation study, the performance of this model is evaluated in
               terms of the type I error and the mean squared error (MSE) of the estimates,
               under different sample sizes and different levels of correlation between
               covariates and between random effects. The data generation process is
               based on the distributional characteristics of an empirical data set from
               a controlled prospective intervention study investigating the
               cost-effectiveness of an intervention to reduce compulsory admission to inpatient
               psychiatric treatment. Finally, the results are compared with those of conventional linear
               mixed models. The proposed two-part model performs well in the analysis
               of semicontinuous health care data that reflect the structure of real
               cost-effectiveness data, and its performance improves with increasing sample
               size. The classical linear mixed model is discouraged because it produces
               inflated type I error rates and much higher MSEs than the two-part model.
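
The data-generating idea behind such a simulation can be illustrated with a minimal sketch. It is not the author's implementation. The logit link for the zero part, the lognormal distribution for the positive part, the random-intercept-only structure, and all names and parameter values (`simulate_two_part`, `beta_zero`, `beta_pos`, `rho`, and so on) are assumptions made for illustration only. The two helper functions show how an empirical type I error rate and an MSE could be computed from repeated simulation runs.

```python
import math
import random


def simulate_two_part(n_subjects=200, n_visits=4, beta_zero=(-0.5, 0.8),
                      beta_pos=(6.0, 0.3), sd_a=1.0, sd_b=0.5, rho=0.5,
                      sd_eps=0.7, seed=1):
    """Simulate semicontinuous longitudinal data (illustrative assumptions).

    Part 1: logit P(y_ij > 0)       = beta_zero[0] + beta_zero[1] * x_i + a_i
    Part 2: log(y_ij) given y_ij > 0 = beta_pos[0] + beta_pos[1] * x_i + b_i + eps_ij
    (a_i, b_i) are subject-level random intercepts with correlation rho.
    Returns a list of (subject, visit, x, y) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_subjects):
        x = i % 2  # hypothetical binary covariate, e.g. study arm
        # Build two correlated normal random intercepts from independent draws.
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = sd_a * z1
        b = sd_b * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        for j in range(n_visits):
            # Part 1: probability of a positive outcome at this visit.
            p = 1 / (1 + math.exp(-(beta_zero[0] + beta_zero[1] * x + a)))
            if rng.random() < p:
                # Part 2: lognormal amount for a positive outcome.
                y = math.exp(beta_pos[0] + beta_pos[1] * x + b
                             + rng.gauss(0, sd_eps))
            else:
                y = 0.0
            rows.append((i, j, x, y))
    return rows


def mse(estimates, truth):
    """Mean squared error of simulated estimates around the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def rejection_rate(p_values, alpha=0.05):
    """Share of simulation runs rejecting the null hypothesis at level alpha.

    When the data are generated under the null, this estimates the type I error.
    """
    return sum(p < alpha for p in p_values) / len(p_values)


if __name__ == "__main__":
    rows = simulate_two_part()
    share_zero = sum(1 for r in rows if r[3] == 0.0) / len(rows)
    print(f"share of zeros: {share_zero:.2f}")
```

In a full simulation, each generated data set would be analysed with both the two-part model and a linear mixed model, and the resulting estimates and p-values would be passed to `mse` and `rejection_rate`. The model fitting itself is outside the scope of this sketch.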

               Keywords
               Health care expenditures; semicontinuous data; zero modified data; linear
               mixed model; simulation study
               1.  Introduction
                                                                  274 | ISI WSC 2019