Page 287 - Contributed Paper Session (CPS) - Volume 7
CPS2102 Iris Reinhard
Too many zeros? Are two-part models a good
choice for the analysis of longitudinal data in
health care research?
Iris Reinhard
Biostatistics, Central Institute of Mental Health, Medical Faculty Mannheim / Heidelberg
University, Germany
Abstract
In health care research it is common to encounter data characterized by a
spike at zero followed by a continuous distribution for the positive values.
Examples include health care utilization and health care expenditures, or food
consumption in a dietary study. In the first case the point mass at zero
represents a population of ‘non-users’ who therefore have no costs, while the
continuous distribution represents the level of costs for those people who use
health services. To understand the influence of therapies, programs, and
demographic and disease-related variables, statistical analyses need alternative
approaches that accommodate both the discrete and the continuous features
of the data. To identify factors that may influence semicontinuous
longitudinal data, a two-part model based on a two-stage design is considered.
The first stage models the risk of a positive outcome occurring, and the
second stage models the intensity or amount of the nonzero outcomes. Within
this model, two sets of covariates or factors, each contributing to a separate
stage, can be modelled simultaneously.
The hierarchical structure of the data is accounted for by including random
effects. In a simulation study the performance of this model is evaluated in
terms of type I error and the mean squared error (MSE) of the estimates, under
different levels of sample size and correlation between covariates as well as
correlation between random effects. The data generation process is based on
the distributional characteristics of an empirical data set from a controlled
prospective intervention study investigating the cost-effectiveness of an
intervention to reduce compulsory admission to inpatient psychiatric
treatment. Finally, the results are compared with conventional linear
mixed models. The proposed two-stage model performs well for the analysis
of semicontinuous health care data which represent the structure of real cost-
effectiveness data. Performance improves with increasing sample size. The
classical linear mixed model is discouraged because it produces inflated
type I error rates and much higher MSEs than the two-part model.
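The two-stage structure described in the abstract can be illustrated with a minimal data-generating sketch. This is not the simulation design of the study: the logit link for the occurrence part, the log-normal distribution for the positive amounts, the single binary covariate, and all parameter values are assumptions chosen for illustration, and the function name simulate_two_part is hypothetical.

```python
import math
import random


def simulate_two_part(n_subjects, n_visits, alpha, beta,
                      sd_a, sd_b, rho, sigma, seed=0):
    """Simulate semicontinuous longitudinal outcomes (illustrative assumptions).

    Part 1 (occurrence): logit P(Y_ij > 0) = alpha[0] + alpha[1] * x_i + a_i
    Part 2 (amount):     log(Y_ij) | Y_ij > 0 ~ Normal(beta[0] + beta[1] * x_i + b_i, sigma^2)
    (a_i, b_i) are subject-level random intercepts with correlation rho.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_subjects):
        x = i % 2  # hypothetical binary covariate, e.g. intervention vs. control
        # Build correlated random intercepts from two independent standard normals.
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = sd_a * z1
        b = sd_b * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        for j in range(n_visits):
            # Stage 1: does a positive outcome occur at this visit?
            p_positive = 1 / (1 + math.exp(-(alpha[0] + alpha[1] * x + a)))
            if rng.random() < p_positive:
                # Stage 2: amount of the positive outcome on the log scale.
                y = math.exp(rng.gauss(beta[0] + beta[1] * x + b, sigma))
            else:
                y = 0.0
            rows.append((i, j, x, y))
    return rows


rows = simulate_two_part(n_subjects=200, n_visits=4,
                         alpha=(-0.5, 0.8), beta=(7.0, 0.3),
                         sd_a=1.0, sd_b=0.5, rho=0.5, sigma=1.0)
share_zero = sum(1 for r in rows if r[3] == 0.0) / len(rows)
print(f"share of zeros: {share_zero:.2f}")
```

Each row holds the subject index, the visit index, the covariate, and the outcome. The zeros arise only from the first stage, and the positive values only from the second, which is why separate covariate sets can act on each stage while the correlated random intercepts link the two parts within a subject.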
Keywords
Health care expenditures; semicontinuous data; zero modified data; linear
mixed model; simulation study
1. Introduction
274 | ISI WSC 2019