The World of Zero-Inflated Models

  • Volume 1: Using GLM (2021)
  • Volume 2: Using GLMM (2024)
  • Volume 3: Using Multivariate GLMM and GLLVM (2024)

 


 

Volume 1: Using GLM

Volume 1: Table of Contents, and PDF of Chapters 1 and 2.

In Chapter 2 we revise data exploration and multiple linear regression using red knot data. Stable isotope ratios of nitrogen in animal tissues are modelled as a function of three covariates. This chapter serves as a blueprint for all other chapters, as it shows the general outline of a statistical analysis.

Chapter 3 starts with a revision of the Poisson distribution and the Poisson GLM for the analysis of count data, using a small puffin data set. We also introduce the negative binomial (NB) GLM and two relatively unknown, but useful, members of the family: the generalised Poisson (GP) GLM and the Conway-Maxwell-Poisson (CMP) GLM. Surprisingly, the GP and CMP GLMs tend to perform better than the NB GLM in the case of overdispersion, and they can also deal with underdispersion. Most models are fitted with the glmmTMB package in R. Model validation tools are explained, and we introduce the concept of simulating data from a fitted model to verify whether the observed data comply with the assumptions of the model. We first do the simulation steps ourselves, then quickly migrate to the DHARMa package, which is rapidly gaining popularity.
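The idea of simulating from a fitted model can be illustrated with a minimal base R sketch. It is not code from the book: the data are simulated stand-ins (not the puffin data), and the object names (`dat`, `m1`, `zeros_sim`) are hypothetical. The sketch fits a Poisson GLM, calculates a Pearson-based dispersion statistic, and compares the observed number of zeros with the number of zeros in data sets simulated from the model.

```r
# Simulated stand-in data (hypothetical; not the puffin data from the book)
set.seed(123)
n <- 200
x <- runif(n)
# Counts with extra zeros, so that a plain Poisson GLM should struggle
y <- rpois(n, lambda = exp(0.5 + 1.0 * x)) * rbinom(n, size = 1, prob = 0.6)
dat <- data.frame(y = y, x = x)

# Fit a Poisson GLM
m1 <- glm(y ~ x, family = poisson, data = dat)

# Pearson-based dispersion statistic (values well above 1 suggest overdispersion)
e1 <- resid(m1, type = "pearson")
dispersion <- sum(e1^2) / (nrow(dat) - length(coef(m1)))

# Simulate 1000 data sets from the fitted model and count the zeros in each
ysim <- simulate(m1, nsim = 1000)
zeros_sim <- colSums(ysim == 0)
zeros_obs <- sum(dat$y == 0)

# Proportion of simulated data sets with at least as many zeros as observed
p_zeros <- mean(zeros_sim >= zeros_obs)

cat("Dispersion statistic:", round(dispersion, 2), "\n")
cat("Observed zeros:", zeros_obs, " Mean simulated zeros:", round(mean(zeros_sim), 1), "\n")
cat("Proportion of simulations with >= observed zeros:", p_zeros, "\n")
```

If the observed number of zeros lies far in the tail of the simulated values, the model cannot reproduce the zeros in the data. The DHARMa package offers `simulateResiduals()` and `testZeroInflation()` for this type of check on a fitted model.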

In Chapter 4 we introduce zero-inflated models for count data, which are fitted with the glmmTMB package. We start with a basic introduction using simulated data, and discuss zero-inflated Poisson (ZIP), zero-inflated NB (ZINB), zero-inflated generalised Poisson (ZIGP) and zero-inflated CMP (ZICMP) models. We then apply them all to the puffin data set. In Chapter 5 we analyse data on parasites in Brazilian sandperch. Such data nearly always bring you within zero-inflation territory. Now that we are familiar with Poisson, NB, GP and CMP models, and their zero-inflated cousins, it is time to learn how we can manoeuvre among them. How do we decide whether to apply an NB GLM or a ZIP model? In this chapter, we keep the binary part of the model simple. Chapter 6 is about ZIGP models, using data on mistletoe tree infections. The ZIGP models contain covariates in both the count and binary parts of the model.
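To make the structure of a ZIP model concrete, here is a small sketch under stated assumptions: the data are simulated, the parameter values (`pi0`, the regression coefficients) are arbitrary, and all object names are hypothetical. In a ZIP model an observation is a structural zero with probability `pi0`, and otherwise a Poisson count with mean `mu`, which gives P(Y = 0) = pi0 + (1 - pi0) * exp(-mu) and E(Y) = (1 - pi0) * mu. The glmmTMB call at the end is only executed if the package is installed.

```r
set.seed(42)
n <- 500
x <- runif(n)
mu <- exp(0.5 + 1.0 * x)   # mean of the count part (arbitrary coefficients)
pi0 <- 0.3                 # probability of a structural (excess) zero

# A structural zero with probability pi0, otherwise a Poisson draw
is_structural_zero <- rbinom(n, size = 1, prob = pi0)
y <- ifelse(is_structural_zero == 1, 0, rpois(n, lambda = mu))
dat <- data.frame(y = y, x = x)

# Theoretical ZIP quantities for each observation
p_zero_zip <- pi0 + (1 - pi0) * exp(-mu)   # probability of observing a zero
mean_zip <- (1 - pi0) * mu                 # expected value

cat("Observed proportion of zeros:", mean(dat$y == 0), "\n")
cat("ZIP-implied proportion of zeros:", round(mean(p_zero_zip), 3), "\n")
cat("Observed mean:", round(mean(dat$y), 3), " ZIP-implied mean:", round(mean(mean_zip), 3), "\n")

# Fit a ZIP GLM with an intercept-only binary part, if glmmTMB is available
if (requireNamespace("glmmTMB", quietly = TRUE)) {
  m_zip <- glmmTMB::glmmTMB(y ~ x,
                            ziformula = ~ 1,
                            family = poisson,
                            data = dat)
  print(summary(m_zip))
}
```

In glmmTMB, replacing `family = poisson` with the `nbinom2`, `genpois` or `compois` family gives the ZINB, ZIGP and ZICMP counterparts, and covariates can be added to the binary part via `ziformula`.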

Hurdle models for count data are discussed in Chapter 7 using dolphin sighting data. A hurdle model is fitted in three steps. First, the sighting abundances are converted into presence/absence data, and a Bernoulli GLM is applied. Second, the zero counts are set to NA (or dropped), and a truncated Poisson (or NB) GLM is applied to the remaining positive counts. Third, the two components are combined to calculate the expected values of the hurdle model. Chapter 7 is relatively long as it covers many relevant topics: Bernoulli GLM, quasi-separation, truncated Poisson and NB distributions, and zero-altered Poisson (ZAP) and zero-altered NB (ZANB) models.
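The three steps can be sketched in base R as follows. This is an illustration under assumptions, not the code from the book: the data are simulated (not the dolphin data), all names are hypothetical, and the truncated Poisson part is fitted by maximising its log-likelihood with `optim()` rather than with glmmTMB. With `pi` the probability of a non-zero count and `mu` the mean of the untruncated Poisson distribution, the expected value of the hurdle model is E(Y) = pi * mu / (1 - exp(-mu)).

```r
set.seed(1)
n <- 400
x <- runif(n)
pi_true <- plogis(-0.5 + 1.5 * x)   # probability of a non-zero count
mu_true <- exp(0.3 + 0.8 * x)       # mean of the untruncated Poisson

# Draw from a zero-truncated Poisson by inverting the CDF on (P(Y = 0), 1)
rtpois <- function(n, mu) qpois(runif(n, min = dpois(0, mu), max = 1), mu)
present <- rbinom(n, size = 1, prob = pi_true)
dat <- data.frame(y = ifelse(present == 1, rtpois(n, mu_true), 0), x = x)

# Step 1: presence/absence data and a Bernoulli GLM
dat$presence <- as.integer(dat$y > 0)
m_bern <- glm(presence ~ x, family = binomial, data = dat)

# Step 2: drop the zeros and fit a zero-truncated Poisson GLM (log link)
pos <- dat[dat$y > 0, ]
nll_tpois <- function(beta, y, x) {
  mu <- exp(beta[1] + beta[2] * x)
  # log f(y | y > 0) = log dpois(y, mu) - log(1 - exp(-mu))
  -sum(dpois(y, mu, log = TRUE) - log(1 - exp(-mu)))
}
fit_count <- optim(c(0, 0), nll_tpois, y = pos$y, x = pos$x, method = "BFGS")
beta_hat <- fit_count$par

# Step 3: combine both parts into the expected values of the hurdle model
pi_hat <- predict(m_bern, type = "response")
mu_hat <- exp(beta_hat[1] + beta_hat[2] * dat$x)
expected <- pi_hat * mu_hat / (1 - exp(-mu_hat))

cat("Mean of observed counts:", round(mean(dat$y), 3), "\n")
cat("Mean of hurdle expected values:", round(mean(expected), 3), "\n")
```

The glmmTMB package provides `truncated_poisson` and `truncated_nbinom2` families, so step 2 does not need a hand-written likelihood in practice.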

In the last two chapters of this volume, we discuss models for the analysis of continuous data with an excessive number of zeros. Lobster biomass is analysed using Tweedie GLMs in Chapter 8, and a zero-altered gamma (ZAG) model is applied to the same data in Chapter 9. The ZAG is a hurdle model for continuous data. Our recommendation is to opt for the Tweedie GLM approach.

 

Volume 2: Using GLMM

Volume 2: Table of Contents

Although this book is published under the umbrella of 'The World of Zero-Inflated Models', it also provides a good introduction to ordinary linear mixed-effects models and GLMMs.

Chapter 11 contains an extensive explanation of linear mixed-effects models. Originally, we used a dataset of bears and ants, but after discovering that the covariates explained only 2% of the variation, we decided to completely rewrite this chapter with a different dataset on painted turtles. At that point, we had forgotten that the chapter on zero-inflated binomial GLMMs also uses painted turtle data. So, we hope you like turtles. In Chapter 12, we first introduce the Poisson GLMM using a squirrel dataset and discuss marginal and conditional predicted values. The chapter also covers zero-inflated Poisson and generalised Poisson GLMMs. A zero-inflated Poisson GLMM is applied to a humphead fisheries dataset in Chapter 13.

In Chapter 14, we discuss how to handle nested and crossed random effects, as well as auto-correlation, using a zero-inflated dataset of tree hyrax counts. A detailed explanation of zero-inflated binomial GLMMs is provided in Chapter 15, where we use a dataset on painted turtles and also touch upon beta-binomial models. Chapter 16 uses beta GLMMs, zero-inflated beta GLMMs, zero-altered beta GLMMs, and ordered beta GLMMs for the analysis of zero-inflated caribou data. Finally, Chapter 17 presents an application of the Tweedie GLMM to zero-inflated biomass data.

 

Volume 3: Using Multivariate GLMM and GLLVM

Text to follow

 

Data and R code

All data is freely available. All the R code is provided as well, but the zip files with the R code are password protected. The password is given in the Preface of each book.

Volume 1: Using GLM

Volume 2: Using GLMM

Volume 3: Using Multivariate GLMM and GLLVM

  • All data for Volume 3: In prep
  • All R code for Volume 3: In prep

We used the R Markdown files from the book and removed all text except for the blocks with R code and the section headers (so you can see where you are in a chapter).

  • Unzip the files. The zip file with the R code is password protected. The password is in the Preface of each book.
  • Click on one of the Rmd files. It will open in RStudio.
  • Set the working directory with the setwd() function.
  • Option 1 to execute the code: Click on the knitting symbol in RStudio (the blue ball with the needle through it).
  • Option 2 to execute the code: Click on the green triangle of each so-called chunk.
  • Option 3 to execute the code: If you do not fancy R Markdown code, then you can also copy-paste the R code within the chunks into an ordinary R file (you can also extract the R code from an R Markdown document automatically, see this link).
  • Option 4: Copy all the R Markdown code into ChatGPT, and ask it to strip the chunk delimiters so that only plain R code remains.
  • Send us an email in case of errors.
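For the automatic extraction mentioned under Option 3, one possible route is the `purl()` function from the knitr package. The sketch below assumes that knitr is installed and uses a tiny temporary Rmd file as a stand-in. The file names are hypothetical and are not those of the book files.

```r
# Build a tiny R Markdown file as a stand-in for one of the chapter files
fence <- strrep("`", 3)   # chunk delimiter, built here to keep this example readable
rmd_file <- tempfile(fileext = ".Rmd")
r_file <- tempfile(fileext = ".R")
writeLines(c("# Section header",
             "",
             "Some explanatory text.",
             "",
             paste0(fence, "{r}"),
             "x <- 1 + 1",
             fence),
           rmd_file)

# Extract the R code from the chunks into an ordinary R file
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::purl(input = rmd_file, output = r_file, documentation = 0, quiet = TRUE)
  cat(readLines(r_file), sep = "\n")
}
```

For a real chapter file, replace `rmd_file` with the path to the Rmd file and `r_file` with the name of the R file to create.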