Overdispersion: Models and Estimation

A Short Course for SINAPE 1998

 

John Hinde

MSOR Department, Laver Building, University of Exeter,

North Park Road, Exeter, EX4 4QE, UK

EMail: J.P.Hinde@exeter.ac.uk

Fax: +44 1392 264460

 

Clarice G.B. Demétrio

Departamento de Matemática e Estatística, ESALQ, USP

Caixa Postal 9

13418-900 Piracicaba, SP

EMail: Clarice@carpa.ciagri.usp.br

Fax: 019 4294346

 

Level: Masters

 

Outline

1. Introduction

1.1. Models for proportion and count data

1.1.1. Binomial regression models

1.1.2. Poisson regression models

1.2. Overdispersion: causes and consequences

1.2.1. Overdispersion in GLMs

1.2.2. Causes of overdispersion

1.2.3. Consequences of overdispersion

1.3. Examples

1.3.1. Example: Germination of Orobanche seed

1.3.2. Example: Worldwide airline fatalities

2. Overdispersion models

2.1. Mean-variance models

2.1.1. Binomial data

2.1.2. Count data

2.2. Two-stage models

2.2.1. Binomial data

2.2.2. Count data

3. Estimation methods

3.1. Maximum likelihood

3.1.1. Beta-binomial distribution

3.1.2. Negative binomial distribution

3.1.3. Random effect in the linear predictor

3.2. Maximum quasi-likelihood

3.3. Extended quasi-likelihood

3.4. Pseudo-likelihood

3.5. Moment methods

3.6. Non-parametric maximum likelihood

3.7. Bayesian approach

4. Model selection and diagnostics

4.1. Model selection

4.1.1. Testing overdispersion

4.1.2. Selecting covariates

4.2. Diagnostics

5. Examples

5.1. Binary data

5.1.1. Orobanche germination data

5.1.2. Trout egg data

5.1.3. Rat survival data

5.1.4. Smoking and fecundability data

5.2. Count data

5.2.1. Pump failure data

5.2.2. Fabric fault data

5.2.3. Quine's data

6. Extended overdispersion models

6.1. Random effect models

6.2. Double exponential family

6.3. Generalized linear mixed models

 

 

Abstract

 

In applying standard generalized linear models it is often found that the data exhibit greater variability than is predicted by the implicit mean-variance relationship. This phenomenon of overdispersion has been widely studied in the literature, particularly in relation to the binomial and Poisson distributions. Failure to take account of overdispersion can lead to serious underestimation of standard errors and misleading inference for the regression parameters. Consequently, a number of models and associated estimation methods have been proposed for handling such data. Overdispersion models for discrete data are considered and placed in a general framework. A distinction is made between completely specified models and those with only a mean-variance specification. Different formulations of the overdispersion mechanism can lead to different variance functions, which can be placed within a general family. In addition, many different estimation methods have been proposed, including maximum likelihood, moment methods, extended quasi-likelihood, pseudo-likelihood and non-parametric maximum likelihood. We explore the relationships between these methods and examine their application to a number of standard examples of count and proportion data. A simple graphical method based on half-normal plots is used to assess the adequacy of competing overdispersion models.

 

Keywords: Generalized linear models; Overdispersion; Binomial; Beta-binomial; Poisson; Negative-binomial; Maximum likelihood; Moment method; Extended quasi-likelihood; Pseudo-likelihood; Diagnostics.
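The central point of the abstract, that data can vary more than the assumed mean-variance relationship allows and that ignoring this understates standard errors, can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' code. It draws counts either from a Poisson distribution or from a Poisson-gamma mixture (a negative binomial with variance mu + mu^2/k). It then fits an intercept-only Poisson log-linear model and computes the moment (Pearson) estimate phi_hat = X^2/(n - p) of the dispersion parameter. Finally it scales the naive Poisson standard error by sqrt(phi_hat), as a quasi-likelihood fit with variance function phi*mu would. The sample size, mean mu = 5, shape k = 2, seed and all function names (poisson_draw, simulate_counts, pearson_dispersion, intercept_se) are hypothetical choices made only for this example.

```python
import math
import random


def poisson_draw(mu, rng):
    """Draw one Poisson(mu) variate by Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(n, mu, k=None, seed=1):
    """Hypothetical data generator: Poisson(mu) counts if k is None, otherwise a
    Poisson-gamma mixture with mean mu and variance mu + mu**2 / k."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        # Gamma with shape k and scale mu/k has mean mu and variance mu**2 / k.
        lam = mu if k is None else rng.gammavariate(k, mu / k)
        counts.append(poisson_draw(lam, rng))
    return counts


def pearson_dispersion(counts):
    """Moment estimate X^2 / (n - p) for an intercept-only Poisson fit (p = 1),
    whose fitted mean is simply the sample mean."""
    n = len(counts)
    mu_hat = sum(counts) / n
    x2 = sum((y - mu_hat) ** 2 / mu_hat for y in counts)
    return x2 / (n - 1)


def intercept_se(counts, phi=1.0):
    """SE of the log-mean intercept, sqrt(phi / (n * mean)).
    phi = 1 gives the plain Poisson value; phi > 1 gives the scaled (quasi-likelihood) SE."""
    n = len(counts)
    mu_hat = sum(counts) / n
    return math.sqrt(phi / (n * mu_hat))


if __name__ == "__main__":
    for label, k in [("Poisson", None), ("Poisson-gamma, k=2", 2.0)]:
        y = simulate_counts(2000, mu=5.0, k=k)
        phi = pearson_dispersion(y)
        print(f"{label}: phi_hat={phi:.2f}, naive SE={intercept_se(y):.4f}, "
              f"scaled SE={intercept_se(y, phi):.4f}")
```

For the Poisson sample phi_hat should be close to 1. For the mixture it should be near 1 + mu/k = 3.5, so the naive Poisson standard error is too small by a factor of roughly sqrt(3.5), about 1.9.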