# Generalized linear (mixed) models, autumn 2022

## Messages

February 16: Note to self (or whoever takes over this course).

December 2: Preliminary solution to today's exam.

November 25: I have booked F3 from 9:00 to 15:00 on Wednesday, November 30 for a "spørredag" (question day) before the exam. I'll be available to answer any questions you have before the exam. If I'm not there, send me an email or a text message (99705519).

You may want to read the front page of the exam now. Although R and RStudio are available on the PCs at the exam facilities, there will be no need for R beyond looking up quantiles or probabilities associated with certain distributions that are not given in Tabeller og formler i statistikk.

If you plan to use the Wacom Pen Display, make sure you are familiar with Microsoft OneNote, which you need to use to write your answers. The exam office has also made this guide. Alternatively, you can write on paper, so bring a pencil and an eraser.

A general tip for the exam is to make sure that you're able to state the assumptions of different models in suitable, concise and precise mathematical notation (different possibilities are discussed in several places in the lecture notes). For example, describing a logistic regression model by stating that $\operatorname{logit}y_i = \beta_0 + \beta_1 x_1 + \epsilon_i$ would be wrong for several reasons. Why?

## Practical information

Lectures: See Course information.

In addition we will provide guidance for the projects via the Discourse forum and at the computer lab (Banachrommet) on the third floor.

We plan to do physical lectures only, but the lecture notes will be made available.

Lecturer: Jarle Tufto

Teaching assistant: Philip Mostert.

Reference group: NN, NN, NN (send me an email if you want to be in the reference group).

## Tentative curriculum

Fahrmeir et al. (2013) (freely available on SpringerLink), ch. 2.1-2.4, B.4, 5.1-5.4, 5.8.2, 6, 7.1-7.3, 7.5, 7.7. We will also use some material from Wood (2015), and some material from Harville (1974), Agresti (2002) and Kristensen et al. (2016) (see below).

This covers ordinary linear and multiple regression (mostly repetition from TMA4267 Linear statistical models), binary regression, Poisson and gamma regression, the exponential family and generalised linear models in general, categorical regression (includes contingency tables and log-linear models, multinomial and ordinal regression), linear mixed effects models, generalized linear mixed effects models.

The lectures.

Also see the official ntnu course info.

## Obligatory projects

There will be three obligatory projects (counting 30% towards the final grade). We will provide assistance with the projects in Banachrommet each Wednesday between 14:15 and 15:00 (or a bit longer if needed).

The report should be a self-contained document written in English or Norwegian as a proper mathematical text, and should include R code with corresponding output. The report should be structured with sections corresponding to the different subtasks 1a, 1b and so on.

## Lectures (tentative plan)

August 24: Introduction to GLMs (ch. 2.1-2.3), the exponential family (ch. 5.4.1).

August 25: More on the exponential family (ch. 5.4.1). Review of theory of linear models (ch. 3).

August 31: Geometric views of the linear model. Sampling distributions associated with the linear model (the chi-square-, t- and F-distribution).

September 1: Testing and fitting linear hypotheses (via quadratic form for $C\hat\beta-d$ - Box 3.13 in Fahrmeir) or via F-test based on sums of squares for each model alternative (the restricted model fitted via Lagrange method (pp. 172-173) or using the solution to problem 2 below). Design matrices for interactions between numeric and categorical covariates.
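As a pointer to the notation, the quadratic-form statistic of Box 3.13 can be sketched as follows (with $r$ the number of rows of $C$ and $\hat\sigma^2$ the unbiased error variance estimate under the full model):

$$F = \frac{(C\hat\beta - d)^T \left[C(X^T X)^{-1} C^T\right]^{-1} (C\hat\beta - d)}{r\,\hat\sigma^2} \sim F_{r,\,n-p} \quad\text{under } H_0\colon C\beta = d.$$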

September 7: Binary regression (ch. 5). Logit, probit and cloglog link functions. The score function of the binary regression model.

September 8: Some general properties of the expected log likelihood (sec. 4.1 in Wood (2015)). Expected and observed Fisher information and iterative computation of MLEs for binary regression (Fisher scoring algorithm). Binary regression continued.
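As a compact reminder, the Fisher scoring iteration takes the form (with $s(\beta)$ the score and $F(\beta)$ the expected Fisher information)

$$\beta^{(t+1)} = \beta^{(t)} + F(\beta^{(t)})^{-1} s(\beta^{(t)}).$$

Newton-Raphson instead uses the observed information $-H(\beta^{(t)})$ in place of $F(\beta^{(t)})$; for canonical links the two coincide.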

September 14: A minimal example of divergence of the Fisher scoring algorithm. Asymptotic properties of MLEs. Likelihood ratio, Wald, and score tests. Deviance and testing goodness-of-fit.

September 15: More on the deviance and the saturated model. Deviance residuals. Estimating the overdispersion parameter. We'll also go through a sketch of a proof for the asymptotic distribution of LR test statistic (section 4.4 in Wood).

September 21,22: No lectures, work on project 1.

September 28: Example (lung cancer rates) illustrating model selection via AIC, model parsimony, Wald and likelihood ratio testing. Theory behind AIC (Wood sec. 4.6). Poisson regression. Fisher scoring vs. Newton-Raphson for Poisson regression with non-canonical identity link (see R code for further illustrations).

September 29: Poisson regression with identity link continued: constraints on $\beta$. Gamma and lognormal regression. GLMs in general and IRLS (ch. 5.4, 5.8.2). Generalised least squares via QR-decomposition.

October 5: Quasi-likelihood models (ch. 5.5). Estimating the overdispersion parameter. The F-test between nested alternatives (see Venables and Ripley 2002, eq. 7.10). Linear separation. Offset variables, profile likelihood confidence intervals.
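As a sketch of the test between nested alternatives (check the exact form against Venables and Ripley's eq. 7.10): with a smaller model with $p_0$ parameters and deviance $D_0$, a larger model with $p_1$ parameters and deviance $D_1$, and estimated dispersion $\hat\phi$ under the larger model,

$$F = \frac{(D_0 - D_1)/(p_1 - p_0)}{\hat\phi},$$

which is compared to the $F_{p_1-p_0,\,n-p_1}$ distribution.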

October 6: Multinomial models (VGLMs). Multinomial models as discrete choice latent utility models.

October 12: Ordinal regression models. General treatment of multinomial VGLMs.

October 13: Work with project 2

October 19: Linear mixed models (LMMs) (Ch. 7.1 and 7.2)

October 20: Mixed models continued. ML and restricted likelihood based on $\mathbf{A}\mathbf{y}$ (REML) estimation (7.3). Bayesian/marginal likelihood interpretation of the restricted likelihood (Harville 1974).

October 26: Mixed models continued. The connection between the profile and restricted likelihood (7.3.2) (again, see Harville 1974). Hypothesis testing (7.3.4) (for details, see Stram and Lee 1996).
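The connection between the profile and restricted likelihoods can be stated compactly: up to an additive constant, the restricted log likelihood equals the profile log likelihood minus a penalty term,

$$\ell_R(\vartheta) = \ell_P(\vartheta) - \tfrac{1}{2}\log\left|X^T V(\vartheta)^{-1} X\right| + \text{const},$$

where $\vartheta$ denotes the variance parameters and $V(\vartheta)$ the marginal covariance of $y$ (see Harville 1974 for the details).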

October 27: Mixed models continued: Conditional distributions/BLUPs of random effects (7.3.1, 7.3.3).

November 2: Generalized linear mixed models (GLMMs). Differences between individual/cluster- and population-level effect sizes (i.e. marginal vs. conditional models) including approximation in Agresti, 2002, eq. 12.8.
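Roughly, the approximation in Agresti (2002, eq. 12.8) for a logistic random intercept model says that marginal (population-averaged) effects are attenuated relative to conditional (cluster-specific) effects:

$$\beta^{\text{marginal}} \approx \frac{\beta}{\sqrt{1 + c^2\sigma_b^2}}, \qquad c = \frac{16\sqrt{3}}{15\pi},$$

where $\sigma_b^2$ is the random intercept variance; check the exact statement against Agresti.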

November 3: Methods of inference for GLMMs: Laplace approximation (Kristensen et al. 2016) of the marginal likelihood and its computation via automatic differentiation (see e.g. Wood ch. 5.5.3) as opposed to numerical and symbolic differentiation. Restricted likelihood (REML) for GLMMs and its Laplace approximation (available in the R package glmmTMB). Numerical integration of the marginal likelihood via adaptive Gauss-Hermite quadrature (Liu and Pierce, 1994).
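As a compact reminder of what the Laplace approximation does: for random effects $b$ of dimension $q$ with joint density $f(y\mid b)f(b)$, the marginal likelihood and its approximation are

$$L = \int f(y\mid b)\,f(b)\,db \approx (2\pi)^{q/2}\,\bigl|H(\hat b)\bigr|^{-1/2} f(y\mid \hat b)\,f(\hat b),$$

where $\hat b$ maximizes the integrand and $H(\hat b)$ is minus the Hessian of $\ln\bigl(f(y\mid b)f(b)\bigr)$ evaluated at $\hat b$.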

November 8, 9, 16, 17: No lectures, instead work on project 3. The exercise class at Banachrommet goes as planned.

November 23, 24: Summary of the course. I'll go through the exam in 2019, problem 2, ST2304 2013 problem 3, and ST2304 August 2015 problems e, f, g.

If stuck, post a message describing what you have tried in the Discourse forum and we will help you. We will usually not post complete solutions.

1. Determine whether or not the gamma, the inverse Gaussian, the negative binomial and the uniform distribution on $(0,\theta)$ belong to the exponential family (Fahrmeir, ch. 5.4.1) and find their mean and variance via the general formulas $\operatorname{E}Y=b'(\theta)$ and $\operatorname{Var}Y=\frac{\phi}{w} b''(\theta)$.
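As a warm-up (not one of the distributions asked for above), the Poisson distribution can be written in Fahrmeir's form: with $\theta=\ln\lambda$, $b(\theta)=e^\theta$ and $\phi=w=1$,

$$f(y\mid\theta) = \exp\bigl(y\theta - e^\theta - \ln y!\bigr),$$

so that $\operatorname{E}Y=b'(\theta)=e^\theta=\lambda$ and $\operatorname{Var}Y=b''(\theta)=\lambda$, as expected.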

1b. Show that the third central moment of a random variable $X$ equals the third cumulant.

1c. Show that the zero-truncated Poisson distribution, that is, the distribution of $X\sim \operatorname{Poisson}(\lambda)$ conditional on $X>0$, belongs to the exponential family.

2. For the linear model $Y=X\beta + \epsilon$, one method for estimating the parameters under a linear hypothesis $C\beta = d$ ($r$ linear constraints on the $p$ model parameters) is given in Fahrmeir, pp. 172-173. Show that an alternative approach is to rewrite the model into an alternative form $Y' = X_0\beta_0+\epsilon$ where $X_0$ has dimension $n \times (p-r)$, $\beta_0$ has dimension $(p-r) \times 1$, and $Y'$ equals the original $Y$ minus some offset vector. Hint: https://en.wikipedia.org/wiki/Block_matrix#Block_matrix_multiplication.

2b. When testing nested linear model alternatives, show that the $F$-test is equivalent to (leads to the same conclusions as) a test using the likelihood ratio as the test statistic.

3. Assume that the $n\times p$ matrix $X$ has rank $p$ and let $H=X(X^T X)^{-1}X^T$. Show that $H$ and $I-H$ are both idempotent and symmetric matrices. Use this to show that the eigenvalues of both matrices are either 1 or 0.

4. Suppose that $Y$ is Bernoulli distributed with parameter $q$ and that $X|Y=i \sim N(\mu_i,\sigma^2)$ for $i=0,1$. Show that the logit of the conditional probability $P(Y=1|X=x)$ can be written as $\beta_0 + \beta_1 x$. Find $\beta_0,\beta_1$ expressed in terms of $q,\mu_0,\mu_1,\sigma$.

5. In R, fit the probit model to the juul data (the help page ?juul describes the data) using

library(ISwR) # Install the package if needed
data(juul)
juul$menarche <- factor(juul$menarche, labels=c("No","Yes"))
juul.girl <- subset(juul,age>8 & age<20 & complete.cases(menarche))
attach(juul.girl)
plot(age, menarche)
detach(juul.girl)

and compute estimates of mean $\mu$ and standard deviation $\sigma$ of the underlying normal distribution of the latent time of menarche from estimates of $\beta_0$ and $\beta_1$ of the glm. Also find approximate estimates of standard errors of $\hat \mu$ and $\hat \sigma$ using the delta method.
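A minimal sketch of one way to organize the computation (assuming a probit fit to the data above; the latent-variable relations $\mu=-\beta_0/\beta_1$ and $\sigma=1/\beta_1$ used below should be verified as part of the exercise):

```r
# Sketch, assuming juul.girl has been constructed as above
fit <- glm(menarche ~ age, family = binomial(link = "probit"), data = juul.girl)
beta <- coef(fit)                        # (beta0, beta1)
mu.hat    <- -beta[1] / beta[2]          # latent mean:  mu = -beta0/beta1
sigma.hat <-  1 / beta[2]                # latent sd:    sigma = 1/beta1
# Delta method: Var(g(beta.hat)) is approximately J %*% vcov(fit) %*% t(J),
# where J is the Jacobian of g(beta) = (mu, sigma)
J <- rbind(c(-1 / beta[2], beta[1] / beta[2]^2),  # d(mu)/d(beta0), d(mu)/d(beta1)
           c(0,           -1 / beta[2]^2))        # d(sigma)/d(beta0), d(sigma)/d(beta1)
se <- sqrt(diag(J %*% vcov(fit) %*% t(J)))        # approximate standard errors
```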

5b. For linear models, the regression can be forced through the origin by omitting the first column of the design matrix $\mathbf{X}$. Is there any way to force a binomial regression using the logit link through the origin such that $p_i=0$ for $x_i=0$?

5c. If using the logit instead of the probit link in problem 5, what implicit assumption are we making about the underlying distribution of time of menarche? Compute the mean and standard deviation of the latent times for this alternative model. Which estimates do you trust the most?

5d. Show that the cloglog choice of link function for problem 5, combined with log-transforming age, would correspond to the assumption that the underlying latent age of menarche has a Weibull distribution.

7. A model is identifiable if there is a one-to-one mapping between the parameter vector $\beta$ and the distribution of the data. If $X$ has rank $p-1$, why is a glm not identifiable? In such cases, what shape will the log likelihood surface $l(\beta)$ take? (Consider the no-intercept model $\eta_i = \beta_1 x_{i1} + \beta_2 x_{i2}$ as an example.) Why is the observed Fisher information then only positive semi-definite, and how does this agree with the shape of the likelihood surface?

8. For a glm with the canonical choice of link function, write the log likelihood in exponential family form and derive the observed Fisher information $H(\beta)$. Assuming that $X$ has full rank, is $H(\beta)$ positive definite? Based on this, what can we say about the overall shape of the log likelihood function? How many optima are possible in the interior of the parameter space?

9. For the linear model $y = X\beta + \epsilon$, $\epsilon \sim N(0,\sigma^2 I_n)$ where $X$ has rank $p<n$, derive the restricted log likelihood $l_R(\sigma^2)$ based on the error contrasts $w = A^T y$ where $A$ satisfies $A^T A=I_{n-p}$, $AA^T=I_n-X(X^T X)^{-1}X^T$, and $A^T X=0$. Use this to find the REML estimate of $\sigma^2$. Hint: Recall that $I-H = (I-H)^2=(I-H)^T(I-H)$.

10. Let $Y \sim \text{Gamma}(a,b)$. For which parameter values is $\ln Y$ skewed to the left, skewed to the right, or without skew? Hint: Derive the third central moment of $\ln Y$ via the moment and cumulant generating functions. The polygamma function is available in R as psigamma.

11. For a general linear model $y = X\beta + \epsilon$, $\epsilon \sim N(0,V)$, show that the ordinary least squares estimator $\hat\beta_\text{OLS}=(X^T X)^{-1}X^T y$ is unbiased for $\beta$ but that $\hat\beta_{OLS}$ is less efficient than the general least squares estimator $\hat\beta_\text{GLS}=(X^T V^{-1} X)^{-1}X^T V^{-1} y$ in the sense that $\operatorname{Var}\hat\beta_\text{OLS}\ge \operatorname{Var}\hat\beta_\text{GLS}$ where the notation $A\ge B$ means that $A-B$ is positive semi-definite. Hints: 1) If $A$ and $B$ are positive definite, then $A-B\ge 0$ is equivalent to $B^{-1}-A^{-1}\ge 0$. 2) If $P$ is an orthogonal projection matrix, then $P = P P^T$.

12. Apply the Gram-Schmidt algorithm to the polynomials $1,x,x^2,x^3$ to derive the first four Hermite polynomials $p_0(x),p_1(x),p_2(x),p_3(x)$ using $w(x)=e^{-x^2}$ as the weight function. Use this to determine the resulting weights and quadrature points of the Gauss-Hermite quadrature approximation $\sum_{i=1}^3 w_i f(x_i)$ of $\int f(x)e^{-x^2/2}dx$.
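As a numerical check on an intermediate step (these are the standard values for the three-point Gauss-Hermite rule with weight function $e^{-x^2}$; note that the integrand above has weight $e^{-x^2/2}$, so a change of variables is needed before applying them):

$$x_1=-\sqrt{3/2},\quad x_2=0,\quad x_3=\sqrt{3/2}, \qquad w_1=w_3=\frac{\sqrt\pi}{6},\quad w_2=\frac{2\sqrt\pi}{3}.$$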

## Exam

Tuesday December 2, 15:00-19:00

Previous exams can be found at previous exams, of which those from 2013-2016 are the least relevant.

Any questions on the exponential family in this year's exam will be based on the notation in Fahrmeir. The solutions to the exams in 2008 and 2009 are based on a slightly more general definition of the exponential family and different notation (Dobson & Barnett 2008), writing the pdf or pmf as $\exp(b(\theta)a(y) + c(\theta) + d(y))$ instead of $\exp((y\theta - b(\theta))w/\phi + c(y,\phi,w))$ as in Fahrmeir. Thus

• $b(\theta)$ corresponds to the canonical parameter $\theta$ in Fahrmeir,
• $a(y)$ corresponds to $y$,
• $c(\theta)$ (implicitly a function of the canonical parameter $b(\theta)$) corresponds to $-b(\theta)$. Implicit differentiation then leads to a different formula, $\operatorname{E}a(Y)=-c'(\theta)/b'(\theta)$,
• $d(y)$ corresponds to $c(y,\phi,w)$.

Note that if a random variable $Y$ satisfies the definition of Dobson & Barnett, then $a(Y)$ satisfies the definition in Fahrmeir.