# TMA4315 Generalized linear models, autumn 2019

## Messages

Jan. 21: The exam and a tentative solution is available at previous exams. Grades:

AAAAAAAAAA BBBBBBBBBBBBBB CCCCCCCC DDDDDDD EEEEEE

Dec. 6: As you can see on studweb, you will (need to) use PCs provided by NTNU at the exam so there is no need to bring your own computer. You'll have access to R and Rstudio on these PC, but none of the questions on the exam requires using R and Rstudio. There will be one multiple choice problem. For the other problems, you can write on the sheets of paper provided at the exame or you can fill in the answer electronically via the Inspera exam browser. Also note that in cases where a quantile or probability is not tabulated for the degrees of freedom you need in "Tabeller of formler i statistikk" you're allowed to round up or down to the nearest degrees of freedom that is tabulated. Or you may use R for this.

Dec. ~~2~~3: Any questions on the exponential family in this years exam will be based on the notation in Fahrmeir. The solutions to exams in 2008 and 2009 is based on a slightly more general definition of the exponential family and different notation (Dobson & Barnett 2008) writing the pdf of pmf as as \(\exp(b(\theta)a(y) + c(\theta) + d(y))\) instead of \((\exp((y\theta - b(\theta))w/\phi - c(y,\phi,w)\) in Fahrmeir. Thus

- \(b(\theta)\) is the canonical parameter corresponding to \(\theta\) in Fahrmeir
- \(a(y)\) corresponds to \(y\),
- \(c(\theta)\) (implicitly a function of the canonical parameter \(b(\theta)\)) corresponds to \(b(\theta)\). Implicit differentiation then leads to a different formula for \(EY=-c'(\theta)/b'(\theta)\).
- \(d(y)\) corresponds to \(c(y,\phi,w)\).

Note that if a random variable \(Y\) satisfies the definition of Dobson & Barnett, then \(a(Y)\) satisfies the definition in Fahrmeir.

**Nov. 21: I'll be available to answer questions before the exam on Tuesday, December 10 in R4 most of the day between 9 and 16. If I'm not there, send me an email.
**
Nov. 14: I have corrected an error in question 7 of problem 2, obligatory exercise 3. See updated version.

Oct. 11: I'll be available at Nullrommet to help with project 2 next thursday (Oct. 17) from about 10:30 (I have another appointment at 10).

Oct. 10: You can get your graded project 1 back by coming to Michails office (room number 1229 in SII).

Sept. 11: If you don't have access to Nullrommet, please fill in your information in this form asap!!

Sept. 10: If you want to be in the reference group, let me know in the lecture or by email.

Aug. 29: We'll consider moving the weekly exercises to another time point because of a collision with the schedule in statistical inference. Please fill your available time points in this google spreadsheet

## Practical information

Lectures: Wednesdays 10:15-12:00 in R90 and Fridays 10:15-12:00 in R4.

Exercises: ~~Gallery 4 byggteknisk at 14:15-15:00 on wednesday~~ Nullrommet on Thursdays from 10-11.

Lecturer: Jarle Tufto

Teaching assistant: Michail Spitieris.

Reference group: Amir Ahmed (Bmat), NN

## Obligatory exercises

There will be three obligatory exercises (counts 30% of final grade).

Obligatory exercise 1 Solution

Obligatory exercise 2 Solution

Obligatory exercise 3 Solution

You can get assistance with the exercises on Thursdays from 10-11 at Nullrommet located here.

## Tentative curriculum

Fahrmeir et. al. (2013) (freely available on springer link), ch. 2.1-2.4, B.4, 5.1-5.4, 5.8.2, 6, 7.1-7.3, 7.5, 7.7. We will also use some material from Wood (2015), see below.

This covers ordinary linear and multiple regression (mostly repetition from Linear statistical models), binary regression, Poisson and gamma regression, the exponential family and generalised linear models in general, categorical regression (includes contingency tables and log-linear models, multinomial and ordinal regression), linear mixed effects models, generalized linear mixed effects models.

Also see the official ntnu course info.

## Exam

December 12, 15:00-19:00

The exam will be digital but you can write on ordinary paper if you like, see general information on innsida. You ~~can/should(?) bring you own laptop or~~will use a pc provided by NTNU ~~(you will only be able ton safe exam browser during the exame which means that you will not able run R or Rstudio during the exam)~~. Typical exam questions will include interpreting R output (model summaries, plots) of a fitted LM, GLM, LMM, or GLMM. Permitted aids are yellow a5 sheet with handwritten notes, permitted calculator, "Tabeller og formler i statistikk, Tapir forlag" and "K. Rottmann, Matematisk formelsamling".

Previous exams can be found at previous exams out of which 2013-2016 are not the most relevant.

## Lecture summary

21/8: Introduction to glms (ch. 2.1-2.3), the exponential family (ch. 5.4.1)

23/8: More on the exponential family (ch. 5.4.1). Review of theory of linear models (ch. 3).

28/8: Geometric views of the linear model. Sampling distributions associated with the linear model (the chi-square-, t- and F-distribution).

30/8: Testing and fitting linear hypotheses (via quadratic form for \(C\hat\beta-d\) - Box 3.13 in Fahrmeir) or via F-test based on sums of squares for each model alternative (the restricted model fitted via Lagrange method (pp. 172-173) or using the solution to problem 2 below). Design matrices for interactions between numeric and categorical covariates.

4/9: Binary regression (ch. 5). Logit, probit and cloglog models with examples.

6/9: Binary regression continued. Score function of binary regression model. Some general properties of the expected log likelihood (sec. 4.1 in Wood (2015)). Expected and observed Fisher information and iterative computation of MLEs for binary regression (Fisher scoring algorithm).

11/9: Binary regression continued. Asymptotic properties of MLEs. Likelihood ratio, Wald, and score tests. Deviance and testing goodness-of-fit.

13/9: A minimal example of divergence of the Fisher scoring algorithm. More on the deviance and the saturated model. Deviance residuals. Estimating the overdispersion parameter. We'll also go through a sketch of a proof for the asymptotic distribution of LR test statistic (section 4.4 in Wood).

18/9: More on divergence of Fisher scoring algorithm. Model selection.

20/9: Theory behind AIC (Wood sec. 4.6). Poisson regression. Fisher scoring vs. Newton-Raphson for poisson-regression with non-canonical identity link (see R code for further illustrations).

25/9: Poisson regression continued. Example (Exam in ST2304, 2015, problem 2a-d). Gamma and lognormal regression. Glms in general and IRLS (ch. 5.4, 5.8.2).

27/9: Quasi-likelihood models (ch. 5.5). The F-test between nested alternatives (see Venables and Ripley 2002, eq. 7.10)). QAIC (see Burnham & Anderson 2002 p. 70). Linear separation

2/10, 4/10: No lectures. Work on obligatory exercise 2.

9/10: Offset variables, profile likelihood confidence intervals. Categorical regression models (ch. 6).

11/10: Categorical regression models continued. Multinomial models as discrete choice latent utility models.

16/10: Ordinal regression models.

18/10: Introduction to mixed models (ch. 7.1 and 7.2)

23/10: Mixed model continued. ML and restricted likelihood based on \(\mathbf{A}\mathbf{y}\) (REML) estimation (7.3). Bayesian/marginal likelihood interpretation of the restricted likelihood (Harville 1974).

25/10: Mixed models continued. The connection between the profile and restricted likelihood (7.3.2) (again, see Harville 1974). Hypothesis testing (7.3.4).

30/10: No lecture.

1/11: Mixed models continued: Conditional distributions/BLUPs of random effects (7.3.1, 7.3.3). Model selection.

6/11: Generalized linear mixed models (GLMMs). Differences between individual/cluster- and population-level effect sizes (i.e. marginal vs. conditional models) including approximation in Agresti, 2002, eq. 12.8

8/11: Methods of inference for GLMMs:
Marginal likelihood based on numerical integration (Gauss-Hermite quadrature (TMA4215)).
Laplace approximation (Kristensen et.al. 2016) of the marginal likelihood
and its computation via automatic (see e.g. Wood ch. 5.5.3) as opposed to numeric and symbolic differentiation.
Laplace approximation of the restricted likelihood (REML) for GLMMs (available in R-package glmmtmb). ~~Penalized quasi joint likelihood.~~

13/11: Summary of course. I'll go through some previous exams: 2017, problem 1, ST2304 2016, problem 3 (offset variables).

15/11: 2006, oppgave 1 (norwegian version) (english translation) (ordinal regression)++

20/11: Summary.

## Recommended exercises

1. Determine whether or not the gamma, the inverse Gaussian, the negative binomial and the uniform distribution on \((0,\theta)\) belong to the exponential family (Fahrmeier, ch. 5.4.1) and find their mean and variance via the general formulas \(EY=b'(\theta)\) and \(\operatorname{Var}Y=\frac\phi w b''(\theta)\)

2. For the linear model \(Y=X\beta + \epsilon\), one method for estimating the parameters under a linear hypothesis \(C\beta = d\) (\(r\) linear constraints on the \(p\) model parameters) is given in Fahrmeier, p. 172-173. Show that an alternative approach is to rewrite the model into an alternative form \(Y' = X_0\beta_0+\epsilon\) where \(X_0\) has dimension \(n \times (p-r)\) and \(\beta_0\) dimension \((p-r) \times 1\) and \(Y'\) is equal to the original \(Y\) minus some offset vector. Hint: https://en.wikipedia.org/wiki/Block_matrix#Block_matrix_multiplication

3. Assume that the \(n\times p\) matrix \(X\) has rank \(p\) and let \(H=X(X^T X)^{-1}X^T\). Show that \(H\) and \(I-H\) are both idempotent and symmetric matrices. Use this to show that the eigenvalues of both matrices are either 1 or 0.

4. Suppose that \(Y\) is Bernoulli distributed with parameter \(q\) and that \(X|Y=i \sim N(\mu_i,\sigma^2)\) for \(i=0,1\). Show that the logit of the conditional probability \(P(Y|X=x)\) can be written as \(\beta_0 + \beta_1 x\). Find \(\beta_0,\beta_1\) expressed in terms of \(q,\mu_1,\mu_2,\sigma\).

5. In R, fit the probit model to the juul data (the help page `?juul`

describes the data) using

library(ISwR) # Install the package if needed data(juul) juul$menarche <- factor(juul$menarche, labels=c("No","Yes")) juul.girl <- subset(juul,age>8 & age<20 & complete.cases(menarche)) attach(juul.girl) plot(age, menarche) glm(menarche ~ age, binomial(link="probit")) detach(juul)

and compute estimates of mean \(\mu\) and standard deviation \(\sigma\) of the underlying normal distribution of the latent time of menarche from estimates of \(\beta_0\) and \(\beta_1\) of the glm. Also find approximate estimates of standard errors of \(\hat \mu\) and \(\hat \sigma\) using the delta method.

6. Derive the likelihood, score function and the expected and observed Fisher information for the binary regression model \(y_i\sim \mbox{bin}(1,\pi_i)\) for the probit choice of link function. Express the observed Fisher information in terms of the standard normal cdf \(\Phi()\) and pdf \(\phi()\). Note also that \(\frac d{dz}\phi(z)=-z\phi(z)\). Verify that your answer for \(H(\beta)\) is correct by taking the expectation of the final expression.

7. A model is identifiable if there is a one-to-one mapping between the parameter vector \(\beta\) and the distribution of the data. If \(X\) has rank \(p-1\), why is a glm not identifiable? In such cases, what shape will the log likelihood surface \(l(\beta)\) take? (Consider the no-intercept model \(\eta_i = \beta_1 x_{i1} + \beta_2 x_{i2}\) as an example). How does this agree with and why is the observed Fisher information only be positive semi-definite?

8. For a glm with the canonical choice of link function, write the log likelihood on exponential family form and derive the observed Fisher information \(H(\beta)\) . Assuming that \(X\) has full rank, is \(H(\beta)\) positive definite? Based on this, what can we say about the overall shape of the log likelihood function? How many optima are possible at the interior of the parameter space?

9. For the linear model \(y = X\beta + \epsilon\), \(\epsilon \sim N(0,\sigma^2 I_n)\) where \(X\) has rank \(p<n\), derive the restricted log likelihood \(l_R(\sigma^2)\) based on the error contrasts \(w = A^T y\) where \(A\) satisfies \(A^T A=I_{n-k}\), \(AA^T=I_n-X(X^T X)^{-1}X^T\), and \(A^T X=0\). Use this to find the REML estimate of \(\sigma^2\). Hint: Recall that \(I-H = (1-H)^2=(1-H)^T(1-H)\).