TMA4315 Generalized linear models, autumn 2020

Messages

January 15: The grades are out. The solution is also available through the link at the bottom of the page.

Grade Count
A8
B12
C6
D4
E0
F1

November 1627: The exam will be a graded digital home exam (via Inspera). All written aids, computer aids and use of written sources on the internet are permitted. The exam questions will (us usual) aim to test understanding rather than memorisation of the course material. I will be available via zoom during the exam in case any of the questions need clarification (you will have to wait in a "zoom waiting room" if you want assistance).

Any form of collaboration is not allowed including giving assistance to other students, see these regulations for further details. The exam problems given to different students may differ but I will not disclose further details on this. In accordance with regulation for the IE-faculty, after the grades have been announced, you may be required to attend a meeting to verify that what you have handed in is your own work. There will be no control interview after the exam.

November 16: Todays and tomorrows lecture will be digital only (no physical lecture). We will provide guidance for project 3 probably both via physical attendance at the Banachroom and via zoom (use this link) (in addition to the piazza forum).

November 2: In addition to Michail form 11-12, I'll be available to answer question about project 3 the next Wednesdays from 10-11. Next lectures are on november 16 and 17.

October 27: The link to Wood (2015) was broken and has been fixed.

October 26: As of now, we hope that the exam can take place as planned on December 15 but it remains to be seen if the NTNU administration will be able to provide enough room such that this can happen within the regulations of the health authorities. Plan B is a graded 4 hour home exam (no collaboration allowed).

August 31: When you attend lectures, follow these instructions.

August 24: Fill in preferred timepoints for the weekly exercise/project via the link provided below.

August 11: In accordance with NTNU Covide-19 regulations, please fill in the form at https://tinyurl.com/glmntnu _every_ time you attend a lecture or exercise.

Practical information

Lectures: Tuesdays 10:15-12:00 in R9 and Mondays 16:15-18:00\(^*\) in KJL1 (* only until 17:00 in week 39-41).

Guidance with exercises: Wednesdays 11-12 in Sentralbygg I, room 265 until week 37.Time and place to be decided - please fill in possible timepoints here. In addition we will use this piazza forum (I think you need to sign up with your @stud.ntnu.no email adresse).

If you can't attend lectures or exercises physically, you this link. You need to authenticate yourself using "Sign in with SSO" and entering "ntnu" as "Your company name" and then your ntnu username and password.

Lecturer: Jarle Tufto

Teaching assistant: Michail Spitieris.

Reference group: NN, NN, NN (send me an email if you want to be in the reference group).

Obligatory exercises

There will be three obligatory exercises (counts 30% of final grade).

Project 1 Tenative solution

Project 2 Tentative solution

Project 3 Tentative solution

You'll get help with the projects in the Banachroom on the third floor (each Wednesday 11-12 from week 38). If you don't have access to Nullrommet, please fill in your information in this form. Due to the pandemic, the number of students than can be present in this room is limited to 25. We will make necessary arrangements if we reach this limit.

Tentative curriculum

Fahrmeir et. al. (2013) (freely available on springer link), ch. 2.1-2.4, B.4, 5.1-5.4, 5.8.2, 6, 7.1-7.3, 7.5, 7.7. We will also use some material from Wood (2015) Wood (2015), see below.

This covers ordinary linear and multiple regression (mostly repetition from Linear statistical models), binary regression, Poisson and gamma regression, the exponential family and generalised linear models in general, categorical regression (includes contingency tables and log-linear models, multinomial and ordinal regression), linear mixed effects models, generalized linear mixed effects models.

Also see the official ntnu course info.

Lectures

R code from the lectures (markdown) (2019 version)

Lecture notes (handwritten in notability)

August 18: Introduction to glms (ch. 2.1-2.3), the exponential family (ch. 5.4.1). Video (first half has bad audio).

August 24: More on the exponential family (ch. 5.4.1). Review of theory of linear models (ch. 3). Video (only second half)

August 25: Geometric views of the linear model. Sampling distributions associated with the linear model (the chi-square-, t- and F-distribution). Video

August 31: Testing and fitting linear hypotheses (via quadratic form for \(C\hat\beta-d\) - Box 3.13 in Fahrmeir) or via F-test based on sums of squares for each model alternative (the restricted model fitted via Lagrange method (pp. 172-173) or using the solution to problem 2 below). Design matrices for interactions between numeric and categorical covariates. Video (unfortunately, screen sharing in zoom on my ipad would not work after the break)

September 1: Binary regression (ch. 5). Logit, probit and cloglog models with examples. Binary regression continued. Score function of binary regression model.Video.

September 7: Some general properties of the expected log likelihood (sec. 4.1 in Wood (2015)). Expected and observed Fisher information and iterative computation of MLEs for binary regression (Fisher scoring algorithm). Video.

September 8: A minimal example of divergence of the Fisher scoring algorithm. Binary regression continued. Asymptotic properties of MLEs. Likelihood ratio, Wald, and score tests. Deviance and testing goodness-of-fit. Video.

September 14: More on the deviance and the saturated model. Deviance residuals. Estimating the overdispersion parameter. We'll also go through a sketch of a proof for the asymptotic distribution of LR test statistic (section 4.4 in Wood). Video.

September 15: Example (lung cancer rates) illustrating model selection via AIC, model parsimony, Wald and likelihood ratio testing. Video

September 21: No lecture.

September 22: Theory behind AIC (Wood sec. 4.6). Poisson regression. Fisher scoring vs. Newton-Raphson for poisson-regression with non-canonical identity link (see R code for further illustrations). See e.g. this paper for motivation. Video (some technical problems this time).

September 28 (1 hour only): Gamma and lognormal regression. Glms in general and IRLS (ch. 5.4, 5.8.2). Video.

September 29: Quasi-likelihood models (ch. 5.5). The F-test between nested alternatives (see Venables and Ripley 2002, eq. 7.10)). QAIC (see Burnham & Anderson 2002 p. 70). Linear separation. Video.

October 5 (1 hour only): Offset variables, profile likelihood confidence intervals. Video.

October 6: Categorical regression models (ch. 6). Categorical regression models continued. Multinomial models as discrete choice latent utility models. Video.

October 12: Ordinal regression models. General treatment of multinomial GLMs. Video.

October 13: Introduction to mixed models (ch. 7.1 and 7.2) (I must have forgotten to press record so no video so see lecture notes).

October 19: Mixed model continued. ML and restricted likelihood based on \(\mathbf{A}\mathbf{y}\) (REML) estimation (7.3). Bayesian/marginal likelihood interpretation of the restricted likelihood (Harville 1974). Video.

October 20: Mixed models continued. The connection between the profile and restricted likelihood (7.3.2) (again, see Harville 1974). Hypothesis testing (7.3.4). Video.

October 26: Mixed models continued: Conditional distributions/BLUPs of random effects (7.3.1, 7.3.3). Video.

October 27: Generalized linear mixed models (GLMMs). Differences between individual/cluster- and population-level effect sizes (i.e. marginal vs. conditional models) including approximation in Agresti, 2002, eq. 12.8. Video.

November 2: Methods of inference for GLMMs: Marginal likelihood based on numerical integration (Gauss-Hermite quadrature (TMA4215)). Laplace approximation (Kristensen et.al. 2016) of the marginal likelihood and its computation via automatic (see e.g. Wood ch. 5.5.3) as opposed to numeric and symbolic differentiation. Laplace approximation of the restricted likelihood (REML) for GLMMs (available in R-package glmmtmb). Penalized quasi joint likelihood.. Video.

November 3, 9 and 10: Work with project 3

November 16: Summary of course. I'll go through some previous exams: 2017, problem 1.

November 17: 2006, oppgave 1 (norwegian version) (english translation) (ordinal regression)++ ST2304 2016, problem 3 (offset variables) Video.

1. Determine whether or not the gamma, the inverse Gaussian, the negative binomial and the uniform distribution on \((0,\theta)\) belong to the exponential family (Fahrmeier, ch. 5.4.1) and find their mean and variance via the general formulas \(EY=b'(\theta)\) and \(\operatorname{Var}Y=\frac\phi w b''(\theta)\)

1b. Show that that the third central moment of a random variable \(X\) equals the third cumulant.

2. For the linear model \(Y=X\beta + \epsilon\), one method for estimating the parameters under a linear hypothesis \(C\beta = d\) (\(r\) linear constraints on the \(p\) model parameters) is given in Fahrmeier, p. 172-173. Show that an alternative approach is to rewrite the model into an alternative form \(Y' = X_0\beta_0+\epsilon\) where \(X_0\) has dimension \(n \times (p-r)\) and \(\beta_0\) dimension \((p-r) \times 1\) and \(Y'\) is equal to the original \(Y\) minus some offset vector. Hint: https://en.wikipedia.org/wiki/Block_matrix#Block_matrix_multiplication.

2b. When testing nested linear model alternatives, show that the \(F\)-test is equivalent (leads to the same conclusions) as a test using the likelihood ratio as the test statistic.

3. Assume that the \(n\times p\) matrix \(X\) has rank \(p\) and let \(H=X(X^T X)^{-1}X^T\). Show that \(H\) and \(I-H\) are both idempotent and symmetric matrices. Use this to show that the eigenvalues of both matrices are either 1 or 0.

4. Suppose that \(Y\) is Bernoulli distributed with parameter \(q\) and that \(X|Y=i \sim N(\mu_i,\sigma^2)\) for \(i=0,1\). Show that the logit of the conditional probability \(P(Y|X=x)\) can be written as \(\beta_0 + \beta_1 x\). Find \(\beta_0,\beta_1\) expressed in terms of \(q,\mu_1,\mu_2,\sigma\).

5. In R, fit the probit model to the juul data (the help page ?juul describes the data) using

library(ISwR) # Install the package if needed
data(juul)
juul$menarche <- factor(juul$menarche, labels=c("No","Yes"))
juul.girl <- subset(juul,age>8 & age<20 & complete.cases(menarche))
attach(juul.girl)
plot(age, menarche)
glm(menarche ~ age, binomial(link="probit"))
detach(juul)

and compute estimates of mean \(\mu\) and standard deviation \(\sigma\) of the underlying normal distribution of the latent time of menarche from estimates of \(\beta_0\) and \(\beta_1\) of the glm. Also find approximate estimates of standard errors of \(\hat \mu\) and \(\hat \sigma\) using the delta method.

5c. If using the logit- instead of the probit-link in problem 5, what implicit assumption are we making about the underlying distribution of time of menarche? Compute the mean and standard deviation of the latent times for this alternative model. Which estimates do you trust the most?

5d. Show that the cloglog choice of link function for problem 5 and log transforming age would correspond to the assumption that the underlying latent age of menarche have a Weibull distribution.

5b. For linear models, the regression can be forced through the origin by omitting the first column of the design matrix \(\mathbf{X}\). Is there any way to force a binomial regression using the logit link through the origin such that \(p_i=0\) for \(x_i=0\)?

6. Derive the likelihood, score function and the expected and observed Fisher information for the binary regression model \(y_i\sim \mbox{bin}(1,\pi_i)\) for the probit choice of link function. Express the observed Fisher information in terms of the standard normal cdf \(\Phi()\) and pdf \(\phi()\). Note also that \(\frac d{dz}\phi(z)=-z\phi(z)\). Verify that your answer for \(H(\beta)\) is correct by taking the expectation of the final expression.

7. A model is identifiable if there is a one-to-one mapping between the parameter vector \(\beta\) and the distribution of the data. If \(X\) has rank \(p-1\), why is a glm not identifiable? In such cases, what shape will the log likelihood surface \(l(\beta)\) take? (Consider the no-intercept model \(\eta_i = \beta_1 x_{i1} + \beta_2 x_{i2}\) as an example). How does this agree with and why is the observed Fisher information only be positive semi-definite?

8. For a glm with the canonical choice of link function, write the log likelihood on exponential family form and derive the observed Fisher information \(H(\beta)\) . Assuming that \(X\) has full rank, is \(H(\beta)\) positive definite? Based on this, what can we say about the overall shape of the log likelihood function? How many optima are possible at the interior of the parameter space?

9. For the linear model \(y = X\beta + \epsilon\), \(\epsilon \sim N(0,\sigma^2 I_n)\) where \(X\) has rank \(p<n\), derive the restricted log likelihood \(l_R(\sigma^2)\) based on the error contrasts \(w = A^T y\) where \(A\) satisfies \(A^T A=I_{n-k}\), \(AA^T=I_n-X(X^T X)^{-1}X^T\), and \(A^T X=0\). Use this to find the REML estimate of \(\sigma^2\). Hint: Recall that \(I-H = (1-H)^2=(1-H)^T(1-H)\).

10. Let \(Y \sim \text{Gamma}(a,b)\). For which parameter values is \(\ln Y\) skewed to the left, to the right and without skew? Hint: Derive the third central moment of \(\ln Y\) via the moment and cumulant generating functions. The polygamma function is available in R under the name psigamma.

11. For a general linear model \(y = X\beta + \epsilon\), \(\epsilon \sim N(0,V)\), show that the ordinary least squares estimator \(\hat\beta_\text{OLS}=(X^T X)^{-1}X^T y\) is unbiased for \(\beta\) but that \(\hat\beta_{OLS}\) is less efficient than the general least squares estimator \(\hat\beta_\text{GLS}=(X^T V^{-1} X)^{-1}X^T V^{-1} y\) in the sense that \(\operatorname{Var}\hat\beta_\text{OLS}\ge \operatorname{Var}\hat\beta_\text{GLS}\) where the notation \(A\ge B\) means that \(A-B\) is positive semi-definite. Hints: 1) If \(A\) and \(B\) are positive definite, then \(A-B\ge 0\) is equivalent to \(B^{-1}-A^{-1}\ge 0\). 2) If \(P\) is an orthogonal projection matrix, then \(P = P P^T\).

Solutions (updated December 6, 2020.

Exam

Previous exams can be found at previous exams out of which 2013-2016 are not the most relevant.

Any questions on the exponential family in this years exam will be based on the notation in Fahrmeir. The solutions to exams in 2008 and 2009 is based on a slightly more general definition of the exponential family and different notation (Dobson & Barnett 2008) writing the pdf of pmf as as \(\exp(b(\theta)a(y) + c(\theta) + d(y))\) instead of \((\exp((y\theta - b(\theta))w/\phi - c(y,\phi,w)\) in Fahrmeir. Thus

  • \(b(\theta)\) is the canonical parameter corresponding to \(\theta\) in Fahrmeir
  • \(a(y)\) corresponds to \(y\),
  • \(c(\theta)\) (implicitly a function of the canonical parameter \(b(\theta)\)) corresponds to \(b(\theta)\). Implicit differentiation then leads to a different formula for \(EY=-c'(\theta)/b'(\theta)\).
  • \(d(y)\) corresponds to \(c(y,\phi,w)\).

Note that if a random variable \(Y\) satisfies the definition of Dobson & Barnett, then \(a(Y)\) satisfies the definition in Fahrmeir.

2021-01-15, Jarle Tufto