ST2304 Statistical modelling for biologists/biotechnologists, spring 2013

Messages

  • May 21: I suggest that those who have questions drop by to see me (or possibly Erik Blystad) on the 12th floor of SII, room 1232. I (we) can generally be found Tuesday, Wednesday and Thursday between 10 and 16 (not between 12 and 13, and not between 14 and 15 on Thursday).
  • Feb. 6: A small section on model parsimony and AIC has been added to Handout 1.
  • Jan. 30: From now on, assignments should be submitted via It's learning.
  • Jan. 23: If you have signed up for the Wednesday 12-14 group, consider moving to one of the other groups, which have fewer students. I will also be present part of the time if needed. You may also ask questions in the It's learning discussion forum.
  • Jan. 21: Send an email if you would like to be in the reference group.
  • Jan. 21: The deadline for assignment 1 is Thursday 24, 24:00. Submit your assignment by email for now (see "Schedule…" to the left for details).
  • Jan. 17: You should get access to computer lab 524 in Høyskoleringen 3 shortly. If the problem persists, let me know.
  • Jan. 16: To get a more useful character encoding (e.g. to make Norwegian characters work) in RStudio on Mac OS X, you may need to execute the command 'defaults write org.R-project.R force.LANG en_US.UTF-8' in a terminal window, restart RStudio, and then set the default text encoding to 'UTF-8' in the RStudio preferences dialog.
  • Jan. 15: All lectures have been moved from S4 to S5 (including tomorrow's lecture at 8:15).
  • Jan. 9: The first lecture is Monday, January 14, 15:15-17:00 in S4. Also sign up for one of the computer lab groups by following the link in the menu in the left margin.
  • Jan. 9: For biologists who are also interested in theoretical population dynamics and population genetics, we recommend the course ST2302, given by Steinar Engen this spring.
  • Jan. 9: You must pass 6 out of 12 assignments to be admitted to the final exam.
  • Jan. 9: The report for each assignment must be handed in as a single Word or PDF file and must (of course) include your name and your email address. The report should be clearly written in Norwegian or English and formatted so that it is intelligible to your peers (use complete sentences). The R code you have used should also be included.
  • An RSS feed for this webpage and its subpages is available through the orange icon (choose "current namespace") to the right in the address bar of most web browsers.

Plan and course content (preliminary)

| Part* | Theme | Assignment |
|---|---|---|
| 1 | Dalgaard, ch. 1; Dalg. ch. 3. | Assignment 1, Solution |
| 2 | Plotting functions and parametric curves. Linear regression, residuals, prediction and confidence bands (Dalg. 6.1-6.3). Multiple regression (Dalg. 11.1, 11.2, Løvås 7.5). Dummy variables. | Assignment 2, Solution |
| 3 | The F-distribution. Comparison of variances (Dalg. 5.4). One- and two-way analysis of variance with balanced design (Løvås 8.3, Dalg. 7). Factors encoded as dummy variables (Dalg. 12.3). | Assignment 3, Google docs data file, Solution |
| 4 | Linear models without balanced design, model selection (Handout 1, Dalg. ch. 11.3). | Assignment 4 (problem 2 is currently being revised), Solution |
| 5 | The multinomial distribution, contingency tables, chi-square tests (Løvås 5.9.4, 8.5, Dalg. 8, Handout 2 not including 2.2.3). | Assignment 5, Google docs spreadsheet for problem 1, Solution |
| 6 | Generalized linear models: logistic regression, deviance (Dalg. 13, excluding 13.3, Handout 4). The delta method (Handout 3). | Assignment 6, Solution |
| 7 | Probit and cloglog links, offset variables (Handout 4). | Assignment 7, Google docs spreadsheet, Solution |
| 8 | Linear separation (Handout 4, sect. 4). Poisson response (Dalg. 15-15.2). Overdispersion (Handout 4, sect. 6). | Assignment 8, Updated solution |
| 9 (week 11) | Interaction between covariates (Dalg. 12.5, 12.7.2). A little about programming (Dalg. 2.3). | Assignment 9, Solution |
| | Biology/Biotech students are on excursions and Easter vacation (weeks 12, 13 and 14); assignment 9 continues in week 15. | |
| 10 (week 15) | Numerical maximisation of the likelihood, asymptotic theory for approximate standard errors and likelihood ratio tests (Handout 5, and section 2.2.3 in Handout 2). | Assignment 10, Solution |
| 11 (week 16) | Simulation-based methods (Handout 5, sect. 4). | Assignment 11, Solution |
| 12 (week 17) | Power and calculation of sample size (Dalg. 9). Simulation-based power calculations (Handout 5, sect. 4.5). | Assignment 12, Solution |
| 13 (week 18) | Summary of the course (no lecture on International Workers' Day). | |

* The lectures for part 1 are in week 3, with the associated assignment in week 3/4, and so on. Each assignment should be handed in by Friday at 12:00 (in week 4 for assignment 1, and so on) by email, following the student assistants' instructions (more information coming soon).

Final exam

May 24. The final exam will take the usual form, that is, without the aid of a computer or R. Permitted aids are a pocket calculator, Tabeller og formler i statistikk (Tapir forlag), Matematisk formelsamling (Rottmann), and one handwritten yellow A4 sheet. Problems may include:

  • Interpretation of the summary of a fitted model. This may include plots of residuals etc. What is the meaning of the various parameter estimates? What assumptions does the model involve? How may the model be improved?
  • A data set may be presented with a description of the different variables and a brief description of the biological context. You should then propose a suitable statistical model (for example a generalized linear model with a certain link function) which can be used to analyse the data. This should include the rationale behind your choice of model, what assumptions the model involves etc.
  • A simple problem in which you write an expression or an R function that carries out a simple computation, demonstrating that you have understood vectorized operations, indexing, data frames, selection based on logical vectors, how standard distributions are handled (the different d-, p-, q- and r-functions), the relationship between mathematical and symbolic notation for model formulae etc.
  • Some simple mathematical derivation based on probability theory or principles for statistical inference covered here and in ST0103.
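
As an illustration of the R skills listed in the third item above, here is a minimal sketch. The data frame `d`, its columns and the helper `standardize` are hypothetical, made up for this example; they are not taken from any assignment or exam in this course.

```r
# Hypothetical data, invented for illustration only
d <- data.frame(
  species = c("a", "a", "b", "b", "b"),
  mass    = c(2.1, 2.5, 3.9, 4.2, 3.6),
  length  = c(10, 12, 18, 20, 17)
)

# Vectorized operation: log() is applied to every element, no loop needed
d$logmass <- log(d$mass)

# A small function built from vectorized operations (hypothetical helper)
standardize <- function(x) (x - mean(x)) / sd(x)
z <- standardize(d$mass)

# Selection based on a logical vector
heavy   <- d$mass > 3        # one TRUE/FALSE per row
d_heavy <- d[heavy, ]        # keep only the rows where mass exceeds 3

# Indexing by position
first_two_lengths <- d$length[1:2]

# The d-, p-, q- and r-functions, here for the standard normal distribution
dens  <- dnorm(0)            # density at 0
prob  <- pnorm(1.96)         # P(Z <= 1.96)
quant <- qnorm(0.975)        # 97.5% quantile (inverse of pnorm)
set.seed(1)                  # make the simulation reproducible
draws <- rnorm(5)            # five simulated values

# Symbolic model formula: mass ~ length corresponds to the mathematical
# model mass_i = beta0 + beta1 * length_i + error_i
fit <- lm(mass ~ length, data = d)
coef(fit)
```

Note that the intercept `beta0` does not appear in the formula `mass ~ length`; R includes it by default, which is one of the differences between the symbolic and the mathematical notation.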

Remember that the final exam is not the objective of this course. The objective is to gain the practical experience in analysing data that you will need in later research during your master's degree, and to be able to read and understand the primary literature.

2013-05-21, Jarle Tufto