# TMA4255 Applied Statistics, spring 2015

## Messages

[09/06/15] Dear students, as you know, the exam results in TMA4255 are available on studweb.

I'm very impressed with your exam papers in TMA4255. I have seen many very good solutions - and given out many good grades (sadly there were also some Fs, but only 2). The grade frequencies for TMA4255 ended up being:

A: 11% B: 39% C: 33% D: 9% E: 4% F: 4%

In most courses we write a grading document, which is to be made available together with the grading: Gradings.pdf. Here you see in detail how the scores are given and the grade scale used.

[20/05/15] An error in the tentative solution has been found; please see the corrected solution: TENTATIVE SOLUTION.

[16/05/15] Today's exam problems and tentative solution are available here: English version of exam, bokmål version of exam, nynorsk version of exam and TENTATIVE SOLUTION.

[8/5/15] **For those who would like to sit down together and work on exam problems with the lecturer present, I have booked G1 Wednesday 13.05.2015, 10:15-12:00.** You are also welcome to send an email or come to my office with questions (email me beforehand).

[8/5/15] Fractional factorial design is not part of the reading list (see the updated reading list).

[8/5/15] You need 32 out of 80 possible points in the written exam to pass the exam.

[6/5/15] We meet in **G1** Thursday 7 May, 12.15-ca.14.00. We will go through (some of) these problems:

- Which degrees of freedom to use in the t-tests
- V2013 Problem 4 (Maybe Problem 5)
- H2014 Problem 3 2
- V2014 Problem 2 3
- V2012 Problem 1 and 2
- H2012 Problem 1 and 3

[29/04/15] Check that you are on the project result list RESULTS if you have sent in a project (this year or last year). If not, send me an email.

[29/04/15]

- Thursday 7 May, 12.15-ca.14.00 (maybe longer), I will go through some exam problems on the blackboard.
- The problems will be listed on this page early next week, so that you can look at them beforehand if you wish.
- The place to meet will be announced on the wiki page.
- **If there is anything else that you want repeated**, please send me an email beforehand.
- Afterwards you can sit down and solve exam problems.
- If needed, we can arrange an additional time where you can sit down and solve exam problems (with the TA or lecturer attending).

[09/04/15] The minutes from the second and third meetings with the reference group are available at:

Minutes from reference group meeting 19.03.15, Minutes from reference group meeting 23.04.15

and the final report (sluttrapport):

Final report from reference group

[22/04/2015] **DOE project grades:**
The grading is finished, and the results are found at RESULTS

If you do not find your result, you must either have entered the wrong candidate number, or I have not received your project, or I have made a mistake. Please contact me if you think there is a mistake. The maximum score on the DOE project is 20. I'm utterly impressed with the creativity, good writing and good quality of the DOE projects. I feel that you have learned a lot from working with the project.

[22/04/14] The (third) meeting with the reference group will be on Thursday April 23 at 11.30 (realfagskantina). If you want to join the reference group or the meeting just email the lecturer. If you have feedback on textbook, lectures, exercises, voting, services you would like, etc. contact the reference group or the lecturer. Minutes (referat) from the meeting will be posted.

[20/4/15] The **project scores** will be posted on this page some time this week.

[15/4/15] Since the lecture Friday 10/04/15 was cancelled, and some students will be away on an excursion on 16 April and would like to follow the summary lectures, here is a new plan for the last 3 lectures.

**Plan for the 3 last lectures**

- Thursday 16/04: last lecture on non-parametric tests (Wilcoxon signed-rank and rank-sum)

- Friday 17/04: First theme: Compare, monitor and/or test parameters in one or two populations. Parts 1, 5, 6, and 7

- Thursday 23/04: Second theme: Model response mean as a function of observed covariates. Parts 2, 3, and 4.

**If anyone disagrees with this plan, please talk to me!**

**Activities leading up to the exam on May 16th will be decided in the lecture Friday 17/04.**

The plan for the summary (the last 2 lectures) is as follows:

- Present problems (some taken from old exam problems), and discuss which statistical method(s) are suitable.
- Perform the analyses (or just comment on what to do) - and stress important theoretical properties.
- Look at interpretations of outputs (plots and results) from statistical analyses of the problems.
- Make mindmaps connecting topics, "situations" and methods from the different parts of the course.

The topics of the course can be divided into two themes:

- First theme: Compare, monitor and/or test parameters in one or two populations. Parts 1, 5, 6, and 7
- Second theme: Model response mean as a function of observed covariates. Parts 2, 3, and 4.

My plan is to focus on the first theme in the Friday April 17 lecture and on the second theme in the Thursday April 23 lecture. I find this very hard, and I'm not sure if this will be a success.

There will be no new stuff at these last two lectures, so you will probably not miss out on anything if you don't attend:-)

The hard part of statistics is to select the best method to analyse a given situation and data set. This is the focus of the last two lectures of the course: I show you examples, and you reflect on which method to use.

Last lecture will then be Thursday 23/04.

[09/04/15] Due to sickness, the **lecture Friday 10/04/15 is cancelled**. (If possible, and if needed, we will have a lecture on 23/04 to have time for the last lecture in topic 7 (nonparametric tests) and the summary lectures.)
Activities leading up to the exam on May 16th will be decided in the lecture tomorrow.

[08/04/15]

- **This week (week 15) we go through Part 7, nonparametric tests (the last topic), on Thursday 09.04 and Friday 10.04.**
- Next week (week 16) we will do a summary of the course (by choosing a few exam questions from various topics to work with).

[18/03/15] The second meeting with the reference group will (probably) be on Thursday 19th of March at 14.15 in room 1126, 11th floor Sentral building 2. If you want to join the reference group or the meeting just email the lecturer. If you have feedback on textbook, lectures, exercises, voting, services you would like, etc. contact the reference group or the lecturer https://wiki.math.ntnu.no/tma4255/2015v/start. Minutes (referat) from the meeting will be posted.

[18/03/15] Under the keywords for the compulsory project, point 4 (Choice of design), you do not need to consider **2^(k-p) fractional factorial or other design?** and **Desired resolution of the design?**, since the resolution of a design is connected to fractional factorial designs, which are not part of the reading list.

[16/03/15] There will be no new exercise this week, but the teaching assistant will attend the exercise session to answer questions about the project.

[03/03/15] New license key for Minitab: If you have installed Minitab on your own Windows computer, Minitab will need a new license file before March 1. You find the license file here: https://software.ntnu.no/cgi-bin/pdw.py?action=list&path=./MiniTab (save it to your computer; the default place to save is 'c:\program files (x86)\minitab'). If Minitab does not find this new license file at start-up, it will ask you where you saved the file, and you then just navigate to that location. If you experience problems with this, please contact the Orakel Support Services: orakel@ntnu.no or telephone 91500.

[03/03/15] Here is the workflow for the DOE that was presented in the lecture on 20.02.15:

- Set up the factorial design (response, factors; full 2^k, with or without replicates)
- Randomize the runs
- Perform the experiments, and write the response values into the worksheet
- Fit the full model. If you do not have replicates ⇒ suggest a reduced model
- Fit the reduced model (or the full model if you have replicates) ⇒ look at the RESIDUALS to assess model fit ⇒ if a transformation of y is needed (log y, √y, 1/y) ⇒ refit
- Assess significance
- Interpret your results. Make main effect and interaction effect plots.
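The workflow above can be sketched in a few lines of Python (the course itself uses Minitab and R; Python is used here only for illustration). This is a minimal sketch, assuming coded factor levels -1/+1 and hypothetical, noise-free response values generated from a known model. The function names and all numbers are illustrative and are not taken from the lecture.

```python
import itertools
import random

def full_factorial(factor_names, replicates=1):
    """All 2^k level combinations (coded -1/+1), repeated `replicates` times."""
    runs = []
    for _ in range(replicates):
        for levels in itertools.product([-1, 1], repeat=len(factor_names)):
            runs.append(dict(zip(factor_names, levels)))
    return runs

def randomized(runs, seed=None):
    """Return the runs in random order (the order in which to perform them)."""
    rng = random.Random(seed)
    shuffled = list(runs)
    rng.shuffle(shuffled)
    return shuffled

def estimated_effect(runs, responses, factors):
    """Mean response where the product of the coded levels is +1,
    minus the mean response where it is -1."""
    plus, minus = [], []
    for run, y in zip(runs, responses):
        sign = 1
        for name in factors:
            sign *= run[name]
        (plus if sign == 1 else minus).append(y)
    return sum(plus) / len(plus) - sum(minus) / len(minus)

# Set up a full 2^3 design and randomize the run order.
design = randomized(full_factorial(["A", "B", "C"]), seed=1)

# Hypothetical responses (assumed model, no noise):
# y = 10 + 2*A - 1.5*B + 0.5*A*B, and C has no effect.
y = [10 + 2 * r["A"] - 1.5 * r["B"] + 0.5 * r["A"] * r["B"] for r in design]

# Estimate main effects and one interaction effect.
effect_A = estimated_effect(design, y, ["A"])
effect_B = estimated_effect(design, y, ["B"])
effect_C = estimated_effect(design, y, ["C"])
effect_AB = estimated_effect(design, y, ["A", "B"])
```

With -1/+1 coding, each estimated effect is twice the corresponding regression coefficient, so here the effects of A, B, C and AB come out as 4, -3, 0 and 1. Real data contain noise, so significance still has to be assessed as described in the workflow.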

[03/03/15] Remember to read the wiki.math.ntnu page for the course when working on your DOE project.

*Under the "Message" tab:*

- Two messages [03/03/15] about residuals and the points for the DOE

*Under the "Compulsory project" tab:*

- Practical information
- The structure may follow the main (relevant) points made in the "Keywords" given, etc.

*Under the "Lectures" tab:*

- Important to discuss:

1. Each experiment is a **genuine run replicate**, that is, it reflects the total variability of the experiment (each trial should be performed independently and constitute a full trial).

2. The run order is random (randomized) so that potential external factors are not confused (confounded) with experimental factors.

Look at the Workflow of the DOE.

The effect of orthogonality.

Anna

[03/03/15] Remember that in the interaction plots, parallel lines indicate no interaction effect, and lines that are not parallel indicate an interaction effect. The lines do not need to cross for there to be an interaction effect; they only need to be non-parallel.
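The point about parallel lines can be checked numerically. The following is a minimal sketch, assuming two factors with coded levels -1/+1 and hypothetical cell means. The interaction effect is half the difference between the effect of A at high B and the effect of A at low B, so it is zero exactly when the two lines in the plot are parallel.

```python
def interaction_effect(cell_means):
    """cell_means[(a, b)] is the mean response at coded levels a, b in {-1, +1}."""
    effect_of_a_at_high_b = cell_means[(1, 1)] - cell_means[(-1, 1)]
    effect_of_a_at_low_b = cell_means[(1, -1)] - cell_means[(-1, -1)]
    return (effect_of_a_at_high_b - effect_of_a_at_low_b) / 2

# Parallel lines: A raises the response by 3 at both levels of B.
parallel = {(-1, -1): 10, (1, -1): 13, (-1, 1): 14, (1, 1): 17}

# Non-parallel lines that never cross: A raises the response by 3 at low B
# and by 7 at high B, and the high-B line stays above the low-B line.
non_parallel = {(-1, -1): 10, (1, -1): 13, (-1, 1): 14, (1, 1): 21}

print(interaction_effect(parallel))      # 0.0
print(interaction_effect(non_parallel))  # 2.0
```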

[03/03/15] We encourage you to write the project report in English (to learn how to write scientific papers). It may also be handed in in Norwegian, since both the lecturer and the TA, who will do the grading (0-20 points), are Norwegian.

[03/03/15] For those of you doing a project with 3 factors, which requires two replicates, remember to randomize the order of all 16 experiments. If you use blocking, remember to randomize within blocks (Minitab can do this for you).
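The two randomization schemes can be sketched as follows. This is only an illustration in Python, not the Minitab procedure. The factor names are placeholders, and the blocked version assumes that each block holds one full replicate (for example one replicate per day), which is one possible blocking choice among several.

```python
import itertools
import random

FACTORS = ["A", "B", "C"]
# The 8 level combinations of a 2^3 design, coded -1/+1.
COMBINATIONS = list(itertools.product([-1, 1], repeat=len(FACTORS)))

def completely_randomized(replicates=2, seed=None):
    """All replicates pooled and shuffled together (16 runs for 2 replicates)."""
    rng = random.Random(seed)
    runs = [combo for _ in range(replicates) for combo in COMBINATIONS]
    rng.shuffle(runs)
    return runs

def randomized_within_blocks(blocks=2, seed=None):
    """One full replicate per block; the order is shuffled separately in each block."""
    rng = random.Random(seed)
    plan = []
    for block in range(1, blocks + 1):
        runs = list(COMBINATIONS)
        rng.shuffle(runs)
        plan.append((block, runs))
    return plan

run_order = completely_randomized(seed=2015)
block_plan = randomized_within_blocks(seed=2015)

# Print the run sheet for the completely randomized design.
for number, levels in enumerate(run_order, start=1):
    print(number, dict(zip(FACTORS, levels)))
```

Fixing the seed only makes the run sheet reproducible; any seed gives a valid random order.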

[03/03/15] Fractional factorial designs will not be part of the curriculum/reading list of the course. So chapter 12 of Box, Hunter and Hunter, Statistics for Experimenters, is not part of the reading list of the course. The note (Notat) "Factorial experiments at two levels" is still on the reading list for DOE.

[03/03/15] You find examples of old projects here: Compulsory project. **You are allowed to do the same experiments as the ones done previous years (the examples given)**.
You will do the experiment yourself (maybe choosing your own low and high levels of the factors) and get your own response values and results.

[03/03/15] You now have until the end of week 13 to finish your DOE project; the deadline for hand-in is Friday March 27. **If you cannot make this deadline you need to inform me about it and make a plan for when you can hand in the project.** Remember to read the info at Compulsory project and also study the last slides (26-30) from lecture L11: L11.pdf

[03/03/15] Plotting of residuals and normality assumption:

* When fitting linear regression models (using the least squares method) we assume that the random errors (epsilon_i) are independent and normally distributed with equal variance, and that the model is linear (linearity, homogeneous variance, normality and no correlation).

* We cannot observe this random error term, and we therefore use the residuals (e_i = y_i - ŷ_i) from the fitted regression to check the normality assumption.

* We plot the residuals (often studentized residuals) to see if they are normally distributed and have equal variance (and whether the linearity assumption holds).

- We plot the qq plot for residuals and histogram of residuals to look for normality.
- Plot residuals against predicted value of response y and regressors to look for equal variance.
- Plot residuals against observation order to look for non random error (in time).

* If we find that the residuals are not normally distributed with equal variance, this means that our model does not fit the data well: the data do not satisfy the assumptions made for the error term. You can also use the Anderson-Darling test to assess how good your normality assumption is.

* This may lead to for instance biased standard errors that may lead to making wrong conclusions in hypothesis testing.
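To make the residuals concrete, here is a minimal sketch in Python for simple linear regression with one regressor. The data are hypothetical and the function name is illustrative. It computes the fitted values and the residuals e_i = y_i - ŷ_i, and collects the pairs that would be plotted in the residual plots described above.

```python
def fit_line(x, y):
    """Least squares estimates (intercept, slope) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data, listed in observation order.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # e_i = y_i - yhat_i

# Pairs to plot: residual against fitted value, and against observation order.
residual_vs_fitted = list(zip(fitted, residuals))
residual_vs_order = list(enumerate(residuals, start=1))
```

For a model with an intercept, the least squares residuals always sum to zero, so it is the pattern in the plots (curvature, a funnel shape, trends over time) that carries the information, not the average of the residuals.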

* Approaches to handling data with non-normal random errors:

- Direct transformation of data to make random errors approximately normal (transformation may often help both non-normality and non-constant variation of the random errors)

- Box-Cox transformation of Y (you find this in MINITAB): this procedure looks for the transformation that best stabilizes the random error variance, makes the distribution of the error term closer to normal, and straightens the relationship between Y and the regressors (x's), all at the same time.

* You can then fit and validate the model in the transformed variables.

* If you transform the response variable Y, you may want to transform the predicted values back into the original units using the inverse of the transformation applied to the response variable.
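Transforming and back-transforming can be sketched as follows. This is a minimal Python illustration, assuming the textbook form of the Box-Cox family, (y^λ - 1)/λ for λ ≠ 0 and log y for λ = 0. Minitab and other software may use a differently scaled variant, and the search for the best λ is not shown. The function names and numbers are hypothetical.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive responses")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

def inverse_box_cox(z, lam):
    """Map a value on the transformed scale back to the original units."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1) ** (1 / lam)

# Hypothetical example with lambda = 0.5 (a square-root type transformation).
lam = 0.5
y_original = 16.0
z = box_cox(y_original, lam)      # (sqrt(16) - 1) / 0.5 = 6.0
y_back = inverse_box_cox(z, lam)  # (0.5 * 6 + 1) ** 2 = 16.0
```

In practice the model is fitted to the transformed responses, and `inverse_box_cox` is applied to the predicted values to report them in the original units.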

Anna

[16/02/15] We will try to hold the **exercises** (weeks 8-17) in R2 (http://www.ntnu.no/kart/kart-over-ntnu/gloeshaugen/realfagbygget/del-c-u1/r2/) on Tuesdays 16.15-17.00 and in R10 on Wednesdays 16.15-17.00 (http://www.ntnu.no/kart/index.php?id=1523).

**Bring your own computer!**

[13/02/15] **Due to sickness, the lecture today, Friday 13/2/15, is cancelled.**

[12/02/15] Remember to **vurderingsmelde** (register for the exam) and **undervisningsmelde** (register for teaching, i.e. following the lectures in this course) for this course on the studentweb (deadline 15 February?).

[12/02/14] You find examples of old projects here: Compulsory project. You are allowed to do the same experiments as the ones done in previous years (the examples given). You will do the experiment yourself (maybe choosing your own low and high levels of the factors) and get your own response values and results.

[12/2/15]

* It will be possible (and I encourage you) to talk with me or email me about what you plan to do in your experiments (your design) before performing them, so that I can “approve” your design before you perform your trials.

* It is important that you do not start collecting data before we have been through the DOE introductory lectures (including week 8).

* Read more info under Compulsory project.

* You can work together in groups of up to 2.

* Submission date: before Easter, on Friday March 27 (week 13) at 12.00 at the latest.

The **compulsory project** in TMA4255 is described in detail at the “Compulsory project” tab on the left. You may already start thinking about what your experiment should be, since we now understand the terms “response”, “normally distributed” and “covariate”. A factor is a covariate that is discrete, and we will only consider factors with two possible values (e.g. male/female, high/low temperature).

The project consists of designing, performing and analysing a so-called factorial experiment, which means that we do multiple linear regression with 3 or 4 covariates that are factors with two levels each. This is NOT an observational study: you should collect the observations yourself.

As an example, assume I want to study factors that affect the height of plant sprouts (“from seed to a plant”).

1) You need to perform a multiple regression experiment consisting of 16 trials, that is, n=16 observations. For the plant example: you need to plant 16 seeds.

2) The response that is measured should be continuous, so that the response itself, or a transformation of the response, can be seen as normally distributed in a multiple regression model.

It is also possible to treat a response with at least 7 ordered categories as continuous. (If you have, for example, a taste panel, you need at least 7 ordered categories for how good the cake tasted: 1. "Bad", 2. "Not so good", ..., 7. "Good". A project from an earlier year used the mean of independent rating scores (1-20) from two persons as the taste result. These individuals tasted the cake "blind", so that the look of the cake would not interfere with the taste.)

For the plant experiment we assume that the response is the height of each plant after 5 days of growing.

3) You choose 3 or 4 factors with two levels each that might influence your response (it is possible to choose more factors, but then you need to do a so-called fractional factorial design). For the plant experiment we may choose covariate (factor) A=two different types of seed (sunflower or broccoli seeds), B=watering (coffee or water), C=growth medium (cotton or soil).

4) If you choose 3 factors you need to perform all possible combinations of the 3 factors two times (2*2*2=8 combinations, giving 16 trials); if you choose 4 factors you need to perform all possible combinations only once (2*2*2*2=16). If you choose more than 4 factors you need to study “fractional factorials” to find out which of the possible combinations to perform. For the plant experiment we then have 2 plants for each combination of the factors A, B and C.

5) A very important aspect of performing the 16 trials is that the trials should be independent and performed in a randomized order. For the plant experiment this just means that we have 16 plants which are handled in the same manner. This is often the difficult part!

For other types of experiments, for example if you want to test yourself by measuring your pulse rate when running up a hill, with the factors being with/without a heavy backpack, with/without sports shoes, and running backwards or forwards, the order in which you perform the experiments will matter, and then you need to do them in random order. The ideal would be to do one measurement each day, but that might be difficult, and then you instead need to do a few runs every day. Then there might be a day effect, which may be handled with more advanced theory that we call blocking; that means that you need to wait until after we have covered this topic.

6) Each trial should be performed independently of the other 15, and constitute a full trial. I will try to explain this with a common mistake. Assume you want to study factors that affect the taste of muffins. Then you really need to make 16 different muffins that are made from 16 doughs and baked in the oven one at a time. If you only make one dough and bake all muffins at the same time, you have much less variability than the experiment will have in real life. If you for practical reasons need to handle more than one trial together, this is called blocking and should be taken into account (you then first need to learn about blocking).

I suggest you now look at the list of experiments that students have done previously - listed on the www-address above - and talk to me or the TA or the Student assistant before you start performing the trials.

You may talk to us at the exercise supervision, or come to my office (office hrs Thursdays 10-11, or email me if you want to come at another time).

Anna

[09/2/15] The minutes from the first meeting with the reference group are available at:

Minutes from reference group meeting 05.02.15

If you have comments or observations that have not been made, please contact the reference group or lecturer.

At the end of the minutes there are action points; some of these are:

To accommodate the request for more examples and problem solving, the teacher will try to set up an extra half hour (preferably before or after a lecture) where exam problems are solved. This may not be every week.

Exercises: We may try to move the exercises to an auditorium with a projector, where students bring their own computer.

[02/02/15]
The meeting with the **reference group** will be on **Thursday 5th of February at 10.00 in room 734**, 7th floor Sentral building 2. If you want to join the reference group or the meeting just email the lecturer. If you have feedback on textbook, lectures, exercises, voting, services you would like, etc. contact the reference group or the lecturer https://wiki.math.ntnu.no/tma4255/2015v/start. Minutes (referat) from the meeting will be posted.

[02/02/15] About the **exam** and the aids permitted to bring with you

- Date: 9.00-13.00, May 16, 2015.
- Duration: 4 hrs.
- Written.
- Makes up 80% of the final grade in TMA4255 (a compulsory project gives the final 20 %).
- **Permitted aids**:
  - **Approved calculator**
  - **"Tabeller og formler i statistikk", Tapir forlag (http://www.akademikaforlag.no/node/766)**
  - **One yellow sheet (A4 with stamp) with your own handwritten formulae and notes. (You can get this sheet from the department office on the 7th floor of Sentralbygg 2. You may write on both sides!)**

[26/01/15] Because of Teknodagen 2015 on 29.01.15, I have agreed to change room for the Thursday lecture, as they would like to use R9.
**The Thursday lecture 29.01, 12.15-14.00, is therefore in room S2 (http://www.ntnu.no/kart/gloeshaugen/sentralbygg-1/1-etasje/s2/).**

[21/01/15] There will be an extra exercise hour on **Wednesdays (from today 21.01.15) 16.15-17 at computerlab H3,421 Rall** http://www.ntnu.no/kart/index.php?id=2499. We will see how many turn up in the following exercises, and change the hours if they are not appropriate.

[19/01/15] Exercise 2, Introduction course to R, will be held in room K26 (Kjemi 4, room no. 4.108) http://www.ntnu.no/kart/gloeshaugen/kjemi-4/1-etasje/k26/ Tuesday, 20.01.2015, 16:15-17:00.
**This is not a computerlab, so you need to bring your own computer with R and (preferably) RStudio.**

[12/01/15] Remember to **vurderingsmelde** (register for the exam) and **undervisningsmelde** (register for teaching, i.e. following the lectures in this course) for this course on the studentweb.

[16/12/14] The first lecture is Thursday January 8, 12.15-14, in R9 (Realfagbygget, room no. AU1-101).