# TMA4255 Applied Statistics, spring 2016

[23/08/16] The results for H2016 are, or soon will be, available on Studweb. Again, I'm very impressed with your exam papers in TMA4255.

In most courses we write a grading document, which is made available together with the grades: Gradings.pdf. There you can see in detail how the scores were given and which grade scale was used.

A corrected tentative solution can be found here: TENTATIVE SOLUTION H2016.

[22/06/16] Dear students, as you know, the exam results in TMA4255 are, or soon will be, available on Studweb.

I'm very impressed with your exam papers in TMA4255. I have seen many very good solutions - and given out many good grades (sadly there were also some Fs, but only 2). The grade distribution for TMA4255 ended up as:

A: 22% B: 37% C: 19% D: 10% E: 8% F: 4%

In most courses we write a grading document, which is made available together with the grades: Gradings.pdf. There you can see in detail how the scores were given and which grade scale was used.

A corrected tentative solution can be found here: TENTATIVE SOLUTION.

[22/06/16] The final report from the reference group for the course:

Final report from reference group

[03/06/16] Today's exam problems and tentative solutions (hopefully without errors) are available here: english version of exam, bokmål version of exam, nynorsk version of exam and TENTATIVE SOLUTION.

[25/5/16] On Friday 27 May at 13.30 in S1 we continue solving exam problems. You are also welcome to come to the office hour on Thursday to ask questions (and next week on Tuesday and Thursday).

[23/5/16] The exam preparation lecture on Monday 23 May at 13.30 is cancelled. Perhaps Friday 27 May at 13.30 could be the new time?

[20/5/16] On Monday 23 May at 13.30 in G21 we will continue with the exam problems that we started on Thursday 19/5. (I will probably arrange a time on Thursday 26 May for some extra guidance.)

[12/5/16] We meet in **S1** on Thursday 19 May 12.15-ca.14.00 (maybe longer). We will go through (some of) these problems:

- Which degrees of freedom to use in the t-tests
- V2013 Problem 4 (Maybe Problem 5)
- H2014 Problem 3 2
- V2014 Problem 2 3
- V2012 Problem 1 and 2
- H2012 Problem 1 and 3

[05/05/16] On Thursday 19 May 12.15-ca.14.00 (maybe longer), I will go through some exam problems on the blackboard.
**If this is not a good date and time for you, let me know!**

- The list of problems will be posted on this page early next week, so that you can have a look at them beforehand if you wish.
- The place to meet will be announced on the wiki page.
- If there is anything else that you would like repeated, please send me an email beforehand.
- Afterwards you can sit down and solve exam problems.
- If needed, we can arrange an additional time where you can sit down and solve exam problems (with the TA or lecturer attending).

[04/05/16] The minutes from meeting 3 with the reference group are available at:

Minutes from reference group meeting 03.05.16

[02/05/2016] **DOE project grades:**
The grading is finished, and the results are found at RESULTS

If you do not find your result, you have either entered the wrong candidate number, or I have not received your project - or I have made a mistake. Please contact me if you think there is a mistake.
The maximum score on the DOE project is 20. I'm utterly impressed with the creativity, good writing and good quality of the DOE projects. I feel that you have learned a lot from working with the project.
**If you have got IKKE BESTÅTT/NOT PASSED, please send me an email/contact us as soon as possible if you would like to correct this before the exam; otherwise you are not allowed to take the exam (at least 8 points are needed to pass).**

[02/05/16] The third meeting with the reference group will be on Tuesday 3rd of May at 14.00 (room 738, 7th floor, Sentralbygg II). If you want to join the reference group or the meeting, just email the lecturer. If you have feedback on the textbook, lectures, exercises, voting, services you would like, etc., contact the reference group or the lecturer: https://wiki.math.ntnu.no/tma4255/2016v/start. Minutes (referat) from the meeting will be posted.

[30/04/16] The minutes from meeting 2 with the reference group are available at:

Minutes from reference group meeting 07.04.16

[08/04/16] If you are planning to use your (approved) project from last year (2015) for this year's exam, please let me know! This applies to those who have taken the course before.

[07/04/16] Deadline for handing in the project is tomorrow! If you have problems handing it in by tomorrow, let me know!

[07/04/16] The datasets associated with the exercises in the textbook, Probability and statistics, are available for download at datasets (this address is found on page XV in the textbook).

[05/04/16] Due to sickness the lecture today is cancelled. Please read Sections 17.5 and 17.6 on control charts for attributes and cusum control charts (Statistical process control).

[01/04/16] The second meeting with the reference group will be on Thursday 7th of April at 14.00 in S1. If you want to join the reference group or the meeting, just email the lecturer. If you have feedback on the textbook, lectures, exercises, voting, services you would like, etc., contact the reference group or the lecturer: https://wiki.math.ntnu.no/tma4255/2016v/start. Minutes (referat) from the meeting will be posted.

[01/03/16] Maria Helene Kalkvik is interested in finding a partner for the DOE project. If you don't have a partner and are interested in having one, please contact her on mkalkvik [at] gmail [dot] com.

[25/02/16] The minutes from meeting 1 with the reference group are available at:

Minutes from reference group meeting 18.02.16

If you have comments or observations that have not been made, please contact the reference group or lecturer.

[25/02/16] **You now have until the end of week 14 to finish your DOE project** - the deadline for hand-in is Friday April 8 (week 14) at 12.00 at the latest. **If you cannot make this deadline you need to inform me about it and make a plan for when you can hand in the project.** Remember to read the info at Compulsory project and also study the last slides 26-30 from lecture L11: L11.pdf

[25/02/16] Here is the **workflow** for the DOE that was presented in the lecture on 23.02.16:

- Set up factorial design (response, factors: full 2^k, possibly with replicates)
- Randomize runs
- Perform experiments, write response values into worksheet
- Fit the full model. If you do not have replicates ⇒ suggest a reduced model
- Fit the reduced model (or the full model if you have replicates) ⇒ look at RESIDUALS to assess model fit ⇒ if a transformation of y is needed: log y, √y, 1/y ⇒ refit
- Assess significance
- Interpret your results. Main and interaction effect plot.
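
The first steps of this workflow can be sketched in a few lines of Python. This is only a minimal illustration, not the Minitab procedure from the lectures: the function names, the seed and the response values are illustrative assumptions (the numbers are not data from this course). Factors are coded -1 (low) and +1 (high), and each estimated effect is the contrast divided by half the number of runs.

```python
import itertools
import random


def full_factorial(k):
    """All 2^k runs in coded units (-1 = low, +1 = high), in standard order."""
    # itertools.product varies the last position fastest; reversing each tuple
    # makes the first factor vary fastest, which is the usual standard order.
    return [tuple(reversed(levels))
            for levels in itertools.product((-1, 1), repeat=k)]


def estimate_effect(design, y, columns):
    """Effect of one factor (one column) or an interaction (several columns)."""
    contrast = 0.0
    for run, response in zip(design, y):
        sign = 1
        for c in columns:
            sign *= run[c]
        contrast += sign * response
    return contrast / (len(design) / 2)


design = full_factorial(3)

# Randomize the order in which the runs are performed (hypothetical seed).
rng = random.Random(4255)
run_order = list(range(len(design)))
rng.shuffle(run_order)

# Illustrative responses, listed in standard order (not in run order).
y = [60, 72, 54, 68, 52, 83, 45, 80]

terms = {"A": [0], "B": [1], "C": [2],
         "AB": [0, 1], "AC": [0, 2], "BC": [1, 2], "ABC": [0, 1, 2]}
effects = {name: estimate_effect(design, y, cols) for name, cols in terms.items()}

for name, value in effects.items():
    print(name, value)
```

With one observation per combination there are no degrees of freedom left to estimate the error variance, which is why the workflow suggests a reduced model before assessing significance.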

[25/02/16] * Remember to read the course page on wiki.math.ntnu.no when working on your DOE project.

* *Under the "Message" tab:*

- Two messages [25/02/16] about residuals and the points for the DOE

* *Under the "Compulsory project" tab:*
- Practical information
- The structure of the report may follow the main (relevant) points in the "Keywords" given.
Etc.

*Under the "Lectures" tab:*

- Important to discuss:

1. Each experiment is a **genuine run replicate**, that is, it reflects the total variability of the experiment (each trial should be performed independently and constitute a full trial).

2. The run order is random (randomized) so that potential external factors are not confused (confounded) with experimental factors.

Look at the Workflow of the DOE.

The effect of orthogonality.

* For those of you doing a project with **3 factors who need to do two replicates**, remember to randomize the order of all 16 experiments. If you use blocking, remember to randomize within blocks (Minitab can do this for you).
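
The two randomization schemes can be sketched as follows. This is a minimal sketch, assuming that each replicate of the 2^3 design is run as one block (for example one day per replicate); the function names and seeds are hypothetical, and Minitab produces the same kind of run sheet for you.

```python
import itertools
import random


def fully_randomized_run_sheet(k=3, replicates=2, seed=None):
    """All 2^k * replicates runs in one completely random order (no blocking)."""
    rng = random.Random(seed)
    runs = list(itertools.product((-1, 1), repeat=k)) * replicates
    rng.shuffle(runs)
    return runs


def blocked_run_sheet(k=3, replicates=2, seed=None):
    """One block per replicate; the run order is randomized within each block."""
    rng = random.Random(seed)
    base = list(itertools.product((-1, 1), repeat=k))
    sheet = []
    for block in range(1, replicates + 1):
        runs = base[:]          # every block contains all 2^k combinations
        rng.shuffle(runs)       # randomize only inside this block
        sheet.extend((block, run) for run in runs)
    return sheet


for block, run in blocked_run_sheet(seed=2016):
    print(block, run)
```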

* Remember that in the interaction plots, parallel lines indicate no interaction effect, while lines that are not parallel indicate an interaction effect. The lines do not need to cross for there to be an interaction effect; they only need to be non-parallel.
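
A small numerical illustration of this point, with made-up cell means: the interaction effect is half the difference between the effect of A at high B and the effect of A at low B, so it is zero exactly when the two lines in the plot have the same slope. The function names are hypothetical.

```python
def interaction_effect(mean_ll, mean_hl, mean_lh, mean_hh):
    """AB interaction from the four cell means of a 2x2 layout.

    mean_xy is the mean response with A at level x and B at level y
    (l = low, h = high).
    """
    effect_a_at_b_low = mean_hl - mean_ll    # slope of the "B low" line
    effect_a_at_b_high = mean_hh - mean_lh   # slope of the "B high" line
    return (effect_a_at_b_high - effect_a_at_b_low) / 2


def lines_cross(mean_ll, mean_hl, mean_lh, mean_hh):
    """True if the two lines cross strictly between low A and high A."""
    return (mean_ll - mean_lh) * (mean_hl - mean_hh) < 0


# Parallel lines: both slopes are 4, so there is no interaction.
print(interaction_effect(10, 14, 20, 24))

# Non-parallel lines that never cross: slopes 4 and 10, interaction present.
print(interaction_effect(10, 14, 20, 30), lines_cross(10, 14, 20, 30))
```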

* We encourage you to write the project report in English (to learn how to write scientific papers). However, it can be handed in in Norwegian, since both the lecturer and the TA, who will do the grading (0-20 points), are Norwegian.

[25/02/16]
**Plotting of residuals and normality assumption:**

* When fitting linear regression models (using the least squares method) we assume that the random errors (epsilon_i) are independent and normally distributed with mean zero and equal variance (in short: linearity, homogeneous variance, normality and no correlation).

* We cannot observe this random error term, and we therefore use the residuals from the fitted regression (e_i = y_i - ŷ_i) to check the assumptions.

* We plot the residuals (often studentized residuals) to see if they are normally distributed, have equal variance and show no remaining nonlinear pattern.

- We plot the qq plot for residuals and histogram of residuals to look for normality.
- Plot residuals against predicted value of response y and regressors to look for equal variance.
- Plot residuals against observation order to look for non-random errors (in time).
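
The quantities that go into these plots can be computed by hand for a simple linear regression. The sketch below is only an illustration with made-up data and hypothetical function names; Minitab gives you the same residuals directly. The studentized residual used here is e_i / (s * sqrt(1 - h_ii)), with leverage h_ii = 1/n + (x_i - x̄)^2 / S_xx.

```python
import math


def residual_table(x, y):
    """Fitted values, residuals and studentized residuals for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx                # least squares slope
    b0 = y_bar - b1 * x_bar       # least squares intercept

    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    # Estimate of sigma with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    studentized = [e / (s * math.sqrt(1 - h))
                   for e, h in zip(residuals, leverage)]
    return fitted, residuals, studentized


# Made-up data, listed in observation order.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted, residuals, studentized = residual_table(x, y)

# Pairs to plot: (fitted, residual) for equal variance,
# (observation order, residual) for time effects.
for order, (f, e, r) in enumerate(zip(fitted, residuals, studentized), start=1):
    print(order, round(f, 3), round(e, 3), round(r, 3))
```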

* If we find that the residuals are not normally distributed with equal variance, this means that the model assumptions for the error term are not met for these data. You can also use the Anderson-Darling test to assess how well the normality assumption holds.

* This may lead to, for instance, biased standard errors, which in turn may lead to wrong conclusions in hypothesis testing.

* Approaches to handling data with non-normal random errors:

- Direct transformation of data to make random errors approximately normal (transformation may often help both non-normality and non-constant variation of the random errors)

- Box-Cox transformation of Y (find this in MINITAB): this method searches for the power transformation of Y that best stabilizes the random error variance, makes the distribution of the error term closer to normal, and straightens the relationship between Y and the regressors (x's).

* You can then fit and validate the model in the transformed variables.

* If you transform the response variable Y, you may want to transform the predicted values back into the original units using the inverse of the transformation applied to the response variable.
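
As an illustration of transforming and back-transforming, here is a minimal sketch of the Box-Cox power transformation for a given value of lambda and its inverse. The function names are hypothetical, and the choice of lambda itself is left to Minitab. Note that lambda = 0 corresponds to log y, lambda = 0.5 to √y and lambda = -1 to 1/y, up to a linear rescaling.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive response y for a given lambda."""
    if y <= 0:
        raise ValueError("the Box-Cox transformation requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inverse_box_cox(z, lam):
    """Map a value on the transformed scale back to the original units."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1) ** (1 / lam)


# Round trip for one value and a few common choices of lambda.
for lam in (-1, 0, 0.5, 1, 2):
    z = box_cox(12.5, lam)
    print(lam, z, inverse_box_cox(z, lam))
```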

Anna

[22/02/16] This week (8) we will go through the last part of DOE. First we finish the DOE note, available from the lectures tab L10, Note. What is left in the note is pages 7-15: variance estimation, replicated experiments and blocking.

Then we turn to the last topic of DOE, which is how to perform fractions of a full experiment. For this we will use ch. 12 from the famous book by Box, Hunter and Hunter, Statistics for experimenters. Due to copyright issues I have filed the pdf in itslearning (TMA4255/Handouts folder), so you need to go there - or email me at anna.holand@math.ntnu.no if you have trouble finding the file.

[16/02/16] You find examples of old projects here: Compulsory project. You are allowed to do the same experiments as the ones done in previous years (the examples given). You will do the experiment yourself (maybe choosing your own low and high levels of the factors) and get your own response values and results.

[16/2/16]

* It will be possible (and I encourage you) to talk with me/email me about what you plan to do in your experiments (your design) before performing them, so that I can “approve” your design before you perform your trials.

* It is important that you do not start collecting data before we have been through the DOE introductory lectures (including week 8).

* Read more info under Compulsory project.

* You can work together in groups of up to 2.

* Submission date: after Easter, on Friday April 8 (week 14) at 12.00 at the latest.

The compulsory project in TMA4255 is described in detail at the “Compulsory project” tab on the left. You may already start thinking about what your experiment should be, since we now understand the terms “response”, “normally distributed” and “covariate”. A factor is a covariate that is discrete - and we will only consider factors with two possible values (e.g. male/female, high/low temperature).

The project consists of designing, performing and analysing a so-called factorial experiment - which means that we do multiple linear regression with 3 or 4 covariates that are factors with two levels each. This is NOT an observational study - you should collect the observations yourself.
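
Written out for three factors A, B and C, the regression model behind such an experiment can be sketched as below. The notation is an assumption made here for illustration: each factor is coded as -1 (low level) or +1 (high level), and with this coding each estimated effect is twice the corresponding estimated regression coefficient.

```latex
y = \beta_0 + \beta_A x_A + \beta_B x_B + \beta_C x_C
  + \beta_{AB} x_A x_B + \beta_{AC} x_A x_C + \beta_{BC} x_B x_C
  + \beta_{ABC} x_A x_B x_C + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2),
\qquad x_A, x_B, x_C \in \{-1, +1\}
```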

As an example, assume I want to study factors that affect the height of plant sprouts (“from seed to a plant”).

1) You need to perform a multiple regression experiment consisting of 16 trials - that is, n=16 observations. For the plant example: you need to plant 16 seeds.

2) The response that is measured should be continuous, so that the response itself or a transformation of the response in a multiple regression model can be seen to be normally distributed.

It is also possible to treat a response with at least 7 ordered categories as continuous. (If you have, for example, a taste panel, you need at least 7 ordered categories for how good, for example, the cake tasted: 1. "Bad", 2. "Not so good", …, 7. "Good". A project from an earlier year used independent rating scores (1-20) from two persons, with the mean as the taste result. These individuals tasted the cake "blind" so that the look of the cake should not interfere with the taste.)

For the plant experiment we assume that this is the height of each plant after 5 days of growing.

3) You choose 3 or 4 factors with two levels each that might influence your response (it is possible to choose more factors, but then you need to do a so-called fractional factorial design). For the plant experiment we may choose covariate (factor) A=two different types of seed (sunflower or broccoli seeds), B=watering (coffee or water), C=growth medium (cotton or soil).

4) If you choose 3 factors you need to perform all possible combinations of the 3 factors two times (2*2*2=8 combinations, so 16 trials in total); if you choose 4 factors you need to perform all possible combinations only once (2*2*2*2=16). If you choose more than 4 factors you need to study the “fractional factorials” to find out which of the possible combinations to perform. For the plant experiment we then have 2 plants with the same combination of factors A, B and C.

5) A very important aspect of performing the 16 trials is that the trials should be independent and performed in a randomized order. For the plant experiment this just means that we have 16 plants which are handled in the same manner. This is often the difficult part!

For other types of experiments, for example if you want to test yourself by measuring your pulse rate when running up a hill, with the factors being with/without heavy backpack, with/without sports shoes, and running backwards or forwards, the order in which you perform the experiments will matter - and then you need to do them in random order. The ideal would be to do one measurement each day, but that might be difficult, and then you instead need to do a few runs every day. Then there might be a day effect - which may be handled with more advanced theory that we call blocking - which means that you need to wait until after we have covered this topic.

6) Each trial should be performed independently of the other 15, and constitute a full trial. I will try to explain this with a common mistake. Assume you want to study factors that affect the taste of muffins. Then you really need to make 16 different muffins that are made from 16 doughs and baked in the oven one at a time. If you only make one dough and bake all muffins at the same time, you have much less variability than the experiment will have in real life. If you for practical reasons need to handle more than one trial together, this is called blocking and should be taken into account (you then first need to learn about blocking).

I suggest you now look at the list of experiments that students have done previously - listed on the www-address above - and talk to me or the TA or the Student assistant before you start performing the trials.

You may talk to us at the exercise supervision, or come to my office (office hrs Thursdays 10-11, or email me if you want to come at another time).

Anna

[15/02/16] The meeting with the reference group will be on Thursday 18th of February at 14.00 in S1. If you want to join the reference group or the meeting, just email the lecturer. If you have feedback on the textbook, lectures, exercises, voting, services you would like, etc., contact the reference group or the lecturer: https://wiki.math.ntnu.no/tma4255/2016v/start. Minutes (referat) from the meeting will be posted.

[02/02/16] Exercises will be held on Thursdays 10.15-11 (EL3) and Fridays 09.15-10.00 (S2).

[27/01/16] The exercise hours were decided by voting, and we found the best fit to be Thursday 10.15-11 (?) and Friday 9.15-11 (S2). I have not yet got a room from NTNU for the Thursday exercise. So this week (week 4) we will have the exercise only on Friday 29/01 9.15-11 in S2.

[15/12/15] We will discuss the timetable for the lectures in the first lecture.

[15/12/15] The first lecture is Tuesday January 12 13.15-15 in S2 (Sentralbygg 1, room no. 116).