# TMA4255 Applied Statistics, spring 2014

## Messages

[27/06/14] Dear students, as you know, the exam results in TMA4255 are available at Studweb.

I'm very impressed with your exam papers in TMA4255. I have seen many very good solutions - and given out many good grades. But sadly there were also some Fs. The grade distribution for TMA4255 ended up as:

A: 7% B: 27% C: 29% D: 27% E: 0% F: 10%

In most courses we write a grading document, which is to be made available together with the grading: Gradings.pdf. Here you see in detail how the scores are given and the grade scale used.

[30/05/14] Today's exam problems and tentative solutions are available here: English version of exam, Bokmål version of exam, Nynorsk version of exam and TENTATIVE SOLUTION.

[25/05/14] The minutes from the third meeting with the reference group are available at:

Minutes from reference group meeting 27.03.14

and the final report (sluttrapport):

Final report from reference group

[19/05/14] The (third) meeting with the reference group will be on Thursday May 20 at 11.15 in room 734, 7th floor Sentral building II. If you want to join the reference group or the meeting just email the lecturer. If you have feedback on textbook, lectures, exercises, voting, services you would like, etc. contact the reference group or the lecturer. Minutes (referat) from the meeting will be posted.

[13/05/2014] The guidance session today, 8.15-11 in room 734, is cancelled. You are welcome to sit there and work, but the lecturer will not be there. If you have any questions, you can email the lecturer or come to the lecturer's office.

[12/05/2014] Lecturer/TA available before the exam: **May 12 at 9.15-11 and May 13 at 08.15-10 in room 734**, 7th floor, Sentralbygg 2, elevator next to the Tapir food store. This is a room where you may sit together and work - and get help. Or email/stop by the lecturer or TA at any time if you have questions.

[09/05/2014] DOE project grades: The grading is finished, and the results are found at RESULTS

If you do not find your result, either you entered the wrong candidate number, I have not received your project, or I have made a mistake. Please contact me if you think there is a mistake. The maximum score on the DOE project is 20. I'm utterly impressed with the creativity, good writing and good quality of the DOE projects. I feel that you have learned a lot from working with the project.

[04/04/14] Week 18: last lectures with summing up etc.

- Tuesday 13.15-15.00 in **F2** (not K27)
- Wednesday 09.15-11.00 in **EL2**

[03/04/14] Hand-in (last day) of the project is Friday April 11. If you cannot make this deadline you need to inform me about it and make a plan for when you may hand in the project. Remember to read the info at Compulsory project, on this Message page and also study the last slides 26-30 from lecture L11: L12.pdf.

[03/04/14] Although there is no lecture next week (15), the TA will show up at the supervision hours (Tuesdays 15.15-17.00) and you may ask Erik (and Vaclav) about exercises or the DOE project. Send me an email or call me if you have any questions regarding the DOE project.

[03/04/14] The minutes from the second meeting with the reference group are available at:

Minutes from reference group meeting 27.03.14

If you have comments or observations that have not been made, please contact the reference group or lecturer.

[02/04/14] The last week of lectures, with summing up, concluding remarks and exam preparation, has been moved from week 17 to **week 18**. The last day of lectures may be Wednesday April 30.
Let me know if there are any strong objections to this. Time and place will be posted.

[02/04/14] Summing-up note for chi-square tests (goodness of fit, test of independence and test of homogeneity), on what to do when an expected value is less than 5: Note: Chi-square tests, expected value less than 5
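
As a reminder of what "expected value" means here: in a test of independence, the expected count in a cell is the row total times the column total divided by the grand total. The following is a minimal Python sketch that computes these counts and flags cells below 5. The table is made up and the function name is only illustrative; what to do with the flagged cells is covered in the note.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Made-up 2x3 contingency table of observed counts (not course data).
observed = [
    [12, 5, 3],
    [18, 10, 2],
]

expected = expected_counts(observed)

# (row, column) indices of cells where the expected count is below 5.
small_cells = [
    (i, j)
    for i, row in enumerate(expected)
    for j, value in enumerate(row)
    if value < 5
]
```

In this made-up table both cells in the last column have expected counts below 5, even though one of the observed counts in other columns is also small. It is the expected counts, not the observed ones, that are checked.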

[02/04/14] Here is the workflow for the DOE that was presented in the lecture on 19.02.14:

- Set up factorial design (response, factors: full 2^k or replicated)
- Randomize runs
- Perform experiments, write response values into worksheet
- Fit the full model. If you do not have replicates ⇒ suggest reduced model
- Fit reduced model (or full if replicates) ⇒ look at RESIDUALS to assess model fit ⇒ if transformation of y is needed: log y, √y, 1/y ⇒ refit
- Assess significance
- Interpret your results. Main and interaction effect plot.
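
The first steps of this workflow can also be sketched outside MINITAB. The following is a minimal Python sketch, assuming a full 2^3 design with coded levels -1/+1. The function names are illustrative and the response values are made up, so this is not course data and not a replacement for the MINITAB analysis.

```python
import itertools
import random

def full_factorial(k):
    """All 2^k runs in coded units (-1 = low level, +1 = high level)."""
    return [list(run) for run in itertools.product([-1, 1], repeat=k)]

def randomized_order(n_runs, seed=None):
    """Random run order, so external factors are not confounded with the design."""
    order = list(range(n_runs))
    random.Random(seed).shuffle(order)
    return order

def main_effect(design, y, j):
    """Mean response at the high level of factor j minus the mean at the low level."""
    high = [yi for run, yi in zip(design, y) if run[j] == 1]
    low = [yi for run, yi in zip(design, y) if run[j] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

def interaction_effect(design, y, i, j):
    """Two-factor interaction: the same contrast on the product column x_i * x_j."""
    plus = [yi for run, yi in zip(design, y) if run[i] * run[j] == 1]
    minus = [yi for run, yi in zip(design, y) if run[i] * run[j] == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

design = full_factorial(3)                      # 8 runs, factors A, B, C
order = randomized_order(len(design), seed=1)   # order in which to perform the runs

# Made-up responses in standard order: y = 10 + 2*A - 1*B + 0.5*A*B, no noise.
y = [10 + 2 * a - 1 * b + 0.5 * a * b for a, b, c in design]

effect_A = main_effect(design, y, 0)
effect_B = main_effect(design, y, 1)
effect_C = main_effect(design, y, 2)
effect_AB = interaction_effect(design, y, 0, 1)
```

With coded levels, each estimated effect is twice the corresponding regression coefficient, so the coefficient 2 for A in the made-up model shows up as an effect of 4.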

[25/03/14] The (second) meeting with the reference group will be on Thursday March 27 at 13.00 in room 738, 7th floor Sentral building II. If you want to join the reference group or the meeting just email the lecturer. If you have feedback on textbook, lectures, exercises, voting, services you would like, etc. contact the reference group or the lecturer. Minutes (referat) from the meeting will be posted.

[20/03/14] Remember to read the wiki.math.ntnu page for the course when working on your DOE project.

*Under the "Message" tab:*

- Two messages [13/2/14], about residuals and the points for the DOE

*Under the "Compulsory project" tab:*

- Practical information
- Structure may follow the main (relevant) points made in the "Keywords" given.
Etc.

*Under the "Lectures" tab:*

- Important to discuss:

1. Each experiment is a **genuine run replicate**, that is, it reflects the total variability of the experiment (each trial should be performed independently and constitute a full trial).

2. The run order is random (randomized) so that potential external factors are not confused (confounded) with experimental factors.

Look at the Workflow of the DOE.

The effect of orthogonality.

Anna

[19/03/14] It is time for the second meeting between the lecturer, TA and the reference group. This meeting will be this week (week 12) or next week (week 13). We need input on everything you might have an opinion about, including the textbook, the lectures, the exercises, the voting, the www-pages, the project, the information in general, etc.

[6/03/14] New license key for Minitab: If you have installed Minitab on your own Windows computer, Minitab will need a new license file before March 1. The license file is called 'minitab_17.0.lic' and can be downloaded from https://www.progdist.ntnu.no/ under Minitab and saved to your computer (the default place to save is 'c:\program files (x86)\minitab'). If Minitab does not find this new license file at start-up, it will ask you where you saved the file, and then you just navigate to where you saved it. If you experience problems with this, please contact the Orakel Support Services: orakel@ntnu.no or telephone 91500. If you are running MINITAB by remote desktop to cauchy, the new license is already installed, and it will also be installed at Fraggle.

[03/03/14] There will be no lecture tomorrow, Tuesday 04/03 (week 10). In addition, there will not be any supervision on the Thursday exercise.

[26/02/14] Remember that in the interaction plots, parallel lines indicate no interaction effect, and non-parallel lines indicate an interaction effect. The lines do not need to cross for there to be an interaction effect; it is enough that they are not parallel.
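
To see why parallel lines mean no interaction, it may help to compute the slopes of the two lines directly. The following is a minimal Python sketch with made-up cell means; the function name and the numbers are illustrative assumptions only.

```python
def interaction_from_cell_means(means):
    """means[(a, b)] is the mean response at factor levels a, b in {-1, +1}.

    Each line in the interaction plot connects the two means at a fixed
    level of B. The lines are parallel exactly when both slopes are equal.
    Returns the two slopes and the interaction effect (half their difference).
    """
    slope_b_low = means[(1, -1)] - means[(-1, -1)]   # effect of A when B is low
    slope_b_high = means[(1, 1)] - means[(-1, 1)]    # effect of A when B is high
    interaction = (slope_b_high - slope_b_low) / 2
    return slope_b_low, slope_b_high, interaction

# Parallel lines: A raises the response by 4 at both levels of B.
parallel = {(-1, -1): 10.0, (1, -1): 14.0, (-1, 1): 12.0, (1, 1): 16.0}

# Not parallel, but the lines never cross: there is still an interaction.
not_parallel = {(-1, -1): 10.0, (1, -1): 12.0, (-1, 1): 13.0, (1, 1): 19.0}

result_parallel = interaction_from_cell_means(parallel)
result_not_parallel = interaction_from_cell_means(not_parallel)
```

In the second table the line for high B lies above the line for low B at both levels of A, so the lines do not cross, yet the effect of A differs between the two levels of B.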

[26/02/14] We encourage you to write the project report in English (to learn how to write scientific papers). But it can be handed in in Norwegian, since both the lecturer and the TA, who will do the grading (0-20 points), are Norwegian.

[25/02/14] For those of you who do a project with 3 factors and need to do two repetitions: remember to randomize the order of all 16 experiments, and you may also need to block the repetitions. Talk to me before doing experiments with repetition.

[24/02/14] Fractional factorial designs will not be part of the curriculum/reading list of the course. So chapter 12 from Box, Hunter and Hunter, Statistics for experimenters, is not part of the reading list of the course. The note, Note: Factorial experiments at two levels, is still on the reading list for DOE.

[24/02/14] You find examples of old projects here: Compulsory project. **You are allowed to do the same experiments as the ones done in previous years (the examples given)**.
You will do the experiment yourself (maybe choosing your own low and high levels of the factors) and get your own response values and results.

[24/02/14] Although there is no exercise this week (week 9), the TA will show up at the supervision hours (Tuesdays 15.15-17.00 and Thursday 14.15-16.00) and you may ask Erik and Vaclav for their opinion on your plan for your DOE project. If you want to talk to me, just send me an email or show up at my office door - I'm surely in on Thursday at 10-11 (office hrs) and in week 9 at the lecture hrs (Tuesday 13.15-15.00 and Wednesday 9.15-11.00).

[24/02/14] You now have until the end of week 15 to finish your DOE project - the deadline for hand-in is April 11 (week 15). If you cannot make this deadline you need to inform me about it and make a plan for when you may hand in the project. Remember to read the info at Compulsory project and also study the last slides 26-30 from lecture L11: L12.pdf

[18/02/14] This week (8) we will go through the last part of DOE. First we finish the DOE note, available from the lectures tab L10, Note. What is left in the note is pages 7-15: variance estimation, replicated experiments and blocking.

Then we turn to the last topic of DOE, which is how to perform fractions of a full experiment. For this we will use ch. 12 from the famous book by Box, Hunter and Hunter, Statistics for experimenters. Due to copyright issues I have filed the pdf in Its learning (TMA4255/Handouts folder), so you need to go there - or email me at anna.holand@math.ntnu.no if you have trouble finding the file.

[13/2/14] Plotting of residuals and normality assumption:

* When fitting linear regression models (using the least squares method) we assume that the random errors (epsilon_i) are independent and normally distributed with equal variance, and that the relationship is linear (linear, homogeneous, normal and not correlated).

* We cannot observe this random error term, and we therefore use the residuals (e_i = y_i - ŷ_i) from the fitted regression to check the normality assumption.

* We plot the residuals (often studentized residuals) to see if they are normally distributed and have equal variance (and if the relationship is linear).

- Make a QQ plot and a histogram of the residuals to look for normality.
- Plot residuals against the predicted value of the response y and against the regressors to look for equal variance.
- Plot residuals against observation order to look for non-random error (in time).

* If we find that the residuals are not normally distributed with equal variance, this means that our model does not fit the data: the data do not meet the normal distribution assumption for the error term. You can also look at the Anderson-Darling test to see how well the normality assumption holds.

* This may lead to, for instance, biased standard errors, which in turn may lead to wrong conclusions in hypothesis testing.

* Approaches to handling data with non-normal random errors:

- Direct transformation of data to make random errors approximately normal (transformation may often help both non-normality and non-constant variation of the random errors)

- Box-Cox transformation of Y (find this in MINITAB): this transformation finds the best transformation to simultaneously stabilize the random error variance, make the distribution of the error term more normally distributed, and straighten the relationship between Y and the regressors (x's).

* You can then fit and validate the model in the transformed variables.

* If you transform the response variable Y, you may want to transform the predicted values back into the original units using the inverse of the transformation applied to the response variable.
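
The transform, refit and back-transform steps can be sketched as follows. This is a minimal Python illustration, assuming a log transformation of the response and a single regressor. The data and the helper names are made up, and in the course the corresponding analysis would be done in MINITAB.

```python
import math

def fit_line(x, y):
    """Least squares estimates (intercept, slope) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - yhat_i from the fitted line."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Made-up data where the response grows multiplicatively in x (not course data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.7, 7.4, 20.1, 54.6, 148.4]

# Fit the model in the transformed variable log(y) and compute residuals there.
log_y = [math.log(yi) for yi in y]
b0, b1 = fit_line(x, log_y)
e = residuals(x, log_y, b0, b1)

# Predict on the log scale, then apply the inverse transformation exp().
pred_log = [b0 + b1 * xi for xi in x]
pred_original_units = [math.exp(p) for p in pred_log]
```

The residuals in `e` are the ones to inspect with the plots listed above (QQ plot, against fitted values, against observation order), since the model assumptions are made on the transformed scale.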

Anna

[13/2/14] The compulsory project in TMA4255 is described in detail at the “Compulsory project” tab on the left. You may already start thinking about what your experiment should be, since we now understand the terms “response”, “normally distributed” and “covariate”. A factor is a covariate that is discrete - and we will only consider factors with two possible values (e.g. male/female, high/low temperature).

The project consists of designing, performing and analysing a so-called factorial experiment - which means that we do multiple linear regression with 3 or 4 covariates that are factors with two levels each. This is NOT an observational study - you should collect the observations yourself.

As an example assume I want to study factors that affect the height of plant sprouts (“from seed to a plant”)

1) You need to perform a multiple regression experiment consisting of 16 trials - that is, n=16 observations. For the plant example: you need to plant 16 seeds.

2) The response that is measured should be continuous, so that the response itself, or a transformation of the response, can be seen as normally distributed in a multiple regression model.

It is also possible to assume that a response with at least 7 ordered categories can be seen as continuous. (If you have, for example, a taste panel, you need at least 7 ordered categories for how good, say, the cake tasted: 1. "Bad", 2. "Not so good", ..., 7. "Good". A project from an earlier year used independent rating scores (1-20) from two persons, and the mean of the two was used as the taste result. These individuals tasted the cake "blind" so that the look of the cake would not interfere with the taste.)

For the plant experiment we assume that this is the height of each plant after 5 days of growing.

3) You choose 3 or 4 factors with two levels each that might influence your response (it is possible to choose more factors, but then you need to do a so-called fractional factorial design). For the plant experiment we may choose covariate (factor) A=two different types of seed (sunflower or broccoli seeds), B=watering (coffee or water), C=growth medium (cotton or soil).

4) If you choose 3 factors you need to perform all possible combinations of the 3 factors two times (2*2*2=8 combinations, 16 trials); if you choose 4 factors you need to perform all possible combinations only once (2*2*2*2=16). If you choose more than 4 factors you need to study “fractional factorials” to find out which of the possible combinations to perform. For the plant experiment we then have 2 plants with the same combination of factors A, B and C.

5) A very important aspect of performing the 16 trials is that the trials should be independent and performed in a randomized order. For the plant experiment this just means that we have 16 plants which are handled in the same manner. This is often the difficult part!

For other types of experiments, for example if you want to test yourself by measuring your pulse rate when running up a hill, with the factors being with/without heavy backpack, with/without sports shoes, and running backwards or forwards, the order in which you perform the experiments will matter - and then you need to do them in random order. The ideal would be to do one measurement each day, but that might be difficult, and then you instead need to do a few runs every day. Then there might be a day effect - which may be handled with more advanced theory that we call blocking - which means that you need to wait until after we have covered this topic.

6) Each trial should be performed independently of the other 15, and constitute a full trial. I will try to explain this with a common mistake. Assume you want to study factors that affect the taste of muffins. Then you really need to make 16 different muffins that are made from 16 doughs and baked in the oven one at a time. If you only make one dough and bake all muffins at the same time, you have much less variability than the experiment will have in real life. If you for practical reasons need to handle more than one trial together, this is called blocking and should be taken into account (you then first need to learn about blocking).
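
The run plan described in points 3) to 5) can be written out before the experiment starts. The following is a minimal Python sketch for the plant example with three factors and two replicates. The variable names and the seed are illustrative assumptions, and the same plan can of course be set up in MINITAB.

```python
import itertools
import random

factors = {
    "A (seed)": ["sunflower", "broccoli"],
    "B (watering)": ["coffee", "water"],
    "C (medium)": ["cotton", "soil"],
}
replicates = 2  # 3 factors: 8 combinations, each run twice, gives 16 trials

# All 8 level combinations, each repeated `replicates` times.
combinations = list(itertools.product(*factors.values()))
trials = [combo for combo in combinations for _ in range(replicates)]

# Randomize the order in which the 16 trials are performed.
rng = random.Random(2014)  # fixed seed only so that the plan can be reproduced
rng.shuffle(trials)

# One dictionary per run: the run number and the level of each factor.
run_plan = [
    {"run": i + 1, **dict(zip(factors.keys(), combo))}
    for i, combo in enumerate(trials)
]
```

Printing `run_plan` gives a worksheet with one row per trial. Each row still has to be carried out as a full, independent trial as described in point 6).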

I suggest you now look at the list of experiments that students have done previously - listed on the www-address above - and talk to me or the TA or the Student assistant before you start performing the trials.

You may talk to us at the exercise supervision, or come to my office (office hrs Thursdays 10-11, or email me if you want to come at another time).

Anna

[04/2/14] Due to sickness, the lecture (today) Tuesday 4/2 is cancelled.

[03/2/14] There will be an extra teaching assistant at the regular Tuesday exercises and an extra exercise day on Thursdays 14:15-16:00 in H3 Datasal 411Rill (Høgskoleringen 3, http://www.ntnu.no/studieinformasjon/rom/?gr=1&exact=1&romnr=358411). The extra TA on Tuesdays and the Thursday exercise will be adjusted according to how many students show up at the exercises.

[03/2/14] The room MA24 proved to be not so good for the lectures on Tuesdays 13.15-15.00. So we move back to the original room K27.

The Tuesday lectures will therefore be held in room K27.

[30/1/14] The minutes from the meeting with the reference group are available at:

Minutes from reference group meeting 30.01.14

If you have comments or observations that have not been made, please contact the reference group or lecturer.

At the end of the minutes there are action points. Some of these are:

To accommodate the different knowledge backgrounds, extra guidance will be given. This will be on Wednesdays 8.30-9.00, where students can send an email beforehand about problems/questions and the lecturer will try to answer these on Wednesday mornings.

Exercises: An extra TA will attend the regular exercise on Tuesdays (for some weeks), and in addition (if a datalab is available) an extra exercise supervision will be given on Thursdays 14.15-16.00. The extra supervision will also depend on how many students show up at the exercises in the future.

[27/1/14] The meeting with the reference group will be on Thursday January 30 at 13.00 in room 1126, 11th floor, Sentralbygg 2. If you want to join the reference group or the meeting just email the lecturer. If you have feedback on textbook, lectures, exercises, voting, services you would like, etc. contact the reference group or the lecturer. Minutes (referat) from the meeting will be posted.

[24/1/14] An additional time/day for exercise supervision will be arranged. I will talk with you at the Tuesday lecture to try to establish what time/day would be preferable for this extra supervision. Those who cannot be at the Tuesday lecture, please email me if you want to suggest a preferred time for this extra supervision.

[24/1/14] If you have feedback on textbook, lectures, exercises, voting, services you would like, etc. contact the reference group or the lecturer. Minutes (referat) from the meeting will be posted.

[24/1/14] It is time for a meeting between the lecturer, TA and the reference group. This meeting will be next week (week 5). We need input on everything you might have an opinion about, including the textbook, the lectures, the exercises, the voting, the www-pages, the information in general, etc. An additional agenda item will be extra guidance: would a question session be desirable, say 8.30 to 9 on Wednesdays, where I will try to answer questions? Preferably you email me before the session with questions from the chapters in the textbook covered so far in the course. Or are the office hours 10-11 on Thursdays enough, where you can come and ask questions, or email me if you want to come at another time?

[23/1/14] A note regarding finding critical values of the F-distribution in MINITAB with an example on a hypothesis test, testing if the variances of two different populations are equal or not, can be found here Finding critical values of the F-distribution in MINITAB.
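
The test statistic in such a test is the ratio of the two sample variances. The following is a minimal Python sketch with made-up samples; the function name is illustrative, and the critical value itself would still be found in MINITAB as described in the note.

```python
import statistics

def f_statistic(sample1, sample2):
    """F = s1^2 / s2^2 with degrees of freedom (n1 - 1, n2 - 1)."""
    s1_sq = statistics.variance(sample1)   # sample variance, divisor n - 1
    s2_sq = statistics.variance(sample2)
    return s1_sq / s2_sq, len(sample1) - 1, len(sample2) - 1

# Made-up measurements from two populations (not course data).
sample1 = [4.0, 6.0, 8.0, 10.0, 12.0]
sample2 = [7.0, 8.0, 9.0, 10.0, 11.0]

f_value, df1, df2 = f_statistic(sample1, sample2)

# For a two-sided test at level alpha, H0: equal variances is rejected if
# f_value is above the upper alpha/2 critical value or below the lower
# alpha/2 critical value of the F distribution with (df1, df2) degrees of freedom.
```

Here the sample variances are 10 and 2.5, so the statistic is 4 with (4, 4) degrees of freedom, and this is the value to compare with the critical values.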

[23/1/14] The room K27 for the lectures on Tuesdays 13.15-15.00 has proven to be too small for us and I have found another lecture room that hopefully will be better. This is room MA 24 (Grønnbygget (romnr: 003), http://www.ntnu.no/kart/index.php?id=6062).

The Tuesday lectures from week 5 onwards will therefore be held in MA24.

[8/1/14] An introduction to MINITAB (exercise 1) and R (exercise 2) will be given in week 3 (and 4), Tuesdays 15.15-16.00 (17.00) in R 52.

[6/1/14] The first lecture is Tuesday January 7, 13.15-15, in K27 (Kjemi 1 building).