# TMA4255 Applied Statistics, spring 2013

## Messages

[27/5/13] Wednesday May 29 at 14.15 in F2: Exam 2010 spring

This is just a gentle reminder that I will be working through the May 2010 exam in TMA4255 on Wednesday at 14.15-16 (max) in F2. If you are able to understand the solutions to this exam, there is no need for you to attend - you will not be missing out on important information. You may of course bring any question to the presentation, and if you already have a question now, you may write it here:

https://docs.google.com/document/d/1Gssu8MiWDpT4hqG-2RY5dNdLMfxHsI9qP1IhgSf543U/edit so that I can link it to the exam presentation. You find the exam under the exam tab to the left of this page.

[30/4/13] The teaching semester is now finished. In the lecture today we decided on the following activities before the exam on June 7, 9-13.

- Questions in general: post your questions on the discussion forum at It's learning, or ask Shahrukh or Mette directly (email or office).
- Wednesday May 29, 14.15-16.00 in F2: Exam presentation of May 2010, especially for those of you who are not sure you will pass this exam (getting 32 out of 80 possible points in the written exam). I will go through as many problems/items of the May 2010 exam as needed to pass the exam. The reason for not choosing the last 3 exams is that these exams have very extensive solutions available. If you are OK with reading the solutions to May 2010 yourself - or you are confident that you will pass the written exam - you should NOT attend this exam presentation. You may of course bring any questions you have to ask in plenum.
- Tuesday June 4 10.15-12 and Wednesday June 5 14.15-16 in room 734, 7th floor, Sentralbygg 2 (elevator next to Tapir Mat): individual supervision. Instead of queuing outside the lecturer's office door, you sit and work in a room - and the lecturer will go around answering your questions.

[29/4/13] This week we have the following activities in TMA4255:

- Fraggle Monday 15.15-16 and Tuesday 12.15-14: supervision of all exercises – this is your last chance to ask questions about the exercises. All solutions are available at the course www-page. (If nobody turns up during the first 30 minutes from 12.15 on Tuesday, Shahrukh will go to her office.)
- The last lecture is Tuesday 10.15-12 in G21, where we continue summing up the topics we have worked with. We went through parts 1+5+6+7 last week, and tomorrow we go through parts 2+3+4: regression, DOE and ANOVA.

Tuesday April 30 is the last day of the teaching semester.

Activities leading up to the exam on June 7th will be decided in the lecture tomorrow.

[25/4/13]

- Mindmap for Parts 1 and 7 is now available at the lectures tab, L25 slot. We started on this in the lecture and I finished it after the lecture.
- Regarding Taylor expansion and "Brukerkurs A+B i Matematikk" (MA0001 and MA0002): you only need to know the "tangent line approximation" on page 194 of the textbook of that course. Chapter 7.6, Taylor approximation, was not on the reading list of those courses, but due to your comments today it will be on the list from the autumn of 2013.
- For the last lecture, focus is on MLR, DOE and ANOVA.
- As we talked about in the lecture today, the hard part of statistics is to select the best method to analyse a given situation and data set. This is the focus of the last two lectures of the course: I show you examples and you reflect on which method to use.
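For reference, the tangent line approximation mentioned above is the first-order Taylor approximation of a differentiable function $f$ around a point $a$ (written here in general notation, which may differ from the notation in the MA0001/MA0002 textbook):

$$
f(x) \approx f(a) + f'(a)\,(x - a)
$$

For example, with $f(x) = \sqrt{x}$ and $a = 4$ we have $f'(4) = \tfrac{1}{4}$, so $\sqrt{4.1} \approx 2 + \tfrac{1}{4}(0.1) = 2.025$, close to the exact value $2.0248\ldots$.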

[24/4/13] We have now been through all the topics on our reading list, and for the last two lectures the plan is as follows:

- Present problems (some taken from old exams) and discuss which statistical method(s) is(are) suitable.
- Perform the analyses (or just comment on what to do) - and stress important theoretical properties.
- Look at interpretations of outputs (plots and results) from statistical analyses of the problems.
- Make mindmaps connecting topics, "situations" and methods from the different parts of the course.

The topics of the course can be divided into two themes:

- First theme: Compare, monitor and/or test parameters in one or two populations. Parts 1, 5, 6, and 7
- Second theme: Model response mean as a function of observed covariates. Parts 2, 3, and 4.

My plan is to focus on the first theme in the Thursday April 25 lecture and on the second theme in the Tuesday April 30 lecture. I find this very hard, and I'm not sure if this will be a success. If you are most interested in regression-type methods, you may skip the Thursday lecture.

There will be no new stuff at these last two lectures, so you will probably not miss out on anything if you don't attend:-)

So far the course evaluation has mainly been answered by students with high attendance at lectures (one question asks you to estimate your attendance). To get a more correct picture of the opinions of all students who plan to sit for the exam, I hope you can answer the evaluation - especially if you have stopped coming to the lectures for some specific reason (which you may tell about or not): https://ime.wufoo.com/forms/tma4255-applied-statistics-course-evaluation/

[20/4/2013] Course evaluation: please help us evaluate the course by answering this course evaluation. You may skip any entry that you do not want to answer or do not have an opinion on. There is one question on activities in the period before the exam. The evaluation will be used in the planning of the course for next year, and your input is highly appreciated.

[18/4/2013] DOE project grades: The grading is finished, and the results are found at RESULTS

If you do not find your result, you must either have entered the wrong candidate number, or I have not received your project - or I have made a mistake. Please contact me if you think there is a mistake. The maximum score on the DOE project is 20. I'm utterly impressed with the creativity, good writing and good quality of the DOE projects. I feel that you have learned a lot from working with the project.

[15/4/2013] I hope you are all back in study-mode after Easter and excursions (even though the attendance at lectures and exercise supervision does not support that hypothesis…). We have now only 5 lectures and 2 exercises left of TMA4255, and only 3-3.5 of the lectures with new material.

- This week we finish part 6 on contingency tables and the chi-square test (Kahoot! 5 min quiz at the end of the Tuesday lecture),
- and start on part 7 (the last topic) on nonparametric tests.
- Part 7 will be finished next week, and
- then we will do a summary of the course by choosing a few exam questions (from various topics) to work with.

In connection to the last two lectures (lecture 25 on Thursday April 25 and lecture 26 on Tuesday April 30) I would like you to write down 1-3 questions that you want me to answer in the following Google document:

In this way you may all see what the other students are wondering about (anonymously), and I may answer the questions at the appropriate place in my summary of the course. Do not think too much - just write down what first comes to mind!

[10/3/2013] Activities before Easter: For week 11, starting at March 11, we will finish our lecture work with ANOVA (Part 4). The Tuesday lecture will cover 13.6-13.8 (Multiple testing and Randomized block designs), while on Thursday we will work with two factor ANOVA (14.1-14.3). The Thursday lecture will be the last lecture before Easter.

Supervision of E6+7+DOE project will be on Mon 15.15-16, Tues 12.15-14 and Thurs 10.15-11, both in week 11 and week 12. E8 for ANOVA will not be supervised until after Easter, but will be made available in a few days.

Remember that the DOE project should be handed in on Friday March 22 at 12.00 at the latest. For details see Compulsory project.

For those of you who are going away on excursions - I wish you a safe trip, and hope to see you back at the lectures on Thursday April 4, when we start with Part 5 on process control.

[22/02/2013] New license key for Minitab: If you have installed Minitab on your own Windows computer, Minitab will need a new license file before March 1. The license file is called 'minitab.lic' and can be downloaded from '\\progdist.ntnu.no\progdist\campus\MiniTab' and saved to your computer (the default place to save it is 'c:\program files (x86)\minitab'). If Minitab at start-up does not find this new license file, it will ask you where you saved the file, and then you just navigate to where you saved it. If you experience problems with this, please contact the Orakel Support Services: orakel@ntnu.no or telephone 91500. If you are running MINITAB by remote desktop to cauchy, the new license is already installed, and it will also be installed at Fraggle.

[21/02/2013] In week 9 we will go through the last part of DOE. First we finish the DOE-note, available from the lectures tab (L10, Note). What is left in the note is pages 11-15: replicated experiments and blocking.

Then we turn to the last topic of DOE, which is how to perform fractions of a full experiment. For this we will use chapter 12 from the famous book by Box, Hunter and Hunter, Statistics for Experimenters. Due to copyright issues I have filed the pdf in Its learning (Handouts folder), so you need to go there - or email me at Mette.Langaas at math dot ntnu dot no if you have trouble finding the file.

As usual we will also have supervision of exercise E6 (DOE exercise) in week 9, and you may of course ask Shahrukh and Ewa for their opinion on your plan for your DOE project. Supervision hours are as usual in Fraggle on Monday 15.15-16, Tuesday 12.15-14 and Thursday 10.15-11. If you need to talk to me, just send me an email or show up at my office door - I'm surely in on Thursday at 10-11 (office hours) and probably also after the lectures on Tuesday.

You now have 4 weeks to finish your DOE project - deadline for hand-in is March 22. If you cannot make this deadline, you need to inform me about it and make a plan for when you may hand in the project. Remember to read the info at Compulsory project and also study the last slides 26-30 from lecture L11: L11.pdf

[18/02/2013] Overview of activities in week 8:

- Monday 15.15-16: Supervision of exercise 5 in Fraggle
- Monday 17.15-18: Session in Norwegian for students with ST0103 background: "bring three questions from ch 1-12 and the lecturer will try to answer on the blackboard".
- Tuesday 10.15-12: Lecture in G21: DOE-note, orthogonality, 2^k full factorials, the DOE-project, inference in 2^k factorials.
- Tuesday 12.15-14: Supervision of exercise 5 in Fraggle
- Thursday 8.15-10: Lecture in F2: DOE-note, inference in 2^k factorials, blocking.
- Thursday 10.15-11: Supervision of exercise 5 in Fraggle.

AND, if you want me to schedule a "bring three questions" session for students with other backgrounds than ST0103, send me an email. The session will be given: if only 3 students want to attend (as of now), we do this in my office at another time slot than 17 on Monday; if more than 5 want to attend, we schedule it for Monday February 25 in G21. And today we will see if such a session can be a success – which depends both on the questions asked and the answers from the lecturer…

[14/02/2013] The minutes from the meeting with the reference group are available at:

If you have comments or observations that have not been made, please contact the reference group or lecturer.

At the end of the minutes there are action points - and one of these is that we schedule 1 or 2 sessions on Monday 17.15 in G21 where students ask questions from the introductory statistics course and TMA4255 so far (which will be chapters 1-12 in our textbook).

Monday 18 Febr 17.15 is for those of you that did ST0103 for your introductory course. This course was in Norwegian with Løvås: Statistikk as textbook. This session will be held in Norwegian.

At the lecture today 3 students wanted a question session aimed at students from TMA4245/40 or with other introductory courses. If more students are interested, we schedule this for February 25; if only 3 students are interested, we do this at the lecturer's office at a time that suits the students (in either case, this will be announced). Please report back if you want to join; if more students join, we schedule this in G21.

[06/02/2013] The meeting with the reference group will be on Tuesday Febr 12 at 9.00 in room 1236, 12th floor, Sentralbygg 2. If you want to join the reference group or the meeting, just email the lecturer. If you have feedback on the textbook, lectures, exercises, voting, services you would like, etc., contact the reference group or the lecturer. Minutes (referat) from the meeting will be posted.

[05/02/2013]

The compulsory project in TMA4255 is described in detail at the "Compulsory project" tab on the left. You may already start thinking about what your experiment should be, since we now understand the terms "response", "normally distributed" and "covariate". A factor is a covariate that is discrete - and we will only consider factors with two possible values (e.g. male/female, high/low temperature).

The project consists of designing, performing and analysing a so-called factorial experiment - which means that we do multiple linear regression with 3 or 4 covariates that are factors with two levels each. This is NOT an observational study - you should collect the observations yourself.
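The link between a two-level factorial experiment and multiple linear regression can be illustrated with a short sketch. The course itself uses MINITAB; the Python below is only an illustration, assuming the common coding of the two levels as -1 (low) and +1 (high) and a made-up, noise-free response (`response` is a hypothetical function, not real data). With this coding the design columns are orthogonal, so each least squares coefficient is simply $\sum_i x_i y_i / n$, and the estimated effect of a factor (mean response at +1 minus mean response at -1) is twice its coefficient.

```python
from itertools import product

# Hypothetical 2^3 full factorial: factors A, B, C coded -1 (low) / +1 (high).
design = list(product([-1, 1], repeat=3))


# Illustrative, noise-free response (an assumption, not real data):
# y = 10 + 3*A - 1*B + 0.5*A*B, so factor C has no effect at all.
def response(a, b, c):
    return 10 + 3 * a - 1 * b + 0.5 * a * b


y = [response(*run) for run in design]
n = len(design)


def coefficient(column):
    """Least squares coefficient for a +/-1 column in an orthogonal design: sum(x*y)/n."""
    return sum(x * yi for x, yi in zip(column, y)) / n


def effect(column):
    """Estimated effect: mean response at +1 minus mean response at -1."""
    high = [yi for x, yi in zip(column, y) if x == 1]
    low = [yi for x, yi in zip(column, y) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)


# Main-effect columns and one interaction column (elementwise product of A and B).
columns = {
    "A": [run[0] for run in design],
    "B": [run[1] for run in design],
    "C": [run[2] for run in design],
    "AB": [run[0] * run[1] for run in design],
}

for name, col in columns.items():
    print(name, "effect:", effect(col), "2 * coefficient:", 2 * coefficient(col))
```

With real data the response would be measured rather than computed, and the effects would also contain noise; the point of the sketch is only the relationship between effects and regression coefficients.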

As an example, assume I want to study factors that affect the height of plant sprouts ("from seed to a plant"):

1) You need to perform a multiple regression experiment consisting of 16 trials - that is, n=16 observations. For the plant example: you need to plant 16 seeds.

2) The response that is measured should be continuous, so that the response itself, or a transformation of the response, in a multiple regression model can be seen as normally distributed.

It is also possible to assume that a response with at least 7 ordered categories can be seen as continuous.

For the plant experiment we assume that the response is the height of each plant after 5 days of growing.

3) You choose 3 or 4 factors with two levels each that might influence your response (it is possible to choose more factors, but then you need to do a so-called fractional factorial design). For the plant experiment we may choose covariates (factors) A = type of seed (sunflower or broccoli seeds), B = watering (coffee or water), C = growth medium (cotton or soil).

4) If you choose 3 factors, you need to perform each of the 2*2*2=8 possible combinations of the factors two times (8*2=16 trials); if you choose 4 factors, you need to perform each of the 2*2*2*2=16 possible combinations only once. If you choose more than 4 factors, you need to study "fractional factorials" to find out which of the possible combinations to perform. For the plant experiment we then have 2 plants with the same combination of factors A, B and C.
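The counting in point 4 can be checked with a small sketch that lists all trials of a full factorial design. This is only an illustrative Python sketch (in the course, designs are set up in MINITAB); `full_factorial` is a hypothetical helper, and the level labels for the four-factor case are placeholders.

```python
from itertools import product


def full_factorial(factors, replicates=1):
    """Return every level combination of the given factors, repeated `replicates` times.

    `factors` maps a factor name to its list of levels (two levels each here).
    """
    names = list(factors)
    runs = []
    for _ in range(replicates):
        for levels in product(*(factors[name] for name in names)):
            runs.append(dict(zip(names, levels)))
    return runs


# Plant example from the text: three two-level factors, each combination performed twice.
plant_factors = {
    "A": ["sunflower", "broccoli"],  # type of seed
    "B": ["coffee", "water"],        # watering
    "C": ["cotton", "soil"],         # growth medium
}
plant_runs = full_factorial(plant_factors, replicates=2)
print(len(plant_runs))  # 2*2*2 combinations x 2 replicates = 16

# Four two-level factors (placeholder labels), each combination performed once.
four_factor_runs = full_factorial({f: ["low", "high"] for f in "ABCD"})
print(len(four_factor_runs))  # 2*2*2*2 = 16
```

The list comes out in a systematic order; the trials should still be performed in a randomized order, as point 5 explains.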

5) A very important aspect of performing the 16 trials is that the trials should be independent and performed in a randomized order. For the plant experiment this just means that we have 16 plants which are handled in the same manner. This is often the difficult part!

For other types of experiments the order in which you perform the trials will matter - for example if you want to test yourself by measuring your pulse rate when running up a hill, with the factors being with/without a heavy backpack, with/without sports shoes, and running backwards or forwards. Then you need to perform the trials in random order. The ideal would be to do one measurement each day, but that might be difficult, and then you instead need to do a few runs every day. Then there might be a day effect - which may be handled with more advanced theory that we call blocking - which means that you need to wait until after we have covered this topic.
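One simple way to get a randomized order for the trials is to draw a random permutation of the trial numbers before starting. The sketch below assumes the trials have been numbered 1-16 in advance (for example from the design table); `randomized_order` is a hypothetical helper, and the seed is fixed only so that the plan can be written down and reproduced. Note that this is complete randomization: it does not account for a day effect, which is what blocking is for.

```python
import random


def randomized_order(n_trials, seed=None):
    """Return the trial numbers 1..n_trials in a random order (a random permutation)."""
    rng = random.Random(seed)  # separate generator, so the global random state is untouched
    order = list(range(1, n_trials + 1))
    rng.shuffle(order)  # shuffles the list in place
    return order


# 16 trials, e.g. the 2^4 pulse-rate experiment; any seed works, it just fixes the plan.
run_order = randomized_order(16, seed=2013)
for position, trial in enumerate(run_order, start=1):
    print(f"Run {position:2d}: perform trial {trial}")
```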

6) Each trial should be performed independently of the other 15 and constitute a full trial. I will try to explain this with a common mistake. Assume you want to study factors that affect the taste of muffins. Then you really need to make 16 different muffins that are made from 16 doughs and baked in the oven one at a time. If you only make one dough and bake all muffins at the same time, you have much less variability than the experiment in real life will have. If you for practical reasons need to handle more than one trial together, this is called blocking and should be taken into account (you then first need to learn about blocking).

I suggest you now look at the list of experiments that students have done previously - listed on the www-address above - and talk to me or the TA or the Student assistant before you start performing the trials.

You may talk to us at the exercise supervision, or come to my office (office hrs Thursdays 10-11, or email me if you want to come at another time).

-Mette

[05/02/2013] It is time for a meeting between the lecturer, TA and the reference group. We need input on everything you might have an opinion about, including the textbook, the lectures, the exercises, the voting, the www-pages, the information in general, etc. In addition I have a question: usually in this course we have one week without lectures and only supervision of the project, which would be the week after Part 3 is finished (this year probably week 10, starting on March 4), but this year I have gotten feedback that many of you are going on excursions the week before Easter (week 12). Should I instead not give lectures in week 12 (and give lectures in week 10)? In any case, we will provide you with many opportunities for supervision of the project.

[04/02/2013] Activities in week 6:

- Monday 15.15-16: individual supervision of Exercise 3 in Fraggle.
- Tuesday 10.15-12 in G21: Lecture, simple linear regression prediction 11.6, multiple linear regression 12.1-12.4.
- Tuesday 12.15-13: individual supervision of Exercise 3 in Fraggle.
- Thursday 8.15-10 in F2: Lecture, multiple linear regression ANOVA, prediction, residuals: 12.4-12.6+12.10.
- Thursday 10.15-11: individual supervision of Exercise 3 in Fraggle.

I wish you all a productive - and enjoyable week

[22/01/2013] Dear students, the lecture on Thursday 24 at 8.15-10, is as previously announced sadly cancelled due to travelling. Please use the time to work with Part 1, get up to date on chapters 8-10 in the textbook. You may also start on Exercise 2, which is mostly hypothesis testing - some theoretical problems and some using the computer. I hope you observed that Exercise 1 was not so much new statistics, mainly getting to know the MINITAB software.

The next lecture is Tuesday January 29 at 10.15, where we start on Regression (chapter 11 and 12).

From next week (and for the rest of the semester) we have a new room for the Thursday 8.15 lectures = F2 (with larger blackboard and not a flat floor).

[21/01/2013] We need 3-4 students for the Reference group in TMA4255. Would one student from BGEOL, one PhD student or foreign student, and 1-2 students from other study programmes volunteer for the job? Just email the lecturer, email at Course Description tab.

[20/01/2013] Remember the MINITAB intro in G21 at 17.15 tomorrow, Monday January 21. Bring your laptop with MINITAB or a remote desktop connection (and wifi) installed. The handouts for the intro and exercise 1 are available at the Exercises tab to the left.

Also, the first supervision in Fraggle Høyskoleringen 3, H3-Datasal 524 will be on Tuesday January 22 at 12.15-13.

From next week supervision will be on Mondays 15.15-16 and Tuesdays at 12.15-13 in Fraggle. I am working to get more supervision help to also offer supervision on Thursdays at 10.15-11.

You may of course go to the Fraggle lab at any time it is open, and I have booked it for TMA4255 (then you may kick out other people if the seats are taken) on Mondays 14.15-16, Tuesdays 12.15-14 and Thursdays 10.15-12, but will sadly not offer supervision at all these times. Mondays at 14.15 is not allowed for scheduled supervision (reserved for student representatives, "tillitsrep" time).

[18/01/2013] Results from the questionnaire, after 44 students have answered:

- 38/44 plan to go to the lectures, but 8/44 do not have a smart device to bring to the lectures to be used for electronic voting.
- When it comes to using other channels for information than the course www-page, 9/44 do not need that, while 33/44 want announcements at Its learning, 4/44 on Facebook and 1/44 on Innsida. So, are you really not using Innsida?
- For the discussion forum, 27/44 do not have a need for that, while 17/44 would like that, mostly using Its learning.

Action: I try to make announcements of new info on Its learning, and open a discussion forum at Its learning. More info on the new time slots for exercise supervision at Fraggle to come sooooon.

[16/01/2013] Change in computer lab - and decide on time for supervision:

- We have been instructed to use the computer lab Fraggle: Høyskoleringen 3, H3-Datasal 524 (and not Vegas as previously announced). More info to come. Students should have access to Fraggle automatically, but not PhD students, who need to email me their name and card number (found in the lower left corner of the NTNU card). Foreign students: please check if you can enter Fraggle, and report back to me with your name and card number if not.
- Remember to answer the questionnaire (if you have not yet done that) so that we find two one-hour time slots for exercise supervision at Fraggle: https://ime.wufoo.com/forms/tma4255-applied-statistics-startup/. I will examine the answers on Friday (tomorrow) and book Fraggle.

[15/01/2013] Today we started the lectures in TMA4255, with an intro to the course, and started on normal inference (Part 1) and the normal plot. You may look at the last slides from today for yourself. On Thursday at 8.15 we continue by reviewing estimation and sampling distributions (chi-square and t), and look at hypothesis testing and p-values in general before ending with the t-test. If you are up to date on these topics, you may skip the Thursday lecture (browse the slides if you want). New topics next week: Monday, MINITAB intro; Tuesday, two samples and the F-distribution. You have until the end of Thursday January 17 to answer the short questionnaire (link given below) - then I analyse the results and decide when the supervision in Vegas will be given.

[14/01/2013] I would like you to answer this short questionnaire to help me tailor the course to your needs: https://ime.wufoo.com/forms/tma4255-applied-statistics-startup/. Be sure to fill in your study programme and your desired time for exercise supervision in the Vegas lab. Thank you!

[11/01/2013] We will use electronic voting (student response) in this course. This means that if you have a device with an internet browser (smart phone, iPad, Galaxy Tab, laptop, ..), please bring it to class. It would also be very helpful if you bookmark the address clicker.math.ntnu.no, which will be used for the voting. The system does not work with Internet Explorer (the system uses websockets, which are currently not supported by Internet Explorer). You may also check that you are able to connect to the internet, either via 3G or preferably (economically) via eduroam. More about connecting to eduroam you find at the orakel pages at Innsida: norsk=https://innsida.ntnu.no/wiki/-/wiki/Norsk/Tr%C3%A5dl%C3%B8st+nett, English=https://innsida.ntnu.no/web/guest/wiki/-/wiki/English/Wireless+network

[06/12/2012] Pages available. First lecture is Tuesday January 15, 10.15-12.00 in G21 (cellar of Geology building).