TMA4255 Applied Statistics, spring 2012

Compulsory project

The aim of the project is to

  • plan,
  • perform,
  • analyse and
  • report

the results from a self-selected experiment. This will make up 20 % of the grade in TMA4255.

The grading of the project work has now finished, and the results are very impressive. The maximum score given was 20 and the minimum 16. We cannot give detailed individual information about these small deductions in the score, but in general the following could be improved: explanation (it was sometimes hard to understand what was done), why randomization was (or was not) done and how that influenced the results, genuine run replicates, model check (normality of residuals, transformations), and interpretation of the results.

Scores for the TMA4255 V2012 project

Practical information

Guidance: 28.02 11-13 in Vegas, 01.03 12-14 in R10, 02.03 14-16 in F2 and 06.03 11-13 in Vegas.

Submission: To the exercise teacher. The report should preferably be delivered ON PAPER in his mailbox at the Department of Mathematical Sciences, 7th floor of Sentralbygg II, or to the lecturer at the lectures. Note! The report should be identified by CANDIDATE NUMBER, not name.

Submission dates: Hand in the report as soon as possible (in keeping with the amount of work involved in analysing the data and writing the report). The deadline for the report is before Easter (Friday March 30 at 12.00 at the latest).

Collaboration: You may work in groups of up to two.

Report: You shall write a report (preferably in English) on the work that is done. The report should be simple, and the structure may follow the main points listed under 'Keywords'. The length should be 6-7 pages, and preferably not exceed 10 pages. Since the report will not be returned before the exam, it may be useful to keep a copy of it for personal use. A handwritten report may be as good as a nicely printed one! Also: Pictures are of course nice to have, but not needed if you can explain things without them!

Scoring: The submitted report will be graded and will count 20% of the grade for the course. Note that both the compulsory exercise and the exam need to be passed in order to pass the course.

About the exercise

The theme for the exercise is design of experiments (DOE). The purpose is to provide insight and training in planning, performing and analyzing a statistical experiment, as well as to report the results.

The task

Carry out a k-factor two-level experiment where the goal is to determine how the various factors influence a response. You decide yourself what kind of experiment to perform. It may be a laboratory experiment or a problem from your daily life.
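As a rough illustration of what a k-factor two-level experiment looks like on paper, the sketch below lists all 2^k runs in coded units (-1 for the low level, +1 for the high level) and draws a random run order. The factor names A, B and C, the helper function names and the seed are placeholders chosen for this example only; they are not part of the course material.

```python
import itertools
import random


def full_factorial(factors):
    """Return the 2^k runs in standard order, coded -1 (low) / +1 (high)."""
    runs = []
    for levels in itertools.product((-1, 1), repeat=len(factors)):
        # product() varies the last position fastest; reversing makes the
        # first factor alternate fastest, which is the usual standard order.
        runs.append(dict(zip(factors, reversed(levels))))
    return runs


def random_run_order(n_runs, seed=None):
    """Return a random permutation of the standard-order indices 0..n_runs-1."""
    rng = random.Random(seed)
    order = list(range(n_runs))
    rng.shuffle(order)
    return order


factors = ["A", "B", "C"]  # placeholder factor names (assumption)
design = full_factorial(factors)
# The seed is fixed only so that the example is reproducible.
run_order = random_run_order(len(design), seed=2012)

for position, std_index in enumerate(run_order, start=1):
    print(f"run {position}: standard-order row {std_index + 1} -> {design[std_index]}")
```

In a real project the coded levels would be mapped to the actual settings (for example two oven temperatures), and the runs would be performed in the printed random order.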

Alternatively, you may do a different statistical analysis, using multiple linear regression or another suitable method, using your own data. In this case you should present a brief sketch to the course teacher before the project starts.

Keywords

  1. Issues to be addressed:
    • Describe the problem you want to study.
    • Why is this interesting?
    • What prior knowledge do you have?
    • What do you want to achieve?
  2. Selection of factors and levels:
    • Which factors do you think are relevant to the problem described above?
    • Which of these factors do you think are active/inert?
    • Do you expect an interaction between some of the factors?
    • Which levels should be used, and why do you think these are reasonable?
    • How can you control that the factors really are at the desired level?
  3. Selection of response variable:
    • Which response variable will provide information about the problem described above?
    • Are there several response variables of interest?
    • How should the response be measured?
    • What can you say about the accuracy of these measurements?
  4. Choice of design:
    • 2^k factorial,
    • 2^(k-p) fractional factorial or other design?
    • Desired resolution of the design?
    • Is it necessary or desirable to use a blocked design?
    • Is it necessary or desirable with replicates?
  5. Implementation of the experiment:
    • Randomization.
    • Describe any problems with the implementation.
  6. Analysis of data:
    • Calculation of effects and assessment of statistical significance.
    • Check the assumptions.
    • Do you need to do several experiments (e.g. to resolve alias structures)?
  7. Conclusion and recommendations:
    • Which conclusions can you draw from the experiment?
    • Remember that plots are illustrative and very useful for demonstrations.
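The "calculation of effects" step in point 6 can be sketched as follows. In a two-level design, an estimated effect is the mean response at the high level minus the mean response at the low level of the corresponding sign column, where an interaction column is the product of its factor columns. The responses below are NOT measured data: they are generated from an assumed noise-free model, y = 10 + 2A - 1.5B + 0.5AB, purely so that the output can be checked by hand. The factor names and the `effect` helper are likewise hypothetical.

```python
import itertools
from math import prod


def effect(design, y, columns):
    """Estimated effect: mean(y | sign = +1) - mean(y | sign = -1).

    `columns` holds the indices of the factors whose product defines the
    sign column (one index for a main effect, several for an interaction).
    """
    signs = [prod(run[j] for j in columns) for run in design]
    return sum(s * yi for s, yi in zip(signs, y)) / (len(y) / 2)


factors = ["A", "B", "C"]  # placeholder factor names (assumption)
k = len(factors)

# 2^3 design in standard order (first factor alternates fastest).
design = [tuple(reversed(levels))
          for levels in itertools.product((-1, 1), repeat=k)]

# Synthetic, noise-free responses from an assumed model (not real data).
y = [10 + 2 * a - 1.5 * b + 0.5 * a * b for a, b, c in design]

effects = {}
for order in range(1, k + 1):
    for cols in itertools.combinations(range(k), order):
        name = "".join(factors[j] for j in cols)
        effects[name] = effect(design, y, cols)

for name, value in effects.items():
    print(f"{name:>3}: {value:+.2f}")
```

With this synthetic model the estimated effects are twice the model coefficients (A: 4, B: -3, AB: 1), and all effects involving C are zero. With real, noisy data the significance of each effect would still have to be assessed, for instance with a normal plot of the effects or with replicates.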

Examples

NEW: I have compiled a list of the projects done in 2011: here

Below are given some examples of problems that have been examined by former students:

  • Baking of pie: Importance of the type of flour, type of berries and appearance (with or without cover), regarding taste, consistency and experience - rated by a taste panel
  • Treatment of welds to avoid fatigue: The importance of voltage, frequency and hammering on the life of a weld
  • Corona-based electric field probe: Importance of air flow across and through a probe that measures the electrical field, with application in warning systems for lightning in helicopters.
  • Sound level for fireworks: Importance of quantity and fill volume of gunpowder, different mix ratio of gunpowder and wall thickness of the rockets, regarding sound level (dB)
  • Maximum performance in weight-lifting: Importance of creatine intake, physical exhaustion, position and grip on the number of lifts.
  • Yield of pyridine: Importance of pH, % methanol, the number of equivalents of NaHSO3 and temperature on the yield of pyridine
  • Optimizing the use of heat pump in propane-propene distillation: Importance of pressure in the column, reflux ratio and temperature change over the vessel on energy costs
  • Perfect 'popping' of popcorn: Importance of the type of popcorn, the type and amount of oil in comparison to the amount of popcorn.
  • Reaction speed in an SN2 reaction between 1-bromopropane and NaOH: Importance of reaction temperature, amount of solvent and start concentration of reactant on the reaction speed
  • Purification of waste water: Importance of flocculation time, flocculation intensity and sedimentation time for the remaining concentration of small particles in the cleaned water
  • Basketball shots: Importance of distance, type of defense and shot position on the number of points scored in basketball
  • Study of line widths in photoresist: Importance of exposure, development and baking on the line width
  • Development of fish food for use in conditioning of fish: Importance of the amount of alginate, the concentration of calcium chloride, stirring time after adding alginate and curing temperature on the consistency of fish
  • Economic aspects of burning candles: Importance of price, style and color on the burning time

More examples from a US course (pdf).

About DOE

WHO will benefit from statistical design of experiments?

Anyone who performs, or is considering performing, experiments, and who wants to systematize them in order to obtain more information from fewer runs. These can be laboratory or production experiments (physical experiments) or simulations on a computer (numerical experiments).

WHAT is typical for DOE?

Across a wide range of application areas, experiments are carried out to investigate or verify certain properties of a process or a system. This can be seen as a learning process: we have opinions and thoughts about the process (process A is better than process B, increased temperature will give increased yield, etc.). We carry out experiments to find out whether these opinions and thoughts coincide with reality. After the results of the experiment have been analyzed, new questions may arise, or we may need to reconsider our original opinions and thoughts (e.g. processes A and B are equally good; since increased temperature is favorable, we tend to believe that increased pressure is favorable too, etc.). Experimental design should help this learning process converge as rapidly as possible. The experiments that are carried out are really our questions to the process, while the observed response is the process's answer to us. Seen this way, it is clear that good questions will give better answers. There are three important points to address:

  • What to ask about?
  • How to ask?
  • How to interpret the answer?

The first point depends on the knowledge you have about the process or system. DOE is mainly about the second and third points. If you are going to carry out a statistical analysis of the results of the experiment, it is clearly useful to plan the experiment with that analysis in mind. In other words: if the experiment is properly planned, the analysis of the data will often be easy. In the course we consider experimental designs that are well suited to different situations, and we demonstrate how data from such experiments can be analyzed. At the same time we need to understand what can go wrong during an experiment and how to avoid these problems (randomization, blocking and replicates).
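To make the idea of blocking concrete, here is a minimal sketch, assuming a 2^3 experiment that has to be split over two blocks (for example two days). It uses a common textbook choice, which is an assumption of this example rather than a requirement of the project: the runs are assigned to blocks by the sign of the three-factor interaction ABC, so that only this interaction is confounded with the block effect, and the run order is randomized within each block. The function name and the seed are hypothetical.

```python
import itertools
import random
from math import prod


def blocked_design(k, seed=None):
    """Split a 2^k design into two blocks by the sign of the k-factor interaction."""
    rng = random.Random(seed)
    runs = [tuple(reversed(levels))
            for levels in itertools.product((-1, 1), repeat=k)]
    blocks = {1: [], 2: []}
    for run in runs:
        # The product of all coded levels is the sign of the highest-order
        # interaction (ABC when k = 3); it decides the block.
        blocks[1 if prod(run) < 0 else 2].append(run)
    for block_runs in blocks.values():
        rng.shuffle(block_runs)  # randomize the run order within each block
    return blocks


# The seed is fixed only so that the example is reproducible.
blocks = blocked_design(3, seed=1)
for label, block_runs in blocks.items():
    print(f"block {label}:")
    for run in block_runs:
        print("   ", run)
```

Each block then contains four runs in which every factor appears twice at each level, so main effects and two-factor interactions can still be estimated free of the block difference.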

WHY will one need statistical experimental design?

Three properties are often highlighted when the importance of DOE is discussed:

  • more information
  • fewer experiments
  • iterative learning process

Experiments performed today are often costly, and the conclusions drawn from them typically have large consequences. It is therefore essential to plan one's experiments systematically, so that more information is obtained from the collected data while the number of experiments is kept to a minimum. Yet it is believed that the vast majority of users of experimental design particularly appreciate the systematic approach it brings to the process: identifying what to investigate and why.

2012-04-30, Mette Langaas