Project and Master theses supervised by Stefanie Muff
Contact information: https://www.ntnu.no/ansatte/stefanie.muff
Supervision will be in English. My topics are best suited for project and master theses but may be too comprehensive for a Bachelor thesis. If you are interested in a Bachelor thesis, please contact me and we can see whether a topic can be adjusted accordingly.
My research interests lie at the interface between Bayesian statistics and evolutionary biology / movement ecology. Particular research areas include:
- Quantitative genetics: This area offers many statistical problems, mainly in the context of multivariate generalized linear mixed models (GLMMs). One main aim is to estimate variance components, that is, to split the total phenotypic variance observed in a population into genetic and environmental components. Complex dependency structures between related individuals (derived from pedigree or genomic data), as well as environmental factors in wild study populations, need to be properly accounted for. We usually take a Bayesian approach using INLA. More recently, genomic data have opened up new opportunities and statistical challenges that master students can work on.
- Genomic prediction: A related statistical problem in evolutionary biology is the prediction of a trait (e.g., mass) of an animal given its genomic data. A variety of methods exists, and prediction models are still being improved, especially for wild animal populations.
- Methods to analyse telemetry data: As it has become cheaper in recent years to equip wild animals with GPS collars to understand their resource preferences, there is a growing need for improved quantitative tools to analyse and interpret these data. There is a wide variety of approaches to analyse such data, which are often assumed to be generated by an inhomogeneous Poisson process. Bachelor students could work with real-world examples and reproduce results from published papers. For project/master students, we can find an open question to work on.
- I have also been working extensively on the problem of measurement error in the variables of regression models, namely in GLMMs and survival models. There are some interesting open questions regarding the effects of such errors and methods to account for them. Importantly, there are many mechanisms by which measurement error (often understood as measurement uncertainty) can emerge, but the most fundamental distinction is that between classical and Berkson measurement error. I have coded two Shiny apps on this topic:
https://stefaniemuff.shinyapps.io/MEC_ChooseL/
https://stefaniemuff.shinyapps.io/MEB_ChooseL/
The appealing thing about a measurement error project is that you can choose among a wide variety of statistical models and questions: virtually all methods in applied statistics are affected by it.
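To make the difference between the two error types concrete, here is a minimal simulation sketch, assuming a simple linear regression with a standard normal covariate and normal errors; the function name `simulate_slopes` and all parameter values are hypothetical. Under classical error, the observed value w = x + u scatters around the true x, and the naive slope is attenuated by the factor var(x) / (var(x) + var(u)), here 1 / (1 + 1) = 0.5. Under Berkson error, the true x = w + u scatters around an assigned value w, and in this linear setting the naive slope stays unbiased while the residual variance grows.

```python
import numpy as np


def simulate_slopes(n=100_000, beta=1.0, sigma_u=1.0, seed=1):
    """Return naive least-squares slopes under classical and Berkson error.

    Hypothetical illustration: linear model y = beta * x + noise with
    x ~ N(0, 1) and measurement error u ~ N(0, sigma_u^2).
    """
    rng = np.random.default_rng(seed)

    # Classical error: the true x drives y, but we only observe w = x + u.
    x = rng.normal(0.0, 1.0, n)
    w = x + rng.normal(0.0, sigma_u, n)
    y = beta * x + rng.normal(0.0, 0.5, n)
    slope_classical = np.polyfit(w, y, 1)[0]  # regress y on the error-prone w

    # Berkson error: w is assigned (e.g., a group mean), the true x = w + u.
    w_b = rng.normal(0.0, 1.0, n)
    x_b = w_b + rng.normal(0.0, sigma_u, n)
    y_b = beta * x_b + rng.normal(0.0, 0.5, n)
    slope_berkson = np.polyfit(w_b, y_b, 1)[0]

    return slope_classical, slope_berkson


if __name__ == "__main__":
    print(simulate_slopes())
```

With these settings, the classical-error slope should come out close to 0.5 and the Berkson-error slope close to the true value 1.0. In GLMMs or survival models the consequences of either error type are less clear-cut than in this linear case.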
Specific projects (but feel free to contact me, and we can find something else)
- Machine learning (ML) for genomic prediction: One task of interest in the presence of large genomic datasets is to predict the phenotype (e.g., weight, height) of an individual from genomic data. So far, we have mostly relied on linear modeling assumptions, but the complexity of genomic data has made machine learning techniques, such as neural networks or boosted regression trees, more attractive. In the proposed project, the student can apply and/or further develop such a technique using a real dataset from our collaborators at the Centre for Biodiversity Dynamics (CBD). I am currently particularly interested in interpretable methods that overcome the "black box" problem commonly criticized in ML methods, and in methods that explicitly incorporate interactions between genes and the environment (GxE interactions). All methods can be benchmarked against one or several state-of-the-art methods.
- Variable importance in quantitative genetics for non-continuous traits: There is a close link between variable importance (where the variance explained by fixed effects is of interest) and the animal models of quantitative genetics (where the variance explained by random effects is of interest). The student can look more closely into this connection and potentially suggest methods that improve quantitative genetic analyses based on statistical insight. The idea is to look at binary and count (Poisson) traits, since a previous student already worked on linear models. An additional outcome of the thesis can be an R package that implements these methods.
- Genomic methods in quantitative genetics: Different approaches to modeling quantitative traits with large genomic datasets have different strengths. In particular, the genetic architecture (i.e., the distribution of effect sizes across the genome for a trait of interest) may affect the choice of method. In this project, the student can investigate different traits, for example traits determined by only one or a few genes, such as fur or eye color, but also highly complex traits such as body mass or bone length, and systematically compare existing and novel methods. The thesis can be based on real data, on simulation studies, or on both.
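As a point of reference for the linear baseline mentioned in the machine learning project, the following is a minimal sketch of genomic prediction with ridge regression on simulated SNP data (a model closely related to SNP-BLUP). It rests on strong simplifying assumptions: unlinked SNPs coded as allele counts 0/1/2, no pedigree or population structure, and a shrinkage parameter derived from an assumed heritability `h2`. The function name and all parameter values are hypothetical and do not describe the CBD data.

```python
import numpy as np


def ridge_genomic_prediction(n_train=1500, n_test=500, n_snps=1000,
                             n_causal=50, h2=0.5, seed=2):
    """Simulate SNP data and return the correlation between ridge predictions
    and the true (simulated) genetic values in a held-out test set."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test

    # Genotypes as allele counts 0/1/2, drawn independently per SNP
    # (no linkage disequilibrium, no relatedness: a simplification).
    freqs = rng.uniform(0.05, 0.5, n_snps)
    X = rng.binomial(2, freqs, size=(n, n_snps)).astype(float)

    # A few SNPs with normally distributed effects; all others have effect 0.
    beta = np.zeros(n_snps)
    causal = rng.choice(n_snps, size=n_causal, replace=False)
    beta[causal] = rng.normal(0.0, 1.0, n_causal)
    g = X @ beta  # true genetic values

    # Environmental noise scaled so that var(g) / var(y) is roughly h2.
    sigma_e = np.sqrt(g.var() * (1.0 - h2) / h2)
    y = g + rng.normal(0.0, sigma_e, n)

    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, g_te = y[:n_train], g[n_train:]

    # Centre genotypes and phenotypes using training-set means only.
    mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
    X_tr_c, X_te_c = X_tr - mu_x, X_te - mu_x

    # Shrinkage lambda = sigma_e^2 / sigma_beta^2 implied by h2, assuming
    # every SNP has a small effect (infinitesimal model).
    lam = X_tr_c.var(axis=0).sum() * (1.0 - h2) / h2

    # Ridge solution beta_hat = (X'X + lambda I)^{-1} X'(y - mean(y)).
    A = X_tr_c.T @ X_tr_c + lam * np.eye(n_snps)
    beta_hat = np.linalg.solve(A, X_tr_c.T @ (y_tr - mu_y))

    y_hat = X_te_c @ beta_hat + mu_y
    return np.corrcoef(g_te, y_hat)[0, 1]


if __name__ == "__main__":
    print(round(ridge_genomic_prediction(), 2))
```

The returned value is the prediction accuracy, measured as the correlation between predicted and true genetic values in the held-out individuals. Varying `n_causal` from a handful of SNPs to all of them is one simple way to mimic different genetic architectures, and an ML method could be benchmarked against this baseline on the same simulated data.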
Requirements
- Students writing a master thesis with me should have a solid foundation in statistical learning, generalized linear mixed models and statistical computing. Recommended courses include:
- Statistical Learning: https://www.ntnu.edu/studies/courses/TMA4268#tab=omEmnet
- Computer-intensive Statistical Methods: https://www.ntnu.edu/studies/courses/TMA4300#tab=omEmnet
- Generalized Linear Models: https://www.ntnu.edu/studies/courses/TMA4315#tab=omEmnet
- As "fordypningsemne", I usually recommend getting a background in Evolutionary and Ecological Genetics: https://www.ntnu.edu/studies/courses/BI3083#tab=omEmnet, but this can be discussed.