Project and master thesis work in statistics supervised by Erlend Aune
In addition to the projects suggested on this page, I am open to project proposals within statistical learning. My main interests are:
- Learning when little training data is available
- Neural networks for time series modeling
- Applying models to novel datasets
The most relevant subjects for the theses I supervise are:
- Statistical learning: https://wiki.math.ntnu.no/tma4268
- Intelligent Text Analytics and Language Understanding: https://www.ntnu.edu/studies/courses/TDT4310#tab=omEmnet
- TMA4285 Tidsrekkemodeller (Time Series Models, autumn semester)
The suggestions for theses presented here have clear applications. However, many of the problems will require substantial modification of existing methodology that may generalize to other problems. Whether the focus of the project/thesis is mainly methodological or on the application is up to the individual student.
I will update this page with specific project proposals continuously.
Machine Learning with Food
What is a good recipe? Which ingredients are likely to match with each other? Can I generate a meaningful recipe that I would like to try out? These are possible questions that you could try to answer in this project.
A simple Google search for “Coq au vin” returns a myriad of recipes for this traditional French dish. In this project, you will work with tens of thousands of recipes, some of which have reviews, to extract useful information and analyse it to hopefully learn more about food. The specifics of the project depend on your personal interests, but some examples are:
- Classifying how good individual recipes are and extracting the essential information that makes a recipe “good” or “bad”.
- Extracting and harmonizing ingredients, quantities and other information of interest from recipes. This is closely related to Named Entity Recognition in natural language processing.
- Automatically generating cooking instructions for a recipe from its ingredient list.
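As a rough illustration of what extracting and harmonizing ingredients and quantities could involve, the sketch below parses a single ingredient line with a regular expression. It is only a rule-based starting point, not the method intended for the thesis; the unit table `UNIT_ALIASES`, the function names and the example lines are assumptions made up for this illustration.

```python
import re
from fractions import Fraction

# Hypothetical unit table for illustration; real recipe data needs far more entries.
UNIT_ALIASES = {
    "cup": "cup", "cups": "cup",
    "tbsp": "tablespoon", "tablespoon": "tablespoon", "tablespoons": "tablespoon",
    "tsp": "teaspoon", "teaspoon": "teaspoon", "teaspoons": "teaspoon",
    "g": "gram", "gram": "gram", "grams": "gram",
    "kg": "kilogram", "ml": "millilitre", "dl": "decilitre", "l": "litre",
}

# Optional leading quantity ("2", "1/2", "1 1/2" or "0.5"), then the rest of the line.
LINE_PATTERN = re.compile(
    r"^\s*(?P<qty>\d+(?:\s+\d+/\d+|/\d+|\.\d+)?)?\s*(?P<rest>.*)$"
)


def parse_quantity(text):
    """Convert '2', '1/2', '1 1/2' or '0.5' to a float."""
    return float(sum(Fraction(part) for part in text.split()))


def parse_ingredient(line):
    """Split one ingredient line into quantity, harmonized unit and ingredient name."""
    match = LINE_PATTERN.match(line)
    qty_text = match.group("qty")
    words = match.group("rest").split()
    unit = None
    if words:
        candidate = words[0].lower().rstrip(".")
        if candidate in UNIT_ALIASES:
            unit = UNIT_ALIASES[candidate]
            words = words[1:]
    return {
        "quantity": parse_quantity(qty_text) if qty_text else None,
        "unit": unit,
        "ingredient": " ".join(words).lower(),
    }


for example_line in ["1 1/2 cups flour", "750 g Chicken thighs", "salt to taste"]:
    print(parse_ingredient(example_line))
```

Hand-written rules like these break quickly on free-form recipe text, which is one reason the task is usually framed as Named Entity Recognition with a learned model.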
In this thesis, you will be using state-of-the-art deep neural network models for text, e.g. LSTMs, trellis networks and/or transformers. Decent Python experience is expected.
Which wines are likely to outperform their peers? Are producers, regions, grape varieties and vintages king, or do name and label matter?
In this project, you will be working with wine metadata and sales numbers in the Norwegian market.
Example questions of interest are:
- How early can we detect wine trends?
- What are important attributes for a wine to perform better than its peers?
- When is a wine likely to fall out of the basic selection?
This project will use data provided by Grapespot. It is likely that you will be using time series models, deep neural networks (such as LSTMs) or similar models in this project.
Active Learning is one way of dealing with limited training data. The underlying goal is to only label data points that a model is uncertain about. I’m primarily interested in the following aspects of Active Learning:
- What are good uncertainty measures for a model? Can we use statistics to find better uncertainty measures?
- One-Shot Active Learning (with heterogeneous data)
- Applying active learning to information extraction
The baseline models are typically flexible models, such as deep neural networks. It is well known that these are data intensive to train, and Active Learning may in many cases help with achieving good performance on such models with less training data.
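The idea of labelling only the points a model is uncertain about can be sketched with entropy-based uncertainty sampling. This is a minimal sketch of one common uncertainty measure, not a recommendation for the thesis; the function names and the predicted probabilities below are made up for illustration.

```python
import math


def predictive_entropy(probabilities):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def select_for_labelling(predicted_probabilities, budget):
    """Return the indices of the `budget` most uncertain unlabelled points."""
    scores = [predictive_entropy(p) for p in predicted_probabilities]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Made-up model outputs for four unlabelled points and three classes.
pool_predictions = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # close to uniform, high entropy
]

queried = select_for_labelling(pool_predictions, budget=2)
print(queried)  # indices of the points to send to an annotator
```

Entropy of the softmax output is only one possible score. Deep networks are often overconfident, which is part of why better uncertainty measures are an open question.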
Deep learning for time series data
Time series are everywhere: from sensors in mobile phones and sensors governing and monitoring industrial processes, through weather and climate data, to stock markets and demographic data. These time series influence both political and industrial decisions every day, yet there is surprisingly little research on deep learning for time series, especially in situations where data is scarce or of low quality.
Forecasting, change point detection, anomaly detection, imputation, denoising, and resampling/super-resolution are standard tasks within time series, and a project will typically focus on one or more of these. Some suggested topics would be:
- Transfer learning for short time series
- One-shot learning for time series
- Uncertainty quantification in deep learning for time series
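To make one of the standard tasks concrete, the sketch below shows a simple classical baseline for anomaly detection: a rolling z-score that compares each point with the values just before it. It is meant only as a reference point that a deep learning model could be compared against; the function name, the window length, the threshold and the sensor readings are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Flag points that deviate strongly from the preceding `window` values."""
    anomalies = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        mu, sigma = mean(history), stdev(history)
        # Skip windows with zero spread to avoid dividing by zero.
        if sigma > 0 and abs(series[t] - mu) / sigma > threshold:
            anomalies.append(t)
    return anomalies


# Made-up sensor readings with one obvious spike at index 6.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 25.0, 10.0, 9.9, 10.1]
flagged = rolling_zscore_anomalies(readings)
print(flagged)
```

Note that the spike inflates the mean and standard deviation of the following windows, so a second anomaly shortly after the first could be missed. Robust statistics or a learned model are possible ways around this.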
Channel separation in audio or sensor data
When listening to music or multi-speaker audio, the raw waveform is a function of several instruments or speakers. Similarly, when observing the values of a specific sensor in, e.g., an IoT setting, this sensor may record a nonlinear superposition of many signals. In many applications it is desirable to decompose the observed signal into its individual components (say, instruments, speakers, physical observables). This project is about creating machine learning models for extracting these individual components.