Project and master thesis work in statistics supervised by Erlend Aune
In addition to the projects suggested on this page, I am open to project proposals within statistical learning. My main interests are within
- Learning when little training data is available
- Neural networks for time series modeling
- Applying models to novel datasets
The most relevant subjects for the theses I supervise are:
- Statistical learning: https://wiki.math.ntnu.no/tma4268/2019v
- General statistical methods: https://wiki.math.ntnu.no/ma8701/2019v/start
- Intelligent Text Analytics and Language Understanding: https://www.ntnu.edu/studies/courses/TDT4310#tab=omEmnet
- TMA4285 Time Series Models (autumn semester)
The suggestions for theses presented here have clear applications. However, many of the problems will require substantial modification of existing methodology that may generalize to other problems. Whether the focus of the project/thesis is mainly methodological or on the application is up to the individual student.
I will continuously update this page with specific project proposals.
Machine Learning with Food
What is a good recipe? Which ingredients are likely to match with each other? Can I generate a meaningful recipe that I would like to try out? These are possible questions that you could try to answer in this project.
A simple Google search for “Coq au vin” yields myriad recipes for this traditional French dish. In this project, you will work with tens of thousands of recipes, some of which have reviews, to extract useful information and analyse it to hopefully learn more about food. The specifics of the project are based on personal interests, but some examples are:
- Classifying how good individual recipes are, and extracting the essential information that makes a recipe “good” or “bad”.
- Extracting and harmonizing ingredients, quantities and other information of interest from recipes. This is closely related to Named Entity Recognition in natural language processing.
- Automatically generating cooking instructions for a recipe based on its ingredient list.
In this thesis, you will use state-of-the-art deep neural network models for text, e.g. LSTMs, trellis networks and/or transformers. Decent Python experience is expected.
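As a feel for the ingredient-extraction subtask: a minimal rule-based baseline is sketched below, assuming a hypothetical recipe-line format of "quantity unit ingredient". A neural NER model would replace these hand-written rules; the regex and unit list are illustrative assumptions, not part of the project data.

```python
import re

# Hypothetical rule-based baseline for pulling (quantity, unit, ingredient)
# out of a free-text recipe line. A trained NER model would replace this.
LINE_RE = re.compile(
    r"^\s*(?P<qty>\d+(?:[./]\d+)?)\s*"       # quantity: 200, 1/2, 0.5, ...
    r"(?P<unit>g|kg|ml|dl|tbsp|tsp|cups?)?"  # optional unit (toy list)
    r"\s+(?P<name>.+)$"                      # remainder: ingredient name
)

def parse_ingredient(line):
    """Return (quantity, unit, ingredient) or None if the line doesn't match."""
    m = LINE_RE.match(line.lower())
    if not m:
        return None
    return m.group("qty"), m.group("unit"), m.group("name").strip()

print(parse_ingredient("200 g smoked bacon"))
print(parse_ingredient("2 cups red wine"))
```

Baselines like this are easy to beat but useful for harmonizing obvious cases and for generating weak labels to bootstrap a statistical model.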
Machine Learning with Wine
Which wines are likely to outperform their peers? Are producers, regions, grape varieties and vintages king, or do name and label matter?
In this project, you will be working with wine metadata and sales numbers in the Norwegian market.
Example questions of interest are
- How early can we detect wine trends?
- What are important attributes for a wine to perform better than its peers?
- When is a wine likely to fall out of the basic selection?
This project will use data provided by Grapespot. It is likely that you will be using time series models, deep neural networks (such as LSTMs) or similar models in this project.
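Before reaching for an LSTM, a classical baseline for the sales forecasting side of this project could be simple exponential smoothing. The sketch below is illustrative only: the sales numbers are invented, and the smoothing parameter alpha is an assumed default, not tuned to any real Grapespot data.

```python
def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing: return the one-step-ahead forecast,
    i.e. the smoothed level after seeing the whole series."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Toy weekly sales for one wine (invented numbers, for illustration only)
sales = [10, 12, 11, 15, 18, 21]
print(round(ses_forecast(sales), 2))
```

Comparing a neural forecaster against this kind of baseline makes it clear whether the extra model capacity actually pays off on short series.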
Active Learning
Active Learning is one way of dealing with limited training data. The underlying goal is to only label data points that a model is uncertain about. I’m primarily interested in the following aspects of Active Learning:
- What are good uncertainty measures for a model? Can we use statistics to find better uncertainty measures?
- One-Shot Active Learning (with heterogeneous data)
- Applying active learning to information extraction
The baseline models are typically flexible models, such as deep neural networks. It is well known that these are data intensive to train, and Active Learning may in many cases help with achieving good performance on such models with less training data.
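To make the labeling loop concrete: one common acquisition strategy is uncertainty sampling with predictive entropy, sketched below on toy class probabilities. The probability values are made up for illustration; in practice they would come from the baseline model's softmax outputs.

```python
import math

def entropy(probs):
    """Predictive entropy of a class-probability vector (a simple
    uncertainty measure; better measures are part of the project)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_queries(unlabeled_probs, k=2):
    """Return indices of the k unlabeled points the model is least
    certain about; these are the ones sent to a human for labeling."""
    ranked = sorted(range(len(unlabeled_probs)),
                    key=lambda i: entropy(unlabeled_probs[i]),
                    reverse=True)
    return ranked[:k]

# Toy predicted class probabilities for five unlabeled points
probs = [(0.98, 0.02), (0.55, 0.45), (0.80, 0.20), (0.50, 0.50), (0.90, 0.10)]
print(select_queries(probs, k=2))  # the two most uncertain points
```

Entropy is only one choice; margin sampling or ensemble disagreement slot into the same `select_queries` interface, which is exactly where better statistical uncertainty measures would enter.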
Deep learning for time series data
I am particularly interested in modeling for short multivariate time series. The modeling problems may, for instance, be: forecasting, change point detection, anomaly detection, imputation, denoising, and resampling/super-resolution.
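As a baseline for the anomaly-detection task in this list, a trailing-window z-score detector is sketched below; any neural detector for short multivariate series should at least beat it. The window size, threshold, and data are all assumptions for illustration.

```python
import statistics

def rolling_anomalies(series, window=5, threshold=3.0):
    """Flag indices where a point deviates from the mean of the trailing
    window by more than `threshold` standard deviations. A simple
    univariate baseline; a neural model would handle the multivariate case."""
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu = statistics.mean(past)
        sd = statistics.pstdev(past)
        if sd > 0 and abs(series[i] - mu) > threshold * sd:
            flagged.append(i)
    return flagged

# Toy series with a single obvious spike at index 5
data = [1.0, 1.1, 0.9, 1.0, 1.05, 5.0, 1.0, 0.95]
print(rolling_anomalies(data, window=5, threshold=3.0))
```

Note that the spike also inflates the window statistics for the points right after it, which masks them; handling that masking effect cleanly is one reason model-based detectors are interesting.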