Statistics seminars at the Department of Mathematical Sciences, NTNU
Spring 2023
Seminars will be on Thursdays at 14.15-15.00 in room 656, 6th floor, Sentralbygg 2, Gløshaugen. <https://link.mazemap.com/9e00pTUZ>
- January 19: Ekaterina Poliakova: "Confidence intervals for the parameter of the scaled-uniform model"
- February 2: Ron Togunov: "Behavioural ecology of a vulnerable Arctic predator in a dynamic and changing environment"
- February 16: Randi Hammervold (NTNU Handelshøyskolen): "A brief introduction to Structural Equation Modeling (SEM). Does the choice of software matter?"
- March 16: Ekaterina Poliakova: "Invariant Confidence Sets with Smallest Expected Measure"
Do you want to give a talk, or have suggestions for topics you would like to hear about or speakers to invite? Let us know at Jo.Eidsvik@ntnu.no and/or Mette.Langaas@ntnu.no
Autumn 2022
The general time for seminars this semester is Thursdays at 13.15-14.00. The room will be announced for each seminar and will be either R3 (Realfagsbygget) or Simastuen (Sentralbygg 2, room 656).
- August 24th - Klaus Mosegaard (University of Copenhagen, Denmark): "Accurate Inversion Based on Approximate Models"
- August 25th - All-day emeritus celebration for Henning Omre.
- September 15th - All-day emeritus celebration for Bo Lindqvist.
- October 20th - 13.15-14.00 in R3. Jo Eidsvik (NTNU): "Methods for conducting robotic sampling of spatio-temporal ocean variables"
- October 28-29th - Statistics Symposium at Bårdshaug.
- November 10th - 13.15-14.00. Taraldsen, G.: "Mathematics of improper priors and posteriors"
- November 17th - 17.00, room 1329, Sentralbygg 2: Members' meeting of NSF (Norsk statistisk forening), Trondheim branch, with a talk by Bekkemoen on explainable AI (XAI).
- December 8th - 13.15-14.00. Johannes Mauritzen (NTNU Handelshøyskolen): "Teaching applied statistics to business students… the hard way." Simastuen, room 656, Sentralbygg 2.
- December 15th - 15.00, 13th floor, Sentralbygg 2. Omre, H.: "Needles in space - Bertrand's paradox"
Time/place: Thursday, December 8, 2022, 13:15-14, - Location: 656, Sentralbygg 2
Speaker: Johannes Mauritzen (NTNU Handelshøyskolen)
Title: "Teaching applied statistics to business students… the hard way."
Abstract: I will share experiences and ask for feedback after teaching applied statistics at NTNU Business School for the first time this fall. The course has used a simulation-based approach with the Python programming language and associated
A few topics I hope to cover:
- Textbook(s): I use Gelman, Hill and Vehtari, "Regression and Other Stories" (https://avehtari.github.io/ROS-Examples/), and other supplemental books
- Teaching data management, cleaning, transformation, descriptive statistics and hypothesis generation (data science?) as part of a statistics course
- Experience with a "flipped" set-up
- Visualisation as a primary technique in all phases of an analysis
- Understanding uncertainty through simulation
- Including Bayesian analysis in a lower-level statistics course/use of Decision Analysis
- Evaluation form: Term project vs. exam
- Student feedback: The good, the bad and the ugly
Course materials can be found on the course website (in Norwegian), https://jmaurit.github.io/anv_statistikk/, or on a separate page with links to the "labs", https://jmaurit.github.io/stats/
Spring 2022
Tentatively every second Monday at 13.15-14.00. All employees in the statistics group will get an announcement by e-mail. Send an e-mail to Sara Martino if you wish to contribute with a talk or be added to the mailing list.
Overview
- February 7th - Erick Orozco-Acosta (Public University of Navarre, Spain): "An approach to Bayesian modeling in spatio-temporal disease mapping"
- February 21st - Michail Spitieris (NTNU): "Bayesian Calibration of imperfect computer models using Physics-informed priors"
- March 28th - Geir Aamodt (NMBU): "Access to green area and health – what is the connection?"
- April 21st - Jaione Etxeberria (University of Navarra): "Disease Mapping, is it the game over?"
- May 2nd - Andrew Seaton (University of Glasgow): "Inlabru: software for fitting latent Gaussian models with non-linear predictors"
- May 23rd - Melvin Vaupel (NTNU): "An introduction to TDA for statisticians, part 1"
- May 30th - Melvin Vaupel (NTNU): "An introduction to TDA for statisticians, part 2"
- June 7th - Alessandra Menafoglio (Politecnico di Milano): "Object Oriented Spatial Statistics for the analysis of georeferenced complex data"
- June 10th - Farewell seminar (sortieseminar) with Henning
Details
Time/place: Tuesday, June 7th, 2022, 13:15-14, - Location: F4 |
---|
Speaker: Alessandra Menafoglio (Politecnico di Milano) |
Title: "Object Oriented Spatial Statistics for the analysis of georeferenced complex data" |
Abstract: The analysis of complex data distributed over large or highly textured regions poses new challenges for spatial statistics. Object Oriented Spatial Statistics (O2S2) is a recent system of ideas and methods that makes it possible to analyze high-dimensional and complex data when their spatial dependence is an important issue. We present the key concepts of O2S2 as a general approach to analyzing and predicting georeferenced complex data, interpreted as objects in appropriate mathematical spaces. Examples of object data include functional data, distributional data and tensors. We discuss the extension of key geostatistical concepts (e.g., stationarity) and methods (e.g., Kriging) to the context of O2S2, and discuss recent extensions of these methods that permit the analysis of object data distributed over complex regions. Here, we ground our developments in computationally intensive methods, based on random domain decompositions of the study domain, and in a novel model valid on tree-structured domains. The presented models and methods will be illustrated through real environmental case studies. (Joint work with P. Secchi) |
Time/place: Monday, May 23rd, 2022, 13:15-14, - Location: S4 |
---|
Speaker: Melvin Vaupel (NTNU) |
Title: "An introduction to TDA for statisticians" |
Abstract: We will delve into the world of topological data analysis (TDA) and explore how the mathematics of shapes can be useful in the analysis of big datasets. In the first talk I will explain what simplicial complexes are and how we can mathematically identify holes and cavities in their structure using the computational tool called simplicial homology. In the second talk we will shift the focus to how homology can actually be used in data analysis. We discuss how to obtain simplicial complexes from datasets and then introduce persistent homology. We will also explore various applications. If time permits, we will take a glance at other methods from TDA, such as UMAP or ToMATo. |
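As a rough illustration of the first talk's topic, and not material from the talk itself, the sketch below computes Betti numbers of a small simplicial complex from the ranks of its boundary matrices, using real coefficients. Here b_0 counts connected components, b_1 holes and b_2 cavities. The helpers `closure`, `boundary_matrix` and `betti_numbers` are hypothetical names written for this example.

```python
from itertools import combinations

import numpy as np


def closure(maximal_simplices):
    """All faces of the given simplices, as sorted vertex tuples."""
    faces = set()
    for s in maximal_simplices:
        s = tuple(sorted(s))
        for r in range(1, len(s) + 1):
            faces.update(combinations(s, r))
    return faces


def boundary_matrix(k_simplices, faces):
    """Matrix of the boundary map from k-simplices to their (k-1)-faces."""
    index = {face: i for i, face in enumerate(faces)}
    D = np.zeros((len(faces), len(k_simplices)))
    for j, simplex in enumerate(k_simplices):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]  # drop vertex i
            D[index[face], j] = (-1) ** i         # alternating orientation sign
    return D


def betti_numbers(complex_):
    """Betti numbers b_0..b_d over the reals for a complex closed under faces."""
    by_dim = {}
    for s in complex_:
        by_dim.setdefault(len(s) - 1, []).append(s)
    top = max(by_dim)
    # rank[k] = rank of the boundary map from dimension k to k-1 (zero at both ends)
    rank = {0: 0, top + 1: 0}
    for k in range(1, top + 1):
        D = boundary_matrix(sorted(by_dim[k]), sorted(by_dim[k - 1]))
        rank[k] = int(np.linalg.matrix_rank(D))
    # b_k = dim C_k - rank(boundary_k) - rank(boundary_{k+1})
    return [len(by_dim[k]) - rank[k] - rank[k + 1] for k in range(top + 1)]


hollow_triangle = closure([(0, 1), (1, 2), (0, 2)])
filled_triangle = closure([(0, 1, 2)])
hollow_tetrahedron = closure([(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)])
print(betti_numbers(hollow_triangle))     # [1, 1]: one component, one hole
print(betti_numbers(filled_triangle))     # [1, 0, 0]: the hole is filled in
print(betti_numbers(hollow_tetrahedron))  # [1, 0, 1]: one enclosed cavity
```

Persistent homology, the subject of the second talk, tracks how such numbers change as a complex built from data grows. That step is not shown here.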
Time/place: Monday, May 2nd, 2022, 13:15-14, - Location: F6 |
---|
Speaker: Andrew Seaton (University of Glasgow) |
Title: "Inlabru: software for fitting latent Gaussian models with non-linear predictors" |
Abstract: In recent years, the integrated nested Laplace approximation (INLA) method has become a popular approach for fitting latent Gaussian models (LGMs). The computational efficiency of INLA is based on approximating integrals and leveraging sparsity in the latent Gaussian precision matrix. This has led to wide adoption of INLA over slower MCMC methods for LGMs with sparse effects, in particular in spatial statistics, where model components with a large number of parameters are common. However, the speed of INLA comes with some restrictions on the types of models that it can fit. The inlabru software implements a new approximate inference method that extends the class of models that can be fitted using INLA. In particular, inlabru allows LGMs to have predictor expressions that are non-linear in their parameters. The approach uses an iterative procedure, fitting a linearised model configuration using INLA at each step. We present the theory of the method as well as some applications of this approach to (i) specify new prior distributions on model parameters, (ii) model the detectability of animals, and (iii) aggregate a continuously indexed predictor to discrete areal units. The iterative INLA method is general and has the potential to be used in a wide variety of applications, allowing users to benefit from the speed of INLA in new contexts. |
Time/place: Thursday, April 21st, 2022, 12:15-13, - Location: Simastuen |
---|
Speaker: Jaione Etxeberria (University of Navarra) |
Title: "Disease Mapping, is it the game over?" |
Abstract: Disease mapping techniques, a particular field of spatial and spatio-temporal methods, allow mapping the incidence, mortality, or prevalence of different diseases, and for several decades they have been a fundamental part of public health research on diseases in human populations. The Spatial Statistics group at the Public University of Navarre has proposed and analysed different spatio-temporal techniques to estimate and forecast one or several diseases, elucidating their spatial evolution over time and detailing whether the evolution is the same or different across age groups and birth cohorts. The aim of this talk is to provide an overview of these techniques, first presenting the epidemiological problems that motivated the use of each method. Why is space-time analysis of breast, prostate, or pancreatic cancer in Spain so important? Why do epidemiologists need short-term cancer forecasts? How could national cancer incidence estimates in Spain be improved? |
Time/place: Monday, March 28th, 2022, 13:15-14, - Location: R92 |
---|
Speaker: Geir Aamodt (NMBU) |
Title: "Access to green area and health – what is the connection?" |
Abstract: Several data sources exist for assessing access to green areas, such as land-cover and land-use maps, combinations of these, satellite images, laser technology, and even street-view images. Based on these data, we use different metrics to assess access to green areas, such as the size of the green area, the distance to green areas, or simply greenness. I will show how we compute the different metrics of access to green areas, link this information to health surveys, and discuss how this information can be used by planners and decision makers. |
Time/place: Monday, February 21st, 2022, 13:15-14, - Location: S21 |
---|
Speaker: Michail Spitieris (NTNU) |
Title: "Bayesian Calibration of imperfect computer models using Physics-informed priors" |
Abstract: In this work, we introduce a computationally efficient data-driven framework suitable for quantifying the uncertainty in physical parameters of computer models represented by differential equations. We construct physics-informed priors for differential equations, which are multi-output Gaussian process (GP) priors that encode the model's structure in the covariance function. We extend this into a fully Bayesian framework that allows quantifying the uncertainty of physical parameters and model predictions. Since physical models are usually imperfect descriptions of the real process, we allow the model to deviate from the observed data by considering a discrepancy function. For inference, Hamiltonian Monte Carlo (HMC) sampling is used. This work is motivated by the need for interpretable parameters for the hemodynamics of the heart for personalised treatment of hypertension. The model used is the arterial Windkessel model, which represents the hemodynamics of the heart through differential equations with physically interpretable parameters of medical interest. Like most physical models, the Windkessel model is an imperfect description of the real process. To demonstrate our approach, we simulate noisy data from a more complex physical model with known mathematical connections to our modelling choice. We show that without accounting for discrepancy, the posterior of the physical parameters deviates from the true value, whereas accounting for discrepancy gives a reasonable quantification of the uncertainty in the physical parameters and reduces the uncertainty in subsequent model predictions. We also apply our approach to the heat equation, a space-time dependent partial differential equation (PDE), where we consider a case of biased sensor measurements. |
Time/place: Monday, February 7th, 2022, 13:15-14, - Location: S21 |
---|
Speaker: Erick Orozco-Acosta (Public University of Navarre, Spain) |
Title: "An approach to Bayesian modeling in spatio-temporal disease mapping" |
Abstract: In this talk we present a general procedure to analyse high-dimensional spatio-temporal count data, with special emphasis on relative risk estimation in cancer epidemiology. Poisson mixed models are typically used in this context for the analysis of count data within a hierarchical Bayesian framework. However, when the number of areas and time periods increases considerably, risk estimation becomes computationally expensive or even infeasible. We present a simple and practical idea to solve this problem based on a "divide and conquer" approach. The proposed methodology is evaluated in a simulation study with a twofold objective: to estimate risks accurately and to detect extreme-risk areas while avoiding false positives/negatives. An analysis of real data is also presented. We show that our method outperforms classical global models. |
Fall 2021
Tentatively every second Monday at 13.15-14.00. All employees in the statistics group will get an announcement by e-mail. Send an e-mail to Sara Martino if you wish to contribute with a talk or be added to the mailing list.
Overview
- August 23rd - Florian Beiser: "Comparison of Ensemble-Based Data Assimilation Methods for Drift Trajectory Forecasting"
- September 6th - No seminar
- September 20th - Susan Anyosa (NTNU): "Adaptive spatial designs minimizing the integrated Bernoulli variance in spatial logistic regression models: with applications to benthic habitat mapping"
- October 4th - Martin Outzen Berild (NTNU): "Importance Sampling with the Integrated Nested Laplace Approximation"
- October 18th - The Tien Mai (NTNU): "On low-rank matrix completion and estimation from a Bayesian perspective"
- October 25th - Athenais Gautier (University of Bern, Switzerland): "Spatial logistic Gaussian processes for probability density field modelling, and how to leverage them for speeding-up Bayesian inference"
- November 1st - Erik Hermansen (NTNU): "The (toroidal) topology of neural activity"
- November 15th - Geir-Arne Fuglstad (NTNU): "The Two Cultures for Prevalence Mapping: Small Area Estimation and Spatial Statistics"
- November 29th - Bert van der Veen (NTNU): "Model-based ordination with constrained latent variables"
- December 20th -
Details
Time/place: Monday, November 29th, 2021, 13:15-14, - Location: S1 | |
---|---|
Speaker: Bert van der Veen (NTNU) | |
Title: "Model-based ordination with constrained latent variables" | |
Abstract: In community ecology, unconstrained ordination can be used to predict, from a multivariate dataset, the latent variables that generated the observed species composition. Latent variables can be understood as ecological gradients, which in constrained ordination are represented as a function of measured predictors, so that ecologists can better relate species composition to the environment while reducing the dimensionality of both the predictors and the response data. However, existing constrained ordination methods do not explicitly account for the information provided by species responses, so they can misrepresent community structure if not all predictors are measured. We propose a new method for model-based ordination with constrained latent variables in the Generalized Linear Latent Variable Model framework, which incorporates both measured predictors and residual covariation to optimally represent ecological gradients. Simulations of unconstrained and constrained ordination show that the proposed method outperforms canonical correspondence analysis (CCA) and redundancy analysis (RDA). |
Time/place: Monday, November 15th, 2021, 13:15-14, - Location: R4 | |
---|---|
Speaker: Geir-Arne Fuglstad (NTNU) | |
Title: "The Two Cultures for Prevalence Mapping: Small Area Estimation and Spatial Statistics" | |
Abstract: The Sustainable Development Goals have sparked a huge interest in fine-scale spatio-temporal estimation of demographic and health indicators in low- and middle-income countries (LMICs). We consider indicators that are expressed as prevalences, and are interested in estimating prevalences at specific subnational levels that are nested in the national geography. This is challenging, since a key data source is surveys that only sample 400–1600 locations approximately every fifth year. Weighted estimators directly target areal prevalences, but typically break down at the desired spatial scales. Geostatistical approaches can borrow strength across space, time and covariates to produce pixel-level maps of risk, but spatial aggregation with respect to a population distribution is challenging, and the methods do not fully acknowledge the complex survey design. We describe how spatial and spatio-temporal methods are currently used for small area estimation in the context of LMICs, highlight the key challenges that need to be overcome, and discuss a new approach that is methodologically closer in spirit to small area estimation. In the talk we focus on the spatial setting and illustrate the points through a spatial analysis of vaccination coverage in Nigeria based on the 2018 Demographic and Health Survey (DHS). The talk is based on https://arxiv.org/abs/2110.09576. |
Time/place: Monday, November 1st, 2021, 13:15-14, - Location: 656 | |
---|---|
Speaker: Erik Hermansen (NTNU) | |
Title: "The (toroidal) topology of neural activity" | |
Abstract: Entorhinal grid cells, so-called because of their hexagonally tiled spatial receptive fields, are organized in modules which, collectively, are believed to form a population code for the animal’s position. Here, we apply topological data analysis to simultaneous recordings of hundreds of grid cells and show that joint activity of grid cells within a module lies on a toroidal manifold. Each position of the animal in its physical environment corresponds to a single location on the torus, and each grid cell is preferentially active within a single “field” on the torus. Toroidal firing positions persist between environments, and between wakefulness and sleep, in agreement with continuous attractor models of grid cells. |
Time/place: Monday, October 25th, 2021, 13:15-14, - Location: 656 | |
---|---|
Speaker: Athenais Gautier (University of Bern, Switzerland) | |
Title: "Spatial logistic Gaussian processes for probability density field modelling, and how to leverage them for speeding-up Bayesian inference" | |
Abstract: When studying natural or artificial systems, it is common for the response of interest not to be fully determined by the system parameters x, but rather to be random and to follow a probability distribution that depends on x. Our aim here is to estimate the underlying field based only on a finite number of observations. Such settings are notably inspired by stochastic optimization and inversion problems, for which estimates and associated uncertainty quantification could be instrumental. The approach that we investigate generalizes to spatial contexts a class of non-parametric Bayesian density models based on logistic Gaussian processes, and allows modelling (probability) density-valued fields with complex dependences on x while accommodating heterogeneous sample sizes. The induced prior on the space of density fields is called the Spatial Logistic Gaussian Process (SLGP). The considered models allow, for instance, performing (approximate) posterior simulations of probability density functions as well as jointly predicting multiple moments or other functionals of target distributions. We propose an implementation of the SLGP and investigate ways of using the proposed class of models to speed up Approximate Bayesian Computation (ABC) methods. |
Time/place: Monday, October 18th, 2021, 13:15-14, - Location: R40 | |
---|---|
Speaker: The Tien Mai (NTNU) | |
Title: "On low-rank matrix completion and estimation from a Bayesian perspective" | |
Abstract: In this talk, I will present some of my current work on Bayesian low-rank matrix completion and estimation. Several prior distributions for low-rank matrices will be discussed, and PAC-Bayesian bounds for the corresponding estimators are given. Computational approximations of these Bayesian estimators using MCMC and/or LMC are also discussed. |
Time/place: Monday, October 4th, 2021, 13:15-14, - Location: R10 | |
---|---|
Speaker: Martin Outzen Berild | |
Title: Importance Sampling with the Integrated Nested Laplace Approximation | |
Abstract: Recently, methods have been developed to extend the class of latent Gaussian models (LGMs) that can be fitted with INLA to models that can be expressed as conditional LGMs, by fixing some of the parameters in the models to descriptive values. These methods differ in the manner in which the descriptive values are chosen. This paper proposes to combine importance sampling with INLA (IS-INLA), and extends this approach with the more robust adaptive multiple importance sampling algorithm combined with INLA (AMIS-INLA). The paper compares these approaches and existing methods on a series of applications with simulated and observed datasets, and evaluates their performance based on accuracy, efficiency, and robustness. The approaches are validated by exact posteriors in a simple bivariate linear model; they are then applied to a Bayesian lasso model, a Bayesian imputation of missing covariate values, and lastly, parametric Bayesian quantile regression. The applications show that the AMIS-INLA approach in general outperforms the other methods, but the IS-INLA algorithm could be considered for faster inference when good proposals are available. |
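The core idea behind IS-INLA is to fix some parameters at proposal draws and reweight the draws. That step is ordinary self-normalised importance sampling. Below is a minimal sketch of it under strong simplifying assumptions, not the speaker's implementation:

- The conditional fit that INLA would provide for each fixed value is replaced by a closed-form Gaussian likelihood in a toy model y_i ~ N(beta, 1).
- The model, prior, proposal and the name `is_posterior_mean` are all hypothetical.
- The adaptive AMIS step is not shown.

```python
import numpy as np


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2), vectorised with NumPy broadcasting."""
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2


def is_posterior_mean(y, n_samples=20_000, prior_sd=10.0, proposal_sd=0.5, seed=0):
    """Self-normalised importance sampling for beta in the toy model y_i ~ N(beta, 1)."""
    rng = np.random.default_rng(seed)
    # Proposal draws for the parameter that is held fixed in each conditional fit
    beta = rng.normal(y.mean(), proposal_sd, size=n_samples)
    # Toy stand-in for the conditional marginal likelihood an INLA run would return
    log_lik = log_normal_pdf(y[None, :], beta[:, None], 1.0).sum(axis=1)
    log_w = (log_lik + log_normal_pdf(beta, 0.0, prior_sd)
             - log_normal_pdf(beta, y.mean(), proposal_sd))
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()                     # self-normalised weights
    ess = 1.0 / np.sum(w**2)         # effective sample size of the weights
    return float(np.sum(w * beta)), float(ess)


rng = np.random.default_rng(1)
y = rng.normal(1.5, 1.0, size=20)
estimate, ess = is_posterior_mean(y)
# Conjugate check: posterior mean = sum(y) / (n + 1 / prior_sd^2)
exact = y.sum() / (len(y) + 1 / 10.0**2)
print(f"IS estimate {estimate:.4f}, exact {exact:.4f}, ESS {ess:.0f}")
```

The effective sample size is a simple diagnostic of proposal quality. This connects to the abstract's remark that IS-INLA is attractive when good proposals are available.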
Time/place: Monday, September 20th, 2021, 13:15-14, | |
---|---|
Speaker: Susan Anyosa | |
Title: Adaptive spatial designs minimizing the integrated Bernoulli variance in spatial logistic regression models: with applications to benthic habitat mapping | |
Abstract: Sampling efforts are important to understand, characterize and monitor environmental variability in large spatial domains. Multiple types of devices can accomplish these tasks in various ways, but time and computational limitations must also be considered. In-situ sampling is often accurate, but it tends to be very sparse in the vast spatial areas considered for mapping purposes, so sampling designs must be constructed carefully. In this work we present a criterion for mapping spatial presence-absence variables that overcomes the aforementioned limitations. We use the expected integrated Bernoulli variance criterion to explore regions with more uncertainty about the binary outcomes. Our approach uses a hierarchical Bayesian logistic regression model for the binary variable and develops approximate closed-form expressions for the expected integrated Bernoulli variance. This is extended to find adaptive designs in the setting of robotic exploration, where there are limited computational resources and operational time restrictions. The fast approximations are shown to be accurate in a simulation study. In an application to benthic habitat mapping, we consider a dataset from Australia to demonstrate the suggested sampling design approach, where one adaptively selects drop locations for an underwater vehicle gathering coral data. |
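For readers unfamiliar with the criterion: the integrated Bernoulli variance of a binary field with presence probability pi(s) is the integral of pi(s)(1 - pi(s)) over the domain. It is largest where the outcome is most uncertain. The sketch below only evaluates this quantity on a grid from posterior draws of a latent logit field, under these assumptions:

- Such posterior draws are available.
- The function name `integrated_bernoulli_variance` and the toy inputs are hypothetical.

It does not reproduce the talk's closed-form approximations of the expected value after new samples.

```python
import numpy as np


def integrated_bernoulli_variance(latent_draws, cell_area):
    """latent_draws: array (n_draws, n_cells) of posterior draws of the logit field x(s)."""
    # Marginal presence probability per cell, averaging the logistic link over draws
    pi = np.mean(1.0 / (1.0 + np.exp(-latent_draws)), axis=0)
    # Riemann-sum approximation of the integral of pi(s) * (1 - pi(s))
    return float(np.sum(pi * (1.0 - pi)) * cell_area)


rng = np.random.default_rng(0)
uncertain = rng.normal(0.0, 0.3, size=(500, 100))  # logits near 0: pi close to 0.5
confident = rng.normal(4.0, 0.3, size=(500, 100))  # logits near 4: pi close to 0.98
print(integrated_bernoulli_variance(uncertain, 0.01))  # near the maximum 0.25 * 100 * 0.01
print(integrated_bernoulli_variance(confident, 0.01))  # much smaller
```

A design criterion of the kind described in the abstract would compare how much of this quantity is expected to remain after sampling at each candidate location.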
Time/place: Monday, August 23rd, 2021, 13:15-14, - Location: R10 |
---|
Speaker: Florian Beiser |
Title: Havvarsel: Comparison of Ensemble-Based Data Assimilation Methods for Drift Trajectory Forecasting |
Abstract: The Havvarsel project aims to develop personalized forecasts in a two-way data flow system, where novel statistical and mathematical methods are exemplified on representative demonstrator cases of oceanic applications. In search-and-rescue operations at sea, for instance, short-term predictions of drift trajectories are essential to efficiently define search areas that reflect both the forecasted trajectory and the associated uncertainties. To this end, we consider large ensembles of simplified ocean models and assimilate in situ buoy observations, which are typically very sparse compared to the high-dimensional state space. We compare two state-of-the-art ensemble-based data assimilation methods for applications such as forecasting of drift trajectories. The first method is a version of the ensemble-transform Kalman filter (ETKF), which is herein modified to be efficient for sparse point observations. The second method is the implicit equal-weights particle filter (IEWPF), a recently proposed method for non-linear filtering, which has earlier been shown to be efficient for the relevant application. We compare the two data assimilation methods applied to a simplified ocean model instance, contrasting the statistical properties of the state estimation and of the drift trajectory forecasts. |
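For reference, a textbook (global, unlocalised) ETKF analysis step fits in a few lines of NumPy. This is a sketch that assumes a linear observation operator and Gaussian observation errors. It is not the modified ETKF or the IEWPF from the talk, and the function name `etkf_update` and the toy sizes are hypothetical. Point observations, such as those from drifting buoys, correspond to an `H` that simply picks out a few state entries.

```python
import numpy as np


def etkf_update(X, y, H, R):
    """One ETKF analysis step.

    X: (n_state, N) forecast ensemble, y: (m,) observations,
    H: (m, n_state) linear observation operator, R: (m, m) observation-error covariance.
    """
    _, N = X.shape
    x_mean = X.mean(axis=1)
    A = X - x_mean[:, None]              # forecast anomalies
    HA = H @ A                           # anomalies mapped to observation space
    R_inv = np.linalg.inv(R)
    # Analysis covariance in the N-dimensional ensemble-weight space
    P_tilde = np.linalg.inv((N - 1) * np.eye(N) + HA.T @ R_inv @ HA)
    # Weights that move the ensemble mean towards the observations
    w_mean = P_tilde @ HA.T @ R_inv @ (y - H @ x_mean)
    # Symmetric square root of (N - 1) * P_tilde transforms the anomalies
    evals, evecs = np.linalg.eigh((N - 1) * P_tilde)
    W = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    return x_mean[:, None] + A @ (w_mean[:, None] + W)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))   # 5 state variables, 10 ensemble members
H = np.zeros((2, 5))
H[0, 0] = 1.0                  # two sparse point observations
H[1, 3] = 1.0
R = 0.1 * np.eye(2)
y = np.array([0.5, -0.2])
Xa = etkf_update(X, y, H, R)

# Sanity check against the Kalman update using the ensemble covariance
x_mean = X.mean(axis=1)
A = X - x_mean[:, None]
P = A @ A.T / 9
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
print(np.allclose(Xa.mean(axis=1), x_mean + K @ (y - H @ x_mean)))  # True
```

Working in the N-dimensional weight space keeps the cost governed by the ensemble size rather than the state dimension. This is one reason ensemble transform filters are used with large ocean models.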
Spring 2021
Tentatively every second Monday at 13.15-14.00 in S21/Zoom. All employees in the statistics group will get an announcement by e-mail. Send an e-mail to Sara Martino if you wish to contribute with a talk or be added to the mailing list.
Overview
- February 1st - PhD presentation (Silius, Kwaku and Fredrik)
- February 15th - PhD presentation (Martin OB, Yaolin, Umut)
- March 1st - Michela Cameletti: "The Covid-19 pandemic: impact on air-pollution, health and mortality"
- March 15th - PhD presentation (Emma, Janne, Håkon)
- March 22nd - Jon Olav Vik (NMBU): "Data wants to be plotted! Visualization and data literacy in an introductory course in biological data analysis."
- April 12th - PhD presentation (Mina Spremic and Yngvild Hamre)
- April 26th - Pål Vegard Johnsen: "Genome-wide association studies with imbalanced binary responses"
- May 10th - Jorge Sicacha Parada: "A spatial modeling framework for monitoring surveys with different sampling protocols"
- May 25th - Andreas Asheim: "Experts in Teamwork with a twist of statistics"
- June 7th - Emily Grace Simmonds: "Teaching statistics to biology students"
Details
Time/place: Monday, June 7, 2021, 13:15-14, | |
---|---|
Speaker: Emily Grace Simmonds | |
Title: Teaching statistics to biology students | |
Abstract: … use modern statistical techniques even in non-mathematical disciplines. Teaching statistics to these groups requires a balance between providing sufficient theoretical background in statistics and meeting the desired learning outcomes of the non-specialist discipline. Since 2019, we have been developing the course ST2304 "Statistical modelling for biologists and biotechnologists" as part of a project funded by the NV faculty. In this course we teach both programming in R and statistical modelling. We have been working to move the course from a lecture-based style to one built around interactive class modules, hosted online, and problem-based exercise sessions. The aim has been to improve the relevance of the statistical theory to the core subject and to improve retention of concepts beyond the focal course. In this seminar we will outline the changes we have made to this course since 2019 and finish with a discussion of whether these techniques are applicable to other statistics courses for non-specialist audiences. |
Time/place: Tuesday, May 25, 2021, 13:15-14, | |
---|---|
Speaker: Andreas Asheim | |
Title: Experts in Teamwork with a twist of statistics | |
Abstract: All fourth-year students at NTNU take a course called Experts in Teamwork (EiT). The aim of the course is to work in a cross-disciplinary team to solve a real problem, while also learning collaboration skills. I have just finished my fifth semester teaching this course. In my version of EiT, the students have been encouraged to work on projects related to reuse of data. That is, can we do something useful with data that may be of poor quality, but tend to be lying around in large amounts? This is a favourite topic of mine, as a researcher at St. Olav's Hospital who specialises in using data from all parts of the healthcare system to evaluate quality of care, resource use, patient safety etc. Such projects are great for getting students from all backgrounds to work together. Computer skills are of course essential, and so is knowing a bit of statistics. In this seminar, I will give a short overview of what my students and I have been playing with, and what it is like to teach one of the strangest, but most rewarding university courses. |
Time/place: Monday, May 10, 2021, 13:15-14, | |
---|---|
Speaker: Jorge Sicacha Parada | |
Title: A spatial modeling framework for monitoring surveys with different sampling protocols | |
Abstract: Ecological abundance data are mostly collected through professional surveys as part of monitoring programs, often at a national level. However, these surveys rarely follow exactly the same sampling protocol in different countries. Based on data from bird monitoring programs in Norway and Sweden, we propose a modeling framework for merging professional surveys from different countries and producing one estimate for the expected abundance, its uncertainty and the covariate effects. This framework assumes a common Gaussian Random Field driving both the observed and true abundances with either a linear or a relaxed linear association between them. Results of our case study and a simulation study are presented. |
Time/place: Monday, April 26, 2021, 13:15-14, | |
---|---|
Speaker: Pål Vegard Johnsen | |
Title: Genome-wide association studies with imbalanced binary responses | |
Abstract: In genome-wide association studies, one searches for associations between specific genetic markers (SNPs) and, for instance, a disease. Each genetic marker is screened for association via a logistic regression model using the score test statistic. The score test statistic can be shown to follow an asymptotic normal distribution under the null hypothesis. However, in practice this approximation is not sufficiently accurate under certain conditions when using binary responses. We will show when this is the case, and we will propose an improvement using saddlepoint approximation theory. |
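To make the setting concrete, here is a minimal sketch of the standard score test for one SNP in a logistic regression. It uses the asymptotic normal approximation that the talk examines. For brevity it assumes an intercept-only null model with no further covariates. The saddlepoint correction proposed in the talk is not shown, and the function name `snp_score_test` is hypothetical.

```python
import math

import numpy as np


def snp_score_test(y, g):
    """Score test of H0: beta_g = 0 in logit P(y = 1) = alpha + beta_g * g.

    y: 0/1 responses, g: genotype dosages (e.g. 0, 1, 2).
    Returns the standardised score statistic and a two-sided normal p-value.
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    p_hat = y.mean()                 # MLE of P(y = 1) under H0
    U = np.sum(g * (y - p_hat))      # score for beta_g evaluated at beta_g = 0
    # Variance of U under H0, adjusted for estimating the intercept
    V = p_hat * (1.0 - p_hat) * np.sum((g - g.mean()) ** 2)
    z = float(U / math.sqrt(V))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # P(|N(0, 1)| > |z|)
    return z, p_value


z, p = snp_score_test([1, 1, 0, 0], [2, 1, 1, 0])
print(round(z, 4), round(p, 4))  # 1.4142 0.1573
```

The talk's title points to imbalanced binary responses as the difficult case. In that setting, the normal p-value computed above may be inaccurate in the tails, which matters when screening many markers at strict significance levels.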
Time/place: Monday, March 22, 2021, 13:15-14, | |
---|---|
Speaker: Jon Olav Vik (NMBU) | |
Title: Data wants to be plotted! Visualization and data literacy in an introductory course in biological data analysis. | |
Abstract: First-year students in biological programmes at NMBU take a 10-credit introductory course in biological data analysis. It was created to remedy the problem that most of our master's students had no clue where to begin if you sent them a file with actual data in it. The emphasis is on transforming data to graphics so students can begin to discuss and speculate based on their own real-world understanding of biology. To this end, we teach them just enough programming to generate reproducible reports with R Markdown, focusing on data graphics and data wrangling. Datasets are chosen to be interpretable with limited training, relevant to our students, and ideally from NMBU. The analyses can be built upon in later courses in formal statistics as well as in advanced biology courses, letting data literacy grow rather than waste away in the years before master's level. I will illustrate my talk with learning goals from the course and how students work with them through a typical week. |
Time/place: Monday, March 1, 2021, 13:15-14, |
---|
Speaker: Michela Cameletti (Università degli studi di Bergamo) |
Title: The Covid-19 pandemic: impact on air-pollution, health and mortality |
Abstract: The Covid-19 pandemic is affecting our societies and lives in different ways. In this talk I discuss the case of Italy with respect to three different aspects connected with the pandemic: the effect of the first lockdown on air quality and pollutant concentrations, the spatio-temporal dynamics of excess mortality during the first Covid-19 wave and the impact on stage shift and increased mortality due to the delays in the screening procedures for colorectal cancer. |
Fall 2020
Tentatively every second Monday (weeks 34, 36, 38, …). All employees in the statistics group will get an announcement by e-mail. Send an e-mail to Gunnar Taraldsen if you wish to give a talk or be added to the mailing list.
Overview
- August 17th: Cedric Travelletti, University of Bern, Switzerland
- August 31st: PDF Erlend Nilsen, Norwegian Institute for Nature Research (NINA) and Professor at Nord University
- September 14th: Open
- September 28th: PDF Torsten Hothorn, University of Zurich, Switzerland
- October 12th: PDF Vilja Koski, University of Jyväskylä, Finland
- October 26th: PDF Leonhard Held, University of Zurich
- November 9th: PDF Håkon Tjelmeland
- November 23rd: PDF Pierre Druilhet (Zoom only)
- December 7th: PDF Stein Andreas Bethuelsen (Zoom only)
- December 21st:
Details
Time/place: Monday, December 7th, 2020, 12.15-13.00, Zoom |
---|
Speaker: Stein Andreas Bethuelsen, Department of Mathematics, University of Bergen |
Title: Random walk in dynamic random environment, with applications in biology |
Abstract: In this talk I will give a gentle introduction to certain models of random walks on evolving networks. In the models I will focus on, the evolving network can be interpreted as a spatial population model that changes with time and the random walk traces the ancestral lineages of an individual in this population. The results I will present show that these random walks behave asymptotically like simple random walks. |
Time/place: Monday, November 23rd, 2020, 12.15-13.00, Zoom |
---|
Speaker: Pierre Druilhet, Laboratoire de Mathématiques, Université Clermont Auvergne |
Title: Optimal cross-over designs |
Abstract: Cross-over designs are used in many fields: drug trials, field experiments, sensory analysis, animal feeding experiments, etc. Depending on the application, several statistical models can be proposed to model both the correlations and the interference between treatments. In this talk, we propose a new method to construct optimal cross-over designs when the parameter of interest is the total effect, which corresponds to the effect of a treatment used alone. We compare our designs with existing ones. We also propose some methods to reduce the number of subjects needed to obtain an optimal design. |
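For readers new to cross-over designs, the sketch below builds a classical Williams design for an even number t of treatments. In such a design every treatment is immediately followed by every other treatment exactly once, so first-order carryover effects are balanced. This is a standard textbook construction, not the new method for total effects described in the talk; rows are treatment sequences (subjects), columns are periods, and the function names are our own.

```python
def williams_square(t):
    """Williams Latin square for an even number t of treatments (rows = sequences, columns = periods)."""
    if t % 2 != 0:
        raise ValueError("this construction assumes an even number of treatments")
    # First row 0, 1, t-1, 2, t-2, ...: successive differences mod t are all distinct
    first = [0]
    for k in range(1, t):
        first.append((k + 1) // 2 if k % 2 == 1 else t - k // 2)
    # Remaining rows are cyclic shifts of the first row
    return [[(x + i) % t for x in first] for i in range(t)]


def carryover_pairs(design):
    """Count how often each ordered pair (previous, current) occurs in consecutive periods."""
    counts = {}
    for row in design:
        for a, b in zip(row, row[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


for row in williams_square(4):
    print(row)
```

Every ordered pair of distinct treatments appears exactly once as a (previous, current) pair, which is what `carryover_pairs` lets you check.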
Time/place: Monday, November 9th, 2020, 12.15-13.00, S21, Sentralbygg 2, Gløshaugen (speaker) and Zoom |
---|
Speaker: Håkon Tjelmeland, NTNU |
Title: A new Bayesian ensemble Kalman filter strategy |
Abstract: We propose a new framework for updating a prior ensemble to a posterior ensemble, an essential yet challenging part of ensemble-based filtering methods. The proposed framework is based on a Bayesian and generalised view of the traditional ensemble Kalman filter (EnKF). In this presentation we start with an introduction to the state-space model and the Kalman filter, and thereafter give an introduction to the traditional EnKF and a previously proposed Bayesian version of the EnKF. The starting point for our new framework is an assumed Bayesian and Gaussian model for all the variables involved in updating one member of the prior ensemble. Based on this assumed model we identify a class of allowed updating rules. To identify an updating rule that is robust with respect to the Gaussian assumption, we formulate an optimality criterion and derive the optimal updating rule with respect to this criterion. We present simulation examples demonstrating that the proposed updating rule gives a filter that seems to provide a more realistic representation of the uncertainty than the traditional EnKF. |
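The traditional EnKF that the talk takes as its starting point can be sketched as the stochastic (perturbed-observation) analysis step below. It assumes a linear observation operator H and Gaussian observation error with covariance R. It is not the Bayesian updating rule proposed in the talk.

```python
import numpy as np


def enkf_update(ensemble, y, H, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    ensemble: (N, d) array of prior (forecast) members
    y:        (m,) observation vector
    H:        (m, d) linear observation operator
    R:        (m, m) observation-error covariance
    """
    N = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    P = anomalies.T @ anomalies / (N - 1)          # sample forecast covariance, (d, d)
    S = H @ P @ H.T + R                            # innovation covariance, (m, m)
    K = np.linalg.solve(S, H @ P).T                # Kalman gain P H^T S^{-1}, (d, m)
    # Each member assimilates its own perturbed copy of the observation
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    innovations = y_pert - ensemble @ H.T          # (N, m)
    return ensemble + innovations @ K.T
```

With a scalar N(0, 1) prior, one observation y = 1 and R = 1, the updated ensemble should have mean and variance close to the exact Kalman values 0.5 and 0.5 when the ensemble is large.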
Time/place: Monday, October 26, 2020, 12.15-13.00, S21, Sentralbygg 2, Gløshaugen and Zoom (speaker) |
---|
Speaker: Leonhard Held, University of Zurich |
Title: Replicability in drug development |
Abstract: Replicability plays a crucial role in drug regulation. Decisions by the US Food and Drug Administration or the European Medicines Agency are typically based on multiple primary studies testing the same medical product, where the two-trials rule is the standard requirement, despite its shortcomings. A new approach is proposed for this task based on the harmonic mean of the squared study-specific test statistics. Appropriate scaling ensures that, for any number of independent studies, the null distribution is a chi-squared distribution with 1 degree of freedom. This gives rise to a new method for combining one-sided p-values and calculating confidence intervals for the overall treatment effect. Further properties are discussed, and a comparison is made with the two-trials rule as well as with alternative research synthesis methods. An attractive feature of the new approach is that a claim of success requires each study to be convincing on its own to a certain degree, depending on the overall level of significance and the number of studies. The new approach is motivated by and applied to data from five clinical trials investigating the effect of carvedilol for the treatment of patients with moderate to severe heart failure. arXiv |
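The harmonic mean statistic can be written as X² = (Σ√w_i)² / Σ(w_i / z_i²). With equal weights this is n² / Σ z_i⁻². Under the null hypothesis with independent standard normal z_i it is χ²(1) for any number n of studies, because 1/z_i² follows a Lévy (stable, index 1/2) distribution. The sketch below converts one-sided p-values to z-values and returns the statistic together with a one-sided combined p-value. The weighted form is included for completeness. The rule that divides the chi-squared tail by 2ⁿ and returns p = 1 unless all effects point in the predicted direction is our own reading of the one-sided version and should be checked against the arXiv paper.

```python
import numpy as np
from scipy.stats import chi2, norm


def harmonic_mean_chisq(p_one_sided, weights=None):
    """Harmonic mean chi-squared statistic and one-sided combined p-value."""
    z = norm.isf(np.asarray(p_one_sided, dtype=float))    # z_i = Phi^{-1}(1 - p_i)
    w = np.ones_like(z) if weights is None else np.asarray(weights, dtype=float)
    # Scaling chosen so that X^2 ~ chi^2(1) under H0 for any number of independent studies
    x2 = np.sum(np.sqrt(w)) ** 2 / np.sum(w / z**2)
    n = len(z)
    if np.all(z > 0):
        # Under H0 the signs of z_i are independent of |z_i|, so additionally requiring
        # all n effects in the predicted direction divides the tail probability by 2^n
        p_comb = chi2.sf(x2, df=1) / 2**n
    else:
        p_comb = 1.0  # assumption: no claim of success unless all studies agree in direction
    return float(x2), float(p_comb)
```

For a single study the combined p-value equals that study's own one-sided p-value, which is a quick sanity check of the scaling.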
Time/place: Monday, October 12, 2020, 12.15-13.00, S21, Sentralbygg 2, Gløshaugen and Zoom (speaker) |
---|
Speaker: Vilja Koski, University of Jyväskylä, Finland |
Title: Optimising the profitability of lake monitoring data |
Abstract: Under the Water Framework Directive (WFD) of the European Union, a monitoring program has been implemented to improve and secure the quality of inland waters in the EU. Regular and long-term monitoring data on parameters representing the biotic structure of waters are used to classify lakes into five ecological status classes: high, good, moderate, poor and bad. If a water system is in moderate status or worse, management actions must be implemented to improve its status. The presentation consists of two parts, based on two questions about the uncertainty of collecting lake monitoring data in Finland. The first part is based on the paper by Koski et al. (2020). The aim is to assess concretely how much it is worth paying for monitoring information. We propose the value of information (VOI) for assessing profitability, taking into account the costs and benefits of monitoring and management actions as well as the associated uncertainty. Results for our case study on 144 Finnish lakes suggest that, in general, the value of monitoring exceeds its cost. The second part of the presentation focuses on the optimal design of lake monitoring. The question is how to select a subset of the entire population so as to estimate the parameters of a logistic regression model as accurately and precisely as possible under a limited budget. The presentation is partly based on: Koski, V., Kotamäki, N., Hämäläinen, H., Meissner, K., Karvanen, J., & Kärkkäinen, S. (2020). The value of perfect and imperfect information in lake monitoring and management. Science of the Total Environment, 726. https://doi.org/10.1016/j.scitotenv.2020.138396 |
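The value of perfect information has a simple form for a finite decision problem. It is the expected utility of deciding after the true state is revealed, minus the expected utility of the best decision under current uncertainty. The sketch below uses made-up utilities for two lake states and two actions. It covers only perfect information, whereas the paper also treats imperfect monitoring data.

```python
import numpy as np


def evpi(utility, prior):
    """Expected value of perfect information for a finite decision problem.

    utility[a, s]: utility of action a when the true state is s
    prior[s]:      probability of state s before monitoring
    """
    utility = np.asarray(utility, dtype=float)
    prior = np.asarray(prior, dtype=float)
    best_without_info = np.max(utility @ prior)            # commit to one action now
    best_with_info = np.sum(prior * utility.max(axis=0))   # best action for each revealed state
    return float(best_with_info - best_without_info)


# Hypothetical example: states (good, poor), actions (do nothing, restore the lake)
utility = [[0.0, -100.0],   # do nothing: no cost if good, large loss if poor
           [-30.0, -30.0]]  # restore: fixed management cost in both states
print(evpi(utility, [0.7, 0.3]))  # perfect monitoring pays off if it costs less than this
```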
Time/place: Monday, September 28, 2020, 12.15-13.00, S21, Sentralbygg 2, Gløshaugen and Zoom (speaker) |
---|
Speaker: Torsten Hothorn, University of Zurich, Switzerland |
Title: Transformation Forests |
Abstract: Regression models for supervised learning problems with a continuous response are commonly understood as models for the conditional mean of the response given predictors. This notion is simple and therefore appealing for interpretation and visualisation. Information about the whole underlying conditional distribution is, however, not available from these models. A more general understanding of regression models as models for conditional distributions allows much broader inference from such models, for example the computation of prediction intervals. Several random forest-type algorithms aim at estimating conditional distributions, most prominently quantile regression forests (Meinshausen, 2006, JMLR). We propose a novel approach based on a parametric family of distributions characterised by their transformation function. A dedicated novel “transformation tree” algorithm able to detect distributional changes is developed. Based on these transformation trees, we introduce “transformation forests” as an adaptive local likelihood estimator of conditional distribution functions. The resulting predictive distributions are fully parametric yet very general and allow inference procedures, such as likelihood-based variable importances, to be applied in a straightforward way. The procedure allows general transformation models to be estimated without the necessity of a priori specifying the dependency structure of parameters. Applications include the computation of probabilistic forecasts, modelling differential treatment effects, or the derivation of counterfactual distributions for all types of response variables. Technical report available from arXiv |
Time/place: Monday, August 31, 2020, 12.15-13.00, S21, Sentralbygg 2, Gløshaugen and Zoom |
---|
Speaker: Erlend Nilsen, Norwegian Institute for Nature Research (NINA) and Professor at Nord University |
Title: Living Norway Ecological Data Network |
Abstract: The accelerating degradation of our planet's ecosystems and the associated biological diversity is among the main present-day societal challenges, and cutting-edge ecological research is increasingly needed to describe, understand and mitigate these challenges. There is currently a severe mismatch between data availability and research needs, and a general agreement within the environmental research sector that improved data management following FAIR principles would greatly benefit scientific progress. Living Norway Ecological Data Network is a direct answer to this challenge, and will be in high demand by the research community. To this end, Living Norway Ecological Data Network will: • Serve as the main data infrastructure for ecological data, including software to prepare, map, publish and archive data through established e-infrastructures, retrieval of data relevant for state-of-the-art ecological research, and help desk services supporting the community. • Serve as a hub facilitating the necessary cultural transformation and increasing the human know-how with respect to data sharing and FAIR data management in the ecology community. • Contribute to continued development and implementation of open standards for ecological data, making them more widely applicable and used in ecological research. • Work closely together with the Norwegian GBIF node, and serve as an extension for mobilizing new data types that are needed for state-of-the-art ecological research. The consortium consists of eight institutions that together represent the breadth of Norwegian ecological research. In this talk, I will present the current state and future visions for Living Norway. |
Time/place: Monday, August 17, 2020, 14.15-15.00, 13th floor, Sentralbygg 2, Gløshaugen |
---|
Speaker: Cedric Travelletti, University of Bern |
Title: Methods for out-of-memory Bayesian Inversion with a View towards Optimal Design of Experiments |
Abstract: When using Gaussian process priors for Bayesian inversion, one has to deal with matrices whose size grows as the square of the size of the discretization grid. Hence, memory limits are quickly reached as the discretization gets finer. We present techniques for Bayesian inversion that leverage implicit representations of the covariance matrix to compute quantities of interest in an almost matrix-free way. We then show how these techniques may be applied to design optimal data-collection plans for large linear inverse problems and demonstrate them on a gravimetric inversion problem on the Stromboli volcano. |
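To illustrate the memory issue, the sketch below computes the posterior mean and pointwise variance for a linear Gaussian inverse problem y = G f + ε. It only ever forms row blocks of the prior covariance matrix K. It assumes a squared-exponential kernel, a dense forward operator G with few observations, and independent Gaussian noise. The chunking scheme is a generic technique and not necessarily the one presented in the talk.

```python
import numpy as np


def sq_exp_cov(a, b, variance=1.0, lengthscale=0.2):
    """Squared-exponential covariance between point sets a (n, d) and b (k, d)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


def cov_times_matrix(points, M, chunk=256, **kern):
    """Compute K @ M without ever storing the full (n, n) covariance K."""
    out = np.empty((points.shape[0], M.shape[1]))
    for start in range(0, points.shape[0], chunk):
        rows = slice(start, start + chunk)
        out[rows] = sq_exp_cov(points[rows], points, **kern) @ M   # one (chunk, n) block at a time
    return out


def posterior_mean_and_var(points, G, y, noise_var, chunk=256, **kern):
    """Posterior mean and pointwise variance for y = G f + eps, f ~ GP(0, k), eps ~ N(0, noise_var I)."""
    KGt = cov_times_matrix(points, G.T, chunk, **kern)     # (n, m): only m columns are stored
    S = G @ KGt + noise_var * np.eye(G.shape[0])           # (m, m) data covariance
    mean = KGt @ np.linalg.solve(S, y)
    prior_var = np.full(points.shape[0], kern.get("variance", 1.0))
    # diag(K - K G^T S^{-1} G K), computed row by row from the stored (n, m) factor
    var = prior_var - np.sum(KGt * np.linalg.solve(S, KGt.T).T, axis=1)
    return mean, var
```

Peak memory is governed by the chunk size times the number of grid points, rather than by the square of the number of grid points.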
Spring 2020 with COVID-19
Tentatively on Zoom every second Monday (weeks 22, 24, 26, …). All employees in the statistics group will get a Zoom link by e-mail. Send an e-mail to Gunnar Taraldsen if you wish to give a talk or be added to the mailing list.
- June 22nd: 15:15-16. Jan Hannig, University of North Carolina at Chapel Hill slides
- June 8th: Kamiar Rahnama Rad, Paul H. Chook Department of Information Systems and Statistics slides
- May 25th: No seminar.
Title: Generalized Fiducial Inference |
Abstract: R. A. Fisher, the father of modern statistics, developed the idea of fiducial inference during the first half of the 20th century. While his proposal led to interesting methods for quantifying uncertainty, other prominent statisticians of the time did not accept Fisher's approach, as it became apparent that some of Fisher's bold claims about the properties of the fiducial distribution did not hold up for multi-parameter problems. Beginning around the year 2000, the presenter and collaborators started to re-investigate the idea of fiducial inference and discovered that Fisher's approach, when properly generalized, would open doors to solving many important and difficult inference problems. They termed their generalization of Fisher's idea generalized fiducial inference (GFI). The main idea of GFI is to carefully transfer randomness from the data to the parameter space using an inverse of a data-generating equation, without the use of Bayes' theorem. The resulting generalized fiducial distribution (GFD) can then be used for inference. After more than a decade of investigations, the authors and collaborators have developed a unifying theory for GFI and provided GFI solutions to many challenging practical problems in different fields of science and industry. Overall, they have demonstrated that GFI is a valid, useful, and promising approach for conducting statistical inference. References: see https://hannig.cloudapps.unc.edu/research.html |
Time/place: Monday, June 8th, 2020, 15:15-16, Zoom |
---|
Speaker: Kamiar Rahnama Rad, https://zicklin.baruch.cuny.edu/faculty-profile/kamiar-rahnama-rad/ |
Title: Scalable estimation of the out-of-sample prediction error via approximate leave-one-out in the high-dimensional regime. |
Abstract: In this talk, I propose a scalable closed-form formula (ALO) to estimate the out-of-sample prediction error of regularized estimators. Our approach employs existing heuristic arguments to approximate the leave-one-out perturbations. We theoretically prove the accuracy of ALO in the high-dimensional setting where the number of predictors is proportional to the number of observations. We show how this approach can be applied to popular non-differentiable regularizers, such as LASSO. Our theoretical findings are illustrated using simulations and real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rodent. References: Rahnama Rad, K. and Maleki, A. A scalable estimate of the extra-sample prediction error via approximate leave-one-out, Journal of the Royal Statistical Society: B (in press) Rahnama Rad, K. and Zhou, W. and Maleki, A. Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions, AISTATS 2020. |
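The leave-one-out idea behind ALO is easiest to see in the one case where a closed form is exact. For ridge regression, the leave-one-out residual equals the full-data residual divided by 1 − H_ii, where H is the hat matrix. The sketch below checks that shortcut against brute-force refitting. ALO extends this kind of correction to other losses and to non-differentiable penalties such as the LASSO, which is not shown here. The function names are ours.

```python
import numpy as np


def ridge_loo_errors(X, y, lam):
    """Leave-one-out residuals for ridge regression without refitting n times.

    For squared loss with an L2 penalty, y_i - yhat_{-i} = (y_i - yhat_i) / (1 - H_ii)
    holds exactly, with hat matrix H = X (X^T X + lam I)^{-1} X^T.
    """
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return resid / (1.0 - np.diag(H))


def ridge_loo_bruteforce(X, y, lam):
    """Reference implementation: refit the model n times."""
    errs = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        Xk, yk = X[keep], y[keep]
        beta = np.linalg.solve(Xk.T @ Xk + lam * np.eye(X.shape[1]), Xk.T @ yk)
        errs[i] = y[i] - X[i] @ beta
    return errs
```

The mean of the squared leave-one-out residuals then estimates the out-of-sample prediction error, including in the regime p > n illustrated in the tests.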
Spring 2020
In S4, every second Monday at 14.15-15.00
- January 13: Ørnulf Borgan, Univ of Oslo
- January 20: John Paige, University of Washington
- February 24: Jarle Tufto, NTNU
- March 2: Maria Selle, NTNU
- March 9: Martin Jullum, NR
- March 9: Members' meeting of the Norwegian Statistical Association (NSF), Trondheim branch, with a talk by Thore Egeland, NMBU
- March 10: Kjell Doksum, Wisconsin/Berkeley 'CANCELED'
- March 23: Geir-Arne Fuglstad, NTNU, 'POSTPONED'
- March 30: Håkon Tjelmeland, NTNU, 'POSTPONED'
- April 2: Jarrod Hadfield, Univ. of Edinburgh. Hans J. Skaug, UiB 'CANCELED'
- April 20: Jo Eidsvik, NTNU, 'POSTPONED'
- April 27: John Tyssedal, NTNU, 'POSTPONED'
Time/place: Monday, March 23, 2020, 14:15-15, Aud S4, Gløshaugen |
---|
Speaker: Geir-Arne Fuglstad, https://www.ntnu.edu/employees/geir-arne.fuglstad |
Title: Compression of Climate Simulations with a Nonstationary Global Spatio-Temporal SPDE Model |
Abstract: Modern climate models pose an ever-increasing storage burden to computational facilities, and the upcoming generation of global simulations from the next Intergovernmental Panel on Climate Change will require a substantial share of the budget of research centers worldwide to be allocated just for this task. A statistical model can be used as a means to mitigate the storage burden by providing a stochastic approximation of the climate simulations. If a suitably validated statistical model can be formulated to draw realizations whose spatio-temporal structure is similar to that of the original computer simulations, then the estimated parameters are effectively all the information that needs to be stored. We propose a new spatio-temporal statistical model defined via a stochastic partial differential equation (SPDE) on the sphere. The model is able to capture nonstationarities across latitudes, longitudes and land/ocean domains for more than 300 million data points. The model also overcomes some fundamental limitations of current global statistical models available for compression such as the need for gridded data and the inability to handle the poles. Once the model is trained, surrogate runs can be instantaneously generated on a laptop by storing just 20 Megabytes of parameters as opposed to more than 6 Gigabytes of the original ensemble. |
Time/place: Tuesday, March 10, 2020, 14:15-15, Lunchroom, Sentralbygg 2, Gløshaugen |
---|
Speaker: Kjell Doksum, Wisconsin/Berkeley, http://pages.stat.wisc.edu/~doksum/ |
Title: Regression Quantile Differences and High Dimensional Data |
Abstract: Statistical methods that give detailed comparisons of responses from two populations are given for studies that include confounding covariates. Let X and Y denote responses from the two populations that have been adjusted for the confounding covariates by subtracting the linear regression of the centered responses on the standardized covariates. The two populations are compared in terms of differences between the X and Y quantiles at different quantile levels a in (0,1). This difference can be represented by a shift function D(x) with the property that X+D(X) has the same distribution as Y. That is, if X is an adjusted control response and Y is an adjusted treatment response, then the model allows the treatment effect to differ across levels of X. For instance, a medication for high blood pressure may have different effects for people with different levels of blood pressure. The usual linear model for this type of regression experiment assumes that D(x) is a constant equal to the difference of the Y and X means. The statistical methods developed are based on simple simultaneous confidence bands for the shift function D(x), computed from independent multivariate samples from the two populations. The shift function D(x) is nonparametric, while the within-population models are linear, making this a partially linear model. The methods are shown to be applicable to high-dimensional data where the number of variables p is larger than the X and Y sample sizes m and n. This is joint work with Summer Yang at New York University. |
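A plug-in estimate of the shift function, D(x) = G⁻¹(F(x)) − x with F and G the distribution functions of X and Y, can be computed from the two empirical distributions. The sketch assumes the responses have already been adjusted for the covariates. It leaves out the simultaneous confidence bands, which are the main contribution of the talk. The function name is our own.

```python
import numpy as np


def empirical_shift_function(x_sample, y_sample, x_eval):
    """Estimate D(x) = G^{-1}(F(x)) - x from two samples.

    F is the empirical CDF of X and G^{-1} the left-continuous empirical quantile
    function of Y, so that X + D(X) approximately has the distribution of Y.
    """
    xs = np.sort(np.asarray(x_sample, dtype=float))
    ys = np.sort(np.asarray(y_sample, dtype=float))
    m, n = len(xs), len(ys)
    x_eval = np.asarray(x_eval, dtype=float)
    k = np.searchsorted(xs, x_eval, side="right")   # k = m * F_m(x) = #{X_i <= x}
    # Smallest j with j / n >= k / m, i.e. j = ceil(k n / m), in exact integer arithmetic
    j = -(-(k * n) // m)
    j = np.clip(j, 1, n)                            # F(x) = 0 is mapped to the smallest Y
    return ys[j - 1] - x_eval
```

A constant estimate corresponds to the usual location-shift model; a non-constant one indicates that the treatment effect varies with the level of X.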
Time/place: Monday, March 9, 2020, 17-18, Lunchroom, Sentralbygg 2, Gløshaugen |
---|
Speaker: Thore Egeland, NMBU, https://www.nmbu.no/ans/thore.egeland |
Title: Statistical methods in forensic genetics exemplified by ‘The missing grandchildren of Argentina’ |
of forensic genetics before the main case is presented: The missing grandchildren of Argentina. This is a well-known collection of missing person cases. From 1976 to 1983, Argentina suffered a civic-military dictatorship. It is estimated that 30,000 people were kidnapped, sent to clandestine centers, tortured and murdered. Many women were pregnant at the time of abduction. Their children were handed over to families belonging to or connected with the military forces, and their identities were forged. In most cases their biological parents were murdered and their bodies are still missing. The objective is to decide whether a person of interest (POI), potentially a child born in captivity, is identical to the missing person in a family, based on the DNA profiles of the POI and available family members. We evaluate the statistical power of families from the DNA reference databank (Banco Nacional de Datos Genéticos). As a result, we show that several of the families have poor statistical power and require additional genetic data to enable a positive identification. |
Time/place: Monday, March 9, 2020, 14.15-15.00, S4, Sentralbygg 2, Gløshaugen |
---|
Speaker: Martin Jullum, NR, https://www.nr.no/?q=publicationprofile&query=jullum |
Title: How to open the black box – individual prediction explanation |
Abstract: Why was your loan application rejected? Why is the price of your car insurance higher than that of your neighbor? More and more such decisions are made by complex statistical/machine learning models based on relevant data. Such (regression) models are often referred to as "black boxes" because it is difficult to understand how they work and produce their predictions. As these methods become increasingly important for individuals in our society, there is a clear need for methods that can help us understand their predictions, that is, "open the black box". In this talk, I will motivate why this is useful and important. I will further discuss how Shapley values from game theory can be used as an explanation framework. To explain the predictions correctly, it is crucial to model the dependence between the covariates. I will exemplify this by showing that even a simple linear regression model is difficult to explain when the covariates are highly dependent. Finally, I will lay out recent work and methodology for modeling such dependence and show how it leads to more accurate explanations within the Shapley value framework. |
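The sketch below computes exact Shapley values by brute-force enumeration of coalitions. The value function v(S) fills in features outside S with their background mean, i.e. it treats features as independent. For a linear model this reproduces the well-known result φ_j = β_j (x_j − E[X_j]). This is precisely the simplification the talk argues against when covariates are dependent. The dependence-aware methods discussed there are not shown, and all names here are our own.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(value, p):
    """Exact Shapley values by enumerating all coalitions (only feasible for small p)."""
    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                phi[j] += weight * (value(set(S) | {j}) - value(set(S)))
    return phi


def linear_value_independent(beta0, beta, x, background):
    """v(S) = E[f(x_S, X_rest)] for linear f, with X_rest drawn from its marginal distribution.

    Features outside S are replaced by their background mean; this ignores any
    dependence between the features in S and the remaining features.
    """
    mean = np.asarray(background, dtype=float).mean(axis=0)

    def value(S):
        z = mean.copy()
        idx = sorted(S)
        z[idx] = x[idx]
        return float(beta0 + beta @ z)

    return value
```

With strongly correlated covariates, one would instead need conditional expectations E[X_rest | X_S = x_S] in the value function.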
Time/place: Monday, March 2, 2020, 14.15-15.00, S4, Sentralbygg 2, Gløshaugen |
---|
Speaker: Maria Selle https://www.ntnu.edu/employees/maria.selle |
Title: Modelling environmental variation in animal breeding – an application for dairy cattle smallholders |
Abstract: In animal breeding the goal is to select animals for breeding that will improve the average genetic value in a population for some trait of interest. In advanced economies with commercial farms, animal breeding has significantly improved dairy production over the last century. Dairy production systems in many low- to middle-income countries have not seen the same improvement, mostly because farmers are smallholders, so herds are small and the breeding designs lead to confounding between genetic and environmental effects. The parameter of interest in statistical models for animal breeding is the genetic effect of the animals. A random herd-specific effect is usually included in the statistical model to enhance the separation of the genetic and environmental effects of an animal. Our working hypothesis is that spatial modelling can improve estimates of genetic effects for smallholder dairy breeding programs in low- to middle-income countries. We propose modelling the environmental variation between different herds using a Gaussian random field with a Matérn covariance function in a Bayesian hierarchical framework. To test our working hypothesis, we perform a simulation study, with data simulated from a model imitating the underlying genetic and environmental processes. The results show that a spatial model can improve estimates of genetic effects, and that the importance of a spatial model depends on the breeding design. Results from a case study on Brown Swiss cattle are also presented. |
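The Matérn covariance used for the herd effects can be written down directly. The sketch below uses the common parameterization in terms of √(2ν)h/ρ and then draws spatially correlated herd effects through a Cholesky factorization. This parameterization is our own choice (INLA/SPDE-based software may use a different range convention), and the code is an illustration, not the model from the talk.

```python
import numpy as np
from scipy.special import gamma, kv


def matern_cov(h, sigma2=1.0, rho=1.0, nu=1.5):
    """Matérn covariance sigma2 * 2^(1-nu)/Gamma(nu) * s^nu * K_nu(s), with s = sqrt(2 nu) h / rho."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    scaled = np.sqrt(2.0 * nu) * h / rho
    out = np.full(h.shape, float(sigma2))          # the covariance tends to sigma2 as h -> 0
    pos = scaled > 0
    out[pos] = (sigma2 * 2.0 ** (1.0 - nu) / gamma(nu)
                * scaled[pos] ** nu * kv(nu, scaled[pos]))
    return out


def simulate_herd_effects(coords, rng, **cov_args):
    """Draw one realisation of a zero-mean Gaussian random field at herd locations."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = matern_cov(d, **cov_args) + 1e-8 * np.eye(len(coords))   # small jitter for the Cholesky factor
    return np.linalg.cholesky(C) @ rng.standard_normal(len(coords))
```

For ν = 1/2 this reduces to the exponential covariance σ² exp(−h/ρ), and larger ν gives smoother fields.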
Time/place: Monday, February 24, 2020, 14.15-15.00, S4, Sentralbygg 2, Gløshaugen |
---|
Speaker: Jarle Tufto https://www.ntnu.edu/employees/jarle.tufto |
Title: A quantitative genetic model of the joint evolution of onset of breeding and double brooding liability |
Abstract: In some bird species, females produce a second brood after raising a successful first brood. The proportion of females doing so varies strongly among study populations and years. To understand the adaptive significance of double brooding, we consider the joint evolution of double brooding and onset of breeding in a model with resources limited to a finite window in time. Double versus single brooding is modeled as a threshold character. Onset of breeding and the liability of double brooding follow a binormal phenotypic distribution. Depending on the cost of laying two broods versus one, and on the delay between the first and second brood relative to the width of the resource window and the phenotypic variance of onset of breeding, the adaptive topography may have single or multiple, purely single- or purely double-brooding adaptive peaks. Despite no frequency-dependence, an adaptive peak at an intermediate frequency of double brooding can exist if double brooding has a sufficiently negative phenotypic correlation with onset of breeding. If the location of the resource window in time fluctuates between years, double brooding has an additional adaptive value as a conservative bet-hedging strategy. Climate change, producing a linear trend in the location of the resource window towards earlier dates, may select for a reduced frequency of double brooding. An opposite effect is also possible if the additive genetic covariance between the liability and onset of breeding is negative. Finally, the model is discussed in terms of an empirical example. |
Time/place: Monday, January 20, 2020, 14.15-15.00, S4, Sentralbygg 2, Gløshaugen |
---|
Speaker: John Paige https://paigejo.wordpress.com |
Title: Bayesian Modelling of Multi-Scale Spatial Dependence in Non-Gaussian Data |
Abstract: We propose an extension to the popular ‘LatticeKrig’ model in order to handle non-Gaussian data. We implement the model using integrated nested Laplace approximations (INLA), a Bayesian framework for representing latent Gaussian models. We use a reparameterization of the LatticeKrig model to make the parameters and their priors more interpretable. Predictions match well with those of the original LatticeKrig model as well as with a stochastic partial differential equation (SPDE) model in a Gaussian context. Additionally, the proposed model is applied to secondary education prevalence for young women in Kenya, and its predictions are compared with those of the SPDE model. We show that the ability of the LatticeKrig framework to model covariance functions flexibly, accounting for both short- and long-range spatial correlations, can improve predictive performance. Additionally, we show that choosing LatticeKrig basis function resolutions individually based on the context can improve predictions compared to the usual LatticeKrig implementation. |
Time/place: Monday, January 13, 2020, 14.15-15.00, S4, Sentralbygg 2, Gløshaugen |
---|
Speaker: Ørnulf Borgan https://www.mn.uio.no/math/english/people/aca/borgan/index.html |
Title: Survival prediction with neural networks |
Abstract: In survival analysis one may want to estimate the survival function of an individual with given covariate values. Such survival prediction is commonly done using Cox’s proportional hazards regression model. An alternative, which gives similar results, is to use a proportional hazards model with piecewise constant baseline hazard. In the talk I will consider flexible extensions of these models which are obtained by replacing the linear predictors by neural nets, and I will demonstrate that this may give improved survival predictions in a number of situations. The talk is based on joint work with Håvard Kvamme and Ida Scheel. |
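In the piecewise-constant hazard model, a survival prediction follows directly from the hazard rates: S(t) = exp(−Σ_k λ_k × time spent in interval k before t). In the neural-network extensions, the rates λ_k would be outputs of a network given the covariates. In the sketch below they are simply passed in, and the interval grid is a hypothetical choice.

```python
import numpy as np


def pc_hazard_survival(t, cuts, hazards):
    """S(t) = exp(-cumulative hazard) for a piecewise-constant hazard.

    cuts:    increasing interval boundaries [tau_0 = 0, tau_1, ..., tau_K]
    hazards: hazard rate in each interval [tau_{k-1}, tau_k), length K; the last
             rate is assumed to continue beyond tau_K
    """
    cuts = np.asarray(cuts, dtype=float)
    hazards = np.asarray(hazards, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    starts = cuts[:-1]
    ends = np.append(cuts[1:-1], np.inf)           # last interval is open-ended
    # Time spent in each interval before t, shape (len(t), K)
    exposure = np.clip(t[:, None], starts, ends) - starts
    return np.exp(-exposure @ hazards)
```

With a single constant rate λ, the function reduces to the exponential survival curve exp(−λt).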
Fall 2019
Overview
- August 15: Colin Fox, University of Otago, New Zealand.
- September 12: Giovanna Jona Lasinio, University of Rome - Sapienza, Italy.
- October 28: Jacopo Paglia, IMT, NTNU.
- November 4: Ingeborg Hem, IMF, NTNU.
- November 18: Adil Rasheed, ITK, NTNU.
- December 2: Oscar Pizarro, University of Sydney, Australia.
- December 16: Mette Langaas, IMF, NTNU.
Time/place: Monday, December 16, 2019, 14.15-15.00, 13th floor, Sentralbygg 2, Gløshaugen |
---|
Speaker: Mette Langaas https://www.ntnu.no/ansatte/mette.langaas |
Title: The future of teaching statistics |
Abstract: Today, statistics is everywhere in society. It appears in our media, in jobs, and in advertising, and it affects statisticians and non-statisticians alike. At university, we have the same story. Students in a variety of different fields are required to take statistics. Some of these students may go on to become statisticians, while others will only meet statistics in the news or at work. Common jobs are also becoming “statistically charged”. More than ever, skills in scientific computation and data analysis are highly sought after. Within these jobs, statisticians and the like are required to “think on their feet, solve problems and analyze data”, often with state-of-the-art statistical methods. The requirement to be “lifelong learners” is increasing. But education and learning are also changing. For example, “the death of the lecture” has been proclaimed for many years now. This has resulted in campuses and lecture halls being turned into “active learning areas”. At the same time, top-ranked universities abroad advertise massive open online (and paid-for) courses available to “everyone”. Due to the digitalization of society, the need for continued education, and a diverse student population, (statistics) teaching at the university level is facing new challenges. In this presentation I will elaborate on some of these challenges and talk about my experience with developing (open) learning materials and implementing and running active learning sessions at the Norwegian University of Science and Technology. Hopefully, the talk will inspire active discussions in the audience. Participation and feedback from (lifelong learners) who are users of statistics are highly welcome!
Before you come to the talk: please spend a few minutes (5-10) filling in this questionnaire (44 a/b questions) and take a screenshot of your scores (the four scores are shown only to you, on your screen). This will hopefully give you some insight into your own learning style and make the observations in my presentation easier to relate to. https://www.webtools.ncsu.edu/learningstyles |
Time/place: Monday, December 2, 2019, 14.15-15.00, 13th floor, Sentralbygg 2, Gløshaugen |
---|
Speaker: Oscar Pizarro https://sydney.edu.au/engineering/about/our-people/academic-staff/oscar-pizarro.html |
Title: Autonomous Benthic Monitoring in Australia |
Abstract: The Australian Centre for Field Robotics has operated Australia’s Integrated Marine Observing System (IMOS) Autonomous Underwater Vehicle (AUV) facility for over nine years. This facility supports the IMOS benthic monitoring program, which has identified sites on temperate and tropical reefs around Australia for optical imaging once a year or every other year. This observing program capitalizes on the unique capabilities of AUVs that have allowed repeated visits to the reference sites, while providing an observational link between oceanographic and benthic processes. Since 2010, benthic reference sites have been revisited in Western Australia, Tasmania, Queensland and New South Wales in collaboration with research groups from universities and federal and state agencies. We briefly cover the relevant capabilities of the AUV facility, the design of the IMOS benthic sampling program, results from the surveys around Australia, and key findings from an end-user based review of operations. We also report on some of the challenges and potential benefits to be realized from a benthic observation system that collects several TB of geo-referenced stereo imagery a year. These include semi-automated image analysis and classification, visualization and data mining, and change detection and characterisation. We also discuss the design of an enhanced monitoring system that lowers ship time requirements while increasing reliability, as well as research to improve underwater imaging and simplified survey methods for wider use of these tools. |
Time/place: Monday, November 18, 2019, 14.15-15.00, 13th floor, Sentralbygg 2, Gløshaugen |
---|
Speaker: Adil Rasheed https://www.ntnu.no/ansatte/adil.rasheed Title: HAM for BigCyb |
Abstract: In this presentation we introduce Hybrid Analysis and Modeling (HAM) as an enabler for Big Data Cybernetics. The HAM approach combines the interpretability, robust foundation and understanding of physics-based models with the accuracy, efficiency and automatic pattern-identification capabilities of advanced machine learning (ML) and artificial intelligence (AI) algorithms, for real-time steering of any physical asset towards a set point using big data. At a time when black-box ML and AI algorithms are struggling to gain large-scale acceptance in safety-critical engineering applications, we argue that HAM will be an attractive alternative. |
Time/place: Monday, November 4, 2019, 14.15-15.00, 13th floor, Sentralbygg 2, Gløshaugen |
---|
Speaker: Ingeborg Hem https://www.ntnu.no/ansatte/ingeborg.hem Title: Intuitive joint priors for variance parameters |
Abstract: Bayesian hierarchical models with additive latent structures are popular since additivity simplifies interpretation and inference. However, the common choice of independent priors on the variances of the model components results in haphazard a priori control of the total variance and of how it is attributed to the model components. We propose a joint prior for the variance parameters that explicitly controls the total variance and how it is distributed to the model components. For latent Gaussian models, we can use the penalized complexity prior framework to achieve the desired shrinkage between the random effects in the model, or we can choose to be ignorant through Dirichlet priors. We call the resulting priors hierarchical decomposition (HD) priors. The HD priors have intuitive hyperparameters and are weakly informative. In the field of plant breeding, geneticists have strong expert knowledge on the relative sizes of the different genetic effects: additive, dominance and epistasis. We demonstrate that this knowledge can be intuitively incorporated in the model using the new HD priors, and we compare the performance of the resulting Bayesian method to using independent priors and maximum-likelihood inference, with respect to selecting the best parents for further breeding and to estimating heritability. We investigate the properties of the HD prior on a wheat breeding program simulated with the R package AlphaSimR, with inference in RStan. |
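The central idea of a joint prior that first controls the total variance and then splits it among the model components can be sketched generatively. In the Python sketch below, the Exponential prior on the total standard deviation and the Dirichlet concentration values are illustrative assumptions only; the HD construction from the talk (including its penalized complexity variant) is not reproduced here.

```python
import random
from typing import List


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Dirichlet(alpha) draw via normalized independent Gamma(alpha_j, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def sample_hd_style_variances(alpha: List[float], sd_rate: float,
                              rng: random.Random) -> List[float]:
    """One prior draw of component variances: the total first, then its split.

    Illustrative assumptions: total standard deviation ~ Exponential(sd_rate),
    split ~ Dirichlet(alpha). By construction the variances sum to the total.
    """
    total_var = rng.expovariate(sd_rate) ** 2
    return [total_var * w for w in sample_dirichlet(alpha, rng)]


# Hypothetical expert belief: additive > dominance > epistasis.
rng = random.Random(1)
alpha = [6.0, 3.0, 1.0]  # prior mean shares 0.6, 0.3, 0.1
print(sample_hd_style_variances(alpha, sd_rate=1.0, rng=rng))
```

Encoding expert knowledge then amounts to choosing the Dirichlet concentrations: their ratios set the expected shares, and their overall size sets how strongly those shares are held.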
Time/place: Monday, October 28, 2019, 14.15-15.00, 13th floor, Sentralbygg 2, Gløshaugen |
---|
Speaker: Jacopo Paglia https://www.ntnu.no/ansatte/jacopo.paglia Title: Efficient spatial designs using the Hausdorff distance and Bayesian optimization |
Abstract: We describe spatial experimental design methods for efficient decision support. We use a Bayesian optimization technique to find spatial configurations of data locations that are expected to carry much information. Within this setting, Gaussian process approximations enable efficient calculation of expected improvement for a large number of designs. The Hausdorff distance is used to model the similarity between design configurations, and hence to learn across different designs. Bayesian optimization is done iteratively, using only a few evaluations of the full-scale design criterion. The applications are related to natural resources, and it makes sense to use the decision-theoretic notion of value of information as a design criterion. We study properties of the design algorithm in a synthetic example and in two real-world examples from forestry conservation and petroleum drilling operations. |
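For finite sets of data locations, the Hausdorff distance is the largest distance from a point in one design to its nearest point in the other design, taken in both directions. The Python sketch below computes it and shows one hypothetical way to turn it into a similarity between designs. The exponential kernel and its length scale are assumptions, and such a kernel is not guaranteed to be positive definite for arbitrary sets of designs, so treat it as illustrative rather than as the construction used in the talk.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: List[Point], b: List[Point]) -> float:
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: List[Point], b: List[Point]) -> float:
    """Symmetric Hausdorff distance between two finite sets of locations."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def design_kernel(a: List[Point], b: List[Point], length_scale: float = 1.0) -> float:
    """Hypothetical similarity between designs: exp(-Hausdorff / length_scale)."""
    return math.exp(-hausdorff(a, b) / length_scale)


# Two hypothetical designs of sampling locations on a map grid.
design_1 = [(0.0, 0.0), (2.0, 1.0), (4.0, 3.0)]
design_2 = [(0.0, 1.0), (2.0, 1.0), (5.0, 3.0)]
print(hausdorff(design_1, design_2), design_kernel(design_1, design_2, length_scale=2.0))
```

Designs that differ by moving a single location a short distance get a small Hausdorff distance, which is what lets a surrogate model share information between similar configurations.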
Time/place: Thursday, September 12, 2019, 14.15-15.00, 734, sentralbygg 2, Gløshaugen |
---|
Speaker: Giovanna Jona Lasinio https://www.dss.uniroma1.it/it/dipartimento/persone/jona-lasinio-giovanna Title: Modeling and analyzing Multivariate Climate data. An Italian case study. |
Abstract: This talk summarizes the work we are carrying out on a large database of climate-related recordings, in particular monthly recordings of precipitation, minimum temperature and maximum temperature at 360 stations from January 1950 to December 2010, for a total of 259200 trivariate observations. Our aim has been to model these series jointly to obtain several results (The Annals of Applied Statistics, Volume 13, Number 2 (June 2019), 797-823). We introduce a Bayesian multivariate hierarchical framework to estimate a space-time model for the joint series of monthly recordings. Model components account for spatio-temporal correlation and annual cycles, and for dependence on covariates and between responses. Spatio-temporal dependence is modeled by the nearest neighbor Gaussian process (GP), response multivariate dependencies are represented by the linear model of coregionalization, and effects of annual cycles are included by a circular representation of time. The proposed approach allows imputation of missing values and interpolation of climate surfaces at the national level. It also provides a characterization of the so-called Italian ecoregions, namely broad and discrete ecologically homogeneous areas of similar potential as regards climate, physiography, hydrography, vegetation and wildlife. We are currently building models on the same data to detect change points; again, we are interested in joint changes as well as in changes in specific series. We are still working in a Bayesian framework. |
Time/place: Thursday, August 15, 2019, 14.15-15.00, 734, sentralbygg 2, Gløshaugen |
---|
Speaker: Colin Fox https://www.otago.ac.nz/physics/staff/ColinFox.html Title: Bayes-Optimal Filtering in the Tensor Train Format |
Abstract: Optimal sequential Bayesian inference, or filtering, for the state of a dynamical system requires solving a partial differential equation (PDE). Representing density functions by an interpolated tensor train decomposition overcomes the curse of dimensionality, and admits a PDE solver that gives estimates that converge to the optimal continuous-time values. More generally, representing PDFs in the tensor train format enables sampling and inference in multivariate statistical settings, more efficiently than existing MCMC methods. |
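As a rough illustration of what a tensor train (TT) representation is, the sketch below evaluates a function stored as TT cores at one grid point: the value is a product of small matrix slices, one per dimension. It is a minimal Python sketch under assumed conventions (cores stored as nested lists of shape (r_prev, n_k, r_next), boundary ranks equal to 1); it does not include the interpolation, the PDE solver, or the filtering scheme from the talk.

```python
from typing import List

# A TT core is a nested list with shape (r_prev, n_k, r_next).
Core = List[List[List[float]]]


def tt_eval(cores: List[Core], index: List[int]) -> float:
    """Value at (i_1, ..., i_d): G_1[:, i_1, :] @ ... @ G_d[:, i_d, :].

    The first and last ranks are 1, so the product is a 1 x 1 matrix.
    """
    row = [1.0]  # running 1 x r_k row vector
    for core, i in zip(cores, index):
        r_prev, r_next = len(core), len(core[0][i])
        row = [sum(row[a] * core[a][i][b] for a in range(r_prev)) for b in range(r_next)]
    return row[0]


def rank_one_tt(factors: List[List[float]]) -> List[Core]:
    """Rank-1 TT of a separable function prod_k g_k(x_k); factors[k][i] = g_k(grid_i)."""
    return [[[[v] for v in f]] for f in factors]


cores = rank_one_tt([[1.0, 2.0], [3.0, 4.0], [0.5, 1.5]])
print(tt_eval(cores, [1, 0, 1]))  # 2.0 * 3.0 * 1.5 = 9.0
```

Storing d cores of size at most r × n × r takes on the order of d·n·r² numbers instead of the nᵈ needed for the full grid, which is the sense in which low-rank TT formats sidestep the curse of dimensionality when the ranks stay small.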
Spring 2019
The statistics seminars for the 2019 spring semester are held on Mondays at 14.15-15.00, mainly in room 656, 6th floor, Sentralbygg 2, Gløshaugen.
Overview
- January 21: PhD defence of Xin Luo (trial lecture 10.15-11 and dissertation discussion at 13.15)
- February 4: Stian Lydersen (RKBU NTNU) and Lars Wichstrøm (NTNU, Department of Psychology)
- April 8: Martin Singull (Linköping University) on "Small Area Estimation under a Multivariate Linear Model for Repeated measures Data"
- April 29: Bert van der Veen (NTNU/IMF and NBIO) on "Latent variable models in vegetation science"
- May 13: Damiano Varagnolo, Department of Engineering Cybernetics (NTNU): Stein unbiased risk estimators for tuning hyperparameters of distributed regression algorithms
- June 3: Marcelo Hartmann (University of Helsinki, Finland) on "Laplace approximation in multi-type data modelling with multivariate Gaussian process regression and applications".
- June 7: PhD defence of Mohammed Assam Chaudry (trial lecture 10.15-11 and dissertation discussion at 13.15, both in G144, Rådsrommet, Electro Building, Gløshaugen). The topic of the trial lecture is "DOE in process optimization" and the title of the dissertation is "Robustness of screening designs with a small number of runs".
- June 11 (Tuesday): Ole Jakob Mengshoel, NTNU Open AI lab https://www.ntnu.edu/employees/ole.j.mengshoel.
- June 12 (Wednesday): Rasmus Erlemann, NTNU.
- June 25: Jelle Goeman, Professor of Biostatistics, Leiden University Medical Center
- June 25: Håkon Gjessing, University of Bergen and Norwegian Institute of Public Health.
- June 26: PhD defence of Thea Bjørnland (trial lecture 10.15-11 and dissertation discussion at 13.15)
- June 27: Giovanni Parmigiani.
- June 28: PhD defence of Ioannis Verdaxis (trial lecture 10.15-11 and dissertation discussion at 13.15)
- Planned for autumn semester: Adil Rasheed, Department of Engineering Cybernetics (NTNU)
Time/place: Monday, February 4, 2019, 14.15-15.00, room S23, 2nd floor, Sentralbygg 2, Gløshaugen |
---|
Speaker: Stian Lydersen (RKBU NTNU) and Lars Wichstrøm (NTNU, Department of Psychology) Title: Analysis of blind tasting of wine |
Abstract: In April 2018, Lars Wichstrøm arranged a blind tasting of 33 Chablis wines for 14 people who have wine as a major hobby. For each wine, the raters gave a score on a scale from 50 (undrinkable) to 100 (perfect). They also classified each wine in terms of Premier cru versus Grand cru, vintage, and producer. We will present some results from this wine tasting: What is the overall agreement between the raters? To what degree are the raters able to distinguish between Premier cru and Grand cru, and to identify the producer and the vintage? Reference: Olkin, I., Lou, Y., Stokes, L., & Cao, J. (2015). Analyses of Wine-Tasting Data: A Tutorial. Journal of Wine Economics, 10(1), 4-30. doi:10.1017/jwe.2014.26 |
Time/place: Monday, March 4, 2019, 14.15-15.00, sentralbygg 2, Gløshaugen |
---|
Speaker: Susan Anyosa and Pål Vegard Johnsen (NTNU) Title: Presentation of PhD research plans |
Abstract: Susan Anyosa on "Spatio-temporal statistics and data analytics for Earth science applications" and Pål Vegard Johnsen on "Statistical inference and learning in genomics, with application to sepsis". |
Time/place: Monday, March 18, 2019, 14.15-15.00, sentralbygg 2, Gløshaugen |
---|
Speaker: Michail Spitieris and Jorge S. Parada (NTNU) Title: Presentation of PhD research plans |
Abstract: Michail Spitieris on "Physics-informed Statistical Machine Learning for sensor-acquired big data" and Jorge S. Parada on "Bayesian hierarchical spatiotemporal modeling for citizen science data in biodiversity". |
Time/place: Monday, April 8, 2019, 14.15-15.00, sentralbygg 2, Gløshaugen |
---|
Speaker: Martin Singull (Linköping University) Title: Small Area Estimation under a Multivariate Linear Model for Repeated measures Data |
Abstract: This talk considers Small Area Estimation with a main focus on estimation and prediction for repeated measures data. Small area statistics are in demand for both cross-sectional and repeated measures data. For instance, small area estimates for repeated measures data may be useful to public policy makers for purposes such as fund allocation or new educational or health programs, where decision makers might be interested in the trend of estimates of a specific characteristic for a given category of the target population as a basis for their planning. A multivariate linear model for repeated measures data is formulated under small area estimation settings. The estimation of model parameters is discussed within a likelihood-based approach, and predictions of the random effects and of the small area means across time points, per group unit and for all time points are obtained. In particular, as an application of the proposed model, an empirical study is conducted to produce district-level estimates for beans in Rwanda during the 2014 agricultural seasons, covering two varieties, bush beans and climbing beans. |
Time/place: Tuesday, April 9, 2019, 13.15-14.00, Room S24, 2nd floor, Sentralbygg 2, Gløshaugen |
---|
Speaker: Nikolaos Limnios, Université de Technologie de Compiègne, France Title: Diffusion approximation of Branching processes in Fixed and Random environment |
Abstract: |
Time/place: Monday, April 29, 2019, 14.15-15.00, S24, Sentralbygg 2, Gløshaugen |
---|
Speaker: Bert van der Veen (NTNU/IMF and NBIO) Title: Latent variable models in vegetation science |
Abstract: Ecological communities consist of interacting species that each exploit a part of limited resources such as space or sunlight. The range of resources a species is capable of occupying is referred to as the theoretical niche. Limitations due to, e.g., competition between species determine the space a species occupies in reality, referred to as the realized niche. Multiple biological models have been suggested for the distribution of species abundance in niche space, one of which is the species packing model. The species packing model predicts that species abundances follow a Gaussian response model. Commonly, community ecologists relate species abundances at locations to the species packing model by using dimension-reduction techniques, such as Correspondence Analysis. Correspondence Analysis provides an approximate solution to the species packing model, but it does not estimate all parameters, does not provide confidence intervals, makes limiting assumptions, and at best provides an approximate maximum likelihood solution. In this presentation, I’ll outline some of the research I’ve done for the first publication in my PhD, which focuses on fitting the species packing model using generalized linear latent variable models (GLLVMs). GLLVMs can be understood as a model-based dimension-reduction technique, operating in the GLM-GLMM framework. |
Time/place: Monday, May 13, 2019, 14.15-15.00, S24, Sentralbygg 2, Gløshaugen |
---|
Speaker: Damiano Varagnolo, Department of Engineering Cybernetics (NTNU) Title: Stein unbiased risk estimators for tuning hyperparameters of distributed regression algorithms |
Abstract: We will start by discussing Stein's lemma, its usefulness for deriving the so-called Stein Unbiased Risk Estimator (SURE), and its central role in deriving strategies for model selection. We will then discuss how these model selection approaches may be used to automate the tuning of the hyperparameters of distributed regression algorithms that are based on average-consensus information exchange schemes. |
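As a minimal sketch of the idea, assume observations y_i = μ_i + e_i with Gaussian noise of known variance σ², and the simple shrinkage estimator μ̂ = y / (1 + λ). Stein's lemma then gives SURE(λ) = ||y − μ̂||² − nσ² + 2σ² · div(μ̂), with divergence n / (1 + λ). In the hypothetical distributed setting below, each node holds one observation, runs average consensus on a ring to estimate the mean of y_i², and then minimizes SURE over a grid of λ values locally. The estimator, the ring topology and the grid are assumptions chosen for illustration, not the algorithms from the talk.

```python
from typing import List


def sure_per_sample(lam: float, mean_sq: float, sigma2: float) -> float:
    """SURE / n for mu_hat = y / (1 + lam), with y_i = mu_i + N(0, sigma2) noise.

    ||y - mu_hat||^2 = (lam / (1 + lam))^2 * ||y||^2 and div(mu_hat) = n / (1 + lam),
    so only the mean of y_i^2 is needed.
    """
    shrink = lam / (1.0 + lam)
    return shrink ** 2 * mean_sq - sigma2 + 2.0 * sigma2 / (1.0 + lam)


def ring_average_consensus(values: List[float], iters: int) -> List[float]:
    """Each node repeatedly averages its value with its two ring neighbours.

    The weights (1/3 each) form a doubly stochastic matrix on a connected ring,
    so all node values converge to the network-wide mean.
    """
    x = list(values)
    n = len(x)
    for _ in range(iters):
        x = [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]
    return x


def pick_lambda(local_y: List[float], sigma2: float, grid: List[float],
                iters: int = 200) -> List[float]:
    """Node-wise choice of lam minimizing SURE, using only consensus estimates."""
    est_mean_sq = ring_average_consensus([y * y for y in local_y], iters)
    return [min(grid, key=lambda lam: sure_per_sample(lam, m, sigma2)) for m in est_mean_sq]


# Hypothetical network of 6 nodes, one noisy observation each, noise variance 1.
y = [3.0, -1.0, 2.0, 0.0, 1.0, -3.0]
grid = [k / 30 for k in range(61)]
print(pick_lambda(y, sigma2=1.0, grid=grid))
```

For this particular estimator the minimizer has a closed form, λ/(1 + λ) = σ² / mean(y²) when mean(y²) > σ², which gives a quick sanity check on the grid search.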
Time/place: Monday, June 3, 2019, 13.15-14.00, 734, Sentralbygg 2, Gløshaugen |
---|
Speaker: Marcelo Hartmann (University of Helsinki, Finland) Title: Laplace approximation in multi-type data modelling with multivariate Gaussian process regression and applications |
Abstract: With the fast-paced development of data management software (e.g. GIS), a great variety of practical applications can provide rich databases containing different data types (e.g. discrete, continuous). Statistical analysis of multiple data types often requires multivariate probabilistic models. However, in this case, the choice of multivariate probabilistic model may be difficult due to the mixture of types. An alternative way to circumvent this issue is to formulate the model from a hierarchical viewpoint. In this way, well-known probabilistic models can still be used, and dependency between data types can be introduced in the second layer of the hierarchy. In this presentation we will introduce a hierarchical model where the vector of predictor functions (in the sense of generalised linear models) is assumed to follow a multivariate Gaussian process. Statistical inference over the vector of predictor functions is approached by means of Bayes' rule with the Laplace approximation. These ideas have been motivated by applications in quantitative ecology and species distribution modelling. Some examples are presented. |
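To illustrate the Laplace step in a much simpler setting than a multivariate Gaussian process, the Python sketch below takes a single location with a two-component latent vector f = (f1, f2), a correlated Gaussian prior, one Poisson count linked to f1 through a log link, and one binary observation linked to f2 through a logit. The data types, link functions and prior covariance are hypothetical choices. Newton's method finds the posterior mode, and the negative inverse Hessian at the mode gives the covariance of the Gaussian approximation; the prior correlation is what lets each data type inform the other latent component.

```python
import math
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def inv2(a: Mat) -> Mat:
    """Inverse of a 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def matvec(a: Mat, v: Vec) -> Vec:
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def grad_and_hessian(f: Vec, y_count: int, y_binary: int, k_inv: Mat) -> Tuple[Vec, Mat]:
    """Gradient and Hessian of log p(f | y), up to a constant.

    y_count ~ Poisson(exp(f1)), y_binary ~ Bernoulli(sigmoid(f2)), f ~ N(0, K).
    """
    kf = matvec(k_inv, f)
    rate, p = math.exp(f[0]), sigmoid(f[1])
    g = [y_count - rate - kf[0], y_binary - p - kf[1]]
    h = [[-rate - k_inv[0][0], -k_inv[0][1]],
         [-k_inv[1][0], -p * (1.0 - p) - k_inv[1][1]]]
    return g, h


def laplace_approx(y_count: int, y_binary: int, k: Mat,
                   iters: int = 50, tol: float = 1e-10) -> Tuple[Vec, Mat]:
    """Gaussian approximation N(mode, -H^{-1}) to the posterior of f."""
    k_inv = inv2(k)
    f = [0.0, 0.0]
    for _ in range(iters):
        g, h = grad_and_hessian(f, y_count, y_binary, k_inv)
        step = matvec(inv2(h), g)  # Newton step H^{-1} g
        f = [f[0] - step[0], f[1] - step[1]]
        if max(abs(s) for s in step) < tol:
            break
    _, h = grad_and_hessian(f, y_count, y_binary, k_inv)
    h_inv = inv2(h)
    cov = [[-h_inv[0][0], -h_inv[0][1]], [-h_inv[1][0], -h_inv[1][1]]]
    return f, cov


# Hypothetical data: a count of 3 and a presence (1) at one site; prior correlation 0.5.
mode, cov = laplace_approx(3, 1, [[1.0, 0.5], [0.5, 1.0]])
print(mode, cov)
```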
Time/place: Tuesday, June 11, 2019, 14.15-15.00, S24, Sentralbygg 2, Gløshaugen |
---|
Speaker: Ole Jakob Mengshoel (NTNU Open AI lab) Title: Stochastic Local Search with Applications |
Abstract: Stochastic local search (SLS) algorithms have proven to be very competitive in solving several computationally hard problems in artificial intelligence, machine learning, and signal processing. These algorithms perform well and can also be analyzed using Markov chains; this analysis brings key insight into their performance under varying conditions. This talk investigates the foundations of SLS algorithms along with some applications. SLS algorithms employ search operators including the following: greedy, noise, and restart. We study MarkovSLS, an SLS variant, both theoretically and in experiments. For MarkovSLS, two special cases are considered: SoftSLS and AdaptiveSLS. In SoftSLS, the probability parameters are fixed, enabling analysis using standard homogeneous Markov chains. Experimentally, we investigate the dependency of SoftSLS's performance on its noise and restart parameters. AdaptiveSLS dynamically adjusts its noise and restart parameters during search. Experimentally, on synthetic and feature selection problems, we compare AdaptiveSLS with other algorithms, including an analytically optimized version of SoftSLS, and find that it performs very well while not requiring prior knowledge of the search space. We motivate and demonstrate the approach by focusing on computing the most probable explanations in Bayesian networks as well as on applications in machine learning. Short Bio: Dr. Ole Jakob Mengshoel is a professor in Computer Science at the Norwegian University of Science and Technology (Trondheim, Norway) and an adjunct faculty member in Electrical and Computer Engineering at Carnegie Mellon University (Moffett Field, USA). He is currently the Head of the Norwegian Open AI Lab. His current research focuses on artificial intelligence, in particular machine learning, stochastic optimization, inference, and decision support under uncertainty (often using Bayesian networks), with applications.
Application areas include aerospace, biology and medicine, earth science, networks, recommender systems, smart communities, and sustainability. Additional research interests include stochastic optimization, evolutionary algorithms, resource allocation and scheduling, real-time systems, intelligent user interfaces, and visual analytics. Dr. Mengshoel has managed and provided hands-on leadership in a wide range of research and development projects. Working with organizations such as Boeing, NASA, Rockwell Automation, and Rockwell Collins, he has successfully developed new technologies and software that have been or are being matured and transitioned into diverse sectors. Dr. Mengshoel has published over 100 articles and papers in journals and conferences, and has 4 U.S. patents. He holds a Ph.D. in Computer Science from the University of Illinois, Urbana-Champaign. His undergraduate degree is in Computer Science from the Norwegian Institute of Technology, Norway (now NTNU). Prior to CMU and NTNU, he was a senior research scientist and research area lead with USRA/RIACS in the Intelligent Systems Division at the NASA Ames Research Center. Before that, he was a research scientist in the Knowledge-Based Systems Group at SINTEF and in the Decision Sciences Group at the Rockwell Science Center and Rockwell Scientific (now Teledyne Scientific and Imaging). |
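The SoftSLS idea described in the abstract (greedy, noise and restart operators chosen with fixed probabilities) can be sketched on bitstrings as follows. This is a minimal Python sketch, not the speaker's MarkovSLS implementation: the toy objective, the parameter values and the tie-breaking in the greedy step (first best bit wins) are assumptions, and AdaptiveSLS, which tunes the probabilities during search, is not shown.

```python
import random
from typing import Callable, List, Tuple


def soft_sls(
    f: Callable[[List[int]], int],
    n_bits: int,
    p_noise: float,
    p_restart: float,
    max_steps: int,
    seed: int = 0,
) -> Tuple[List[int], int]:
    """Maximize f over bitstrings with fixed operator probabilities.

    Each step applies exactly one operator:
      restart (prob p_restart): jump to a fresh random bitstring,
      noise   (prob p_noise):   flip one uniformly chosen bit,
      greedy  (otherwise):      flip the bit whose flip gives the highest f.
    Fixed probabilities make the search a homogeneous Markov chain.
    Returns the best bitstring seen and its objective value.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    best, best_val = list(x), f(x)
    for _ in range(max_steps):
        u = rng.random()
        if u < p_restart:
            x = [rng.randint(0, 1) for _ in range(n_bits)]
        elif u < p_restart + p_noise:
            x[rng.randrange(n_bits)] ^= 1
        else:
            # Greedy: try every single-bit flip and apply the first best one.
            best_i, best_flip_val = 0, None
            for i in range(n_bits):
                x[i] ^= 1
                v = f(x)
                x[i] ^= 1
                if best_flip_val is None or v > best_flip_val:
                    best_i, best_flip_val = i, v
            x[best_i] ^= 1
        val = f(x)
        if val > best_val:
            best, best_val = list(x), val
    return best, best_val


# Toy objective (hypothetical): number of positions agreeing with a hidden target.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]


def matches(x: List[int]) -> int:
    return sum(int(a == b) for a, b in zip(x, TARGET))


best, val = soft_sls(matches, len(TARGET), p_noise=0.1, p_restart=0.01, max_steps=3000, seed=7)
print(val, best == TARGET)
```

Note that the greedy step always moves, even when every flip makes things worse; the noise and restart operators are what let the search leave local optima on harder objectives than this toy one.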
Time/place: Wednesday, June 12, 2019, 14.15, Room 734, 7th floor, Sentralbygg 2, Gløshaugen |
---|
Speaker: Rasmus Erlemann (NTNU) Title: Algorithm for Simulating from Conditional Distributions |
Abstract: Simulations play an important role in a wide range of subjects. There are effective algorithms for different cases, such as Monte Carlo methods, rejection sampling, inverse transform sampling, etc. For conditional distributions we need a different method, as an analytical expression is generally not available. We review and complement a general approach for Monte Carlo simulation conditional on the observed value of a statistic. The core concept is to define a function and find the corresponding artificial conditional distribution. Next, we draw samples from that distribution using the Metropolis-Hastings algorithm and apply the function to the samples. The resulting sample is then from the desired conditional distribution. The method is illustrated by examples with the uniform, normal, Gamma and inverse Gaussian distributions. Drawing samples from a conditional distribution is widely used in goodness-of-fit tests. This is demonstrated by a real-life example, where the method is used to decide whether a model is appropriate for a data set. |
Time/place: Tuesday, June 25, 2019, 14.15, Room 734, 7th floor, Sentralbygg 2, Gløshaugen |
---|
Speaker: Jelle Goeman, Professor of Biostatistics, Leiden University Medical Center Title: Robust testing in generalized linear models by sign-flipping score contributions |
Abstract: Generalized linear models are often misspecified due to overdispersion, heteroscedasticity and ignored nuisance variables. Existing quasi-likelihood methods for testing in misspecified models often do not provide satisfactory type I error rate control. We present a novel semi-parametric test, based on sign-flipping individual score contributions, that is proven to be robust against variance misspecification. When nuisance parameters are estimated, our basic test becomes conservative. We show how to take nuisance estimation into account to obtain an asymptotically exact test. Convergence can be further accelerated by a particular transformation of the score contributions that makes them independent. With this transformation, the test shows excellent control of the type I error, even for very small sample sizes. In many cases the quality of the control is higher than that provided by the parametric score test when it assumes the right distribution (which, however, is always unknown in practice). This is the case, for example, for binomial logistic regression with highly correlated nuisance parameters. As a consequence, in many cases the power is also enhanced. Confidence intervals are often difficult to define within the conditional resampling approach, while here they become a natural extension. A key point of this method is that, thanks to the sign-flipping strategy, the p-value is computed without the need to estimate the Fisher information matrix. The simulations show better type I error control than its competitors (e.g. GEE with the sandwich estimator of the variance) in many scenarios. This advantage is further magnified when the method is extended to the multi-dimensional and even high-dimensional setting. In this case the whole dependence among test statistics is dealt with within the conditional approach without the need to estimate it, hence providing a multidimensional approach which is robust and asymptotically exact. |
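A minimal Python sketch of the basic idea, assuming a GLM with canonical link and a single tested covariate x. With a canonical link, the score contribution of observation i for the tested coefficient is x_i (y_i − mu0_i), where mu0 are the fitted means under the null; these contributions are centred under the null, and randomly flipping their signs gives a reference distribution for their sum. This is only the basic test: it omits the correction for nuisance estimation, the transformation of the contributions, and the multivariate extensions described in the abstract, and the number of flips and the seed are arbitrary choices.

```python
import random
from typing import List


def sign_flip_pvalue(
    x: List[float],
    y: List[float],
    mu0: List[float],
    n_flips: int = 999,
    seed: int = 0,
) -> float:
    """Two-sided basic sign-flip test of H0: beta = 0 for a single covariate x.

    Assumes a canonical-link GLM whose null fitted means are mu0, so the
    score contribution of observation i is s_i = x_i * (y_i - mu0_i).
    T = sum_i s_i is compared with sums of randomly sign-flipped contributions.
    """
    rng = random.Random(seed)
    s = [xi * (yi - mi) for xi, yi, mi in zip(x, y, mu0)]
    t_obs = abs(sum(s))
    exceed = 1  # the observed (unflipped) statistic counts as one draw
    for _ in range(n_flips):
        t = abs(sum(si if rng.random() < 0.5 else -si for si in s))
        if t >= t_obs:
            exceed += 1
    return exceed / (n_flips + 1)


# Hypothetical Poisson data with log link: under H0 every mean equals the sample mean.
x = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, -1.5, 0.2]
y = [1.0, 2.0, 2.0, 3.0, 5.0, 6.0, 0.0, 3.0]
mu0 = [sum(y) / len(y)] * len(y)  # intercept estimated, so the basic test is conservative
print(sign_flip_pvalue(x, y, mu0))
```

No Fisher information is estimated anywhere in this computation; only the individual score contributions and their signs are used, which mirrors the key point made in the abstract.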
Time/place: Tuesday, June 25, 2019, 15.15, room 734, Sentralbygg 2, Gløshaugen |
---|
Speaker: Håkon Gjessing (University of Bergen, Norwegian Institute of Public Health) Title: Joint modeling of fetal growth and time-to-birth |
Abstract: From around week 24 of pregnancy, the weight of a human fetus is frequently estimated using ultrasound. However, very few children are born that early, and those who are born that early are pathological cases that may not be representative of normal children. Ultrasound measurements can be seen as irregular measurements of an underlying growth process, where only the weight at birth is directly observed. At the same time, birth can be seen as a time-to-event outcome. How can a weight estimation model (fetal growth model) be developed and properly calibrated to represent children in normal pregnancies? |
Time/place: Thursday, June 27, 2019, 14.15, room 734, Sentralbygg 2, Gløshaugen |
---|
Speaker: Giovanni Parmigiani (Dana Farber Cancer Institute and Harvard T.H. Chan School of Public Health) Title: Training replicable predictors in multiple studies |
Abstract: This lecture considers replicability of the performance of predictors across studies. We suggest a general approach to investigating this issue, based on ensembles of prediction models trained on different studies. We quantify how the common practice of training on a single study accounts in part for the observed challenges in replicability of prediction performance. We also investigate whether ensembles of predictors trained on multiple studies can be combined, using unique criteria, to design robust ensemble learners trained upfront to incorporate replicability into different contexts and populations. In a linear regression setting, we show analytically and confirm via simulation that merging yields lower prediction error than cross-study learning when the predictor-outcome relationships are relatively homogeneous across studies. However, as heterogeneity increases, there exists a transition point beyond which cross-study learning outperforms merging. We provide analytic expressions for the transition point in various scenarios and study asymptotic properties. Joint work with Zoe Guan and Prasad Patil |
Fall 2018
Overview
- August 20: start-up internal seminar for the statistics group
- September 17 at 14.15-15.00 in 656: Benjamin Dunn, Kavlisentret, NTNU.
- September 24: Rafael Sauter, University of Oxford
- September 28-29: Trondheim Symposium in Statistics at Baardshaug
- October 2 at 14.15-15.00: Stephanie Muff, research talk for Statistics position
- October 4 at 14.15-15.00: Geir Arne Fuglstad, research talk for Statistics position
- October 5 at 12.15-13.00: "Statistics as a foreign language" presentation by Kathrine Frey Frøslie at Matematiske perler.
- October 15: Trygve O. Fossum
- October 29: Ingelin Steinsland
- November 12: Erlend Aune
- November 26: Jacopo Paglia
- December 3: PhD defence by Jacob Skauvold (trial lecture 10.15-11 and dissertation discussion at 13.15)
- December 10: Christmas celebration with gløgg and pepperkaker (mulled wine and gingerbread cookies), a short journal group session (15 min) and Kahoot! All seminar participants welcome!
Details about fall 2018 seminars
Time/place: Monday, November 26, 2018, 14.15-15.00, room 656, sentralbygg 2, Gløshaugen |
---|
Speaker: Jacopo Paglia (NTNU) Title: Statistical modeling for real time pore pressure prediction and mud weight window assessment. |
Abstract: Pore pressure is defined as the pressure of the fluids in the pore spaces of the rock, and its prediction is an important part of subsurface modeling. Accurate pore pressure prediction helps avoid drilling risks, as it allows better tuning of the mud weights to avoid kicks and to reduce drilling costs. We focus on pore pressure prediction in an over-pressured area near a well, since this is where disastrous drilling accidents can happen. Prior understanding of pore pressure is available from a 3D geological model for pressure build-up and release using a basin modeling approach, and this is updated in real time based on well measurements. The presented approach for real-time prediction of pore pressure is based on a fitted Gaussian prior distribution. The likelihood model is a Gaussian distribution with non-linear expected values. A sequential Bayesian method, similar to an extended Kalman filter, is used to conduct real-time updating of pore pressure at various locations in the subsurface. Real-time updating entails that we automatically update the pore pressure distribution in every spatial direction (north, east, depth) near the well. Using a Gaussian approximation based on a linearization of the likelihood, the sequential updating of data leads to a new Gaussian posterior distribution at each data assimilation time. Since prediction of pore pressure is commonly used to make drilling decisions, the reduction in uncertainty plays an important role here. We analyze this problem in the context of improved decision making related to drilling mud weight, and study whether more data would help make better drilling decisions. |
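The linearized Gaussian update behind such a sequential scheme can be sketched for a single scalar state (the talk updates many subsurface locations jointly). In the Python sketch below, the measurement model h, its derivative, and all numbers are hypothetical placeholders; each update linearizes h at the current mean, as in an extended Kalman filter, so the posterior stays Gaussian and can be fed into the next update.

```python
import math
from typing import Callable, Tuple


def ekf_update(m: float, p: float, y: float,
               h: Callable[[float], float], dh: Callable[[float], float],
               r: float) -> Tuple[float, float]:
    """One linearized Gaussian update for a scalar state x.

    Prior x ~ N(m, p); measurement y = h(x) + e with e ~ N(0, r).
    h is linearized at the prior mean, so the posterior is N(m_new, p_new).
    """
    slope = dh(m)
    gain = p * slope / (slope * p * slope + r)
    m_new = m + gain * (y - h(m))
    p_new = (1.0 - gain * slope) * p
    return m_new, p_new


def h(x: float) -> float:
    """Hypothetical non-linear expected measurement for state x."""
    return math.exp(0.1 * x)


def dh(x: float) -> float:
    """Derivative of the hypothetical h."""
    return 0.1 * math.exp(0.1 * x)


m, p = 10.0, 4.0             # hypothetical prior mean and variance
for y in [3.0, 2.9, 3.1]:    # hypothetical measurements arriving in sequence
    m, p = ekf_update(m, p, y, h, dh, r=0.05)
    print(f"mean={m:.3f}, variance={p:.4f}")
```

With a linear h the update reduces to the ordinary Kalman filter; the variance can only shrink, which is the uncertainty reduction that later feeds into the drilling decision.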
Time/place: Monday, November 12, 2018, 14.15-15.00, room 656, sentralbygg 2, Gløshaugen |
---|
Speaker: Erlend Aune, Exabel and NTNU Title: Active One-Shot Learning using Memory Augmented Neural Networks |
Abstract: Training neural networks can be notoriously data-intensive, requiring tens of thousands or even millions of labeled examples for a model to achieve the desired performance. For challenges where we have access to a large amount of unlabeled data (i.e. the response variable is unobserved), Active Learning is an important tool. A related topic is one-shot learning. Here the challenge is recognizing the class of an observation after observing only one (or a few) labeled examples of a given class. In this seminar I will give a brief introduction to pool-based Active Learning and streaming Active Learning. Following this, I will motivate and present a model where we use a memory-augmented neural network for streaming one-shot Active Learning. This model achieves better performance than a baseline model without access to explicit memory. The work on Active One-Shot Learning using Memory Augmented Neural Networks is a collaboration between myself, Andreas Kvistad, Massimiliano Ruocco and Eliezer de Souza da Silva. |
Time/place: Monday, October 29, 2018, 14.15-15.00, room 656, sentralbygg 2, Gløshaugen |
---|
Speaker: Ingelin Steinsland (NTNU) Title: Effects of uncertainty in hydrological modelling – and the story behind |
Abstract: In this seminar I will present the results of the paper 'Effects of uncertainties in hydrological modelling', but also tell some of the story behind it. In 2008 Statkraft (the largest hydropower producer in Norway) initiated a large research project together with SINTEF Energy on better hydrological modelling and quantification of uncertainty. This paper can be seen as an attempt to answer some of the new questions and challenges that arose when aiming to quantify uncertainty in daily hydrological predictions for hydropower production. The challenges involve uncertainty and dependency in input variables, observations and model parameters in the physically based hydrological models, as well as model inadequacy. In this paper the focus is on uncertainty and poor quality of observations, both those used as input (observations used as explanatory variables) and those used for training the model (observations used as response). Paper: 'Effects of uncertainties in hydrological modelling. A case study of a mountainous catchment in Southern Norway' (Journal of Hydrology, 2016). In this study, we explore the effect of uncertainty and poor observation quality on hydrological model calibration and predictions. The Osali catchment in Western Norway was selected as the case study, and an elevation-distributed HBV model was used. We systematically evaluated the effect of accounting for uncertainty in parameters, precipitation input, temperature input and streamflow observations. For precipitation and temperature we accounted for the interpolation uncertainty, and for streamflow we accounted for rating curve uncertainty. Further, the effects of poorer quality of precipitation input and streamflow observations were explored. Less information about precipitation was obtained by excluding the nearest precipitation station from the analysis, while reduced information about streamflow was obtained by omitting the highest and lowest streamflow observations when estimating the rating curve.
The results showed that including uncertainty in the precipitation and temperature inputs has a negligible effect on the posterior distribution of parameters and on the Nash–Sutcliffe (NS) efficiency of the predicted flows, while the reliability and the continuous ranked probability score (CRPS) improve. Less information in the precipitation input resulted in a shift in the water balance parameter Pcorr and a model producing smoother streamflow predictions, giving poorer NS and CRPS but higher reliability. The effect of calibrating the hydrological model using streamflow observations based on different rating curves is mainly seen as variability in the water balance parameter Pcorr. When evaluating predictions, the best evaluation scores were not achieved for the rating curve used for calibration, but for rating curves giving smoother streamflow observations. Less information in streamflow influenced the water balance parameter Pcorr, and increased the spread in evaluation scores by giving both better and worse scores. |
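For reference, the two scores named in the results can be computed as follows. The Python sketch below uses the standard Nash–Sutcliffe definition and the closed-form CRPS for a Gaussian predictive distribution. Whether the paper computed CRPS from a Gaussian or from an ensemble of predictions is not stated here, so the Gaussian form is an assumption, and the streamflow numbers are hypothetical. Higher NS is better (1 is a perfect fit, 0 is no better than predicting the observed mean), while lower CRPS is better.

```python
import math
from typing import List


def nash_sutcliffe(obs: List[float], sim: List[float]) -> float:
    """NS = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def crps_gaussian(y: float, mu: float, sigma: float) -> float:
    """CRPS of a N(mu, sigma^2) forecast at outcome y (closed form)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


# Hypothetical daily streamflow observations and predictions (m^3/s).
obs = [12.0, 15.0, 30.0, 22.0, 18.0]
pred_mean = [13.0, 14.0, 26.0, 23.0, 17.0]
pred_sd = [2.0, 2.0, 4.0, 3.0, 2.0]
print(round(nash_sutcliffe(obs, pred_mean), 3))
print(round(sum(crps_gaussian(o, m, s) for o, m, s in zip(obs, pred_mean, pred_sd)) / len(obs), 3))
```

NS only looks at the predicted mean, whereas CRPS also rewards a well-calibrated spread, which is why the two scores can move in different directions when input uncertainty is added.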
Time/place: Monday, October 15, 2018, 14.15-15.00, room 656, sentralbygg 2, Gløshaugen |
---|
Speaker: Trygve O. Fossum (NTNU) Title: Searching for information at Sea: Exploring data-driven sampling using marine robotics |
Abstract: Finding high-value locations for data collection is of great importance in ocean science, where diverse biological, chemical and physical factors interact to create dynamically evolving features. In general, it is not possible to examine the entire ocean environment in detail, and only limited coverage is possible. This is the sampling conundrum in oceanography: the lack of sufficient observations is the largest source of error in our understanding of the ocean, which makes deciding when and where to sample the key problem when designing oceanographic experiments. The advent of marine robotic platforms, especially autonomous underwater vehicles (AUVs), has allowed scientists to explore data collection in new ways. Through autonomy and data-driven sampling, AUVs can cover larger areas, various depths and remote locations for longer periods of time, collecting data with high information value. These capabilities have made robotics a crucial part of the way forward in modern ocean science. In this talk, we consider how statistics can be used in oceanic robotic sampling. As a special example, we will look at the use of excursion probabilities for identifying informative regions such as interfaces or boundary layers between water masses, river outlets, and other heterogeneities in the ocean. This is ongoing work between the marine technology department, the Applied Underwater Robotics Laboratory (AURLab), and the mathematics department at NTNU. The work is application-driven, and full-scale tests are planned in the Trondheim fjord using an AUV. Homepage: https://sites.google.com/view/research-trygve
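Under a Gaussian model for an ocean variable, the excursion probability at a location is the probability that the variable exceeds a threshold. The Python sketch below assumes Gaussian predictive means and standard deviations at a few candidate locations and scores each location by the Bernoulli variance p(1 - p) of the excursion indicator, which peaks where the field is equally likely to lie above or below the threshold. The scoring rule, threshold and numbers are hypothetical and are not taken from the talk.

```python
import math


def excursion_probability(mu, sigma, threshold):
    """P(X > threshold) for X ~ N(mu, sigma^2)."""
    z = (threshold - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def boundary_uncertainty(mu, sigma, threshold):
    """Bernoulli variance p(1 - p) of the excursion indicator 1[X > threshold];
    it is largest (0.25) where p = 0.5."""
    p = excursion_probability(mu, sigma, threshold)
    return p * (1.0 - p)


# Hypothetical predictive means and standard deviations (e.g. salinity) at
# five candidate sampling locations along a transect.
means = [30.0, 31.5, 33.0, 34.5, 35.0]
sds = [0.5, 0.8, 1.0, 0.8, 0.5]
threshold = 33.0

scores = [boundary_uncertainty(m, s, threshold) for m, s in zip(means, sds)]
next_site = max(range(len(scores)), key=lambda i: scores[i])  # index 2
```

In an adaptive mission the vehicle would move towards high-scoring locations and the predictive distribution would be updated as data arrive; that updating step is left out here.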
Time/place: Monday, September 17, 2018, 14.15-15.00, room 656, sentralbygg 2, Gløshaugen
Speaker: Benjamin A. Dunn (NTNU) Title: Hidden features in data
Abstract: With continued advances in high-throughput recordings, researchers are turning to statistical methods to interpret data. However big the data may seem, what remains hidden is typically much bigger. In this talk, I will discuss some of the ways we have sought to understand and correct for the hidden part. First, building on the framework of the generalized linear model, we found approximate methods for systems with either weak connectivity or few hidden units. Noting the difficulty of the important case of large, strongly connected systems, we turned to methods from topological data analysis to characterize what the observed data can capture of the underlying state space. The resulting toolset is both interesting and useful, revealing previously known and unknown features from neural recordings of less than 0.00001% of the mouse brain.
Time/place: Monday, September 24, 2018, 14.15-15.00, room 656, sentralbygg 2, Gløshaugen
Speaker: Rafael Sauter (University of Oxford) Title: A joint Bayesian hierarchical model for trial and historical survey data to estimate HIV prevalence and care cascade in HPTN071 PopART trial communities
Abstract: The HPTN 071 (PopART) trial is a three-arm cluster-randomised trial in 12 communities in Zambia and 9 in South Africa, measuring the impact of a combination prevention intervention including universal testing and treatment (UTT) on population-level HIV incidence. Mathematical modelling informed the trial planning and was further developed into an efficient stochastic individual-based simulation model (IBM) of heterosexual HIV transmission in a generalised epidemic. Historical prevalence data from different surveys, as well as community-specific trial data collected during the ongoing intervention, are used to calibrate the IBM projections. Making full use of historical and trial information is crucial for generating accurate community-specific HIV prevalence estimates, to which the IBM simulations are calibrated. A Bayesian hierarchical model that jointly models historical and trial data, and that accounts for dependencies across age, sex, time and regions as well as across the HIV care cascade, may be suited to augment the information incorporated in the IBM. Predictions from the Bayesian statistical model are used to calibrate the IBM simulations with approximate Bayesian computation (ABC).
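Approximate Bayesian computation replaces likelihood evaluation with simulation: parameter values drawn from the prior are kept when the simulated data resemble the observed data. The Python sketch below uses a toy binomial model (hypothetical, standing in for the individual-based simulation model) with a uniform prior. Because the summary is discrete and matched exactly, the accepted draws come from the exact posterior, here Beta(7, 15) with mean 7/22.

```python
import random


def abc_rejection(observed, n_trials, n_draws, seed=1):
    """Rejection ABC for a binomial success probability with a Uniform(0, 1) prior.
    A draw is accepted when the simulated count equals the observed count."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()                                   # draw from the prior
        simulated = sum(rng.random() < theta for _ in range(n_trials))  # simulate data
        if simulated == observed:                              # tolerance zero
            accepted.append(theta)
    return accepted


samples = abc_rejection(observed=6, n_trials=20, n_draws=50_000)
posterior_mean = sum(samples) / len(samples)   # close to 7/22
```

With continuous or high-dimensional data an exact match is impossible, so in practice one compares summary statistics with a positive tolerance, at the price of an approximate posterior.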
Spring 2018
Time/place: Monday, June 4, 2018, 14.15-15.00, room 1329, sentralbygg 2, Gløshaugen
Speaker: Yihan Cao (NTNU) Title: Estimating the variation and autocorrelation of phenotypic selection on great tit
Abstract: Studies of phenotypic selection generate important insights into the mechanisms and patterns governing the direction and speed of evolutionary change. To enrich the toolbox for quantifying the power of phenotypic selection acting on adaptive traits, I will introduce state-space models (SSMs) and generalized linear mixed models (GLMMs) for analysing the phenotypic selection process, and emphasize the merits of Template Model Builder (TMB) for model fitting. I will show how our methods make accurate estimation of the variation and autocorrelation of phenotypic selection possible with a long-term great tit dataset from the Netherlands.
Time/place: Monday, May 28, 2018, 14.15-15.00, room 1329, sentralbygg 2, Gløshaugen
Speaker: Emily Simmons (NTNU) Title: Exploring the causes and consequences of phenological change in wild birds
Abstract: Changes in climate shape biological populations. They can alter spatial distributions, the timing of life history events, and even the species themselves. I will present my PhD work looking at one of these climate responses, temporal changes in life history events (phenology). I explore the causes and population level consequences of change in breeding phenology of a wild bird population from Wytham Woods, UK. This talk presents an example of applied ecological statistics from data collection to predictive population modelling.
Time/place: Thursday, May 24, 2018, 13.15-14.00, room 656, sentralbygg 2, Gløshaugen
Speaker: Andrea Riebler (NTNU) Title: Space-time modeling of under-5 mortality in a developing world context
Abstract: In much of the developing world, vital registration is limited, and estimates of under-5 mortality are based on household sample surveys with a complex design. This talk presents a new Bayesian continuous space/discrete time model that acknowledges the complex survey design by including urban/rural stratum effects. A key component of the model is the use of an offset to adjust for HIV epidemics. The methodology will be illustrated by producing yearly subnational estimates in Kenya for the period 1980-2014 using data from the Demographic and Health Surveys (DHS).
Time/place: Thursday, April 26, 2018, 13.15-14.00, S21, 2.etg (across from Smia), sentralbygg 2, Gløshaugen
Speaker: Jacob Skauvold (NTNU) Title: A particle filter with equal weights
Abstract: Particle filters are a class of algorithms which use weighted ensembles to represent probability distributions. They are commonly used for data assimilation in applications involving high-dimensional dynamical systems, such as simulations of the atmosphere and ocean. The main drawback of particle filters is that with each iteration the distribution of weights tends to become increasingly concentrated on an ever smaller subset of the ensemble, leading to poor estimates and wasted computational effort. We describe a variant of the particle filter which avoids this behavior by forcing all weights to be equal.
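The weight degeneracy described above is commonly monitored with the effective sample size, ESS = 1 / sum(w_i^2) for normalised weights, which equals the number of particles when all weights are equal. This Python sketch, with made-up observations and a static ensemble that is never resampled or moved, shows the ESS dropping far below the number of particles and tending to keep falling as observations accumulate. It illustrates the problem only; it is not the equal-weights filter from the talk.

```python
import math
import random


def normalise(log_weights):
    """Turn log-weights into normalised weights (subtracting the max for stability)."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [x / total for x in w]


def effective_sample_size(weights):
    """Kish effective sample size 1 / sum(w_i^2); equals N for equal weights."""
    return 1.0 / sum(x * x for x in weights)


rng = random.Random(0)
n_particles = 1000
particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # static prior draws
log_w = [0.0] * n_particles
obs_sd = 0.2
ess_history = []
for obs in [0.5, 0.8, 1.1, 1.4, 1.7]:  # hypothetical observations
    # Multiply each weight by the Gaussian likelihood of the new observation.
    log_w = [lw - 0.5 * ((obs - x) / obs_sd) ** 2 for lw, x in zip(log_w, particles)]
    ess_history.append(effective_sample_size(normalise(log_w)))
```

Standard filters fight this by resampling, which itself duplicates a few particles; the equal-weights idea in the talk targets the same degeneracy by a different route.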
Time/place: Monday, April 16, 2018, 14.15-15.00, room 1329, sentralbygg 2, Gløshaugen
Speaker: Michael Gineste (NTNU) Title: Seismic inversion using waveform data and Ensemble Kalman Filter
Abstract: Using seismic waveform data to recover an image of subsurface elastic parameters (acoustic- and shear-wave velocities and density) has a long history. Nowadays industry is also concerned with obtaining an uncertainty estimate for the predicted subsurface image, and for this reason the problem is cast as a probabilistic inverse problem. This talk will demonstrate how the Ensemble Kalman Filter framework is used to sequentially process the large number of observations into a posterior for the parameters, and will discuss some of the methodological strengths and weaknesses.
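A stochastic (perturbed-observation) EnKF analysis step can be sketched in a few lines. The Python example below assumes a single scalar parameter and a hypothetical linear forward model in place of seismic waveform modelling, and it processes the observations one at a time, as in the sequential approach described above. Ensemble covariances stand in for the exact covariances of the Kalman filter.

```python
import random


def enkf_update(ensemble, forward, d_obs, obs_sd, rng):
    """One stochastic EnKF analysis step for a scalar parameter and scalar observation.
    Each member is shifted by gain * (perturbed observation - its own prediction)."""
    n = len(ensemble)
    preds = [forward(m) for m in ensemble]
    m_bar = sum(ensemble) / n
    d_bar = sum(preds) / n
    c_md = sum((m - m_bar) * (d - d_bar) for m, d in zip(ensemble, preds)) / (n - 1)
    c_dd = sum((d - d_bar) ** 2 for d in preds) / (n - 1)
    gain = c_md / (c_dd + obs_sd ** 2)
    return [m + gain * (d_obs + rng.gauss(0.0, obs_sd) - d)
            for m, d in zip(ensemble, preds)]


def forward(m):
    """Hypothetical linear forward model (stand-in for waveform modelling)."""
    return 1.5 * m


rng = random.Random(11)
true_m, obs_sd = 2.0, 0.3
observations = [forward(true_m) + rng.gauss(0.0, obs_sd) for _ in range(10)]

ensemble = [rng.gauss(0.0, 2.0) for _ in range(500)]  # prior N(0, 2^2)
for d in observations:                                 # assimilate sequentially
    ensemble = enkf_update(ensemble, forward, d, obs_sd, rng)

post_mean = sum(ensemble) / len(ensemble)
post_sd = (sum((m - post_mean) ** 2 for m in ensemble) / (len(ensemble) - 1)) ** 0.5
```

For a linear Gaussian problem like this one the EnKF approximates the exact posterior up to sampling error; with a nonlinear forward model it is only an approximation, which is one of the methodological weaknesses usually discussed.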
Time/place: Monday, March 12, 2018, 14.15-15.00, room 1329, sentralbygg 2, Gløshaugen
Speaker: Nikolai Ushakov (IMF, NTNU) Title: Recovering information lost due to rounding
Abstract: Data for a statistical analysis are always given in a rounded form, so they contain both a random error and a rounding error. Rounding errors are especially serious and must not be ignored when the sample size is large. Huge data sets are becoming increasingly common due to the rapid development of computer technology, so there is growing interest in the statistical analysis of rounded data. In this work we consider situations where data for statistical analysis are given in a rounded form and the rounding error (the discretization step) is comparable to or even greater than the measurement errors. We study possibilities for achieving accuracy much higher than the discretization step and for recovering information lost due to rounding. The main tool for solving this problem is the use of additional measurement errors.
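A minimal illustration of how additional measurement error can recover information lost to rounding is the classic dithering effect. In the Python sketch below (made-up values), a quantity of 0.3 is rounded to a grid with step 1: without extra noise every observation rounds to 0 and the information is lost, but adding independent uniform noise on (-step/2, step/2) before rounding makes the expected rounded value equal to the true value, so averaging recovers it. This is the simplest textbook case, not the methods of the talk.

```python
import math
import random


def round_to_step(x, step=1.0):
    """Round to the nearest multiple of `step` (ties rounded up)."""
    return step * math.floor(x / step + 0.5)


def mean_of_rounded(true_value, n, add_noise, rng, step=1.0):
    """Average of n rounded measurements, optionally with uniform noise added first."""
    total = 0.0
    for _ in range(n):
        extra = rng.uniform(-0.5 * step, 0.5 * step) if add_noise else 0.0
        total += round_to_step(true_value + extra, step)
    return total / n


rng = random.Random(42)
without = mean_of_rounded(0.3, 100_000, add_noise=False, rng=rng)    # 0.0: value lost
with_noise = mean_of_rounded(0.3, 100_000, add_noise=True, rng=rng)  # close to 0.3
```

The price is extra variance per observation, which is why the gain only materialises with large samples.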
Time/place: Monday, March 5, 2018, 14.15-15.00, room 656, sentralbygg 2, Gløshaugen
Speaker: Ingeborg Hem, Rasmus Erlemann, Ole Bernhard Forberg Title: Internal seminar: presentation of PhD research plans
Abstract: 14.15: Ingeborg Hem on "Robustify Bayesian inference in practice using penalized complexity priors". 14.30: Rasmus Erlemann on "Mathematical Statistics". 14.45: Ole Bernhard Forberg on "Fluid Prediction From Time-lapse Seismic AVO Data". The PhD students present their research plans, and the participants may ask questions.
Time/place: Monday, February 26, 2018, 14.15-15.00, room 1329, sentralbygg 2, Gløshaugen
Speaker: Tommy Jørstad (Gexcon Consulting) Title: Managing accident risk
Abstract: Tommy Stokmo Jørstad, Risk Consultant at Gexcon Consulting, talks about accident risk management, giving an introduction to the work being done to prevent major accidents at high-risk facilities. He also reflects on why statistics was useful when starting out in this field, and why it will be vital going forward.
This talk is well suited for students at the master level in statistics (year 4+5).
Time/place: Monday, February 19, 2018, 14.15-15.00, room 1329, sentralbygg 2, Gløshaugen
Speaker: Jørn Vatn (NTNU) Title: Probabilistic degradation modelling of railway tracks
Time/place: Monday, February 12, 2018, 14.15-15.00, room 1329, sentralbygg 2, Gløshaugen
Speaker: Hans Julius Skaug (UiB) Title: Statistics and Automatic Differentiation in Template Model Builder
Abstract: I will argue that automatic differentiation (AD) is a useful technique in statistics. One reason is that it makes the Laplace approximation “automatic” from a user perspective. An R package that makes AD easily available (via C++ code) is TMB (Template Model Builder). TMB can be used as a platform for developing other mixed-model R packages, and I will use as an example glmmTMB, which fits overdispersed and zero-inflated mixed models. I will also show how an SPDE precision matrix can be imported from R-INLA into TMB, hence combining the flexible mesh generation tools in R-INLA with TMB’s flexibility in likelihood formulation.
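The Laplace approximation needs the mode of the log-integrand and its curvature there, and AD supplies those derivatives without hand coding. As a small, self-contained illustration of that idea (not TMB's implementation, which works on C++ code), the Python sketch below uses forward-mode hyper-dual numbers to get exact first and second derivatives, finds the mode by Newton's method, and applies the one-dimensional approximation log ∫ exp(f(u)) du ≈ f(û) + 0.5 log(2π) - 0.5 log(-f''(û)). The class and function names are hypothetical.

```python
import math


class HyperDual:
    """a + b*e1 + c*e2 + d*e1*e2 with e1^2 = e2^2 = 0. Evaluating f(x + e1 + e2)
    yields f(x) in a, f'(x) in b (and c) and f''(x) in d, with no step size."""

    def __init__(self, a, b=0.0, c=0.0, d=0.0):
        self.a, self.b, self.c, self.d = a, b, c, d

    @staticmethod
    def lift(x):
        return x if isinstance(x, HyperDual) else HyperDual(float(x))

    def __add__(self, other):
        o = HyperDual.lift(other)
        return HyperDual(self.a + o.a, self.b + o.b, self.c + o.c, self.d + o.d)

    __radd__ = __add__

    def __neg__(self):
        return HyperDual(-self.a, -self.b, -self.c, -self.d)

    def __sub__(self, other):
        return self + (-HyperDual.lift(other))

    def __rsub__(self, other):
        return HyperDual.lift(other) + (-self)

    def __mul__(self, other):
        o = HyperDual.lift(other)
        return HyperDual(self.a * o.a,
                         self.a * o.b + self.b * o.a,
                         self.a * o.c + self.c * o.a,
                         self.a * o.d + self.b * o.c + self.c * o.b + self.d * o.a)

    __rmul__ = __mul__


def hd_log(x):
    """Natural log lifted to hyper-dual numbers (requires x.a > 0)."""
    return HyperDual(math.log(x.a), x.b / x.a, x.c / x.a,
                     x.d / x.a - x.b * x.c / x.a ** 2)


def derivatives(f, x):
    """Return f(x), f'(x), f''(x) from a single hyper-dual evaluation."""
    y = f(HyperDual(x, 1.0, 1.0, 0.0))
    return y.a, y.b, y.d


def laplace_log_integral(f, x0, iters=50):
    """Laplace approximation of log of the integral of exp(f(u)) du;
    the mode is found by Newton's method using the AD derivatives."""
    u = x0
    for _ in range(iters):
        _, g, h = derivatives(f, u)
        u -= g / h
    val, _, h = derivatives(f, u)
    return val + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(-h)


def log_gamma_kernel(u):
    return 10 * hd_log(u) - u   # integrand u^10 exp(-u); exact integral is 10!


def log_gauss_kernel(u):
    return -0.5 * u * u         # integrand exp(-u^2/2); exact integral is sqrt(2*pi)


approx_log_gamma11 = laplace_log_integral(log_gamma_kernel, 5.0)
approx_log_gauss = laplace_log_integral(log_gauss_kernel, 1.0)
```

For u^10 e^(-u) the approximation reproduces Stirling's formula for 10!, within about 1%, and for the Gaussian kernel it is exact. In a mixed model the same computation runs over a vector of random effects, which is where reverse-mode AD as used in TMB pays off.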
Time/place: Friday, February 9, 2018, 14.15-15.00, room 1329, sentralbygg 2, Gløshaugen
Speaker: Kjell Doksum (University of Wisconsin) Title: A comparison of ensemble methods for high dimensional data
Abstract: We consider a high dimensional regression framework where the number of predictors (p) exceeds the number of subjects (n). Recent work in high dimensional regression analysis has embraced an ensemble approach that consists of selecting random subsets with p or fewer than p predictors, doing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as lasso perform better when used in the ensemble approach, by computing mean squared estimation and prediction errors for simulations and a real data example. Both random and fixed designs are considered. We find that the ensemble approach improves on penalty methods in the random design case when sparsity decreases.
Short biography: Dr Kjell Doksum is a Senior Scientist in the Statistics Department at the University of Wisconsin, Madison, and Emeritus Professor in the Statistics Department at the University of California, Berkeley. He has held visiting positions at the Université de Paris, University of Oslo, the Norwegian Institute of Technology in Trondheim, Harvard University, Harvard Medical School, Columbia University, Bank of Japan, Hitotsubashi University in Tokyo, and Stanford University. He is a Fellow of the Institute of Mathematical Statistics and of the American Statistical Association, as well as an elected member of the International Statistical Institute and the Royal Norwegian Society of Sciences and Letters. His research focuses on statistical theory and modeling. It includes inference for nonparametric regression and correlation curves, global measures of association in semiparametric and nonparametric settings, estimation of regression quantiles, Bayesian nonparametric inference, and high dimensional data analysis. Applications include statistical modeling of HIV data and the analysis of financial data. Kjell Doksum is the co-author, with Peter Bickel, of the book “Mathematical Statistics: Basic Concepts and Selected Topics”, Volumes I and II, CRC Press.
Time/place: Monday, February 5, 2018, 14.15-15.00, room 1329, sentralbygg 2, Gløshaugen
Speaker: Jacob Laading (DNB and IMF, NTNU) Title: Models in finance - a curse or a blessing
Abstract: Jacob Laading, Head of Integrated Risk Management in DNB and associate professor (II) at IMF, talks about the history and current use of statistical methods in the financial industry. Particular emphasis is given to the lessons from the financial crisis, regulatory issues relating to models and the growth of models as the industry turns increasingly digital.
This talk is well suited for students at the master level in statistics (year 4+5).
Time/place: Monday, January 22, 2018, 14.15-15.00, room 1329, sentralbygg 2, Gløshaugen
Speaker: Magne Aldrin (Norsk Regnesentral, Oslo) Title: Estimation of climate sensitivity
Abstract: Predictions of climate change are uncertain mainly because of uncertainties in the emissions of greenhouse gases and in how sensitive the climate is to changes in the abundance of atmospheric constituents. The equilibrium climate sensitivity is defined as the temperature increase resulting from a doubling of the CO2 concentration in the atmosphere, once the climate has reached a new steady state. CO2 is only one of several external factors that affect the global temperature; collectively, these factors are called radiative forcing mechanisms. I will present a model framework for estimating the climate sensitivity. The core of the model is a simple, deterministic climate model based on elementary physical laws such as energy balance. It models yearly hemispheric surface temperature and global ocean heat content as a function of historical radiative forcing. This deterministic model is combined with an empirical, stochastic model and fitted to observations of global temperature and ocean heat content, conditioned on estimates of historical radiative forcing. We use a Bayesian framework, with informative priors on a subset of the parameters and flat priors on the climate sensitivity and the remaining parameters. The model is estimated by Markov chain Monte Carlo techniques.
References: Aldrin, M., Holden, M., Guttorp, P., Skeie, R.B., Myhre, G. and Berntsen, T.K. (2012). Bayesian estimation of climate sensitivity based on a simple climate model fitted to observations of hemispheric temperatures and global ocean heat content. Environmetrics, vol. 23, p. 253-271. Skeie, R.B., Berntsen, T., Aldrin, M., Holden, M. and Myhre, G. (2014). A lower and more constrained estimate of climate sensitivity using updated observations and detailed radiative forcing time series. Earth System Dynamics, vol. 5, p. 139-175.
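The deterministic core described above rests on energy balance. To show how an equilibrium climate sensitivity follows from such a balance, here is a textbook one-box model C dT/dt = F - λT in Python, integrated with forward Euler in yearly steps. It is far simpler than the hemispheric model with ocean heat content in the talk; the heat capacity C and feedback parameter λ are hypothetical, and the doubled-CO2 forcing uses a commonly quoted value of about 3.7 W m^-2.

```python
def simulate_one_box(forcing, lam, heat_capacity, dt=1.0, temp0=0.0):
    """Forward-Euler integration of C dT/dt = F(t) - lam * T, one step per forcing value."""
    temps = []
    temp = temp0
    for f in forcing:
        temp += dt * (f - lam * temp) / heat_capacity
        temps.append(temp)
    return temps


F_2X = 3.7   # approximate forcing from doubled CO2, W m^-2 (illustrative)
LAM = 1.2    # hypothetical climate feedback parameter, W m^-2 K^-1
C = 8.0      # hypothetical effective heat capacity, W yr m^-2 K^-1

ecs = F_2X / LAM                                # equilibrium warming, about 3.1 K
temps = simulate_one_box([F_2X] * 500, LAM, C)  # abrupt, sustained CO2 doubling
```

At equilibrium dT/dt = 0, so the warming settles at F_2x / λ; estimating the sensitivity therefore amounts to constraining λ (together with the forcing history) from observed temperatures and ocean heat uptake.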
Time/place: Monday, January 15, 2018, 14.15-15.00, room 1329 (left), sentralbygg 2, Gløshaugen
Speaker: Gunnar Taraldsen (IMF, NTNU) Title: Improper posteriors are not improper
Abstract: In 1933 Kolmogorov constructed a general theory that defines the modern concept of conditional expectation. In 1955 Renyi formulated a new axiomatic theory for probability, motivated by the need to include unbounded measures. We introduce a general concept of conditional expectation in Renyi spaces. In this theory improper priors are allowed, and the resulting posterior can also be improper. In 1965 Lindley published his classic text on Bayesian statistics using the theory of Renyi, but retracted this idea in 1973 due to the marginalization paradoxes presented by Dawid, Stone, and Zidek. The paradoxes are investigated, and the seemingly conflicting results are explained. The theory of Renyi can hence be used as an axiomatic basis for statistics that allows the use of unbounded priors. This is joint work with Jarle Tufto and Bo Lindqvist.
Fall 2017
Monday, December 18, 2017, 14:00-15:00, room 1329 (left side), 13.etg, sentralbygg 2, Gløshaugen
Statistics Christmas End-of-semester celebration
Welcome to this social event, with light refreshments.
Agenda:
- 14.00-14.20 Light refreshments (non-alcoholic apple mulled wine, julebrød, assorted xmas cookies)
- 14.20-14.35: Short commentary by Stian Lydersen: "The roles and responsibilities of a co-author".
- 14.40-14.55: Team kahoot! with questions on statistical activities this autumn semester (no theoretical statistical questions:-)
- 14.55-15.00: Julekveldsvisa - sing-along with Turid Follestad at the piano (practise the lyrics and read the background beforehand).
More about the short commentary by Stian Lydersen, "The roles and responsibilities of a co-author": What qualifies for authorship of a publication, and what are the responsibilities? Some of the roles and responsibilities defined in "The Vancouver guidelines" (www.icmje.org) can be subject to interpretation and discussion. I have co-authored more than 200 articles and have contributed to a recent discussion in www.universitetsavisa.no. I will share some of my experiences and thoughts about these issues.
This event is open to all statistics seminar participants this autumn semester - and everyone else!
Monday, December 11, 2017, 14:15-15:00, 1329 (left side), 13.etg, sentralbygg 2, Gløshaugen
Thea Bjørnland: Score tests for genetic association studies under extreme phenotype sampling
Abstract: In a genetic association study, the aim is to find genetic markers (positions along the DNA strand where the bases A, T, C, G vary between individuals in a population) that are associated with some phenotype (i.e. an observable trait, e.g. blue or brown eyes, height). Thousands of such markers are tested for association with the phenotype. These associations are weak in the sense that large samples are needed to have enough power to detect a signal. At the same time, genotyping is expensive, and obtaining a large enough sample might be infeasible. The extreme sampling design has been proposed as a solution to this problem: if we can afford to genotype n out of N individuals in a cohort, we can have better power to detect an association by genotyping the extreme-phenotype individuals rather than n randomly selected individuals. Since the sampling procedure leaves us with a non-random sample of individuals, we have derived a score test specifically for this design. We consider the power of different extreme phenotype designs by application to a study of genetic association with maximum volume of oxygen uptake.
Monday, December 4, 2017, 14:15-15:00, 656, 6.etg, sentralbygg 2, Gløshaugen
Bob O'Hara (Department of Mathematical Sciences and Centre of Biodiversity Dynamics, NTNU): Why does the world look linear?
Abstract: We live in a non-linear world, and there is no reason why the world should be linear. So why are linear models such an important part of statistics (and hence science)? This cannot just be because they are relatively easy to fit: they must also be useful.
Unfortunately I don't have an answer to this question. But I have some thoughts, which I will present (including a relevant digression into ANOVA), and some half-formed ideas about how to proceed. I am hoping that this will spark some interest, and that the audience can help clear up what is happening, and why straight lines work so well.
Monday, November 20, 2017, 14:15-15:00, room 656, 6.etg, sentralbygg 2, Gløshaugen
Edmund Førland Brekke (Department of Engineering Cybernetics, NTNU): Target tracking: Applications in maritime collision avoidance and foundations in finite set statistics
Abstract: Target tracking is a key ingredient in a variety of sensor fusion systems. In particular, autonomous vehicles need tracking methods to detect and keep track of other objects that move in their vicinities. Target tracking is generally done by means of Bayesian filtering methods which generalize the Kalman filter to deal with measurement origin uncertainty in addition to plant and measurement noises. This talk will cover two topics. First, I will describe how target tracking is used in collision avoidance methods for autonomous marine vehicles in the Autosea project. Second, I will discuss how established multi-target tracking methods such as the multiple hypothesis tracker can be re-derived within the more recent framework of finite set statistics, which is a Bayesian formulation of point process theory.
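The Kalman filter is the base case that the tracking methods above generalize. Here is a minimal scalar version in Python, assuming a random-walk target position observed with Gaussian noise and no measurement-origin uncertainty (exactly one measurement per scan, known to come from the target); the model and parameter names are illustrative.

```python
def kalman_filter_1d(measurements, q, r, x0, p0):
    """Scalar Kalman filter for x_k = x_{k-1} + w_k, z_k = x_k + v_k,
    with Var(w_k) = q and Var(v_k) = r. Returns filtered means and variances."""
    x, p = x0, p0
    means, variances = [], []
    for z in measurements:
        p = p + q                # predict: uncertainty grows with process noise
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # update with the innovation z - x
        p = (1.0 - k) * p
        means.append(x)
        variances.append(p)
    return means, variances


# A stationary target at 5.0 observed 50 times with unit noise variance.
means, variances = kalman_filter_1d([5.0] * 50, q=0.0, r=1.0, x0=0.0, p0=100.0)
```

In the multi-target setting of the talk, the update step would also have to decide which measurement, if any, belongs to which target; that data-association problem is what the multiple hypothesis tracker and finite set statistics address.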
Monday, November 13, 2017, 14:15-15:00, room 656, 6.etg, sentralbygg 2, Gløshaugen
Xin Luo (NTNU): Prior specification for binary Markov mesh models
Abstract: In geostatistics one often wants to estimate a prior model for the spatial distribution of reservoir properties from one or more training images. There are several options for this purpose. For instance, multiple-point statistics models (Strebelle, 2002; Journel & Zhang, 2006) are designed for this goal, and another popular alternative is Markov mesh models (Stien & Kolbjørnsen, 2011; Abend et al., 1965). In this presentation we propose prior distributions for all parts of the specification of a Markov mesh model: for the sequential neighborhood, for the parametric form of the conditional distributions and for the parameter values. By simulating from the resulting posterior distribution when conditioning on an observed scene, we obtain an automatic model selection procedure for Markov mesh models. To sample from this posterior distribution, we construct a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm. We demonstrate the usefulness of our prior formulation and the limitations of our RJMCMC algorithm in two examples.
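A Markov mesh model generates a binary image cell by cell in a fixed order, with each cell conditioned only on a sequential neighbourhood of cells already generated. The Python sketch below fixes the choices that the talk instead puts priors on: a neighbourhood consisting of the left and upper cells, a logistic form for the conditional probability, and the parameter values, all of which are hypothetical here.

```python
import math
import random


def simulate_markov_mesh(n_rows, n_cols, beta0, beta_left, beta_up, seed=0):
    """Raster-order simulation of a binary field. Each cell is 1 with probability
    logistic(beta0 + beta_left * left + beta_up * up), where left and up are the
    already-simulated neighbours (treated as 0 outside the grid)."""
    rng = random.Random(seed)
    grid = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            left = grid[i][j - 1] if j > 0 else 0
            up = grid[i - 1][j] if i > 0 else 0
            eta = beta0 + beta_left * left + beta_up * up
            p = 1.0 / (1.0 + math.exp(-eta))
            grid[i][j] = 1 if rng.random() < p else 0
    return grid


# Positive neighbour effects produce spatially clustered facies.
field = simulate_markov_mesh(20, 30, beta0=-1.0, beta_left=2.0, beta_up=2.0, seed=1)
```

Because the model is defined through these sequential conditionals, its likelihood for an observed scene is simply the product of the conditional probabilities in raster order, which is what makes posterior sampling over neighbourhoods and parametric forms feasible.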
Monday, November 6, 2017, 14:15-15:00, room 656, 6.etg, sentralbygg 2, Gløshaugen
Internal seminar focusing on teaching activities in the statistics group. All staff and PhD students in the statistics group are welcome. Agenda sent by mail.
Monday, October 30, 2017, 14:15-15:00, room 656, 6.etg, sentralbygg 2, Gløshaugen
Yimin Xiao (Michigan State University): Joint Estimation of Fractal Indices for Bivariate Gaussian Processes
Abstract: Multivariate (or vector-valued) stochastic processes are important in probability, statistics and various scientific areas as stochastic models. In recent years, there has been increasing interest in their statistical inference and prediction. In this talk, we study the joint estimation of the fractal indices of a bivariate Gaussian process. These indices determine not only the smoothness of each component process and the fractal behavior of the whole process, but also play important roles in characterizing the dependence structure among the components. Under the infill asymptotics framework, we establish joint asymptotic results for increment-based estimators of bivariate fractal indices. Our main results show the effect of the cross-dependence structure on the performance of the estimators. This is joint work with Yuzhen Zhou.
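For intuition about increment-based estimation under infill asymptotics, here is a univariate sketch in Python. It assumes the variogram behaves like c h^α for small lags h, so that α can be estimated as log2 of the ratio of mean squared increments at lags 2 and 1 on a fine grid, and it is checked on simulated Brownian motion (α = 1). The joint bivariate estimators and cross-dependence effects discussed in the talk are not covered.

```python
import math
import random


def simulate_brownian(n, dt, rng):
    """Brownian motion on a grid of n steps of size dt, starting at 0."""
    x, path = 0.0, [0.0]
    for _ in range(n):
        x += rng.gauss(0.0, math.sqrt(dt))
        path.append(x)
    return path


def mean_sq_increment(path, lag):
    """Average of (X(t + lag) - X(t))^2 over all grid points."""
    diffs = [path[i + lag] - path[i] for i in range(len(path) - lag)]
    return sum(d * d for d in diffs) / len(diffs)


def fractal_index(path):
    """Estimate alpha in E[(X(t+h) - X(t))^2] ~ c h^alpha from lags 1 and 2."""
    return math.log2(mean_sq_increment(path, 2) / mean_sq_increment(path, 1))


rng = random.Random(7)
path = simulate_brownian(100_000, 1e-5, rng)
alpha_hat = fractal_index(path)   # Brownian motion has alpha = 1
```

Using only two small lags keeps the estimator local, which is what the infill setting requires; the constant c cancels in the ratio.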
Monday, October 23, 2017, 14:15-15:00, room 656, 6.etg, sentralbygg 2, Gløshaugen
Stefanie Muff (Epidemiology, Biostatistics and Prevention Institute, University of Zurich, visiting NTNU): Measurement error and uncertainty in data - a fascinating statistical challenge
Abstract: When I started my research as a (bio)statistician, I was first employed by an evolutionary biologist who was worried that data are full of measurement error (often perceived as "uncertainty"), yet almost nobody seemed to care. Thanks to the ubiquity of measurement error problems, my studies led to exciting interdisciplinary collaborations with epidemiologists, quantitative geneticists and movement ecologists, involving a wide range of statistical models and methods. The Bayesian framework has proven particularly useful and flexible, especially because the "Bayesian crank" can be turned even if a model is theoretically nonidentifiable, a notorious problem of error models.
In my talk I will give a gentle introduction to different error types (classical vs Berkson), their potential effects and methods to deal with them. I will then present some examples of recent error correction methods and applications, namely for a two-component (classical/Berkson) error in a single covariate, miscounting error in the response of a zero-inflated negative binomial regression, and an example from quantitative genetics, where we correct for misassigned parentages in pedigrees by adapting the Simulation Extrapolation (SIMEX) approach. Finally, I will point out some ideas to capture measurement error and missing data in a unified framework.
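The difference between classical and Berkson error shows up in a small simulation. In this Python sketch (made-up parameter values), classical error in the covariate attenuates the least-squares slope by the reliability ratio Var(x) / (Var(x) + Var(u)), here 1/2, whereas Berkson error leaves the slope approximately unbiased. It illustrates the problem only and does not implement any of the correction methods from the talk.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2017)
n, beta = 20_000, 2.0

# Classical error: the true covariate x is measured as w = x + u.
x = [rng.gauss(0, 1) for _ in range(n)]
w_classical = [xi + rng.gauss(0, 1) for xi in x]
y = [beta * xi + rng.gauss(0, 1) for xi in x]
slope_classical = ols_slope(w_classical, y)   # near beta * 1/(1+1) = 1

# Berkson error: the true covariate scatters around the recorded value, x = w + u.
w_berkson = [rng.gauss(0, 1) for _ in range(n)]
x_true = [wi + rng.gauss(0, 1) for wi in w_berkson]
y_b = [beta * xi + rng.gauss(0, 1) for xi in x_true]
slope_berkson = ols_slope(w_berkson, y_b)     # near beta = 2
```

In the Berkson case E[y | w] = βw, so the slope survives, but the residual variance grows; in the classical case the attenuation does not vanish as n grows, which is why it needs explicit correction.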
Monday, October 16, 2017, 14:00-15:30, room 1329, 13. etg. sentralbygg 2, Gløshaugen
Thiago G. Martins (AIAScience and IMF, NTNU): Data science: Industry Challenges and Expectations
Abstract: In this talk I will give an overview of some challenges faced by statisticians in industry, based on my experience working at Yahoo and AIA Science. My goal with this presentation is to raise students' awareness of important topics that might not currently be covered by university classes, and to help understand what else can be done at the university level to better prepare students for jobs in industry.
This seminar is arranged together with the Trondheim branch of the Norwegian Statistical Association, and pizza will be served. 53 participants attended.
Monday, September 18, 2017, 14:15-15:00, room 656, 6.etg, sentralbygg 2, Gløshaugen
Florentina Paraschiv (Faculty of Economics, NTNU): Estimation and Application of Fully Parametric Multifactor Quantile Regression with Dynamic Coefficients
Abstract: This paper develops and applies a novel estimation procedure for quantile regressions with time-varying coefficients based on a fully parametric, multifactor specification. The algorithm recursively filters the multifactor dynamic coefficients with a Kalman filter, and parameters are estimated by maximum likelihood. The likelihood function is built on the Skewed-Laplace assumption. In order to eliminate the non-differentiability of the likelihood function, it is reformulated as a non-linear optimisation problem with constraints. A relaxed problem is obtained by moving the constraints into the objective, which is then solved numerically with the Augmented Lagrangian Method. In an application to electricity prices, the results show the importance of modelling the time-varying features, and the explicit multi-factor representation of the latent coefficients is consistent with an intuitive understanding of the complex price formation processes involving fundamentals, policy instruments and participant conduct.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2741692
This is joint work with Derek Bunn (London Business School) and Sjur Westgaard (NTNU, Dept Industrial Economics)
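The Skewed-Laplace (asymmetric Laplace) likelihood mentioned above is closely tied to the quantile check loss: up to terms not involving the location, its log-density is minus the check loss ρ_τ(r) = r(τ - 1[r < 0]) of the scaled residual, so maximising it in the location is the same as minimising the check loss. The Python sketch below shows the check loss, whose kink at zero is the source of the non-differentiability mentioned in the abstract, and that the best constant under it is the τ-quantile of the data. It leaves out the Kalman-filtered dynamic coefficients entirely; the data are made up.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss rho_tau(r) = r * (tau - 1[r < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(data, tau):
    """Constant q minimising sum rho_tau(y - q). Searching the data points suffices,
    because the objective is piecewise linear with kinks only at the data."""
    return min(data, key=lambda q: sum(check_loss(y - q, tau) for y in data))


data = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5, 8.0]
median = best_constant(data, 0.5)   # 3.5, the sample median
q90 = best_constant(data, 0.9)      # 9.0, the upper 90% quantile
```

Replacing the constant by a linear predictor with time-varying coefficients gives the dynamic quantile regression of the talk; the kink is what rules out plain gradient-based likelihood maximisation.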
Monday, September 11, 2017, 14:15-15:00, room 656, 6.etg, sentralbygg 2, Gløshaugen
Stian Lydersen (Regional Centre for Child and Youth Mental Health and Child Welfare, Department of Mental Health, Faculty of Medicine and Health Sciences, NTNU): Statistical analysis of contingency tables
Abstract: The book “Statistical analysis of contingency tables” (2017) describes methods for effect size estimation, confidence intervals, and hypothesis tests for one-, two- and three-dimensional contingency tables, with evaluation of their properties. Important properties of a confidence interval method are coverage probability, interval width, and interval location. For a hypothesis test method, the important properties are the actual significance level and statistical power. I will show some examples of evaluations of methods for analysis of contingency tables, and the resulting recommendations. http://www.contingencytables.com
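Exact coverage probabilities of the kind evaluated in the book can be computed by summing binomial probabilities over the outcomes whose interval contains the true value. The Python sketch below does this for two textbook intervals for a single binomial proportion, the Wald and Wilson score intervals, at a hypothetical n = 10 and p = 0.1. It illustrates the evaluation idea and is not code or recommendations from the book.

```python
import math

Z = 1.959963984540054  # 97.5% standard normal quantile (nominal 95% intervals)


def wald_interval(k, n, z=Z):
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(k, n, z=Z):
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def exact_coverage(interval, n, p):
    """Sum of Binomial(n, p) probabilities of the outcomes k whose interval contains p."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = interval(k, n)
        if lo <= p <= hi:
            total += math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total


wald_cov = exact_coverage(wald_interval, 10, 0.1)
wilson_cov = exact_coverage(wilson_interval, 10, 0.1)
```

In this case the Wald interval covers p only about 65% of the time at a nominal 95%, largely because k = 0 gives the degenerate interval [0, 0], while the Wilson interval reaches about 93%. Repeating the calculation over a grid of p values gives the coverage curves typically used to compare methods.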
Thursday, September 7, 2017, 13.15-14.00, room 734
Sara Martino: From INLA to stochastic weather generators: My life as an applied statistician
Abstract: In this talk, I will discuss my journey as an applied statistician. In my PhD and postdoc work, we developed what is now known as the INLA (Integrated Nested Laplace Approximation) approach to approximate Bayesian inference for latent Gaussian models. This project is still very active (the R-INLA project) and has expanded in ways that we did not predict, both in methodological development and in the interest from users around the world. I will discuss the early days of this project as well as its current status.
When I started working at SINTEF Energy, I encountered a new class of problems that are highly relevant for the industrial community and very challenging from a statistical point of view: stochastic weather generators. Hydropower generation is important for Norway for obvious reasons, but it is highly variable and hard to predict. It is closely related to meteorological variables such as precipitation, temperature and wind. Planning and scheduling of hydropower production in the medium and long term must therefore be based on scenarios describing future weather. Unsurprisingly, this is especially challenging in Norway. To do this, we use a stochastic weather generator, which aims to simulate realistic time series of meteorological variables. I will discuss its statistical challenges.
Tuesday, September 5, 2017, 10.15-11.00, room 1329
Elisabeth Waldmann: Joint Modelling of Longitudinal and Time-to-Event Data - extensions in statistical learning methods and Bayesian inference
Abstract: Joint models for longitudinal and time-to-event data have gained a lot of attention in the last few years, as they are a helpful technique for a common data structure in clinical studies where longitudinal outcomes are recorded alongside event times. The two processes are often linked, so the two outcomes should be modelled jointly to prevent the potential bias introduced by independent modelling. Commonly, joint models are estimated with likelihood-based expectation maximization or Bayesian approaches, in frameworks where variable selection is problematic and which do not immediately work for high-dimensional data. Gradient boosting is a method from the field of statistical learning which leads to automated variable selection and shrinkage. This talk introduces the extension of the boosting framework to joint models, which makes possible not only the selection of variables but also their allocation to the correct part of the model. These types of algorithms make it possible, for the first time, to estimate joint models in high-dimensional data situations. Based on this novel algorithm and established Bayesian approaches, current projects and further extensions will be presented.
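To illustrate how boosting selects variables, here is a sketch of generic component-wise L2 boosting for an ordinary linear model in Python. At each step every predictor is fitted to the current residuals on its own, and only the best one is updated by a small step; predictors that are rarely or never chosen keep coefficients at or near zero. This is not the joint-model boosting algorithm from the talk, and the data and tuning values (nu, number of steps) are hypothetical.

```python
import random


def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise L2 boosting: at each step, fit each single predictor to the
    residuals by least squares (no intercept), then move only the best-fitting
    predictor's coefficient by a shrunken step nu * b."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]
    for _ in range(n_steps):
        best_j, best_b, best_sse = None, 0.0, float("inf")
        for j in range(p):
            col = [row[j] for row in X]
            b = sum(c * r for c, r in zip(col, resid)) / sum(c * c for c in col)
            sse = sum((r - b * c) ** 2 for c, r in zip(col, resid))
            if sse < best_sse:
                best_j, best_b, best_sse = j, b, sse
        coef[best_j] += nu * best_b
        resid = [r - nu * best_b * row[best_j] for r, row in zip(resid, X)]
    return intercept, coef


# Hypothetical data: only predictors 0 and 3 matter among 10.
rng = random.Random(3)
n, p = 200, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0, 0.5) for row in X]
intercept, coef = componentwise_l2_boost(X, y)
```

Stopping early (few steps) keeps uninformative predictors out of the model altogether, which is how boosting yields variable selection and shrinkage even when there are more predictors than observations.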
Tuesday, August 29, 2017, 10:15-11:00, room 1329
Gunnar Taraldsen: Statistics with improper priors
Abstract: The axiomatic foundation of probability theory presented by Kolmogorov (1933) is the basis of modern theory for probability and statistics. In certain applications it is, however, necessary or convenient to allow improper (unbounded) distributions. This is, unfortunately, most often done without a theoretical foundation. The talk reviews the theory of Taraldsen and Lindqvist (2010, 2013, 2015, 2016, 2017), which includes improper distributions and is related to Rényi's theory (1955) of conditional probability spaces. In particular, it is demonstrated how the theory leads to simple explanations of apparent paradoxes known from the Bayesian literature. Several examples from statistical practice with improper distributions are discussed in light of the given theoretical results. This also includes a recent theory of convergence of proper distributions to improper ones presented by Bioche and Druilhet (2016).
Keywords: axioms of probability and statistics, Bayesian statistics, conditional law, Gibbs sampling, intrinsic Gaussian Markov random fields, marginalization paradox, Jeffreys-Lindley paradox, q-vague convergence
Note: The presented work is joint with Bo Henry Lindqvist.
Monday, August 28, 2017, 14:15-15:00, Room 1329 (left) SBII
Short reports from participation at statistics-related conferences: Please email Mette.Langaas@ntnu.no about which conference(s) you can give a short report from. "Short report" = show the webpage and talk about what you attended/liked/presented, 5 minutes per conference.
- useR!2017 in Brussels https://user2017.brussels/ (Mette)
- Norwegian Statistical Meeting (Norsk statistisk møte) 2017 in Fredrikstad https://sites.google.com/view/nsm19/ (Thea R, Margrethe, Mette)
- BFF4 https://statistics.fas.harvard.edu/bff4 (Gunnar)
- Spatial Statistics 2017 https://www.elsevier.com/events/conferences/spatial-statistics-one-world-one-health (Torstein)
Monday, August 21, 2017, 14:15-15:00, Room 1329 (left) SBII
Welcome to the fall semester: Internal seminar of the statistics group.
Spring 2017
Thursday, June 1, 2017, 14:15-15:00, Room 1329 SBII
Sheng-Tsaing Tseng (Institute of Statistics, National Tsing-Hua University, Taiwan): Prediction of Nano-sol Products (via pH Accelerated Degradation Model)
Abstract: To provide customers with timely information on product lifetime, manufacturers conventionally use temperature (or voltage) as the accelerating variable to shorten life-testing time. Well-known life-stress relationships (such as the Arrhenius reaction or the inverse power model) then allow us to extrapolate the lifetime of highly reliable products to normal use conditions. In this talk, however, we will present a real case study in which the shelf-life prediction of nano-sol products is successfully obtained by adopting the pH value as an accelerating variable. A pH accelerated degradation model is proposed to describe the time evolution of the particle size distributions under three different pH values. We can then analytically construct a confidence interval for the shelf-life of nano-sol products under normal use conditions.
Tuesday, May 30, 2017, 14:15-15:00, Room 1329 SBII
Esther Jones (University of St Andrews, Scotland): Seals and shipping
Abstract: Vessels can have acute and chronic impacts on marine species. The rate of increase in commercial shipping is accelerating, and there is a need to quantify and potentially manage the risk of these impacts. We present a framework that allows shipping noise, an important marine anthropogenic stressor, to be explicitly incorporated into spatial planning. Potentially sensitive areas were identified by quantifying the risk to grey and harbour seals of exposure to shipping traffic. Individual noise exposure was predicted, with associated uncertainty, in an area with varying rates of co-occurrence. Rates of co-occurrence were highest within 50 km of the coast, close to seal haul-outs. Acoustic exposure to individual harbour seals was modelled in a study area using contemporaneous movement data from animals fitted with telemetry tags and tracking data from all ships during 2014 and 2015. For 20 of the 28 animals in the study, the 95% CIs for cumulative Sound Exposure Levels had upper bounds above levels known to induce temporary hearing loss. Data from four acoustic recorders were used to validate the sound exposure predictions.
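Cumulative Sound Exposure Level is conventionally obtained by summing acoustic energy rather than decibel values. The sketch below shows that standard decibel summation, SEL_cum = 10 log10(sum_i 10^(SEL_i/10)). It assumes that the individual SELs share the same frequency weighting. It does not reproduce the talk's exposure model or its uncertainty propagation, and the function name is hypothetical.

```python
import math


def cumulative_sel(sel_values_db):
    """Cumulative sound exposure level (dB) from individual exposure SELs (dB).

    Each level is converted to relative energy, the energies are summed,
    and the total is converted back to decibels.
    """
    if not sel_values_db:
        raise ValueError("need at least one exposure")
    total_energy = sum(10.0 ** (level / 10.0) for level in sel_values_db)
    return 10.0 * math.log10(total_energy)
```

Two equal exposures therefore add about 3 dB, not double the level. A single exposure much louder than the rest dominates the cumulative value.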
Friday, May 19, 2017, 13:15-14:00, Room 656 Simastuen
Matz Haugen (Postdoc, University of Chicago): Assessing changes in variability of extreme temperatures using ensemble model simulations
Monday, April 24, 2017, 14:15-15:00, Room 1329 SBII
Gunnar Taraldsen (IMF, NTNU): Improper priors and fiducial inference
Abstract: The use of improper priors flourishes in applications and is as such a central part of contemporary statistics. Unfortunately, it is most often presented without a theoretical basis: "Improper priors are just limits of proper priors …". We present ingredients in a mathematical theory for statistics that generalizes the axioms of Kolmogorov so that improper priors are included. A particular by-product is an elimination of the famous marginalization paradoxes in Bayesian and structural inference. Secondly, we demonstrate that structural and fiducial inference can be formulated naturally in this theory of conditional probability spaces. A particular by-product is then a proof of conditions that ensure coincidence between a Bayesian posterior and the fiducial distribution. The concept of a conditional fiducial model is introduced, and the interpretation of the fiducial distribution is discussed. In particular, it is explained that the information given by the prior distribution in Bayesian analysis is replaced by the information given by the fiducial relation in fiducial inference.
This presentation is a preliminary version of a 25-minute invited talk to be given at The 4th Bayesian, Fiducial and Frequentist Workshop BFF4, Harvard University, May 1 - 3, 2017 (http://statistics.fas.harvard.edu/bff4). The first part of this seminar will hence last for 20-25 minutes, and the remaining 20-25 minutes are left for questions and discussion.
Friday, March 24, 2017, 10:15-11:00, Room 656 (Simastuen)
Manuela Zucknick (Department of Biostatistics, University of Oslo): The prediction of drug responses and drug synergies in personalized cancer therapy
Abstract: It was expected that deciphering the human genome would quickly transform our understanding of biology and lead to major advances in medicine. New high-throughput technologies have generated vast amounts of omics data, with huge effort by the scientific community and at enormous cost. In cancer research in particular, the possibility to perform whole-genome sequencing of the tumour and compare it with the DNA sequence of corresponding normal tissue has raised hopes of developing fully individualized treatment strategies that directly target the cancer-causing mutations. I will present the task of using whole-genome molecular data to make individualized predictions of drug responses (for individual compounds) and of drug synergy effects (for combinations of compounds) in personalized cancer therapy. I will talk about some of our experiences, discuss the main challenges and present some solutions.
I will focus on one particular project, where we investigate several multivariate penalized regression methods to jointly model the sensitivity of cancer cell lines to a set of related pharmacological compounds. This includes standard multivariate lasso and elastic net regression, tree-lasso (Kim & Xing, 2012) which can capture a hierarchical structure between drugs, and spatial lasso (Lam & Souza, 2014) where the prediction model for one drug includes the other (related) drugs as covariates. To distinguish different sources of molecular data, we then combine these models with the Integrative LASSO with Penalty Factors (IPF-LASSO; Boulesteix et al., 2015) which assigns different penalties to different sources of features.
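The IPF-LASSO idea of source-specific penalties can be sketched as a weighted lasso. Every coefficient gets its own penalty factor w_j, and features from the same data source share the same factor. The minimal coordinate-descent sketch below is for a single continuous response. It minimises (1/2n)||y - b0 - Xb||^2 + λ Σ_j w_j|b_j| by cyclically soft-thresholding one coefficient at a time. It is an illustration under these assumptions, not the ipflasso R package, and it does not cover the multivariate, tree or spatial lasso variants named above. The function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used by the lasso."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def weighted_lasso(X, y, lam, penalty_factors, max_iter=1000, tol=1e-10):
    """Coordinate descent for (1/2n)||y - b0 - Xb||^2 + lam * sum_j w_j |b_j|.

    penalty_factors[j] is w_j. Giving all features of one data source the
    same w_j mimics source-specific penalties. The intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]  # centred columns
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residual when all coefficients are zero
    scale = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if scale[j] == 0.0:
                continue
            # (1/n) x_j^T (partial residual that leaves feature j out)
            rho = sum(v * r for v, r in zip(cols[j], resid)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam * penalty_factors[j]) / scale[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * v for r, v in zip(resid, cols[j])]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(b * m for b, m in zip(beta, means))
    return intercept, beta
```

For example, `penalty_factors = [1.0] * p_clinical + [2.0] * p_omics` (hypothetical block sizes) penalises the omics block twice as strongly as the clinical block. In IPF-LASSO the ratio between factors is itself a tuning choice, typically made by cross-validation.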
Tuesday, March 14, 2017, 14:15-15:00, Room 1329 SBII
Bob O'Hara (IMF, NTNU): Ecological Statistics and Statistical Ecology
Abstract: Ecology is an appealing area for statisticians to work in, because the data are often messy and the ecological questions are amenable to a statistical treatment. I will survey some of the work I have done in the past: adapting econometric models to look at how and why communities of species change in time, and using point processes in a state space model (and INLA) to combine different sources of data to model where species occur. I will end by suggesting how I will continue with this work in Trondheim.
Tuesday, February 28, 2017, 14:15-15:00, Room 1329 SBII
Torstein Mæland Fjeldstad (PhD-student, IMF, NTNU): Spatially coupled Gaussian mixture prior models with applications in reservoir prediction
Abstract: Prediction of subsurface reservoir properties, such as porosity, is a problem of utmost importance in reservoir prediction. Since these properties in general appear skewed and multimodal, the traditional Gaussian prior model is not necessarily adequate. We introduce a lithology/fluid-dependent rock physics model to honour geophysical constraints and present a high-dimensional spatial Gaussian mixture prior model for the reservoir properties. For a Gaussian likelihood function, the posterior model of interest is a spatial Gaussian mixture model. However, due to the spatial coupling in the likelihood model, the posterior model cannot be assessed analytically. We present a class of approximate posterior models and sketch an efficient MCMC algorithm to assess the correct posterior model. A 2D crossline case study from the Norwegian Sea is presented.
Tuesday, February 14, 2017, 14:15-15:00, Room 1329 SBII
Jo Eidsvik (IMF, NTNU): Value of Information in the Earth Sciences
Abstract: We constantly use information to make decisions about utilizing and managing resources. How can we quantitatively analyze and evaluate different information sources? What is the value of data and how much data is enough? This presentation covers multidisciplinary concepts required for conducting value of information analysis for multivariate spatial situations. The value of information is computed before purchasing data, and can be useful for checking if data acquisition or processing is worth its price, or for comparing various experiments. Examples demonstrate value of information analysis for various applications in the earth sciences.
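The talk concerns spatial, multivariate problems. As a minimal sketch of the core definition only, the function below computes the value of information for a discrete decision problem. It is the expected value with data minus the expected value without data, VOI = E_y[max_a E(v | y)] - max_a E(v). The function name and the drill/no-drill numbers in the usage note are hypothetical.

```python
def value_of_information(prior, payoffs, likelihood):
    """Value of information for a discrete decision problem.

    prior[s]         : prior probability of state s
    payoffs[a][s]    : value of action a if the state is s
    likelihood[s][k] : probability of observing data outcome k in state s
    VOI = E_y[ max_a E(v | y) ] - max_a E(v)
    """
    states = range(len(prior))
    # best expected value when deciding without data
    prior_value = max(sum(prior[s] * pay[s] for s in states) for pay in payoffs)
    posterior_value = 0.0
    for k in range(len(likelihood[0])):
        # p(y=k) * max_a E(v | y=k) equals max_a sum_s p(s) p(k|s) v(a, s)
        posterior_value += max(
            sum(prior[s] * likelihood[s][k] * pay[s] for s in states)
            for pay in payoffs
        )
    return posterior_value - prior_value
```

Take, for example, prior `[0.3, 0.7]` and payoffs `[[100.0, -30.0], [0.0, 0.0]]` (drill, do not drill). Perfect information then has VOI 21. A test that is correct 80% of the time in both states has VOI 10.8. Acquiring such data is worth it only if it costs less than that.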
Fall 2016
Monday, December 19, 2016, 14:15-15:00, Room 1329, SBII
Henning Omre (IMF, NTNU): Henning's Christmas Causerie: 'The Noble Art of Estimating Pi.'
Abstract: In 1733, Georges-Louis Leclerc (1707-1788), titled 'Comte de Buffon', designed an experiment and defined an estimator for assessing the value of the mathematical constant pi, known to us as 3.1415926535…. His experiment is usually termed 'Buffon's Needle'. The 'Comte' estimated pi, but no statistical theory existed at the time, so he had no deep insight into his estimation procedure. The 'Comte' also defined an extension of his original experiment in 1777. Unfortunately, his calculations contained an error, and the correct solution was presented by Pierre-Simon Laplace (1749-1827) in 1812. Hence this extended experiment was termed 'Buffon-Laplace's Needle'. Throughout the history since then, the estimation problem has drawn a lot of attention from statisticians. We will look casually at some of the Needle results, which will also make us touch on 'Bertrand's Paradox'.
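Buffon's needle estimator is easy to reproduce by simulation. If the needle length l is no longer than the line spacing d, a randomly thrown needle crosses a line with probability 2l/(πd). After n throws, π can therefore be estimated by 2ln/(d · crossings). The sketch below assumes that the distance from the needle centre to the nearest line is uniform on [0, d/2] and that the acute angle to the lines is uniform on [0, π/2]. Drawing the angle uses `math.pi` itself, so the sketch demonstrates the estimator rather than giving an independent route to π. The function name is hypothetical.

```python
import math
import random


def buffon_pi(n_throws, needle_length=1.0, line_spacing=2.0, seed=None):
    """Estimate pi with Buffon's needle (needle no longer than the line spacing).

    A needle crosses a line with probability 2*l / (pi*d), so pi is
    estimated by 2*l*n / (d * crossings).
    """
    if needle_length > line_spacing:
        raise ValueError("this estimator assumes needle_length <= line_spacing")
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_throws):
        centre = rng.uniform(0.0, line_spacing / 2.0)  # distance from centre to nearest line
        angle = rng.uniform(0.0, math.pi / 2.0)        # acute angle between needle and lines
        if centre <= (needle_length / 2.0) * math.sin(angle):
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; increase n_throws")
    return 2.0 * needle_length * n_throws / (line_spacing * crossings)
```

The estimator is a ratio with a random denominator. Its error shrinks only like 1/sqrt(n), so even a few hundred thousand throws typically pin down only the first two or three decimals.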
Monday, November 21, 2016, 13:15-14:00, Auditorium S4 SBII
Ola Diserud (Norwegian Institute for Nature Research, NINA): Monitoring biological diversity – more pitfalls than solid ground?
Abstract: Norwegian nature management is interested in monitoring nature in a way that detects undesirable changes in biological diversity, due to e.g. climate change or other human pressure factors. Managers and politicians often have unrealistic expectations of what is possible to detect with a reasonable sampling effort, since most biological communities have a large natural stochasticity and the sampling methods can be both biased and uncertain. Here I will discuss some of the challenges we at NINA encounter when analyzing biological diversity, and illustrate them with selected data sets on biological communities.
Monday, October 31, 2016, 13:15-14:00, Auditorium S4 SBII
Turid Follestad (Unit of Applied Clinical Research, Faculty of Medicine, NTNU): An application of penalized regression to prediction of good recovery after moderate, traumatic brain injuries.
Abstract: I will present some examples of projects we are involved in as statisticians at the Faculty of Medicine at NTNU. One common type of problem in medical research is to identify risk factors or predictors for an outcome. I will in particular discuss the application of penalized regression, using the lasso, for studying predictors for good recovery for patients having suffered a moderate traumatic brain injury.
Monday, October 17, 2016, 13:15-14:00, Auditorium S4 SBII
Gunnar Taraldsen (IMF, NTNU): What is fiducial inference – and why?
Abstract: This seminar contribution – with active discussion with the audience - presents some of the historic background, the original example presented by Fisher, and recent developments and trends as seen by members of the BFF group (Bayes-Fiducial-Frequentist = Best-Friends-Forever) and recent and upcoming JASA publications. slides
Monday, October 3, 2016, 13:15-14:00, Auditorium S4 SBII
Ioannis Vardaxis (PhD-student, IMF, NTNU): Bayesian Model-based Analysis for ChIA-PET Data (BACPET)
Abstract: It is known that genomes are organized as three-dimensional rather than linear structures in the nucleus of the cells. Recently, the ChIA-PET strategy was introduced. It enables not only the identification of protein binding sites on the genome, but also the investigation of interactions created by those proteins. These interactions transform the linear structure of the genome into a three-dimensional one. Researchers have focused on investigating the 3D DNA structure by analyzing interaction data provided by ChIA-PET, while still using existing algorithms, like MACS, for finding protein binding sites in ChIA-PET data. The existing algorithms, however, are built for ChIP-seq data. They are therefore not the optimal choice for binding site analysis with paired-end data like ChIA-PET, because they do not use all the information that ChIA-PET data provide. We propose a new Bayesian model-based method, Bayesian Analysis for ChIA-PET data (BACPET), and a pipeline for the identification of protein binding sites on the genome using ChIA-PET data. Unlike MACS, BACPET uses information from both tags of each PET in ChIA-PET data and searches for binding sites in two-dimensional space. BACPET also takes into account different noise levels in different genomic regions. Finally, BACPET shows favorable results compared to MACS in terms of motif occurrence, precise binding site locations and inference.
Monday, September 26, 2016, 13:15-14:00, Auditorium S4 SBII
Gunnar Taraldsen (IMF, NTNU): The road from conditional Monte Carlo to improper priors and fiducial inference.
Or trying to prove a false statement . . .
Abstract: This is joint work with Bo Lindqvist. We started out in 1997 on a project to clarify certain aspects of a conditional Monte Carlo method. The method produces conditional samples given a sufficient statistic, and can hence be used to construct optimal inference procedures. In some cases it can also be used to eliminate nuisance parameters by conditioning. The steps in the algorithm give strong links to fiducial inference, and also to the foundations of statistics both in mathematical and philosophical terms. The talk will give an informal sketch of the initial method, and the many related foundational and practical issues. slides
Spring 2016
Friday, April 29, 2016, 14:15-15:00, Room 1329 SBII
Jingyi Guo (PhD-student, IMF, NTNU): Using interpretable priors in bivariate meta-analysis of diagnostic test studies
Abstract: In bivariate meta-analyses, the number of studies involved is often low and the data are sparse. Model fitting with likelihood approaches can then be problematic, and Bayesian approaches are advantageous. However, Bayesian analysis is often computationally demanding, and the selection of the prior for the covariance matrix of the bivariate structure is crucial. Bayesian inference became attractive for routine use after the proposal of integrated nested Laplace approximation (INLA). However, assigning suitable prior distributions to the covariance matrix of the bivariate random effects has remained challenging. In this presentation, I will apply the recently proposed framework of penalised complexity (PC) priors to the variance and correlation components in a Bayesian bivariate meta-analysis of diagnostic test studies. PC priors facilitate model interpretation and hyperparameter specification, as expert knowledge can be incorporated intuitively. To investigate the use of PC priors in practice, we re-analyse a meta-analysis of the telomerase marker for the diagnosis of bladder cancer with our new user-friendly R package meta4diag.
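As a hedged illustration of why PC priors make hyperparameter specification intuitive, consider the standard deviation σ of a Gaussian random effect. The PC prior for σ is exponential, π(σ) = λ exp(-λσ). The user only states a tail probability P(σ > U) = α, which fixes λ = -ln(α)/U. The sketch covers this variance component only. The PC prior for the correlation in the bivariate model is more involved and is not shown. The function names are hypothetical.

```python
import math


def pc_prior_sd_rate(upper, alpha):
    """Rate lambda of the PC prior for a standard deviation sigma.

    The prior is exponential, pi(sigma) = lambda * exp(-lambda * sigma).
    Stating P(sigma > upper) = alpha gives lambda = -log(alpha) / upper.
    """
    if upper <= 0.0 or not 0.0 < alpha < 1.0:
        raise ValueError("need upper > 0 and 0 < alpha < 1")
    return -math.log(alpha) / upper


def pc_prior_sd_density(sigma, upper, alpha):
    """Density of the PC prior at sigma (zero for negative sigma)."""
    rate = pc_prior_sd_rate(upper, alpha)
    return rate * math.exp(-rate * sigma) if sigma >= 0.0 else 0.0
```

For example, the statement "the between-study standard deviation exceeds 1 with probability 0.01" gives λ = ln(100) ≈ 4.6. Expert knowledge enters through U and α, which are on an interpretable scale.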
Friday, February 26, 2016, 14:15-15:00, Room 1329 SBII
Emre Yaksi (Kavli Institute for Systems Neuroscience, Center for Neural Computation, NTNU): Sensory processing in zebrafish brain
Abstract: Our laboratory is a mixture of enthusiastic life scientists, physicists and engineers, whose goal is to understand the fundamental principles underlying the function of brain circuits in health and disease. To achieve this aim, we use genetically tractable small model organisms: zebrafish and fruit fly. We monitor, dissect and perturb these tiny brains through a combination of functional imaging, optogenetics, electrophysiological recordings, molecular genetics and quantitative behavioral assays. Our primary goal is to understand how the chemosensory world (smell and taste) is represented in the brain and how these computations regulate different behaviors (e.g. fear, arousal, feeding). Moreover, we are interested in understanding how these representations are modulated by behavioral states of animals (e.g. stress and hunger) or by other senses (e.g. vision). We achieve this by focusing on those brain areas that integrate information from multiple sensory modalities and closely relate to behavior. The small and accessible brain of zebrafish provides an exceptional framework for studying neural circuit computations both locally and across multiple brain regions simultaneously. In my seminar, I will discuss how internal states of brain networks can generate ongoing spontaneous neural activity and how this ongoing brain activity can influence the representation of sensory information in the brain. Our findings suggest that a small, evolutionarily conserved brain region, the habenula, sits in the middle of this complex network, acting very much like a hub. We showed that the habenula operates like a switchboard and can use ongoing brain activity to gate and relay information from multiple brain regions to downstream brainstem nuclei that regulate animal behavior.
Fall 2015
Friday, December 11, 2015, 14:15-15:00, Room 1329 SBII
Henning Omre (IMF, NTNU): A Causerie on Probability & Statistics in Space and Time
CANCELLED: Friday, December 11, 2015, 14:15-15:00, Room 1329 SBII
Juan M. Restrepo (Department of Mathematics, Oregon State University): Defining a Trend of a Multi-Scale Time Series
Abstract: Defining a trend for a time series is a fundamental task in time series analysis. It has very practical outcomes: determining a trend in a financial signal, the average behavior of a dynamic process, or exceptional and likely behavior as evidenced by a time series. For signals and time series that have an underlying stationary statistical distribution, there are a variety of ways to estimate a trend, many of which come equipped with a very concrete notion of optimality. Signals that are not statistically stationary are commonly encountered in nature, business and the social sciences. For these, the challenge of defining a trend is two-fold: computing it, and figuring out what the trend means. Adaptive filtering is frequently explored as a means of calculating or proposing a trend. The Empirical Mode Decomposition and the Intrinsic Time Decomposition (ITD) are such schemes. I will describe a practical notion of trend based on the ITD, which we call a "tendency". We will briefly describe how to compute the tendency and explain its meaning.
Friday, November 27, 2015, 14:15-approx. 16:00, Room 1329 SBII
This seminar is part of the General Meeting of the Trondheim Chapter of the Norwegian Statistical Association.
Eva Skovlund (Department of Public Health and General Practice, NTNU): Health registries and cohorts in medical research
Abstract: Epidemiological research deals with describing and understanding patterns of disease occurrence across populations or groups, and with identifying causes of disease or disease outcomes. A main topic is thus to study the association between different types of exposure and disease. The quality of available data is of course of paramount importance. In general, a large part of epidemiological research is based on observational data from cohorts, which may have limitations due to selection bias and attrition. The Nordic countries are famous for their population-wide health registries. The talk will present some of these registries as well as some large cohorts, and examples of successful research will be given. Some of the challenges inherent in drawing conclusions from analyses of observational data will also be discussed.
Friday, November 20, 2015, 14:15-15:00, Room 1329 SBII
Mette Langaas, Håkon Tjelmeland, Thea Bjørnland, Margrethe K. Loe and Torstein M. Fjeldstad: TMA4245 Statistics and KTDiM: new plans for spring 2016 (V2016)
Abstract: "Quality, accessibility and differentiation in basic mathematics education (KTDiM)" is IMF's three-year (2014-2016) project in innovative education at NTNU. The courses Mathematics 1, Mathematics 2 and TMA4240/TMA4245 Statistics form the core of the project. Activities are organized around lectures (with special focus on interactivity), web-based learning resources (topic pages and videos), exercises (web-based Maple TA exercises, plus recommended and written exercises) and guidance in the maths/statistics lab. Accompanying pedagogical research and communication with the students (evaluations) have a central place in the project. For spring 2016 (V2016), we will implement important KTDiM elements in TMA4245 Statistics, the third large course in basic mathematics education. In this presentation we will describe the concrete new activities planned for V2016, with particular emphasis on the goals of these activities. We will give demonstrations (including videos and Maple TA). In addition to informing about the restructuring, one of the goals of our presentation is to involve more of the staff and PhD students in the statistics group. We would also like to discuss whether some of the KTDiM elements could be relevant to implement in other statistics courses.
Friday, November 13, 2015, 14:15-15:00, Room 1329 SBII
Bob O'Hara (BiK-F, Biodiversity and Climate Change Research Centre, Frankfurt am Main, Germany): Let's not be discrete with our SDMs
Abstract: A major problem for ecologists is inferring the distribution of species from records of where they have been observed. The most popular methods for doing this take little account of the way the data were collected, and they also reduce the world to discrete grid cells. I will describe an alternative approach that treats the species' distribution as a continuous intensity, thus side-stepping the problem of spatial scale. Different observation models can then be used together for the different types of data. Conveniently, the model can be fitted with INLA, using ideas developed in that context.
Friday, October 30, 2015, 14:15-15:00, Room 1329 SBII
Harald Martens (Professor II, Department of Engineering Cybernetics, NTNU): Soft mathematical modelling of reality: A cure for maths anxiety, Macho mathematics and Gucci statistics?
Abstract:
The future abundance of data must be interpreted. This requires mathematical modelling and statistical assessment, a thousand times more than today. This talk is about how we work at the Department of Engineering Cybernetics and at IDLETechs AS to develop methodology for interpreting Big Data through the eyes of the natural sciences, and about how we combine different data-modelling cultures. Multivariate Soft Modelling, as developed within chemometrics, is an almost assumption-free generalized Taylor expansion of unknown regularities in large tables. It is well suited for do-it-yourself data analysis by users in practice (chemists, physicists, biologists, psychologists). It is used extensively, for example, for interpreting multidisciplinary data tables (1, 2) and for simple solution of inverse problems, such as calibration of multichannel measurement instruments (3). Multivariate Metamodelling (4) is the use of such soft modelling for several purposes: to get an overview of how overwhelmingly complicated mathematical models behave in practice (extended sensitivity analysis), to compare different modelling alternatives (5), to speed up computations (interpretable surrogate modelling) and to simplify the fitting of mathematically "sloppy" models to observational data (6). These techniques build bridges between the mathematical sciences and other disciplines, in time, in space and in the space of properties. In the talk I will discuss how we together can increase the sensible use of mathematics and statistics in research, development and society at large. Among other things, I will discuss the societal role of the mathematical sciences, question the academic pecking order in which "theory is finer than practice", and reflect on the relationship between personality type and mathematical culture. I will also describe some new courses and tools for curiosity-driven data modelling. These are meant as a counterweight both to today's widespread maths anxiety and to the amateurs' MACHO MATHEMATICS ("My Model, right or wrong!") and GUCCI STATISTICS ("looking good with senseless p-values").
I would like NTNU to develop a low-threshold "Discover the World" course based on multi-domain soft modelling, with a minimal methodological syllabus. It would have few or no formulas, only some simple algorithms with vector multiplication, cross-validation and graphics, and extensive use of mobile-phone microphones and cameras as data sources. It should give beginning students self-confidence and motivate them to take the subsequent mathematics and statistics courses seriously.
Background: There are not enough professional data analysts to handle all the data sets that now need analysis. Data are not the same as information for people. The more raw data one has, the less information. But who will interpret data in practice? With the explosive future increase in technical measurement data ("Quantitative Big Data"), there is a need for a drastic increase in the use of mathematics and statistics, carried out by those who own the problems and have acquired the data. Who takes responsibility for this? Mathematical modelling is necessary for simplifying measurement data. It is important to summarize the important patterns in data tables and make the content cognitively accessible to the owners of the data. This minimizes Type II errors: the risk of overlooking or misinterpreting what should have been discovered. But in mathematical data modelling, what the models DO matters, not only how the models LOOK mathematically. In practical mathematical modelling, mathematics must build two-way communication between what the world is trying to tell us and what we tried to say about the world. Statistical assessment is necessary to guard against wishful thinking. It concerns the planning of data collection: Which variables, and how many? Which objects, and how many? It also concerns the assessment of the data tables one then obtains: how valid are the resulting mathematical data models and the conclusions we draw from them? That is, it also minimizes the risk of Type I errors. But in statistical data modelling, it is important not to get lost in the statistical distribution of the residuals. It is not the errors that are important. What matters is the parameter values themselves, their mutual patterns and their relation to our prior understanding. People think differently, and different disciplinary cultures attract different types of people. The mathematics culture has a privileged position, many secure jobs and a large societal responsibility. A great effort has been made at NTNU to improve the way mathematics is taught to the many non-mathematicians, and that is highly commendable.
But one might ask: How can we make the particularly creative and the particularly empathetic types of students better able to use mathematics and statistics?
References:
- (1) Martens H & Martens M (2001) Multivariate Analysis of Quality: An Introduction. J. Wiley & Sons, Chichester, UK. http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471974285.html
- (2) The Unscrambler, Version 10. http://www.camo.com/
- (3) Martens H & Næs T (1989) Multivariate Calibration. J. Wiley & Sons, Chichester, UK. http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471930474.html
- (4) Tøndel K & Martens H (2014) Analyzing complex mathematical model behavior by partial least squares regression-based multivariate metamodeling. WIREs Comp Stat, 6: 440–475. http://onlinelibrary.wiley.com/doi/10.1002/wics.1325/abstract
- (5) Isaeva J, Martens M, Sæbø S, Wyller JA & Martens H (2012) The modelome of line curvature: Many nonlinear models approximated by a single bi-linear metamodel with verbal profiling. Physica D: Nonlinear Phenomena, 241: 877–889.
- (6) Tafintseva V, Tøndel K, Ponosov A & Martens H (2014) Global structure of sloppiness in a nonlinear model. J. Chemometrics, 28(8): 645–655.
- (7) Martens H (2015) Quantitative Big Data: where Chemometrics can contribute. Perspective paper, J. Chemometrics, in press.
Friday, October 16, 2015, 14:15-15:00, Room 1329 SBII
Elias Krainski (PhD-student, IMF, NTNU): A non-separable space-time model
Abstract: A non-separable model derived from an iterated heat equation will be presented. This model has three parameters with a physical interpretation. However, to use it as a statistical model, it is useful to have statistical parameters, such as the marginal variance and correlation ranges. This is important for understanding the model and for assigning priors. Some results on a mapping from the statistical parameters to the physical ones are provided.
Friday, October 2, 2015, 14:15-15:00, Room 1329 SBII
Bo Lindqvist (IMF, NTNU): On the proper treatment of improper distributions
Abstract: The axiomatic foundation of probability theory presented by Kolmogorov has been the basis of modern theory for probability and statistics. In certain applications it is however necessary or convenient to allow improper (unbounded) distributions, which is often done without a theoretical foundation. The talk presents the elements of a mathematical theory which includes improper distributions. An obvious motivation for the study is the extensive use of improper priors in Bayesian statistics, where the present theory gives an alternative to common ad hoc arguments which are not based on an underlying theory. In particular, the approach leads to simple explanations of apparent paradoxes known from the literature. Examples involving Gibbs sampling and intrinsic prior models will be presented.
This is joint work with Gunnar Taraldsen.
Wednesday, September 23, 2015, 10:15-11:00, Room 1329 SBII
Jukka Corander (University of Helsinki): Voodoo or real inference - ABC meets machine learning
Abstract: Some statistical models are specified via a data-generating process for which the likelihood function cannot be computed in closed form. Standard likelihood-based inference is then not feasible. The model parameters can still be inferred by finding the values that yield simulated data resembling the observed data. This approach faces at least two major difficulties. The first is the choice of the discrepancy measure used to judge whether the simulated data resemble the observed data. The second is the computationally efficient identification of regions in the parameter space where the discrepancy is low. An introduction is given to recent work that tackles these two difficulties through classification and Bayesian optimization.
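The approach in the talk (classification-based discrepancies and Bayesian optimization) is not reproduced here. For orientation, the sketch below shows the basic rejection form of approximate Bayesian computation that such methods improve upon. It draws parameters from the prior, simulates data, and keeps the draws whose simulated summary lies within a tolerance of the observed one. The single numeric summary statistic, the absolute-difference discrepancy and the function name are assumptions.

```python
import random


def abc_rejection(observed, simulate, sample_prior, summary, tolerance,
                  n_proposals, seed=None):
    """Basic rejection ABC: keep prior draws whose simulated summary is close.

    simulate(theta, rng) returns a simulated data set, summary(data) returns
    a number, and the discrepancy is the absolute difference of summaries.
    """
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_proposals):
        theta = sample_prior(rng)
        s_sim = summary(simulate(theta, rng))
        if abs(s_sim - s_obs) <= tolerance:  # discrepancy below tolerance: accept
            accepted.append(theta)
    return accepted
```

Suppose, for example, that the data are normal with known standard deviation, the prior on the mean is uniform on [-5, 5] and the summary is the sample mean. The accepted values then approximate the posterior of the mean. Most proposals are wasted because they land far from the data. This is the second difficulty named above, and it is what guided search in parameter space aims to reduce.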
Spring 2015
Thursday April 23, 2015, 14:15 - 15:00, Room 1326 SBII
Anne Barros (Department of Production and Quality Engineering, NTNU): Multivariate Degradation Model with Dependence Due to Shocks
Abstract: The aim of the presented work is to model degradation phenomena in a multi-unit context, taking into account stochastic dependence between the units. We propose a model structure that is at the same time complex enough to represent the phenomena and simple enough for analytical calculations. More precisely, we consider a multi-unit system subjected to a random stressing environment that arrives as shocks. The model takes into account two types of dependence between the components: first, a shock impacts all components at the same time; second, for a given shock, the deterioration increments of the different components are correlated. The intrinsic deterioration of the n units is modeled through independent stochastic processes. Given the usual nature of degradation phenomena, it seems reasonable to assume that each intrinsic deterioration is a monotone process with continuous state space. As is classical, the shocks are assumed to arrive independently, according to a Poisson process. Parameter estimation (by the method of moments) and reliability assessment are presented for any multi-unit system with coherent structure. Finally, a numerical example with a 3-unit system is presented.
This is joint work with Sophie Mercier (University of Pau - France) and Antoine Grall (University of Technology of Troyes - France).
Thursday April 9, 2015, 14:15 - 15:00, Room 1326
Egil Ferkingstad (Dept. of Math. Sci., NTNU): Monte Carlo null models for genomic data
Abstract: As increasingly complex hypothesis-testing scenarios are considered in many scientific fields, analytic derivation of null distributions is often out of reach. To the rescue comes Monte Carlo testing, which may appear deceptively simple: as long as you can sample test statistics under the null hypothesis, the p-value is just the proportion of sampled test statistics that exceed the observed test statistic. Sampling test statistics is often simple once you have a Monte Carlo null model for your data, and defining some form of randomization procedure is also, in many cases, relatively straightforward. However, there may be several possible choices of a randomization null model for the data and no clear-cut criteria for choosing among them. Obviously, different null models may lead to very different p-values, and a very low p-value may thus occur due to the inadequacy of the chosen null model. It is preferable to use assumptions about the underlying random data generation process to guide selection of a null model. In many cases, we may order the null models by increasing preservation of the data characteristics, and we argue in this paper that this ordering in most cases gives increasing p-values, that is, lower significance. We denote this as the null complexity principle. The principle gives a better understanding of the different null models and may guide in the choice between the different models.
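The Monte Carlo p-value described above can be sketched as follows. This assumes a hypothetical two-sample problem with the difference in means as test statistic and random relabelling of the observations (a permutation null model) as the randomization procedure; the data are made up for illustration:

```python
import random

def mean_diff(x, y):
    # Test statistic: difference in sample means
    return sum(x) / len(x) - sum(y) / len(y)

def monte_carlo_pvalue(x, y, n_sim=5000, seed=0):
    """One-sided Monte Carlo p-value under a random-relabelling null model."""
    rng = random.Random(seed)
    observed = mean_diff(x, y)
    pooled = list(x) + list(y)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)                      # null model: group labels are exchangeable
        stat = mean_diff(pooled[:len(x)], pooled[len(x):])
        if stat >= observed:                     # ties counted as exceeding
            exceed += 1
    return exceed / n_sim                        # proportion of null statistics >= observed

x = [5.1, 6.3, 5.8, 7.0, 6.6]                    # hypothetical group 1
y = [4.2, 4.9, 5.0, 4.4, 5.3]                    # hypothetical group 2
p = monte_carlo_pvalue(x, y)
print(p)
```

A common variant returns (1 + exceed) / (1 + n_sim) so that the estimate is never exactly zero. Replacing the shuffle by a randomization that preserves more characteristics of the data gives a different null model, and by the null complexity principle one would typically expect a larger p-value.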
Thursday March 26, 2015, 14:15 - 15:00, Room 1326
Professor Marian Scott (School of Mathematics and Statistics, University of Glasgow): The environmental 'data deluge' and the statistical challenges it presents.
Abstract: This presentation will compare and contrast past environmental monitoring modelling with what the future holds with high frequency, low energy sensor networks and how our statistical modelling needs to respond. The presentation will also touch on how we communicate our modelling results before concluding with some ideas about where future challenges might lie.
Thursday March 11, 2015, 14:15 - 15:00, Room 1326
Jarle Tufto (Dept. of Math. Sci., NTNU): Darwinian evolution, plasticity and bet-hedging as adaptive responses to temporally autocorrelated fluctuating selection: A quantitative genetic, joint evolutionary model
Abstract: Darwinian genetic evolution, the evolution of plasticity and the evolution of diversification bet-hedging strategies are considered jointly within a unified quantitative genetic framework. Analytic approximations expressing the mutual dependencies between these evolutionary responses are derived, given temporally autocorrelated macroenvironmental fluctuations and additional individually varying microenvironmental deviations determining the expressed and optimal phenotype at the time of development and selection, respectively. Both plasticity and genetic evolution in mean reaction norm elevation are favoured by slow temporal fluctuations in the environment, reducing the between-generation variance of the mismatch between the mean expressed and mean optimal phenotype. Selection for increased phenotypic variance (bet-hedging) only occurs if fluctuations in the environment are fast, such that the mismatch variance remains above a critical threshold. Otherwise, the phenotype is canalized and microenvironmental variability leads to selection for either less or more plasticity, depending on the correlation between individual microenvironmental deviations at the time of development and selection. Without microenvironmental variability, the conditions favouring plasticity and genetic evolution become coupled, with plasticity being the dominant evolutionary outcome for reasonable parameter values. The adaptive significance of evolutionary fluctuations in plasticity and the phenotypic variance, and the validity of the analytic approximations, are investigated using simulations. Finally, a new method for estimating patterns of fluctuating selection using INLA is applied to data on breeding date in a population of great tits (Parus major).
Thursday February 26, 2015, 14:15 - 15:00, Room 1326
Haakon Bakka (Dept. of Math. Sci., NTNU): Animal habitats in spatial models - what to do when not all land is created equal?
Abstract: First I will introduce latent Gaussian spatial models, give a quick outline of INLA and the SPDE approach. Then I will talk about the research I have been doing for my PhD. Classical models in spatial statistics assume that the correlation between two points depends only on the distance between them. In practice, however, the shortest distance may not be an appropriate measure of the separation of two points. Real life is not stationary! For example, when modelling fish near the shore, correlation should not take the shortest path going across land, but should travel along the shoreline.
Similar problems occur in ecology, where animal movement depends on the terrain or the existence of animal corridors. I will show how this kind of information can be included in a spatial model. For analysing a point process we can use a latent Gaussian log-Cox point process model, set up in a Bayesian hierarchical framework, and computed using INLA. We have a linear predictor for the log-intensity of the process, which includes a spatial field. How do we model a process with different clustering in different parts of the domain? How do we adapt the point process to different cluster sizes in different areas, without changing the average number of points? In this talk, we will answer these questions.
Thursday February 12, 2015, 14:15 - 15:00, Room 1326
Yingjun Deng (University of Technology of Troyes, France): Nonparametric Estimation of Failure Level via Inverse First Passage Problems
Abstract: In this presentation, the degradation process is described by a time-dependent Ornstein-Uhlenbeck (OU) process. The system failure time is described as the first passage time to an unknown failure level, which is then estimated by solving an inverse first passage time (IFPT) problem. Unlike ordinary estimation based on physical barriers or expert judgement, failure-level estimation via the IFPT problem aims to reconcile observed failure records with prognoses based on first passage times. A numerical algorithm based on Fortet's equation is proposed to solve the IFPT problem for the OU process. Several simulation tests are carried out to verify the proposed algorithm.
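The forward problem, simulating first passage times of an OU degradation path to a given failure level, can be sketched as below. This is not the IFPT algorithm or Fortet's equation from the talk; it assumes a time-homogeneous OU process dX = θ(μ - X)dt + σ dW with hypothetical parameter values and uses the exact Gaussian transition over each time step:

```python
import math
import random

def ou_first_passage(x0, level, theta, mu, sigma, dt=0.01, t_max=50.0, rng=None):
    """Simulate one OU path from x0 and return the first grid time it reaches `level`.

    Uses the exact transition X(t+dt) | X(t) ~ N(mu + (X(t) - mu) e^{-theta dt},
    sigma^2 (1 - e^{-2 theta dt}) / (2 theta)). The level is only checked on the
    time grid, so crossings between grid points are missed (a slight upward bias).
    Returns None if the level is not reached before t_max (a censored path).
    """
    rng = rng or random.Random()
    a = math.exp(-theta * dt)
    sd = sigma * math.sqrt((1.0 - a * a) / (2.0 * theta))
    x, t = x0, 0.0
    while t < t_max:
        x = mu + (x - mu) * a + sd * rng.gauss(0.0, 1.0)
        t += dt
        if x >= level:
            return t
    return None

rng = random.Random(7)
# Hypothetical degradation: starts at 0, reverts towards 2, fails at level 1.5
times = [ou_first_passage(0.0, 1.5, theta=0.5, mu=2.0, sigma=0.5, rng=rng) for _ in range(500)]
observed = [t for t in times if t is not None]
print(len(observed), sum(observed) / len(observed))
```

The IFPT problem runs in the opposite direction: given an observed distribution of failure times, find the level (or boundary) for which simulated or computed first passage times match it.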
Fall 2011
? December 2011
Professor Jostein Lillestøl (Department of Business and Management Science, Norwegian School of Economics, Bergen): The Z-Poisson distribution with application to the modelling of soccer score probabilities
Friday December 2, 2011
Paul Fearnhead (Dept. of Mathematics and Statistics, Lancaster University, UK): Changepoint Method
Thursday November 10, 2011
Erlend Aune: Matrix functions for Gaussian distributions: A computational approach
Thursday November 3, 2011
Nikolai Ushakov: Two talks: (1) Convergence of moment generating functions and corresponding distributions; (2) Some moment inequalities
Thursday October 27, 2011
Jørn Vatn: Probabilistic modelling of the effect of ultrasonic inspection of rails and the need for data collection and analysis
Thursday October 20, 2011
Stian Lydersen: Recommended confidence intervals for binomial proportions
Thursday September 29, 2011
Arvid Næss: Reliability of structural systems by enhanced Monte Carlo methods
Spring 2011
Tuesday June 7, 2011
Ben Shaby (Duke University): A more practical max-stable process for spatial extremes
Wednesday June 1, 2011
James Gunning (CSIRO, Australia): Some newer algorithms in joint categorical and continuous inversion problems around seismic data
Tuesday May 3, 2011
Bo Lindqvist: Competing risks in a health survey
Tuesday April 12, 2011
Ingelin Steinsland: Spatial predictive distribution for precipitation based on numerical weather predictions
Wednesday April 6, 2011
Yasser Roudi (Kavli Centre): Inferring network structure from the state of a non-equilibrium system and applications to multivariate neural data analysis
Tuesday March 29, 2011
Finn Lindgren: Solving large spatial problems with consistent Gaussian Markov random fields
Tuesday March 22, 2011
Geir Evensen (Statoil): Using Ensemble methods for history matching reservoir models
Tuesday March 1, 2011
John Tyssedal: Two-level designs from a screening perspective
Tuesday February 15, 2011
Michela Cameletti (Univ. of Bergamo): Spatio-temporal model for air quality data
Fall 2006
Mon 25/09 2006, 13:15-14:00, Lunch room, 13th floor
Bo Lindqvist: Competing risk
Spring 2006
Tue 28/03 2006, 12:15 - 13:00, Room 738
Marit Ulvmoen (Department of mathematical sciences, NTNU, Trondheim): Seismic lithology-fluid prediction based on a hidden Markov random field
Abstract: Knowledge of the lithology (rock types) and fluid filling (water, oil or gas) in the reservoir is crucial in the evaluation of petroleum prospects. In the North Sea these reservoirs are offshore at a depth of about 3000 meters and hence not easily accessible. The lithology-fluid characteristics must be predicted based on general knowledge about geological sedimentation and fluid behaviour, and on reservoir-specific indirect observations. These observations are usually made through seismic surveys from ships and measurements in a small number of wells. The observations do not uniquely determine the lithology-fluid characteristics, hence their prediction can be considered an ill-posed inverse problem.
The inverse problem is cast in a Bayesian framework with a prior model representing the general knowledge about lithology-fluid properties, while the likelihood model represents the observation acquisition procedure. In the current study the lithology-fluid variables are represented by four classes: shale, brine (water) filled sand, oil filled sand, and gas filled sand.
The prior model captures information about the vertical sequence of sedimentation of sand-shale and gravity segregation of brine-oil-gas. Moreover, the lithology-fluid characteristics are known to be fairly continuous horizontally. In order to represent this fairly precise prior knowledge, the prior Markov random field is formulated in a particular way, which is coined a profile Markov random field.
The likelihood model captures information about the observation acquisition procedure. Well observations are assumed to be exact observations of lithology-fluid properties along vertical wells. The seismic data, however, consist of reflections of sound pulses generated at the earth's surface. These reflections are caused by changes in the lithology-fluid properties and are modeled by wave propagation in solid matter, which entails angle dependence and convolution in the seismic data. Moreover, the observation errors are expected to have considerable spatial dependence. Consequently, the likelihood model is non-linear with strong spatial coupling.
The posterior model is fully defined by the prior and likelihood models, but the normalizing constant is not analytically tractable. Brute-force McMC sampling is hardly feasible for 3D problems of this size. We have defined an approximate posterior model which can be assessed by simulation using a mixture of a recursive and an iterative algorithm. The algorithm appears to have favorable convergence characteristics.
The approach will be evaluated on a 2D synthetic reservoir case which is inspired by a real North Sea reservoir.
The presentation is a continuation of the work presented in this article.
Tue 14/03 2006, 12:15 - 13:00, Room 738
Daniel Berg presents papers on copulas:
Aas K.: (2004). Modelling the dependence structure of financial assets: A survey of four copulas. Note, Norwegian Computing Centre, December 2004.
and
Embrechts P., Lindskog F., McNeil A.: (2003). Modelling Dependence with Copulas and Applications to Risk Management. In: Handbook of heavy tailed distributions in finance.
Tue 07/03 2006, 12:15 - 13:00, Room 738
Håkon Tjelmeland presents the paper: Bayesian methods in the 21st century
Scott (2002) "Bayesian Methods for Hidden Markov Models: Recursive Computing for the 21st Century"
Tue 28/02 2006, 12:15 - 13:00, Room 738
Ingelin Steinsland presents the paper: Model selection
Paper: Kadane, J. B., Lazar, N. A.: (2004). Methods and Criteria for Model Selection, JASA, 99, 465, 279-290.
Fall 2005
Mon 21/11 2005, 14:15-15:00, Canteen on the 13th floor (room 1329), Sentralbygg II
Research scientist Gunnar Taraldsen (Acoustic Research Center, SINTEF and NTNU): Comments on the ISO Guide to the Uncertainty of Measurements
Abstract: The International Organization for Standardization (ISO) requires that the 1993 edition of the Guide to the expression of uncertainty in measurement (GUM) be referenced when writing standards concerning the expression of uncertainty in measurement. The purpose of such guidance is:
- to promote full information on how uncertainty statements are arrived at;
- to provide a basis for the international comparison of measurement results.
In 2004 the ISO WG43 (Acoustics) published a policy paper on the treatment of measurement uncertainty in standards on acoustics: "Use the GUM as follows …" This indicates that the ISO WGs have so far found it difficult to apply the GUM.
The purpose of this lecture is to comment on some aspects of the GUM.
- It will be argued that the GUM can and should be interpreted in terms of classical statistics. This will be done by a presentation of the procedure by example.
- Some parts of the GUM can seem to indicate that a Bayesian approach is intended. This will be discussed.
- The GUM gives a definition of the term 'true value'. A consequence is that parts of the GUM have philosophical content.
I would very much like to use some of the time for discussion and alternative viewpoints. It may be beneficial to consult the documents listed below (you can download your own private copy from Bibsys-British standard, http://www.bsonline.bsi-global.com).
References
- ISO, Guide to the expression of uncertainty in measurement (GUM)
- ISO 3534, 1985, Statistics - Vocabulary and symbols
- ISO 5725, 1994, Accuracy (trueness and precision) of measurement methods and results
- INCE (2005). Managing uncertainties in noise measurements and predictions: a new challenge for acousticians. Uncertainty Noise Symposium, LeMans, INCE.
Mon 14/11 2005, 14.15-15.00, Canteen on the 13th floor, Sentralbygg II
Sara Martino: Approximate inference for Hierarchical Gaussian Markov Random Fields
Abstract: Many common models in statistics involve a hidden (unobserved) Gaussian Markov Random Field (GMRF). In these settings MCMC, which is the common answer to inferential problems, is particularly slow to converge. We discuss an approximate, deterministic alternative to MCMC-based inference for computing posterior marginals in hierarchical GMRF models. The benefit of this approach is computational speed: it can be computed almost instantly compared to its MCMC competitors, and it turns out that for typical applications the accuracy is also very high.
Mon 24/10 2005, 14.15-15.00, S1
Hans C. van Houwelingen (Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, The Netherlands): Predicting survival with gene-expression data: a case study on the Dutch breast cancer data set
Abstract: One of the applications of microarray gene-expression data is the prediction of the clinical outcome of patients who are treated for a disease. In cancer patients the outcome of interest is usually the survival of the patient. Obtaining reliable predictive models for survival data is complicated by the censoring of such data.
In the process of model building we can distinguish three phases: 1) global testing for the relation between the gene-expression data and the survival outcome, 2) making a predictive model using all genes and 3) improving the model by using external (pathway) information.
In the talk I will sketch the statistical methodology for each step as developed in my department (global test for survival data, cross-validated penalized Cox regression and combination of significant pathways) and exemplify that on the well-known Dutch breast cancer data.
References
- Goeman, JJ; Oosting, J; Cleton-Jansen, AM; Anninga, JK; van Houwelingen, HC. 2005. Testing association of a pathway with survival using gene expression data. BIOINFORMATICS 21 (9): 1950-1957.
- Hans C. van Houwelingen, Tako Bruinsma, Augustinus A. M. Hart, Laura J. van't Veer, Lodewyk F. A. Wessels, Cross-validated Cox regression on microarray gene expression data, Statistics in Medicine, in press, Early View on Wiley InterScience
- Hans van Houwelingen and Jelle Goeman, Mining pathways in micro-array gene expression data, Conference of the International Society for Clinical Biostatistics, Szeged, Hungary, 2005
- van't Veer, LJ; Dai, HY; van de Vijver, MJ; He, YDD; Hart, AAM; Mao, M; Peterse, HL; van der Kooy, K; Marton, MJ; Witteveen, AT; Schreiber, GJ; Kerkhoven, RM; Roberts, C; Linsley, PS; Bernards, R; Friend, SH. 2002. Gene expression profiling predicts clinical outcome of breast cancer. NATURE 415 (6871): 530-536.
Mon 03/10 2005, 14.15-15.00, Room 446 SB II
PhD fellow Trond Sagerup: Particle filters applied to reservoir models
Abstract: We consider an oil reservoir under production. Based on well measurements, we want to infer properties of the reservoir. We use a particle filter, the bootstrap filter, to build a stochastic model. The focus will be on the applications and possible improvements of the model.
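A bootstrap filter alternates three steps: propagate particles through the state equation, weight them by the observation likelihood, and resample. A minimal sketch, assuming a hypothetical linear Gaussian AR(1) state-space model rather than a reservoir simulator; for this toy model the exact filter is the Kalman filter, which makes the sketch easy to check:

```python
import math
import random

def bootstrap_filter(ys, n_particles=1000, phi=0.9, q=1.0, r=1.0, seed=0):
    """Bootstrap particle filter for the hypothetical AR(1) state-space model
        x_t = phi * x_{t-1} + N(0, q),    y_t = x_t + N(0, r).
    Returns the filtered means E[x_t | y_1, ..., y_t]."""
    rng = random.Random(seed)
    # Initial particles from the stationary distribution of the AR(1) state
    particles = [rng.gauss(0.0, math.sqrt(q / (1.0 - phi ** 2))) for _ in range(n_particles)]
    means = []
    for y in ys:
        # 1. Propagate each particle through the state equation (the bootstrap proposal)
        particles = [phi * x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
        # 2. Weight by the Gaussian observation likelihood, on the log scale to avoid underflow
        logw = [-0.5 * (y - x) ** 2 / r for x in particles]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # 3. Multinomial resampling according to the normalised weights
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means

# Simulate hypothetical data from the same model
rng = random.Random(42)
x, ys = 0.0, []
for _ in range(30):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    ys.append(x + rng.gauss(0.0, 1.0))
est = bootstrap_filter(ys, n_particles=2000)
```

For a non-linear reservoir model, only step 1 (running the forward simulator) and step 2 (the likelihood of the well measurements) would change.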
Mon 19/09 2005, 14.15 - 15.00, NB! Room 446 in SB II
Nicolas Chopin (Department of Mathematics, University of Bristol, UK): Dynamic detection of change points in long time series
Abstract: We consider the problem of detecting change points (structural changes) in long sequences of data, whether in a sequential fashion or not, and without assuming prior knowledge of the number of these change points. We reformulate this problem as Bayesian filtering and smoothing of a non-standard state space model. Towards this goal, we build a hybrid algorithm that relies on particle filtering and Markov chain Monte Carlo ideas. The approach is illustrated by a GARCH change point model.
Spring 2005
Tue 19/04 2005, 14:15 - approx. 15:00, Room 446 SB II
Harald Weedon-Fekjær (PhD fellow, Norwegian Cancer Society / Statistician, Cancer Registry of Norway): How fast do breast cancer tumours grow? (modelling the growth rate of breast cancer tumours)
Abstract: The growth rate of breast cancer tumours is central to designing optimal screening strategies, evaluating breast cancer screening and planning epidemiological studies. Nevertheless, we lack good clinical studies of the growth rate, since diagnosed breast cancer must be treated for ethical/medical reasons. The traditional solution has been to estimate growth from screening data using so-called Markov models. These models give interesting results, but unfortunately have a number of weaknesses when it comes to the practical interpretation of the results. We have therefore developed a new model for estimating the growth rate of breast cancer tumours based on screening data, which will be presented in the talk.
Tue 12/04 2005, 14:15 - approx. 15:00, Room 1329 SBII
Nial Friel (Department of Statistics, University of Glasgow): Estimation of hidden Markov random fields - a comparative study
Abstract: Markov random fields (MRFs) play an important role in spatial statistics, for example to model dependent categorical data, or as a prior model in Bayesian image analysis. In fact, early influential work on MCMC (Geman and Geman '84, Besag '83) focused on the latter problem, where a latent process (or image) is corrupted by a known noise model - the problem then being to recover the hidden 'true' image. The prior distribution was modelled using an MRF (an Ising model) with a fixed parameter value. However, for a complete Bayesian analysis it is of interest to also include the parameter of the MRF as an unknown (and not just the latent process). This requires calculation of the likelihood of the MRF, a computationally demanding task. Recent methods have been proposed which allow the likelihood to be calculated for relatively small lattices. In this talk I aim to illustrate different ways in which these methods may be extended to large, practically sized lattices. I will compare their performance on simulated data, as well as on examples involving gene expression levels from time-course microarray experiments.
This work is joint with Tony Pettitt and Rob Reeves (QUT, Brisbane) and Ernst Wit (University of Glasgow).
Tue 15/02 2005, 14.15 - approx. 15.00, Room 1329 SB II
Nikolai Ushakov: Mean squared error of kernel estimators for finite values of the sample size
Abstract: The performance of kernel estimators of probability density functions and their derivatives is usually studied via Taylor expansions and asymptotic approximation arguments, in which the smoothing parameter tends to zero with increasing sample size. In contrast, in this talk we focus directly on the finite-sample situation. Accordingly, instead of asymptotic expressions for the mean squared error of an estimator, we derive upper bounds which hold for any sample size.
Special attention is paid to the sinc estimator (Fourier integral estimator, FIE), which has excellent asymptotic properties but is widely believed to be inferior to conventional estimators if the sample size is not very large. Studying the performance of this estimator for moderate sample sizes, we show that this is not true.
(Joint work with I.K. Glad and N.L. Hjort.)
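For reference, the two kinds of estimator compared above can be written down directly. A minimal sketch of a conventional Gaussian-kernel estimator and the sinc (Fourier integral) estimator, whose kernel is K(u) = sin(u)/(πu); the bandwidths and the simulated standard normal sample are assumptions for illustration only, and no MSE bounds are computed:

```python
import math
import random

def gaussian_kde(x, data, h):
    """Conventional kernel density estimate at x with a Gaussian kernel and bandwidth h."""
    c = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

def sinc_estimate(x, data, h):
    """Sinc (Fourier integral) estimate at x: kernel K(u) = sin(u) / (pi * u), K(0) = 1/pi."""
    total = 0.0
    for xi in data:
        u = (x - xi) / h
        total += 1.0 / math.pi if u == 0.0 else math.sin(u) / (math.pi * u)
    return total / (len(data) * h)

rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # hypothetical N(0, 1) sample
print(gaussian_kde(0.0, data, h=0.3), sinc_estimate(0.0, data, h=0.5))
# the true density at 0 is 1/sqrt(2*pi), about 0.399
```

Unlike the Gaussian-kernel estimate, the sinc estimate can take negative values, because its kernel is not itself a density.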
Fall 2004
Tue 30/11 2004, 14:15 - approx. 15:00, Room 1329 SB II
PhD fellow Inge Christoffer Olsen: Estimating the natural mortality of coastal cod using the EM algorithm
Abstract: I will present an EM algorithm to estimate the mortality rate of cod in a natural habitat. I use a continuous model, in contrast to previous analyses of such data, which use discrete models. The approach is based on the Nelson-Aalen estimator.
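The Nelson-Aalen estimator on which the approach builds estimates the cumulative hazard as H(t) = sum over event times t_i ≤ t of d_i / n_i, where d_i is the number of deaths at t_i and n_i the number at risk just before t_i. A minimal sketch for right-censored data; the EM step and the cod model from the talk are not reproduced, and the data below are hypothetical:

```python
from collections import Counter

def nelson_aalen(times, events):
    """Return [(t, H(t))] at each distinct event time.

    times  : observed times (death or censoring)
    events : 1 if the time is an observed death, 0 if right-censored
    Subjects censored at an event time are counted as still at risk.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    cumhaz, result = 0.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for s in times if s >= t)   # n_i: still under observation at t
        cumhaz += deaths[t] / at_risk               # add d_i / n_i
        result.append((t, cumhaz))
    return result

print(nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Here the increments are 1/5, 1/4 and 1/2, so the cumulative hazard is approximately 0.2, 0.45 and 0.95 at times 1, 2 and 3; the censored time 4 contributes no jump.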
Tue 23/11 2004, 14:15 - approx. 15:00, Room 1329 SB II
Alexandra M. Lewin (Imperial College, London): Bayesian Modelling of Gene Expression Data
Tue 26/10 2004, 14:15 - approx. 15:00, Room 1329 SB II
Helge Langseth: Data Mining - my view
Abstract: Some time ago I gave an introduction to data mining for the students in the course IT2702: Artificial Intelligence. It was a talk free of statistical distributions and Greek letters, in which I introduced some of the challenges in multivariate statistics (disguised as "data mining"), with a particular focus on computational complexity problems. Bo has persuaded me to give the same show in this forum as well.
Tue 12/10 2004, 14:15 - approx. 15:00, Room 1329 SB II
Pra Murthy (University of Queensland, Australia): Guest lecture: Product Warranty – An Overview
Abstract: The seminar will start with a discussion of the varied roles and uses of warranty and introduce different types of warranty policies. There are several aspects to the study of warranty and these will be highlighted. The seminar will conclude with a brief discussion of modelling for warranty cost analysis, warranty servicing and warranty logistics.
D. N. P. (Pra) Murthy obtained B.E. and M.E. degrees from Jabalpur University and the Indian Institute of Science in India and M.S. and Ph.D. degrees from Harvard University. He is currently a Research Professor in the School of Engineering at the University of Queensland and a Senior Scientific Advisor to the Norwegian University of Science and Technology. He has held visiting appointments at several universities in the USA, Europe and Asia. His current research interests include various aspects of technology management (new product development, strategic management of technology), operations management (lot sizing, quality, reliability, maintenance), and post-sale support (warranties, service contracts). He has authored or coauthored 20 book chapters, 145 journal papers and over 130 conference papers. He is a coauthor of five books – Mathematical Modelling (Pergamon Press, London, 1990), Warranty Cost Analysis (Marcel Dekker, New York, 1994), Reliability: Modelling, Prediction and Optimization (Wiley, New York, 2000), Weibull Models (Wiley, 2003) and Warranty Management and Product Manufacturing (Springer Verlag, 2005). He is co-editor of two books – Product Warranty Handbook (Marcel Dekker, New York, 1996) and Case Studies in Reliability and Maintenance (Wiley, New York, 2003). He is a member of several professional societies and is on the editorial boards of seven international journals. He has run short courses for industry on various topics in technology management, operations management and post-sale support in Australia, Asia, Europe and the USA.
Tue 05/10 2004, 14:15 - 15:00, Room 1329 SB II
Håkon Tjelmeland: A Bayesian CART model, with application to breast cancer data
Tue 21/09 2004, 14:15 - approx. 15:00, Room 1329 SB II
Stian Lydersen (Faculty of Medicine): How to test for association in 2x2 tables?
Summary: The most common methods for testing association in 2x2 tables seem to be Pearson's chi-square test and Fisher's exact test. However, the former does not preserve the significance level, and the latter is conservative (has unnecessarily low power). There exist unconditional tests, such as Barnard's test, which are considerably more powerful than Fisher's exact test in small to moderate samples, while preserving the significance level. Unconditional tests are computationally demanding, but software is now available. Further, Fisher's mid-p test gives approximately the same results as an unconditional test, and the mid-p value is readily computed. Unconditional tests and the mid-p approach ought to be used much more in practice than is the case today.
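Fisher's exact test and the mid-p value both follow from the hypergeometric distribution of the top-left cell given the table margins. A minimal sketch of two-sided versions, using the common convention that tables no more probable than the observed one count as at least as extreme (other two-sided conventions exist). Barnard's unconditional test is not reproduced here; SciPy provides `scipy.stats.barnard_exact` for it, and `scipy.stats.fisher_exact` for the exact test:

```python
from math import comb

def hypergeom_probs(a, b, c, d):
    """P(top-left cell = k) for all tables with the margins of [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return {k: comb(row1, k) * comb(row2, col1 - k) / total for k in range(lo, hi + 1)}

def fisher_and_mid_p(a, b, c, d):
    """Two-sided Fisher exact p-value and mid-p value for a 2x2 table."""
    probs = hypergeom_probs(a, b, c, d)
    p_obs = probs[a]
    tol = 1e-7 * p_obs                           # guard against floating-point ties
    less = sum(p for p in probs.values() if p < p_obs - tol)
    equal = sum(p for p in probs.values() if abs(p - p_obs) <= tol)
    # Fisher: all tables as probable as or less probable than the observed one.
    # Mid-p: tables exactly as probable as the observed one get weight 1/2.
    return less + equal, less + 0.5 * equal

fisher_p, mid_p = fisher_and_mid_p(3, 1, 1, 3)
print(fisher_p, mid_p)
```

For the table [[3, 1], [1, 3]] the possible top-left counts 0 to 4 have probabilities 1, 16, 36, 16 and 1 out of 70, so the Fisher p-value is 34/70 while the mid-p value is 18/70, illustrating how much the mid-p approach reduces the conservativeness.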
Tue 07/09 2004, 14:15 - approx. 15:00, Room 1329 SB II
Simos Meintanis (National and Kapodistrian University of Athens): Bayesian procedures for change-point detection
Abstract: A class of Bayesian-like procedures for detecting a change in the distribution of a sequence of independent observations, based on empirical characteristic functions, is developed, and their limit properties are studied. The theoretical results are accompanied by a simulation study.
Tue 31/08 2004, 14:15 - approx. 15:00, Room 1329 SB II
Finn Lindgren (Lund University): Modelling and estimation of fluorescence spectra in tumor imaging
Abstract: An application of laser-induced fluorescence to tumour imaging is analysed using a hierarchical statistical model. The location and size of a tumour are determined by estimating the relative concentration of a marker substance from its fluorescence spectrum.
Spring 2004
Mon 02/02 2004, 14:15-15:00, Lunch room, 13th floor
Jofrid Vårdal: Diploma thesis: Meta-analysis in population dynamics
Fall 2003
Fri 28/11 2003, 13.15 - 14.00, Room 1236, 12th floor, Sentralbygg 2
Eirik Mo (Arvid Næss): Topic not yet known
Fri 21/11 2003, 13.15-14.00, Room 1236, 12th floor, Sentralbygg 2
Jofrid Frøland Vårdal: Topic not yet known
Fri 07/11 2003, 13:15-14:00, Room 1236, 12th floor, Sentralbygg 2
Renee X. de Menezes: An improved t-test to handle heteroscedasticity in microarray data analysis
Abstract: In small studies where the main interest lies in comparing two group means, gene-specific estimates of expression variance can be unstable and lead to wrong conclusions. We construct a test while modelling these variances hierarchically across all genes; the result is not only simpler than previously suggested approaches, but also optimal in its class. In a simulation study, the hierarchical test is shown to be more powerful than its traditional version and to generate fewer false positives. This approach can be extended to cases with more than two groups.
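One well-known way of modelling gene-specific variances hierarchically is the empirical Bayes moderated t-statistic (as in the limma approach), which shrinks each gene's pooled variance towards a common prior value. The sketch below uses that swapped-in technique and is not necessarily the test from the talk; it assumes the prior variance s0_sq and prior degrees of freedom d0 are given rather than estimated from all genes, and the data are hypothetical:

```python
import math
import statistics

def moderated_t(x, y, s0_sq, d0):
    """Two-sample t-statistic whose pooled variance is shrunk towards s0_sq.

    s0_sq : prior (across-gene) variance, assumed known here
    d0    : prior degrees of freedom; d0 = 0 gives the ordinary pooled t-statistic
    """
    nx, ny = len(x), len(y)
    d = nx + ny - 2
    pooled = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / d
    shrunk = (d0 * s0_sq + d * pooled) / (d0 + d)  # weighted average of prior and gene-wise variance
    se = math.sqrt(shrunk * (1.0 / nx + 1.0 / ny))
    return (statistics.fmean(x) - statistics.fmean(y)) / se

x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]          # hypothetical expression values for one gene
print(moderated_t(x, y, s0_sq=4.0, d0=0), moderated_t(x, y, s0_sq=4.0, d0=4))
```

With few replicates the gene-wise variance is unstable; borrowing strength through d0 and s0_sq stabilises the denominator, which is the motivation given in the abstract.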
Fri 31/10 2003, 13.15-14.00, Room 1236, 12th floor, Sentralbygg 2
Hugo Hammer: Some new estimators for binary Markov random fields
Abstract: Parameter estimation in binary Markov random fields is a challenging task. I will propose some new estimators for binary Markov random fields. The estimators are constructed so that they preserve both local and global properties of the observed image (the data). How well the proposed estimators work will be briefly analysed.
Fri 03/10 2003, 13.15-14.00, Room 1236, 12th floor, Sentralbygg 2
Inge Christoffer Olsen: How to estimate capture and mortality rates of fish using continuous capture-recapture experiment data
Spring 2003
Mon 07/04 2003, 13:15 - 14:00, Room 1236, Sentralbygg 2
Jo Eidsvik: A Directional Metropolis-Hastings Algorithm
Mon 31/03 2003, 13:15-15:00, Room 546
Håvard Rue: Seminar on "computational methods"
Håvard Rue will give a talk on "Computational Methods": Gaussian Markov Random Field (GMRF) models.
a) What is it?
b) Basic properties & definitions
This is the first part of a seminar series of 3 talks. It is perfectly fine to attend only this first talk. The next two talks are planned for Mondays 05.05 and 02.06 (also room 546, 13.15-15.00).
Mon 24/03 2003, 13:15 - 14:00, Room 1236, 12th floor, Sentralbygg 2
Håkon Tjelmeland: A little about intrinsic Gaussian Markov random fields
Mon 10/03 2003, 13:15 - 14:00, Room 1236, 12th floor, Sentralbygg 2
Ayele Goshu, Henning Omre & John Tyssedal: Sampling strategy in inverse problems - How exam problem 3b in Statistics 1 became a neat little research problem
Mon 24/02 2003, 13:15 - 14:00, Room 1236, Sentralbygg 2
Steinar Engen: Stochasticity in population models
Fall 2002
Thu 07/11 2002, 14:15 - 15:00, Room 1236 SBII
Gunnar Taraldsen: Sufficient Conditional Monte Carlo
Thu 24/10 2002, 14:15 - 15:00, Room 1236 SBII
John Tyssedal: Factor screening, Factor sparsity and Projectivity
Thu 03/10 2002, 14:00 - 15:15, Room 1236 SBII
Stian Lydersen: Tests for association in r x c tables with few observations
Thu 26/09 2002, 14:15 - 15:00, Room 1236 SBII
Ingelin Steinsland: Parallel exact sampling of Gaussian Markov random fields
Thu 19/09 2002, 14:15 - 15:00, Room 1236 SBII
Håvard Rue: Approximation of Gaussian Markov random fields
Spring 2002
Thu 02/05 2002, 12:15 - 13:00, Room 1236 SBII
Turid Follestad: Estimating parameters in non-linear state-space models by sampling from marginal posteriors
Thu 25/04 2002, 12:15 - 13:00, Room 1236 SBII
Magnar Lillegård: Population dynamics of Norwegian spring-spawning herring - the story of the great herring crash
Thu 18/04 2002, 12:15 - 13:00, Room 1236 SBII
Odd Kolbjørnsen: Inverse problems - mathematics or statistics
Thu 11/04 2002, 12:15 - 13:00, Room 1236 SBII
Jarle Tufto: Effects of escaped farmed salmon - a quantitative genetic, population dynamic model
Thu 04/04 2002, 12:15 - 13:00, Room 1236 SBII
Inge Olsen: Statistical analysis of animal movement and survival
Thu 07/03 2002, 12:15 - 13:00, Room 1236 SBII
Mette Langaas: Statistical analysis of data from functional genomics
Thu 21/02 2002, 12:15 - 13:00, Room 1236 SBII
Ayele Goshu: Stochastic models for groundwater flow
Thu 14/02 2002, 12:15 - 13:00, Room 1236 SBII
Steinar Engen: Conservation biology - an arena for stochastic modelling
Thu 07/02 2002, 12:15 - 13:00, Room 1236 SBII
Jo Eidsvik: Geology, well logs and hidden Markov chains
Thu 31/01 2002, 12:15 - 13:00, Room 1236 SBII
Henning Omre: Production Forecasts for petroleum reservoirs or Calibration of complex computer models
Spring 2001
Wed 31/01 2001, 12:15 - 13:00, Room 1236 SBII
Jo Eidsvik: Geology, wells and hidden Markov chains
Wed 24/01 2001, 12:15 - 13:00, Room 1236 SBII
Henning Omre: Current issues