MA8701 General Statistical Methods
Reading list with key concepts
Part 1: Regularized linear and generalized linear models (25%)
- Hastie, Tibshirani, Wainwright: "Statistical Learning with Sparsity: The Lasso and Generalizations". The newest version of the ebook can be downloaded from Trevor Hastie's page: https://trevorhastie.github.io/. Chapters 2.1-2.6, 2.9, 3.1-3.2, 3.7, 4.1-4.3, 4.5-4.6, 5.1, 5.4, 6.0, 6.2
- Dezeure, Bühlmann and Meinshausen (2015). "High-Dimensional Inference: Confidence Intervals, p-Values and R-Software hdi". Statistical Science, Vol. 30, No. 4, 533–558. DOI: 10.1214/15-STS527. Focus on the single/multi sample-splitting part.
- Intro to lasso – Chapters 2.1-2.6, 5.1, 5.4: Linear regression, Why sparsity?, Least absolute shrinkage and selection operator (lasso) and related approaches, Fitting the model / coordinate descent for lasso
- GLM with regularization – Chapters 2.9, 3.1-3.2, 3.7, 5.4: Generalized linear models (GLM), Logistic regression with an l1 penalty (example), Fitting the model
- Generalizations of lasso – Chapters 4.1-4.3, 4.5-4.6: Elastic net, Relaxed lasso, Grouped lasso, Fused lasso, Non-convex penalties
- Inference for lasso – Chapter 6.0, 6.2 and Dezeure et al., 2015: Bootstrap method, Multi sample-splitting
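As a concrete illustration of the coordinate-descent update for the lasso (Chapter 5.4), here is a minimal numpy sketch of cyclic coordinate descent with the soft-thresholding operator. This is a toy sketch, not the glmnet algorithm itself: the data, penalty level, and iteration count below are made up for illustration, and X is assumed to have centered, standardized columns.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for
    (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1.
    Assumes the columns of X are centered and standardized."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: remove the fit of all other coordinates
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r / n
            beta[j] = soft_threshold(z, lam) / (X[:, j] @ X[:, j] / n)
    return beta
```

Each coordinate update is a univariate lasso problem solved exactly by soft-thresholding, which is why the algorithm produces exact zeros rather than merely small coefficients.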
Part 2: Smoothing and splines (25%)
- Hastie, Tibshirani and Friedman (2009): The Elements of Statistical Learning, 2nd edition. Chapter 5: 5.1-5.6 and Chapter 6: 6.1-6.8. Book at https://web.stanford.edu/~hastie/ElemStatLearn/
- Chapter 5: Linear basis expansion, Natural cubic spline, Smoothing spline, Degrees of freedom
- Chapter 6: Kernel smoother, Local linear (and polynomial) regression, Kernel density estimation and classification, Radial basis function, Mixture model
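Two of the Chapter 6 concepts, the Nadaraya-Watson kernel smoother and local linear regression, can be sketched in a few lines of numpy. This is a simplified illustration with a Gaussian kernel; the bandwidth and data in any use of it are up to the reader.

```python
import numpy as np

def nadaraya_watson(x0, x, y, bandwidth):
    """Nadaraya-Watson kernel smoother with a Gaussian kernel:
    fhat(x0) = sum_i K((x0 - x_i)/h) y_i / sum_i K((x0 - x_i)/h)."""
    w = np.exp(-0.5 * ((x0 - x) / bandwidth) ** 2)
    return np.sum(w * y) / np.sum(w)

def local_linear(x0, x, y, bandwidth):
    """Local linear regression at x0: weighted least-squares fit of a
    line in (x - x0); the fitted intercept is the estimate. This
    removes the boundary bias of the Nadaraya-Watson smoother."""
    w = np.exp(-0.5 * ((x0 - x) / bandwidth) ** 2)
    B = np.column_stack([np.ones_like(x), x - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)
    return coef[0]
```

The contrast between the two estimators is most visible near the boundary of the design points, where the kernel weights become one-sided.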
Part 3: Experimental design in statistical learning (10%)
(To download the articles you must be on the NTNU VPN.)
- Article: Design of experiments and response surface methodology to tune machine learning hyperparameters, with a random forest case-study (2018), Gustavo A. Lujan-Moreno, Phillip R. Howard, Omar G. Rojas, Douglas Montgomery, Expert Systems with Applications, Volume 109, https://doi.org/10.1016/j.eswa.2018.05.024
- Article: Design and Analysis of Classifier Learning Experiments in Bioinformatics: Survey and Case Studies (2012), Ozan Irsoy, Olcay Taner Yildiz, Ethem Alpaydin, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 9, Issue 6, Nov.-Dec. 2012. https://doi.org/10.1109/TCBB.2012.117
- How to optimize hyperparameters
- How to compare algorithms on the same dataset
- How to compare algorithms on several datasets
Key concepts: screening, steepest ascent, central composite design (CCD), Box-Behnken design (BBD), canonical analysis, cross-validation; paired t-test, one-way ANOVA, Wilcoxon signed rank test, Friedman test.
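For comparing two algorithms on the same dataset, the paired t-test on per-fold cross-validation errors is the simplest of the tests above: the same folds are used for both algorithms, so the fold-wise differences are the paired observations. A minimal numpy sketch (the fold errors and fold count are invented for illustration):

```python
import numpy as np

def paired_t_statistic(err_a, err_b):
    """Paired t statistic for per-fold CV errors of two algorithms
    evaluated on the same k folds: t = dbar / (s_d / sqrt(k)),
    with k - 1 degrees of freedom."""
    d = np.asarray(err_a) - np.asarray(err_b)
    k = d.size
    return d.mean() / (d.std(ddof=1) / np.sqrt(k))
```

With k = 10 folds, |t| is compared against the t critical value with 9 degrees of freedom (about 2.262 at the two-sided 5% level); note that CV fold errors are not fully independent, which is one of the caveats the articles discuss.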
Part 4: Deep neural nets (30%)
- François Chollet with J. J. Allaire (2018) Deep learning with R, https://www.manning.com/books/deep-learning-with-r
- François Chollet (2017) Deep learning with Python https://www.manning.com/books/deep-learning-with-python.
You can choose whether to read the R or the Python version; both are built on Keras.
- Sequentially layered networks: architecture, activation functions, and a loss function matched to the problem (regression, two or more classes in classification)
- Tensors, inner products and their use in neural nets
- Backpropagation (the chain rule), minibatch stochastic gradient descent, choice of learning rate, and variants
- Regularization: weight decay, early stopping, drop-out
- Recurrent neural networks: MORE
- Convolutional neural networks: MORE
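The training-loop concepts above (backpropagation via the chain rule, minibatch SGD, weight decay) can be sketched in plain numpy for a one-hidden-layer regression network. This is a pedagogical sketch, not Keras code; the layer sizes, learning rate, and weight-decay value are made up for illustration.

```python
import numpy as np

def sgd_step(params, X, y, lr=0.05, weight_decay=1e-4):
    """One minibatch SGD step for the regression net
    yhat = W2 @ relu(W1 @ x + b1) + b2 with squared-error loss
    and L2 weight decay on the weight matrices."""
    W1, b1, W2, b2 = params
    n = X.shape[0]
    # forward pass
    Z = X @ W1.T + b1            # (n, h) pre-activations
    H = np.maximum(Z, 0.0)       # ReLU activation
    yhat = H @ W2.T + b2         # (n, 1) predictions
    # backward pass: chain rule, layer by layer
    dy = 2.0 * (yhat - y) / n    # dL/dyhat for L = mean squared error
    gW2 = dy.T @ H + weight_decay * W2
    gb2 = dy.sum(0)
    dH = dy @ W2                 # gradient w.r.t. hidden activations
    dZ = dH * (Z > 0)            # ReLU derivative is the 0/1 indicator
    gW1 = dZ.T @ X + weight_decay * W1
    gb1 = dZ.sum(0)
    # gradient-descent update
    return (W1 - lr * gW1, b1 - lr * gb1, W2 - lr * gW2, b2 - lr * gb2)
```

Calling this repeatedly on random minibatches of (X, y) is minibatch SGD; Keras performs exactly this kind of loop internally, with the gradients derived automatically.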
Part 5: Active learning (10%)
What is the problem (little labelled data: we do not have much data where the response is known), how can this be handled (overview), and in particular what are some solutions within the field of active learning (know about some strategies)?
1. Strategies to handle little labelled data:
- transfer learning
- data augmentation
- active learning
- semi-supervised learning
- multitask learning
- model constraining
- one-shot learning
2. Active learning:
- Pool based active learning
- Stream based active learning
- Sample selection strategies
- Challenges with using an active learning strategy in practice
- Case studies
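The core of pool-based active learning is the sample-selection strategy. One common choice, uncertainty sampling by predictive entropy, fits in a few lines: given the current model's class probabilities for each unlabelled pool point, query the points the model is least sure about. The sketch below assumes such a probability matrix is available from whatever classifier is in use; it is an illustration of the selection step only, not a full active-learning loop.

```python
import numpy as np

def uncertainty_sampling(proba, n_query=1):
    """Pool-based uncertainty sampling. `proba` is an (n_pool, n_classes)
    matrix of class probabilities predicted by the current model for the
    unlabelled pool. Returns the indices of the n_query points with the
    highest predictive entropy, i.e. the candidates to send to the
    oracle for labelling."""
    eps = 1e-12                                   # guard against log(0)
    entropy = -np.sum(proba * np.log(proba + eps), axis=1)
    return np.argsort(entropy)[::-1][:n_query]
```

In a full loop one would label the queried points, refit the model on the enlarged labelled set, recompute `proba` on the shrunken pool, and repeat; stream-based active learning instead makes a query/discard decision per arriving point.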