Seminar on Statistics and Data Science

This seminar series is organized by the research group in mathematical statistics and features talks on advances in methods of data analysis, statistical theory, and their applications.
The speakers are external guests as well as researchers from other groups at TUM.

All talks in the seminar series are listed in the Munich Mathematical Calendar.


The seminar takes place in room BC1 2.01.10 under the current rules and simultaneously via Zoom. To stay up to date on upcoming presentations, please join our mailing list. You will receive an email to confirm your subscription.

Zoom link

Join the seminar. Please use your real name when entering the session. The session will open roughly 10 minutes before the talk.


Upcoming talks

19.10.2022 13:15 Frank Röttger (University of Geneva, Switzerland): Graph Laplacians in Statistics

The Laplacian matrix of an undirected graph with positive edge weights encodes graph properties in matrix form. In this talk, we will discuss how Laplacian matrices appear prominently in multiple applications in statistics and machine learning. Our interest in Laplacian matrices originates in graphical models for extremes. For Hüsler–Reiss distributions, which are considered an analogue of Gaussians in extreme value theory, they characterize an extremal notion of multivariate total positivity of order 2 (MTP2). This leads to a consistent estimation procedure with a typically sparse graphical structure. Furthermore, the underlying convex optimization problem under Laplacian constraints allows for a simple block descent algorithm that we implemented in R. Laplacian-constrained Gaussian graphical models are an active area of research in machine learning. These models admit structure learning under various connectivity constraints. Multiple algorithms for these problems with different lasso-type penalties are available in the literature. A surprising appearance of Laplacian matrices is in the design of discrete choice experiments. Here, the Fisher information of a discrete choice design is a Laplacian matrix, which gives rise to a new approach for learning locally D-optimal designs.
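
As background for the talk, the graph Laplacian mentioned in the abstract can be computed directly from a weighted adjacency matrix as L = D − W, where D is the diagonal degree matrix. A minimal NumPy sketch with an illustrative example graph (not taken from the talk):

```python
import numpy as np

# Weighted adjacency matrix of an undirected graph on 4 nodes
# (symmetric, positive edge weights, zero diagonal) -- illustrative example.
W = np.array([
    [0.0, 2.0, 0.0, 1.0],
    [2.0, 0.0, 3.0, 0.0],
    [0.0, 3.0, 0.0, 4.0],
    [1.0, 0.0, 4.0, 0.0],
])

# Graph Laplacian: L = D - W, with D the diagonal matrix of weighted degrees.
D = np.diag(W.sum(axis=1))
L = D - W

# Key properties: L is symmetric positive semidefinite and its rows sum to 0,
# so the all-ones vector lies in its kernel.
assert np.allclose(L, L.T)
assert np.allclose(L @ np.ones(4), 0.0)
print(np.linalg.eigvalsh(L))  # smallest eigenvalue is 0; it is simple iff the graph is connected
```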

26.10.2022 12:15 Helmut Farbmacher (TUM): t.b.a.


Previous talks

14.09.2022 12:00 Leena C. Vankadara (University of Tübingen): Is Memorization Compatible with Causal Learning? The Case of High-Dimensional Linear Regression.

Deep learning models exhibit a rather curious phenomenon. They optimize over hugely complex model classes and are often trained to memorize the training data. This seemingly contradicts classical statistical wisdom, which suggests avoiding interpolation in favor of reducing the complexity of the prediction rules. A large body of recent work partially resolves this contradiction. It suggests that interpolation does not necessarily harm statistical generalization and may even be necessary for optimal statistical generalization in some settings. This is, however, an incomplete picture. In modern ML, we care about more than building good statistical models. We want to learn models that are reliable and have good causal implications. Under a simple linear model in high dimensions, we will discuss the role of interpolation and its counterpart, regularization, in learning better causal models.
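
The contrast between interpolation and regularization in linear regression can be made concrete: when there are more parameters than samples, the minimum-norm least-squares solution fits (memorizes) the training data exactly, while ridge regression trades training error for shrinkage. A small illustrative sketch under a sparse ground truth; the dimensions and penalty are arbitrary choices, not the speaker's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50              # high-dimensional regime: more features than samples
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0        # sparse ground truth, purely illustrative
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Minimum-norm interpolator: beta = X^+ y (Moore-Penrose pseudoinverse).
beta_interp = np.linalg.pinv(X) @ y
print(np.max(np.abs(X @ beta_interp - y)))  # ~0: the training data is memorized

# Ridge regression: beta = (X'X + lam I)^{-1} X'y leaves nonzero residuals.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.max(np.abs(X @ beta_ridge - y)))   # > 0: fit traded for shrinkage
```

Note that the ridge solution necessarily has a smaller norm than the minimum-norm interpolator, since the latter is feasible for the ridge objective.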

14.09.2022 13:45 Johannes Lederer (Ruhr-University Bochum): Sparse Deep Learning

Sparsity is popular in statistics and machine learning, because it can avoid overfitting, speed up computations, and facilitate interpretations. In deep learning, however, the full potential of sparsity still needs to be explored. This presentation first recaps sparsity in the framework of high-dimensional statistics and then introduces sparsity-inducing methods and corresponding theory for modern deep-learning pipelines.
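
A standard sparsity-inducing device in the high-dimensional statistics framework the abstract refers to is the l1 penalty, whose proximal operator is soft-thresholding: it sets small weights exactly to zero. A generic sketch of that operator (not the speaker's method):

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of the l1 penalty: shrinks each weight toward zero
    by t and sets weights with magnitude <= t exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.array([-1.5, -0.2, 0.0, 0.3, 2.0])
print(soft_threshold(w, 0.5))  # [-1.  -0.   0.   0.   1.5]
```

Applied to network weights during training, this is the mechanism behind proximal-gradient treatments of lasso-type penalties.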

14.09.2022 15:00 Michaël Lalancette (University of Toronto, Canada): Estimation of bivariate and spatial tail models under asymptotic dependence and independence

Multivariate extreme value theory mostly focuses on asymptotic dependence, where the probability of observing a large value in one of the variables is of the same order as that of observing a large value in all variables simultaneously. There is growing evidence, however, that asymptotic independence prevails in many data sets. Available statistical methodology in this setting is scarce and not well understood theoretically. We revisit non-parametric estimation of bivariate tail dependence and introduce rank-based M-estimators for parametric models that may include both asymptotic dependence and asymptotic independence, without requiring prior knowledge on which of the two regimes applies. We further show how the method can be leveraged to obtain parametric estimators in spatial tail models. All the estimators are proved to be asymptotically normal under minimal regularity conditions. The methodology is illustrated through an application to extreme rainfall data.
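
The distinction in the abstract can be quantified by the tail dependence coefficient chi(u) = P(F(X) > u | G(Y) > u): under asymptotic dependence it stays bounded away from 0 as u approaches 1, under asymptotic independence it vanishes. A simple rank-based empirical version, purely illustrative and not one of the estimators from the talk:

```python
import numpy as np

def empirical_chi(x, y, u=0.95):
    """Rank-based estimate of chi(u) = P(F(X) > u | G(Y) > u):
    the fraction of points whose normalized ranks jointly exceed level u."""
    n = len(x)
    # Normalized ranks play the role of the unknown margins F(X) and G(Y).
    rx = np.argsort(np.argsort(x)) / n
    ry = np.argsort(np.argsort(y)) / n
    exceed_y = ry > u
    return np.mean(rx[exceed_y] > u)

rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
# Perfectly dependent pair: estimate is 1.
print(empirical_chi(x, x, u=0.95))
# Independent pair: estimate is close to 1 - u = 0.05.
print(empirical_chi(x, rng.standard_normal(5000), u=0.95))
```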

07.09.2022 12:15 Marco Scutari (Polo Universitario Lugano, Switzerland): Bayesian Network Models for Continuous-Time and Structured Data

Bayesian networks (BNs) are a versatile and powerful tool to model complex phenomena and the interplay of their components in a probabilistically principled way. Moving beyond the comparatively simple case of completely observed, static data, which has received the most attention in the literature, I will discuss how BNs can be extended to model continuous data and data in which observations are not independent and identically distributed. For the former, I will discuss continuous-time BNs. For the latter, I will show how mixed effects models can be integrated with BNs to get the best of both worlds.

01.08.2022 12:15 Benjamin Hollering (Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig): Certifying Generic Identifiability with Algebraic Matroids

A statistical model is identifiable if the map parameterizing the model is injective. This means that the parameters producing a probability distribution in the model can be uniquely determined from the distribution itself, which is a critical property for meaningful data analysis. In this talk, I'll discuss a new strategy for proving identifiability of discrete parameters that uses algebraic matroids associated to statistical models. This technique allows us to avoid elimination and is also parallelizable. I'll then discuss a new extension of this technique, which utilizes oriented matroids to prove identifiability results that the original matroid technique is unable to obtain.