Seminar on Statistics and Data Science

This seminar series is organized by the research group in mathematical statistics and features talks on advances in methods of data analysis, statistical theory, and their applications. The speakers are external guests as well as researchers from other groups at TUM. All talks in the seminar series are listed in the Munich Mathematical Calendar.

The seminar takes place in room BC1 2.01.10 under the current rules and simultaneously via Zoom. To stay up to date on upcoming presentations, please join our mailing list. You will receive an email to confirm your subscription.

Upcoming talks

(no entries)

Previous talks

within the last 90 days

30.11.2022 12:15 Elizabeth Gross (University of Hawaiʻi at Mānoa, USA): Phylogenetic network inference with invariants

Phylogenetic networks provide a means of describing the evolutionary history of sets of species believed to have undergone hybridization or horizontal gene flow during the course of their evolution. The mutation process for a set of such species can be modeled as a Markov process on a phylogenetic network. Previous work has shown that the site-pattern probability distribution of a Jukes-Cantor phylogenetic network model must satisfy certain algebraic invariants, i.e., polynomial relationships. As a corollary, aspects of the phylogenetic network are theoretically identifiable from site-pattern frequencies. In practice, because of the probabilistic nature of sequence evolution, the phylogenetic network invariants will rarely be satisfied, even for data generated under the model. Thus, using network invariants for inferring phylogenetic networks requires some means of interpreting the residuals when observed site-pattern frequencies are substituted into the invariants. In this work, we propose an approach that combines statistical learning and phylogenetic invariants to infer small, level-one phylogenetic networks, and we discuss how the approach can be extended to infer larger networks. This is joint work with Travis Barton, Colby Long, and Joseph Rusinko.

26.10.2022 12:15 Helmut Farbmacher (TUM): Detecting Grouped Local Average Treatment Effects and Selecting True Instruments: With an Application to the Estimation of the Effect of Imprisonment on Recidivism

Under an endogenous binary treatment with heterogeneous effects and multiple instruments, we propose a two-step procedure for identifying complier groups with identical local average treatment effects (LATE) despite relying on distinct instruments, even if several instruments violate the identifying assumptions. Our procedure is based on the fact that the LATE is homogeneous for instruments which (i) satisfy the LATE assumptions (instrument validity and treatment monotonicity in the instrument) and (ii) generate identical complier groups in terms of treatment propensities given the respective instruments. Under the plurality assumption that, within each set of instruments with identical treatment propensities, instruments truly satisfying the LATE assumptions form the largest group, our procedure permits identifying these true instruments in a data-driven way. We also provide a simulation study investigating the finite-sample properties of our approach and an empirical application investigating the effect of incarceration on recidivism in the US, with judge assignments serving as instruments.

19.10.2022 13:15 Frank Röttger (University of Geneva, Switzerland): Graph Laplacians in Statistics

The Laplacian matrix of an undirected graph with positive edge weights encodes graph properties in matrix form. In this talk, we will discuss how Laplacian matrices appear prominently in multiple applications in statistics and machine learning. Our interest in Laplacian matrices originates in graphical models for extremes. For Hüsler–Reiss distributions, which are considered an analogue of Gaussians in extreme value theory, they characterize an extremal notion of multivariate total positivity of order 2 (MTP2). This leads to a consistent estimation procedure with a typically sparse graphical structure. Furthermore, the underlying convex optimization problem under Laplacian constraints allows for a simple block descent algorithm that we implemented in R. An active area of research in machine learning is Laplacian-constrained Gaussian graphical models. These models admit structure learning under various connectivity constraints. Multiple algorithms for these problems with different lasso-type penalties are available in the literature. A surprising appearance of Laplacian matrices is in the design of discrete choice experiments. Here, the Fisher information of a discrete choice design is a Laplacian matrix, which gives rise to a new approach for learning locally D-optimal designs.
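As a minimal illustration of the central object in this abstract (not part of the talk itself), the Laplacian of a weighted undirected graph is L = D − W, where W is the symmetric matrix of edge weights and D the diagonal degree matrix; the example graph below is hypothetical:

```python
import numpy as np

# Symmetric matrix of positive edge weights for a hypothetical 4-node
# undirected graph (zeros on the diagonal, w_ij = w_ji).
W = np.array([
    [0.0, 2.0, 0.0, 1.0],
    [2.0, 0.0, 3.0, 0.0],
    [0.0, 3.0, 0.0, 4.0],
    [1.0, 0.0, 4.0, 0.0],
])

D = np.diag(W.sum(axis=1))  # degree matrix: row sums of W on the diagonal
L = D - W                   # graph Laplacian

# Characteristic properties: L is symmetric, its rows sum to zero,
# and it is positive semidefinite (all eigenvalues nonnegative).
assert np.allclose(L, L.T)
assert np.allclose(L.sum(axis=1), 0.0)
assert np.all(np.linalg.eigvalsh(L) >= -1e-10)
```

The zero row sums are exactly the constraint under which the convex optimization mentioned in the abstract operates.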

14.09.2022 12:00 Leena C. Vankadara (University of Tübingen): Is Memorization Compatible with Causal Learning? The Case of High-Dimensional Linear Regression.

Deep learning models exhibit a rather curious phenomenon. They optimize over hugely complex model classes and are often trained to memorize the training data. This is seemingly contradictory to classical statistical wisdom, which suggests avoiding interpolation in favor of reducing the complexity of the prediction rules. A large body of recent work partially resolves this contradiction. It suggests that interpolation does not necessarily harm statistical generalization and may even be necessary for optimal statistical generalization in some settings. This is, however, an incomplete picture. In modern ML, we care about more than building good statistical models. We want to learn models which are reliable and have good causal implications. Under a simple linear model in high dimensions, we will discuss the role of interpolation and its counterpart, regularization, in learning better causal models.
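The interpolation-versus-regularization contrast in the abstract can be sketched in a high-dimensional linear model. With more parameters than samples, the minimum-norm least-squares solution fits the training data exactly, while ridge regularization trades a nonzero training error for shrinkage (a hypothetical illustration with synthetic data, not material from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100                              # overparameterized: p > n
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0                         # sparse ground-truth signal
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Minimum-norm interpolator: beta = X^+ y memorizes the training data.
beta_interp = np.linalg.pinv(X) @ y
assert np.allclose(X @ beta_interp, y)      # zero training error

# Ridge regression: beta = (X'X + lam I)^{-1} X'y does not interpolate.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
train_err = np.mean((X @ beta_ridge - y) ** 2)
assert train_err > 0                        # regularization sacrifices fit
```

Which of the two estimators recovers a *better causal model* of the data-generating mechanism is exactly the question the talk addresses.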

14.09.2022 13:45 Johannes Lederer (Ruhr-University Bochum): Sparse Deep Learning

Sparsity is popular in statistics and machine learning, because it can avoid overfitting, speed up computations, and facilitate interpretations. In deep learning, however, the full potential of sparsity still needs to be explored. This presentation first recaps sparsity in the framework of high-dimensional statistics and then introduces sparsity-inducing methods and corresponding theory for modern deep-learning pipelines.

For talks more than 90 days ago please have a look at the Munich Mathematical Calendar (filter: "Oberseminar Statistics and Data Science").