Seminar on Statistics and Data Science

This seminar series is organized by the research group in mathematical statistics and features talks on advances in methods of data analysis, statistical theory, and their applications.
The speakers are external guests as well as researchers from other groups at TUM.

All talks in the seminar series are listed in the Munich Mathematical Calendar.

 

The seminar takes place in room BC1 2.01.10 under the current rules and simultaneously via Zoom. To stay up to date on upcoming presentations, please join our mailing list. You will receive an email to confirm your subscription.

Zoom link

Join the seminar. Please use your real name when entering the session. The session will start roughly 10 minutes prior to the talk.

 

Upcoming talks

06.07.2022 12:15 Anastasios Panagiotelis (University of Sydney, AUS): Anomaly detection with kernel density estimation on manifolds

Manifold learning can be used to obtain a low-dimensional representation of the underlying manifold given the high-dimensional data. However, kernel density estimates of the low-dimensional embedding with a fixed bandwidth fail to account for the way manifold learning algorithms distort the geometry of the underlying Riemannian manifold. We propose a novel kernel density estimator for any manifold learning embedding by introducing the estimated Riemannian metric of the manifold as the variable bandwidth matrix for each point. The geometric information of the manifold guarantees a more accurate density estimation of the true manifold, which subsequently could be used for anomaly detection. To compare our proposed estimator with a fixed-bandwidth kernel density estimator, we run two simulations, one with 2-D metadata mapped into a 3-D Swiss roll or twin peaks shape and one with a 5-D semi-hypersphere mapped into a 100-D space, and demonstrate that the proposed estimator could improve the density estimates given a good manifold learning embedding and has higher rank correlations between the true and estimated manifold density. A Shiny app in R is also developed for various simulation scenarios. The proposed method is applied to density estimation in statistical manifolds of electricity usage with the Irish smart meter data. This demonstrates our estimator's capability to fix the distortion of the manifold geometry and to be further used for anomaly detection in high-dimensional data.
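As a rough illustration of the difference between a fixed and a point-wise variable bandwidth, the following minimal sketch evaluates a one-dimensional Gaussian kernel density estimate with one bandwidth per sample point. It is not the speaker's estimator: the function name, the sample points and the bandwidth values are hypothetical, and the scalar bandwidths merely stand in for the bandwidth matrices derived from the estimated Riemannian metric.

```python
import math

def variable_bandwidth_kde(x, points, bandwidths):
    """Evaluate a 1-D Gaussian KDE at x with one bandwidth per sample point."""
    total = 0.0
    for p, h in zip(points, bandwidths):
        z = (x - p) / h
        # Gaussian kernel centred at p with standard deviation h
        total += math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))
    return total / len(points)

# Hypothetical embedding coordinates and bandwidths (illustration only)
points = [-1.0, 0.0, 0.2, 3.0]
fixed = [0.5] * len(points)          # same bandwidth everywhere
variable = [0.8, 0.3, 0.3, 1.0]      # one bandwidth per point

# Riemann-sum check that the variable-bandwidth estimate integrates to about 1
step = 0.01
grid = [-10.0 + step * i for i in range(2001)]
mass = sum(variable_bandwidth_kde(g, points, variable) for g in grid) * step

density_fixed = variable_bandwidth_kde(0.1, points, fixed)
density_variable = variable_bandwidth_kde(0.1, points, variable)
```

Both estimates are proper densities; they differ only in how much each sample point spreads its mass, which is the quantity the abstract proposes to tie to the local geometry.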

19.07.2022 12:15 Tobias Boege (Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig): The complexity of Gaussian conditional independence models

We study statistical models of regular Gaussian distributions given by assumptions about the signs of partial correlations. This includes conditional independence models and graphical modeling devices such as Markov and Bayes networks. For these models, we consider the following basic questions: (1) How hard is it (complexity-theoretically) to check if the model specification is inconsistent? (2) If it is consistent, how hard is it (algebraically) to write down a covariance matrix from the model? (3) How badly shaped (homotopy-theoretically) can these models be? For all of these questions the answer is "it is as bad as it could possibly be".

01.08.2022 12:15 Benjamin Hollering (Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig): t.b.a.

t.b.a.

07.09.2022 12:15 Marco Scutari (Polo Universitario Lugano, Switzerland): t.b.a.

t.b.a.

14.09.2022 12:15 Michaël Lalancette (University of Toronto, CAN): t.b.a.

t.b.a.

Previous talks

22.06.2022 12:15 Han Li (University of Melbourne, AUS): Joint Extremes in Temperature and Mortality: A Bivariate POT Approach

This research project contributes to insurance risk management by modeling extreme climate risk and extreme mortality risk in an integrated manner via extreme value theory (EVT). We conduct an empirical study using monthly temperature and death data and find that the joint extremes in cold weather and old-age death counts exhibit the strongest level of dependence. Based on the estimated bivariate generalized Pareto distribution, we quantify the extremal dependence between death counts and temperature indexes. Methodologically, we employ the bivariate peaks over threshold (POT) approach, which is readily applicable to a wide range of topics in extreme risk management.
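One simple way to look at joint threshold exceedances is the empirical conditional exceedance probability, sketched below under stated assumptions. This is not the bivariate generalized Pareto fit from the talk; the function name and the synthetic series are hypothetical and serve only to show how joint exceedances over high thresholds can be counted.

```python
def empirical_chi(x, y, u):
    """Empirical P(Y above its u-quantile | X above its u-quantile), via ranks."""
    n = len(x)

    def ranks(v):
        # rank 1 = smallest value, rank n = largest value
        order = sorted(range(n), key=lambda i: v[i])
        r = [0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    cutoff = u * n
    x_exceed = [i for i in range(n) if rx[i] > cutoff]
    joint = [i for i in x_exceed if ry[i] > cutoff]
    return len(joint) / len(x_exceed) if x_exceed else float("nan")

# Synthetic series (illustration only): one perfectly co-moving pair and
# one perfectly opposed pair
index = list(range(100))
co_moving = [2 * t + 1 for t in index]
opposed = [-t for t in index]

chi_dependent = empirical_chi(index, co_moving, 0.9)
chi_opposed = empirical_chi(index, opposed, 0.9)
```

Values near 1 indicate that extremes of the two series tend to occur together, values near 0 that they do not. To study cold extremes, the temperature series would be negated before applying the function.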

22.06.2022 13:15 Hans Manner (University of Graz, AT): Testing the equality of changepoints (joint with Siegfried Hörmann, TU Graz)

Testing for the presence of changepoints and determining their location is a common problem in time series analysis. Applying changepoint procedures to multivariate data results in higher power and more precise location estimates, both in online and offline detection. However, this requires that all changepoints occur at the same time. We study the problem of testing the equality of changepoint locations. One approach is to treat common breaks as a common feature and test whether an appropriate linear combination of the data can cancel the breaks. We propose how to determine such a linear combination and derive the asymptotic distribution of the resulting CUSUM and MOSUM statistics. We also study the power of the test under local alternatives and provide simulation results on its finite-sample performance. Finally, we suggest a clustering algorithm to group variables into clusters that are co-breaking.
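The idea of a linear combination that cancels a common break can be sketched with a plain CUSUM statistic. The example below is a minimal illustration, not the test proposed in the talk: the statistic is left unscaled (a real test would divide by a long-run variance estimate), the combination weights are assumed known rather than estimated, and the two series are synthetic.

```python
import math

def cusum(x):
    """Return (k_hat, stat): argmax and maximum of |S_k| / sqrt(n), where
    S_k is the partial sum of the first k centred observations."""
    n = len(x)
    mean = sum(x) / n
    best_k, best_stat, partial = 0, 0.0, 0.0
    for k in range(1, n):
        partial += x[k - 1] - mean
        stat = abs(partial) / math.sqrt(n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

# Synthetic data: both series break at observation 50, with jump sizes 1 and 2
n = 100
noise_a = [0.1 * (-1) ** i for i in range(n)]
noise_b = [0.1 * (-1) ** (i // 2) for i in range(n)]
series_a = [(1.0 if i >= 50 else 0.0) + noise_a[i] for i in range(n)]
series_b = [(2.0 if i >= 50 else 0.0) + noise_b[i] for i in range(n)]

# Because the breaks coincide, b - 2a has no break left
combo = [b - 2.0 * a for a, b in zip(series_a, series_b)]

k_a, stat_a = cusum(series_a)
k_b, stat_b = cusum(series_b)
k_c, stat_c = cusum(combo)
```

Each individual series yields a large statistic with its maximum at the common break, while the statistic of the combination stays close to zero. If the breaks occurred at different times, no single linear combination would remove both of them.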

15.06.2022 12:15 Harry Joe (University of British Columbia, CAN): Comparison of dependence graphs based on different functions of correlation matrices

A dependence graph for a set of variables has rules for which pairs of variables are connected. In the literature on dependence graphs for gene expression measurements, there have been several rules for connecting pairs of variables based on a correlation matrix: (a) absolute correlation of the pair exceeds a threshold; (b) absolute partial correlation of the pair given the rest exceeds a threshold; (c) first-order conditional independence rule of Magwene and Kim (2004). These three methods will be compared with the dependence graph from a truncated partial correlation vine with thresholding. The comparisons are made for correlation matrices that are derived from (a) factor dependence structures, (b) Markov tree structure, and (c) variables that form groups with strong within-group dependence and weaker between-group dependence. If there are latent variables, the graphs are compared with and without them. The goal is to show that more parsimonious and interpretable graphs can be obtained with inclusion of latent variables.
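Rules (a) and (b) can be sketched on a small correlation matrix with Markov tree (chain) structure. This is an illustrative example only: the helper names, the 3x3 matrix and the threshold 0.3 are assumptions, and the partial correlations given the rest are obtained from the inverse correlation matrix P via -P_ij / sqrt(P_ii P_jj).

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse; no pivoting, which suffices for a positive
    definite correlation matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(corr):
    """Partial correlation of each pair given all remaining variables."""
    p = invert(corr)
    n = len(corr)
    return [[1.0 if i == j else -p[i][j] / math.sqrt(p[i][i] * p[j][j])
             for j in range(n)] for i in range(n)]

def threshold_graph(mat, threshold):
    """Edges (i, j), i < j, whose absolute entry exceeds the threshold."""
    n = len(mat)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(mat[i][j]) > threshold}

# Chain 0 - 1 - 2 with correlation 0.6 along each edge
rho = 0.6
corr = [[rho ** abs(i - j) for j in range(3)] for i in range(3)]

edges_corr = threshold_graph(corr, 0.3)                         # rule (a)
edges_pcorr = threshold_graph(partial_correlations(corr), 0.3)  # rule (b)
```

Here rule (a) connects all three pairs, because the end variables still have correlation 0.36, whereas rule (b) recovers only the two chain edges, since the partial correlation of the end variables given the middle one is zero.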

01.06.2022 12:15 Jack Kuipers (ETH Zürich): Efficient sampling for Bayesian networks and benchmarking their structure learning

Bayesian networks are probabilistic graphical models widely employed to understand dependencies in high-dimensional data, and even to facilitate causal discovery. Learning the underlying network structure, which is encoded as a directed acyclic graph (DAG), is highly challenging, mainly due to the vast number of possible networks in combination with the acyclicity constraint, and a plethora of algorithms have been developed for this task. Efforts have focused on two fronts: constraint-based methods, which perform conditional independence tests to exclude edges, and score-and-search approaches, which explore the DAG space with greedy or MCMC schemes. We synthesize these two fields in a novel hybrid method which reduces the complexity of Bayesian MCMC approaches to that of a constraint-based method. This enables full Bayesian model averaging for much larger Bayesian networks, and offers significant improvements in structure learning. To facilitate the benchmarking of different methods, we further present a novel automated workflow for producing scalable, reproducible, and platform-independent benchmarks of structure learning algorithms. It is interfaced via a simple config file, which makes it accessible for all users, while the code is designed in a fully modular fashion to enable researchers to contribute additional methodologies. We demonstrate the applicability of this workflow for learning Bayesian networks in typical data scenarios. References: doi:10.1080/10618600.2021.2020127 and arXiv:2107.03863

25.05.2022 12:15 Oksana Chernova (Nationale Taras-Schewtschenko-Universität Kiew, Ukraine): Estimation in Cox proportional hazards model with measurement errors

The Cox proportional hazards model is a semiparametric regression model that can be used in medical research, engineering or insurance for investigating the association between the survival time (the so-called lifetime) of an object and predictor variables. We investigate the Cox proportional hazards model for right-censored data, where the baseline hazard rate belongs to an unbounded set of nonnegative Lipschitz functions with a fixed constant, the vector of regression parameters belongs to a compact parameter set, and the time-independent covariates are subject to measurement errors. We construct a simultaneous estimator of the baseline hazard rate and regression parameter, present asymptotic results and discuss goodness-of-fit tests.