06.05.2026 16:15 Daniela M. Witten (University of Washington, Seattle): Data Thinning and beyond
Contemporary data analysis pipelines often involve the use and reuse of data. For instance, a scientist may explore a dataset to select an interesting hypothesis, and then wish to test this hypothesis with the same data. From a statistical perspective, this double use of data is highly problematic: it induces dependence between the hypothesis generation and testing stages, which complicates inference. Failure to account for this dependence renders classical inference techniques invalid.
I will present "data thinning", a set of strategies for obtaining independent training and test sets so that the former can be used to select a hypothesis, and the latter to test it. Data thinning enables valid selective inference in settings for which no solutions were previously available. However, it is also restrictive, in the sense that it requires strong distributional assumptions. Therefore, I will also present two strategies inspired by data thinning that enable valid post-selection inference without such assumptions. The first strategy thins summary statistics of the data, rather than the data itself, in order to take advantage of the asymptotic properties of those statistics. The second strategy generates training and test sets that are not independent, and then orthogonalizes the latter with respect to the former in order to conduct valid inference.
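A minimal sketch of the idea, using the classical Poisson case (an illustration, not necessarily the construction covered in the talk): if X ~ Poisson(λ) and X_train | X ~ Binomial(X, ε), then X_train ~ Poisson(ελ) and X_test = X − X_train ~ Poisson((1 − ε)λ), with X_train and X_test independent. The function name `poisson_thin` and the parameter values below are illustrative choices.

```python
import numpy as np

def poisson_thin(x, eps, rng):
    """Split Poisson counts x into two independent parts.

    x_train | x ~ Binomial(x, eps), so marginally
    x_train ~ Poisson(eps * lam) and x_test ~ Poisson((1 - eps) * lam),
    and the two parts are independent.
    """
    x_train = rng.binomial(x, eps)  # binomial thinning of each count
    x_test = x - x_train
    return x_train, x_test

rng = np.random.default_rng(0)
lam, eps, n = 10.0, 0.5, 100_000
x = rng.poisson(lam, size=n)
x_train, x_test = poisson_thin(x, eps, rng)

# Empirical check: means near eps*lam and (1-eps)*lam, correlation near 0.
print(x_train.mean(), x_test.mean())
print(np.corrcoef(x_train, x_test)[0, 1])
```

The train counts can then be used to select a hypothesis (e.g. pick an interesting feature) and the test counts to test it with classical tools, since the two parts are genuinely independent.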
20.05.2026 12:15 Veronica Vinciotti (University of Trento, IT): t.b.a.
t.b.a.
24.06.2026 12:15 Saber Salehkaleybar (Leiden University, NL): t.b.a.
t.b.a.
01.07.2026 12:15 Fang Han (University of Washington, Seattle): t.b.a.
t.b.a.