Seminar on Statistics and Data Science

This seminar series is organized by the research group in statistics and features talks on advances in methods of data analysis, statistical theory, and their applications. The speakers are external guests as well as researchers from other groups at TUM. All talks in the seminar series are listed in the Munich Mathematical Calendar.

The seminar takes place in room BC1 2.01.10 under the current rules and simultaneously via Zoom. To stay up to date on upcoming presentations, please join our mailing list. You will receive an email to confirm your subscription.

Upcoming talks

15.02.2023 13:00 Seyed Jalal Etesami (TUM): t.b.a.

t.b.a.

28.02.2023 13:15 Yanbo Tang (Imperial College London): t.b.a.

t.b.a.

Previous talks

within the last 90 days

25.01.2023 13:00 Merle Behr (Universität Regensburg): Provable Boolean interaction recovery from tree ensemble obtained via random forests

Random Forests (RFs) are at the cutting edge of supervised machine learning in terms of prediction performance, especially in genomics. Iterative RFs (iRFs) use a tree ensemble from iteratively modified RFs to obtain predictive and stable nonlinear or Boolean interactions of features. They have shown great promise for Boolean biological interaction discovery that is central to advancing functional genomics and precision medicine. However, theoretical studies into how tree-based methods discover Boolean feature interactions are missing. Inspired by the thresholding behavior in many biological processes, we first introduce a discontinuous nonlinear regression model, called the “Locally Spiky Sparse” (LSS) model. Specifically, the LSS model assumes that the regression function is a linear combination of piecewise constant Boolean interaction terms. Given an RF tree ensemble, we define a quantity called “Depth-Weighted Prevalence” (DWP) for a set of signed features S. Intuitively speaking, DWP(S) measures how frequently features in S appear together in an RF tree ensemble. We prove that, with high probability, DWP(S) attains a universal upper bound that does not involve any model coefficients, if and only if S corresponds to a union of Boolean interactions under the LSS model. Consequently, we show that a theoretically tractable version of the iRF procedure, called LSSFind, yields consistent interaction discovery under the LSS model as the sample size goes to infinity. Finally, simulation results show that LSSFind recovers the interactions under the LSS model, even when some assumptions are violated. Reference: https://www.pnas.org/doi/10.1073/pnas.2118636119 Co-authors: Yu Wang, Xiao Li, and Bin Yu (UC Berkeley)

11.01.2023 12:15 Y. Samuel Wang (Cornell University, Ithaca, NY): Uncertainty Quantification for Causal Discovery

Causal discovery procedures are popular methods for discovering causal structure across the physical, biological, and social sciences. However, most procedures for causal discovery only output a single estimated causal model or a single equivalence class of models. In this work, we propose a procedure for quantifying uncertainty in causal discovery. Specifically, we consider structural equation models where a unique graph can be identified and propose a procedure which returns a confidence set of causal orderings which are not ruled out by the data. We show that asymptotically, a true causal ordering will be contained in the returned set with some user-specified probability. In addition, the confidence set can be used to form conservative sets of ancestral relationships.

30.11.2022 12:15 Elizabeth Gross (University of Honolulu, USA): Phylogenetic network inference with invariants

Phylogenetic networks provide a means of describing the evolutionary history of sets of species believed to have undergone hybridization or horizontal gene flow during the course of their evolution. The mutation process for a set of such species can be modeled as a Markov process on a phylogenetic network. Previous work has shown that site-pattern probability distributions from a Jukes-Cantor phylogenetic network model must satisfy certain algebraic invariants, i.e., polynomial relationships. As a corollary, aspects of the phylogenetic network are theoretically identifiable from site-pattern frequencies. In practice, because of the probabilistic nature of sequence evolution, the phylogenetic network invariants will rarely be satisfied, even for data generated under the model. Thus, using network invariants for inferring phylogenetic networks requires some means of interpreting the residuals when observed site-pattern frequencies are substituted into the invariants. In this work, we propose an approach that combines statistical learning and phylogenetic invariants to infer small, level-one phylogenetic networks, and we discuss how the approach can be extended to infer larger networks. This is joint work with Travis Barton, Colby Long, and Joseph Rusinko.

For talks held more than 90 days ago, please see the Munich Mathematical Calendar (filter: "Oberseminar Statistics and Data Science").