## 04.03.2024 15:00 Amit Einav (Durham University): About chaos and order in many element systems

Systems that involve many elements, be it a gas of particles or a herd of animals, are ubiquitous in our day-to-day lives. Their investigation, however, is hindered by the complexity of such systems and the number of (usually coupled) equations that need to be solved.
The late 1950s saw the birth of the so-called mean field limit approach as an attempt to circumvent some of the difficulties arising in treating such systems. Conceived by Kac as a way to justify the validity of the Boltzmann equation, the mean field limit approach attempts to find the behaviour of a limiting “average” element in a many element system and relies on two ingredients: an average model of the system (i.e. an evolution equation for the probability density of the ensemble), and an asymptotic correlation relation that expresses the emerging phenomena we expect to see as the number of elements goes to infinity.
Mean field limits of average models, originally applied to particle models, have permeated fields beyond mathematical physics in recent decades. Examples include models of biological, chemical, and even societal phenomena. However, to date only one asymptotic correlation relation has been used – chaos, the idea that the elements become more and more independent. While suitable for particles in a cavity, this assumption does not seem reasonable for models of biological and societal phenomena.
In our talk we will introduce Kac’s particle model and the notions of chaos and mean field limits. We will discuss the problem of having chaos as the sole asymptotic correlation relation and define a new asymptotic relation of order. We will show that this is the right relation for a recent animal-based model suggested by Carlen, Degond, and Wennberg, and highlight the importance of appropriate scaling in its investigation.
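For reference, the notion of chaos invoked above can be stated precisely. The following is the standard formulation (the notation is ours, not taken from the talk): a sequence of symmetric probability densities $F_N$ is $f$-chaotic when every fixed-order marginal factorises in the limit:

```latex
% Kac's notion of chaos (standard formulation; the notation is ours).
% A sequence of symmetric probability densities F_N is f-chaotic if, for
% every fixed k, the k-th marginal converges weakly to a k-fold product:
\Pi_k(F_N)(v_1,\dots,v_k)
  := \int F_N(v_1,\dots,v_N)\, dv_{k+1}\cdots dv_N
  \;\xrightarrow[N\to\infty]{}\; \prod_{j=1}^{k} f(v_j).
```

In words: any fixed, finite group of elements becomes asymptotically independent and identically distributed as $N \to \infty$.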

## 05.03.2024 16:00 Maximilian Gläser: Sub-Exponential Lower Bounds for Branch-and-Bound with General Disjunctions via Interpolation

We investigate linear programming based branch-and-bound using general disjunctions, also known as stabbing planes, and derive the first sub-exponential lower bound (i.e., $2^{L^{\omega(1)}}$) in the encoding length $L$ of the integer program for the size of a general branch-and-bound tree. In fact, this is even the first super-linear bound.
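To fix terminology (our formulation; the abstract takes it for granted): branching on a general disjunction splits the current relaxation along an arbitrary integer direction $a$ rather than along a single coordinate:

```latex
% Branching on a general disjunction (stabbing plane): for any integral
% a and integer b, every integer point satisfies one of the two sides
a^\top x \;\le\; b
\qquad\text{or}\qquad
a^\top x \;\ge\; b + 1,
% so a node with relaxation P is split into the children
% P \cap \{x : a^\top x \le b\} and P \cap \{x : a^\top x \ge b + 1\}.
% Classical variable branching is the special case a = e_i.
```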
This is achieved by showing that general branch-and-bound admits quasi-feasible monotone real interpolation. Specialized to a certain infeasible integer program whose infeasibility expresses that no graph can have both a k-coloring and a (k+1)-clique, this property states that any branch-and-bound tree for this program can be turned into a monotone real circuit of similar size which distinguishes between graphs with a k-coloring and graphs with a (k+1)-clique. Thus, we can utilize known lower bounds for such circuits.
Using Hrubeš' and Pudlák's notion of infeasibility certificates, one can show via the same technique that certain random CNFs are, with high probability, sub-exponentially hard to refute for branch-and-bound.
This is joint work with Marc Pfetsch.

## 13.03.2024 12:15 Bryon Aragam (University of Chicago): Optimal neighbourhood selection in structural equation models

We study the optimal sample complexity of neighbourhood selection in linear structural equation models, and compare this to best subset selection (BSS) for linear models under general design. We show by example that -- even when the structure is *unknown* -- the existence of underlying structure can reduce the sample complexity of neighbourhood selection. This result is complicated by the possibility of path cancellation, which we study in detail; we show that improvements are still possible in the presence of path cancellation. Finally, we support these theoretical observations with experiments. The proof introduces a modified BSS estimator, called klBSS, and compares its performance to BSS. The analysis of klBSS may also be of independent interest since it applies to arbitrary structured models, not necessarily those induced by a structural equation model. Our results have implications for structure learning in graphical models, which often relies on neighbourhood selection as a subroutine.
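For intuition, best subset selection in the neighbourhood-regression setting can be sketched by brute force. This is a generic illustration on a toy linear SEM of our own choosing; it is not the authors' klBSS estimator, and all variable names and coefficients are ours:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy linear SEM: X1 -> X2 -> X3 (a chain), with Gaussian noise.
# These structural coefficients are illustrative, not from the talk.
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)
x3 = 0.9 * x2 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2, x3])

def best_subset(X, target, k):
    """Brute-force best subset selection: regress column `target` on the
    size-k subset of the remaining columns with the smallest residual
    sum of squares."""
    y = X[:, target]
    candidates = [j for j in range(X.shape[1]) if j != target]
    best_rss, best_S = np.inf, None
    for S in itertools.combinations(candidates, k):
        A = X[:, list(S)]
        coef, rss, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(rss[0]) if rss.size else float(np.sum((y - A @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_S = rss, S
    return best_S

# The correct size-one neighbourhood of X3 in the chain is {X2}, i.e. index 1.
print(best_subset(X, target=2, k=1))  # → (1,)
```

The chain structure $X_1 \to X_2 \to X_3$ makes $\{X_2\}$ the correct size-one neighbourhood of $X_3$, which the exhaustive search recovers; the talk's klBSS is described as a modification of this kind of estimator.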

## 20.03.2024 12:15 Yichen Zhu (Università Bocconi, Milano): Posterior Contraction Rates for Vecchia Approximations of Gaussian Processes

Gaussian Processes (GP) are widely used to model spatial dependency in geostatistical data, yet exact Bayesian inference has an intractable time complexity of $O(n^3)$. The Vecchia approximation has become a popular solution to this computational issue, where spatial dependency is characterized by a sparse directed acyclic graph (DAG) that allows scalable Bayesian inference. Despite its popularity in practice, little is understood about its theoretical properties. In this paper, we systematically study the posterior contraction rates of Vecchia approximations of GPs. Under minimal regularity conditions, we prove that with an appropriate selection of the underlying DAG, the Vecchia-approximated GP possesses the same posterior contraction rates as the mother GP. Therefore, with optimal choices of the tuning hyper-parameters, the Vecchia approximation can achieve the minimax contraction rate, providing strong frequentist guarantees for the procedure. Our theoretical findings are demonstrated numerically on both synthetic and real-world data sets.
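The core of the Vecchia approximation is the DAG factorisation $p(y) \approx \prod_i p(y_i \mid y_{\mathrm{pa}(i)})$. A minimal numerical sketch (our own toy 1-D example with an exponential kernel and a simple "previous $m$ points" DAG; none of these choices come from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D spatial setting with an exponential covariance kernel.
# All modelling choices here are illustrative, not from the paper.
n, m = 60, 5                                     # n locations, m parents per node
s = np.sort(rng.uniform(0, 10, size=n))          # ordered locations
K = np.exp(-np.abs(s[:, None] - s[None, :])) + 1e-8 * np.eye(n)
y = np.linalg.cholesky(K) @ rng.normal(size=n)   # one draw from the GP

def exact_loglik(y, K):
    """Exact zero-mean Gaussian log-likelihood, O(n^3)."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

def vecchia_loglik(y, K, m):
    """Vecchia approximation: log p(y) ~ sum_i log p(y_i | y_pa(i)),
    with pa(i) the m points preceding i in the ordering (a simple DAG)."""
    ll = 0.0
    for i in range(len(y)):
        pa = list(range(max(0, i - m), i))
        if pa:
            w = np.linalg.solve(K[np.ix_(pa, pa)], K[pa, i])
            mu = w @ y[pa]                        # conditional mean
            var = K[i, i] - w @ K[pa, i]          # conditional variance
        else:
            mu, var = 0.0, K[i, i]
        ll += -0.5 * np.log(2 * np.pi * var) - 0.5 * (y[i] - mu) ** 2 / var
    return ll

# The exponential kernel on ordered 1-D points is Markov, so the
# approximation is nearly exact here; in general the gap depends on the DAG.
print(abs(exact_loglik(y, K) - vecchia_loglik(y, K, m)))
```

Each conditional costs $O(m^3)$, so the whole evaluation is $O(n m^3)$ rather than $O(n^3)$; the paper's question is which choices of DAG preserve the mother GP's posterior contraction rate.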

## 25.03.2024 15:00 Gautham Govind: When can we Approximate Wide Contrastive Models with Neural Tangent Kernels and Principal Component Analysis?

Contrastive learning is a paradigm for learning representations from unlabelled data that has been highly successful for image and text data. Several recent works have examined contrastive losses to claim that contrastive models effectively learn spectral embeddings, while a few works relate (wide) contrastive models to kernel principal component analysis (PCA). However, it is not known whether trained contrastive models indeed correspond to kernel methods or PCA. In this work, we analyze the training dynamics of two-layer contrastive models with non-linear activation, and determine when these models are close to PCA or kernel methods. It is well known in the supervised setting that neural networks are equivalent to neural tangent kernel (NTK) machines, and that the NTK of infinitely wide networks remains constant during training. We provide the first convergence results for the NTK under contrastive losses and present a nuanced picture: the NTK of wide networks remains almost constant for cosine-similarity based contrastive losses, but not for losses based on dot-product similarity. We further study the training dynamics of contrastive models with orthogonality constraints on the output layer, an assumption implicit in works relating contrastive learning to spectral embedding. Our deviation bounds suggest that representations learned by contrastive models are close to the principal components of a certain matrix computed from random features. We empirically show that our theoretical results possibly hold beyond two-layer networks.
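At finite width, the NTK referred to above is simply the Gram matrix of parameter gradients. A minimal sketch for a generic two-layer ReLU network (our own toy construction; the talk's contrastive losses and architecture details are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)

# Empirical NTK of a generic two-layer ReLU network
# f(x) = v . relu(W x) / sqrt(h); a standard construction, not the
# contrastive setup of the talk.
d, h = 3, 2048
W = rng.normal(size=(h, d))
v = rng.normal(size=h)

def grad_params(x):
    """Gradient of f(x) with respect to all parameters (W, v), flattened."""
    pre = W @ x
    act = np.maximum(pre, 0.0)
    d_v = act / np.sqrt(h)                                   # df/dv
    d_W = ((pre > 0).astype(float) * v)[:, None] * x[None, :] / np.sqrt(h)  # df/dW
    return np.concatenate([d_W.ravel(), d_v])

def ntk(x1, x2):
    """Empirical NTK: inner product of parameter gradients at two inputs."""
    return grad_params(x1) @ grad_params(x2)

x1, x2 = rng.normal(size=d), rng.normal(size=d)
print(ntk(x1, x1) > 0, abs(ntk(x1, x2) - ntk(x2, x1)) < 1e-9)  # → True True
```

The results in the abstract concern how this kernel evolves during training: tracking it across gradient steps reveals whether the model stays in the kernel regime (near-constant NTK, as the talk reports for cosine-similarity losses) or leaves it.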
