Dec 18 Monday
Schedule
- 7:00: Breakfast
- 8:45: Opening
- 9:00: John Duchi (Stanford University): Privacy - Definitions, Procedures, Open Problems
- 10:00: Coffee break
- 10:20: Stanislav Minsker (University of Southern California, USC): Improved performance guarantees for robust mean estimation
- 11:00: Arshak Minasyan (CentraleSupélec - Paris-Saclay University): Statistically optimal robust mean and covariance estimation for anisotropic Gaussians
- 11:40: Chao Gao (University of Chicago): Robust and minimax estimation in a two-groups model
- 12:30: Lunch
- 16:00: Christophe Giraud (Paris-Saclay University): Improving fairness in online learning
- 16:40: Jean-Michel Loubes (Université Toulouse Paul Sabatier): Counterfactual fairness using optimal transport
- 17:20: Coffee break
- 17:40: Zhao Ren (University of Pittsburgh): Heteroskedastic Sparse PCA in High Dimensions
- 18:20: Angelika Rohde (Freiburg University): Bootstrapping high-dimensional sample covariance matrices
- 19:30: Dinner
7:00: Breakfast
8:45: Opening
9:00: John Duchi (Stanford University): Privacy - Definitions, Procedures, Open Problems
I will provide a broad overview of differential privacy, which gives guarantees that a data analysis protects the privacy of data contributors. The main focus will be on the private computation and release of statistics, both classical (low-dimensional) and high-dimensional. In addition to giving a high-level program for the development of optimal private estimators, I will likely discuss a few open questions as well.
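As a concrete illustration of one building block behind such procedures (a generic textbook example, not the speaker's construction), the sketch below releases a differentially private mean via the Laplace mechanism; the clipping range [0, 1] and the privacy level epsilon are illustrative assumptions.

```python
# Minimal sketch: Laplace mechanism for an epsilon-differentially-private mean.
# Assumptions (illustrative, not from the talk): data clipped to [0, 1].
import numpy as np

def private_mean(x, epsilon, lo=0.0, hi=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(x, dtype=float), lo, hi)  # bound each individual's contribution
    sensitivity = (hi - lo) / len(x)                 # changing one record moves the mean by at most this
    return x.mean() + rng.laplace(scale=sensitivity / epsilon)

x = np.random.default_rng(0).uniform(size=1_000)
print(private_mean(x, epsilon=1.0))                  # noisy estimate, close to the true mean 0.5
```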
10:00: Coffee break
Session: Robust statistics
Chair: Pierre Alquier, ESSEC Business School
10:20: Stanislav Minsker (University of Southern California, USC): Improved performance guarantees for robust mean estimation
The median of means (MOM) estimator and its variants have become go-to methods for problems involving heavy-tailed and outlier-contaminated data. Examples include robust versions of mean and covariance estimators, linear regression, and k-means clustering, among others. Achieving the best possible performance for the MOM estimator in the simplest univariate case has positive implications for many of these problems. In the first part of the talk, we will demonstrate how to obtain efficient versions of the MOM estimator that satisfy deviation inequalities with sharp constants, requiring only minimal assumptions on the underlying distribution. Moreover, we will discuss the interplay between this question and the theory of U-statistics.
The second part of the talk will be devoted to the multivariate version of the MOM estimator based on the geometric median. We will demonstrate that for large classes of heavy-tailed distributions, the geometric MOM estimator attains sub-exponential deviation guarantees, improving the known bounds in many cases. A new analysis of this estimator reveals interesting connections with small ball probabilities and with questions about negative moments of norms. This part is based on joint work with Nate Strawn.
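For readers unfamiliar with the basic construction, here is a minimal univariate median-of-means sketch; the number of blocks is an arbitrary choice, and this is not the efficient, sharp-constant variant discussed in the talk.

```python
# Minimal sketch of the univariate median-of-means (MOM) estimator.
import numpy as np

def median_of_means(x, k):
    """Split the sample into k blocks, average within each block, return the median of the block means."""
    blocks = np.array_split(np.asarray(x, dtype=float), k)
    return np.median([b.mean() for b in blocks])

# Heavy-tailed example (Student-t with 2 degrees of freedom): the plain mean is volatile, MOM much less so.
rng = np.random.default_rng(0)
x = rng.standard_t(df=2, size=10_000)
print(x.mean(), median_of_means(x, k=50))
```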
11:00: Arshak Minasyan (CentraleSupélec - Paris-Saclay University): Statistically optimal robust mean and covariance estimation for anisotropic Gaussians
Assume that \(X_{1}, \ldots, X_{N} \) is an \(\varepsilon\)-contaminated sample of \(N\) independent Gaussian vectors in \(\mathbb{R}^d\) with mean \(\mu\) and covariance \(\Sigma\). In the strong \(\varepsilon\)-contamination model we assume that the adversary replaced an \(\varepsilon\) fraction of vectors in the original Gaussian sample by any other vectors. We show that there is an estimator \(\widehat \mu\) of the mean satisfying, with probability at least \(1 - \delta\), a bound of the form
\(|\widehat{\mu} - \mu|_2 \le c\left(\sqrt{\frac{\operatorname{Tr}(\Sigma)}{N}} + \sqrt{\frac{|\Sigma|\log(1/\delta)}{N}} + \varepsilon\sqrt{|\Sigma|}\right), \)
where \(c > 0\) is an absolute constant, and \(|\Sigma|\) denotes the operator norm of \(\Sigma\). In the same contaminated Gaussian setup, we construct an estimator \(\widehat \Sigma\) of the covariance matrix \(\Sigma\) that satisfies, with probability at least \(1 - \delta\),
\( \left|\widehat{\Sigma} - \Sigma\right| \le c\left(\sqrt{\frac{|\Sigma|\operatorname{Tr}(\Sigma)}{N}} + |\Sigma|\sqrt{\frac{\log(1/\delta)}{N}} + \varepsilon|\Sigma|\right). \)
Both results are optimal up to multiplicative constant factors. Despite the recent surge of interest in robust statistics, achieving both dimension-free bounds in the canonical Gaussian case had remained open. In fact, several previously known results were either dimension-dependent, requiring \(\Sigma\) to be close to the identity, or had a sub-optimal dependence on the contamination level \(\varepsilon\).
As a part of the analysis, we derive sharp concentration inequalities for central order statistics of Gaussian, folded normal, and chi-squared distributions.
This is joint work with N. Zhivotovskiy.
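To make the strong contamination model concrete, the toy experiment below (an illustration of the setup only, not of the estimators or guarantees from the talk; all numbers are invented) replaces an \(\varepsilon\) fraction of an anisotropic Gaussian sample and shows the empirical mean being pulled far from \(\mu\) while a crude coordinatewise median is not.

```python
# Toy strong eps-contamination experiment (illustrative values throughout).
import numpy as np

rng = np.random.default_rng(0)
N, d, eps = 5_000, 50, 0.05
Sigma = np.diag(np.linspace(0.5, 5.0, d))           # anisotropic covariance
X = rng.multivariate_normal(np.zeros(d), Sigma, size=N)
X[: int(eps * N)] = 100.0                            # adversary replaces an eps fraction of the sample
print(np.linalg.norm(X.mean(axis=0)))                # empirical mean: far from the true mean 0
print(np.linalg.norm(np.median(X, axis=0)))          # coordinatewise median: still close to 0
```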
11:40: Chao Gao (University of Chicago): Robust and minimax estimation in a two-groups model
The advent of large scale inference has spurred reexamination of conventional statistical thinking. In a series of highly original articles, Efron showed in several examples that the ensemble of null-distributed test statistics can grossly deviate from the theoretical null distribution, and he persuasively illustrated the danger of assuming the theoretical null's veracity for downstream inference. Though intimidating in other contexts, the large scale setting is to the statistician's benefit here: there is now potential to estimate, rather than assume, the null distribution.
In a model for n z-scores with at most k nonnulls, we adopt Efron's suggestion and consider estimation of the location and scale parameters of a Gaussian null distribution. Placing no assumptions on the nonnull effects, we consider rate-optimal estimation in the entire regime k < n/2, that is, precisely the regime in which the null parameters are identifiable. The minimax upper bound is obtained by considering estimators based on the empirical characteristic function and the classical kernel mode estimator. Faster rates than those in Huber's contamination model are achievable by exploiting the Gaussian character of the data. As a consequence, it is shown that consistent estimation is indeed possible in the practically relevant regime k ≍ n. In a certain regime, the minimax lower bound involves constructing two marginal distributions whose characteristic functions match on a wide interval containing zero. The construction notably differs from those in the literature by sharply capturing a second-order scaling of n/2 − k in the minimax rate.
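As a rough illustration of estimating rather than assuming the null (not the rate-optimal procedure of the talk), one can take the mode of a kernel density fit to the z-scores as the null location and read a scale off the central quantiles; the mixture proportions below are invented for the example.

```python
# Crude empirical-null fit: kernel mode for the location, interquartile range for the scale.
import numpy as np
from scipy.stats import gaussian_kde, norm

def estimate_null(z, grid_size=2000):
    z = np.asarray(z, dtype=float)
    grid = np.linspace(z.min(), z.max(), grid_size)
    mu0 = grid[np.argmax(gaussian_kde(z)(grid))]                # kernel mode estimate of the null mean
    q25, q75 = np.quantile(z, [0.25, 0.75])
    sigma0 = (q75 - q25) / (norm.ppf(0.75) - norm.ppf(0.25))    # IQR-based estimate of the null scale
    return mu0, sigma0

# n z-scores, a small fraction of which are nonnull and shifted away from zero (invented numbers).
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.1, 1.1, 9_000), rng.normal(3.0, 1.0, 1_000)])
print(estimate_null(z))
```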
12:30: Lunch
Session: Fairness
Chair: Philippe Rigollet, MIT
16:00: Christophe Giraud (Paris-Saclay University): Improving fairness in online learning
Online learning and recommendation systems are ubiquitous in decision making, including in sensitive settings such as job hiring, judicial decisions, and college admissions. Blindly applying standard ML techniques can lead to unfair and discriminatory decisions. We will discuss some approaches for improving fairness in online learning and recommendation systems.
https://arxiv.org/abs/2305.15807
https://arxiv.org/abs/2106.12242
https://arxiv.org/abs/2203.09784
16:40: Jean-Michel Loubes (Université Toulouse Paul Sabatier): Counterfactual fairness using optimal transport
A supervised machine learning algorithm determines, from a learning sample, a model that will be used to predict new observations. To this end, it aggregates individual characteristics of the observations in the learning sample, and this aggregation of information may be biased. This situation has raised concerns around the so-called fairness of machine learning algorithms, especially towards disadvantaged groups.
We provide a better understanding of both global and individual biases, using definitions grounded in optimal transport theory. Beyond identifying these biases, we will also consider mitigation methods that handle local bias while maintaining accuracy of the output.
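In one dimension the optimal transport map between two distributions reduces to quantile matching, which already gives a simple way to quantify a group-level bias (a Wasserstein distance between score distributions) and to repair it; the sketch below is a generic illustration of that reduction, not the speaker's definitions or mitigation methods, and the group score distributions are invented.

```python
# One-dimensional optimal transport as quantile matching (illustrative sketch).
import numpy as np

def wasserstein_1d(a, b):
    """W1 distance between two empirical distributions with the same number of points."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

def quantile_match(scores, target):
    """Map each score to the corresponding quantile of the target distribution (the 1D OT map)."""
    ranks = np.argsort(np.argsort(scores)) / (len(scores) - 1)
    return np.quantile(target, ranks)

rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, 5_000)                # scores for the reference group (invented)
group_b = rng.normal(0.5, 1.2, 5_000)                # systematically shifted scores for the other group
print(wasserstein_1d(group_a, group_b))              # group-level bias before repair
print(wasserstein_1d(group_a, quantile_match(group_b, group_a)))  # near zero after quantile matching
```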
17:20: Coffee break
Session: High dimensional covariance estimation
Chair: Cristina Butucea, ENSAE
17:40: Zhao Ren (University of Pittsburgh): Heteroskedastic Sparse PCA in High Dimensions
Principal component analysis (PCA) is one of the most commonly used techniques for dimension reduction and feature extraction. Although high-dimensional sparse PCA has been well studied, little is known when the noise is heteroskedastic, which turns out to be ubiquitous in many scenarios, such as biological sequencing data and information network data. We propose an iterative algorithm for sparse PCA in the presence of heteroskedastic noise, which alternately updates the estimates of the sparse eigenvectors using orthogonal iteration with adaptive thresholding in one step, and imputes the diagonal values of the sample covariance matrix to reduce the estimation bias due to heteroskedasticity in the other step. Our procedure is computationally fast and provably optimal under the generalized spiked covariance model, assuming the leading eigenvectors are sparse. A comprehensive simulation study demonstrates its robustness and effectiveness in various settings.
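A schematic version of that alternation might look as follows; this is a simplified sketch with one leading component, a crude hard threshold, and a fixed iteration count, not the authors' algorithm or its adaptive thresholding rule.

```python
# Simplified sketch: alternate a thresholded power-iteration step with diagonal imputation.
import numpy as np

def hetero_sparse_pca(S, threshold=0.1, n_iter=50):
    """S: p x p sample covariance. Returns a sparse estimate of the leading eigenvector."""
    M = S.copy()
    np.fill_diagonal(M, 0.0)                         # drop the diagonal, which carries the heteroskedastic noise
    v = np.linalg.eigh(M)[1][:, -1]                  # initialize with the leading eigenvector
    for _ in range(n_iter):
        v = M @ v                                    # power-iteration update ...
        v[np.abs(v) < threshold * np.linalg.norm(v)] = 0.0   # ... followed by entrywise thresholding
        v /= np.linalg.norm(v)
        lam = float(v @ S @ v)                       # current eigenvalue estimate
        np.fill_diagonal(M, lam * v**2)              # impute the diagonal from the rank-one fit lam * v v^T
    return v

# Toy data: sparse spike on the first 10 of p = 200 coordinates, heteroskedastic noise (invented numbers).
rng = np.random.default_rng(0)
p, n = 200, 1_000
u = np.zeros(p); u[:10] = 1 / np.sqrt(10)
X = (rng.standard_normal((n, 1)) * 3.0) @ u[None, :] + rng.standard_normal((n, p)) * rng.uniform(0.5, 2.0, p)
S = X.T @ X / n
print(abs(hetero_sparse_pca(S) @ u))                 # close to 1 when the spike is recovered
```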
18:20: Angelika Rohde (Freiburg University): Bootstrapping high-dimensional sample covariance matrices
Bootstrapping is the classical approach for distributional approximation of estimators and test statistics when the asymptotic distribution contains unknown quantities or provides poor approximation quality. For the analysis of massive data, however, the bootstrap is computationally intractable in its basic sampling-with-replacement version. Moreover, it is not even valid in some important high-dimensional applications. Combining subsampling of observations with a suitable selection of their coordinates, we introduce a new $(m,mp/n)$ out of $(n,p)$ sampling-with-replacement bootstrap for eigenvalue statistics of high-dimensional sample covariance matrices based on $n$ independent $p$-dimensional random vectors. In the high-dimensional scenario $p/n\rightarrow c\in [0,\infty)$, this fully nonparametric bootstrap is shown to consistently reproduce the underlying spectral measure if $m/n\rightarrow 0$. If $m^2/n\rightarrow 0$, it correctly approximates the distribution of linear spectral statistics. The crucial component is a suitably defined representative subpopulation condition, which is shown to be verified in a large variety of situations. The proofs incorporate several delicate technical results which may be of independent interest.
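Schematically, the resampling scheme can be pictured as drawing $m$ observations with replacement together with roughly $mp/n$ of their coordinates and recomputing the spectral statistic on each resample; the sketch below is only an illustration of this idea with arbitrary choices of $m$, the number of resamples, and the statistic, not the paper's exact construction or its validity conditions.

```python
# Schematic "(m, mp/n) out of (n, p)" resampling for spectral statistics (illustrative choices throughout).
import numpy as np

def subsample_bootstrap(X, m, stat, n_boot=200, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    q = max(1, round(m * p / n))                     # keep the aspect ratio q/m close to p/n
    draws = []
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=m)            # m observations, drawn with replacement
        cols = rng.choice(p, size=q, replace=False)  # q coordinates of those observations
        Y = X[np.ix_(rows, cols)]
        S = Y.T @ Y / m                              # sample covariance of the resample
        draws.append(stat(np.linalg.eigvalsh(S)))
    return np.array(draws)

# Example: bootstrap distribution of a linear spectral statistic (sum of log eigenvalues).
rng = np.random.default_rng(0)
X = rng.standard_normal((2_000, 200))
out = subsample_bootstrap(X, m=200, stat=lambda ev: np.log(np.maximum(ev, 1e-12)).sum(), rng=rng)
print(out.mean(), out.std())
```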