Dec 22 Friday
Schedule
- 7:00: Breakfast
- 8:45: Opening
- 9:00: Jaouad Mourtada (ENSAE): Finite-sample performance of the maximum likelihood estimator in logistic regression
- 9:40: Daniil Tiapkin (HSE): Demonstration-regularized RL
- 10:20: Coffee break
- 10:40: Yuhao Wang (Tsinghua University): Residual Permutation Test for High-Dimensional Regression Coefficient Testing
- 11:20: Subhodh Kotekal (University of Chicago): Optimal heteroskedasticity testing in nonparametric regression
- 12:30: Lunch
7:00: Breakfast
8:45: Opening
Session: Learning theory
Chair: Sara van de Geer, ETH Zurich
9:00: Jaouad Mourtada (ENSAE): Finite-sample performance of the maximum likelihood estimator in logistic regression
The logistic model is a classical linear model for describing the probabilistic dependence of binary responses on multivariate features. We consider the predictive performance of the maximum likelihood estimator (MLE) for logistic regression, assessed in terms of the logistic loss of its probabilistic forecasts. We consider two questions: first, the existence of the MLE (which occurs when the data is not linearly separated), and second, its accuracy when it exists. These properties depend both on the dimension of the covariates and on the signal strength. In the case of Gaussian covariates and a well-specified logistic model, we obtain sharp non-asymptotic guarantees for the existence and excess prediction risk of the MLE. This complements asymptotic results of Sur and Candès, and refines non-asymptotic upper bounds of Ostrovskii and Bach and of Chinot, Lecué and Lerasle. It also complements independent recent results by Kuchelmeister and van de Geer. We then extend these results in two directions: first, to non-Gaussian covariates satisfying a certain regularity condition, and second, to the case of a misspecified logistic model.
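As a quick illustration of the separation phenomenon, here is a minimal sketch (not the speaker's code; the sample size, dimension, and signal strength are illustrative choices): it simulates a well-specified logistic model with Gaussian covariates, checks existence of the MLE via a separation linear program, and, when the MLE exists, estimates its excess logistic risk by Monte Carlo.

```python
# A minimal sketch (not the speaker's code), assuming Gaussian covariates and a
# well-specified logistic model. The MLE exists iff the data is not linearly
# separated, which we check with a feasibility linear program.
import numpy as np
from scipy.optimize import linprog, minimize

rng = np.random.default_rng(0)
n, d = 200, 20                                       # illustrative sample size and dimension
w_star = 2.0 * rng.standard_normal(d) / np.sqrt(d)   # illustrative signal strength
X = rng.standard_normal((n, d))                      # Gaussian covariates
y = np.where(rng.random(n) < 1 / (1 + np.exp(-X @ w_star)), 1.0, -1.0)

# The data is linearly separated iff some w satisfies y_i <x_i, w> >= 1 for all i.
Z = y[:, None] * X
lp = linprog(np.zeros(d), A_ub=-Z, b_ub=-np.ones(n), bounds=[(None, None)] * d)
if lp.success:
    print("Data separated: the logistic loss has no minimizer, the MLE does not exist.")
else:
    logistic_loss = lambda w, M: np.mean(np.logaddexp(0.0, -M @ w))
    w_mle = minimize(logistic_loss, np.zeros(d), args=(Z,), method="BFGS").x
    # Monte Carlo estimate of the excess prediction (logistic) risk of the MLE.
    Xt = rng.standard_normal((20 * n, d))
    yt = np.where(rng.random(20 * n) < 1 / (1 + np.exp(-Xt @ w_star)), 1.0, -1.0)
    Zt = yt[:, None] * Xt
    print("excess prediction risk (MC):", logistic_loss(w_mle, Zt) - logistic_loss(w_star, Zt))
```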
9:40: Daniil Tiapkin (HSE): Demonstration-regularized RL
Incorporating expert demonstrations has empirically been shown to improve the sample efficiency of reinforcement learning (RL). This talk quantifies theoretically to what extent this extra information reduces RL's sample complexity. In particular, we study demonstration-regularized reinforcement learning, which leverages the expert demonstrations by KL-regularization toward a policy learned by behavior cloning. Our findings reveal that using \(N_{\mathrm{exp}}\) expert demonstrations enables the identification of an optimal policy at a sample complexity of order \(\widetilde{\mathcal{O}}(\mathrm{Poly}(S,A,H)/(\epsilon^2 N_{\mathrm{exp}}))\) in finite and \(\widetilde{\mathcal{O}}(\mathrm{Poly}(d,H)/(\epsilon^2 N_{\mathrm{exp}}))\) in linear Markov decision processes, where \(\epsilon\) is the target precision, \(H\) the horizon, \(A\) the number of actions, \(S\) the number of states in the finite case, and \(d\) the dimension of the feature space in the linear case. As a by-product, we provide tight convergence guarantees for the behavior cloning procedure under general assumptions on the policy classes. Additionally, we establish that demonstration-regularized methods are provably efficient for reinforcement learning from human feedback (RLHF). In this respect, we provide theoretical evidence showing the benefits of KL-regularization for RLHF in tabular and linear MDPs. Interestingly, we avoid pessimism injection by employing computationally feasible regularization to handle reward estimation uncertainty, thus setting our approach apart from prior works.
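The KL-regularized policy update at the heart of this approach has a convenient closed form, which a short tabular sketch makes concrete (an illustration, not the paper's algorithm; the function names and the smoothing parameter `alpha` are hypothetical):

```python
# A minimal tabular sketch (an illustration, not the paper's algorithm).
import numpy as np

def behavior_cloning(demos, n_states, n_actions, alpha=1.0):
    """Tabular behavior cloning: smoothed action frequencies from expert
    (state, action) pairs. alpha is a hypothetical smoothing parameter."""
    counts = np.full((n_states, n_actions), alpha)
    for s, a in demos:
        counts[s, a] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def kl_regularized_policy(Q, pi_bc, lam):
    """The maximizer of  E_{a~pi}[Q(s,a)] - lam * KL(pi(.|s) || pi_bc(.|s))
    has the closed form  pi(a|s) proportional to pi_bc(a|s) * exp(Q(s,a)/lam)."""
    logits = np.log(pi_bc) + Q / lam
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

# Toy usage: 3 states, 2 actions, a few expert demonstrations.
pi_bc = behavior_cloning([(0, 1), (1, 0), (2, 1), (0, 1)], 3, 2)
Q = np.array([[0.2, 1.0], [0.5, 0.4], [0.0, 0.9]])
print(kl_regularized_policy(Q, pi_bc, lam=0.5))
```

As `lam` grows the policy stays close to the cloned expert; as it shrinks the update approaches the greedy policy on `Q`.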
10:20: Coffee break
Session: High-dimensional testing
Chair: Alexandra Carpentier, University of Potsdam
10:40: Yuhao Wang (Tsinghua University): Residual Permutation Test for High-Dimensional Regression Coefficient Testing
We consider the problem of testing whether a single coefficient is equal to zero in fixed-design linear models with moderately high-dimensional covariates. In this setting, where the dimension of the covariates \(p\) is allowed to be of the same order of magnitude as the sample size \(n\), existing methods usually require strong distributional assumptions on the noise vector (such as Gaussian or rotationally invariant) to achieve finite-population validity, which limits their applications in practice. In this paper, we propose a new method, called the residual permutation test (RPT), which is constructed by projecting the regression residuals onto the space orthogonal to the union of the column spaces of the original and permuted design matrices. RPT is proved to achieve finite-population size validity under fixed design with only exchangeable noise, whenever \(p < n/2\). Moreover, RPT is shown to be asymptotically powerful for heavy-tailed noise with bounded \((1+t)\)-th order moment when the true coefficient is at least of order \(n^{-t/(1+t)}\) for \(t \in [0, 1]\). We further prove that this signal-size requirement is essentially minimax rate optimal. Numerical studies confirm that RPT performs well in a wide range of simulation settings with normal and heavy-tailed noise distributions. This is based on joint work with Kaiyue Wen and Tengyao Wang.
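To make the construction concrete, here is a toy sketch of the recipe described in the abstract (the details are assumptions for illustration; this is not the paper's exact statistic or validity argument): project onto the space orthogonal to the column spaces of the original and permuted nuisance designs, then rank a correlation-type statistic of the observed data among its permuted counterparts.

```python
# A toy sketch of the recipe in the abstract (assumed details; not the exact
# RPT statistic from the paper).
import numpy as np

def rpt_sketch(y, X, j, n_perm=199, seed=0):
    """Test H0: beta_j = 0 in y = X beta + noise; requires p < n/2."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    assert p < n / 2, "the projection argument needs p < n/2"
    W = np.delete(X, j, axis=1)              # nuisance covariates
    z = X[:, j]                              # covariate under test

    def stat(perm):
        B = np.hstack([W, W[perm]])          # original and permuted designs
        U, s, _ = np.linalg.svd(B, full_matrices=False)
        Ub = U[:, s > 1e-10 * s[0]]          # rank-safe orthonormal basis
        resid = lambda v: v - Ub @ (Ub.T @ v)  # projection orthogonal to col(B)
        return abs(resid(z[perm]) @ resid(y))

    t_obs = stat(np.arange(n))
    t_perm = [stat(rng.permutation(n)) for _ in range(n_perm)]
    return (1 + sum(t >= t_obs for t in t_perm)) / (1 + n_perm)  # p-value
```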
11:20: Subhodh Kotekal (University of Chicago): Optimal heteroskedasticity testing in nonparametric regression
Heteroskedasticity testing in nonparametric regression is a classic statistical problem with important practical applications, yet its fundamental limits are unknown. Adopting a minimax perspective, we consider the testing problem in the context of an \(\alpha\)-Hölder mean and a \(\beta\)-Hölder variance function. For \(\alpha > 0\) and \(\beta \in (0, 1/2)\), the sharp minimax separation rate \(n^{-4\alpha} + n^{-4\beta/(4\beta+1)} + n^{-2\beta}\) is established. To achieve the minimax separation rate, a kernel-based statistic using first-order squared differences is developed. Notably, the statistic estimates a proxy rather than the natural quadratic functional (the squared distance between the variance function and its best \(L_2\) approximation by a constant) suggested in previous work. The setting where no smoothness is assumed on the variance function is also studied; the variance profile across the design points can be arbitrary. Despite the lack of structure, consistent testing turns out to still be possible by using the Gaussian character of the noise, and the minimax rate is shown to be \(n^{-4\alpha} + n^{-1/2}\). Exploiting the noise information happens to be a fundamental necessity, as consistent testing is impossible if nothing more than zero mean and unit variance is known about the noise distribution.
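A hedged sketch of the first-order squared-difference idea (the Gaussian kernel, bandwidth, and contrast below are assumed details, not the paper's exact statistic or calibration): half squared differences of neighboring responses act as nearly unbiased proxies for the local variance when the mean is smooth, and comparing their kernel smooth with their global average measures departure from a constant variance.

```python
# A hedged sketch of the first-order squared-difference idea (assumed details;
# not the paper's exact statistic or its calibration).
import numpy as np

def hetero_stat(x, y, h):
    """x: sorted design points; y: responses; h: assumed kernel bandwidth.
    d_i = (y_{i+1} - y_i)^2 / 2 is nearly unbiased for the variance near x_i
    when the mean is smooth, since the mean increments are then small."""
    d = 0.5 * np.diff(y) ** 2                     # proxy variance observations
    m = 0.5 * (x[1:] + x[:-1])                    # midpoints of the differences
    K = np.exp(-0.5 * ((m[:, None] - m[None, :]) / h) ** 2)  # Gaussian kernel
    np.fill_diagonal(K, 0.0)                      # leave-one-out to reduce bias
    local = (K @ d) / K.sum(axis=1)               # smoothed variance estimates
    # distance between the variance profile and its best constant approximation
    return np.mean((local - d.mean()) ** 2)
```

A critical value for this toy statistic would come from the theory or from resampling; that calibration step is omitted here.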