Parameter-free Statistically Consistent Interpolation: Dimension-independent Convergence Rates for Hilbert kernel regression

Mitra, Partha P, Sire, Clément (June 2021) Parameter-free Statistically Consistent Interpolation: Dimension-independent Convergence Rates for Hilbert kernel regression. (Submitted)

2106.03354v1.pdf - Submitted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Abstract

Previously, statistical textbook wisdom has held that interpolating noisy data will generalize poorly, but recent work has shown that data interpolation schemes can generalize well. This could explain why overparameterized deep nets do not necessarily overfit. Optimal data interpolation schemes have been exhibited that achieve theoretical lower bounds for excess risk in any dimension for large data (Statistically Consistent Interpolation). These are non-parametric Nadaraya-Watson estimators with singular kernels. The recently proposed weighted interpolating nearest neighbors method (wiNN) is in this class, as is the previously studied Hilbert kernel interpolation scheme, in which the estimator has the form $\hat{f}(x)=\sum_i y_i w_i(x)$, where $w_i(x)=\|x-x_i\|^{-d}/\sum_j\|x-x_j\|^{-d}$ and $d$ is the dimension of the input space. This estimator is unique in being completely parameter-free. While statistical consistency was previously proven, convergence rates were not established. Here, we comprehensively study the finite sample properties of Hilbert kernel regression. We prove that the excess risk is asymptotically equivalent pointwise to $\sigma^2(x)/\ln(n)$, where $\sigma^2(x)$ is the noise variance and $n$ is the number of samples. We show that the excess risk of the plugin classifier is less than $2|f(x)-1/2|^{1-\alpha}(1+\varepsilon)^{\alpha}\sigma^{\alpha}(x)(\ln(n))^{-\alpha/2}$ for any $0<\alpha<1$, where $f$ is the regression function $x\mapsto\mathbb{E}[y|x]$. We derive asymptotic equivalents of the moments of the weight functions $w_i(x)$ for large $n$; for instance, for $\beta>1$, $\mathbb{E}[w_i^{\beta}(x)]\sim_{n\to\infty}((\beta-1)n\ln(n))^{-1}$. We derive an asymptotic equivalent for the Lagrange function and exhibit the nontrivial extrapolation properties of this estimator. We present heuristic arguments for a universal $w^{-2}$ power-law behavior of the probability density of the weights in the large $n$ limit.
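
The estimator described in the abstract is simple enough to state directly in code. The following is a minimal Python sketch, not the authors' implementation, assuming inputs are tuples of floats in $\mathbb{R}^d$ and, for the classifier, labels in {0, 1}. The function names (hilbert_weights, hilbert_regress, plugin_classify, variance_factor) are hypothetical, and the handling of a query point that coincides with a training point (all weight goes to the coincident points) is an assumption taken as the limit of the weight formula. Note that the only "parameter", the exponent $d$, is fixed by the input dimension.

```python
from __future__ import annotations

import math
from typing import Sequence

Point = Sequence[float]


def hilbert_weights(x: Point, xs: Sequence[Point]) -> list[float]:
    """Hilbert kernel weights w_i(x) = ||x - x_i||^-d / sum_j ||x - x_j||^-d.

    d is the input dimension, so there is no bandwidth to tune.
    Assumption: if x coincides with training points, all weight goes to
    those points (split equally among duplicates), the limit of the formula,
    so the estimator interpolates the data.
    """
    d = len(x)
    dists = [math.dist(x, xi) for xi in xs]
    hits = [i for i, r in enumerate(dists) if r == 0.0]
    if hits:
        return [1.0 / len(hits) if i in hits else 0.0 for i in range(len(xs))]
    # Work in log space: log k_i = -d * log r_i, shifted by the maximum
    # so that tiny distances do not overflow the unnormalized kernel.
    log_k = [-d * math.log(r) for r in dists]
    m = max(log_k)
    k = [math.exp(v - m) for v in log_k]
    total = sum(k)
    return [ki / total for ki in k]


def hilbert_regress(x: Point, xs: Sequence[Point], ys: Sequence[float]) -> float:
    """Estimator f_hat(x) = sum_i y_i w_i(x)."""
    return sum(w * y for w, y in zip(hilbert_weights(x, xs), ys))


def plugin_classify(x: Point, xs: Sequence[Point], ys: Sequence[float]) -> int:
    """Plug-in classifier for labels in {0, 1}: predict 1 when f_hat(x) > 1/2.

    Breaking ties at exactly 1/2 toward class 0 is an arbitrary choice here.
    """
    return 1 if hilbert_regress(x, xs, ys) > 0.5 else 0


def variance_factor(x: Point, xs: Sequence[Point]) -> float:
    """sum_i w_i(x)^2. With independent noise of constant variance sigma^2,
    the variance of f_hat(x) given the inputs is sigma^2 times this value."""
    return sum(w * w for w in hilbert_weights(x, xs))


if __name__ == "__main__":
    xs = [(0.0,), (1.0,)]
    ys = [0.0, 1.0]
    print(hilbert_weights((0.25,), xs))     # approximately [0.75, 0.25]
    print(hilbert_regress((1.0,), xs, ys))  # 1.0: the estimator interpolates
    print(plugin_classify((0.9,), xs, ys))  # 1
```

The variance_factor helper connects to the moment result quoted above: since the weights are identically distributed, taking $\beta=2$ gives $\mathbb{E}[\sum_i w_i^2(x)] = n\,\mathbb{E}[w_i^2(x)] \sim 1/\ln(n)$, which is consistent with the $\sigma^2(x)/\ln(n)$ excess risk, a very slow decay compared with fixed-bandwidth kernel smoothers.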

Item Type: Paper
Communities: CSHL labs > Mitra lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 7 June 2021
Date Deposited: 13 Oct 2023 17:32
Last Modified: 13 Oct 2023 17:32
URI: https://repository.cshl.edu/id/eprint/41257
