Understanding overfitting peaks in generalization error: Analytical risk curves for $l_2$ and $l_1$ penalized interpolation

Mitra, Partha P (June 2019) Understanding overfitting peaks in generalization error: Analytical risk curves for $l_2$ and $l_1$ penalized interpolation. (Submitted)

1906.03667v1.pdf - Submitted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Traditionally in regression one minimizes the number of fitting parameters or uses smoothing/regularization to trade off training error (TE) against generalization error (GE). Driving TE to zero by increasing the fitting degrees of freedom (dof) is expected to increase GE. However, modern big-data approaches, including deep nets, seem to over-parametrize and send TE to zero (data interpolation) without impacting GE. Overparametrization has the benefit that global minima of the empirical loss function proliferate and become easier to find. These phenomena have drawn theoretical attention. Regression and classification algorithms have been demonstrated that interpolate the data yet still generalize optimally. An interesting related phenomenon has been noted: the existence of non-monotonic risk curves, with a peak in GE as dof increases. It was suggested that this peak separates a classical regime from a modern regime where over-parametrization improves performance. Similar overfitting peaks were reported previously in the statistical physics approach to learning and attributed to increased flexibility of the fitting model. We introduce a generative and fitting model pair ("Misparametrized Sparse Regression" or MiSpaR) and show that the overfitting peak can be dissociated from the point at which the fitting function gains enough dof to match the data generative model and thus provide good generalization. This complicates the interpretation of overfitting peaks as separating a "classical" from a "modern" regime. Data interpolation by itself cannot guarantee good generalization: one needs to study interpolation under different penalty terms. We present analytical formulae for GE curves for MiSpaR with $l_2$ and $l_1$ penalties, in the interpolating limit λ→0. These risk curves exhibit important differences and help elucidate the underlying phenomena.
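
The overfitting peak described in the abstract can be reproduced numerically in a simple setting. The following is a minimal sketch and not the paper's MiSpaR construction or its analytical formulae: it assumes a sparse linear generative model with Gaussian features, fits only the first p features, and uses the pseudoinverse, which gives ordinary least squares when p is at most the number of training samples and the minimum $l_2$-norm interpolant (the λ→0 limit of an $l_2$ penalty) when p exceeds it. The function name `min_norm_risk` and all parameter values are illustrative assumptions.

```python
import numpy as np


def min_norm_risk(n_train=40, n_features_total=120, n_sparse=10,
                  n_test=2000, noise=0.5, seed=0):
    """Estimate test risk versus the number of fitted features p.

    Assumed setup (illustrative, not taken from the paper):
    y = X @ beta + noise, with beta nonzero only on the first n_sparse entries.
    """
    rng = np.random.default_rng(seed)

    # Sparse ground-truth coefficients.
    beta = np.zeros(n_features_total)
    beta[:n_sparse] = 1.0

    X_train = rng.standard_normal((n_train, n_features_total))
    X_test = rng.standard_normal((n_test, n_features_total))
    y_train = X_train @ beta + noise * rng.standard_normal(n_train)
    y_test = X_test @ beta  # noiseless targets, so risk excludes irreducible noise

    risks = {}
    for p in range(1, n_features_total + 1):
        # pinv yields the least-squares fit for p <= n_train and the
        # minimum l2-norm interpolating solution for p > n_train.
        w = np.linalg.pinv(X_train[:, :p]) @ y_train
        residual = X_test[:, :p] @ w - y_test
        risks[p] = float(np.mean(residual ** 2))
    return risks


if __name__ == "__main__":
    risks = min_norm_risk()
    for p in (5, 10, 20, 40, 60, 120):
        print(f"p = {p:3d}  estimated GE = {risks[p]:.3f}")
```

In this sketch the fitting model already contains every generative feature once p reaches `n_sparse`, while the risk is expected to spike near p = `n_train`, the interpolation threshold. The two locations are set by different parameters, which loosely mirrors the abstract's point that the peak and the point of model match can be dissociated. An $l_1$-penalized variant would need a different solver and is not shown here.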

Item Type: Paper
Communities: CSHL labs > Mitra lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 9 June 2019
Date Deposited: 13 Oct 2023 17:34
Last Modified: 13 Oct 2023 17:34
URI: https://repository.cshl.edu/id/eprint/41258
