tensor-fusion / double-descent-deep-nets

Double descent experiments/repros on classical ML models and deep neural nets


Double descent experiments

In this repo I'm trying to reproduce double descent results from several papers.

Nothing particularly useful here (unless you're interested in double descent).

Example

Reproducing polynomial regression results from the Double Descent Demystified paper:

(Figure panels: fits in the underparameterized regime, at the interpolation threshold, and in the overparameterized regime.)

Background

One of the most fervent claims made by modern-day DL researchers has always been that "bigger models work better!". This conflicts with standard statistical learning theory, which predicts that bigger models will overfit the training data, interpolating noise and failing to generalize.

Who's right? Enter double descent.

Double descent describes the phenomenon where a model's error curve, as a function of model complexity or size, doesn't follow the traditional U-shaped bias-variance tradeoff. Instead, after an initial descent (error falls) and a subsequent ascent (error rises due to overfitting), there is a second descent: error falls again even as model complexity keeps growing beyond the interpolation threshold.

One point for modern DL folks (although this doesn't necessarily contradict classic bias-variance).

Linear regression example

Let $p$ be the number of parameters (a proxy for model complexity) and $n$ the number of training samples; the interpolation threshold is reached when $p = n$. Traditionally, when $p$ exceeds $n$ the model is expected to generalize poorly on new data due to overfitting, but under double descent, as $p$ grows well past the threshold, the generalization error decreases again.
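As a minimal sketch of what happens at that threshold (not the repo's code; the random cosine feature map, $n = 20$, and the noise level are illustrative assumptions), the snippet below fits linear regression on $p$ random features with `np.linalg.pinv`, which returns the least-squares solution for $p < n$ and the minimum-norm interpolating solution once $p \ge n$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                    # training samples (assumed)
x_train = rng.uniform(-1, 1, n)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(n)

for p in (5, 20, 200):                    # under / at / over the threshold
    w = rng.standard_normal(p)
    Phi = np.cos(np.outer(x_train, w))    # (n, p) random cosine features
    # pinv gives the least-squares fit for p < n and the
    # minimum-norm interpolating fit for p >= n
    beta = np.linalg.pinv(Phi) @ y_train
    train_mse = np.mean((Phi @ beta - y_train) ** 2)
    print(f"p={p:4d}  train MSE={train_mse:.2e}  ||beta||={np.linalg.norm(beta):.2f}")
```

Training error drops to (numerically) zero at and beyond $p = n$, while the coefficient norm typically spikes near the threshold and shrinks again as $p$ grows; that minimum-norm behavior is the mechanism usually credited for the second descent.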

Assume a scenario where you fit a polynomial regression model:

$$y_i=f\left(x_i\right)+\epsilon_i$$

where $f(x)$ is the true function, $x_i$ are the data points, $y_i$ are the observed values, and $\epsilon_i$ represents noise.

As the degree of the polynomial (a proxy for model complexity) increases, the fit to the training data becomes perfect once the degree $d$ reaches $n-1$. If $d$ surpasses $n-1$, classical theory predicts a blowup in generalization error due to high variance. However, as observed in the double descent phenomenon, if $d$ keeps increasing well beyond $n$, test error decreases again: past the threshold the interpolating polynomial is no longer unique, and the usual least-squares solvers return the minimum-norm interpolant, which tends to be smoother.
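Here is a hedged sketch of that degree sweep (the target function $\sin(2\pi x)$, the Legendre feature basis, $n = 15$, and the noise level are my assumptions, not necessarily what the notebook uses). Each degree is fit by minimum-norm least squares; with a setup like this the test error typically spikes near $d = n - 1$ and falls again at much larger $d$:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(1)

def f(x):                                  # assumed "true" function
    return np.sin(2 * np.pi * x)

n = 15                                     # training samples (assumed)
x_train = rng.uniform(-1, 1, n)
y_train = f(x_train) + 0.2 * rng.standard_normal(n)
x_test = np.linspace(-0.95, 0.95, 400)
y_test = f(x_test)

for d in (2, 5, n - 1, 30, 100, 1000):
    # Legendre pseudo-Vandermonde design: one column per degree 0..d
    Phi_train = legendre.legvander(x_train, d)
    # minimum-norm least-squares fit (interpolates exactly once d >= n - 1)
    beta = np.linalg.pinv(Phi_train) @ y_train
    Phi_test = legendre.legvander(x_test, d)
    test_mse = np.mean((Phi_test @ beta - y_test) ** 2)
    print(f"d={d:5d}  test MSE={test_mse:.3f}")
```

The minimum-norm choice is the key design point: for $d \ge n - 1$ there are infinitely many interpolating polynomials, and the pseudoinverse picks the one with the smallest coefficient norm, which in this basis tends to be smoother and hence to generalize better.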

