weixin-liang / data-centric-ai-perspective

License: MIT License

Jupyter Notebook 91.41% Python 8.59%
distribution-shift data-centric-ai data-shapley metashift

Data-centric-AI-perspective

This repo provides the data and code for reproducing the image classification experiments.

Data

  • Please download the prepared dataset data-centric-AI-perspective.zip from this Google Drive URL.

  • Extract the files. The resulting file structure will look like:

.
├── README.md
├── data/
    ├── train/              
        ├── cat/
            ├── <ID>.jpg
            ├── ...
        ├── dog/
            ├── <ID>.jpg
            ├── ...
    ├── val/
        ├── cat/
            ├── <ID>.jpg
            ├── ...
        ├── dog/
            ├── <ID>.jpg
            ├── ...
    ├── shapley_val/
        ├── cat/
            ├── <ID>.jpg
            ├── ...
        ├── dog/
            ├── <ID>.jpg
            ├── ...
├── code/
    ├── outputs/ 
    ├── step_1_baseline_finetune.ipynb
    ├── step_2a_extract_features.ipynb
    ├── step_2b_compute_data_shapley.py
    ├── step_3_using_cleaned_data.ipynb              
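
After extracting the archive, it can help to confirm that the class folders shown above are in place before running any notebook. The following is a minimal sketch, not part of the repository: the function name `missing_class_dirs` and the constants are hypothetical, and it assumes the layout is exactly `data/<split>/<class>/` as in the tree.

```python
from pathlib import Path
from typing import List

# Split and class folder names taken from the file tree in this README.
SPLITS = ("train", "val", "shapley_val")
CLASSES = ("cat", "dog")


def missing_class_dirs(root: str) -> List[str]:
    """Return the expected data/<split>/<class> folders that are absent under root."""
    root_path = Path(root)
    missing = []
    for split in SPLITS:
        for cls in CLASSES:
            folder = root_path / "data" / split / cls
            if not folder.is_dir():
                # Report paths relative to root, with forward slashes.
                missing.append(folder.relative_to(root_path).as_posix())
    return missing
```

Calling `missing_class_dirs(".")` from the repository root returns an empty list when all six class folders exist; otherwise it lists the folders that still need to be extracted.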

Code

  • Execute the scripts/notebooks in the following order, as indicated by their file names:
    • step_1_baseline_finetune.ipynb: performs standard fine-tuning on the given dataset. The notebook is adapted from the official PyTorch tutorial on fine-tuning torchvision models.
    • step_2a_extract_features.ipynb: extracts image features. The features are used when applying the data Shapley method to the noisy training set.
    • step_2b_compute_data_shapley.py: estimates the Shapley value of each training point.
    • step_3_using_cleaned_data.ipynb: drops the training data points with negative Shapley values and trains the classifier on the remaining points.
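
The README does not say which estimator step_2b_compute_data_shapley.py uses, so the following is only a minimal sketch of the idea behind steps 2b and 3, using plain permutation-sampling (Monte Carlo) Shapley estimation rather than the repository's implementation. The function names, the `utility` callback, and the toy additive utility are all hypothetical; in practice the utility would be something like validation accuracy of a model trained on the given subset.

```python
import random
from typing import Callable, List, Sequence


def monte_carlo_shapley(
    n_points: int,
    utility: Callable[[Sequence[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate each point's Shapley value by averaging its marginal
    contribution to `utility` over random orderings of the training set."""
    rng = random.Random(seed)
    totals = [0.0] * n_points
    order = list(range(n_points))
    for _ in range(n_permutations):
        rng.shuffle(order)
        subset: List[int] = []
        previous = utility(subset)  # utility of the empty set
        for i in order:
            subset.append(i)
            current = utility(subset)
            totals[i] += current - previous  # marginal contribution of point i
            previous = current
    return [total / n_permutations for total in totals]


def keep_non_negative(values: Sequence[float]) -> List[int]:
    """Indices of the points to keep, i.e. those without a negative value."""
    return [i for i, value in enumerate(values) if value >= 0]


if __name__ == "__main__":
    # Toy additive utility (an assumption for illustration): each point adds
    # a fixed amount, so its Shapley value equals that amount exactly.
    gains = [1.0, -2.0, 0.5, -0.25]
    values = monte_carlo_shapley(len(gains), lambda s: sum(gains[i] for i in s))
    print(values)                     # [1.0, -2.0, 0.5, -0.25]
    print(keep_non_negative(values))  # [0, 2]
```

With a real utility, each call retrains or re-evaluates a model, so the number of permutations trades accuracy against compute. Precomputing image features, as step 2a does, is one way to keep each utility evaluation cheap.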

Dependencies

  • Python 3.6.13 (e.g. conda create -n venv python=3.6.13)
  • PyTorch Version: 1.4.0
  • Torchvision Version: 0.5.0
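
Because these versions are fairly old, a quick check of the active environment can save a confusing failure later. The sketch below is an assumption-laden helper, not part of the repository: `installed_versions` and `version_problems` are hypothetical names, and the check ignores local build suffixes such as `+cu100` in the reported version string.

```python
import importlib
from typing import Dict, List

# Versions listed in this README.
REQUIRED = {"torch": "1.4.0", "torchvision": "0.5.0"}


def installed_versions() -> Dict[str, str]:
    """Look up __version__ for each required package that can be imported."""
    found = {}
    for name in REQUIRED:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # left out of the result, reported later as not installed
        found[name] = getattr(module, "__version__", "unknown")
    return found


def version_problems(installed: Dict[str, str]) -> List[str]:
    """Compare installed versions with REQUIRED and describe any mismatch."""
    problems = []
    for name, wanted in REQUIRED.items():
        have = installed.get(name)
        if have is None:
            problems.append("{} is not installed (README lists {})".format(name, wanted))
        elif have.split("+")[0] != wanted:
            problems.append("{} is {} (README lists {})".format(name, have, wanted))
    return problems
```

Running `version_problems(installed_versions())` inside the conda environment returns an empty list when both packages match the versions above.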

