GithubHelp home page GithubHelp logo

chengzegang / torchtemplate Goto Github PK

View Code? Open in Web Editor NEW
1.0 4.0 0.0 80 KB

A simple Pytorch template for deep learning.

License: MIT License

Python 91.91% Rust 4.11% Shell 3.98%
cv nlp pytorch ddp

torchtemplate's Introduction

A PyTorch Project Template

CodeFactor test build

This is a simple template for any research projects that require training deep learning models with PyTorch. It is based on my own everyday routine experiences. Originally, most functionalities were nested in some big singletons for the sake of simplicity (that states can be easily accessed from anywhere). However, as a project grows, it becomes more than critical to involve the so-called "Continous integration/Development" (CI/CD) paradigms for readability, testability, and maintainability. After all, it sucks when I need to debug every single function every fxxking time new things came in, especially considering the crazy fast pace in the academic world. Thus, this repo is born, and it aims to be more extensible and modularized, rather than a simple trainer python script.


Introduction

The template is in flat-layout, and it is designed to be runnable without using pip install -e .. There was a (many) time that I want to follow the Python Package Guide and tried to write everything in "src-layout". However, when someone submits their jobs in a cluster or any high-performance computing environment, it is very likely that they will not have permission to install a random custom package, OR, they will be required to install the packages in a container that is then locked when one script is running (Therefore you cannot run another project or change current environment until the current script is done). Thus, the flat-layout is the better choice for a research-oriented project.

The template contains four main submodules:

  • datasets (all dataset creations and intializations)
  • models (all model creations and intializations)
  • trainers (the main experiment logic happens here)
  • transforms (preprocessing, postprocessing, augmentation, etc.)

some helper modules:

  • ddp.py (a helper module for distributed training)
  • lr_lambda.py (a helper module for learning rate scheduling; since in most case "cosine warmup" is the only thing I need, I wrote them in one file)

some configs and inis:

  • pyproject.toml (hatch configurations, it is always preferred to define all metadata in the pyproject.toml when possible)
  • MANIFEST.in (manifest configurations, you may want to bundle these *.ini files or other data files into your compiled package)
  • pytest.ini (pytest configurations, test is CRITIAL)
  • mypy.ini (mypy configurations, python is not static-typed, but enforcing it will greatly reduce bugs in runtime)
  • tox.ini (tox configurations, since we got many things to run and a nature programmer seeks to automate everything)
  • requirements.txt (packages that are required to run the codes, and I don't think cpu version torch is anything related to deep learning)
  • .flake8 (flake8 configurations for linting)
  • .github/workflows/** (all github actions, you may want to test your codes before sync to github, but this one is only for the badges in README.md)
  • .vscode/** (all vscode settings, personal preferences)

and a experimental extension module:

  • extensions/rust_ext/** (a rust extension for Python, my personally interests in trying some rust codes)

It is true that one can abstract trainers into even smaller components, but being over-nerdy will only make the codes uneditable. It is more common in research projects that someone copies and pastes the codes of one trainer.py to be the skeleton of another. Moreover, this is also emphasized by the idea of "isolations": You really don't want your two experiments to interfere with each other. Thus, redundancy is somewhat encouraged and welcomed in trainer scripts (even though I still split the codes a little bit for further investigations).

License

project is distributed under the terms of the MIT license.

torchtemplate's People

Contributors

chengzegang avatar

Stargazers

 avatar

Watchers

James Cloos avatar  avatar Kostas Georgiou avatar  avatar

torchtemplate's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.