
graphcore / examples-utils

Utils and common code for Graphcore's example applications

License: Other

Makefile 0.07% Python 98.17% C++ 0.44% Shell 0.93% Jupyter Notebook 0.40%

examples-utils's People

Contributors

alexgraygc, blazejba, evawgraphcore, halestormai, hiteshk-gc, hmellor, joshlk, kundamwiza, marcins-gc, michaln-gc, payoto, rahult-graphcore


examples-utils's Issues

Formalise convergence testing code (platform assessment) and dockerify it

Currently the code under platform assessment has three issues:

  • It is unofficial and poorly organised
  • It is not fully integrated into the examples-utils benchmarking submodule
  • It uses a .sh file to set up environments etc.

This task covers organising the code and integrating it neatly into examples-utils benchmarking (or splitting it off into something else entirely), and converting it to use Docker containers.

Examples utils benchmarking: Feedback round 2

The previous round of testing and feedback from SysOps and PSE was very useful and led to the improvement and promotion of examples-utils benchmarking.

Round 2 will be performed with the AI-Engineering cloud SDK team.

Benchmarks: checkpoint uploading

The current implementation of finding checkpoints for upload to wandb/S3 assumes:

  1. --checkpoint-output-dir contains subdirectories. However, it may be the case that the output directory itself holds the checkpoint files.
  2. Each subdirectory contains only checkpoint-related files. However, some applications store other files alongside their checkpoints.
  3. The most recently updated file in any subdirectory corresponds to a checkpoint. This is tied to point 2: an application may store files related to checkpoints, yet the most recently updated file may not be the checkpoint itself — it could be a metadata file, for example.

We need to identify all the expected scenarios across all apps for checkpoint outputs:

  • Do they provide a subdirectory for each checkpoint?
  • Do they store metadata files?
  • Does each checkpoint correspond to a single file? If so, what are the allowed extensions?

Could it be easier to control the format of the output directories in each application instead?
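A minimal sketch of a checkpoint finder that avoids the three assumptions above: it searches the output directory itself as well as any subdirectories, and selects files by an extension whitelist instead of taking the most recently modified file. The function name and extension set are illustrative assumptions, not the actual examples-utils implementation:

```python
from pathlib import Path

# Hypothetical set of checkpoint file extensions; the real list would come
# from surveying checkpoint outputs across the example applications.
CHECKPOINT_EXTENSIONS = {".ckpt", ".pt", ".onnx", ".npz"}


def find_checkpoints(output_dir: str) -> list[Path]:
    """Collect candidate checkpoint files under ``output_dir``.

    Handles both layouts: checkpoints stored in per-checkpoint
    subdirectories, and checkpoints placed directly in the output
    directory itself. Metadata and other auxiliary files are excluded
    by extension rather than by modification time.
    """
    root = Path(output_dir)
    candidates = (p for p in root.rglob("*") if p.is_file())
    return sorted(p for p in candidates if p.suffix in CHECKPOINT_EXTENSIONS)
```

Controlling the output-directory format in each application (the question above) would let this whitelist shrink to a single known layout.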

QOL improvements to `platform_assesment` command

I've extracted the functionality into: requirements_utils.py. I can move it further into its own folder if you think that's valuable but figured I'd get some feedback on this first. There are a few things that I want to improve before merging:

  • Logging and clarity of the log messages:
    • Add header to the setup
    • Remove the "Benchmark elapsed time" for requirement installation
    • Log the name of the file environment_setup.log
    • Capture the output of pip freeze after installing each requirements file
  • Add some docs to the new function and module
  • Improve --help for platform assessment

I don't plan to merge this with the platform_assessment script yet, but I can open a follow-up issue for that.

Originally posted by @payoto in #46 (comment)
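The "capture the output of pip freeze" bullet above could be done with a small helper like the following sketch. The function name is hypothetical, and the environment_setup.log destination is taken from the issue; appending after each requirements file records the exact package set at every setup step:

```python
import subprocess
import sys


def log_pip_freeze(log_path: str) -> None:
    """Append the current environment's `pip freeze` output to a log file.

    Intended to be called after each requirements file is installed, so
    the setup log (e.g. environment_setup.log) captures the package set
    at every step.
    """
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True,
        text=True,
        check=True,
    )
    with open(log_path, "a") as f:
        f.write("--- pip freeze ---\n")
        f.write(result.stdout)
```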

Examine possibility of cppimport upgrade

@joshlk said that cppimport introduced a fix for the issue arising in parallel compilation in version 22.07.17.

Is it possible to upgrade the cppimport dependency to 22.07.17 and remove the custom workarounds?

Some CI tests were still failing this week, though not every time (e.g. the BERT attention test: https://jenkins.sourcevertex.net/job/public_examples/job/public_examples_ci_ubuntu_18_04_hw_pod_mk2/316/testReport/junit/(root)/(empty)/tests_integration_layer_test_attention/). We could possibly solve this by upgrading the popxl-addons dependency to use cppimport 22.07.17. However, if examples-utils depends on an earlier version, there is a dependency conflict.
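The upgrade decision above could be gated on the installed version, so any custom workaround is only applied when cppimport predates the upstream fix. This is an illustrative sketch (the function names are assumptions, and cppimport's calendar-style versions compare cleanly as integer tuples):

```python
# 22.07.17 is cited in the issue as the first cppimport release containing
# its own parallel-compilation fix.

def parse_version(version: str) -> tuple[int, ...]:
    """Parse a calendar-style version like '22.07.17' into a sortable tuple."""
    return tuple(int(part) for part in version.split("."))


def needs_custom_build_lock(installed: str, fixed_in: str = "22.07.17") -> bool:
    """Return True if the installed cppimport predates the upstream fix,
    i.e. the custom parallel-compilation workaround is still required."""
    return parse_version(installed) < parse_version(fixed_in)
```

In practice `installed` would be `cppimport.__version__`, checked once at import time.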

Examples utils self-identified upgrades

There are already some upgrades that we can make, even before feedback round 2:

  • Better spacing and clarity of results in the terminal
  • Reduce terminal logging to the minimum; the output in output.log can remain complete
  • A more obvious progress/running indicator in the terminal
  • Ability to define a common poprun arguments option in a yaml file that is applied to all poprun commands in that file
  • The start of unit testing... (seek inspiration)
  • Make passing a benchmarks file optional where it is not needed
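The common-poprun-arguments idea above could work roughly as follows. This sketch models the parsed yaml as a dict; the `poprun_common_options` key name and the benchmark spec shape are assumptions, not the actual examples-utils schema:

```python
def apply_common_poprun_args(benchmarks: dict) -> dict:
    """Splice a file-level common options string into every poprun command.

    ``benchmarks`` is assumed to be a parsed benchmarks yaml: a mapping of
    benchmark name -> spec dict with a ``cmd`` string, plus an optional
    top-level ``poprun_common_options`` string (hypothetical key).
    """
    common = benchmarks.pop("poprun_common_options", "")
    if not common:
        return benchmarks
    for spec in benchmarks.values():
        cmd = spec.get("cmd", "")
        if cmd.startswith("poprun"):
            # Insert the shared options right after the poprun executable.
            spec["cmd"] = cmd.replace("poprun", f"poprun {common}", 1)
    return benchmarks
```

Non-poprun commands are left untouched, so a single yaml file can mix poprun and plain-python benchmarks.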
