residentmario / fahr

Run remote machine learning model training jobs right from the command line.

Home Page: https://residentmario.github.io/fahr/

License: MIT License

machine-learning cli model-training aws-sagemaker python

fahr's Introduction

Status: beta

fahr is a command-line tool for building machine learning models on cloud hardware with as little overhead as possible.

fahr provides a simple unified interface to model training services like AWS SageMaker and Kaggle Kernels. By offloading model training to the cloud, fahr aims to make machine learning experimentation easy and fast.

How it works

First, some lingo:

  • training artifact: a file (either .ipynb or .py) that, when executed successfully, produces a model artifact, e.g. a model training script or notebook.
  • model artifact: a file that defines a trained machine learning model, e.g. the weight matrices of a neural network.
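As a concrete reading of the definition above, here is a minimal sketch of how a tool might check that a path is a valid training artifact. The helper `training_artifact_kind` is hypothetical and not part of fahr's API; it only encodes the .py/.ipynb rule stated here.

```python
from pathlib import Path

# Extensions accepted as training artifacts, per the definition above.
TRAINING_ARTIFACT_KINDS = {".py": "script", ".ipynb": "notebook"}


def training_artifact_kind(path: str) -> str:
    """Return 'script' or 'notebook' for a training artifact path.

    Raises ValueError for any other file type. Hypothetical helper,
    not part of fahr's API.
    """
    suffix = Path(path).suffix.lower()
    try:
        return TRAINING_ARTIFACT_KINDS[suffix]
    except KeyError:
        raise ValueError(
            f"{path!r} is not a training artifact: expected a .py or .ipynb file"
        ) from None


print(training_artifact_kind("notebooks/train.ipynb"))  # notebook
```

Model artifacts such as saved weights (e.g. an .h5 file) are rejected, since they are outputs of training rather than inputs to it.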

fahr turns a training artifact into a model artifact using the magic of the cloud. More specifically, it does so by:

  1. Building a Docker image based on your training artifact and uploading it to a container registry.
  2. Executing that Docker image on the training service, which saves the resulting model artifact to remote storage.
  3. Downloading that model artifact to your local machine.
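The three steps above can be sketched as a small driver interface. This is an illustration under assumptions, not fahr's actual internals: the `Driver` class, its three callables, and `fit` are hypothetical names, and a real sagemaker or kaggle driver would implement each step against that service's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Driver:
    """Hypothetical driver interface: one callable per pipeline step."""
    # 1. Build a Docker image from the training artifact, push it, return its URI.
    build_and_push: Callable[[str], str]
    # 2. Execute the image remotely; return where the model artifact was saved.
    run: Callable[[str], str]
    # 3. Download the remote model artifact into a local directory; return its path.
    download: Callable[[str, str], str]


def fit(training_artifact: str, driver: Driver, local_dir: str = ".") -> str:
    """Turn a training artifact into a local model artifact."""
    image_uri = driver.build_and_push(training_artifact)
    remote_model = driver.run(image_uri)
    return driver.download(remote_model, local_dir)


# Usage with a fake in-memory driver that records each step it performs.
calls = []


def fake_build(artifact: str) -> str:
    calls.append("build")
    return "registry.example.com/train:latest"


def fake_run(image_uri: str) -> str:
    calls.append("run")
    return "remote://jobs/train-1/model.tar.gz"


def fake_download(remote: str, local_dir: str) -> str:
    calls.append("download")
    return f"{local_dir}/model.tar.gz"


fake = Driver(fake_build, fake_run, fake_download)
print(fit("train.py", fake, local_dir="out"))  # out/model.tar.gz
print(calls)  # ['build', 'run', 'download']
```

Keeping each step behind its own callable means supporting a new training service only requires supplying three functions, while the orchestration in `fit` stays the same.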

The model training drivers currently supported are:

  • sagemaker (AWS SageMaker)
  • kaggle (Kaggle Kernels)

To learn more about fahr, check out the docs.

fahr's Issues

Provide GPU training support

In order to perform GPU-enabled training, we must provide a container image that is GPU-compatible. In the SageMaker case, this means that the container image must inherit from the nvidia-docker container image (as described here).

Include stopped jobs in the list of jobs

To successfully submit a training job to SageMaker, the job name must be unique. fahr handles this by numbering jobs sequentially by name, e.g. foo-bar-1, foo-bar-2, and so on. However, the current mechanism for detecting prior jobs misses jobs that were stopped (for example, jobs you quit via the AWS Console). The job detection logic needs to be updated to include them.
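A minimal sketch of the naming rule, assuming the caller supplies the names of all prior jobs regardless of status. The function `next_job_name` is hypothetical. For SageMaker, one way to gather those names is boto3's `list_training_jobs`, which accepts a `NameContains` filter and only filters by status when `StatusEquals` is passed; its results are paginated via `NextToken`, so every page must be read.

```python
import re
from typing import Iterable


def next_job_name(base: str, existing_names: Iterable[str]) -> str:
    """Return the next name in the sequence base-1, base-2, ...

    existing_names should contain every prior job regardless of status
    (completed, failed, stopped, ...). If stopped jobs are left out, their
    names can be handed out again and the duplicate submission is rejected.
    """
    pattern = re.compile(rf"{re.escape(base)}-(\d+)")
    used = []
    for name in existing_names:
        match = pattern.fullmatch(name)
        if match:
            used.append(int(match.group(1)))
    return f"{base}-{max(used, default=0) + 1}"


# "foo-bar-3" was stopped in the console; it still counts as taken.
print(next_job_name("foo-bar", ["foo-bar-1", "foo-bar-2", "foo-bar-3"]))  # foo-bar-4
```

Taking the maximum index rather than counting jobs keeps the result unique even when some names in the sequence are missing.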

Provide a dry run command

This tool is very difficult to configure because of all the cloud security settings involved. A dry run command would be a good first step: it would check that you can successfully train and download a model (using some trivial example), and return helpful, descriptive error messages explaining what went wrong if you can't.
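One possible shape for such a command, sketched under assumptions: the dry run is modeled as an ordered list of named steps, each a callable that raises on failure. Because later steps depend on earlier ones (there is nothing to download if training failed), this sketch stops at the first failure and reports which step broke and why. `dry_run` and the example steps are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A step is a (description, callable) pair; the callable raises on failure.
Step = Tuple[str, Callable[[], None]]


def dry_run(steps: Sequence[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order, stopping at the first failure.

    Returns (succeeded, report), where report has one line per step attempted.
    """
    report = []
    for description, step in steps:
        try:
            step()
        except Exception as exc:  # surface any error as a readable message
            report.append(f"FAILED: {description}: {type(exc).__name__}: {exc}")
            return False, report
        report.append(f"ok: {description}")
    return True, report


# Hypothetical steps; a real dry run would call the chosen driver with a
# trivial training artifact.
def build_trivial_image() -> None:
    pass


def train_trivial_model() -> None:
    raise PermissionError("role cannot write to the output location")


def download_model_artifact() -> None:
    pass


ok, report = dry_run([
    ("build and push a trivial image", build_trivial_image),
    ("train a trivial model", train_trivial_model),
    ("download the model artifact", download_model_artifact),
])
print(ok)  # False
print("\n".join(report))
# ok: build and push a trivial image
# FAILED: train a trivial model: PermissionError: role cannot write to the output location
```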

Build Docker images remotely

Currently we build Docker images locally, then push them to a cloud Docker registry.

We'd be a lot better off building the Docker image in the cloud as well. This would, for example, let us avoid figuring out how to get the nvidia-docker image layer working locally. It is thus effectively a prerequisite for #1.

Provide an init command

alekseylearn init should create a template Dockerfile and run.sh that can be edited manually to customize the build, for cases where the default Dockerfile and/or run.sh isn't sufficient.
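A minimal sketch of what an init step could do, written as a plain Python function rather than a CLI command. The template contents below (the `python:3.10-slim` base image, the `requirements.txt` install, and the `train.py` entry point) are placeholders chosen for illustration, not the defaults the tool actually generates.

```python
from pathlib import Path
from typing import List

# Placeholder templates for illustration only; edit them to customize the build.
TEMPLATES = {
    "Dockerfile": (
        "FROM python:3.10-slim\n"
        "WORKDIR /app\n"
        "COPY . /app\n"
        "RUN pip install --no-cache-dir -r requirements.txt\n"
        'ENTRYPOINT ["sh", "run.sh"]\n'
    ),
    "run.sh": (
        "#!/bin/sh\n"
        "set -e\n"
        "python train.py\n"
    ),
}


def init(target_dir: str = ".", overwrite: bool = False) -> List[Path]:
    """Write editable Dockerfile and run.sh templates into target_dir.

    Refuses to touch existing files unless overwrite=True, and checks for
    clashes before writing anything so a failed call leaves no partial output.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = {target / name: text for name, text in TEMPLATES.items()}
    if not overwrite:
        clashes = [str(path) for path in paths if path.exists()]
        if clashes:
            raise FileExistsError(f"refusing to overwrite: {', '.join(clashes)}")
    for path, text in paths.items():
        path.write_text(text)
    return list(paths)
```

Calling `init("my-project")` would create `my-project/Dockerfile` and `my-project/run.sh`, which can then be edited by hand before the next training run.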

Provide a watch option to fit

Currently the library operates in background mode: it launches a job and quits out. It would be great to provide additional watch options:

  • A simple watch option that keeps the program running and prints a message to the console when the job succeeds or fails.
  • A more complete watch option that streams the remote logs to your machine.
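The simple watch option amounts to a polling loop. In this sketch the status lookup is injected as a callable; for SageMaker it could wrap boto3's `describe_training_job(TrainingJobName=...)`, whose response includes a `TrainingJobStatus` field (`InProgress`, `Completed`, `Failed`, `Stopping`, or `Stopped`). The function name `watch` and its parameters are hypothetical, and Kaggle Kernels would need its own status mapping.

```python
import time
from typing import Callable

# SageMaker's terminal TrainingJobStatus values; "Stopping" is transient.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def watch(get_status: Callable[[], str],
          poll_seconds: float = 30.0,
          notify: Callable[[str], None] = print,
          sleep: Callable[[float], None] = time.sleep) -> str:
    """Block until the job reaches a terminal state, announce it, and return it."""
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            notify(f"Training job finished with status: {status}")
            return status
        sleep(poll_seconds)  # wait before asking the service again


# Usage with a fake status source that finishes on the third poll.
statuses = iter(["InProgress", "InProgress", "Completed"])
final = watch(lambda: next(statuses), sleep=lambda seconds: None)
# prints: Training job finished with status: Completed
```

Injecting `sleep` and `notify` keeps the loop testable without real waiting; the log-streaming option would replace `get_status` with a reader that also forwards new log lines.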
