
resonantgeodata / rd-oasis


ResonantGeoData OASIS Deployment

License: Apache License 2.0

Shell 0.71% Dockerfile 1.18% Python 58.62% HTML 12.08% Vue 18.15% TypeScript 1.70% JavaScript 0.08% HCL 7.41% Procfile 0.07%
girder-4 django-project hacktoberfest

rd-oasis's Introduction

RD-OASIS

What is OASIS?

OASIS is a Django application that lets users define and run arbitrary algorithms, and tracks the progress and results of each run within Django.

The Algorithm Execution Lifecycle

These are the steps a task takes to execute an algorithm. The description is generic across Kubernetes and Celery, because the steps are the same for both at a high level.

[Figure: The algorithm execution lifecycle]

Main Components

These are the main components that make up OASIS. Each component is a Django model, and each entry below gives a short description followed by the model's fields.

Algorithm

Algorithms are at the core of OASIS, and they contain most of the definitions required for any process you want to run.

  • Name - The name of the algorithm.
  • Docker Image - The Docker image (documented below) that this algorithm will use.
  • Command - The command used to invoke your algorithm.
  • Entrypoint - If necessary, overrides the default entrypoint of your Docker image.
  • Environment - Any environment variables that should be passed into the container when running your algorithm.
  • GPU - Whether GPU access is required by this algorithm.
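To illustrate how these fields might combine when a task launches a container, here is a minimal sketch that turns an algorithm definition into a `docker run` argument list. The `AlgorithmSpec` dataclass and the `build_docker_args` helper are hypothetical stand-ins, not OASIS's actual code; only the Docker CLI flags used (`--rm`, `--entrypoint`, `-e`, `--gpus`) are real.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AlgorithmSpec:
    """Hypothetical stand-in for the Algorithm model's fields."""
    name: str
    image: str    # e.g. an image ID on Docker Hub
    command: str  # shell-style command string
    entrypoint: Optional[str] = None
    environment: Dict[str, str] = field(default_factory=dict)
    gpu: bool = False


def build_docker_args(algo: AlgorithmSpec) -> List[str]:
    """Assemble a `docker run` argument list from an algorithm definition."""
    args = ["docker", "run", "--rm"]
    if algo.entrypoint:
        # Override the image's default ENTRYPOINT only when one is given.
        args += ["--entrypoint", algo.entrypoint]
    for key, value in sorted(algo.environment.items()):
        # Each environment variable becomes its own -e KEY=VALUE flag.
        args += ["-e", f"{key}={value}"]
    if algo.gpu:
        # Assumes the host has the NVIDIA container toolkit installed.
        args += ["--gpus", "all"]
    args.append(algo.image)
    # Split the command string shell-style into the container's arguments.
    args += shlex.split(algo.command)
    return args
```

For example, an algorithm with `command="python -c 'print(1)'"` and `environment={"MODE": "test"}` yields `-e MODE=test` before the image name, followed by `python`, `-c`, `print(1)`. Passing a list rather than a single string avoids a second round of shell interpretation.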

Algorithm Task

Algorithm tasks are individual runs of an algorithm. Algorithm tasks are isolated from each other, sharing only the underlying algorithm itself.

  • Algorithm - The algorithm which this task belongs to.
  • Status - The current status of this task (its progress in the task lifecycle). One of:
    • Created
    • Queued
    • Running
    • Failed
    • Succeeded
  • Output Log - The output (stdout) of the algorithm, stored as a text field.
  • Input Dataset - The dataset containing the input, to be mounted to and copied into the /<working_dir>/input directory.
  • Output Dataset - The dataset containing any files produced by the algorithm (any files placed into the /<working_dir>/output directory).
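The Status values suggest a simple state machine. The sketch below encodes one plausible set of transitions; the transition table itself (for instance, that a queued or running task may fail, and that Failed and Succeeded are terminal) is an assumption, since the list above only names the states.

```python
from enum import Enum


class TaskStatus(str, Enum):
    CREATED = "created"
    QUEUED = "queued"
    RUNNING = "running"
    FAILED = "failed"
    SUCCEEDED = "succeeded"


# Assumed transition table: each status maps to the statuses it may move to.
ALLOWED_TRANSITIONS = {
    TaskStatus.CREATED: {TaskStatus.QUEUED},
    TaskStatus.QUEUED: {TaskStatus.RUNNING, TaskStatus.FAILED},
    TaskStatus.RUNNING: {TaskStatus.SUCCEEDED, TaskStatus.FAILED},
    TaskStatus.FAILED: set(),     # terminal
    TaskStatus.SUCCEEDED: set(),  # terminal
}


def advance(current: TaskStatus, new: TaskStatus) -> TaskStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move task from {current.value} to {new.value}")
    return new


def is_terminal(status: TaskStatus) -> bool:
    """A status is terminal when no further transitions are allowed."""
    return not ALLOWED_TRANSITIONS[status]
```

Validating transitions in one place keeps a late or duplicated worker update from, say, moving a succeeded task back to running.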

Docker Image

This is the definition of the image to be used when running your algorithm. This Docker (container) image contains the environment needed to run your algorithm. Generally, any files, libraries, packages, etc. that your algorithm requires are included in this image.

  • Name - The name of the image.
  • Image ID - The ID of this image on Docker Hub.
  • Image File - If this image is uploaded directly to the API rather than hosted on Docker Hub, this field points to the file that contains it.
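Because an image comes either from Docker Hub (Image ID) or from an uploaded file (Image File), a runner has to decide where to obtain it. Here is a minimal sketch of that decision; the `DockerImageSpec` dataclass and `image_source` helper are hypothetical, and the rule that exactly one of the two fields must be set is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DockerImageSpec:
    """Hypothetical stand-in for the Docker Image model's fields."""
    name: str
    image_id: Optional[str] = None    # reference on Docker Hub
    image_file: Optional[str] = None  # location of an uploaded image archive


def image_source(image: DockerImageSpec) -> Tuple[str, str]:
    """Return ("registry", id) or ("file", path) describing where to get the image."""
    if image.image_id and image.image_file:
        raise ValueError("set exactly one of image_id or image_file")
    if image.image_id:
        return ("registry", image.image_id)
    if image.image_file:
        # An uploaded archive would have to be loaded (e.g. with `docker load`)
        # or pushed to a registry before a container can run from it.
        return ("file", image.image_file)
    raise ValueError("an image needs either image_id or image_file")
```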

Dataset

A Dataset is a container of files to be used by algorithms. This is used to facilitate both input to and output from an algorithm task.

  • Name - The name of the dataset.
  • Files - The files contained within this dataset.
  • Size - The size (in bytes) of this dataset.
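The README does not say whether Size is stored or computed. Assuming it is simply the total of the contained files' sizes, a minimal sketch with hypothetical `DatasetFile` and `DatasetSpec` classes looks like this:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetFile:
    """Hypothetical file entry; only the fields needed here."""
    name: str
    size: int  # bytes


@dataclass
class DatasetSpec:
    """Hypothetical stand-in for the Dataset model."""
    name: str
    files: List[DatasetFile] = field(default_factory=list)

    @property
    def size(self) -> int:
        """Total size in bytes, derived from the contained files."""
        return sum(f.size for f in self.files)
```

Computing the size on demand can never go stale, while storing it avoids re-summing on every read but must be updated whenever files are added or removed.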

rd-oasis's People

Contributors

banesullivan, github-actions[bot], jjnesbitt, mvandenburgh


rd-oasis's Issues

Generally refactor/cleanup the algorithms app

Rethink these algorithms models and clean them up; the goal is to remove as much technical debt as possible.

We should also pull in @AlmightyYakob to think about Danesfield project needs here, since I believe there are overlapping goals (perhaps this algorithms app will be made into a reusable app).

Enable GPU execution of Algorithms

Enable users to run the model on either GPU or CPU.

This will likely be implemented as a field on the Algorithm model, such as use_gpu, with the options no, optional, and required.
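The three proposed options imply a small decision at scheduling time. Here is a minimal sketch of that decision, assuming a hypothetical `gpu_available` flag supplied by whatever backend (Celery worker or Kubernetes node) runs the task; `GpuUsage` and `should_use_gpu` are illustrative names, not existing OASIS code.

```python
from enum import Enum


class GpuUsage(str, Enum):
    NO = "no"
    OPTIONAL = "optional"
    REQUIRED = "required"


def should_use_gpu(use_gpu: GpuUsage, gpu_available: bool) -> bool:
    """Decide whether a run gets a GPU, given the algorithm's use_gpu setting."""
    if use_gpu is GpuUsage.NO:
        return False
    if use_gpu is GpuUsage.OPTIONAL:
        # Use a GPU when one is available, otherwise fall back to CPU.
        return gpu_available
    # REQUIRED: refuse to run rather than silently falling back to CPU.
    if not gpu_available:
        raise RuntimeError("algorithm requires a GPU but none is available")
    return True
```

Raising for `required` (instead of quietly running on CPU) surfaces misconfigured workers as failed tasks rather than unexpectedly slow ones.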

Use RGD's ChecksumFile (or Raster) for data fields

The tasks here use S3FileField directly. We should use RGD's file models instead or, if we know tasks will be running on rasters, point to Raster so that the inputs and outputs are searchable in RGD's UI.

CI and Testing

Implement tests for the algorithms app and run those tests on CI

Add tests

There are many tests needed for the algorithms app that have yet to be implemented.

Add specific dataset file add/remove endpoints

Currently, the only way to modify which files are in a dataset is to submit a PUT, PATCH, or DELETE request to the dataset detail view, specifying all of the files that should now belong in the files field. For some reason, this is very slow. A potentially more performant approach is to add an endpoint for adding and removing files. I was thinking of the following structure for the request payload:

{
  "insert": [
    1,
    2
  ],
  "delete": [
    12,
    21
  ]
}

Additionally, a /clear endpoint could be added to the dataset view which, as a convenience, would clear the dataset in a single request with no payload.
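The insert/delete payload above can be applied as plain set operations on file IDs. This is a minimal sketch of those semantics with a hypothetical `apply_file_changes` helper; applying inserts before deletes (so an ID listed in both ends up removed) is an assumed design choice, not something the proposal specifies.

```python
from typing import Iterable, Mapping, Set


def apply_file_changes(current: Iterable[int], payload: Mapping[str, Iterable[int]]) -> Set[int]:
    """Apply an {"insert": [...], "delete": [...]} payload to a set of file IDs.

    Inserts are applied first, so an ID present in both lists is removed.
    Missing keys are treated as empty lists.
    """
    files = set(current)
    files |= set(payload.get("insert", []))
    files -= set(payload.get("delete", []))
    return files
```

For example, starting from files `{1, 12, 21, 30}`, the payload above leaves `{1, 2, 30}`. In a Django view, assuming `files` is a `ManyToManyField`, the same effect would likely come from the related manager's `add()` and `remove()` methods, and `/clear` could map to its `clear()` method, each touching only the changed rows instead of rewriting the whole set.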

Add tree endpoint for Dataset ViewSet

This should function the same as the tree endpoint for checksum files, but apply only to the files in a particular dataset. The endpoint would be /datasets/{id}/tree.
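The response format of the existing checksum-file tree endpoint is not described here, so the nested-dict shape below is an assumption. The sketch only shows the core step such an endpoint needs: folding a dataset's slash-separated file paths into a folder hierarchy. `build_tree` is a hypothetical helper.

```python
from typing import Dict, Iterable


def build_tree(paths: Iterable[str]) -> Dict[str, dict]:
    """Nest slash-separated paths into dicts; every node, file or folder, is a dict."""
    root: Dict[str, dict] = {}
    for path in paths:
        node = root
        # Walk (and create as needed) one level per path component.
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})
    return root
```

For instance, `["input/a.tif", "input/meta/b.json", "readme.txt"]` becomes `{"input": {"a.tif": {}, "meta": {"b.json": {}}}, "readme.txt": {}}`. A real endpoint would likely attach file metadata (IDs, sizes) to the leaves instead of empty dicts.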

Handle custom containers

Currently, custom containers (ones that are uploaded to Django directly) aren't handled; only containers stored on Docker Hub are. IMO, the way to handle this is to use Amazon ECR to store these custom containers. The Kubernetes Job would then specify that the image should be pulled from this repository instead of Docker Hub. Here are some Kubernetes docs regarding this.
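To make the proposal concrete, here is a minimal sketch of a Kubernetes Job manifest, built as a plain dict, that pulls its image from ECR. The helper names, the account ID `123456789012`, and the `ecr-creds` secret are hypothetical placeholders. The ECR URI format and the `batch/v1` Job fields (`restartPolicy`, `imagePullSecrets`, `backoffLimit`) are standard. On EKS, nodes can often pull from ECR through their IAM role, in which case the pull secret may be unnecessary.

```python
from typing import Any, Dict, List, Optional


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an Amazon ECR image reference: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def job_manifest(name: str, image: str, args: List[str], pull_secret: Optional[str] = None) -> Dict[str, Any]:
    """Return a minimal batch/v1 Job manifest as a plain dict."""
    # In Kubernetes, "args" overrides the image's CMD ("command" would override ENTRYPOINT).
    pod_spec: Dict[str, Any] = {
        "restartPolicy": "Never",
        "containers": [{"name": name, "image": image, "args": args}],
    }
    if pull_secret:
        # Registry credentials for pulling from a private registry such as ECR.
        pod_spec["imagePullSecrets"] = [{"name": pull_secret}]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        # backoffLimit 0: a failed algorithm run is not retried automatically.
        "spec": {"template": {"spec": pod_spec}, "backoffLimit": 0},
    }
```

The resulting dict can be serialized to YAML or passed to a Kubernetes client; only the `image` value changes compared to a Docker Hub-backed Job.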
