illinois-cs241 / broadway

A distributed systems framework for running distributable workloads.

License: Other

Python 98.78% Dockerfile 0.54% Shell 0.68%
autograder python broadway

broadway's Introduction

Broadway

Broadway is a distributed grading service that receives, executes, and keeps track of grading jobs and runs.

The aim of this project is to provide a generic interface to a distributed autograding system that can be used by multiple courses. Broadway aims to provide the following benefits:

  • More stable and reliable grading runs. No single student can break the entire autograding (AG) run.
  • Faster grading runs. Multiple machines can grade the same assignment.
  • Easier tracking and debugging of student failures during grading.
  • A more consistent environment to grade student code.
  • Easier to scale out the infrastructure.

Please read the Wiki for documentation. It explains how Broadway works and how to interact with it. Please be sure to read all the pages if you are planning on using Broadway.

See our contribution guidelines if you want to contribute.

Requirements

MongoDB must be installed and the mongod daemon must be running locally before starting the API. Default options are usually sufficient (but for security purposes, be sure to disallow external access to the store).

Python 3.6 is the minimum supported interpreter version. Python 3.7 should also work just fine.

To install the dependencies (using venv):

python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt

Configuration

Most of our configuration variables can be set from three sources, in decreasing order of precedence: command-line flags, environment variables, and a config file.
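
The precedence order above can be sketched as a small resolver. This is a minimal illustration, not Broadway's actual configuration code: the `resolve_option` helper, the `BROADWAY_` environment-variable prefix, and the plain-dict config source are all assumptions.

```python
from typing import Any, Mapping, Optional


def resolve_option(
    name: str,
    cli_args: Mapping[str, Any],
    env: Mapping[str, str],
    config_file: Mapping[str, Any],
    env_prefix: str = "BROADWAY_",  # hypothetical prefix, not taken from Broadway
    default: Optional[Any] = None,
) -> Any:
    """Look up `name` in decreasing order of precedence: flags, env vars, config file."""
    # 1. A command-line flag wins, but only if it was actually given
    #    (argparse leaves omitted optional flags as None by default).
    if cli_args.get(name) is not None:
        return cli_args[name]
    # 2. Environment variable, e.g. "bind_port" -> "BROADWAY_BIND_PORT" (assumed naming).
    env_key = env_prefix + name.upper()
    if env_key in env:
        return env[env_key]  # environment values are always strings
    # 3. Config file value (assumed to be parsed into a mapping already).
    if name in config_file:
        return config_file[name]
    return default


if __name__ == "__main__":
    cli = {"bind_port": None}             # flag not passed
    env = {"BROADWAY_BIND_PORT": "1470"}  # set in the environment
    cfg = {"bind_port": 8080}             # set in the config file
    print(resolve_option("bind_port", cli, env, cfg))
```

Here the environment variable beats the config file because no flag was passed; note that a value coming from the environment is a string and would still need converting to the option's type.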

Broadway API and grader

The API and the grader are the two major parts of a Broadway cluster. The API is in charge of receiving jobs and scheduling them across graders, while graders have the simple job of executing them in containers.

To bring up a functioning Broadway cluster, you have to spin up the API first, then connect graders to the API using the authentication token (either given or automatically generated).

Running the API

(After installing requirements)

python3 -m broadway.api [--token TOKEN] [--bind-addr ADDR] [--bind-port PORT]

More info can be found by running python3 -m broadway.api --help

Running the grader

broadway.grader takes two positional arguments: TOKEN is the cluster token configured in the API, and GRADER_ID is a unique identifier for the grader (only letters, digits, and dashes are allowed).
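
Before launching a grader you may want to check GRADER_ID locally. The following is a hypothetical pre-flight check that simply encodes the rule above; it is not the validation Broadway itself performs, and treating "letters" as ASCII letters is an assumption.

```python
import re

# Letters, digits, and dashes only, as stated above; ASCII letters are an assumption.
GRADER_ID_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_grader_id(grader_id: str) -> bool:
    """Return True if grader_id is non-empty and uses only the allowed characters."""
    return GRADER_ID_PATTERN.fullmatch(grader_id) is not None


if __name__ == "__main__":
    for candidate in ["grader-01", "grader_01", "node 2", ""]:
        print(repr(candidate), is_valid_grader_id(candidate))
```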

API_ADDR points to the address the API was bound to, along with the protocol you wish to use. For example, ws://127.0.0.1:1470 means that the grader should find the API at 127.0.0.1:1470 and use the WebSocket version of our protocol.
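
The address format can be illustrated by splitting it with Python's standard `urllib.parse.urlsplit`. This sketch is not Broadway's own parser: the `ApiAddress` tuple and `parse_api_addr` function are hypothetical, and accepting only `ws` by default reflects the one protocol named above.

```python
from typing import FrozenSet, NamedTuple
from urllib.parse import urlsplit


class ApiAddress(NamedTuple):
    scheme: str  # protocol, e.g. "ws" for the WebSocket protocol
    host: str
    port: int


def parse_api_addr(api_addr: str, supported: FrozenSet[str] = frozenset({"ws"})) -> ApiAddress:
    """Split an API_ADDR such as ws://127.0.0.1:1470 into protocol, host, and port."""
    parts = urlsplit(api_addr)
    if parts.scheme not in supported:
        raise ValueError("unsupported protocol: {!r}".format(parts.scheme))
    # .port itself raises ValueError if the port is not a valid number in range.
    if parts.hostname is None or parts.port is None:
        raise ValueError("API_ADDR must include both a host and a port")
    return ApiAddress(parts.scheme, parts.hostname, parts.port)


if __name__ == "__main__":
    print(parse_api_addr("ws://127.0.0.1:1470"))
```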

python3 -m broadway.grader <TOKEN> <GRADER_ID> [--api-host API_ADDR]

More info can be found by running python3 -m broadway.grader --help

broadway's People

Contributors

andyclee, ayushr2, bhuvy2, ezhang887, jhenhapl, kipyminyman, nd-0r, nmagerko, rod-lin, st-arry, xiangmingchen, zhengyao-lin

broadway's Issues

Nice to Have: Deploy script

Right now, you can only deploy the API after learning a good amount about the internals, setting up MongoDB, etc.

We have codified the setup in .travis.yml, but it'd be nice to refactor it into a deploy script for Ubuntu, so we can:

  • Use this for travis testing
  • Provision machines with a one step install

I'd recommend fab, but as always there are other ways to skin a cat.

Can we get rid of the need to run as root?

As usual, avoiding running code as root unless absolutely necessary is good security practice. Can we have a script that creates a new user and adds that user to the docker group, so we no longer have to run as root?
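
A one-time provisioning script along these lines could be a starting point. It is a hypothetical sketch for Ubuntu: the account name `broadway` and both function names are made up, it assumes the `docker` group already exists (Docker packages normally create it), and it must itself be run once as root. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List


def build_setup_commands(username: str, group: str = "docker") -> List[List[str]]:
    """Commands that create `username` and add it to `group` (running them requires root)."""
    return [
        ["useradd", "--create-home", username],  # create the unprivileged account
        ["usermod", "-aG", group, username],     # append it to the docker group
    ]


def setup_user(username: str, dry_run: bool = True) -> None:
    for cmd in build_setup_commands(username):
        if dry_run:
            # shlex.join needs Python 3.8, so quote each argument by hand for 3.6 support.
            print(" ".join(shlex.quote(arg) for arg in cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    setup_user("broadway")  # "broadway" is a hypothetical account name; dry run only
```

Docker's documentation warns that membership in the `docker` group grants root-level privileges, so this setup limits the exposure of running everything as root rather than removing it.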

Improve Linter Checks

Following up from illinois-cs241/broadway-api#58, some warnings, such as shadowed variable names and type mismatches, slip past flake8. We should be able to catch these and keep them out of production code.

Maybe change the linter? So far, though, we have been unsuccessful in finding linters that can do this.

Adding Tests + CI

The CI can be mostly copied over from broadway-api, but it'd be nice to have some assurance checks on each commit.

Separate runner logic from grader

Right now there is essentially a single file that has all of the functions needed to start up the grader and run it.

I would suggest we make a Grader class that contains any grading logic and that we put it inside of the grader module. Then at the top level (outside of the module) we have a run.py that instantiates the grader and gets things running.

I think this would make things a bit easier to maintain.
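
The proposed split could look roughly like this. Everything here is hypothetical: the `Grader` constructor, its `handle_job` and `run` methods, and the injected `execute` callable that, in a real grader, would run the job in a container. The `main` function stands in for the top-level `run.py`.

```python
from typing import Callable, Dict, Iterable, List

Job = Dict[str, str]


# --- grader module: grading logic only ---
class Grader:
    def __init__(self, grader_id: str, execute: Callable[[Job], str]) -> None:
        self.grader_id = grader_id
        self._execute = execute  # e.g. a function that runs the job in a container
        self.results: List[str] = []

    def handle_job(self, job: Job) -> str:
        """Execute one job and remember its result."""
        result = self._execute(job)
        self.results.append(result)
        return result

    def run(self, jobs: Iterable[Job]) -> None:
        """Process jobs in order until the source is exhausted."""
        for job in jobs:
            self.handle_job(job)


# --- run.py: argument handling and wiring only ---
def main(argv: List[str]) -> Grader:
    _token, grader_id = argv[1], argv[2]
    grader = Grader(grader_id, execute=lambda job: "executed " + job["id"])
    # A real entry point would use the token to connect to the API and receive
    # jobs from it; a fixed list stands in for that connection here.
    grader.run([{"id": "job-1"}, {"id": "job-2"}])
    return grader
```

`run.py` would then end with `main(sys.argv)`. Injecting `execute` keeps the `Grader` class testable without Docker, since a test can pass in a fake executor.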

Eventual Jepsen Testing

Assuming a distributed system works after giving it a once-over is good; actually verifying it is a little better.

We don't need to do this immediately, but to be a robust piece of software, we should put in some Jepsen tests for fault tolerance. We only need to cover a few likely cases (we aren't trying to make this a truly fault-tolerant distributed system):

  • The API itself fails
  • Graders start failing:
    • a few at a time
    • in waves
  • Various network delays cause graders to go offline and come back online (a little less likely).

Improve Testing

Currently, the tests do not test the following:

  • The DB contents after jobs have been scheduled/executed or after courses are uploaded.
  • Worker node failure. We will have to mock the variable that decides how often the heartbeat validator runs; the test thread can then sleep (after polling a job) and check the DB contents to assert that the job is marked as failed.
  • Worker node state (alive/dead)
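
A test along the lines of the worker-failure bullet might look like the sketch below. It does not use Broadway's code: `Settings.heartbeat_interval`, `validate_heartbeats`, and the dict-based stand-ins for the DB collections are hypothetical. The point is the pattern: shrink the heartbeat interval with `unittest.mock`, let the worker go silent, then assert on the stored job and worker state.

```python
import time
import unittest
from unittest import mock


class Settings:
    heartbeat_interval = 30.0  # seconds; hypothetical stand-in for the real setting


def validate_heartbeats(workers: dict, jobs: dict, now: float) -> None:
    """Mark silent workers dead and fail the jobs they were running."""
    for worker_id, worker in workers.items():
        if now - worker["last_heartbeat"] > Settings.heartbeat_interval:
            worker["alive"] = False
            for job in jobs.values():
                if job["worker_id"] == worker_id and job["status"] == "running":
                    job["status"] = "failed"


class HeartbeatValidatorTest(unittest.TestCase):
    def test_job_of_dead_worker_is_failed(self) -> None:
        # State right after the worker polled a job (dicts stand in for DB collections).
        workers = {"grader-1": {"last_heartbeat": time.time(), "alive": True}}
        jobs = {"job-1": {"worker_id": "grader-1", "status": "running"}}
        with mock.patch.object(Settings, "heartbeat_interval", 0.05):
            time.sleep(0.1)  # the worker sends no heartbeats while we sleep
            validate_heartbeats(workers, jobs, time.time())
        self.assertEqual(jobs["job-1"]["status"], "failed")
        self.assertFalse(workers["grader-1"]["alive"])
```

The same test also covers the worker-state bullet, since it asserts that the worker is recorded as dead.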

Moving to docker swarm

Since we have containerized the Broadway API and grader, it would be a natural step to migrate the entire cluster to Docker Swarm (or another container orchestration system).

This is mostly an issue of configuration. In my limited experience with Docker Swarm, there are a few things to solve:

  • The Broadway grader needs to interact with a Docker daemon. dind (Docker-in-Docker) doesn't work in this case because of permission issues (Docker Swarm doesn't allow privileged containers).
  • The Broadway API needs a MongoDB instance, so we also need to figure out how to expose a MongoDB service to the containers in Docker Swarm.

A few potential benefits in doing this:

  • auto-deployment
  • more fault tolerance
  • easier to monitor the status of the entire cluster (?)

Test Reconnecting Nodes

We should test that when a node reconnects, its information is preserved and it can continue from where it left off.
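
One way to frame this test is against a registry keyed by grader ID, where reconnecting reuses the stored entry instead of creating a fresh one. The `GraderRegistry` class, its fields, and the test function are hypothetical and only illustrate the property to check.

```python
from typing import Dict


class GraderRegistry:
    """Hypothetical in-memory registry of graders, keyed by grader ID."""

    def __init__(self) -> None:
        self._graders: Dict[str, dict] = {}

    def connect(self, grader_id: str) -> dict:
        info = self._graders.get(grader_id)
        if info is None:
            info = {"alive": True, "jobs_completed": 0}  # first connection
            self._graders[grader_id] = info
        else:
            info["alive"] = True  # reconnection: keep everything else as it was
        return info

    def disconnect(self, grader_id: str) -> None:
        self._graders[grader_id]["alive"] = False

    def record_completion(self, grader_id: str) -> None:
        self._graders[grader_id]["jobs_completed"] += 1


def test_reconnect_preserves_info() -> None:
    registry = GraderRegistry()
    registry.connect("grader-1")
    registry.record_completion("grader-1")
    registry.disconnect("grader-1")
    info = registry.connect("grader-1")  # the same ID comes back
    assert info["alive"] is True
    assert info["jobs_completed"] == 1  # history survived the disconnect
```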

Grader dies while running job

When the grader dies while running a job, that job should be marked as failed, which does not currently seem to be the case.
