
Jupiter v5.0

Note: Please see the k8s_boostrap/mergetb directory for instructions on how to bootstrap a MergeTB cluster. The provided Ansible scripts can be adapted to bootstrap clusters on other cloud providers.

Jupiter is an orchestrator for Dispersed Computing (distributed computing with networked computers). Jupiter enables complex computing applications that are specified as directed acyclic graph (DAG)-based task graphs to be distributed across an arbitrary network of computers in such a way as to optimize the execution of the distributed computation.

Depending on the task mapper (i.e., scheduling algorithm) used with the Jupiter framework, the optimizations may target different objectives. For example, the goal may be to minimize the total end-to-end delay (makespan) of the computation for a single set of data inputs. Jupiter includes centralized task mappers, such as one that implements the classical HEFT (heterogeneous earliest finish time) scheduling algorithm. To enable optimization-oriented task mapping, Jupiter also provides tools for profiling the application run time on the compute nodes as well as for profiling and monitoring the performance of the network links between nodes. Jupiter is built on top of Kubernetes and provides a container-based framework for dispatching and executing distributed applications at run time, for both single-shot and pipelined (streaming) computations.

The Jupiter system has four main components:

  • Execution Profiler (EP)
  • DRUPE (Network Profiler)
  • Task Mappers
  • CIRCE (Dispatcher)

Profilers

Jupiter comes with two different profiler tools: DRUPE (a network and resource profiler) and a one-time Execution Profiler.

DRUPE is a tool that collects information about the computational resources of, and the network links between, compute nodes in a dispersed computing system and reports it to a central node. DRUPE consists of a network profiler and a resource profiler.

The one-time Execution Profiler is a tool that collects information about the computation time of a pipelined computation, described in the form of a directed acyclic graph (DAG), on each of the networked compute resources. It runs a sample execution of the entire DAG on every node to collect statistics for each task in the DAG as well as the makespan of the entire DAG.
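
To illustrate the idea (this is only a sketch, not the Execution Profiler's actual code), one-time profiling amounts to running each task of a sample pipeline once on a node and recording its wall-clock time:

import time

def profile_task(task_fn, data):
    # Time a single sample execution of one task.
    start = time.perf_counter()
    output = task_fn(data)
    return output, time.perf_counter() - start

stats = {}
data = "sample input"
for name, fn in [("clean", str.strip), ("encode", str.upper)]:  # stand-in tasks
    data, stats[name] = profile_task(fn, data)
print(stats)  # per-task execution times on this node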

Task Mappers

Jupiter comes with the following task mappers: HEFT and HEFT Balanced. These mappers efficiently map the tasks of a DAG to the processors so that the makespan of the pipelined processing is optimized.

HEFT, i.e., Heterogeneous Earliest Finish Time, is a static, centralized algorithm for DAG-based task graphs that efficiently maps the tasks of the DAG onto processors by taking into account global information about communication delays and execution times.
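
For intuition, here is a minimal sketch of the classical algorithm's upward-rank prioritization step with made-up costs (an illustration only, not Jupiter's mapper code):

# Toy DAG: task -> successors; w = average compute cost per task;
# c = average communication cost between dependent tasks (all hypothetical).
dag = {"t0": ["t1", "t2"], "t1": ["t3"], "t2": ["t3"], "t3": []}
w = {"t0": 10, "t1": 6, "t2": 8, "t3": 4}
c = {("t0", "t1"): 2, ("t0", "t2"): 3, ("t1", "t3"): 1, ("t2", "t3"): 2}

def upward_rank(task):
    # rank_u(t) = w(t) + max over successors s of (c(t, s) + rank_u(s))
    if not dag[task]:
        return w[task]
    return w[task] + max(c[(task, s)] + upward_rank(s) for s in dag[task])

# HEFT schedules tasks in decreasing upward-rank order, then assigns each
# task to the processor giving it the earliest finish time.
priority = sorted(dag, key=upward_rank, reverse=True)
print(priority)  # ['t0', 't2', 't1', 't3']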

WAVE (supported in v4.0 and earlier; see releases) is a distributed scheduler for DAG-based task graphs that outputs a mapping of tasks to real compute nodes using only local profiler statistics. Currently we have two types of WAVE algorithms: WAVE Random and WAVE Greedy.

WAVE Random is a very simple algorithm that maps tasks to random nodes without taking the profiler data into account.

WAVE Greedy is a greedy algorithm that uses a weighted sum of different profiler statistics to map tasks to compute nodes.

CIRCE

CIRCE is a dispatcher tool for dispersed computing that can deploy pipelined computations, described in the form of a directed acyclic graph (DAG), on multiple geographically dispersed computers (compute nodes). CIRCE deploys each task on its assigned compute node (taken from the output of HEFT), uses input and output queues for pipelined execution, and takes care of the data transfer between tasks.
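
The queue-based pipelining can be pictured with the following toy sketch (an illustration of the idea only; the real CIRCE transfers data between containers on different compute nodes):

import queue
import threading

q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()

def stage(fn, src, dst):
    # Pull items from the input queue, apply the task, push to the output queue.
    while True:
        item = src.get()
        if item is None:  # sentinel: propagate shutdown downstream
            dst.put(None)
            break
        dst.put(fn(item))

threading.Thread(target=stage, args=(lambda x: x * 2, q_in, q_mid)).start()
threading.Thread(target=stage, args=(lambda x: x + 1, q_mid, q_out)).start()

for x in [1, 2, 3]:
    q_in.put(x)
q_in.put(None)

while True:
    result = q_out.get()
    if result is None:
        break
    print(result)  # 3, 5, 7: items stream through both stages concurrently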

Instructions

Currently supports: Python 3.6

First, set up your Kubernetes cluster and install kubectl. Enable autocompletion for kubectl. Under the k8s_boostrap folder, we have Ansible playbooks that we use to bootstrap our clusters. These can be used as a blueprint for preparing your cluster.

Clone and install requirements:

git clone git@github.com:ANRGUSC/Jupiter.git
cd Jupiter
pip install -r k8s_requirements.txt

In your application's app_config.yaml, fill out the node_map key with the hostnames of your k8s cluster. Set the namespace_prefix key, and set the k8s_host key for every task listed under the nondag_tasks key. See app_specific_files/example/app_config.yaml for an example with instructions.
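
As a quick sanity check, a throwaway snippet like the following (not part of Jupiter; it assumes PyYAML is installed, and the key names are the ones described above) prints the values you just filled in:

import yaml

with open("app_specific_files/example/app_config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg.get("namespace_prefix"))
print(cfg.get("node_map"))
for task in cfg.get("nondag_tasks", []):
    print(task)  # each entry should carry a k8s_host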

In jupiter_config.py, set APP_NAME to the name of your application folder under app_specific_files/. Use APP_NAME = "example" for the example application. Then build all containers; run the scripts in separate shells to parallelize the builds.

cd core
python build_push_exec.py
python build_push_profiler.py
python build_push_mapper.py
python build_push_circe.py

Next, run the Execution Profiler, DRUPE (Network Profiler), and Task Mapper. (Shortcut: for a quick start, you can look under core/samples/ for an example mapping.json file and create a custom one for your k8s cluster. Move it to core/mapping.json and skip launching the profilers and mapper entirely.)
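
If you take the shortcut, a one-off snippet like this (not part of Jupiter; it assumes only that the file is valid JSON) is a convenient way to pretty-print a mapping before copying it into place:

import json

with open("core/mapping.json") as f:  # or the sample file under core/samples/
    print(json.dumps(json.load(f), indent=2))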

python launch_exec_profiler.py
python launch_net_profiler.py
python launch_mapper.py

The Task Mapper will poll the DRUPE home pod until network profiling is complete, which takes about 15 minutes. When it is done, launch_mapper.py exits and produces a mapping.json file under core/. You can then shut down the profilers.

python delete_all_exec.py
python delete_all_profilers.py

CIRCE uses this mapping to launch each task as a pod on the correct k8s node.

python launch_circe.py

Use kubectl logs -n {namespace_prefix}-circe <pod-name> to read the stdout of a task container (list the pods with kubectl get pods -n {namespace_prefix}-circe). To tear down your application:

python delete_all_circe.py

If you make changes to anything under your application directory, you must rebuild all CIRCE containers before re-running. For example, after any code change you can redeploy with the same mapping.json by following these steps:

python delete_all_circe.py
python build_push_circe.py
python launch_circe.py

Applications

Jupiter accepts pipelined computations described in the form of a graph in which the main task flow is represented as a directed acyclic graph (DAG). Jupiter can also orchestrate containers on specific nodes that are not part of the main DAG. We differentiate the two as "DAG tasks" and "non-DAG tasks."
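
Conceptually (with hypothetical task names; the actual wiring lives in app_config.yaml), the two kinds of tasks look like this:

# DAG tasks form the task graph; non-DAG tasks run alongside it on pinned nodes.
dag_tasks = {"ingest": ["detect"], "detect": ["report"], "report": []}
nondag_tasks = ["dashboard"]  # each pinned to a node via its k8s_host key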

The example application under app_specific_files/example utilizes the key features of Jupiter. Read the corresponding app_config.yaml to better understand the components.

References

[1] Pradipta Ghosh, Quynh Nguyen, Pranav K. Sakulkar, Jason A. Tran, Aleksandra Knezevic, Jiatong Wang, Zhifeng Lin, Bhaskar Krishnamachari, Murali Annavaram, and Salman Avestimehr, "Jupiter: A Networked Computing Architecture", Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing Companion, pp. 1-8, 2021.

[2] Alexander Poylisher, Andrzej Cichocki, K. Guo, J. Hunziker, L. Kant, Bhaskar Krishnamachari, Salman Avestimehr, and Murali Annavaram, "Tactical Jupiter: Dynamic Scheduling of Dispersed Computations in Tactical MANETs", MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM), pp. 102-107, IEEE, 2021.

[3] Pradipta Ghosh, Quynh Nguyen, and Bhaskar Krishnamachari, "Container Orchestration for Dispersed Computing", 5th International Workshop on Container Technologies and Container Clouds (WOC '19), December 9-13, 2019, Davis, CA, USA.

[4] Quynh Nguyen, Pradipta Ghosh, and Bhaskar Krishnamachari, "End-to-End Network Performance Monitoring for Dispersed Computing", International Conference on Computing, Networking and Communications, March 2018.

[5] Pranav Sakulkar, Pradipta Ghosh, Aleksandra Knezevic, Jiatong Wang, Quynh Nguyen, Jason Tran, H.V. Krishna Giri Narra, Zhifeng Lin, Songze Li, Ming Yu, Bhaskar Krishnamachari, Salman Avestimehr, and Murali Annavaram, "WAVE: A Distributed Scheduling Framework for Dispersed Computing", USC ANRG Technical Report, ANRG-2018-01.

[6] Aleksandra Knezevic, Quynh Nguyen, Jason A. Tran, Pradipta Ghosh, Pranav Sakulkar, Bhaskar Krishnamachari, and Murali Annavaram, "DEMO: CIRCE - A Runtime Scheduler for DAG-Based Dispersed Computing", The Second ACM/IEEE Symposium on Edge Computing (SEC), 2017. (poster)

Acknowledgment

This material is based upon work supported by Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001117C0053. Any views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.


Issues

Create example application

Instead of checking files in for a "dummy app", we should have a very simple example app and call the folder example_multicast.

Unhandled exception in thread started by <function deploy_app_jupiter at 0x7fe925bda400>

What happened (please include outputs or screenshots):
RUN: python3 auto_deploy_system.py
OUTPUT:

Task mapper: Wave random selected
Non pricing scheme selected
CIRCE path-------------------------------------------
/home/master2/Jupiter-develop/circe/original/
1
Task mapper: Wave random selected
Non pricing scheme selected
CIRCE path-------------------------------------------
/home/master2/Jupiter-develop/circe/original/
[8080]
['coded1']
coded1

 * Serving Flask app "auto_deploy_system" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

Task mapper: Wave random selected
Non pricing scheme selected
CIRCE path-------------------------------------------
/home/master2/Jupiter-develop/circe/original/
Deploy WAVE greedy mapper
Task mapper: Wave random selected
Non pricing scheme selected
CIRCE path-------------------------------------------
/home/master2/Jupiter-develop/circe/original/
Exception Occurred
F1112 15:12:32.327170 114217 proxy.go:160] listen tcp 127.0.0.1:8080: bind: address already in use
Exception Occurred
Exception Occurred
Exception Occurred
Task mapper: Wave random selected
Non pricing scheme selected
CIRCE path-------------------------------------------
/home/master2/Jupiter-develop/circe/original/
Exception Occurred


Network Profiling Information:
{}
Execution Profiling Information:
{}



WAVE mapper
Task mapper: Wave random selected
Non pricing scheme selected
CIRCE path-------------------------------------------
/home/master2/Jupiter-develop/circe/original/
{'node1': ['Node1'], 'node2': ['Node2'], 'node3': ['Node3']}
Task mapper: Wave random selected
Non pricing scheme selected
CIRCE path-------------------------------------------
/home/master2/Jupiter-develop/circe/original/
Unhandled exception in thread started by <function deploy_app_jupiter at 0x7fe925bda400>
Traceback (most recent call last):
  File "auto_deploy_system.py", line 361, in deploy_app_jupiter
    k8s_jupiter_deploy(app_id,app_name,port)
  File "auto_deploy_system.py", line 133, in k8s_jupiter_deploy
    task_mapping_function(profiler_ips,execution_ips,node_names,app_name)
  File "auto_deploy_system.py", line 63, in task_mapping
    return f(args[0],args[3])
  File "/home/master2/Jupiter-develop/mulhome_scripts/k8s_wave_scheduler.py", line 124, in k8s_wave_scheduler
    ser_resp = api.create_namespaced_service(namespace, home_body)
  File "/home/master2/.local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6820, in create_namespaced_service
    (data) = self.create_namespaced_service_with_http_info(namespace, body, **kwargs)
  File "/home/master2/.local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6905, in create_namespaced_service_with_http_info
    collection_formats=collection_formats)
  File "/home/master2/.local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/home/master2/.local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "/home/master2/.local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 364, in request
    body=body)
  File "/home/master2/.local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 266, in POST
    body=body)
  File "/home/master2/.local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Tue, 12 Nov 2019 07:12:33 GMT', 'Content-Length': '208'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services \"coded1-home\" already exists","reason":"AlreadyExists","details":{"name":"coded1-home","kind":"services"},"code":409}

Environment:

  • Jupiter version: 4.0

  • Kubernetes version (kubectl version): v1.15.2

  • OS: Ubuntu 18.04

  • Python version: 3.6.9

@MaLeiOnline we have a working version of Jupiter v5.0 in the following branch. This version is a lot easier to use, and the top-level readme provides some new instructions. It does not, however, have instructions on setting up k8s.

https://github.com/ANRGUSC/Jupiter/tree/finalccdag

Originally posted by @jasonatran in #30 (comment)

Containerized unit test for single task

Once #28 is finished, a developer who codes a single task may still have dependency issues when it is finally launched along with an entire project. We need a way to allow developers to smoke test a single task by building and running a container which either fakes inputs or creates 0 inputs just to see if the task runs without error.
