This project forked from twitter/caladrius

Performance modelling system for Distributed Stream Processing Systems (DSPS) such as Apache Heron and Apache Storm

License: Apache License 2.0

Python 100.00%

Caladrius

Performance modelling for Distributed Stream Processing Systems (DSPS) such as Apache Heron and Apache Storm.

Full details can be found on the documentation site.

NOTE: Caladrius is a prototype project, the result of a three-month internship with Twitter's Real Time Compute Team, and should be considered alpha-level software. All contributions are welcome; please see the contributing page on the documentation website for more details.

Setup

Python

Caladrius requires Python 3.6. Additional Python dependencies are listed in the Pipfile and can be installed using pipenv by running the following command in the caladrius root directory:

$ pipenv install 

Add the --dev flag to the above command to install development dependencies.

Caladrius should also be added to your PYTHONPATH. The best way to do this is by adding the folder above the Caladrius repo to the PYTHONPATH environment variable using a command like the one below:

$ export PYTHONPATH=$PYTHONPATH:<path/to/folder/above/caladrius>

Add this line to your .profile (or similar) startup script so that the setting persists across sessions and reboots.

Troubleshooting

Errors such as the following indicate that the PYTHONPATH is not set correctly:

File "app.py", line 15, in <module>
    from caladrius import logs

Another way to ensure that pipenv is able to read the environment variables is to create a .env file in the project directory and add the PYTHONPATH there.
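To confirm that the PYTHONPATH change has taken effect, you can check whether the package resolves from the interpreter that pipenv uses. The following is a minimal sketch using only the standard library; the helper name is_importable is illustrative and is not part of Caladrius.

```python
import importlib.util
import sys


def is_importable(package: str) -> bool:
    """Return True if a top-level package can be located on sys.path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    name = "caladrius"
    if is_importable(name):
        print(f"{name} was found on sys.path")
    else:
        # Print the search path so a missing PYTHONPATH entry is easy to spot.
        print(f"{name} was not found; check PYTHONPATH. Current sys.path:")
        for entry in sys.path:
            print("  ", entry)
```

Saving this to a file and running it with pipenv run python exercises the same environment that app.py will see.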

Graph Database

Caladrius requires a Gremlin Server instance running TinkerPop 3.3.2 or higher.

The reference Gremlin Server can be downloaded from the Apache TinkerPop website.

The Gremlin server should have the Gremlin Python plugin installed:

$ gremlin-server.sh install org.apache.tinkerpop gremlin-python 3.3.3

Start the server with the Gremlin Python config (included in the standard server distribution):

$ bin/gremlin-server.sh start conf/gremlin-server-modern-py.yaml

Please note: The default settings for the Gremlin Server result in an in-memory TinkerPop Server instance. If graphs need to be persisted to disk then these settings can be altered in the appropriate configuration file in the conf directory of the Gremlin Server distribution.
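Before starting Caladrius, it can be useful to confirm that the Gremlin Server is accepting connections. The sketch below only checks that a TCP port is open; it assumes the server runs on localhost at port 8182 (the Gremlin Server default), so adjust both values to match your configuration. The function name is_port_open is illustrative.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved host names.
        return False


if __name__ == "__main__":
    # Assumed host and port; change these if your server is configured differently.
    host, port = "localhost", 8182
    state = "reachable" if is_port_open(host, port) else "not reachable"
    print(f"Gremlin Server at {host}:{port} is {state}")
```

An open port shows only that something is listening; it does not verify that the Gremlin Python plugin is installed.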

Running Caladrius

Configuration

All configuration is done via the YAML file provided to the app.py script (see Starting the API Server below). This file defines the models run by the various API endpoints and any connection details, modelling variables or other configuration they may require.

An example configuration file with sensible defaults is provided in config/main.yaml.example. You should copy this and edit it with your specific configurations.

Starting the API Server

The Caladrius API server can be started by running the app.py script in the root directory. This can be run in the appropriate virtual environment using pipenv (make sure your python command points to Python 3):

$ pipenv run python app.py --config /path/to/config/file

Additional command line arguments are available via:

$ pipenv run python app.py --help

Documentation

Documentation for stable releases is hosted on ReadTheDocs.

If you want to build the latest documentation then this can be done using Sphinx. Assuming you have installed the development dependencies above, the docs can be built using the following commands in the repository root:

$ pipenv run sphinx-apidoc -f -o docs/source . tests/*
$ cd docs
$ pipenv run make html

This places the built HTML documentation in the docs/build directory.

Security

If you spot any security or other sensitive issues with the software please report them via the Twitter HackerOne bug bounty program.

Using the API

The software provides multiple endpoints for a user to find out how different packing plans will perform for a single topology. Here, we provide examples of how to call the APIs from the command-line.

Heron Current API

In this example, the WindowedWordCountTopology has three components (spout -> bolt -> bolt). Each operator in the job has only one running task/instance.

curl -H 'Content-Type: application/json' \
  -d '{ "1": {"default": 101.4}, "2": {"default": 104.3}, "3": {"default": 101.5} }' \
  -X POST "<Caladrius URL>:5000/model/topology/heron/current/WindowedWordCountTopology?cluster=<cluster-name>&environ=test&model=queueing_theory&source_hours=2"
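The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call above: the endpoint path, query parameters and payload come from that example, while the function names, the base URL http://localhost:5000 and the cluster name are placeholders for illustration. The response format is not described here, so the body is returned as raw text.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_current_request(base_url, topology, cluster, environ, payload,
                          model="queueing_theory", source_hours=2):
    """Build a POST request for the Heron 'current' modelling endpoint."""
    query = urlencode({
        "cluster": cluster,
        "environ": environ,
        "model": model,
        "source_hours": source_hours,
    })
    url = f"{base_url}/model/topology/heron/current/{topology}?{query}"
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def send(request):
    """Send the request and return the response body as text."""
    with urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Same JSON body as the curl example above.
    payload = {"1": {"default": 101.4},
               "2": {"default": 104.3},
               "3": {"default": 101.5}}
    # Placeholder base URL and cluster name; replace with your own values.
    request = build_current_request("http://localhost:5000",
                                    "WindowedWordCountTopology",
                                    "my-cluster", "test", payload)
    print(request.full_url)
    # Uncomment once a Caladrius API server is running:
    # print(send(request))
```

Building the request separately from sending it lets you inspect the URL and body before contacting the server.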

Caveat: Caladrius currently does not support calculations for topologies that have only two levels. This is because a topology with two levels consists of spouts (with only outgoing streams) and sink bolts (with possibly only incoming streams). Some of Caladrius' calculations such as measuring the input to output tuple ratios cannot be applied to such operators.

Contributors

decause, faria-kalim, huijunw, juliaferraioli, nwangtw, tomncooper
