
bedrock-py / bedrock-core

Core framework APIs to support Bedrock applications

License: GNU Lesser General Public License v3.0

Python 83.86% Shell 6.70% HTML 1.24% JavaScript 3.73% Dockerfile 0.21% R 4.26%


bedrock-core

Getting started

If you want to get started right away with an example, use the QuickStart.pptx file in the docs/ folder to run an example workflow against a locally installed Bedrock server.

For development of bedrock-core, use a conda environment named bedrock, which is described in the environment.yml file in the root of the project directory. To create the environment and install all of the dependencies into a conda-managed virtual environment, run:

conda env create -f environment.yml
source activate bedrock

We use conda instead of virtualenv because of the dependency on the PyData stack, which is easier to install with Anaconda. If you want to install everything by hand, read environment.yml to see what is required. After updating a dependency of the project using pip, modify the requirements.txt file and run conda env export > environment.yml to reflect the change in the conda environment.

Running in docker

The file test/test_docker.sh is a script that uses Docker to build a working installation of the Bedrock server and run the unit tests. See that script for the up-to-date run commands; they will look something like the commands below. You need to run and then exec because Mongo must be running in the container in order to finish installing the opals. Ideally we could remove this step by installing a preconfigured Mongo into the server image, or by removing the opals' dependency on a running Mongo. For now, you must run the container, which starts the server, and then exec the opal install script inside it.

docker build -t bedrock .
ID=$(docker run -p 81:81 -p 82:82 -d bedrock)

If your organization disallows public DNS servers, you can follow the steps in this guide to instruct Docker to use specific DNS servers.

You can get started by installing Docker on your machine and then running the script test/test_docker.sh, which will create the Docker containers and run the tests.

Running without docker

If you would like to run Bedrock without Docker, see the Dockerfile for the software required on your server, and then run the scripts by hand. You should only need

  • bin/install.sh

As this procedure will change over time, consult the Dockerfile for steps that work.

Code organization

The codebase is organized so that there is a common module of code in src, with tests and scripts in ./test and ./bin respectively. Each Flask app is a top-level WSGI app in the src directory, with common code in src/core.

.
|-- Dockerfile
|-- bin
|-- conf
|-- docs
|-- src
|   |-- CONSTANTS.py
|   |-- analytics
|   |-- core
|   |-- dataloader
|   |-- memo
|   |-- user
|   |-- visualization
|   `-- workflows
|-- test
|-- validation
`-- var
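
Each Flask app under src/ above is served as a top-level WSGI app. As a rough, dependency-free illustration of what that means (a plain WSGI callable with hypothetical names stands in for a real Flask app here):

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical sketch of a top-level WSGI app like those in src/;
# a plain WSGI callable stands in for a Flask app.
def application(environ, start_response):
    """Respond to every request with a small JSON document."""
    body = json.dumps({"service": "analytics", "status": "ok"}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_app(app, path="/"):
    """Drive the app in-process with a synthetic WSGI environ."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers
    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)

status, body = call_app(application)
print(status)            # 200 OK
print(json.loads(body))  # {'service': 'analytics', 'status': 'ok'}
```

Because each app is an ordinary WSGI callable, it can be mounted by any WSGI server without knowing about the other apps in src/.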

API documentation

See the Swagger.js docs for detailed documentation about the web API.

Bedrock exposes a Flask RESTful web API that lets frontend developers access machine learning and data analytics code on remote machines and include those algorithms in their end-user applications. All APIs therefore communicate in JSON and are organized into three large categories.
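
Since every endpoint speaks JSON, a client call is just an HTTP request carrying a JSON body. A minimal sketch of building such a request with the standard library (the base URL and route here are hypothetical; consult the Swagger docs for the real endpoints):

```python
import json
from urllib.request import Request

# Hypothetical base URL; port 81 matches the docker run command above.
BASE = "http://localhost:81/dataloader/api/0.1"

def make_json_request(path, payload):
    """Build (but do not send) a POST request with a JSON body."""
    data = json.dumps(payload).encode("utf-8")
    return Request(BASE + path, data=data,
                   headers={"Content-Type": "application/json"},
                   method="POST")

req = make_json_request("/sources/", {"name": "example"})
print(req.get_method())       # POST
print(json.loads(req.data))   # {'name': 'example'}
```

Sending the request with `urllib.request.urlopen(req)` would require a running Bedrock server, so the sketch stops at constructing it.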

The Bedrock service is composed of a bedrock-core server and a collection of packages/modules/libraries called opals. The opals implement data loading (ETL), analytics (ML/statistics), and visualizations. A developer of a new technique in one of these categories writes an opal to implement it, and the Bedrock server then manages the data, permissions, and scheduling for that implementation. bedrock-core provides Python libraries that allow an opal to ignore both the backend data storage mechanism and the job scheduling mechanism; in fact, data may live in multiple backend stores and compute may happen on multiple job schedulers without the opal having explicit knowledge of this.

The contract for an opal is that input data arrives as a DataFrame (from either pandas or SparkSQL) and outputs are represented as either a DataFrame or a JSON document. JSON documents go into a NoSQL storage system, while DataFrames go into either a NoSQL document store or a relational database management system (RDBMS).
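
As a rough sketch of that contract (the function name is hypothetical, and a list of row dicts stands in for a pandas/SparkSQL DataFrame to keep the example dependency-free):

```python
import json

# Hypothetical analytics opal: consumes tabular input and emits a JSON
# document suitable for a NoSQL document store.
def summarize_opal(rows, column):
    """Compute simple summary statistics over one column of the input."""
    values = [row[column] for row in rows]
    result = {
        "column": column,
        "count": len(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }
    # Serialize as the JSON document Bedrock would hand to storage.
    return json.dumps(result)

frame = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
doc = json.loads(summarize_opal(frame, "x"))
print(doc["mean"])  # 2.0
```

A real opal would receive an actual DataFrame from Bedrock and never touch the storage layer itself; it only produces the DataFrame or JSON output, and bedrock-core routes that output to the appropriate backend.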

The analytics opals consume DataFrames and produce outputs; the visualization opals consume DataFrames and return visualizations that can be shown in the web browser.

bedrock-core's People

Contributors

ascripka, ascripka3, bldrake, davidediger, jpfairbanks, mkaplan8, scottagt, sirpoovey, tgoodyear


bedrock-core's Issues

Docker builds take forever

The problem is that the bin/install.sh script runs as a single step in the Dockerfile. If anything that runs before that script changes, all the packages must be reinstalled, including the R packages, which need to be recompiled.

The Dockerfile needs to be reworked to improve layer caching, for example by installing dependencies in earlier layers that change less often than the application source.
