empiricalci / emp

:microscope: Empirical CLI

Home Page: https://empiricalci.com

License: MIT License

JavaScript 99.30% Dockerfile 0.70%
science ai docker artificial-intelligence reproducible-research reproducible-science benchmark-framework benchmarking

emp's Introduction

Empirical Library

Use

from empiricalci import empiricalci
...
empiricalci.saveOverall('average', avg)

emp's People

Contributors

alantrrs, ivangzz

emp's Issues

buildImage is calling done() multiple times

  1) buildImage should reject if there is an error:
     Error: done() called multiple times
      at Suite.<anonymous> (test/index.js:149:3)
      at Object.<anonymous> (test/index.js:148:1)
      at Array.forEach (native)
      at node.js:972:3

  2) buildImage should reject if there is an error:
     Error: done() called multiple times
      at Suite.<anonymous> (test/index.js:149:3)
      at Object.<anonymous> (test/index.js:148:1)
      at Array.forEach (native)
      at node.js:972:3

Update Readme for final users

  • README.md should have information directed to the end user on how to set up and run emp.
  • Information directed towards developers should be moved to a DEVELOPMENT.md

Add workspace and data volumes

  • Data directory should persist across all builds
  • Create workspace directories per build session
  • mount the /data and /workspace volumes to solver and evaluator
  • Upload workspace contents

Post Results back to the server

  • Create an experiment in the database first
  • Pass the _id of the experiment to the evaluator/standalone container along with API credentials via environment variables
  • The evaluator/standalone container uses the empirical library to post the results back to the server
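
A minimal sketch of how the environment for the evaluator/standalone container might be assembled. `EXPERIMENT_ID` and `EMPIRICAL_API_URL` are the names that appear in the worker log quoted in the "Experiment gets stuck" issue; the function name `buildContainerEnv` and the credential variable `EMPIRICAL_AUTH` are assumptions made for illustration, not names confirmed by emp.

```javascript
// Hypothetical helper: builds the `KEY=value` list that Docker expects
// for a container's environment.
function buildContainerEnv({ experimentId, apiUrl, auth }) {
  if (!experimentId) throw new Error('experimentId is required');
  const env = [
    `EXPERIMENT_ID=${experimentId}`,
    `EMPIRICAL_API_URL=${apiUrl}`
  ];
  // EMPIRICAL_AUTH is an assumed variable name for the API credentials.
  if (auth) env.push(`EMPIRICAL_AUTH=${auth}`);
  return env;
}
```

The experiment has to exist in the database before this runs, since its `_id` is what the container uses to post results back.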

Alternative to Post

Save the json files in the workspace and list the results in the YML file:

results: /workspace/results.json

Save logs

  • Save them to the /workspace directory for the experiment
  • Upload them to the server when running with --save

Experiment gets stuck

Sometimes the evaluator terminates before the solver starts.

ci-worker-1 | 2016-03-25T01:25:57.447228208Z RUN EXPERIMENTS: [ { evaluator: 'empirical-bot/my-evaluator:EJJoSOppg' } ]
ci-worker-1 | 2016-03-25T01:25:57.447378981Z RUNNING EXPERIMENT: { evaluator: 'empirical-bot/my-evaluator:EJJoSOppg' }
ci-worker-1 | 2016-03-25T01:25:57.724781683Z RUN LINKED { image: 'empirical-bot/my-solver:VylkzKppg',
ci-worker-1 | 2016-03-25T01:25:57.724846986Z   name: 'solver-56f493a55cfe6d0300d2427e' } { image: 'empirical-bot/my-evaluator:EJJoSOppg',
ci-worker-1 | 2016-03-25T01:25:57.725054188Z   name: 'evaluator-56f493a55cfe6d0300d2427e',
ci-worker-1 | 2016-03-25T01:25:57.725165582Z   env: 
ci-worker-1 | 2016-03-25T01:25:57.725203244Z    [ 'EXPERIMENT_ID=56f493a55cfe6d0300d2427e',
ci-worker-1 | 2016-03-25T01:25:57.725331876Z      'EMPIRICAL_API_URL=http://qa.empiricalci.com/api/x' ] }
ci-worker-1 | 2016-03-25T01:25:57.725446260Z solver-56f493a55cfe6d0300d2427e: Running container
ci-worker-1 | 2016-03-25T01:25:57.753841674Z solver-56f493a55cfe6d0300d2427e: Container created
ci-worker-1 | 2016-03-25T01:25:57.753895885Z evaluator-56f493a55cfe6d0300d2427e: Running container
ci-worker-1 | 2016-03-25T01:25:57.759062220Z evaluator-56f493a55cfe6d0300d2427e: Finished running
ci-worker-1 | 2016-03-25T01:25:57.759158691Z DATA: null
ci-worker-1 | 2016-03-25T01:25:57.759680221Z CONTAINER: null
ci-worker-1 | 2016-03-25T01:25:57.762660314Z stopping container 147e9d963746b717c94073cc1ae84532b3f75d94b49533713097f270d8120c31
ci-worker-1 | 2016-03-25T01:25:57.981389252Z Empirical: Solver is now running

Ability to install via npm

Currently emp is distributed as a Docker image. However, this might be too heavy, especially for users who already have Node installed.

Parallel runs

Be able to execute multiple runs of a protocol in parallel.

Build image output is failing due to parse on OS X

BUILD:
{"stream":"Step 1 : FROM python:2.7\n"}
{"stream":" ---\u003e f9a9ac5dcfb8\n"}{"stream":"Step 2 : RUN pip install numpy\n"}
undefined:2
{"stream":"Step 2 : RUN pip install numpy\n"}
^

SyntaxError: Unexpected token {
    at Object.parse (native)
    at IncomingMessage.<anonymous> (/Users/empirical/workspace/emp/lib/build-image.js:16:30)
    at emitOne (events.js:77:13)
    at IncomingMessage.emit (events.js:169:7)
    at readableAddChunk (_stream_readable.js:153:18)
    at IncomingMessage.Readable.push (_stream_readable.js:111:10)
    at HTTPParser.parserOnBody (_http_common.js:127:22)
    at Socket.socketOnData (_http_client.js:322:20)
    at emitOne (events.js:77:13)
    at Socket.emit (events.js:169:7)

It looks like two JSON objects are arriving merged in a single chunk, so parsing the chunk as one JSON document fails.
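
One possible way to make the build output parsing robust is to stop assuming that one chunk equals one JSON object. The sketch below buffers incoming text and emits each complete top-level object as soon as its closing brace is seen. `createJsonStreamParser` is a hypothetical name, and this is not the code in lib/build-image.js, only an illustration of the technique.

```javascript
// Hypothetical helper: incrementally splits a stream of concatenated JSON
// objects into parsed values, regardless of how the chunks are divided.
function createJsonStreamParser(onObject) {
  let buffer = '';
  let scanned = 0;      // how much of `buffer` has already been examined
  let start = -1;       // index where the object in progress begins, or -1
  let depth = 0;        // current brace nesting level
  let inString = false; // braces inside JSON strings must be ignored
  let escaped = false;  // true right after a backslash inside a string

  return function write(chunk) {
    buffer += chunk;
    for (let i = scanned; i < buffer.length; i++) {
      const ch = buffer[i];
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') {
        inString = true;
      } else if (ch === '{') {
        if (depth === 0) start = i;
        depth++;
      } else if (ch === '}') {
        depth--;
        if (depth === 0) {
          onObject(JSON.parse(buffer.slice(start, i + 1)));
          start = -1;
        }
      }
    }
    // Keep only the unfinished object (if any) for the next chunk.
    if (start === -1) {
      buffer = '';
      scanned = 0;
    } else {
      buffer = buffer.slice(start);
      scanned = buffer.length;
      start = 0;
    }
  };
}
```

A caller would create one parser per build and pass every `data` chunk (converted to a string) to the returned `write` function.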

Run experiment containers interactively

While developing, you may want to explore your container and do multiple tests in the same container without having to build a new image and launch a container every time that you make changes. In this scenario having the option to run experiments interactively is desired.

  • The command would be something like: emp run -i protocol /path/to/code
  • The protocol would have to be marked as interactive and define an interactive entrypoint/command
  • An option to mount the source code is probably desired to persist any changes made to the code

CLI commands for demo

  • emp listen should launch the listen.js script
  • emp run experiment-name /path/ should run an experiment from the given path
  • emp run empiricalci/hello-world/sample/x35cg_d3 should fetch the experiment information from empiricalci.com and run it
  • emp shows usage instructions & version
  • emp versions returns version

Fix Mac run script

It's currently failing due to the use of readlink to convert from relative to absolute path. Maybe use this:

#!/bin/bash

realpath() {
    [[ $1 = /* ]] && echo "$1" || echo "$PWD/${1#./}"
}

realpath "$0"

from here

Separate running and reporting

  • Separate core library from CLI
  • Define report schema
  • emp run will now run the experiment locally only and generate a report of the experiment
  • emp push should publish the report to empiricalci.com

Optimize tests

The experiments currently used to run the tests are too slow. Use dummy experiments instead to speed up the tests.

Each experiment should build + run independently.

Refactor experiments to build the images related to one experiment first and then execute the experiment. Then move to the next experiment.

This will also make it possible to parallelize experiments in the future.
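
The refactor described above could look roughly like the following. This is only a sketch: `runExperiments`, `buildImages`, and `runExperiment` are hypothetical names, and the two step functions are injected so the flow can be shown without guessing at emp's internals.

```javascript
// Hypothetical orchestration: `buildImages` and `runExperiment` are injected
// stand-ins for whatever emp uses internally.
async function runExperiments(experiments, { buildImages, runExperiment }) {
  const results = [];
  for (const experiment of experiments) {
    // Build only the images this experiment needs, then run it,
    // before moving on to the next experiment.
    await buildImages(experiment);
    results.push(await runExperiment(experiment));
  }
  return results;
}
```

Because each iteration is self-contained, the loop could later be replaced with `Promise.all` over the same per-experiment steps to run experiments in parallel.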

Save experiment status to empiricalci.com

The user is assumed to have pushed the commit that is going to be tested to GitHub first. This will trigger a webhook that will create a new version on the server without any experiments.

  • The user will run emp run experiment /path/to/code --save user/project
  • The command will search empiricalci.com for the version associated with the current sha.
  • If a version is found, the experiment will be created and associated with that version
  • If a version is not found, the experiment will not run and it will throw the following error: "Couldn't find the version with sha 99fb56a on project user/project. Did you push this commit to GitHub?"

NOTE: Depending on feedback, future work will add support for saving experiments that are not associated with GitHub to empiricalci.com.

Hyper-parameters tracking

Some questions to get started:

  • Are the parameters initialized from a file or at runtime?
  • What's the format of the parameters?
  • What's the format of a protocol with hyper-params?
  • How are they going to be displayed in the dashboard?

Run docker containers from a docker container

  • emp should be a docker container
  • emp should launch and build other containers using the following technique:

From here. The simplest way is to just expose the Docker socket to your CI container, by bind-mounting it with the -v flag.

Simply put, when you start your CI container (Jenkins or other), instead of hacking something together with Docker-in-Docker, start it with:

docker run -v /var/run/docker.sock:/var/run/docker.sock ...

Now this container will have access to the Docker socket, and will therefore be able to start containers. Except that instead of starting "child" containers, it will start "sibling" containers.

If your CI makes use of the Docker binary in scripts, you can include it in your CI image, or bind-mount it from the host as well. Example:

docker run -v /var/run/docker.sock:/var/run/docker.sock \
           -v $(which docker):/bin/docker \
           -ti ubuntu

Track times

  • Dataset download
  • Image build/pull
  • Experiment runtime
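
A small sketch of how these durations might be collected. `timed` is a hypothetical helper, not part of emp; it wraps any async step and stores the elapsed time in milliseconds under a label such as `datasetDownload`, `imageBuild`, or `experimentRun` (labels chosen here for illustration).

```javascript
// Hypothetical helper: runs an async step and records how long it took.
async function timed(label, timings, step) {
  const start = process.hrtime.bigint();
  try {
    return await step();
  } finally {
    // Record the duration even if the step throws.
    const elapsedNs = process.hrtime.bigint() - start;
    timings[label] = Number(elapsedNs) / 1e6; // milliseconds
  }
}
```

Recording in `finally` means a failed build or run still reports how long it took before failing.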

Allow users to clone public repos

Don't require keys for public experiments, so that public repos can be cloned by other users:

Try to get keys
If unauthorized:
    Try using the HTTPS URL without keys
    If it fails: return unauthorized
    If it succeeds: continue with the process
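
The flow above can be sketched as follows. All names here are hypothetical: `getKeys`, `cloneWithKeys`, and `cloneOverHttps` are injected stand-ins, and signalling "unauthorized" through `err.code === 'UNAUTHORIZED'` is an assumption made for the example.

```javascript
// Hypothetical flow: every function passed in is a stand-in, not emp's real API.
async function cloneRepo(repo, { getKeys, cloneWithKeys, cloneOverHttps }) {
  let keys;
  try {
    keys = await getKeys(repo);
  } catch (err) {
    // Anything other than "unauthorized" is a real failure.
    if (err.code !== 'UNAUTHORIZED') throw err;
    try {
      // Public repos can be cloned over HTTPS without keys.
      return await cloneOverHttps(repo);
    } catch (httpsErr) {
      // The repo is not public either: report the original unauthorized error.
      throw err;
    }
  }
  return cloneWithKeys(repo, keys);
}
```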

How to keep track of uncommitted code?

There are a few scenarios:

  • Keep track of code with no version control at all
  • Keep track of code under Git, but uncommitted
  • Keep track of code that's committed locally, but not pushed to a server

Only mount to the container the data files specified in the dataset

Currently the whole data directory is mounted to /data on the experiment container. The user then needs to refer to each file by its hash, which is not very user-friendly.
We could instead mount the files individually and refer to them by their key names.

For example a data file with the following config:

"myfile.csv": {
  "url": "http://example.com/myfilesdsds.csv",
  "hash": "sdfd39asaea334353fdfh4sfsd3432353"
}

would be mounted from the host path /home/empirical/data/sdfd39asaea334353fdfh4sfsd3432353
to the container path /data/myfile.csv
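
As an illustration of the proposed mapping, the sketch below turns a dataset config into Docker bind strings of the form `hostPath:containerPath`. The function name `buildDataBinds` and its parameters are hypothetical; only the host and container paths come from the example above.

```javascript
const path = require('path');

// Hypothetical helper: maps each dataset entry to a Docker bind string
// so that the file stored under its hash appears under its key name.
function buildDataBinds(dataset, hostDataDir) {
  return Object.keys(dataset).map((name) => {
    const hostPath = path.posix.join(hostDataDir, dataset[name].hash);
    const containerPath = path.posix.join('/data', name);
    return `${hostPath}:${containerPath}`;
  });
}
```

With the config shown above and `/home/empirical/data` as the host data directory, this yields a single bind from the hashed host file to `/data/myfile.csv`.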

Push and Pull images

  • Push images once they're built
  • Pull images when they don't exist on the host

GPU support

  • Add GPU support on Linux using nvidia-docker
  • Document that GPU support is not available on Windows or OS X for now
  • Pass GPU_ENABLED environment variable to experiment container

Data management features

  • Package a directory and get its hash: emp data tar directory
  • Get hash from a directory: emp data hash directory
  • Get hash from a file: emp data hash file
  • Download a file: emp data get file
  • Download and extract a directory: emp data get directory
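
A rough sketch of what `emp data hash file` could compute. The issue does not say which hash algorithm emp uses, so SHA-256 is an assumption here, and `hashFile` is a hypothetical name. The file is streamed so that large data files do not need to fit in memory.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical implementation of a file hash command.
// SHA-256 is an assumption; the issue does not name an algorithm.
function hashFile(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('sha256');
    fs.createReadStream(filePath)
      .on('error', reject)
      .on('data', (chunk) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')));
  });
}
```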
