empiricalci / emp

:microscope: Empirical CLI

Home Page: https://empiricalci.com

License: MIT License

JavaScript 99.30% Dockerfile 0.70%

science ai docker artificial-intelligence reproducible-research reproducible-science benchmark-framework benchmarking

emp's Introduction

Empirical Library

Use

from empiricalci import empiricalci
...
empiricalci.saveOverall('average', avg)

emp's People

gitter-badger empirical-bot brandoncurtis ddbs codeaudit

emp's Issues

buildImage is calling done() multiple times

  1) buildImage should reject if there is an error:
     Error: done() called multiple times
      at Suite.<anonymous> (test/index.js:149:3)
      at Object.<anonymous> (test/index.js:148:1)
      at Array.forEach (native)
      at node.js:972:3

  2) buildImage should reject if there is an error:
     Error: done() called multiple times
      at Suite.<anonymous> (test/index.js:149:3)
      at Object.<anonymous> (test/index.js:148:1)
      at Array.forEach (native)
      at node.js:972:3

Update Readme for final users

README.md should have information directed to the final user on how to set up and run emp.
Information directed towards developers should be moved to a DEVELOPMENT.md

Add workspace and data volumes

Data directory should persist all builds
Create workspace directories per build session
Mount the /data and /workspace volumes to solver and evaluator
Upload workspace contents

Post Results back to the server

Create an experiment in the database first
Pass the _id of the experiment to the evaluator/standalone container along with API credentials via environment variables
The evaluator/standalone container uses the empirical library to post the results back to the server
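A minimal sketch of how the runner might assemble that environment. `EXPERIMENT_ID` and `EMPIRICAL_API_URL` are the names that appear in the worker logs in this document; `EMPIRICAL_AUTH` and the helper name `buildExperimentEnv` are hypothetical, since the issue does not say how the credentials are named.

```javascript
// Builds the `env` list for the evaluator/standalone container.
// EMPIRICAL_AUTH is a hypothetical variable name, not confirmed by the project.
function buildExperimentEnv (experimentId, apiUrl, authToken) {
  const env = [
    `EXPERIMENT_ID=${experimentId}`,
    `EMPIRICAL_API_URL=${apiUrl}`
  ]
  // Only pass credentials when the caller actually has them.
  if (authToken) env.push(`EMPIRICAL_AUTH=${authToken}`)
  return env
}
```

The returned array has the same `KEY=value` shape as the `env` entry printed in the logs, so it could be handed to the container runner unchanged.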

Alternative to Post

Save the json files in the workspace and list the results in the YML file:

results: /workspace/results.json

Implement configuration workflow

Default to $HOME/empirical
emp configure will allow the user to change the empirical directory
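The lookup could be sketched roughly as follows. The setting name `EMPIRICAL_DIR` and the helper `resolveEmpiricalDir` are assumptions made for illustration; only the `$HOME/empirical` default comes from the issue.

```javascript
const os = require('os')
const path = require('path')

// Resolves the empirical directory: an explicit setting wins,
// otherwise fall back to $HOME/empirical.
// `EMPIRICAL_DIR` is a hypothetical setting name used only in this sketch.
function resolveEmpiricalDir (settings) {
  if (settings && settings.EMPIRICAL_DIR) {
    return path.resolve(settings.EMPIRICAL_DIR)
  }
  return path.join(os.homedir(), 'empirical')
}
```

Under this sketch, `emp configure` would only need to persist the chosen directory somewhere that is later passed in as `settings`.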

Save logs

Save them to the /workspace directory for the experiment
Upload them to the server when running with --save

Make sure tests pass for new seeds using saveOverall

After updating all 3 branches of https://github.com/empiricalci/hello-world to use empiricalci.saveOverall(metric, value) and updating the seeds to use the latest commits, make sure that the tests are passing.

Use auth token instead of keys for git clone

Experiment gets stuck

Sometimes the evaluator terminates before the solver starts.

ci-worker-1 | 2016-03-25T01:25:57.447228208Z RUN EXPERIMENTS: [ { evaluator: 'empirical-bot/my-evaluator:EJJoSOppg' } ]
ci-worker-1 | 2016-03-25T01:25:57.447378981Z RUNNING EXPERIMENT: { evaluator: 'empirical-bot/my-evaluator:EJJoSOppg' }
ci-worker-1 | 2016-03-25T01:25:57.724781683Z RUN LINKED { image: 'empirical-bot/my-solver:VylkzKppg',
ci-worker-1 | 2016-03-25T01:25:57.724846986Z   name: 'solver-56f493a55cfe6d0300d2427e' } { image: 'empirical-bot/my-evaluator:EJJoSOppg',
ci-worker-1 | 2016-03-25T01:25:57.725054188Z   name: 'evaluator-56f493a55cfe6d0300d2427e',
ci-worker-1 | 2016-03-25T01:25:57.725165582Z   env: 
ci-worker-1 | 2016-03-25T01:25:57.725203244Z    [ 'EXPERIMENT_ID=56f493a55cfe6d0300d2427e',
ci-worker-1 | 2016-03-25T01:25:57.725331876Z      'EMPIRICAL_API_URL=http://qa.empiricalci.com/api/x' ] }
ci-worker-1 | 2016-03-25T01:25:57.725446260Z solver-56f493a55cfe6d0300d2427e: Running container
ci-worker-1 | 2016-03-25T01:25:57.753841674Z solver-56f493a55cfe6d0300d2427e: Container created
ci-worker-1 | 2016-03-25T01:25:57.753895885Z evaluator-56f493a55cfe6d0300d2427e: Running container
ci-worker-1 | 2016-03-25T01:25:57.759062220Z evaluator-56f493a55cfe6d0300d2427e: Finished running
ci-worker-1 | 2016-03-25T01:25:57.759158691Z DATA: null
ci-worker-1 | 2016-03-25T01:25:57.759680221Z CONTAINER: null
ci-worker-1 | 2016-03-25T01:25:57.762660314Z stopping container 147e9d963746b717c94073cc1ae84532b3f75d94b49533713097f270d8120c31
ci-worker-1 | 2016-03-25T01:25:57.981389252Z Empirical: Solver is now running

Build is failing undetected

If there's an error during the build, it's not being caught anywhere.

Replace slash with @ on pubnub channels

Because channels with / are not valid channel names.
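A small sketch of the substitution; `toChannelName` is a hypothetical helper name, and the claim about valid channel names is taken from the issue rather than checked against the PubNub documentation.

```javascript
// Replaces every "/" with "@" so the identifier can be used as a channel name.
function toChannelName (name) {
  return name.split('/').join('@')
}
```

For example, `toChannelName('empiricalci/hello-world')` gives `empiricalci@hello-world`.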

Fix current experiment page

Remove the mockups that are in there

emp is not checking out the correct version of the project

It's not checking out the correct version. Check the logs. (LOGS.log)

Fail if exit code from docker execution != 0

Ability to install via npm

Currently emp is distributed via a Docker image; however, this might be too heavy, especially for users who already have Node.

Parallel runs

Be able to execute multiple runs of a protocol in parallel.

Refactor build and experiments runner

Hard requirements

Build multiple images
Be able to launch multiple experiments per build

TBD

~~Separate into:~~

~~Build worker~~
~~Experiments worker~~

Build image output is failing due to parse on OS X

BUILD:
{"stream":"Step 1 : FROM python:2.7\n"}
{"stream":" ---\u003e f9a9ac5dcfb8\n"}{"stream":"Step 2 : RUN pip install numpy\n"}
undefined:2
{"stream":"Step 2 : RUN pip install numpy\n"}
^

SyntaxError: Unexpected token {
    at Object.parse (native)
    at IncomingMessage.<anonymous> (/Users/empirical/workspace/emp/lib/build-image.js:16:30)
    at emitOne (events.js:77:13)
    at IncomingMessage.emit (events.js:169:7)
    at readableAddChunk (_stream_readable.js:153:18)
    at IncomingMessage.Readable.push (_stream_readable.js:111:10)
    at HTTPParser.parserOnBody (_http_common.js:127:22)
    at Socket.socketOnData (_http_client.js:322:20)
    at emitOne (events.js:77:13)
    at Socket.emit (events.js:169:7)

Looks like 2 json lines are being merged into one, which makes it invalid json
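A hedged sketch of one way to avoid this: buffer the incoming chunks and parse one line at a time, so that a chunk carrying two JSON lines (or half of one) never reaches `JSON.parse` as a single string. It assumes the build output is newline-delimited JSON, which the `undefined:2` in the trace suggests; `createJsonLineParser` is a hypothetical helper and not the code in lib/build-image.js.

```javascript
// Returns a chunk handler that emits one parsed object per complete line.
function createJsonLineParser (onMessage) {
  let buffer = ''
  return function handleChunk (chunk) {
    buffer += chunk.toString()
    const lines = buffer.split('\n')
    // The last element is '' or an incomplete line: keep it for the next chunk.
    buffer = lines.pop()
    for (const line of lines) {
      const trimmed = line.trim()
      if (trimmed) onMessage(JSON.parse(trimmed))
    }
  }
}
```

The returned function could be attached as the `data` listener of the build response stream, for example `response.on('data', createJsonLineParser(msg => process.stdout.write(msg.stream || '')))`, where `response` stands for whatever stream the build call returns.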

Run experiment containers interactively

While developing, you may want to explore your container and run multiple tests in it without having to build a new image and launch a new container every time you make changes. In this scenario, an option to run experiments interactively would be useful.

The command would be something like: emp run -i protocol /path/to/code
The protocol would have to be marked as interactive and define an interactive entrypoint/command
An option to mount the source code is probably desired to persist any changes made to the code

Nodegit install is failing on node v7

See nodegit/nodegit#1153

CLI commands for demo

emp listen should launch the listen.js script
emp run experiment-name /path/ should run an experiment from the given path
emp run empiricalci/hello-world/sample/x35cg_d3 should fetch the experiment information from empiricalci.com and run it
emp shows usage instructions & version
emp versions returns version

Fix Mac run script

It's currently failing due to the use of readlink to convert from relative to absolute path. Maybe use this:

#!/bin/bash

realpath() {
    [[ $1 = /* ]] && echo "$1" || echo "$PWD/${1#./}"
}

realpath "$0"

from here

Implement multiple authentication methods for Git

user/password
ssh agent

Create a small image using Alpine linux

WARNING: nodegit doesn't build correctly on alpine linux. See https://github.com/cucumber/alpine-node-nodegit/blob/master/Dockerfile

Separate running and reporting

Separate core library from CLI
Define report schema
emp run will now run the experiment locally only and generate a report of the experiment
emp push should publish the report to empiricalci.com

Test CLI on Mac

Install
Configure
Run experiment

Separate the client api from and test it with the server

That way this repo can run all its test without depending on the Server.

Optimize tests

The experiments that are being currently used to run the tests are too slow. Use dummy experiments instead, to optimize the tests.

Test CLI on Linux

Install
Configure
Run

Each experiment should build + run independently.

Refactor experiments to build the images related to one experiment first and then execute the experiment. Then move to the next experiment.

This will also make it possible to parallelize experiments in the future.

Consolidate host and emp paths

They should be the same path.
Make sure they don't overwrite the emp install dir.

Install script for Windows

Create an install script
Document installation and update workflow

Save experiment status to empiricalci.com

The user is assumed to have pushed the commit that is going to be tested to GitHub first. This will trigger a webhook that will create a new version on the server without any experiments.

The user will run emp run experiment /path/to/code --save user/project
The command will search empiricalci.com for the version associated with the current sha.
If a version is found, the experiment will be created and associated with that version.
If a version is not found, the experiment will not run and it will throw the following error: "Couldn't find the version with sha 99fb56a on project user/project. Did you push this commit to GitHub?"
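The lookup step could be sketched like this. It assumes the server returns the project's versions as an array of objects with a `sha` field; that response shape and the helper name `findVersionBySha` are hypothetical.

```javascript
// Finds the version whose sha starts with the given (possibly abbreviated) sha.
// `versions` is assumed to be an array like [{ sha: '99fb56a...' }, ...].
function findVersionBySha (versions, sha, project) {
  const version = versions.find(v => v.sha && v.sha.startsWith(sha))
  if (!version) {
    throw new Error(
      `Couldn't find the version with sha ${sha} on project ${project}. ` +
      'Did you push this commit to GitHub?'
    )
  }
  return version
}
```

The experiment would only be created once this function returns, so a missing version stops the run before any container is built.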

NOTE: Depending on feedback, future work may include saving experiments that are not associated with GitHub to empiricalci.com.

Hyper-parameters tracking

Some questions to get started:

Are the parameters initialized from a file or at runtime?
What's the format of the parameters?
What's the format of a protocol with hyper-params?
How are they going to be displayed in the dashboard?

Use new "protocols" nomenclature on empirical.yml

Replace "experiments" with "protocols" to be in line with the rest of the system's nomenclature

Make sure emp working dir is not overwritten when mounting volumes

When using the docker distribution, make sure that the install path is not overwritten when mounting files and directories from the host.

Windows compatibility

Windows compatibility using node.

Run docker containers from a docker container

emp should be a docker container
emp should launch and build other containers using the following technique:

From here. The simplest way is to just expose the Docker socket to your CI container, by bind-mounting it with the -v flag.

Simply put, when you start your CI container (Jenkins or other), instead of hacking something together with Docker-in-Docker, start it with:

docker run -v /var/run/docker.sock:/var/run/docker.sock ...

Now this container will have access to the Docker socket, and will therefore be able to start containers. Except that instead of starting "child" containers, it will start "sibling" containers.

If your CI makes use of the Docker binary in scripts, you can include it in your CI image, or bind-mount it from the host as well. Example:

docker run -v /var/run/docker.sock:/var/run/docker.sock \
           -v $(which docker):/bin/docker \
           -ti ubuntu
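From Node, the same socket bind can be expressed in the options passed to the Docker Engine API's create-container call, where bind mounts go in `HostConfig.Binds`. The sketch below only builds that options object; `siblingContainerOptions` is a hypothetical helper and the image and command are placeholders.

```javascript
// Builds create-container options that expose the host's Docker socket,
// so containers started from inside run as siblings on the host daemon.
function siblingContainerOptions (image, cmd) {
  return {
    Image: image,
    Cmd: cmd,
    Tty: true,
    HostConfig: {
      Binds: ['/var/run/docker.sock:/var/run/docker.sock']
    }
  }
}
```

A client library that forwards these options unchanged to the Engine API should produce the same mount as the `-v` flag shown above.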

User friendly errors

Only print stack trace on development mode
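A minimal sketch of that rule. How emp detects development mode is not stated in the issue, so the `mode` argument (for instance the value of `NODE_ENV`) and the helper name `formatError` are assumptions.

```javascript
// Returns the full stack trace in development mode,
// and a short one-line message otherwise.
function formatError (err, mode) {
  if (mode === 'development' && err.stack) return err.stack
  return `Error: ${err.message}`
}
```

The CLI's top-level error handler could then print `formatError(err, process.env.NODE_ENV)` and exit with a non-zero code.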

Track times

Dataset download
Image build/pull
Experiment runtime

Cache a local file or directory

Get hash
Move file or dir to cache folder
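The two steps might look like the sketch below for a single file. SHA-256 is an assumption, since the issue does not name a hash algorithm, and directories are not handled here; `hashFile` and `cacheFile` are hypothetical helper names.

```javascript
const crypto = require('crypto')
const fs = require('fs')
const path = require('path')

// Hashes a file's contents (SHA-256 is an assumption of this sketch).
function hashFile (filePath) {
  return crypto.createHash('sha256').update(fs.readFileSync(filePath)).digest('hex')
}

// Moves the file into `cacheDir`, named by its hash, and returns the cached path.
function cacheFile (filePath, cacheDir) {
  const target = path.join(cacheDir, hashFile(filePath))
  fs.mkdirSync(cacheDir, { recursive: true })
  // Copy then delete, so the move also works across filesystems.
  if (!fs.existsSync(target)) fs.copyFileSync(filePath, target)
  fs.unlinkSync(filePath)
  return target
}
```

Naming the cached copy by its hash means a file that is cached twice ends up in the same place, so the second call only removes the source.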

Update status of experiments

Define install & update workflow

Define Installation and update workflow for:

Linux
Mac
Windows

Allow users to clone public repos

Don't require keys for public experiments, so that public repos can be cloned by other users:

Try to get keys
If (unauthorized) 
    try using https url without keys 
       If it fails: return unauthorized 
       if it succeeds continue with the process
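The steps above can be sketched as an async function. `getKeys`, `cloneWithKeys` and `cloneOverHttps` are hypothetical helpers injected by the caller, and signalling "unauthorized" through `err.status === 401` is also an assumption of this sketch.

```javascript
// Tries to fetch keys first; if that is unauthorized, falls back to a
// key-less HTTPS clone, and reports "unauthorized" only if that fails too.
async function cloneRepo (repo, { getKeys, cloneWithKeys, cloneOverHttps }) {
  let keys
  try {
    keys = await getKeys(repo)
  } catch (err) {
    if (err.status !== 401) throw err
    try {
      return await cloneOverHttps(repo)
    } catch (httpsErr) {
      const unauthorized = new Error('Unauthorized')
      unauthorized.status = 401
      throw unauthorized
    }
  }
  return cloneWithKeys(repo, keys)
}
```

Injecting the three helpers keeps the fallback logic testable without touching the network or a real Git library.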

How to keep track of uncommitted code?

There are a few scenarios:

Keep track of code with no version control at all
Keep track of code under Git, but uncommitted
Keep track of code that's committed locally, but not pushed to a server

Only mount to the container the data files specified in the dataset

Currently the whole data directory is mounted to /data on the experiment container. The user then needs to refer to the files by their hashes, which is not very user-friendly.
We could instead mount the files individually and refer to them by the key name.

For example a data file with the following config:

"myfile.csv": {
  "url": "http://example.com/myfilesdsds.csv",
  "hash": "sdfd39asaea334353fdfh4sfsd3432353"
}

would be mounted from the host path /home/empirical/data/sdfd39asaea334353fdfh4sfsd3432353
to the container path /data/myfile.csv
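A sketch of how the per-file bind mounts could be derived from such a config. It assumes the `host:container` bind string format used by Docker and a dataset object keyed by file name as in the example; `dataBinds` is a hypothetical helper.

```javascript
const path = require('path')

// Maps each dataset entry to a "hostPath:containerPath" bind string,
// e.g. <dataDir>/<hash>:/data/<name>.
function dataBinds (dataDir, dataset) {
  return Object.keys(dataset).map(name => {
    const hostPath = path.posix.join(dataDir, dataset[name].hash)
    return `${hostPath}:/data/${name}`
  })
}
```

The resulting list could be passed as the container's binds in place of the single bind for the whole data directory.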

Push and Pull images

Push images once they're built
Pull images when they're not existent on the host

Update status of the build

GPU support

Add GPU support on Linux using nvidia-docker
Document that GPU support is not available on Windows or OS X for now
Pass GPU_ENABLED environment variable to experiment container

Data management features

~~- [ ] Package a directory and get hash emp data tar directory~~

Get hash from a directory: emp data hash directory
Get hash from a file: emp data hash file
Download a file: emp data get file
Download and extract a directory: emp data get directory