
civisanalytics / datascience-python


Common dependencies for data science workflows

License: BSD 3-Clause "New" or "Revised" License

Dockerfile 43.48% Shell 6.45% Python 50.07%

datascience-python's People

Contributors

beckermr, byndcivilization, elsander, jacksonlee-civis, jacksonllee, keithing, kmshelley, mheilman, shelbrudy, thatguyinabeanie


datascience-python's Issues

Docker build fails due to missing package

Hi there!

I attempted to build this docker image locally and the build failed with the following error:

Solving environment: ...working... failed

ResolvePackageNotFound:
  - openblas=0.3.5

The command '/bin/sh -c conda install -y boto &&     conda install -y nomkl &&     conda env update -f environment.yml -n root &&     conda clean --all -y &&     rm -rf ~/.cache/pip' returned a non-zero code: 1

This can be reproduced by cloning this repository and running: docker build -t datascience-python .
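When triaging a failure like this, it can help to pull the unresolved pins out of the build log so they can be compared against environment.yml. The following is a minimal sketch that only assumes the log layout shown above; the function name `missing_packages` is hypothetical and not part of conda.

```python
def missing_packages(conda_output):
    """Extract (name, version) pairs listed under ResolvePackageNotFound.

    Assumes the layout shown in the log above: a 'ResolvePackageNotFound:'
    line followed by '  - name=version' entries.
    """
    missing = []
    in_block = False
    for line in conda_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("ResolvePackageNotFound"):
            in_block = True
            continue
        if in_block:
            if stripped.startswith("- "):
                # Split 'openblas=0.3.5' at the first '='.
                name, _, version = stripped[2:].partition("=")
                missing.append((name, version))
            elif stripped:
                # Any other non-blank line ends the list of missing specs.
                in_block = False
    return missing


log = """Solving environment: ...working... failed

ResolvePackageNotFound:
  - openblas=0.3.5

The command '/bin/sh -c conda install -y boto' returned a non-zero code: 1
"""
print(missing_packages(log))
```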

Incompatible dependencies: boto3 and botocore

When pip-installing packages on top of the datascience-python Docker image v4.0.1, the following warning appears in the log output:

boto3 1.5.11 has requirement botocore<1.9.0,>=1.8.25, but you'll have botocore 1.5.38 which is incompatible.

As a sanity check, environment.yml does pin boto3 at 1.5.11 and botocore at 1.5.38:

https://github.com/civisanalytics/datascience-python/blob/v4.0.1/environment.yml#L7-L9

Also, boto3 1.5.11 does specify botocore>=1.8.25,<1.9.0:

https://github.com/boto/boto3/blob/1.5.11/setup.py#L17

I'm opening an issue rather than a pull request for two reasons. First, the mismatch between boto3 and botocore may be intentional for reasons I'm unaware of. Second, other dependencies probably need updating as well (it's been two months since the v4.0.1 release in early February 2018), and I don't know whether the maintainers (@stephen-hoover?) are already working on it.
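The mismatch can be checked mechanically. Below is a minimal sketch that compares a pinned version against a requirement string such as the one in the pip warning. It handles only plain dotted numeric versions and the operators listed, not the full PEP 440 grammar, and the helper names `parse_version` and `satisfies` are made up for illustration.

```python
def parse_version(text):
    """Turn a dotted release string such as '1.8.25' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec like '>=1.8.25,<1.9.0'."""
    ops = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        "==": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    installed = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators first so '>=' is not read as '>'.
        for op in (">=", "<=", "==", ">", "<"):
            if clause.startswith(op):
                bound = parse_version(clause[len(op):])
                if not ops[op](installed, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# botocore as pinned in environment.yml vs. the range boto3 1.5.11 requires
print(satisfies("1.5.38", ">=1.8.25,<1.9.0"))  # False
print(satisfies("1.8.25", ">=1.8.25,<1.9.0"))  # True
```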

Update dependencies

The latest release, 4.2.0, is from ~8 months ago, so many of the dependency versions in environment.yml are now out of date. It's time to go through the file with a fine-tooth comb and update them.

FWIW, I ran into a problem with cloudpickle, which the datascience-python 4.2.0 Docker image pins at 0.5.2: it was unable to properly unpickle certain custom class objects. The problem went away after I force-installed the latest cloudpickle, 0.6.1.

@stephen-hoover for visibility

Remove dependencies that are of no or little use

There's probably no way to tell with certainty, but there's a good chance that some of the packages in environment.yml are seldom or never used in our current and foreseeable production environments, and therefore can be considered for removal to slim down the Docker image. Here's a list to start:

  • beautifulsoup4
  • dask
  • nose
  • urllib3 (requests is already included, though I don't know if production workflows depend on some crucial urllib3 features that requests doesn't support)
  • dropbox
  • python-simple-hipchat

Even if some of these packages are needed very occasionally, they can naturally be installed on the fly. In general, I feel like this ds-py image should only include packages that have at least medium (if not high) usage.
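One rough way to gather evidence about usage is to scan the Python source of production jobs for imports of the candidate packages. The sketch below does this with the standard-library `ast` module. The `CANDIDATES` mapping from package name to import name is an assumption made for illustration (for example, `hipchat` as the module provided by python-simple-hipchat), and `unused_candidates` is a hypothetical helper.

```python
import ast


def imported_top_level_modules(source):
    """Return the set of top-level module names imported by Python source text."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as 'from . import x'.
            found.add(node.module.split(".")[0])
    return found


# Assumed mapping: package name in environment.yml -> name used in 'import'.
CANDIDATES = {
    "beautifulsoup4": "bs4",
    "dask": "dask",
    "nose": "nose",
    "urllib3": "urllib3",
    "dropbox": "dropbox",
    "python-simple-hipchat": "hipchat",
}


def unused_candidates(sources, candidates=CANDIDATES):
    """List candidate packages that none of the given sources import directly."""
    used = set()
    for source in sources:
        used |= imported_top_level_modules(source)
    return sorted(pkg for pkg, mod in candidates.items() if mod not in used)


example = "import dask.dataframe as dd\nfrom bs4 import BeautifulSoup\n"
print(unused_candidates([example]))
# ['dropbox', 'nose', 'python-simple-hipchat', 'urllib3']
```

A scan like this only sees direct imports. A package that is absent from the results may still be needed as a dependency of another package, so it is a starting point for discussion rather than proof that removal is safe.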

Links on DockerHub README are broken

In the current README.md, we use relative links. This is nice because, as @stephen-hoover pointed out, the links stay correct across branches and commits. However, because we use the same README.md for Docker Hub as for GitHub, the relative links work on GitHub but not on Docker Hub.
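One possible workaround is to generate a Docker Hub copy of the README in which relative links are rewritten to absolute GitHub URLs. The following is a minimal sketch of that idea, not something the repository does today. The `BASE_URL` value (including the `master` branch) and the function name `absolutize_links` are assumptions, and the regular expression only covers simple inline Markdown links.

```python
import re
from urllib.parse import urljoin

# Assumed base URL for illustration; the branch or tag would need to be chosen.
BASE_URL = "https://github.com/civisanalytics/datascience-python/blob/master/"

# Matches inline Markdown links of the form [text](target).
LINK = re.compile(r"(\[[^\]]*\]\()([^)\s]+)(\))")

# Targets that are already absolute, protocol-relative, or in-page anchors.
ABSOLUTE = re.compile(r"^(?:[a-zA-Z][a-zA-Z0-9+.-]*:|#|//)")


def absolutize_links(markdown, base_url=BASE_URL):
    """Rewrite relative Markdown link targets so they resolve against base_url."""
    def fix(match):
        target = match.group(2)
        if ABSOLUTE.match(target):
            return match.group(0)  # leave absolute links and anchors untouched
        return match.group(1) + urljoin(base_url, target) + match.group(3)

    return LINK.sub(fix, markdown)


text = "See [env](environment.yml) and [site](https://example.com)."
print(absolutize_links(text))
```

The trade-off is that absolute links are tied to one branch or tag, which gives up the cross-branch correctness of relative links, so the rewritten copy would be best kept separate from the README.md on GitHub.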
