
pdeitel / pythondatasciencefullthrottle


Downloads for my Safari Online Learning live training course Python Data Science Full Throttle: Introductory Artificial Intelligence (AI), Big Data and Cloud Case Studies

HTML 1.72% Jupyter Notebook 96.76% Python 1.50% Dockerfile 0.02%

pythondatasciencefullthrottle's People

Contributors

dependabot[bot], pdeitel


pythondatasciencefullthrottle's Issues

Reducing PNG image sizes in this repo ("ch14"): please check if possible

Hello Paul,

I was working through one of your Data Science lessons, and when I downloaded this repo as a zip it was close to 200 MB.
The PNG image files account for a large share of that.
Please check whether they can be compressed or converted to JPG. That might save some disk space and download time for others.

[Image: Huge Files]

Thanks,
KK
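
To see which images dominate the download, a short script along these lines could list the largest PNG files in a local clone. This is only a sketch using the standard library; the function name `largest_files` and its parameters are made up for illustration and are not part of the repository.

```python
from pathlib import Path


def largest_files(root, suffix=".png", top=10):
    """Return (size_in_bytes, path) pairs for the largest files ending in suffix."""
    sized = [
        (path.stat().st_size, path)
        for path in Path(root).rglob(f"*{suffix}")
        if path.is_file()
    ]
    # Sort by size only, largest first, and keep the first `top` entries.
    return sorted(sized, key=lambda pair: pair[0], reverse=True)[:top]
```

Calling `largest_files("ch14")` from the root of the clone would return the ten biggest PNGs in that folder, which are the natural candidates for compression.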

Dockerfile change

I had to make the two local changes listed below to the Dockerfile to get it working. Only the first build took a long time, because it had to download the jupyter/pyspark-notebook base image(s) and all of the spaCy packages. I could be wrong on this, but it appeared to use at least 40 GB of my local drive (including my attempts to find the correct tag for the base image) to produce a 12.9 GB Docker image.

Dockerfile changes:

  • had to set a specific Python 3.8 version (the python-3.8 tag on the base image)
  • added an ENTRYPOINT using "jupyter-lab"

I also created a docker-compose file so that a single CLI command [ docker compose up -d --build ] builds and (re)deploys/runs the image.

# Based on the Dockerfiles from the Jupyter Development Team which 
# are Copyright (c) Jupyter Development Team and distributed under 
# the terms of the Modified BSD License.
ARG OWNER=jupyter
ARG BASE_CONTAINER=$OWNER/pyspark-notebook:python-3.8
FROM $BASE_CONTAINER

LABEL maintainer="Paul Deitel <[email protected]>"

# Fix: https://github.com/hadolint/hadolint/wiki/DL4006
# Fix: https://github.com/koalaman/shellcheck/wiki/SC3014
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

RUN mamba install --yes \
    'dnspython' \
    'folium' \
    'geopy' \
    'imageio' \
    'nltk'  \
    'pymongo' \
    'scikit-learn' \
    'spacy' \
    'tweepy' 
     
RUN pip install --upgrade \
    'tensorflow' \
    'openai' \
    'beautifulsoup4' \
    'deepl' \
    'mastodon.py' \
    'better_profanity'  \
    'tweet-preprocessor' \
    'ibm-watson' \
    'pubnub' \
    'textblob' \
    'wordcloud' \
    'dweepy' \
    'sounddevice'
    

# download data required by textblob and spacy
RUN python -m textblob.download_corpora && \
    python -m spacy download en_core_web_sm && \
    python -m spacy download en_core_web_md && \
    python -m spacy download en_core_web_lg 

# clean up
RUN mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

ENTRYPOINT ["start.sh", "jupyter-lab"]
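
After a build like this one, it may be worth confirming inside the container that the installed packages can actually be found. The following is a rough sketch, not part of the posted Dockerfile: the helper name `missing_modules` is hypothetical, and the import names in `EXPECTED_MODULES` are my assumption of how each package listed above is imported (for example `sklearn` for scikit-learn and `bs4` for beautifulsoup4).

```python
from importlib.util import find_spec

# Assumed import names for the packages installed by the Dockerfile above.
EXPECTED_MODULES = [
    "dns", "folium", "geopy", "imageio", "nltk", "pymongo", "sklearn",
    "spacy", "tweepy", "tensorflow", "openai", "bs4", "deepl", "mastodon",
    "better_profanity", "preprocessor", "ibm_watson", "pubnub", "textblob",
    "wordcloud", "dweepy", "sounddevice",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located, without importing them."""
    return [name for name in names if find_spec(name) is None]
```

Running `print(missing_modules(EXPECTED_MODULES))` in a notebook cell inside the container should print an empty list if every package was installed under the assumed name.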

Docker compose file:

version: "3"

services:
  deitelpydsft:
    container_name: deitelpydsft
    user: root
    volumes:
      - .:/home/jovyan/work
    build: .
    restart: always
    # env_file: .env
    ports:
      - "8888:8888"
      - "4040:4040"
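
Once the container is up, a quick way to check that the two published ports are reachable from the host is a plain TCP connection attempt. This is a minimal sketch under the assumption that the compose file above is used unchanged; `port_is_open` is a made-up helper name, and a successful connection only shows that something is listening, not that JupyterLab itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("localhost", 8888)` checks the first port mapping from the compose file. The second mapping, 4040, may well report closed until something in the container is listening on it.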

Twitter v1.1 vs. v2 APIs

This morning I became aware that the Twitter 1.1 APIs are no longer available for new Twitter Developer accounts. Existing accounts were grandfathered, but most people taking this course, reading our book or watching our videos are not likely to have had one.

This means anyone with a new developer account following our Twitter instructions will not be able to run the examples. Nor will they be able to run two major Twitter-based case studies in our Big Data content.

We are currently:

  • Mastering the new Twitter v2 APIs, which operate differently.
  • Updating all our code from the v1.1 APIs to the v2 APIs.
  • Rewriting our code discussions and case studies to match the new code.
  • Updating the corresponding source files.

As soon as these updates are available, I'll post the new code files and Jupyter Notebooks in this repository.
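
Until then, the following sketch gives a rough idea of what a v2 call looks like at the HTTP level. It is not the updated course code, which has not been posted here. It only builds (and does not send) a request to the v2 recent-search endpoint using the standard library. The helper name `build_recent_search_request` is hypothetical, and whether a given developer account is allowed to call this endpoint depends on its access level.

```python
from urllib.parse import urlencode
from urllib.request import Request

# v2 endpoints live under /2/ rather than /1.1/.
RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_recent_search_request(bearer_token, query, max_results=10):
    """Build, without sending, a GET request for the v2 recent-search endpoint."""
    params = urlencode({"query": query, "max_results": max_results})
    return Request(
        f"{RECENT_SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call. The bearer token comes from the developer portal and should be kept out of source files.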
