qpc-github / jupyterhub-dataprocspawner

This project forked from googleclouddataproc/jupyterhub-dataprocspawner


License: Apache License 2.0


jupyterhub-dataprocspawner's Introduction

DataprocSpawner

DataprocSpawner enables JupyterHub to spawn single-user Jupyter notebooks that run on Dataproc clusters. This provides users with ephemeral clusters for data science without the pain of managing them.

  • Product Documentation
  • DISCLAIMER: DataprocSpawner only supports zonal DNS names. If your project uses global DNS names, you need to migrate it to zonal DNS names first.

Supported Python Versions: Python >= 3.6

Before you begin

In order to use this library, you first need to go through the following steps:

  1. Select or create a Cloud Platform project
  2. Enable billing for your project
  3. Enable the Google Cloud Dataproc API
  4. Set up authentication

Installation example

Locally

To try it locally for development purposes, run the following from the root folder:

chmod +x deploy_local_example.sh
./deploy_local_example.sh <YOUR_PROJECT_ID> <YOUR_GCS_CONFIG_LOCATIONS> <YOUR_AUTHENTICATED_EMAIL>

The script will start a local container image and authenticate it using your local credentials.

Note: Although you can try the Dataproc Spawner image locally, you might run into networking communication problems.

Google Compute Engine

To try it out in the cloud, the quickest way is to use a test Compute Engine instance. The following steps take you through the process.

  1. Set your working project

    PROJECT_ID=<YOUR_PROJECT_ID>
    VM_NAME=vm-spawner
  2. Run the example script, which:

    a. Creates a Dockerfile.
    b. Creates a jupyter_config.py example file that uses a dummy authenticator.
    c. Deploys a Docker image of the JupyterHub spawner to Google Container Registry.
    d. Creates a container-based Compute Engine instance.
    e. Returns the IP of the instance that runs JupyterHub.

    bash deploy_gce_example.sh ${PROJECT_ID} ${VM_NAME}
  3. After the script finishes, you should see an IP displayed. You can use that IP to access your setup at <IP>:8000. You might have to wait for a few minutes until the container is deployed on the instance.
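Because the container can take a few minutes to come up, it can help to poll the address rather than refresh the page by hand. The following is a minimal sketch and not part of this repository: `http_probe` and `wait_for_hub` are hypothetical helper names, and the sketch assumes the hub answers plain HTTP on port 8000 as described above.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5):
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or malformed response: not ready yet.
        return False


def wait_for_hub(url, attempts=30, delay=10, probe=http_probe, sleep=time.sleep):
    """Poll url until probe(url) succeeds.

    Returns True as soon as a probe succeeds, or False once all
    attempts have been used. probe and sleep are injectable so the
    loop can be exercised without a network.
    """
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

With the IP printed by the script substituted for the placeholder, a call such as `wait_for_hub("http://<IP>:8000")` would retry for up to roughly five minutes with the defaults shown.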

Troubleshooting

To troubleshoot the deployment:

  1. ssh into the VM:

    gcloud compute ssh ${VM_NAME}
  2. From the VM console, install some useful tools:

    apt-get update
    apt-get install vim
  3. From the VM console, you can:

    • List the running containers with docker ps
    • Display container logs with docker logs -f <CONTAINER_ID>
    • Open a shell in the container with docker exec -it <CONTAINER_ID> /bin/bash
    • Restart the container for changes to take effect with docker restart <CONTAINER_ID>

Notes

  • DataprocSpawner defaults to port 12345. The port can be set in jupyterhub_config.py; see the JupyterHub documentation for more information.

    c.Spawner.port = {port number}

  • The default region for Dataproc clusters is us-central1, and the default zone is us-central1-a. Using global is currently unsupported. To change the region, pick a supported region and a zone within it, and include the following lines in jupyterhub_config.py:

    c.DataprocSpawner.region = '{region}'
    c.DataprocSpawner.zone = '{zone that is within the chosen region}'
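Since the zone has to lie within the chosen region and global is unsupported, a small check before assigning the values can catch mismatches early. This is a minimal sketch and not part of DataprocSpawner: `zone_in_region` and `check_location` are hypothetical helpers, and the sketch assumes the usual Compute Engine naming scheme in which a zone name is its region name followed by a hyphen and a short suffix (for example us-central1-a in us-central1).

```python
def zone_in_region(zone, region):
    """Return True if zone (e.g. 'us-central1-a') belongs to region (e.g. 'us-central1').

    Assumes zone names have the form '<region>-<suffix>'.
    """
    prefix, sep, suffix = zone.rpartition("-")
    return bool(sep) and bool(suffix) and prefix == region


def check_location(region, zone):
    """Validate a region/zone pair and return it unchanged.

    Raises ValueError for the unsupported 'global' region or for a
    zone that does not belong to the region.
    """
    if region == "global":
        raise ValueError("the 'global' region is not supported")
    if not zone_in_region(zone, region):
        raise ValueError(f"zone {zone!r} is not within region {region!r}")
    return region, zone
```

In jupyterhub_config.py, the validated pair could then be assigned to c.DataprocSpawner.region and c.DataprocSpawner.zone, for example after `region, zone = check_location('us-central1', 'us-central1-a')`.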

Next

Disclaimer

This is not an official Google product.

jupyterhub-dataprocspawner's People

Contributors

annyue, jerryleiding, m-mayran, ojarjur, oleksiilopasov
