earthlab / hub-ops

Infrastructure and operations for the Earth Lab JupyterHub

Home Page: https://earthlab-hub-ops.readthedocs.io/en/latest/

Python 58.51% Dockerfile 41.49%

hub-ops's Introduction

Multiple Hub Deployment for Earth Lab JupyterHubs | hub-ops

Infrastructure and operations for the Earth Lab JupyterHubs.

Documentation

Read the documentation at: https://earthlab-hub-ops.readthedocs.io/en/latest/index.html

Build the documentation locally using:

    $ cd docs
    $ make html

Monitoring

Visit https://grafana.hub.earthdatascience.org/ for monitoring of the hubs.

Available hubs

Hubs which are running:

Hubs available for deployment (currently not running):

  • The nbgrader hub. This is a hub that uses a development version of nbgrader while we wait for a PR to be merged into the base repo. It is configured using the chart in hub-charts/nbgrader-hub/.
  • The workshop hub. It is configured via the chart in hub-charts/wshub/. It was used for a 45-person workshop with temporary logins.
  • The bootcamp hub. It is configured via the chart in hub-charts/bootcamp-hub/.

Development

TODO: Add instructions to build the docs locally here

hub-ops's People

Contributors

betatim, consideratio, kcranston, nkorinek

hub-ops's Issues

Small docs update - day-to-day.rst

Next decide how you'd like to authenticate your hub. You can use Github,
Google or a "hash" based authenticator. Read more about that here
`Read more about that here <https://earthlab-hub-ops.readthedocs.io/en/latest/>`_
`Read more about that here <https://earthlab-hub-ops.readthedocs.io/en/latest/authentication.html>`_

from tim
"We should find a way to do this without having to have the explicit URL here, something to fix in a new PR though, let's not hold this one up."

see: #87

Odd issue with hub loading

Sometimes the hub loading page hangs and hangs and hangs... but if you put the same link in another tab, the hub actually has already loaded -- i.e. the pod was created / started up.

we should look into this.
[screenshot: 2019-08-30, 10:27:20 AM]

location of installation instructions for jupyter extensions

We have two options for where to put the installation instructions for the jupyter extensions (i.e. anything that looks like RUN jupyter serverextension enable --py <extension> --sys-prefix, which we would need for nbgrader, nbzip, nbgitpuller). We can put them in the user-images Dockerfile or we can put them in the earth-analytics-python-env Dockerfile. If we put them in the earth-analytics image, then we can just pull from Dockerhub, and the hub doesn't need to build another image just to install the extensions.

Document git workflows

There are some assumptions baked into the way travis is set up, Tim has opinions on how one should use git, and Leah knows some cool tricks (the reverse PR!).

Should we start a section in the documentation that explains how to do various things as a user and write down assumptions that might break if you do things differently?

The reason not to write it down (yet) is that it creates more work for us as we have to adjust how we work and the docs if we do change how things work.

cc @lwasser

Tooling to remove hubs

We need tooling to delete a hub, for example to remove all the PVCs associated with users of one particular hub deployment.
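A minimal sketch of what such tooling could look like, assuming each hub is deployed into a namespace named after the hub (as in the `--namespace bootcamp-hub` commands quoted in other issues). The function name `pvc_cleanup_command` and the dry-run default are illustrative and not part of this repository. It only builds the `kubectl` command so it can be reviewed before anything is deleted; `--dry-run=client` needs a reasonably recent kubectl.

```python
import shlex


def pvc_cleanup_command(hub_name, dry_run=True):
    """Build the kubectl command that deletes every PVC in a hub's namespace.

    Assumption: the hub lives in a namespace with the same name as the hub.
    Nothing is executed here; the caller decides whether to run the command.
    """
    if not hub_name or hub_name.startswith("-"):
        raise ValueError("hub_name must be a non-empty namespace name")
    cmd = ["kubectl", "delete", "pvc", "--all", "--namespace", hub_name]
    if dry_run:
        # Ask kubectl to report what would be deleted without deleting it.
        cmd.append("--dry-run=client")
    return cmd


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(pvc_cleanup_command("wshub")))
```

Deleting the PVCs removes the users' home directories for that hub, so a real tool would probably want a confirmation step before calling it with `dry_run=False`.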

Add percentage to upload files button

Currently when a student uploads a file, you have to click on "uploading". If the file takes some time, that uploading icon stays the same and it's hard to see if progress is being made. Can we add a progress bar to the uploading button?

Questions about hub create for Tim

Just pulling questions together...

  • in requirements.yml -- does this Jupyter version need to be updated from time to time?

    • how do we know when ?
    • where does the version number come from? version: "v0.7-560a7cd"
  • do all of these items need to be updated with the hub name?
    prometheus.io/path: /wshub/hub/metrics

  • I believe here rsync is moving data...but where are the data actually fetched in this example
    rsync --ignore-existing -razv --progress /data/ /home/jovyan/earth-analytics/data;
    for the hub i'm creating, i don't need to add data. but i'd like to understand where the data are added.

  • This is stupid of me but i don't understand where this is: "You need to edit :code:`jupyterhub.hub.baseUrl`". this is not the prometheus path.

  • You also need to configure the authentication setup. <- i'm not sure where this is either. in the yaml?? can we please add a section about where this is and what i need to do?

  • QUESTION: in the travis instructions - why isn't wshub there?

script:
- |
  # Build staginghub
  python ./deploy.py --no-setup --build staginghub

LEAH CAN ANSWER THIS. TIM SHUT IT DOWN THIS MORNING. i wish i could strike out text in gh issues!!

Proxy token docs

The docs don't mention the need for the proxy token (see for example #94).

We should add a section on how to make a proxy token, where it needs to go, etc.

Cluster scaling up

Currently when the cluster scales up because there aren't enough resources for the next user pod to be scheduled it takes so long that the launch times out.

Two options: we increase the spawn timeout from 300s to a larger number, or we find out why it takes so long to spawn a new node.
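For the first option, here is a rough sketch of the values fragment involved. It assumes the hub charts wrap the JupyterHub Helm chart as a dependency named `jupyterhub` (suggested by the `jupyterhub.hub.baseUrl` setting mentioned in another issue) and that the timeout is the chart's `singleuser.startTimeout` setting. The helper name and the 600 second figure are made up for illustration.

```python
import json


def spawn_timeout_values(seconds):
    """Return a values fragment that raises the single-user start timeout.

    Assumption: settings for the JupyterHub chart are nested under a
    top-level `jupyterhub` key in each hub's values.yaml.
    """
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return {"jupyterhub": {"singleuser": {"startTimeout": int(seconds)}}}


if __name__ == "__main__":
    # JSON is also valid YAML, so the output can be merged into values.yaml.
    print(json.dumps(spawn_timeout_values(600), indent=2))
```

A longer timeout only hides the delay from the spawner. Students would still wait for the new node, so the second option remains worth investigating.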

Earth analytics course hub

This issue is a check list of things that need to be done, information gathered and questions relevant to deploying a hub for Leah's semester long course.

  • class size ~28 - expect 23-30 - you never know who will drop
  • git repo containing the material to sync to students: i need to make something! but i know how to add it
  • memory requirement: similar to the spatial data workshop to begin with? then we can scale back
  • login/authentication system: github whitelist, but for the first day i'll collect github usernames, so first day just github
  • software libraries needed: earth analytics envt just like the workshop!
  • install nbzip with zip support (just got merged) https://github.com/data-8/nbzip/archive/b32bd3441a760a518f0bcdd55def6ac247d471d8.zip: yay, is this to download zip vs tar files?
  • the hub needs a name: ea-hub
  • are student names known upfront? no - please see above - my plan is to use generic github and then move to a whitelist
  • going live date -- aug 29 - i'd prefer the 28th in the evening!
  • use jupyterhub v0.9.2 with self-restarting hub

Please edit this message to add things to the list and answer questions by writing the answer after it.

Let's discuss things that need discussing in comments in this thread.

Class startup experience

We discussed that students would have to wait while the cluster scales up and how to solve this by logging in before class etc.

A much simpler solution is to visit the admin panel and start the servers for students there. It requires a few clicks and will cause the cluster to scale up before class. We can add users before they log in for the first time if we know their usernames.
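The same thing can in principle be scripted against the JupyterHub REST API, which has an endpoint for starting a user's server (`POST /hub/api/users/<name>/server`). The sketch below only builds the requests. The hub URL, the usernames and the token are placeholders, and it assumes an API token belonging to an admin user is available.

```python
import urllib.request
from urllib.parse import quote


def build_start_request(hub_url, username, token):
    """Build (but do not send) a request that starts a user's server.

    hub_url is the hub's public URL including its base path, for example
    "http://hub.earthdatascience.org/ea-hub". The token must belong to an
    admin user of that hub.
    """
    url = f"{hub_url.rstrip('/')}/hub/api/users/{quote(username, safe='')}/server"
    request = urllib.request.Request(url, method="POST")
    request.add_header("Authorization", f"token {token}")
    return request


if __name__ == "__main__":
    # Hypothetical usernames; replace with the real class list.
    for name in ["student-one", "student-two"]:
        req = build_start_request(
            "http://hub.earthdatascience.org/ea-hub", name, "API-TOKEN"
        )
        print(req.get_method(), req.full_url)
        # To actually start the server: urllib.request.urlopen(req)
```

Starting the servers a few minutes apart, rather than all at once, may be kinder to the autoscaler, but that is a guess and has not been measured here.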

cc @lwasser

workshop setup for friday including data, ea-environment and a working space

For our workshop, we will need a similar setup to what a student would need.

  1. the students will need a semi persistent (for the time of the workshop) working area where they can export files
  2. the students will need access to 1 or more datasets (on figshare) that will be set up for them in a drive on their computer
  3. the students will need access to the full earth analytics python environment that we have built

Questions

  1. how do students log in to the environment? i'm fine with the way we discussed!!
  2. How do i setup the data so it's uniform for them? ie i'll need to know what the path is to the data for them to access it.

@betatim do we have this in place to use for friday? or what do you need to get things going so we are ready? thank you!!

Expected class sizes

How many classes, and of what size, will be running next semester? We need some data so we can set things up appropriately and do some simulations.

This is related to the cluster discussion in #75 (comment)

tagging @lwasser

Multi hub setup

There will be several courses and workshops. Each with potentially their own set of students, required libraries/software, homework, and so on.

Instead of having one big JupyterHub instance that works for everyone, we will have one hub for each course.

Each hub will be reachable at http://hub.earthdatascience.org/earth-analytics or http://hub.earthdatascience.org/earth-workshop or http://hub.earthdatascience.org/something-something.

Technically we will make that happen by having an Ingress that takes care of directing traffic to each hub. Each hub will be configured by its own chart and deployed into a new namespace but on the same cluster (to share resources). The chart for each hub will reside in this repository as a distinct sub-directory. This should work well if we have "a few" hubs (fewer than ten, say).
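To make the naming convention concrete, here is a small sketch that derives the namespace, chart directory and URLs for a hub from its name. The `hub-charts/<name>/` location and the URL pattern come from this page. Using the hub name as the namespace is an assumption, and `hub_layout` is a hypothetical helper rather than code from the repository.

```python
import re

# Kubernetes namespaces must be valid DNS labels: lowercase letters,
# digits and "-", starting and ending with a letter or digit.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def hub_layout(name, domain="hub.earthdatascience.org"):
    """Return where a hub's pieces live, derived from the hub's name."""
    if len(name) > 63 or not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid namespace name")
    return {
        "namespace": name,                    # assumed: one namespace per hub
        "chart_dir": f"hub-charts/{name}/",   # chart sub-directory in this repo
        "base_url": f"/{name}/",              # path prefix the Ingress routes on
        "public_url": f"http://{domain}/{name}/",
    }


if __name__ == "__main__":
    for hub in ["earth-analytics", "earth-workshop"]:
        print(hub_layout(hub))
```

Validating the name up front matters because the same string ends up in a URL path, a directory name and a Kubernetes namespace.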

Each hub that uses the CU Google auth would need its own OAuth secrets and callback.

resource scaling issues?

ok @betatim i've been working on this a bit more and i'm noticing some consistent errors. it seems like for whatever reason, a certain number of users are able to access the hub. and then ... it crashes because of insufficient resources. by crash i mean you can't log in. see below. this is a new "error" where it's trying to launch but not explicitly telling me what the issue is until i click on the log. if i knew a bit more i might try to play around with the setup but i really don't want to break things!
hoping this is a small thing to fix and that you see it in the morning pre-workshop which starts at 9am (5 your time). i will try to be up early just in case i can catch you!!

i should have tested it with more people earlier in the day. i just saw that it worked and got excited and didn't think to test more.
[screenshot: 2018-07-19, 10:25:45 pm]

Monitoring to see what is going on

Get started on collecting data to monitor via Prometheus and then displaying it via Grafana. The basics seem to be up and running: https://grafana.earthlab.wtte.ch/d/wqCsGavik/hub-monitoring?refresh=1m&orgId=1

@lwasser when you get back could you create two new domains:

  • grafana.hub.earthdatascience.org

  • prometheus.hub.earthdatascience.org

They should both point to the same IP as hub.earthdatascience.org does. Once we have that, the dashboard can be at grafana.hub.earthdatascience.org.

  • can grafana remain public? If not, secure it

  • get Leah set up with git-crypt for the secrets

  • what are useful things to show in grafana

  • docs docs docs

Fix travis setup for building the docs

Fix the travis setup so that sphinx errors actually lead to errors on a PR so we don't end up with broken docs being merged.

The problem is that the exit code of make html gets lost to popd I think.
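One way to avoid the problem is to not use pushd/popd at all and let a small script run the build inside docs/ and pass the exit status straight through. This is a sketch of that idea and not the fix that was eventually applied. `run_in_dir` and `build_docs` are hypothetical names.

```python
import subprocess
import sys


def run_in_dir(cmd, cwd):
    """Run cmd inside cwd and return its exit code.

    Passing cwd replaces the pushd/popd pair, so no later shell command
    can overwrite the exit status of the build.
    """
    return subprocess.run(cmd, cwd=cwd).returncode


def build_docs(docs_dir="docs"):
    """Run `make html` in the docs directory and return make's exit code."""
    return run_in_dir(["make", "html"], cwd=docs_dir)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python build_docs.py docs
    # A non-zero exit here makes the CI job fail.
    sys.exit(build_docs(sys.argv[1]))
```

Whether Sphinx warnings should also fail the build is a separate choice that this sketch leaves alone.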

a few questions about authentication & config

ok @betatim i have a few questions.

  1. looking at the various yaml files - specifically values.yaml - the items are in a different order in each, which makes it difficult to know what things i need to add / are missing etc. Can we make a unified template that we use to add things in the right order and hierarchy?
  2. the authentication. I don't love having a whitelist or authentication "out in the open" is it possible to also stash that in the secrets folder so someone doesn't grab a list of username from us? Or do you think that is an issue to be concerned with at all?
  3. I tried to add myself and you as admins. however when i look at the control panel i don't see a way to see and shut down other servers, so i'm not an admin. I tried the code below to add us all. do i need to change this: c.JupyterHub.authenticator_class = 'hashauthenticator.HashAuthenticator' or something else for this to work? this part was confusing because each yaml file has things in different orders so it's hard to know what to add where and where hierarchy is important.
    extraConfig:
      auth: |
        c.JupyterHub.authenticator_class = 'hashauthenticator.HashAuthenticator'
      admin: |
        c.Authenticator.admin_users = {'lwasser', 'jlpalomino', 'betatim'}
        c.JupyterHub.admin_access = True
  4. I tried to change my authenticator on the hub. Now something is happening where i see at the command line that there are several pods but i can't launch a new server from the website url without getting a 504 gateway error. i'm not sure if this is because i made significant changes to the hub and jenny's and my servers were not fully shut down. BUT because i wasn't an admin i couldn't shut down our servers, so now how do i reset things?

This doesn't work. i suspect because the image-puller hash may be wrong? but i'm not sure where that came from.

kubectl describe pod hook-image-puller-1533307711-84jsw --namespace bootcamp-hub 

SO - how do i reset a hub?

What i did:

kubectl delete --all pods --namespace=bootcamp-hub

this deleted all pods! but it did seem like my workspace persisted when i logged in again... is this kosher behaviour??

Minimum cluster size

Make sure minimum size of the cluster is set to 1 and auto-scaler is switched on.

Continuous deployment

Finish the travis setup so we have continuous deployment.

The workflow would be:

  1. make a PR to this repository
  2. (run some tests)
  3. merge the PR
  4. travis deploys it

It can happen that a deploy fails so people need to keep an eye on that and then revert the PR. Is there a better story we can make here?

  • finish CD setup in .travis.yml
  • enable travis
  • create a new hub by PR
  • setup a basic check that runs before we merge a PR to see if things will "work"

Permissions, permissions, permissions

Leah: create a "TravisDeployer" role in https://console.cloud.google.com/iam-admin/roles?project=ea-jupyter with these permissions:


    container.clusters.get
    container.clusters.getCredentials
    container.clusters.list
    container.daemonSets.get
    container.daemonSets.list
    container.deployments.create
    container.deployments.get
    container.deployments.list
    container.deployments.update
    container.pods.get
    container.pods.list
    container.pods.portForward
    container.services.get

Tim: create a service account "travis-deployer" on https://console.cloud.google.com/iam-admin/serviceaccounts?project=ea-jupyter that uses the new role. Get the private key and add it to the repo in the secrets/ directory.

Leah Needs Help :) -- A Few Questions

Ok @betatim first and foremost this is so amazing. i'm in the hub and it's working
ok so on to questions and things to work out.

  1. Currently this hub is linked to a repo, here: https://github.com/earthlab-education/2018-07-20-spatial-python-workshop/ and it drops the stuff from that repo into a materials folder.
    QUESTION: How do I
    • change the name of the folder "materials" to something else?
    • update it? let's say i add some notebooks to it today, it doesn't seem to update. At the same time, if i add a notebook that a student might edit, what would happen if i update that notebook? what i'm wondering is whether we need a materials folder that is not editable, which the students then copy to start working in.
  2. I have a similar question about the data. Currently it drops the data in home/earth-analytics/data/ and this is so perfect! however... same question as above. let's say the student starts working in that folder, but i want to add a new dataset to it. Example - i just added 2 new datasets yesterday and updated earthpy. How does that work? should we always have students save stuff outside of the data directory and what is now called the materials directory?
  3. can we start a google doc with instructions on how to do stuff? or a file somewhere? documentation is essential and i'm open to how we document; i just want to start documenting as i'm already getting lost. i can help with the docs.

This is so so great... going to play more.

  4. i just updated the ea-env commit hoping everything would rebuild

FROM earthlab/earth-analytics-python-env:ddf68823b186c4ab5e6d14abee1caa33bea4cda8
this was easy to change. i can't seem to find where nbgitpuller is pulling files from our lesson repo however.

Also when i login again, i don't see the new data. how do i know when things have been rebuilt and should thus appear in the hub?

Does the hub reset a users environment each time they login?

Hey @betatim today a student told me that he has updated earthpy several times. Each time he logs out of the hub it seems to reset, so he needs to update again when he logs in again. I am going to try to recreate this issue, but does this make sense? I suspect it could make sense if, each time the student logs in, the user pod is built fresh from the docker image.

Better experience for adding a new hub

Right now adding a new hub requires several steps and you need to remember to edit several files. See for example #39 and how many commits it took.

A better story that is based around auto discovering what needs deploying from a central configuration and the like would be nice to have.

Preemptible nodes

Try out https://cloud.google.com/preemptible-vms/ and see what happens to the user experience.

They are a lot cheaper than normal nodes. Need to have two node pools, one to pack the hubs into with normal nodes and one with preemptible nodes for user pods.

Needs some kinda monitoring to get insight into how often a node vanishes "mid use".

For AWS users of this type of node anecdotal evidence is that it "never" happens.

Command Line setup in the hub

Ok @betatim two new questions.

  1. Can we setup the hub with git available? or is it available and ready to configure? i'll do some testing today.
  2. What if we want to have multiple sets of files available for students from different hubs? <i'll do some testing on this today as well. >

How to encrypt secrets

Ok @betatim i think i understand this process (i hope) BUT currently i'm not sure how to

  1. encrypt my secret file so it's truly secret. Can we add that to the docs safely OR locally somehow document it?
  2. once i push things to the repo - should the new hub build? or do i have to run things at the command line as well?

Thank you!
Leah

leah's breaking & fixing things again

Hey @betatim
i'm running into some issues with my ea-hub setup.

for one i keep getting this error: 0 this happens when i use the conda tab in jupyter. i suppose we could remove it since terminal works quite well.

[screenshot: 2018-09-04, 8:21:37 pm]

i'm not sure what is causing this. do you have any idea?


I fixed the issue i had as a second issue but now have a new one

ISSUE 3

import earthpy is throwing an error. on the hub tho not locally. i am tracing it back to rasterio and then gdal, i am not sure what the issue is.

  1. takeaway - i think we need a test to ensure packages not only are there but load on travis
  2. i am not sure why gdal is throwing errors which impacts rasterio and then earthpy.
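The first takeaway could be covered by a very small smoke test that tries to import each package inside the built image and reports anything that fails to load. This is a sketch under the assumption that it runs in the same environment the hub uses. The script name and the module list in the usage line are illustrative.

```python
import importlib


def failed_imports(module_names):
    """Try to import each module; return {name: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            # Compiled libraries can fail with errors other than ImportError
            # (for example a missing shared library), so catch broadly.
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    import sys

    # Example: python check_imports.py earthpy rasterio
    problems = failed_imports(sys.argv[1:])
    for name, error in problems.items():
        print(f"{name}: {error}")
    if problems:
        sys.exit(1)
```

A check like this would have flagged the gdal/rasterio failure at build time, although it would not by itself explain why the import breaks on the hub and not locally.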

Here is the error: [screenshot: 2018-09-04, 8:32:59 pm]
i can't recreate it locally. i reverted to an older docker image and i'm good for now, but i still need to update things.

ISSUE 4

nbgitpuller is giving me problems. all i can say is when i try to add a repo it just seems to spin and spin and spin even tho it builds quickly and passes. what am i doing wrong? i may just remove it for tomorrow!

How do i create a custom whitelist for GitHub OAuth?

Hey @betatim so i'm getting closer to understanding authentication. but i don't understand now --

how do i create a custom whitelist of users - ie github usernames - to allow into a course hub? i see how i allow an organization to authenticate.
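As a rough illustration only: Zero to JupyterHub chart releases from around this time exposed user and admin lists under an `auth` section (`auth.whitelist.users`, `auth.admin.users`), while newer releases configure this differently. The sketch below builds such a fragment from a list of GitHub usernames, assuming the settings are nested under a `jupyterhub` key as in this repo's hub charts. Both helper names are hypothetical.

```python
import json


def normalize_usernames(names):
    """Strip whitespace, lowercase, drop blanks and duplicates, keep order."""
    seen = []
    for raw in names:
        name = raw.strip().lower()
        if name and name not in seen:
            seen.append(name)
    return seen


def github_whitelist_values(users, admins=()):
    """Build a values fragment that limits a GitHub-authenticated hub to users.

    Assumption: key names follow the older `auth` section of the chart.
    """
    auth = {"type": "github", "whitelist": {"users": normalize_usernames(users)}}
    if admins:
        auth["admin"] = {"users": normalize_usernames(admins)}
    return {"jupyterhub": {"auth": auth}}


if __name__ == "__main__":
    values = github_whitelist_values(
        ["lwasser", "jlpalomino ", "betatim"], admins=["lwasser", "betatim"]
    )
    # JSON is also valid YAML, so the output can be merged into values.yaml.
    print(json.dumps(values, indent=2))
```

Normalizing the names first guards against a stray space or capital letter silently locking someone out.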

Removing a Hub - questions

ok @betatim here i go again.
I'm reading the removing a hub section.

At the end of a workshop or semester you should consider removing a hub again. While a hub scales down to use minimal resources when no one is logged in, it does use some resources (like disk space) that will only be reclaimed once the hub has been turned off.

I'm unclear about

  1. when is a hub using resources vs not using resources? IE is it possible to have a hub ready to go but that is not deployed / using resources? Like the WSHUB. we want to use it for the workshop, then let them download what they'd like. Then shut it down completely, but be prepared to launch it again in the future with perhaps new users in it!
  2. While a hub scales down to use minimal resources when no one is logged in, - i'd like to understand how this works and what minimal resources means in dollars and also computer power. there may be a section on this but if not can we kindly add it?
  3. This section is a bit confusing. I suspect there is a bunch of setup that i need to have in order to be connected to google cloud and such here. i don't have any of this setup. i've been playing with google cloud using the google terminal in the cloud. do i have to have this: https://earthlab-hub-ops.readthedocs.io/en/latest/initial-setup.html stuff setup first? or can you help me understand this section?
The second step is to uninstall the helm release to shutdown
your hub. You will need :code:`kubectl` and :code:`helm` installed and configured
on your local machine to perform this step.

To check for the installation

One way to check this is to
run :code:`kubectl get pods --namespace=<hubname>`. This should show that there are
two pods running::

do we still need nbzip and nbgitpuller?

Updating the ea-hub, and wanting to know if we still need nbzip and nbgitpuller. I don't have these installed in the nbgrader hub we used for the bootcamp test. We can always add them back later - just trying for a simple-as-possible initial build.

There are student pods available for students we've removed

Looking at the control panel, i noticed some student pods for students who do not exist in our whitelist anymore. what is the process for cleaning out and removing a pod? Are those pods still available and should we delete them via admin or the cloud terminal?

in the example below anne - windage i believe is from last year. @jlpalomino should be able to confirm. yoji i suppose still is in the program so we could keep him??

[screenshot: 2019-08-30, 3:24:13 PM]

Things that are not working in the bootcamp-hub

ok @betatim here is where i've gotten today with the hub!

It works but ONLY with organization based authentication. Can you have a look at the hub and help me understand how to

  1. add a whitelist that actually works to the hub. what i have does not work.
  2. add admins to the hub that can shutdown servers. i actually ended up killing pods at the command line a few times. it would be great to have admin access to this hub.

then please see the other issues i posted with questions on managing a hub... so we can perhaps update docs accordingly. Thank you!!
