computational-plant-science / plantit Goto Github PK

License: BSD 3-Clause "New" or "Revised" License

Shell 1.09% Python 24.65% HTML 0.06% JavaScript 5.26% Dockerfile 0.11% Vue 68.15% Sass 0.65% PLpgSQL 0.02% PowerShell 0.01%

hpc science-gateways phenotyping phenomics singularity

plantit's Introduction

Contents

About
Development

About

PlantIT is a framework for deploying research apps to high-performance/-throughput clusters. Specifically, it's a science gateway for image-based plant phenotyping. Future work may generalize the platform for any domain.

Status

PlantIT debuted (with pre-release v0.1.0) at NAPPN 2022 (Feb. 22-25). See Releases for changelog and Roadmap for planned features and fixes. Capabilities will likely continue to evolve for some time, with "official" releases following an eventually forthcoming publication.

Motivation

High-throughput phenotyping is resource-intensive and often demands virtual computing resources. This presents deployment challenges related to packaging and portability, and raises barriers to entry. Research software should:

be highly configurable when necessary
automate deployment details where possible
let users (and developers) focus on the problem domain

Overview

PlantIT aims to bridge two user groups: researchers and developers. Of course one may wear both hats. The idea is an open-source conveyor belt for scientific software: make it easier to 1) package and share research applications, and 2) deploy them to clusters.

PlantIT is just glue between version control, data storage, container engine, and cluster scheduler. To publish an app, containerize it (e.g., write a Dockerfile) and add a plantit.yaml file to the GitHub repository. Then run it from the browser with a few clicks.

Development

Read on if you're interested in contributing to plantit or hosting your own instance somewhere.

Requirements

The following are required to develop or deploy plantit in a Unix environment:

Docker
npm

Installation

First, clone the repository:

git clone https://github.com/Computational-Plant-Science/plantit.git

Setting up a development environment

To set up a new (or restore a clean) development environment, run scripts/bootstrap.sh from the project root (you may need to use chmod +x first). You can use the -n option to disable the Docker build cache. This command will:

Stop and remove project containers and networks
If an .env file (to configure environment variables) does not exist, generate one with default values
Build the Vue front end
Build Docker images
Run migrations

Then bring everything up with docker-compose -f docker-compose.dev.yml up (-d for detached mode).

This will start a number of containers:

plantit: Django web application (http://localhost:3000)
postgres: PostgreSQL database
celery: Celery prefork worker
celerye: Celery eventlet worker
flower: Flower web UI for Celery (http://localhost:5555)
redis: Redis instance (caching, Celery message broker)
sandbox: Ubuntu test environment

The Django admin interface is at http://localhost:3000/admin/. To use it, you'll need to log into the site at least once (this will create a Django account for you), then shell into the plantit container, run ./manage.py shell, and update your profile with staff/superuser privileges. For instance:

from django.contrib.auth.models import User
user = User.objects.get(username=<your CyVerse username>)
user.is_staff = True
user.is_superuser = True
user.save()

You can also run ./scripts/configure-superuser.sh -u <your CyVerse username> to accomplish the same thing.

Note that the bootstrap script will not clear migrations. To restore to a totally clean database state, you will need to remove all *.py files from the plantit/plantit/migrations directory (except for __init__.py).

Running tests

Once the containers are up, tests can be run with docker-compose -f docker-compose.dev.yml exec plantit ./manage.py test.

Reverse proxying with `ngrok`

To test remote job submissions using a local development version of the plantit web application, you will need a service of some kind to accept and forward job completion signals to your machine's localhost. One convenient tool is ngrok. After downloading and adding the ngrok executable to your path, run ngrok http 3000 to start a tunnel, then set the DJANGO_API_URL variable in .env to the URL ngrok reports it's listening on.

Deploying to production

In production configuration, NGINX serves static assets and reverse-proxies Django via Gunicorn (both in the same container).

To configure PlantIT for deployment, first clone the repo, then, from the root directory, run:

chmod +x /scripts/deploy.sh
./scripts/deploy.sh <configuration ('rc' or 'prod')> <host IP or FQDN> <admin email address>

This script is idempotent and may safely be triggered to run by e.g., a CI/CD server. This will:

Bring containers down
Fetch the latest version of the project
Pull the latest versions of Docker containers
Build the Vue front end
Collect static files
Configure NGINX (replace localhost in config/ngnix/conf.d/local.conf with the host's IP or FQDN, configured via environment variable)
Update environment variables (disable debugging, enable SSL and secure cookies, etc)
Bring containers up
Run migrations

At this point the following containers should be running:

nginx: NGINX server (reverse proxy)
plantit: Django web application behind Gunicorn (http://localhost:80)
postgres: PostgreSQL database
celery: Celery background worker
redis: Redis instance

SSL Certificates

PlantIT uses Let's Encrypt and Certbot for SSL certification. The production configuration includes a certbot container which can be used to request new or renew existing certificates from Let's Encrypt. Standard certificates last 90 days.

In production the certbot container is configured by default to automatically renew certs when necessary:

certbot:
  image: certbot/certbot
  volumes:
    - ./config/certbot/conf:/etc/letsencrypt/
    - ./config/certbot/www:/var/www/certbot
  entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew; sleep 24h & wait $${!}; done;'"

To manually request a new certificate, run:

docker-compose -f docker-compose.prod.yml run certbot

To renew an existing certificate, use the renew command, then restart all containers:

docker-compose -f docker-compose.prod.yml run certbot renew
docker-compose -f docker-compose.prod.yml restart

Use the --dry-run flag with any command to test without writing anything to disk.

Configuring environment variables

Docker will read environment variables in the following format from a file named .env in the project root directory (if the file exists):

key=value
key=value
...

bootstrap.sh will generate an .env file like the following if one does not exist:

VITE_TITLE=plantit
MAPBOX_TOKEN=<your Mapbox token>
MAPBOX_FEATURE_REFRESH_MINUTES=60
CYVERSE_REDIRECT_URL=http://localhost:3000/apis/v1/idp/cyverse_handle_temporary_code/
CYVERSE_CLIENT_ID=<your cyverse client id>
CYVERSE_CLIENT_SECRET=<your cyverse client secret>
CVVERSE_USERNAME=<your cyverse username>
CYVERSE_PASSWORD=<your cyverse password>
CYVERSE_TOKEN_REFRESH_MINUTES=60
NODE_ENV=development
DJANGO_SETTINGS_MODULE=plantit.settings
DJANGO_SECRET_KEY=<your django secret key>
DJANGO_DEBUG=True
DJANGO_API_URL=http://plantit:3000/apis/v1/
DJANGO_SECURE_SSL_REDIRECT=False
DJANGO_SESSION_COOKIE_SECURE=False
DJANGO_CSRF_COOKIE_SECURE=False
DJANGO_ALLOWED_HOSTS=*
DJANGO_ADMIN_USERNAME=<your django admin username>
DJANGO_ADMIN_PASSWORD=<your django admin password>
DJANGO_ADMIN_EMAIL=<your django admin email>
CELERY_EVENTLET_QUEUE=eventlet
USERS_CACHE=/code/users.json
USERS_REFRESH_MINUTES=60
USERS_STATS_REFRESH_MINUTES=10
STATS_WINDOW_WIDTH_DAYS=30
MORE_USERS=/code/more_users.json
AGENT_KEYS=/code/agent_keys
WORKFLOWS_CACHE=/code/workflows.json
WORKFLOWS_REFRESH_MINUTES=60
TASKS_LOGS=/code/logs
TASKS_TIMEOUT_MULTIPLIER=2
TASKS_STEP_TIME_LIMIT_SECONDS=20
LAUNCHER_SCRIPT_NAME=launch
INPUTS_FILE_NAME=inputs.txt
ICOMMANDS_IMAGE=computationalplantscience/icommands
SQL_ENGINE=django.db.backends.postgresql
SQL_HOST=postgres
SQL_PORT=5432
SQL_NAME=postgres
SQL_USER=postgres
SQL_PASSWORD=<your database password>
GITHUB_AUTH_URI=https://github.com/login/oauth/authorize
GITHUB_REDIRECT_URI=http://localhost:3000/apis/v1/users/github_handle_temporary_code/
GITHUB_SECRET=<your github secret>
GITHUB_CLIENT_ID=<your github client ID>
DOCKER_USERNAME=<your docker username>
DOCKER_PASSWORD=<your docker password>
NO_PREVIEW_THUMBNAIL=/code/plantit/front_end/src/assets/no_preview_thumbnail.png
AWS_ACCESS_KEY=<your AWS access key>
AWS_SECRET_KEY=<your AWS secret key>
AWS_REGION=<your AWS region>
AWS_FEEDBACK_ARN=<your AWS feedback ARN>
AGENTS_HEALTHCHECKS_MINUTES=5
AGENTS_HEALTHCHECKS_SAVED=12
TUTORIALS_FILE=/code/tutorials.pdf
FEEDBACK_FILE=/code/feedback.pdf
CELERY_AUTH=user:password
HTTP_TIMEOUT=15
CURL_IMAGE=curlimages/curl
GH_USERNAME=<your github username>
FIND_STRANDED_TASKS=True

Note that the following environment variables must be supplied manually:

MAPBOX_TOKEN
CYVERSE_CLIENT_ID
CYVERSE_CLIENT_SECRET
CVVERSE_USERNAME
CYVERSE_PASSWORD
GITHUB_CLIENT_ID
GITHUB_SECRET
AWS_ACCESS_KEY
AWS_SECRET_KEY
AWS_REGION
AWS_FEEDBACK_ARN

Several others will be auto-generated by scripts/bootstrap.sh in a clean install directory:

DJANGO_ADMIN_PASSWORD
DJANGO_SECRET_KEY
SQL_PASSWORD

Some variables must be reconfigured for production environments (scripts/deploy will automatically do so):

NODE_ENV should be set to production
DJANGO_DEBUG should be set to False
DJANGO_SECURE_SSL_REDIRECT should be set to True
DJANGO_API_URL should point to the host's IP or FQDN

Configuring deployment targets

An agent is an abstraction of a computing resource, such as a cluster or supercomputer. plantit interacts with agents via key-authenticated SSH and requires the SLURM scheduler to be installed. (Support for additional schedulers is in development.)

Deployment targets may be configured programmatically or with the Django admin interface. To configure an agent via the Django admin site, make sure you're logged into plantit, then navigate to http://localhost:3000/admin/ (https://<host>/admin/ in production). Select the Agents tab on the left side of the screen, then Add Agent.

On many clusters it is customary to configure dependencies on a per-user basis with a module system, e.g. module load <some software>. The pre_commands agent property is the place for commands like these: when provided, they will be prepended to all commands plantit sends to the cluster for job orchestration.

Agent requirements

plantit deployment targets must run some Linux distribution with either the sh or bash shells available. Only 2 dependencies are required:

SLURM
Singularity

plantit tasks expect standard SLURM commands (e.g., sbatch, scancel) to be available. Singularity must also be installed and available on the $PATH.

Authenticating with Docker

Docker Hub applies rate limits to unauthenticated users. These are easy to meet or exceed, since Singularity queries the Docker API on each singularity exec docker://<some container>. It is recommended to use singularity remote login --username <your Docker username> docker://docker.io with a paid Docker account: this will cache your Docker credentials on the deployment target for Singularity to use thereafter.

Building the documentation

To build the sphinx documentation locally, use:

docker run -v $(pwd):/opt/dev -w /opt/dev computationalplantscience/plantit sphinx-build -b html docs docs_output

Testing DIRT migrations

The DIRT migration feature allows users of the original DIRT web application to migrate their data to plantit. To test this feature, you will need to have access to the DIRT server and database. The following environment variables must be set:

DIRT_MIGRATION_DATA_DIR: the directory on the DIRT server where DIRT data is stored
DIRT_MIGRATION_HOST: the hostname of the DIRT server
DIRT_MIGRATION_PORT: the SSH port of the DIRT server
DIRT_MIGRATION_USERNAME: the SSH username for the DIRT server
DIRT_MIGRATION_DB_HOST: the hostname of the DIRT database server
DIRT_MIGRATION_DB_PORT: the port of the DIRT database server
DIRT_MIGRATION_DB_USER: the username of the DIRT database user
DIRT_MIGRATION_DB_DATABASE: the name of the DIRT database
DIRT_MIGRATION_DB_PASSWORD: the DIRT database password

An SSH tunnel must also be opened to the DIRT server, as the database is not open to external connections. For instance, to open a tunnel from port 5678 on the DIRT server to port 3306 on a development machine:

ssh -L 3306:localhost:3306 -p <DIRT server SSH port> <your cyverse username>@<DIRT server IP or FQDN>

On some Linux systems it may be necessary to substitute the loopback IP address 127.0.0.1 for localhost

Be sure to set DIRT_MIGRATION_DB_HOST=host.docker.internal to point the Docker containers to the host's loopback/localhost address.

Some extra configuration is necessary for Linux systems to allow containers to access services running on the local host. The docker-compose.dev.yml configuration file configures the plantit, celery, and celerye containers with the extra_hosts option:

  extra_hosts:
    - "host.docker.internal:host-gateway"

This is only necessary on Linux systems. On Mac and Windows, the host.docker.internal hostname is automatically configured. See this post for more information.

plantit's People

Contributors

Stargazers

Watchers

Forkers

chris-schnaufer wpbonelli jieqiuganlanly

plantit's Issues

Restart of a run does not keep run settings

All settings for the run have to be entered again when doing the restart.

Periodically clear runs

7- or 30-day default policy? Should also probably support manual deletion

MIAPPE metadata

Filesystem tree overlay

Input data verification checksums

Corresponding to Computational-Plant-Science/pycyapi#18, PlantIT should obtain checksums for each input file selected for a run, then pass those checksums to the CLI for verification that no files have changed between submission time and download time on the deployment target.

Input selection on flow submision page

Folders that are deeper than the second hierarchy level of the folder structure can not be selected for job submissions.

Flow configuration validation

Disable flow runs if configuration doesn't pass validation
Show users validation status & show authors validation details on flow submission page

This should probably be done on the backend: validation logic can be exposed via an ad-hoc Django endpoint, with re-validation occurring whenever whenever flow details are queried.

Show output directory (w/ download) when run completes

Cache and lazily refresh users

Like we currently do with flows

Periodically crawl and cache public flows

Instead of loading flows on the client (requires a separate query per user), crawl the userbase on the backend and cache public flows. Client should be able to retrieve a full list of public flows with a single request.

Tags for flows

Such as:

"Example" for Hello World and other sample flows
default tags (e.g., "segmentation", "simulation", "3D")
user-defined tags for personal flows

Clicking runs in the file tree does not respond

The file tree on the top only responds to clicks in the first two levels.

Make all run page sections collapsible

Deployment target access policies

These should be maintained on a per-user basis. Users should be able to request access to deployment targets, as well as automatically access those for which they already have an account/allocation (how to handle this?)

Jobs that time out are not correctly updated on the webpage

See Job 6850988 on stampede2 (PlantIT ID: 1605842632). The job hit the wall time limit but shows up as running on the webpage

Only users with github account are shown in the user list

User that do not contribute flows don't necessarily have a github account. These users are not shown on the user page. Distinguishing contributors from users would be a good.

Update: Also new users with linked github account are not shown.

Scheduler timeouts on non-sandbox deployment targets not detected

When walltime is exceeded on cluster deployment targets, run status is not updated and runs never enter the Completed or Failed state.

Some possible options:

server-side polling (server periodically checks cluster scheduler; aborts run if associated job completes before completion status update is received)
cluster-side health checks (CLI emits periodic status updates; server aborts run if completion status update not received within acceptable delay)

Dataset sharing

3 use cases:

~~Share with individual users~~
Make public
~~Publish (with DOI)~~ (moved to #114)

Workflow configuration editing

Allow users to edit plantit.yaml in the browser

E.g., specifying which files to include when uploading outputs back to CyVerse:

S3 data storage integration

Some potential users likely store their data in S3. S3 also provides strong consistency guarantees now (https://aws.amazon.com/s3/consistency/)

Searchable workflows, servers, and users

Allow search for flows and users by name and/or tags (topics).

Eventual support for public MIAPPE metadata would be good as well.

Depends on #74.

Allow flow config files to specify test input files/datasets from the CyVerse Data Commons (optional)

Move navigation from sidebar to dropdown from profile avatar

Quicker access to various resources, a la GitHub:

Docs and Guide pages do not render correctly when dark mode is enabled

Headers and code block text remains black

Visual feedback when downloading files

Currently when files are downloaded from the run page, no visual cues are presented until the download is complete and a popup appears. We should show an alert or notification that a download is in progress.

Since not all browsers can be relied upon to emit an onload event when the download completes, we can't disable the download button until completion, although this would be ideal.

Allow zip download of all run outputs from run page

Write run logs to file (and allow user download)

Failed jobs do not clean up correctly on the cluster side

Failed jobs leave their data on the cluster side causing a disk quota reached error.

Data could be stored on scratch with no quota rather than home.

Slack integration

~~https://github.com/rauchg/slackin~~ permanent Slack share link should be sufficient

Wall time and memory limit

Submission is limited to 10GB memory and 10min wall time. Copy-paste error from the sandbox?

Upgrade plantit and celery images to Python 3.8

Paginate output files on run page

Runs with lots of output files slow down the browser. Allow adjustable pagination.

Fix GitHub repo link on run page

GitHub links are dead on run page

User defined names for runs

It is hard to identify a particular run as a user. Would it be possible to give user-defined names when creating the run?

Make CyVerse Data Store input/output optional

As a togglable alternative to direct upload/download. Depends on Computational-Plant-Science/pycyapi#21

Move to ASGI/Uvicorn/async Django views

Image thumbnails

On flow submission page and run page

Wrap SSH connections with retry logic

...to address transient faults (e.g., intermittent failure to recognize SSH key on Stampede2)

Allow single-file input selection

Upload/download/delete actions in data tree

Log user out if CyVerse JWT is expired

Broken GitHub link displayed on user page when user hasn't linked their GitHub account

Retrieve and show output files on run page

Detect and show output files on run page as they are produced (allow download as well). Show thumbnails for image files. Also detect if path no longer exists on deployment target (e.g., due to automatic cleanup).

SLURM GPU support

Add an option to enable GPU on supported cluster deployment targets.

Allow flow config files to specify default output paths (optional)

E.g., the Arabidopsis Rosette Analysis flow writes output files to a directory output in the working directory. Users should not need to know this; we should specify it in the flow config, then PlantIT should autofill the value on the submission page.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.