ajs3nj / nextflow-infra

This project forked from sage-bionetworks-workflows/nextflow-infra

AWS CloudFormation templates for deploying Nextflow infrastructure

Home Page: https://tower.sagebionetworks.org

License: Apache License 2.0

Languages: Python 67.69%, Shell 9.19%, Jinja 23.12%

Nextflow Infrastructure

The AWS infrastructure for hosting a private instance of Nextflow Tower (see link below) and executing Nextflow workflows is defined in this repository and deployed with CloudFormation via Sceptre.

The Nextflow infrastructure has been vetted by Sage IT to process sensitive or controlled-access data (e.g. PHI). Notably, only HIPAA-eligible AWS services are deployed.

Access Tower

Click the link below and log in with your @sagebase.org Google account:

➡️ Nextflow Tower @ Sage Bionetworks ⬅️

Getting Started

Prospective Tower Users

Follow the Tower User Onboarding instructions below. Access is currently restricted to Sage Bionetworks staff. See below for how to get help if you run into any issues.

Prospective Contributors

Read through the contribution guidelines for more information. Contributions are welcome from anyone!

Getting Help

Message us in the #workflow_users Slack channel or email us at nextflow-admins[at]sagebase[dot]org.

Tower User Onboarding

Before you can use Nextflow Tower, you first need to deploy a Tower project, which consists of an encrypted S3 bucket and the IAM resources (i.e. users, roles, and policies) that Tower requires to access the encrypted bucket and execute workflows on AWS Batch. Once these resources exist, they are configured in Nextflow Tower automatically using CI/CD.

  1. Create a stack name by following this naming convention: concatenate a project name with the suffix -project (e.g. imcore-project, amp-ad-project, commonmind-project). Due to limits imposed by Tower, the stack name cannot exceed 32 characters.

    N.B.: Wherever <stack_name> appears below, replace the placeholder with your actual stack name, omitting the angle brackets.
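The naming convention above can be checked with a short sketch. The helper function below is hypothetical and only encodes the two rules stated in this step (the -project suffix and Tower's 32-character limit):

```python
# Hypothetical helper; it only encodes the two rules from step 1:
# the '-project' suffix and Tower's 32-character limit.
def validate_stack_name(stack_name: str) -> bool:
    """Return True if the name ends in '-project' and fits Tower's limit."""
    return stack_name.endswith("-project") and len(stack_name) <= 32

# Examples from step 1:
assert validate_stack_name("imcore-project")
assert validate_stack_name("amp-ad-project")
assert not validate_stack_name("imcore")               # missing '-project' suffix
assert not validate_stack_name("a" * 30 + "-project")  # 38 characters, too long
```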

  2. Create an IT JIRA ticket requesting membership to the following JumpCloud groups for anyone who needs read/write or read-only access to the S3 bucket:

    • aws-sandbox-developers
    • aws-workflow-nextflow-tower-viewer

    To confirm whether you're already a member of these JumpCloud groups, you can expand the AWS Account list on this page (after logging in with JumpCloud) and check if you have Developer listed under org-sagebase-sandbox and TowerViewer under workflows-nextflow-dev and workflows-nextflow-prod.

    AWS SSO Screenshot

  3. Open a pull request on this repository in which you duplicate config/projects/example-project.yaml as <stack_name>.yaml in the same config/projects/ subdirectory and then follow the numbered steps listed in the file. Note that some steps are required whereas others are optional.

    N.B. Here, read/write vs read-only access refers to the level of access granted to users for the encrypted S3 bucket and to the Tower workspace (more details below). Given that access is granted to the entire bucket, you might want to create more specific Tower projects that provide more granular access control.

    Getting Help: If you are unfamiliar with Git/GitHub or don't know how to open a pull request, see above for how to get help.
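As a rough illustration, a project config might look like the following fragment. This is a hypothetical sketch, not the real template: config/projects/example-project.yaml and its numbered steps are authoritative, and only the S3ReadWriteAccessArns, S3ReadOnlyAccessArns, and ScratchLifecycleExpiration parameter names appear elsewhere in this README; the layout, example account ID, and user names are assumptions.

```yaml
# Hypothetical sketch only -- config/projects/example-project.yaml is the
# authoritative template. The account ID and user names below are fake.
parameters:
  S3ReadWriteAccessArns:           # read/write access to the bucket and workspace
    - "arn:aws:iam::123456789012:user/jane.doe"
  S3ReadOnlyAccessArns:            # read-only access
    - "arn:aws:iam::123456789012:user/john.doe"
  ScratchLifecycleExpiration: 180  # assumed unit: days before scratch files expire
```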

  4. Once the pull request is approved and merged, confirm that your PR was deployed successfully. If so, the following happened on your behalf:

    • Two S3 buckets were created (listed below), and users listed under S3ReadWriteAccessArns and S3ReadOnlyAccessArns have read/write and read-only access, respectively. They each serve different purposes:

      • s3://<stack_name>-tower-bucket/: This bucket is intended for archival purposes, i.e. to store files in the long term. It can also be indexed by Synapse by default. Whenever you specify the outdir or publishDir parameters for a workflow, they should generally point to an S3 prefix in this bucket.
      • s3://<stack_name>-tower-scratch/: This bucket is intended to be used as scratch storage, i.e. to store files in the short term. The important difference is that files in this bucket are automatically deleted after 6 months; this delay can be adjusted with the ScratchLifecycleExpiration parameter. This is a convenience feature so users don't have to clean up after themselves while still benefiting from caching when the need arises (presumed here to be generally within 6 months). This bucket cannot be indexed by Synapse. It's ideal for storing the Nextflow work directories (configured on each compute environment by default) and for staging files from Synapse, since those files already exist somewhere else.
    • All users listed under S3ReadWriteAccessArns and S3ReadOnlyAccessArns were added to the Sage Bionetworks organization in Tower.

    • A new Tower workspace called <stack_name> was created under this organization.

    • Users listed under S3ReadWriteAccessArns were added to a workspace team with the Maintain role, which grants the following permissions:

      The users can launch pipelines and modify pipeline executions (e.g. change the pipeline launch compute environment, parameters, pre/post-run scripts, and Nextflow config) and create new pipeline configurations in the Launchpad. The users cannot modify compute environment settings or credentials.

    • Users listed under S3ReadOnlyAccessArns were added to a workspace team with the View role, which grants the following permissions:

      The users can access the team resources in read-only mode.

    • A set of AWS credentials called <stack_name> was added under this Tower workspace.

    • An AWS Batch compute environment called <stack_name> (default) was created using these credentials with a default configuration that should satisfy most use cases.

    N.B. If you have special needs (e.g. more CPUs, on-demand EC2 instances, FSx for Lustre), see above for how to contact the administrators, who can create additional compute environments in your workspace.
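The resources above follow a predictable naming pattern. As a rough sketch, the hypothetical helper below derives the two bucket URIs from a stack name; the naming pattern itself is taken from the bucket list in this step:

```python
# Hypothetical helper; the bucket naming pattern comes from step 4 above.
def project_buckets(stack_name: str) -> dict:
    """Return the two S3 bucket URIs created for a Tower project."""
    return {
        # Long-term/archival storage: point outdir/publishDir here;
        # can be indexed by Synapse.
        "archive": f"s3://{stack_name}-tower-bucket/",
        # Short-term scratch storage for Nextflow work directories and
        # staged files; objects expire (after 6 months by default).
        "scratch": f"s3://{stack_name}-tower-scratch/",
    }

buckets = project_buckets("imcore-project")
print(buckets["archive"])  # s3://imcore-project-tower-bucket/
print(buckets["scratch"])  # s3://imcore-project-tower-scratch/
```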

  5. Log into Nextflow Tower using the link at the top of this README and open your project workspace. If you were listed under S3ReadWriteAccessArns, then you'll be able to add pipelines to your workspace and launch them on your data.

  6. Check out the Getting Started with Nextflow and Tower wiki page for additional instructions on how to develop workflows in Nextflow and deploy/launch them in Tower.

License

This repository is licensed under the Apache License 2.0.

Copyright 2021 Sage Bionetworks

nextflow-infra's People

Contributors: tthyer, thomasyu888, daisyhan97, allaway, adamjtaylor, xschildw, jaybee84, ajs3nj, bwmac, zaro0508

