lmb-nextflow

Configuration files and other information regarding the Nextflow setup at the LMB

Detailed instructions on running pipelines - click here

lmb-nextflow's Issues

Should we add new environment variables?

Should we add the following to the script that sets up Nextflow on the cluster?

Do not fill up your home directory with cache files

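# Keep Nextflow's cache and temporary files off the home directory
mkdir -p "/beegfs3/$USER/NEXTFLOW/NXF_HOME"
mkdir -p "/beegfs3/$USER/NEXTFLOW/NXF_TEMP"

# Export the variables so that the nextflow process inherits them
export NXF_HOME="/beegfs3/$USER/NEXTFLOW/NXF_HOME"
export NXF_TEMP="/beegfs3/$USER/NEXTFLOW/NXF_TEMP"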

Improve genome download script compatibility with Nextflow config file

Make the script automatically generate an output file listing the genomes in a form that can be copied straight into the Nextflow config file (i.e. with the correct indentation, brackets, etc.).
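One way this could work is sketched below. It is not the existing code in download_genomes.py: the function names, the `params { genomes { ... } }` layout and the four-space indent are assumptions, modelled on the `star = '...'` style used in genome_overview.txt.

```python
def format_genome_block(name, attributes, indent="    ", level=2):
    """Return one genome entry formatted for pasting into a Nextflow config."""
    pad = indent * level
    inner = indent * (level + 1)
    lines = [f"{pad}'{name}' {{"]
    for key, value in attributes.items():
        # One "key = 'value'" line per reference file or index folder.
        lines.append(f"{inner}{key} = '{value}'")
    lines.append(f"{pad}}}")
    return "\n".join(lines)


def format_genomes(genomes):
    """Wrap all genome entries in an assumed params { genomes { ... } } scope."""
    body = "\n".join(format_genome_block(n, a) for n, a in genomes.items())
    return "params {\n    genomes {\n" + body + "\n    }\n}\n"
```

Calling `format_genomes({"R64-1-1": {"star": "/path/to/STAR_index/"}})` returns a string that can be written to a file and pasted into the config without re-indenting by hand.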

Correct STAR index listing in genome_overview.txt

The genome download script 'download_genomes.py' needs to give the correct STAR index folder in the output file 'genome_overview.txt'.

e.g. change:
star = '/public/genomics/species_references/nextflow/Genome_References/Ensembl/saccharomyces_cerevisiae/R64-1-1/Release_105/STAR_index/'

to:
star = '/public/genomics/species_references/nextflow/Genome_References/Ensembl/saccharomyces_cerevisiae/R64-1-1/Release_105/STAR_index/saccharomyces_cerevisiae.R64-1-1.dna.105.STAR_index/'
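A minimal sketch of how the script might resolve the correct folder follows. The helper name `resolve_star_index` is hypothetical, and it assumes that each STAR_index folder holds exactly one subfolder whose name ends in `.STAR_index`, as in the example above.

```python
from pathlib import Path


def resolve_star_index(star_root):
    """Return the subfolder of STAR_index/ that holds the index files.

    Assumption: star_root contains exactly one directory whose name
    ends in '.STAR_index'.
    """
    root = Path(star_root)
    candidates = sorted(
        p for p in root.iterdir()
        if p.is_dir() and p.name.endswith(".STAR_index")
    )
    if len(candidates) != 1:
        raise ValueError(
            f"expected one STAR index folder in {root}, found {len(candidates)}"
        )
    # The trailing slash matches the style used in genome_overview.txt.
    return f"{candidates[0]}/"
```

Raising an error when zero or several candidates are found avoids silently writing an ambiguous path to genome_overview.txt.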

Suggestions to improve genome download script

Just finished my rescheduled meeting. The format we agreed is:

The existing species_releases folder will contain one folder per species. When you're ready to start downloading, you'll move the existing ones into old_data, but leave nextflow there until you've rewritten the code.

Inside each species folder, the structure will look like this:

Ensembl
    GRCh38
        Release 103
            BED
            FASTA
            GTF
            INDEXES
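The layout above could be created with a short helper along these lines. This is a sketch only: the function names are hypothetical, and the `Release_103` spelling (with an underscore) is an assumption taken from the existing `Release_105` paths rather than from the agreed format.

```python
from pathlib import Path

# The four subfolders agreed for every release.
SUBFOLDERS = ("BED", "FASTA", "GTF", "INDEXES")


def release_dirs(base, species, database, assembly, release):
    """Map each subfolder name to its path for one species release."""
    root = Path(base) / species / database / assembly / f"Release_{release}"
    return {name: root / name for name in SUBFOLDERS}


def create_release_dirs(base, species, database, assembly, release):
    """Create the folders on disk; safe to call more than once."""
    dirs = release_dirs(base, species, database, assembly, release)
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

For example, `create_release_dirs(base, "homo_sapiens", "Ensembl", "GRCh38", 103)` would give a dictionary whose `"FASTA"` entry points at `homo_sapiens/Ensembl/GRCh38/Release_103/FASTA` under `base`.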

In the FASTA folder you will keep the original file names, but also duplicate the genome file under a standard naming scheme for Nextflow, adding .genome or similar to the name to indicate what the file is. You'll also download the cDNA FASTA file to this folder.
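The duplication step might look like the sketch below. The naming scheme `<species>.<assembly>.<release>.genome.fa` is purely illustrative, since the exact scheme has not been settled, and both function names are hypothetical.

```python
import shutil
from pathlib import Path


def standard_genome_name(species, assembly, release, gzipped=False):
    """Build a standardised file name (hypothetical scheme)."""
    extension = ".fa.gz" if gzipped else ".fa"
    return f"{species}.{assembly}.{release}.genome{extension}"


def duplicate_genome_fasta(original, species, assembly, release):
    """Copy the genome FASTA next to the original under the standard name."""
    original = Path(original)
    gzipped = original.name.endswith(".gz")
    target = original.with_name(
        standard_genome_name(species, assembly, release, gzipped)
    )
    if not target.exists():
        # copy2 keeps the original file in place and preserves timestamps.
        shutil.copy2(original, target)
    return target
```

A symbolic link would save disk space compared with a copy, at the cost of breaking if the original file is ever renamed or moved.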

For indexes, the possibilities will be Bowtie, Bowtie2, Hisat2, STAR (both the version in genomics/soft/bin and the Nextflow version; folder names should indicate which version they were made with), HiCUP, 10X and PARSE. Not all indexes will be made for all species, and perhaps not for all releases. Can your code be release-specific?
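Release-specific behaviour could be expressed as a lookup table with fallbacks, as in this sketch. Every entry in the table, the default list and the version string in the usage note are made-up placeholders, not decisions recorded in this issue.

```python
# Indexes built when nothing more specific is configured (placeholder choice).
DEFAULT_INDEXES = ("Bowtie2", "STAR")

# Overrides keyed by (species, release); release None applies to all releases
# of that species. All entries here are illustrative only.
INDEX_OVERRIDES = {
    ("homo_sapiens", None): ("Bowtie2", "Hisat2", "STAR", "HiCUP"),
    ("homo_sapiens", 103): ("Bowtie2", "Hisat2", "STAR", "HiCUP", "10X"),
}


def indexes_for(species, release):
    """Pick indexes for a release, falling back to species, then defaults."""
    for key in ((species, release), (species, None)):
        if key in INDEX_OVERRIDES:
            return INDEX_OVERRIDES[key]
    return DEFAULT_INDEXES


def index_folder_name(aligner, version):
    """Name an index folder so it records the aligner version used."""
    return f"{aligner}_v{version}"
```

With this table, `indexes_for("homo_sapiens", 103)` includes the 10X index while other human releases do not, and `index_folder_name("STAR", "2.7.10a")` gives `STAR_v2.7.10a`, so two STAR builds can sit side by side in INDEXES.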

It would be good to add the Human T2T assembly as an option as well.

Repository: stevenwingett / lmb-nextflow