ribosomeprofiling / riboflow

Pipeline for Ribosome Profiling Data

License: MIT License

Groovy 96.29% Dockerfile 1.51% Shell 1.10% Nextflow 1.10%

ribosome-profiling-data ribosome ribosome-profiling translation pipeline

riboflow's Issues

.ribo file does not contain counts when deduplicate = false

General File Information:
             info                 
   format version                1
        reference        appris-v1
  min read length               28
  max read length               32
        left span               35
       right span               10
 transcript count            19822
     has.metadata             TRUE
  metagene radius               50
        has.alias            FALSE

Dataset Information:
  experiment total.reads    coverage     rna.seq    metadata
  GSM1606107           0        TRUE        TRUE        TRUE
  GSM1606108           0        TRUE        TRUE        TRUE

I am wondering if this is somehow related to #28

Test config file is attached
project_nodedup.yaml.zip

Documentation Needed on libtbb

In Ubuntu 20.04 (and maybe in other distributions) we see this error when running bowtie2.

error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory

Following the solution here, we figured out that installing libtbb-dev solves this issue. On Ubuntu 20.04, running the following does the installation:

sudo apt-get install libtbb-dev
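
A quick way to check whether the dynamic loader can find the library before launching the pipeline is sketched below. It is only an illustration: the function name `can_load_shared_library` is hypothetical, and it assumes a Linux system where the file is named `libtbb.so.2`, as in the error message above.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # libtbb.so.2 is the file named in the bowtie2 error message.
    if can_load_shared_library("libtbb.so.2"):
        print("libtbb.so.2 found")
    else:
        print("libtbb.so.2 missing; try: sudo apt-get install libtbb-dev")
```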

Error executing process > 'put_rnaseq_into_ribo'

Hello,

I tried running the pipeline on RiboSeq+RNAseq data, and it ran until almost the end when it failed:

Error executing process > 'put_rnaseq_into_ribo (1)'

Caused by:
  Process `put_rnaseq_into_ribo (1)` terminated with an error exit status (1)

Command executed:

  ribopy rnaseq set -n KO -a KO.merged.pre_dedup.bed -f bed --force KO.ribo

Command exit status:
  1

Command output:
  (empty)

Command error:
  + ribopy rnaseq set -n KO -a KO.merged.pre_dedup.bed -f bed --force KO.ribo
  Traceback (most recent call last):
    File "/miniconda3/bin/ribopy", line 10, in <module>
      sys.exit(cli())
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 764, in __call__
      return self.main(*args, **kwargs)
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 717, in main
      rv = self.invoke(ctx)
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
  [9d/b03848] Submitted process > put_rnaseq_into_ribo (2)
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 956, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 555, in invoke
      return callback(*args, **kwargs)
    File "/miniconda3/lib/python3.6/site-packages/ribopy/cli/rnaseq.py", line 54, in set
      force         = force)
    File "/miniconda3/lib/python3.6/site-packages/ribopy/core/verify.py", line 104, in cli_func_wrapper
      return func(*args, **kwargs)
    File "/miniconda3/lib/python3.6/site-packages/ribopy/rnaseq.py", line 229, in set_rnaseq_wrapper
      with h5py.File(ribo_file, "r+") as ribo_handle:
    File "/miniconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 394, in __init__
      swmr=swmr)
    File "/miniconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 172, in make_fid
      fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
    File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
    File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
    File "h5py/h5f.pyx", line 85, in h5py.h5f.open
  OSError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

Work dir:
  /nfs/users2/enovoa/imilenkovic/software/riboflow/riboflow/work/45/f654e3edc0b937986d14d450917fcf

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Pipeline RiboFlow completed!
Started at  2021-03-24T00:07:38.048+01:00
Finished at 2021-03-24T01:25:25.332+01:00
Time elapsed: 1h 17m 47s
Execution status: failed
WARN: Killing pending tasks (1)

Do you maybe have an idea what went wrong? Thank you!
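
One possible cause, not confirmed in this thread, is HDF5 file locking on a network file system: the work directory above is under `/nfs`, and the error says the file could not be locked. HDF5 1.10.1 and newer read the environment variable `HDF5_USE_FILE_LOCKING`, and setting it to `FALSE` disables locking. The sketch below runs a command with that variable set. The helper names are hypothetical, and it assumes `ribopy` is on the PATH and that the command is run from the process work directory. Disabling locking is only safe when no other process writes to the same `.ribo` file at the same time.

```python
import os
import subprocess


def env_without_hdf5_locking(base_env=None):
    """Return a copy of the environment with HDF5 file locking disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["HDF5_USE_FILE_LOCKING"] = "FALSE"
    return env


def run_without_hdf5_locking(command):
    """Run the given command list with locking disabled; return its exit code."""
    return subprocess.run(command, env=env_without_hdf5_locking()).returncode
```

For the failing step above, the call would be `run_without_hdf5_locking(["ribopy", "rnaseq", "set", "-n", "KO", "-a", "KO.merged.pre_dedup.bed", "-f", "bed", "--force", "KO.ribo"])`.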

Errors running example data on umi_devel branch

Hi @hakanozadam ,

I tried doing a fresh install of Riboflow umi_devel on Mozart but ran into errors running the example data. I am running via conda.

Looks like one issue is FASTQC files are not found for the process transcriptome_aligned_individual_fastqc:

nextflow RiboFlow.groovy -params-file project.yaml
N E X T F L O W  ~  version 19.04.1
Launching `RiboFlow.groovy` [marvelous_easley] - revision: 1eb5b20f2c
[warm up] executor > local
[skipping] Stored process > clip (4)
[skipping] Stored process > clip (3)
[skipping] Stored process > clip (2)
[skipping] Stored process > clip (1)
[skipping] Stored process > extract_umi_via_umi_tools (1)
[skipping] Stored process > extract_umi_via_umi_tools (3)
[skipping] Stored process > extract_umi_via_umi_tools (2)
[skipping] Stored process > extract_umi_via_umi_tools (4)
[skipping] Stored process > filter (1)
[skipping] Stored process > filter (2)
[skipping] Stored process > filter (3)
[skipping] Stored process > filter (4)
[skipping] Stored process > transcriptome_alignment (2)
[skipping] Stored process > transcriptome_alignment (1)
[skipping] Stored process > transcriptome_alignment (3)
[skipping] Stored process > transcriptome_alignment (4)
[skipping] Stored process > merge_transcriptome_alignment (1)
[skipping] Stored process > merge_transcriptome_alignment (2)
executor >  local (14)
executor >  local (15)
[47/2cad5f] process > clipped_fastqc                            [100%] 4 of 4 ✔
executor >  local (17)
[47/2cad5f] process > clipped_fastqc                            [100%] 4 of 4 ✔
[c5/fbec5c] process > raw_fastqc                                [100%] 4 of 4 ✔
executor >  local (18)
[47/2cad5f] process > clipped_fastqc                            [100%] 4 of 4 ✔
[c5/fbec5c] process > raw_fastqc                                [100%] 4 of 4 ✔
executor >  local (19)
[47/2cad5f] process > clipped_fastqc                            [100%] 4 of 4 ✔
[c5/fbec5c] process > raw_fastqc                                [100%] 4 of 4 ✔
[2f/590158] process > transcriptome_unaligned_individual_fastqc [ 50%] 2 of 4
executor >  local (21)
[47/2cad5f] process > clipped_fastqc                            [100%] 4 of 4 ✔
[c5/fbec5c] process > raw_fastqc                                [100%] 4 of 4 ✔
[53/15cc5f] process > transcriptome_unaligned_individual_fastqc [ 75%] 3 of 4
[47/c9d300] process > write_fastq_correspondence                [100%] 1 of 1 ✔
[7b/40c88b] process > transcriptome_aligned_individual_fastqc   [ 50%] 4 of 8, failed: 4
[92/0631c4] NOTE: Missing output file(s) `GSM1606108.1.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (1)` -- Execution is retried (1)
[a4/308e18] NOTE: Missing output file(s) `GSM1606107.1.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (2)` -- Execution is retried (1)
executor >  local (21)
[47/2cad5f] process > clipped_fastqc                            [100%] 4 of 4 ✔
[c5/fbec5c] process > raw_fastqc                                [100%] 4 of 4 ✔
[cd/3bddd9] process > transcriptome_unaligned_individual_fastqc [100%] 4 of 4, failed: 1
[47/c9d300] process > write_fastq_correspondence                [100%] 1 of 1 ✔
[7b/40c88b] process > transcriptome_aligned_individual_fastqc   [100%] 8 of 8, failed: 8
[92/0631c4] NOTE: Missing output file(s) `GSM1606108.1.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (1)` -- Execution is retried (1)
[a4/308e18] NOTE: Missing output file(s) `GSM1606107.1.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (2)` -- Execution is retried (1)
[78/401fec] NOTE: Missing output file(s) `GSM1606108.2.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (3)` -- Execution
executor >  local (21)
[47/2cad5f] process > clipped_fastqc                            [100%] 4 of 4 ✔
[c5/fbec5c] process > raw_fastqc                                [100%] 4 of 4 ✔
[cd/3bddd9] process > transcriptome_unaligned_individual_fastqc [100%] 4 of 4, failed: 1
[47/c9d300] process > write_fastq_correspondence                [100%] 1 of 1 ✔
[7b/40c88b] process > transcriptome_aligned_individual_fastqc   [100%] 8 of 8, failed: 8
[92/0631c4] NOTE: Missing output file(s) `GSM1606108.1.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (1)` -- Execution is retried (1)
[a4/308e18] NOTE: Missing output file(s) `GSM1606107.1.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (2)` -- Execution is retried (1)
[78/401fec] NOTE: Missing output file(s) `GSM1606108.2.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (3)` -- Execution is retried (1)
[e5/a6ed71] NOTE: Missing output file(s) `GSM1606107.2.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (4)` -- Execution is retried (1)
WARN: Killing pending tasks (4)
ERROR ~ Error executing process > 'transcriptome_aligned_individual_fastqc (1)'

Caused by:
  Missing output file(s) `GSM1606108.1.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (1)`

Command executed:

  if [ ! -f GSM1606108.1.transcriptome.aligned.fastq.gz ]; then
     ln -s GSM1606108.1.aligned.transcriptome_alignment.fastq.gz GSM1606108.1.transcriptome.aligned.fastq.gz
  fi
  fastqc GSM1606108.1.transcriptome.aligned.fastq.gz --outdir=$PWD -t 1

Command exit status:
  0

Command output:
  Analysis complete for GSM1606108.1.transcriptome.aligned.fastq.gz

Command error:
  + '[' '!' -f GSM1606108.1.transcriptome.aligned.fastq.gz ']'
  + ln -s GSM1606108.1.aligned.transcriptome_alignment.fastq.gz GSM1606108.1.transcriptome.aligned.fastq.gz
  + fastqc GSM1606108.1.transcriptome.aligned.fastq.gz --outdir=/home/ihoskins/riboflow_umi/riboflow/work/29/b062e85fb941e26442063b70a3d1b8 -t 1
  Started analysis of GSM1606108.1.transcriptome.aligned.fastq.gz
  Failed to process file GSM1606108.1.transcriptome.aligned.fastq.gz
  java.lang.ArrayIndexOutOfBoundsException: -1
  	at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.calculateDistribution(SequenceLengthDistribution.java:101)
  	at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.raisesError(SequenceLengthDistribution.java:190)
  	at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
  	at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
  	at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:178)
  	at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
  	at java.lang.Thread.run(Thread.java:750)

Work dir:
  /home/ihoskins/riboflow_umi/riboflow/work/29/b062e85fb941e26442063b70a3d1b8

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option `-resume`

 -- Check '.nextflow.log' file for details
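
An `ArrayIndexOutOfBoundsException: -1` in FastQC's `SequenceLengthDistribution` module can occur when the input contains no reads, although nothing in the log above confirms that this is the cause here. A minimal sketch for counting the records in a gzipped FASTQ file before passing it to FastQC follows. `count_fastq_records` is a hypothetical helper, and it assumes exactly four lines per record.

```python
import gzip


def count_fastq_records(path: str) -> int:
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4
```

Calling `count_fastq_records("GSM1606108.1.transcriptome.aligned.fastq.gz")` in the work directory would show whether the aligned FASTQ is empty.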

Then, if I set do_fastqc = False, I run into an error during deduplication:

nextflow RiboFlow.groovy -params-file project.yaml
N E X T F L O W  ~  version 19.04.1
Launching `RiboFlow.groovy` [insane_watson] - revision: 1eb5b20f2c
[warm up] executor > local
[skipping] Stored process > clip (4)
[skipping] Stored process > clip (3)
[skipping] Stored process > clip (1)
[skipping] Stored process > clip (2)
[skipping] Stored process > extract_umi_via_umi_tools (1)
[skipping] Stored process > extract_umi_via_umi_tools (2)
[skipping] Stored process > extract_umi_via_umi_tools (3)
[skipping] Stored process > extract_umi_via_umi_tools (4)
[skipping] Stored process > filter (2)
[skipping] Stored process > filter (1)
[skipping] Stored process > filter (3)
[skipping] Stored process > filter (4)
[skipping] Stored process > transcriptome_alignment (2)
[skipping] Stored process > transcriptome_alignment (1)
[skipping] Stored process > transcriptome_alignment (3)
[skipping] Stored process > transcriptome_alignment (4)
[skipping] Stored process > merge_transcriptome_alignment (1)
[skipping] Stored process > merge_transcriptome_alignment (2)
executor >  local (19)
[01/51ddee] process > write_fastq_correspondence  [100%] 1 of 1 ✔
[f8/7325fa] process > quality_filter              [100%] 4 of 4 ✔
[b1/af98e0] process > bam_to_bed                  [100%] 4 of 4 ✔
[8d/2ca045] process > merge_bam_post_qpass        [100%] 2 of 2 ✔
[9a/070bfd] process > add_sample_index_col_to_bed [100%] 4 of 4 ✔
[46/1950df] process > deduplicate_umi_tools       [ 50%] 2 of 4, failed: 2
executor >  local (19)
[01/51ddee] process > write_fastq_correspondence  [100%] 1 of 1 ✔
[f8/7325fa] process > quality_filter              [100%] 4 of 4 ✔
[b1/af98e0] process > bam_to_bed                  [100%] 4 of 4 ✔
[8d/2ca045] process > merge_bam_post_qpass        [100%] 2 of 2 ✔
[9a/070bfd] process > add_sample_index_col_to_bed [100%] 4 of 4 ✔
[44/491ce9] process > deduplicate_umi_tools       [100%] 4 of 4, failed: 4
[da/f6a586] NOTE: Process `deduplicate_umi_tools (2)` terminated with an error exit status (1) -- Execution is retried (1)
executor >  local (19)
[01/51ddee] process > write_fastq_correspondence  [100%] 1 of 1 ✔
[f8/7325fa] process > quality_filter              [100%] 4 of 4 ✔
[b1/af98e0] process > bam_to_bed                  [100%] 4 of 4 ✔
[8d/2ca045] process > merge_bam_post_qpass        [100%] 2 of 2 ✔
[9a/070bfd] process > add_sample_index_col_to_bed [100%] 4 of 4 ✔
[44/491ce9] process > deduplicate_umi_tools       [100%] 4 of 4, failed: 4
[da/f6a586] NOTE: Process `deduplicate_umi_tools (2)` terminated with an error exit status (1) -- Execution is retried (1)
[57/252c60] NOTE: Process `deduplicate_umi_tools (1)` terminated with an error exit status (1) -- Execution is retried (1)
WARN: Killing pending tasks (1)
ERROR ~ Error executing process > 'deduplicate_umi_tools (1)'

Caused by:
  Process `deduplicate_umi_tools (1)` terminated with an error exit status (1)

Command executed:

  umi_tools dedup --read-length               -I GSM1606107.merged.pre_dedup.bam --output-stats=GSM1606107.dedup.stats -S GSM1606107.dedup.bam -L GSM1606107.dedup.log
  
  bamToBed -i GSM1606107.dedup.bam > GSM1606107.dedup.bed

Command exit status:
  1

Command output:
  (empty)

Command error:
  + umi_tools dedup --read-length -I GSM1606107.merged.pre_dedup.bam --output-stats=GSM1606107.dedup.stats -S GSM1606107.dedup.bam -L GSM1606107.dedup.log
  Traceback (most recent call last):
    File "/home/ihoskins/miniconda3/envs/ribo_umi/bin/umi_tools", line 11, in <module>
      sys.exit(main())
    File "/home/ihoskins/miniconda3/envs/ribo_umi/lib/python3.6/site-packages/umi_tools/umi_tools.py", line 66, in main
      module.main(sys.argv)
    File "/home/ihoskins/miniconda3/envs/ribo_umi/lib/python3.6/site-packages/umi_tools/dedup.py", line 312, in main
      barcode_getter=bundle_iterator.barcode_getter)
    File "/home/ihoskins/miniconda3/envs/ribo_umi/lib/python3.6/site-packages/umi_tools/umi_methods.py", line 187, in __init__
      self.fill()
    File "/home/ihoskins/miniconda3/envs/ribo_umi/lib/python3.6/site-packages/umi_tools/umi_methods.py", line 220, in fill
      self.refill_random()
    File "/home/ihoskins/miniconda3/envs/ribo_umi/lib/python3.6/site-packages/umi_tools/umi_methods.py", line 192, in refill_random
      list(self.umis.keys()), self.random_fill_size, p=self.prob)
    File "mtrand.pyx", line 908, in numpy.random.mtrand.RandomState.choice
  ValueError: 'a' cannot be empty unless no samples are taken

Work dir:
  /home/ihoskins/riboflow_umi/riboflow/work/46/1950dfa4a153305c3aca44b9d5a765

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option `-resume`

 -- Check '.nextflow.log' file for details
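
The final `ValueError` is raised while `umi_tools` samples from the UMIs it has collected, which suggests (as an unconfirmed reading of the traceback) that no UMIs were found in the input. By default `umi_tools dedup` expects the UMI to be the last `_`-separated field of each read name. The sketch below counts how many read names in BED lines (name in column 4) carry a suffix that looks like a UMI. The helper name and the assumption that UMIs consist only of A, C, G, T and N are illustrative.

```python
import re

# Assumption: the UMI is appended to the read name as "_<bases>".
UMI_SUFFIX = re.compile(r"_[ACGTN]+$")


def count_names_with_umi(bed_lines):
    """Return (records with a UMI-like name suffix, total records)."""
    with_umi = 0
    total = 0
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        total += 1
        if UMI_SUFFIX.search(fields[3]):
            with_umi += 1
    return with_umi, total
```

A result of `(0, 0)` would point to an empty input, while `(0, n)` would point to read names that lost their UMI upstream.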

Docker Settings: Memory Requirements

Mention the following on README or FAQ:

RiboFlow won't run with only 2 GB of memory. We recommend providing at least 8 GB.
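
A simple pre-flight check along these lines could accompany that note. It is a sketch only: `total_memory_bytes` relies on `os.sysconf` names that are available on Linux (for example inside the Docker VM) but not on every platform, it does not account for per-container cgroup limits, and the 8 GB threshold is simply the recommendation stated above.

```python
import os

MIN_MEMORY_GB = 8  # recommendation from the note above


def total_memory_bytes() -> int:
    """Total physical memory visible to this process (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_memory(total_bytes: int, min_gb: int = MIN_MEMORY_GB) -> bool:
    """Return True if total_bytes is at least min_gb gibibytes."""
    return total_bytes >= min_gb * 1024 ** 3
```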

transcriptome question

I want to use RiboFlow for a species other than human/mouse, so I will need to provide my own transcriptome. Does the transcriptome need to only have 1 isoform per gene or can RiboFlow handle multiple isoforms per gene?

Thanks!

.ribo file not created when do_metadata = false and deduplicate = false

Running RiboFlow v0.0.1 in conda environment.

I encountered an issue where the ribo file is not created when metadata is disabled, but only in combination with deduplicate = false.

If do_rnaseq = true, failure to create the ribo file leads to the error I recently posted:
ribosomeprofiling/ribopy#15

See attached for the following configs to reproduce the error with the test data.
project_no_metadata_nodedup_norna.yaml.zip

Given the support for deduplication in v0.0.1, I will see if this error still occurs with v0.0.0.

Omitting dedup from the pipeline

Hello,

Would it be possible to omit the dedup step from the pipeline? I would like to see what my final result would look like without this filtering step.

Thanks a lot!

RiboFlow Compatibility with Nextflow DSL2

When I first installed RiboFlow, I got error messages with the latest version of Nextflow, which implements DSL 2 and has deprecated DSL 1. I had to switch to Nextflow version 20.10.0 in order to run the program.
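
A version guard of the following kind could make this failure easier to diagnose. It is a sketch: it parses the banner format that appears in the logs on this page (`N E X T F L O W  ~  version 19.04.1`) and treats 20.10.0, the version reported to work in this issue, as the highest known-good release. That bound is an assumption, not a tested compatibility limit.

```python
import re

KNOWN_GOOD = (20, 10, 0)  # version reported to work in this issue


def parse_nextflow_version(banner: str):
    """Extract (major, minor, patch) from a Nextflow banner line, or None."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_known_good(banner: str) -> bool:
    """True if the banner reports a version no newer than KNOWN_GOOD."""
    version = parse_nextflow_version(banner)
    return version is not None and version <= KNOWN_GOOD
```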

Paired END data set

Hi, is the RiboFlow pipeline suitable for paired-end data sets?

Cheers,
Ranj

Mouse annotation

Hello,

I would like to run the pipeline on some mouse Ribo-seq data, and I was wondering if you might have a mouse annotation file, such as this one for human: appris_human_24_01_2019_actual_regions.bed ?
Alternatively, could you point me to how I could generate such a file?

Thanks a lot!

Best,
Ivan

Typo in rna-seq arguments

There is a typo in the RNA-Seq argument for the bowtie2 reference:

bt2_argumments

Fix this in the next release and keep it backward compatible.
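
Backward compatibility here could mean accepting both spellings and preferring the corrected one. The sketch below shows that lookup on a plain dictionary of parameters. The corrected key name `bt2_arguments` and the helper `get_bt2_arguments` are assumptions, not names taken from the RiboFlow code.

```python
import warnings

CORRECT_KEY = "bt2_arguments"   # assumed corrected spelling
LEGACY_KEY = "bt2_argumments"   # misspelled key reported above


def get_bt2_arguments(params: dict, default: str = "") -> str:
    """Return the bowtie2 arguments, accepting the legacy misspelled key."""
    if CORRECT_KEY in params:
        return params[CORRECT_KEY]
    if LEGACY_KEY in params:
        # Old configuration files keep working but get a deprecation notice.
        warnings.warn(
            f"'{LEGACY_KEY}' is deprecated; use '{CORRECT_KEY}'",
            DeprecationWarning,
        )
        return params[LEGACY_KEY]
    return default
```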

Running Riboflow on a transcriptome without CDS, 3' or 5' UTR defined?

My purpose in using Ribo-Seq is to help define the CDS, 3' UTR and 5' UTR for the RNA-Seq and Ribo-Seq data I have. I like the ribo data structure and the implementation of RiboFlow, but the requirement that the UTR and CDS regions be defined may be a deal-breaker for me.

I have to ask: is there a way to deactivate that requirement?

Bug Error When Attempting to Run Test Data

RiboFlow looks like a potentially powerful tool, so I downloaded it and tried to run it on the test data. It looks like a file is missing, but I'm unsure which one. I have attached the output log.txt of the Nextflow run.

Running the conda environment on MAC

Test RiboFlow using the conda environment on MAC.

Simpler Output

For an average user, the output only needs to have

"all.ribo"
stats folder containing ONLY the "stats.csv" file
fastqc results if do_fastqc flag is set

Every other file can go to the "intermediates" folder.
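
The proposed split can be sketched as a small classification function. The file names `all.ribo` and `stats.csv` come from the list above. The folder name `fastqc` and both helper names are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def is_primary_output(relative_path: str, do_fastqc: bool) -> bool:
    """Decide whether a file stays in the top-level output."""
    path = PurePosixPath(relative_path)
    if path.name == "all.ribo":
        return True
    if path.parts[:1] == ("stats",) and path.name == "stats.csv":
        return True
    # Assumption: FastQC reports live under a folder named "fastqc".
    if do_fastqc and path.parts[:1] == ("fastqc",):
        return True
    return False


def split_outputs(relative_paths, do_fastqc):
    """Return (primary, intermediates) lists of paths."""
    primary = [p for p in relative_paths if is_primary_output(p, do_fastqc)]
    intermediates = [p for p in relative_paths if not is_primary_output(p, do_fastqc)]
    return primary, intermediates
```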

Docker image needs an update

The Docker image still ships the previous version of ribopy.
We need to build a Docker image that contains the current version of ribopy.

Suggestions for project.yaml file

Here are a few thoughts about the project.yaml file to make it more intuitive:

The default range of read lengths in the RiboFlow parameter file could be 16-40 instead of 28-32, as most human data falls in that range.
Provide a few sentences about the clipping arguments for adapter filtering along the lines of:
"You might want to alter the adapter sequence for your data."
Fastq Files:
"To process your own experiment files, specify their names and provide their locations."

Optional Adapter Trimming

Make adapter trimming optional with a flag such as

trim_adapter: True/False

Some sequencing data might already be trimmed. In the current version, one workaround is to provide only a quality threshold in the cutadapt parameters.
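
The effect of such a flag can be sketched as follows. The flag name `trim_adapter` comes from the proposal above, and `-a` and `-q` are cutadapt's 3' adapter and quality-cutoff options. The helper `build_cutadapt_args` is hypothetical and does not reflect how RiboFlow currently assembles its cutadapt command.

```python
def build_cutadapt_args(trim_adapter: bool, adapter: str, quality_cutoff: int):
    """Build cutadapt options; the adapter option is added only when requested."""
    # Quality trimming is always applied.
    args = ["-q", str(quality_cutoff)]
    if trim_adapter:
        # 3' adapter removal is skipped for data that is already trimmed.
        args += ["-a", adapter]
    return args
```

With `trim_adapter` set to `False`, only the quality threshold is passed, which matches the workaround described above.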

Samtools idxstats

Check and remove "-@" in samtools idxstats. It is likely to cause an error.

Ribo file not created when deduplication parameter is set to none or false

Hello,

When I changed the dedup_method parameter in the project_umi.yaml file to "none" or the deduplicate parameter in the project.yaml file to "false", Riboflow runs without error, but creates no ribo folder, all.ribo file, or experiments folder in the output directory. I ran an older version of Riboflow and the missing files are there.

Thank you!

This is being run with the latest versions of Docker and Nextflow as of December 4th, 2021. The OS is macOS Monterey version 12.0.1.

FASTQC on clipped Data

Add a FastQC step for clipped reads.
