ribosomeprofiling / riboflow
Pipeline for Ribosome Profiling Data
License: MIT License
General File Information:
info
format version 1
reference appris-v1
min read length 28
max read length 32
left span 35
right span 10
transcript count 19822
has.metadata TRUE
metagene radius 50
has.alias FALSE
Dataset Information:
experiment total.reads coverage rna.seq metadata
GSM1606107 0 TRUE TRUE TRUE
GSM1606108 0 TRUE TRUE TRUE
I am wondering if this is somehow related to #28
Test config file is attached
project_nodedup.yaml.zip
In Ubuntu 20.04 (and maybe in other distributions) we see this error when running bowtie2.
error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory
Following the solution here, we figured out that installing libtbb-dev
solves this issue. On Ubuntu 20.04, running the following does the installation:
sudo apt-get install libtbb-dev
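Before or after installing the package, it can help to confirm whether the dynamic linker finds a TBB library at all. The following is a minimal sketch using only the Python standard library. The helper name `find_shared_library` is made up for illustration, and a hit for some `libtbb` version does not guarantee that the exact `libtbb.so.2` requested by bowtie2 is present.

```python
from ctypes.util import find_library
from typing import Optional


def find_shared_library(name: str) -> Optional[str]:
    """Return what the system linker resolves for `name`, or None if nothing is found."""
    return find_library(name)


if __name__ == "__main__":
    resolved = find_shared_library("tbb")
    if resolved is None:
        print("No TBB library found; installing libtbb-dev may help.")
    else:
        print(f"Found a TBB library: {resolved}")
```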
Hello,
I tried running the pipeline on RiboSeq+RNAseq data, and it ran until almost the end when it failed:
Error executing process > 'put_rnaseq_into_ribo (1)'
Caused by:
Process `put_rnaseq_into_ribo (1)` terminated with an error exit status (1)
Command executed:
ribopy rnaseq set -n KO -a KO.merged.pre_dedup.bed -f bed --force KO.ribo
Command exit status:
1
Command output:
(empty)
Command error:
+ ribopy rnaseq set -n KO -a KO.merged.pre_dedup.bed -f bed --force KO.ribo
Traceback (most recent call last):
File "/miniconda3/bin/ribopy", line 10, in <module>
sys.exit(cli())
File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
[9d/b03848] Submitted process > put_rnaseq_into_ribo (2)
File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/miniconda3/lib/python3.6/site-packages/ribopy/cli/rnaseq.py", line 54, in set
force = force)
File "/miniconda3/lib/python3.6/site-packages/ribopy/core/verify.py", line 104, in cli_func_wrapper
return func(*args, **kwargs)
File "/miniconda3/lib/python3.6/site-packages/ribopy/rnaseq.py", line 229, in set_rnaseq_wrapper
with h5py.File(ribo_file, "r+") as ribo_handle:
File "/miniconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 394, in __init__
swmr=swmr)
File "/miniconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 172, in make_fid
fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 85, in h5py.h5f.open
OSError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')
Work dir:
/nfs/users2/enovoa/imilenkovic/software/riboflow/riboflow/work/45/f654e3edc0b937986d14d450917fcf
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
Pipeline RiboFlow completed!
Started at 2021-03-24T00:07:38.048+01:00
Finished at 2021-03-24T01:25:25.332+01:00
Time elapsed: 1h 17m 47s
Execution status: failed
WARN: Killing pending tasks (1)
Do you maybe have an idea what went wrong? Thank you!
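The traceback above ends with HDF5 failing to lock the ribo file (errno 11, "Resource temporarily unavailable"). The report does not establish the cause. If the lock is only held briefly by another process, a retry wrapper along the following lines is one possible workaround. This is a sketch, not part of ribopy: the name `retry_on_lock` and its parameters are hypothetical, and matching on the message text is an assumption based on the error string shown in the log.

```python
import errno
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def _is_lock_error(exc: OSError) -> bool:
    """Heuristic: EAGAIN, or the HDF5 message seen in the traceback above."""
    return exc.errno == errno.EAGAIN or "unable to lock file" in str(exc)


def retry_on_lock(open_fn: Callable[[], T], attempts: int = 5, delay: float = 1.0) -> T:
    """Call open_fn, retrying while it fails with a file-lock error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return open_fn()
        except OSError as exc:
            # Re-raise anything that is not a lock error, or the final failure.
            if not _is_lock_error(exc) or attempt == attempts:
                raise
            time.sleep(delay)
    raise RuntimeError("unreachable")
```

With h5py installed, the call could look like `retry_on_lock(lambda: h5py.File("KO.ribo", "r+"))`. HDF5 also reads the environment variable `HDF5_USE_FILE_LOCKING`; setting it to `FALSE` disables locking, which is only safe when nothing else writes to the file at the same time.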
Hi @hakanozadam ,
I tried doing a fresh install of Riboflow umi_devel on Mozart but ran into errors running the example data. I am running via conda.
It looks like one issue is that the FastQC output files are not found for the process transcriptome_aligned_individual_fastqc:
nextflow RiboFlow.groovy -params-file project.yaml
N E X T F L O W ~ version 19.04.1
Launching `RiboFlow.groovy` [marvelous_easley] - revision: 1eb5b20f2c
[warm up] executor > local
[skipping] Stored process > clip (4)
[skipping] Stored process > clip (3)
[skipping] Stored process > clip (2)
[skipping] Stored process > clip (1)
[skipping] Stored process > extract_umi_via_umi_tools (1)
[skipping] Stored process > extract_umi_via_umi_tools (3)
[skipping] Stored process > extract_umi_via_umi_tools (2)
[skipping] Stored process > extract_umi_via_umi_tools (4)
[skipping] Stored process > filter (1)
[skipping] Stored process > filter (2)
[skipping] Stored process > filter (3)
[skipping] Stored process > filter (4)
[skipping] Stored process > transcriptome_alignment (2)
[skipping] Stored process > transcriptome_alignment (1)
[skipping] Stored process > transcriptome_alignment (3)
[skipping] Stored process > transcriptome_alignment (4)
[skipping] Stored process > merge_transcriptome_alignment (1)
[skipping] Stored process > merge_transcriptome_alignment (2)
executor > local (21)
[47/2cad5f] process > clipped_fastqc [100%] 4 of 4 ✔
[c5/fbec5c] process > raw_fastqc [100%] 4 of 4 ✔
[cd/3bddd9] process > transcriptome_unaligned_individual_fastqc [100%] 4 of 4, failed: 1
[47/c9d300] process > write_fastq_correspondence [100%] 1 of 1 ✔
[7b/40c88b] process > transcriptome_aligned_individual_fastqc [100%] 8 of 8, failed: 8
[92/0631c4] NOTE: Missing output file(s) `GSM1606108.1.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (1)` -- Execution is retried (1)
[a4/308e18] NOTE: Missing output file(s) `GSM1606107.1.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (2)` -- Execution is retried (1)
[78/401fec] NOTE: Missing output file(s) `GSM1606108.2.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (3)` -- Execution is retried (1)
[e5/a6ed71] NOTE: Missing output file(s) `GSM1606107.2.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (4)` -- Execution is retried (1)
WARN: Killing pending tasks (4)
ERROR ~ Error executing process > 'transcriptome_aligned_individual_fastqc (1)'
Caused by:
Missing output file(s) `GSM1606108.1.transcriptome.aligned_fastqc.html` expected by process `transcriptome_aligned_individual_fastqc (1)`
Command executed:
if [ ! -f GSM1606108.1.transcriptome.aligned.fastq.gz ]; then
ln -s GSM1606108.1.aligned.transcriptome_alignment.fastq.gz GSM1606108.1.transcriptome.aligned.fastq.gz
fi
fastqc GSM1606108.1.transcriptome.aligned.fastq.gz --outdir=$PWD -t 1
Command exit status:
0
Command output:
Analysis complete for GSM1606108.1.transcriptome.aligned.fastq.gz
Command error:
+ '[' '!' -f GSM1606108.1.transcriptome.aligned.fastq.gz ']'
+ ln -s GSM1606108.1.aligned.transcriptome_alignment.fastq.gz GSM1606108.1.transcriptome.aligned.fastq.gz
+ fastqc GSM1606108.1.transcriptome.aligned.fastq.gz --outdir=/home/ihoskins/riboflow_umi/riboflow/work/29/b062e85fb941e26442063b70a3d1b8 -t 1
Started analysis of GSM1606108.1.transcriptome.aligned.fastq.gz
Failed to process file GSM1606108.1.transcriptome.aligned.fastq.gz
java.lang.ArrayIndexOutOfBoundsException: -1
at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.calculateDistribution(SequenceLengthDistribution.java:101)
at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.raisesError(SequenceLengthDistribution.java:190)
at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:178)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
at java.lang.Thread.run(Thread.java:750)
Work dir:
/home/ihoskins/riboflow_umi/riboflow/work/29/b062e85fb941e26442063b70a3d1b8
Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option `-resume`
-- Check '.nextflow.log' file for details
Then, if I set do_fastqc = False, I run into an error during deduplication:
nextflow RiboFlow.groovy -params-file project.yaml
N E X T F L O W ~ version 19.04.1
Launching `RiboFlow.groovy` [insane_watson] - revision: 1eb5b20f2c
[warm up] executor > local
[skipping] Stored process > clip (4)
[skipping] Stored process > clip (3)
[skipping] Stored process > clip (1)
[skipping] Stored process > clip (2)
[skipping] Stored process > extract_umi_via_umi_tools (1)
[skipping] Stored process > extract_umi_via_umi_tools (2)
[skipping] Stored process > extract_umi_via_umi_tools (3)
[skipping] Stored process > extract_umi_via_umi_tools (4)
[skipping] Stored process > filter (2)
[skipping] Stored process > filter (1)
[skipping] Stored process > filter (3)
[skipping] Stored process > filter (4)
[skipping] Stored process > transcriptome_alignment (2)
[skipping] Stored process > transcriptome_alignment (1)
[skipping] Stored process > transcriptome_alignment (3)
[skipping] Stored process > transcriptome_alignment (4)
[skipping] Stored process > merge_transcriptome_alignment (1)
[skipping] Stored process > merge_transcriptome_alignment (2)
executor > local (19)
[01/51ddee] process > write_fastq_correspondence [100%] 1 of 1 ✔
[f8/7325fa] process > quality_filter [100%] 4 of 4 ✔
[b1/af98e0] process > bam_to_bed [100%] 4 of 4 ✔
[8d/2ca045] process > merge_bam_post_qpass [100%] 2 of 2 ✔
[9a/070bfd] process > add_sample_index_col_to_bed [100%] 4 of 4 ✔
[44/491ce9] process > deduplicate_umi_tools [100%] 4 of 4, failed: 4
[da/f6a586] NOTE: Process `deduplicate_umi_tools (2)` terminated with an error exit status (1) -- Execution is retried (1)
[57/252c60] NOTE: Process `deduplicate_umi_tools (1)` terminated with an error exit status (1) -- Execution is retried (1)
WARN: Killing pending tasks (1)
ERROR ~ Error executing process > 'deduplicate_umi_tools (1)'
Caused by:
Process `deduplicate_umi_tools (1)` terminated with an error exit status (1)
Command executed:
umi_tools dedup --read-length -I GSM1606107.merged.pre_dedup.bam --output-stats=GSM1606107.dedup.stats -S GSM1606107.dedup.bam -L GSM1606107.dedup.log
bamToBed -i GSM1606107.dedup.bam > GSM1606107.dedup.bed
Command exit status:
1
Command output:
(empty)
Command error:
+ umi_tools dedup --read-length -I GSM1606107.merged.pre_dedup.bam --output-stats=GSM1606107.dedup.stats -S GSM1606107.dedup.bam -L GSM1606107.dedup.log
Traceback (most recent call last):
File "/home/ihoskins/miniconda3/envs/ribo_umi/bin/umi_tools", line 11, in <module>
sys.exit(main())
File "/home/ihoskins/miniconda3/envs/ribo_umi/lib/python3.6/site-packages/umi_tools/umi_tools.py", line 66, in main
module.main(sys.argv)
File "/home/ihoskins/miniconda3/envs/ribo_umi/lib/python3.6/site-packages/umi_tools/dedup.py", line 312, in main
barcode_getter=bundle_iterator.barcode_getter)
File "/home/ihoskins/miniconda3/envs/ribo_umi/lib/python3.6/site-packages/umi_tools/umi_methods.py", line 187, in __init__
self.fill()
File "/home/ihoskins/miniconda3/envs/ribo_umi/lib/python3.6/site-packages/umi_tools/umi_methods.py", line 220, in fill
self.refill_random()
File "/home/ihoskins/miniconda3/envs/ribo_umi/lib/python3.6/site-packages/umi_tools/umi_methods.py", line 192, in refill_random
list(self.umis.keys()), self.random_fill_size, p=self.prob)
File "mtrand.pyx", line 908, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken
Work dir:
/home/ihoskins/riboflow_umi/riboflow/work/46/1950dfa4a153305c3aca44b9d5a765
Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option `-resume`
-- Check '.nextflow.log' file for details
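Both failures above (the FastQC `ArrayIndexOutOfBoundsException` and the umi_tools `'a' cannot be empty` error) are the kind of error a tool may raise on an input with no reads. The report does not confirm that this is the cause here, but it is cheap to rule out. A minimal sketch for counting records in a gzip-compressed FASTQ file follows; the function name `count_fastq_records` is made up for illustration, and it assumes the usual four lines per record.

```python
import gzip


def count_fastq_records(path: str) -> int:
    """Count reads in a gzip-compressed FASTQ file, assuming four lines per record."""
    with gzip.open(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4
```

Running this on a file such as `GSM1606108.1.transcriptome.aligned.fastq.gz` inside the process work directory would show whether the alignment step produced any reads at all.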
Mention the following on README or FAQ:
RiboFlow won't run in 2GB of memory. We recommend providing at least 8GB of memory.
I want to use RiboFlow for a species other than human/mouse, so I will need to provide my own transcriptome. Does the transcriptome need to only have 1 isoform per gene or can RiboFlow handle multiple isoforms per gene?
Thanks!
Running RiboFlow v0.0.1 in conda environment.
I encountered an issue where the ribo file is not created when no metadata is provided, but only in combination with deduplicate = false.
If do_rnaseq = true, failure to create the ribo file leads to the error I recently posted:
ribosomeprofiling/ribopy#15
See attached for the following configs to reproduce the error with the test data.
project_no_metadata_nodedup_norna.yaml.zip
Given the support for deduplication in v0.0.1, I will see if this error still occurs with v0.0.0.
Hello,
Would it be possible to omit the dedup step from the pipeline? I would like to see what my final result would look like without this filtering step.
Thanks a lot!
When I was first installing RiboFlow, I was getting error messages when using the latest version of Nextflow, which implements DSL 2 and has deprecated DSL 1. I had to switch to Nextflow version 20.10.0 in order to run the program.
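A small version check can make this failure mode explicit before launching the pipeline. The sketch below is illustrative only: the function names are hypothetical, and the cutoff `(22, 12, 0)` is an assumption about the first Nextflow release without DSL 1 support, so it should be checked against the Nextflow release notes.

```python
from typing import Tuple

# Assumption: first Nextflow release that no longer runs DSL 1 scripts.
DSL1_REMOVED_IN: Tuple[int, ...] = (22, 12, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '20.10.0' or '22.12.0-edge' into a tuple of integers."""
    core = version.split("-", 1)[0]  # drop a suffix such as '-edge'
    return tuple(int(part) for part in core.split("."))


def supports_dsl1(version: str) -> bool:
    """Return True if this Nextflow version is older than the assumed cutoff."""
    return parse_version(version) < DSL1_REMOVED_IN
```

Nextflow honours the `NXF_VER` environment variable, so running with `NXF_VER=20.10.0` is one way to pin the version mentioned above without replacing a newer installation.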
Hi, is the RiboFlow pipeline suitable for paired-end datasets?
Cheers,
Ranj
Hello,
I would like to run the pipeline on some mouse Ribo-seq data, and I was wondering if you might have a mouse annotation file, such as this one for human: appris_human_24_01_2019_actual_regions.bed ?
Alternatively, could you point me to how I could generate such a file?
Thanks a lot!
Best,
Ivan
There is a typo in the RNA-Seq argument of the bowtie2 reference:
bt2_argumments
Fix this in the next version and keep it backward compatible.
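One common way to rename a config key without breaking existing files is to accept both spellings and prefer the corrected one. RiboFlow itself is a Nextflow/Groovy script, so the Python below only illustrates the lookup order. The corrected key `bt2_arguments` and the function name are assumptions; only the misspelled `bt2_argumments` comes from the report.

```python
from typing import Any, Mapping, Optional

CORRECT_KEY = "bt2_arguments"   # assumed corrected spelling
LEGACY_KEY = "bt2_argumments"   # misspelled key named in the report


def get_bt2_arguments(section: Mapping[str, Any], default: Optional[str] = None) -> Optional[str]:
    """Read bowtie2 arguments, preferring the corrected key over the legacy one."""
    if CORRECT_KEY in section:
        return section[CORRECT_KEY]
    if LEGACY_KEY in section:
        return section[LEGACY_KEY]
    return default
```

Old parameter files keep working because the legacy key is still read, while new files can use the corrected spelling.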
My purpose for using Ribo-Seq is to help define the CDS and the 3' and 5' UTRs for the RNA-Seq and Ribo-Seq data I have. I like the ribo data structure and the implementation of RiboFlow, but the requirement to provide UTR and CDS annotations may be a deal-breaker for me.
I have to ask: is there a way to deactivate that requirement?
RiboFlow looks like a potentially powerful tool, so I downloaded it and tried to run it. It looks like a file is missing, but I'm not sure which one. I have attached the output log.txt of the Nextflow run.
Test RiboFlow using the conda environment on macOS.
For an average user, the output only needs to have
Every other file can go to the "intermediates" folder.
The Docker image is running on the previous version of ribopy.
We need to build a Docker image with the current version of ribopy.
Here are a few thoughts about the project.yaml file to make it more intuitive:
The default range of read lengths in the RiboFlow parameter file could be 16-40 instead of 28-32, as most human data falls in that range.
Provide a few sentences about the clipping arguments for adapter filtering along the lines of:
"You might want to alter the adapter sequence for your data."
Fastq Files:
"To process your own experiment files, specify their names and provide their locations."
Make adapter trimming optional with a flag such as
trim_adapter: True/False
Some sequencing data might already be trimmed. In the current version, one way to get around this is to provide only a quality threshold in the cutadapt parameters.
Check and remove "-@" in samtools idxstats. It is likely to cause an error.
Hello,
When I changed the dedup_method parameter in the project_umi.yaml file to "none" or the deduplicate parameter in the project.yaml file to "false", Riboflow runs without error, but creates no ribo folder, all.ribo file, or experiments folder in the output directory. I ran an older version of Riboflow and the missing files are there.
Thank you!
Transcript lengths can be inferred from annotation or transcriptome reference. Hence a separate transcript length file is redundant. We can add this functionality as a separate command to rftools and incorporate this in the next version of RiboFlow.
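Inferring lengths from the transcriptome reference amounts to summing sequence lines per FASTA record. The sketch below shows the idea in Python; it is not the rftools implementation. The function name is hypothetical, and using the first whitespace-delimited token of the header as the transcript name is an assumption that may not match how RiboFlow names transcripts.

```python
from typing import Dict, Iterable


def transcript_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name to the total length of its sequence lines."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths
```

For a file on disk, the call would be `transcript_lengths(open("transcriptome.fa"))`, where the file name is a placeholder.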
There is no way to specify a mapping quality cutoff for RNA-Seq data, although such a cutoff can be specified for ribosome profiling data.
Adding an argument like the following
rnaseq_mapping_quality_cutoff: 2
and adding the "-q" argument in the filtering step will add this feature.
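To make the proposal concrete, the sketch below builds a `samtools view` command that applies the cutoff only when one is configured. It is an illustration under assumptions: the function name and file names are hypothetical, and the actual filtering step in RiboFlow may pass additional options. The `-b`, `-q` and `-o` options of `samtools view` are used as documented (BAM output, minimum MAPQ, output file).

```python
from typing import List, Optional


def samtools_filter_command(bam_in: str, bam_out: str,
                            mapq_cutoff: Optional[int] = None) -> List[str]:
    """Build a samtools view command, adding -q only when a cutoff is given."""
    cmd = ["samtools", "view", "-b"]
    if mapq_cutoff is not None:
        cmd += ["-q", str(mapq_cutoff)]  # keep reads with MAPQ >= cutoff
    cmd += ["-o", bam_out, bam_in]
    return cmd
```

With `rnaseq_mapping_quality_cutoff: 2` read from the parameter file, the returned list could be passed to `subprocess.run(cmd, check=True)`.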
The current version only works with fastq.gz inputs.
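Until other input formats are supported, uncompressed FASTQ files can be compressed beforehand. A minimal sketch using the Python standard library follows; the function name `gzip_fastq` is made up for illustration, and the command-line `gzip` tool does the same job.

```python
import gzip
import shutil
from typing import Optional


def gzip_fastq(src: str, dest: Optional[str] = None) -> str:
    """Compress a plain FASTQ file to gzip and return the output path."""
    if dest is None:
        dest = src + ".gz"
    # Stream the copy so large files are not loaded into memory.
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

The original file is left in place, so it can be removed once the compressed copy has been checked.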
When running RiboFlow using Docker + Nextflow, the process seems to be getting stuck at the stage "creating the ribo file GSM1606107.ribo...". The docker statistics are below:
Notably, there seems to be almost no CPU usage.
This is being run with the latest versions of Docker and Nextflow as of December 4th, 2021. The OS is macOS Monterey version 12.0.1.
Add a FastQC step for clipped reads.