
rapt's Introduction

Read Assembly and Annotation Pipeline Tool (RAPT)

RAPT is an NCBI pipeline designed for de novo assembly and annotation of short genomic sequencing reads obtained from bacterial or archaeal isolates. It takes an SRA run or a FASTA or FASTQ file of Illumina reads as input and produces an assembled and annotated genome of quality comparable to RefSeq in a couple of hours. RAPT consists of three major components: the genome assembler SKESA, the taxonomic assignment tool ANI, and the Prokaryotic Genome Annotation Pipeline (PGAP).

With RAPT you will:

  • assemble your reads into contigs
  • assign a scientific name to the assembly
  • predict coding and non-coding genes de novo, including anti-microbial resistance (AMR) genes and virulence factors, based on expert-curated data such as hidden Markov models and conserved domain architectures
  • estimate the completeness and contamination level of the annotated assembly

If you are new to RAPT, please visit our wiki page for detailed information, and watch a short webinar.

RAPT

To use the latest version, download the RAPT command-line interface with the following commands:

~$ curl -sSLo rapt.tar.gz https://github.com/ncbi/rapt/releases/download/v0.5.5/rapt-v0.5.5.tar.gz
~$ tar -xzf rapt.tar.gz && rm -f rapt.tar.gz

There should now be two scripts in your directory, run_rapt_gcp.sh and run_rapt.py, corresponding to two variations of RAPT: Google Cloud Platform (GCP) RAPT and Stand-alone RAPT. GCP RAPT is designed to run on GCP and is for users with GCP accounts (please note that this is different from a Gmail account), while Stand-alone RAPT can run in any computing environment that meets a few prerequisites.

For instructions on running RAPT, please go to their respective documentation pages: GCP RAPT or Stand-alone RAPT.

rapt's People

Contributors

lwagnerdc, ncbidave, slottad, techshine2018


rapt's Issues

RAPT running problem

When I run RAPT with the following command:

./run_rapt.py -a SRS10489852

I got the error in verbose.log:
Found input: SRA run accession SRS10489852
Reference data presents and is intact, skip downloading ...
SRA connection failed for accession SRS10489852:...

When I run the recommended test command

./run_rapt.py -a SRR3496277

it succeeds and I see the assembly result.

I have tried several other SRA accession numbers. I got the same problem. Any comment or help will be appreciated.
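
One thing worth checking, offered as a guess rather than a diagnosis: in SRA accession naming, run accessions begin with SRR, ERR, or DRR, while the SRS prefix denotes a sample, and the README describes the input as an SRA run. The sketch below classifies an accession by its prefix before it is passed to -a. The function name and category labels are illustrative and are not part of RAPT; matching is case-insensitive here, which may or may not match what RAPT itself accepts.

```python
import re

# SRA accession layout: first letter is the archive (S = NCBI, E = EBI,
# D = DDBJ), second is always R, third is the record type.
_KINDS = {"R": "run", "S": "sample", "X": "experiment", "P": "study"}
_PATTERN = re.compile(r"^[SED]R([RSXP])\d+$", re.IGNORECASE)


def classify_sra_accession(accession: str) -> str:
    """Return 'run', 'sample', 'experiment', 'study', or 'unknown'.

    Hypothetical helper for illustration; RAPT does not provide it.
    """
    match = _PATTERN.match(accession.strip())
    if match is None:
        return "unknown"
    return _KINDS[match.group(1).upper()]


def is_run_accession(accession: str) -> bool:
    """True only for run-level accessions (SRR/ERR/DRR)."""
    return classify_sra_accession(accession) == "run"
```

With this helper, SRS10489852 is classified as a sample, while SRR3496277 is classified as a run.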

Thanks for contacting us with your error log. Sorry you've again encountered a problem with this.

I see in NCBI's web logs that your first step (whose logging output you've given above) succeeded in reaching our web host.

So the network access verification we have in place as part of RAPT is the next place to check and for us to fix.

We tested on Linux hosts running Docker; it looks like there are some differences in how networking is handled by macOS running Docker. To help us understand what's happening, could you share the version of Docker and the OS version that you are running?

Thank you for your reply. I'm running Docker 18.09.6 under Ubuntu 16.04. You say I'm successfully reaching your web host, but the error log contains the message "problem with tcp access to ncbi.gov sdl host".

Originally posted by @mglgc in #8 (comment)

Test suite fails when running PGAP

Hi all,

I'm trying to run RAPT on an HPC system running CentOS 8 with Singularity version 3.5.3 (Docker is not an option for me). The first 5 tests pass, but the 6th test fails.

This is the output I get in concise.log for that test:

[2021-02-17 15:47:54] Starting test #6: Contigs with matching SRA run and taxonomy information in submol file
[2021-02-17 15:47:54] No submol yaml provided. Try to make one from acxn ERR3938362
[2021-02-17 15:47:59] Acquired genus_species=Mycoplasma feriruminatoris
[2021-02-17 15:59:01] Pipeline pgap failed on FASTA file /dkm_output_dir/ERR3938362_sk_out.fa and submol file /dkm_output_dir/.rapt_scratch/ERR3938362_Mycoplasma_feriruminatoris.yaml
[2021-02-17 15:59:01] **FAILED**


[2021-02-17 15:59:01] Test cycle complete, total tests run: 6, total succeeded: 5.
[2021-02-17 15:59:01] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2021-01-11.build5132&uuid=fddd345e-7160-11eb-83d6-1866daee9238&evt=test_end
[2021-02-17 15:59:06] status=21
[2021-02-17 15:59:06] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2021-01-11.build5132&uuid=fddd345e-7160-11eb-83d6-1866daee9238&evt=rapt_exit

I'm really not sure why this is failing, is there any other information I can provide?
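
As a reading aid for logs like the one above, here is a minimal sketch that parses the bracketed timestamps and reports the longest pause between consecutive entries. It assumes only the `[YYYY-MM-DD HH:MM:SS] message` layout visible in the excerpt; the function names are made up for illustration and nothing here comes from RAPT itself.

```python
import re
from datetime import datetime

# Matches lines such as "[2021-02-17 15:47:54] Starting test #6: ..."
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")

SAMPLE = """\
[2021-02-17 15:47:54] Starting test #6: Contigs with matching SRA run and taxonomy information in submol file
[2021-02-17 15:47:54] No submol yaml provided. Try to make one from acxn ERR3938362
[2021-02-17 15:47:59] Acquired genus_species=Mycoplasma feriruminatoris
[2021-02-17 15:59:01] Pipeline pgap failed on FASTA file /dkm_output_dir/ERR3938362_sk_out.fa and submol file /dkm_output_dir/.rapt_scratch/ERR3938362_Mycoplasma_feriruminatoris.yaml
[2021-02-17 15:59:01] **FAILED**
"""


def parse_log(text):
    """Return a list of (datetime, message) for every timestamped line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((stamp, match.group(2)))
    return entries


def longest_gap(entries):
    """Return (seconds, message logged just before the longest pause)."""
    best = (0.0, None)
    for (t0, msg0), (t1, _msg1) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, msg0)
    return best
```

Applied to `SAMPLE`, the longest pause is 662 seconds, between the "Acquired genus_species" line and the "Pipeline pgap failed" line, so the excerpt itself carries no detail about what happened during those eleven minutes.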

How to know if RAPT is running?

I've had success running RAPT before, but not today. How can I tell whether it's doing anything?

./run_rapt.py -a srr3496277
RAPT is now running, it may take a long time to finish. To see the progress, track the verbose log file /Users/rcohen/rapt/raptout_c7d9dee3bd/verbose.log.

No directory with that name is created, and there is subsequently no output (logs or otherwise).

Trying:

./run_rapt.py --test       
RAPT is now running, it may take a long time to finish. To see the progress, track the verbose log file /Users/rcohen/rapt/raptout_436562b3bd/verbose.log.

Again, no directory with that name is created, and there is no output (logs or otherwise).

rcohen@iMac-Pro rapt % ./run_rapt.py -v
rcohen@iMac-Pro rapt % 

And '-v' doesn't print anything either.

Thanks.
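
The check described in this report (does the output directory exist, and has anything been written to verbose.log?) can be scripted. The following is a small sketch under the assumption that the path printed by run_rapt.py is where the log would appear; `log_status` is a hypothetical helper, not something RAPT ships.

```python
from pathlib import Path


def log_status(log_path):
    """Summarize the state of a log file path.

    Returns one of the fixed strings below, or the last non-empty
    line of the log when the file has content.
    """
    path = Path(log_path)
    if not path.parent.is_dir():
        return "output directory missing"
    if not path.is_file():
        return "log file missing"
    # errors="replace" avoids crashing on stray non-UTF-8 bytes in a log.
    lines = [ln for ln in path.read_text(errors="replace").splitlines() if ln.strip()]
    return lines[-1] if lines else "log file empty"
```

Calling `log_status` with the verbose.log path from the message above would return "output directory missing" in the situation described, which distinguishes "never started" from "started but silent".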

rapt offline mode

In ticket #10 there is a reference to running rapt in offline mode. We also have a need to do this.
I've managed to modify the python script so it uses a local rapt.sif image via singularity. lsof confirms it's using the local image, but the last statement when running the --test says:
[2023-04-21 16:44:04] Running functional tests...
There is no further activity after that, as it can't access the internet.
Using an actual fastq file for input, the program appears to run as expected, but after completing with DONE, it seems to hang at "Error sending metrics to NCBI ..." without ever returning a prompt.
When using a real it tries to download more data from amazon.

Summary: Could the instructions on running locally be made publicly available? The compute nodes on the cluster have no or limited internet access (the same, I suspect, as for the user in #10), so an offline option is the only way we can use the tool.
Thanks.

run annotation only

Hi there,

I am wondering whether RAPT can be run for annotation only. For example, we already have an assembled sequence and only want to annotate it.

Thanks.

George

RAPT running failure

The ./run_rapt.py --test command produces the error messages below, taken from the verbose.log file.
Could the developers offer any hints on how to fix it? Thank you in advance.

[2020-11-21 00:34:08] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2020-09-24.build4894&uuid=43cbaa1a-2b91-11eb-ada1-901b0e950242&evt=rapt_start&
[2020-11-21 00:34:48] rapt-29571188

[2020-11-21 00:34:48] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2020-09-24.build4894&uuid=43cbaa1a-2b91-11eb-ada1-901b0e950242&evt=test_start
[2020-11-21 00:35:29] Running functional tests...
[2020-11-21 00:35:30] problem with tcp access to ncbi.gov sdl host

[2020-11-21 00:35:30] port closed

[2020-11-21 00:35:30] SRA connection check failed with code 1, abort..
[2020-11-21 00:35:30] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2020-09-24.build4894&uuid=43cbaa1a-2b91-11eb-ada1-901b0e950242&evt=test_end
[2020-11-21 00:36:10] status=1
[2020-11-21 00:36:10] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2020-09-24.build4894&uuid=43cbaa1a-2b91-11eb-ada1-901b0e950242&evt=rapt_exit

Question about ANI tax report

This might be a slightly silly question, but I have a question about the ANI report and I can't seem to find a description anywhere. In the output (see the example below) there are three numeric values at the start of every row. The first value is the ANI, but it is unclear to me what the two numbers in brackets indicate.
Example:
ANI report for assembly: skesa_out.fa
Submitted organism: Paenibacillus xylanexedens (taxid = 528191, rank = species, lineage = Bacteria; Bacillota; Bacilli; Bacillales; Paenibacillaceae; Paenibacillus)
Predicted organism: Paenibacillus xylanexedens (taxid = 528191, rank = species, lineage = Bacteria; Bacillota; Bacilli; Bacillales; Paenibacillaceae; Paenibacillus)
Submitted organism has type: Yes
Status: CONFIRMED
Confidence: HIGH
98.007 (90.3 93.7) 8202498 assembly 2794858 Paenibacillus amylolyticus (GCA_004001025.1, ASM400102v1)
96.104 (80.9 85.4) 26148418 assembly 3983832 Paenibacillus xylanexedens (GCA_017874615.1, ASM1787461v1)
83.763 (48.2 48.5) 2788808 assembly 23349 Paenibacillus pabuli NBRC 13638 (GCA_001514495.1, ASM151449v1)
83.703 (46.2 48.1) 2311378 assembly 23858 Paenibacillus xylanivorans (GCA_001280595.1, ASM128059v1)
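
For working with these rows programmatically, here is a minimal parsing sketch. It is based only on the four rows shown above. Because the meaning of the bracketed pair and of the two integer columns is not stated anywhere on this page, the fields are given deliberately neutral names (`bracket_1`, `bracket_2`, `int_1`, `int_2`); these names are placeholders, not NCBI terminology.

```python
import re
from typing import NamedTuple, Optional

ROW_RE = re.compile(
    r"^(\d+(?:\.\d+)?) "             # ANI value
    r"\((\d+(?:\.\d+)?) (\d+(?:\.\d+)?)\) "  # bracketed pair, meaning not given here
    r"(\d+) (\S+) (\d+) "            # integer, a word such as 'assembly', integer
    r"(.+) \((GC[AF]_\S+), (\S+)\)$"  # organism, accession, assembly name
)


class AniRow(NamedTuple):
    ani: float
    bracket_1: float
    bracket_2: float
    int_1: int
    kind: str
    int_2: int
    organism: str
    accession: str
    assembly_name: str


def parse_ani_row(line: str) -> Optional[AniRow]:
    """Parse one result row; return None for header or non-matching lines."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    g = match.groups()
    return AniRow(float(g[0]), float(g[1]), float(g[2]), int(g[3]),
                  g[4], int(g[5]), g[6], g[7], g[8])
```

Header lines such as "Status: CONFIRMED" do not match the pattern and yield `None`, so the whole report can be passed through line by line.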

fungi reads

Dear RAPT developers,
The documentation explicitly says that RAPT handles bacterial or archaeal reads, but I wonder whether fungal reads (even though they are eukaryotic) can also be processed. Thank you in advance for the reply.

Can't download data

I ran an SRA sequence and am attempting to download the data for that sequence, but it comes up with a "404 page not found" error. How can I fix this issue?

PGAP LTP issue and --auto-correct-tax

Hello,
I assembled a genome with SPAdes and then ran CheckM on it. CheckM showed it as Rhizobiales with 99.6% completeness, but ANI identified it as Alteromonas.

I thought of trying RAPT in one go on a local workstation.

./run_rapt.py -q R1.fq.gz,R2.fq.gz --organism "Alteromonas" --strain "AK250" -o RAFT_testout --auto-correct-tax -c 64

The number of contigs and the N50 are the same as in the SPAdes assembly. However, the ANI report shows the following:

Submitted organism: Alteromonas (taxid = 226, rank = genus, lineage = Bacteria; Pseudomonadota; Gammaproteobacteria; Alteromonadales; Alteromonadaceae; Alteromonas/Salinimonas group)
Predicted organism: Martelella lutilitoris (taxid = 2583532, rank = species, lineage = Bacteria; Pseudomonadota; Alphaproteobacteria; Hyphomicrobiales; Aurantimonadaceae; Martelella)

Also, --auto-correct-tax did not replace Alteromonas with Martelella in the final output.

The completeness is also low, at 49.6%.

Another question: how do I provide a locus tag prefix (LTP) for PGAP?

Error: no LTP specified, locus tag prefix 'pgaptmp' will be used

I would like to provide a desired prefix.

Thanks in advance.

Pipeline taxcheck failed with code 1 (Example SRA)

When running the example SRA file (I'm using google cloud) I get the error, "Pipeline taxcheck failed with code 1."

[2022-09-21 18:08:04] Starting taxcheck pipeline ...
[2022-09-21 18:10:05] Retrying taxcheck pipeline with genus_species = Mycoplasma pirum...
[2022-09-21 18:12:06] Retrying taxcheck pipeline with genus_species = Mycoplasma...
[2022-09-21 18:14:07] Pipeline taxcheck failed with code 1.

Attached are the output files. Any help would be very welcome!

dffc933e00_run (1).log
dffc933e00_output (1).tar.gz

EDIT: I forgot to mention that I'm also having trouble running the FASTA file of the example SRA run. This time SKESA fails with code 1.

[2022-09-21 18:56:01] Start to download input-2022-04-14.build6021.prod.ani.tgz
[2022-09-21 18:56:01] Start to download input-2022-04-14.build6021.prod.tgz
[2022-09-21 18:56:31] Error: skesa failed with code 1

Attached are the output files.

34242a7082_run.log
34242a7082_output.tar.gz

Paired end reads

How do I feed paired-end reads into RAPT? It's not clear from the documentation how to do this.

Thanks.
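
Another report on this page ("PGAP LTP issue and --auto-correct-tax") passes two paired-end files to -q as a single comma-separated value. Assuming that usage is valid, which this page does not confirm, the argument list could be assembled as in the sketch below. Only flags that appear in that report are used (-q, --organism, --strain, -o, -c); `build_rapt_args` is a hypothetical helper and the script path is an assumption.

```python
def build_rapt_args(read1, read2, organism=None, strain=None,
                    outdir=None, cores=None, script="./run_rapt.py"):
    """Build an argument list for a paired-end RAPT run.

    Flags mirror the command quoted in another report on this page;
    they are not checked against RAPT's own documentation.
    """
    args = [script, "-q", f"{read1},{read2}"]  # comma-separated pair, no space
    if organism is not None:
        args += ["--organism", organism]
    if strain is not None:
        args += ["--strain", strain]
    if outdir is not None:
        args += ["-o", outdir]
    if cores is not None:
        args += ["-c", str(cores)]
    return args
```

The resulting list can be handed to `subprocess.run(args, check=True)`; passing a list rather than a shell string avoids quoting problems with organism names that contain spaces.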

Failure for RAPT testing mode

The ./run_rapt.py --test command produces the error messages below, taken from the verbose.log file. My institutional firewall has HTTPS port 443 open for the following servers:
www.ncbi.nlm.nih.gov
locate.ncbi.nlm.nih.gov
sra-download.ncbi.nlm.nih.gov
Could the developers provide additional information to help fix it? Thank you in advance.


[2021-02-10 23:11:19] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2020-09-24.build4894&uuid=47b31f2e-6bf5-11eb-82ab-901b0e950242&evt=rapt_start&
[2021-02-10 23:11:59] rapt-29571188

[2021-02-10 23:11:59] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2020-09-24.build4894&uuid=47b31f2e-6bf5-11eb-82ab-901b0e950242&evt=test_start
[2021-02-10 23:12:40] Running functional tests...
[2021-02-10 23:12:41] problem with tcp access to ncbi.gov sdl host

[2021-02-10 23:12:41] port closed

[2021-02-10 23:12:41] SRA connection check failed with code 1, abort..
[2021-02-10 23:12:41] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2020-09-24.build4894&uuid=47b31f2e-6bf5-11eb-82ab-901b0e950242&evt=test_end
[2021-02-10 23:13:21] status=1
[2021-02-10 23:13:21] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2020-09-24.build4894&uuid=47b31f2e-6bf5-11eb-82ab-901b0e950242&evt=rapt_exit
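
To narrow down a "port closed" message like the one above, a plain TCP probe of the three hosts named in this report can help separate firewall problems from RAPT-specific ones. This is a sketch, not RAPT's own connection check: the log does not say which host or port that check actually tests, so port 443 and the host list are taken from the report and may not match. Since RAPT runs in a container, running the probe inside the container would be more representative than running it on the host.

```python
import socket

# Hosts listed in the report above; whether RAPT's check uses these is an assumption.
HOSTS = [
    "www.ncbi.nlm.nih.gov",
    "locate.ncbi.nlm.nih.gov",
    "sra-download.ncbi.nlm.nih.gov",
]


def tcp_reachable(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_hosts(hosts=HOSTS, port=443, probe=tcp_reachable):
    """Map each host to the result of probing it on the given port."""
    return {host: probe(host, port) for host in hosts}
```

Calling `check_hosts()` returns a dictionary of host names to booleans. A `False` for any entry points at DNS, routing, or firewall configuration rather than at RAPT.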

RAPT Running Problem

I have Docker installed, and whenever I run the command "python3 ./run_rapt.py --test" I get an error that says "docker: Error response from daemon: manifest for ncbi/rapt:v0.3.0 not found: manifest unknown." How should I fix this error?

skesa failed with code -9

I got this error when I ran RAPT on paired-end Illumina FASTQ files.

**Error: skesa failed with code -9
[2022-07-08 20:56:34] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2022-04-14.build6021&uuid=3b0d8616-ff00-11ec-810c-501fc65b9c25&evt=skesa_failed&rcode=-9
[2022-07-08 20:56:34] Usage metrics sent to NCBI
[2022-07-08 20:56:36] Download blob input-2022-04-14.build6021.prod.ani.tgz aborted
[2022-07-08 20:56:37] Download blob input-2022-04-14.build6021.prod.tgz aborted
[2022-07-08 20:56:37] status=5
[2022-07-08 20:56:37] Sending PINGER url https://www.ncbi.nlm.nih.gov/stat?ncbi_app=raptdocker&version=2022-04-14.build6021&uuid=3b0d8616-ff00-11ec-810c-501fc65b9c25&evt=rapt_exit
[2022-07-08 20:56:37] Usage metrics sent to NCBI
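
A note on reading the code itself: if RAPT reports this value in the convention of Python's subprocess module (an assumption; the log does not say), a negative code -N means the child process was terminated by signal N, and signal 9 is SIGKILL on POSIX systems. A SIGKILL during assembly is often, though not always, the kernel's out-of-memory killer. The sketch below decodes such codes; `describe_returncode` is an illustrative name.

```python
import signal


def describe_returncode(code):
    """Describe a subprocess-style return code in words.

    Follows the subprocess convention: 0 is success, positive values
    are exit statuses, negative values are signal numbers.
    """
    if code == 0:
        return "exited normally"
    if code > 0:
        return f"exited with status {code}"
    try:
        name = signal.Signals(-code).name
    except ValueError:  # not a signal number known on this platform
        name = "unknown signal"
    return f"terminated by signal {-code} ({name})"
```

On Linux, `describe_returncode(-9)` names SIGKILL, while the code 1 seen in other reports on this page would be an ordinary non-zero exit status.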
