
ebi-metagenomics / emg-viral-pipeline


VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies

License: Apache License 2.0

Languages: Dockerfile 4.44% · Nextflow 24.69% · Ruby 2.09% · Python 34.88% · R 2.61% · Shell 8.34% · Common Workflow Language 22.95%
Topics: nextflow, viruses, cwl, workflow, pipeline


emg-viral-pipeline's Issues

error running the pipeline with Nextflow version 20.10

I get the following error when running the pipeline with the Nextflow docker option:

Error executing process > 'preprocess:rename (1)'
...
Command error:
Command 'ps' required by nextflow to collect task metrics cannot be found

I have checked, and the relevant Docker image is indeed missing the ps command:

$ docker run --rm microbiomeinformatics/emg-viral-pipeline-python3:v1 ps
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: exec: "ps": executable file not found in $PATH: unknown.

--mahmut
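
For reference, a hedged sketch of a local workaround until the image is fixed upstream, assuming the image is Debian/Ubuntu-based: build a derived image that adds procps (which provides ps); the pipeline's container setting would then need to reference the new tag.

# Confirm that ps is really missing inside the image.
docker run --rm microbiomeinformatics/emg-viral-pipeline-python3:v1 sh -c 'command -v ps || echo "ps missing"'
# Build a derived image that installs procps (assumes an apt-based base image).
cat > Dockerfile.ps <<'EOF'
FROM microbiomeinformatics/emg-viral-pipeline-python3:v1
RUN apt-get update && apt-get install -y --no-install-recommends procps && rm -rf /var/lib/apt/lists/*
EOF
docker build -t emg-viral-pipeline-python3:v1-ps -f Dockerfile.ps .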

Unable to locate package docker-ce-cli

Command: sudo apt-get install -y docker-ce-cli containerd.io

I got this:
E: Unable to locate package docker-ce-cli
E: Unable to locate package containerd.io
E: Couldn't find any package by glob 'containerd.io'
E: Couldn't find any package by regex 'containerd.io'
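
This usually means the Docker apt repository has not been added yet. A hedged sketch of the standard setup per the Docker documentation (Ubuntu shown; adjust the distribution path and codename for other systems):

# Add Docker's official GPG key and apt repository, then install the packages.
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
    | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce-cli containerd.io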

Availability of species specific HMMs

Hey :)

We would love to use the VIRify species-specific HMMs to analyze our collections of contigs. We saw the scripts here to subset these from the complete ViPhOGs. Since you already generated these files, could you provide the ViPhOG_viral_specific_df.pkl file on GitHub, or a tabular listing of the species-specific HMMs?

Thanks in advance!
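
In the meantime, a hedged sketch of subsetting the full ViPhOG HMM database with HMMER's hmmfetch, assuming you have a one-name-per-line list of the species-specific model IDs (the list file and database file names below are placeholders):

hmmfetch --index vpHMM_database.hmm    # build the SSI index once
hmmfetch -f vpHMM_database.hmm species_specific_ids.txt > viphogs_species_specific.hmm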

Update PPR-Meta install

Hi,

I'm trying to install and run the pipeline using Singularity on an HPC. The install fails during the PPR-Meta installation step, giving me the following error:

Error executing process > 'download_pprmeta:pprmetaGet'

Caused by:
  Process `download_pprmeta:pprmetaGet` terminated with an error exit status (128)

Command executed:

  git clone https://github.com/Stormrider935/PPR-Meta.git
  mv PPR-Meta/* .  
  rm -r PPR-Meta

Command exit status:
  128

Command output:
  Cloning into 'PPR-Meta'...

Command error:
  fatal: unable to look up current user in the passwd file: no such user
  Unexpected end of command stream

A quick look at GitHub shows that the current git repository for PPR-Meta is https://github.com/mult1fractal/PPR-Meta, not Stormrider935.

Are there any possible workarounds you would suggest? Thanks a lot for the pipeline!
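
A hedged workaround sketch, assuming the outdated clone URL lives in the pprmetaGet process of the pipeline sources: pull the pipeline locally and point that process at the maintained fork before re-running.

nextflow pull EBI-Metagenomics/emg-viral-pipeline
# Replace the outdated PPR-Meta URL in the local assets copy (asset path assumed;
# the grep locates whichever module file contains the clone command).
grep -rl "Stormrider935/PPR-Meta" ~/.nextflow/assets/EBI-Metagenomics/emg-viral-pipeline \
    | xargs sed -i 's#Stormrider935/PPR-Meta#mult1fractal/PPR-Meta#g'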

Nextflow pull failure

Dear developer,
Thanks for your great work.
My problem is that I cannot install emg-viral-pipeline with any of the methods you provide.
Nextflow method:

(PathoFact) [u@h@software]$ nextflow run EBI-Metagenomics/emg-viral-pipeline --fasta "/home/$USER/.nextflow/assets/EBI-Metagenomics/emg-viral-pipeline/nextflow/test/assembly.fasta" --cores 4 -profile local,docker -r v0.1
N E X T F L O W  ~  version 21.04.1
Launching `EBI-Metagenomics/emg-viral-pipeline` [maniac_saha] - revision: 5c1279c24b [v0.1]
WARN: DSL 2 PREVIEW MODE IS DEPRECATED - USE THE STABLE VERSION INSTEAD -- Read more at https://www.nextflow.io/docs/latest/dsl2.html#dsl2-migration-notes

Profile: local,docker

Current User: yut
Nextflow-version: 21.04.1
Starting time: 14-05-2021 15:20 UTC
Workdir location:
  /tmp/nextflow-work-yut
Databases location:
  nextflow-autodownload-databases

Dev ViPhOG database: v3
Dev Meta database: v2

Only run annotation: false

This workflow requires Nextflow version 20.01 or greater -- You are running version 21.04.1
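
A hedged workaround sketch (not an official fix): the v0.1 tag was written for the old DSL2 preview syntax, and its version check appears to reject newer Nextflow releases, so pinning the launcher to a 20.x Nextflow release via the standard NXF_VER mechanism may let the run proceed. Alternatively, a newer pipeline release avoids the issue.

NXF_VER=20.10.0 nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.1 \
    --fasta "/home/$USER/.nextflow/assets/EBI-Metagenomics/emg-viral-pipeline/nextflow/test/assembly.fasta" \
    --cores 4 -profile local,docker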

Issue to run stable version v0.2.0

Hello folks,

I'm getting this error while trying to run VIRify.

Preview nextflow mode 'preview' is not supported anymore -- Please use nextflow.enable.dsl=2 instead.

I'm running the command below using Nextflow version 22.04.4:

nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.2.0 --help

Note: this command works fine: nextflow run EBI-Metagenomics/emg-viral-pipeline -r master --help

Best regards,
Gabriel Wallau

Different results between older and current version

I just ran the nextflow pipeline again using v0.3.0 on the ~/.nextflow/assets/EBI-Metagenomics/emg-viral-pipeline/nextflow/test/kleiner_virome_2015.fasta example data set.

Now, the pipeline only predicts a few low-confidence contigs:

contig_ID       genus   subfamily       family  order
NODE_20_length_41715_cov_14831.579165           Sepvirinae      Podoviridae     Caudovirales
NODE_22_length_38841_cov_7038.120250    Teseptimavirus  Studiervirinae  Autographiviridae       Caudovirales
NODE_23_length_37379_cov_9295.445022    Teseptimavirus  Studiervirinae  Autographiviridae       Caudovirales
NODE_193_length_1739_cov_4.978622
NODE_66_length_5441_cov_4793.546417

while before, with v0.2.0, we detected them as high-confidence contigs and, in addition, some putative prophages. Because the prediction of high-confidence contigs and prophages heavily depends on VirSorter, I guess that something changed here between the versions.

Singularity on Slurm

Hi All,

I'm quite new to Nextflow and encountered an issue which may be related to Singularity. I tried to run the basic test for the pipeline

module load nextflow
module load singularity/latest

HOME_DIR="/home/users/astar/gis/wijayai"
OUTPUT_DIR="${HOME_DIR}/scratch/virify"

nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.4.0 \
--fasta ${HOME_DIR}/.nextflow/assets/EBI-Metagenomics/emg-viral-pipeline/nextflow/test/assembly.fasta \
--cores 4 \
--outout ${OUTPUT_DIR} \
--workdir ${OUTPUT_DIR}/work \
--databases ${OUTPUT_DIR}/virifyDB \
--cachedir ${HOME_DIR}/.singularity \
-profile slurm,singularity

and got this error

Error executing process > 'preprocess:rename (1)'

Caused by:
  Failed to pull singularity image
  command: singularity pull  --name microbiomeinformatics-emg-viral-pipeline-python3-v1.img.pulling.1654852796106 docker://microbiomeinformatics/emg-viral-pipeline-python3:v1 > /dev/null
  status : 255
  message:
    INFO:    Converting OCI blobs to SIF format
    INFO:    Starting build...
    Getting image source signatures
    Copying blob sha256:8559a31e96f442f2c7b6da49d6c84705f98a39d8be10b3f5f14821d0ee8417df
    Copying blob sha256:62e60f3ef11eb77464958fc8fed447414fe652a21fc69aea87038a02e05e6000
    Copying blob sha256:93c8ae15378255764d0f4929a03e87b1a2497ca34b479bc3d66a53c6f99d1333
    Copying blob sha256:ea222f757df7f8b1f7f73b74a335497a015e3a254ce99cc27fe2c591132ad4ce
    Copying blob sha256:e97d3933bbbe5af2dcfa3af027e5fbf95562d4bd37ed7b85bef9f5c5e0dc2862
    Copying blob sha256:adafb644e4de15d419df2c1ebcbdec9666d4d1b0336b1910088a04743f6f387f
    Copying blob sha256:5c7900f7bd30eec8a4e096ce021092672609b00170d570524e96ba914311c1f0
    Copying config sha256:b9f1fd37db9a9e00804a77f504cbb80ff3c8ff3bbac09d0df07e40853b5d554a
    Writing manifest to image destination
    Storing signatures
    2022/06/10 17:20:40  info unpack layer: sha256:8559a31e96f442f2c7b6da49d6c84705f98a39d8be10b3f5f14821d0ee8417df
    2022/06/10 17:20:42  info unpack layer: sha256:62e60f3ef11eb77464958fc8fed447414fe652a21fc69aea87038a02e05e6000
    2022/06/10 17:20:43  info unpack layer: sha256:93c8ae15378255764d0f4929a03e87b1a2497ca34b479bc3d66a53c6f99d1333
    2022/06/10 17:20:43  info unpack layer: sha256:ea222f757df7f8b1f7f73b74a335497a015e3a254ce99cc27fe2c591132ad4ce
    2022/06/10 17:20:43  info unpack layer: sha256:e97d3933bbbe5af2dcfa3af027e5fbf95562d4bd37ed7b85bef9f5c5e0dc2862
    2022/06/10 17:20:44  info unpack layer: sha256:adafb644e4de15d419df2c1ebcbdec9666d4d1b0336b1910088a04743f6f387f
    2022/06/10 17:20:44  info unpack layer: sha256:5c7900f7bd30eec8a4e096ce021092672609b00170d570524e96ba914311c1f0
    INFO:    Creating SIF file...
    FATAL:   While making image from oci registry: error fetching image to cache: while building SIF from layers: while creating SIF: while creating container: writing data object for SIF file: copying data object file to SIF file: write /home/users/astar/gis/wijayai/cache/oci-tmp/tmp_693074224: copy_file_range: resource temporarily unavailable

I googled and suspect this may be due to insufficient space for the cache. I then changed --cachedir to my scratch folder, which has more space, and also set NXF_SINGULARITY_CACHEDIR and SINGULARITY_CACHEDIR to my scratch, but I still get a similar error. Do you know what might cause this error? Thank you :)
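
A hedged sketch of one way to take the home directory out of the equation entirely, assuming a Singularity version that honours these environment variables: point every cache and temp location at scratch and pre-pull the image once, so the SIF conversion never touches the quota-limited home directory and Nextflow finds the image already cached.

export NXF_SINGULARITY_CACHEDIR="${OUTPUT_DIR}/singularity"      # used by Nextflow
export SINGULARITY_CACHEDIR="${OUTPUT_DIR}/singularity/cache"    # OCI blob cache
export SINGULARITY_TMPDIR="${OUTPUT_DIR}/singularity/tmp"        # SIF build scratch
mkdir -p "$NXF_SINGULARITY_CACHEDIR" "$SINGULARITY_CACHEDIR" "$SINGULARITY_TMPDIR"
singularity pull "$NXF_SINGULARITY_CACHEDIR/microbiomeinformatics-emg-viral-pipeline-python3-v1.img" \
    docker://microbiomeinformatics/emg-viral-pipeline-python3:v1

Then re-run with --cachedir pointing at the same directory.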

Performance

Performance bottlenecks

Currently the slowest steps are:

  • VirSorter
  • Hmmscan

Proposed solutions

hmmscan

Chunk the FASTA and run the pieces in parallel (hedged sketch below)
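
A hedged sketch of the chunking idea (file names and the chunk/thread counts are placeholders): split the predicted-protein FASTA into N pieces, run hmmscan on each piece concurrently, and concatenate the per-chunk tables afterwards.

N=8
mkdir -p chunks
# Distribute the FASTA records round-robin into N chunk files.
awk -v n="$N" '/^>/{f="chunks/chunk_" (i++ % n) ".faa"} {print >> f}' proteins.faa
# Run hmmscan on each chunk in the background, then merge the domain tables.
for c in chunks/chunk_*.faa; do
    hmmscan --cpu 4 --noali --domtblout "${c%.faa}.domtbl" vpHMM_database.hmm "$c" &
done
wait
cat chunks/chunk_*.domtbl > hmmscan_all.domtbl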

VirSorter

There is a new version coming up in the next few weeks, wait for that release.

Missing CheckV output when there are multiple prophages in contig

Hello there!

I'm running VIRify through Nextflow and realized that the CheckV directory is missing when more than one prophage is predicted in the same contig. Tracing back the log files, I found the error:

Please remove duplicated sequence IDs from the input FASTA file: XXX: cannot stat 'prophages/quality_summary.tsv': No such file or directory

Adding the coordinates to the sequence IDs in the provirus fasta file would solve the problem.

Bests!
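
A hedged stopgap sketch (the proper fix would add the prophage coordinates to the IDs, as suggested above): make duplicated IDs unique by appending an occurrence counter before handing the FASTA to CheckV; "prophages.fasta" is a placeholder name.

# Append "_2", "_3", ... to repeated FASTA header IDs so CheckV accepts the file.
awk '/^>/{c[$1]++; if (c[$1] > 1) $1 = $1 "_" c[$1]} {print}' prophages.fasta > prophages_uniq.fasta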

VirSorter predicts prophage larger than contig size

Test assembly:

kleiner_2015.fasta.gz

Observation

It seems that VirSorter predicts a prophage in a range that is actually larger than the contig size. Example:

>NODE_51_length_63443_cov_50.479870

So contig NODE_51 (seq51 after renaming) has a length of 63443 nt.

Now VirSorter predicts a prophage for this contig from position 19922-63493:

(base) [mhoelzer@hh-yoda-11-01 ~]$ grep seq51 /hps/nobackup2/metagenomics/mhoelzer/nextflow-results/virify/v0.1/kleiner_2015/kleiner_2015/01-viruses/virsorter/Predicted_viral_sequences/VIRSorter_prophages_cat-4.fasta 
>VIRSorter_seq51_gene_20_gene_72-19922-63493-cat_4

So the predicted prophage's stop position is larger than the actual contig size. I have the feeling this is a VirSorter problem, or maybe an intended feature. Maybe we should restrict the predicted prophage stop to the length of the contig.

@mberacochea will add some checks and print a message to let us know.
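
A hedged sketch of such a check, assuming the contigs were renamed to seqN as in the example above (file names are placeholders): compare each prophage's stop coordinate, parsed from the VirSorter header, against the parent contig length taken from a FASTA index.

samtools faidx assembly_renamed_filt1500bp.fasta   # writes a .fai with contig lengths
# Read contig lengths from the .fai, then flag any prophage header whose stop
# coordinate exceeds the length of its parent contig.
awk 'NR==FNR {len[$1]=$2; next}
     /^>/ {split(substr($0,2), f, "-"); split(f[1], a, "_"); contig=a[2];
           if (f[3]+0 > len[contig]+0)
             print contig ": prophage stop " f[3] " exceeds contig length " len[contig]}' \
    assembly_renamed_filt1500bp.fasta.fai VIRSorter_prophages_cat-4.fasta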

SARS-CoV-2 workflow comparison - kindly check if your work is represented correctly

Hello VIRify Team,

I am from the University Hospital Essen, Germany, and we work extensively with SARS-CoV-2 in our research. We have also developed a SARS-CoV-2 workflow. In preparation for the publication of our workflow, we have looked at several other SARS-CoV-2 related workflows, including your work. We will present this review in the publication and want to ensure that your work is represented as accurately as possible.

Moreover, there is currently little to no overview of SARS-CoV-2 related workflows. Therefore, we have decided to make the above comparison publicly available via this GitHub repository. It contains a table with an overview of the functions of different SARS-CoV-2 workflows and the tools used to implement these functions.

We would like to give you the opportunity to correct any misunderstandings on our side. Please take a moment to make sure we are not misrepresenting your work or leaving out important parts of it by taking a look at this overview table. If you feel that something is missing or misrepresented, please feel free to give us feedback by contributing directly to the repository.

Thank you very much!

cc @alethomas

CheckV failed

Hello,
I am trying to run the VIRify pipeline using SLURM and Singularity on our cluster, but I get: annotate:checkV failed.
At first, I ran a small number of contigs (10 contigs) with --cores 1 and --max_cores 1 and everything was fine. But when I run the same contigs (and, in another analysis, more contigs) with more cores, the error occurs.
I submit the following script with sbatch (varying the cores parameter).
I observed that the Docker images are stored in a folder called "false". Is this OK? In addition, I created a "/singularity_cachdir" which doesn't contain data.
Is it necessary or favorable to specify CPUs and memory when submitting the script, for example:
sbatch --cpus-per-task=8 --mem=50G script_run_on_cluster_AQ_virify3.sh
I would like to get a stable installation before running a high number of contigs.

Thanks for your help
Achim

here my script
#! /bin/bash
#$ -S /bin/bash
#$ -M [email protected]
#$ -m bea

. /softs/local/env/envnextflow-22.10.4.sh
. /softs/local/env/envsingularity-3.8.5.sh

nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.4.0 --fasta 'Virsorter_vBig_10contigs.fasta' --cores 1 --max_cores 1 -profile slurm,singularity --databases /scratch/aquaiser/databases --singularity /scratch/aquaiser/singularity_cachdir --output /home/genouest/rbpe/aquaiser/virify_output

Allow user adjustment of minimal informative protein threshold

Currently, the minimum number of proteins with ViPhOG annotations at each taxonomic level, required for taxonomic assignment, is 2.

However, for certain use cases (see discussion in #64 ) it would be good if the user could change this, e.g. to "1" for short contigs or segmented viruses. Note, however, that this might lead to more false-positive hits.

The parameter is used in this script:

https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/bin/contig_taxonomic_assign.py#L103

which is called in this process:

https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/nextflow/modules/assign.nf#L15

VirFinder w/ a more specific model for eukaryotic viruses

Currently, we are using the default model for VirFinder predictions.

However, we are particularly interested in also predicting eukaryotic viruses (and not only phages) with VirFinder. I tested the prediction using a specific model and implemented this in the nextflow version of the pipeline:
hoelzer/virify#21

Basically, the model needs to be downloaded (or deposited somewhere):

wget https://github.com/jessieren/VirFinder/raw/master/EPV/VF.modEPV_k8.rda

and then I am using a simplified version of a script from Guillermo:

run_virfinder_modEPV.Rscript VF.modEPV_k8.rda ${fasta} .
awk '{print $1"\t"$2"\t"$3"\t"$4}' ${name}*.txt > ${name}.txt

The script can be found here:
https://github.com/hoelzer/virify/tree/master/bin

I introduced the awk filter because the resulting txt file has additional columns compared to what the pipeline currently expects in the next parsing step; the filter avoids any problems there.

I think what needs to be done is:

  • clean the R script
  • use it instead of the current one in the CWL pipeline
    • the CWL is currently using a parallelized version of VirFinder; however, VirSorter is much slower than VirFinder, so the parallelization is currently not a real speed-up because the parsing process waits for the output of both tools

filter_contigs_len.py command not found

I ran the commands as follows:
cd ~/20T/DataBase/SoftwaresEnsembel/MAG
git clone --recursive https://github.com/EBI-Metagenomics/emg-viral-pipeline.git

export PATH=$PATH:~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/bin

cd /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27

~/Softwares/Miniconda3/nextflow-21.03.0-edge/nextflow run virify.nf --help

~/Softwares/Miniconda3/nextflow-21.03.0-edge/nextflow run -resume
~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/virify.nf
--fasta "/home/stone/20T/SraDownload/Genome/TBEV/NC_001672.1_sequence.fasta"
--cores 4
--output /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27
--workdir /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work
--databases ~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/DATABASES
--cachedir ~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/SINGULARITY
-profile local,singularity

I got:
Error executing process > 'preprocess:length_filtering (1)'

Caused by:
Process preprocess:length_filtering (1) terminated with an error exit status (127)

Command executed:

filter_contigs_len.py -f NC_001672_renamed.fasta -l 1.5 -o ./
CONTIGS=$(grep ">" NC_001672filt.fasta | wc -l)

Command exit status:
127

Command output:
(empty)

Command error:
.command.sh: line 2: filter_contigs_len.py: command not found

Work dir:
/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work/f6/9db03ebd837a74d632361cd0f07d79

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

How can I resolve this?
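
A hedged troubleshooting sketch: exit status 127 means the helper script was not found on PATH inside the task. Nextflow normally stages the pipeline's bin/ directory onto PATH for each process, so check that the scripts are present and executable in the clone you are running from, or let Nextflow manage the project itself (release tag assumed).

ls -l ~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/bin/filter_contigs_len.py
chmod +x ~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/bin/*.py
# Alternative: run the released pipeline so Nextflow handles the assets and bin/ itself.
nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.4.0 --help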

Database locations and download

Currently, the different databases used by VIRify are either auto-downloaded or need to be already available locally. However, from a user perspective, it would be good to mention the database URLs in the README to also allow a manual download.

processing multiple inputs in parallel?

Hi team, it's thrilling to have successfully run VIRify for the first time!

I found that the CPU usage is low when running hmmsearch, so I'd like to execute multiple VIRify processes in parallel. However, an error occurred:

Unknown error accessing project `EBI-Metagenomics/emg-viral-pipeline` -- Repository may be corrupted: 
/home/shenwei/.nextflow/assets/EBI-Metagenomics/emg-viral-pipeline

So I have to process all samples one by one; no errors so far.

Does VIRify (Nextflow) support processing multiple inputs in parallel? I think it does, because there's an option --cores:

--cores             max cores per process for local use [default: 40]
--max_cores         max cores per machine for local use [default: 160]

Wei


Command:

j=1     # number of VIRify processes
J=40    # CPU number of each VIRify process
mem=100 # max memory

db=~/ws/db/virify
sg=~/app/singularity
reads=assembly
ete3tax=ete3_ncbi_tax.sqlite
time fd ^contigs.fasta$ $reads/ \
    | grep -v work \
    | rush -j $j -v j=$J -v mem=$mem -v db=$db -v sg=$sg -v ete3tax=$ete3tax \
        'nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.4.0 \
            --databases {db} --cachedir {sg} --ncbi {ete3tax} \
            --fasta {} --workdir {/}/work --output {/}/virfy \
            -profile local,singularity --memory {mem} --cores {j} ' \
        -c -C virify.rush --verbose
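
A hedged sketch of one way to avoid several concurrent runs sharing the single copy under ~/.nextflow/assets: clone the release once and launch each sample from that local path (the databases and Singularity cache directories can still be shared). Variable names reuse those defined in the script above; "sampleA" is a placeholder.

git clone --branch v0.4.0 https://github.com/EBI-Metagenomics/emg-viral-pipeline.git
nextflow run emg-viral-pipeline/virify.nf \
    --databases "$db" --cachedir "$sg" --ncbi "$ete3tax" \
    --fasta sampleA/contigs.fasta --workdir sampleA/work --output sampleA/virify \
    -profile local,singularity --cores "$J"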

Regression - VirSorter prophage report at the end of the contigs

By updating the virsorter container to the one hosted in biocontainers we introduced a regression: simroux/VirSorter#68.
VirSorter extends the prophage sequence by 50 nucleotides beyond the last gene at the 5' and 3' ends (to include potential att sites and to avoid ending right on a start/stop codon). I just forgot to include a check to make sure we don't extend past the contig, i.e. if the last gene of the prophage is at the end of the contig, the coordinate from VirSorter will be 50 nucleotides beyond the contig (hence 63493 vs 63443).

validation errors for pipeline.cwl

Dear MGnify team,

I got the following error when trying to run virify.sh on an example fasta file:

ValidationException: The CWL reference runner no longer supports pre CWL v1.0 documents. Supported versions are:
v1.0
v1.1
v1.1.0-dev1 (with --enable-dev flag only)
v1.2.0-dev1 (with --enable-dev flag only)
v1.2.0-dev2 (with --enable-dev flag only)

I have tried upgrading the toil and cwltool packages to their latest versions and also installed two packages required as a result of these updates. I then get the following error:

schema_salad.exceptions.ValidationException: src/pipeline.cwl:12:1: union[str, int, schema_salad.avro.schema.Schema, list, list, dict, list,
None] object expected; got ruamel.yaml.scalarfloat.ScalarFloat
(emgvp) uludagm@kw60408:~/biosphere/emg-viral-pipeline/cwl$ find . -name *.cwl

--mahmut

Remove generate_krona_table.py

I think generate_krona_table.py is not used anymore and was replaced by generate_counts_table.py, and thus can be removed.

Add Tower execution feature to README

It's quite simple to monitor a pipeline run via tower.nf:

[image: monitoring a pipeline run in Nextflow Tower]

We could add a section to the README, e.g. after the ## Profiles section in the already updated README on the dev branch (GitHub converts some of the Markdown below automatically; copying from the raw view should work):

## Monitoring

<img align="right" width="400px" src="figures/tower.png" alt="Monitoring with Nextflow Tower" /> 

To monitor your Nextflow computations, VIRify can be connected to [Nextflow Tower](https://tower.nf). You need a user access token to connect your Tower account with the pipeline. Simply [generate a login](https://tower.nf/login) using your email and then click the link sent to this address.

Once logged in, click on your avatar in the top right corner and select "Your tokens." Generate a token or copy the default one and set the following environment variable:

```bash
export TOWER_ACCESS_TOKEN=<YOUR_COPIED_TOKEN>
```

You can save this variable in your .bashrc or .profile so you don't need to enter it again. Refresh your terminal.

Now run:

nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.4.0 --fasta "/home/$USER/.nextflow/assets/EBI-Metagenomics/emg-viral-pipeline/nextflow/test/assembly.fasta" --cores 4 -profile local,docker -with-tower

Alternatively, you can also pull the code from this repository and activate the Tower connection within the nextflow.config file located in the root GitHub directory:

tower {
    accessToken = ''
    enabled = true
} 

You can also directly enter your access token here instead of generating the above-mentioned environment variable.

Fail to run example execution

Dear developer,

I installed VIRify via Singularity, but I failed to run the example execution with the command below:

./nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.4.0 --fasta "/home/$USER/.nextflow/assets/EBI-Metagenomics/emg-viral-pipeline/nextflow/test/assembly.fasta" --cores 4 -profile local,singularity
  • Note: I just changed -profile local,docker to -profile local,singularity

Error message:

N E X T F L O W  ~  version 22.10.6
Launching `https://github.com/EBI-Metagenomics/emg-viral-pipeline` [distraught_cuvier] DSL2 - revision: f367002f0e [v0.4.0]

Profile: local,singularity

Current User: chong
Nextflow-version: 22.10.6
Starting time: 23-01-2023 23:20 UTC
Workdir location:
  /home/chong/primal/work
Databases location:
  nextflow-autodownload-databases

Dev ViPhOG database: v3
Dev Meta database: v2

Only run annotation: false

executor >  local (11)
[e4/6fad74] process > download_pprmeta:pprmetaGet          [100%] 1 of 1 ✔
[-        ] process > download_virsorter_db:virsorterGetDB -
[7e/2dc2ed] process > download_virfinder_db:virfinderGetDB [100%] 1 of 1 ✔
[09/8fb736] process > download_model_meta:metaGetDB        [100%] 1 of 1 ✔
[-        ] process > download_viphog_db:viphogGetDB       -
[7d/06e34c] process > download_ncbi_db:ncbiGetDB           [100%] 1 of 1 ✔
[-        ] process > download_checkv_db:checkvGetDB       -
[19/d88800] process > preprocess:rename (1)                [100%] 1 of 1 ✔
[66/41ecc6] process > preprocess:length_filtering (1)      [100%] 1 of 1 ✔
[-        ] process > detect:virsorter                     -
[a8/c43d73] process > detect:virfinder (1)                 [100%] 1 of 1 ✔
[2c/1d1535] process > detect:pprmeta (1)                   [100%] 1 of 1, failed: 1 ✘
[-        ] process > detect:parse                         -
[-        ] process > postprocess:restore                  -
[-        ] process > annotate:prodigal                    -
[-        ] process > annotate:hmmscan_viphogs             -
[-        ] process > annotate:hmm_postprocessing          -
[-        ] process > annotate:ratio_evalue                -
[-        ] process > annotate:annotation                  -
[-        ] process > annotate:plot_contig_map             -
[-        ] process > annotate:assign                      -
[-        ] process > annotate:checkV                      -
[-        ] process > plot:generate_krona_table            -
[-        ] process > plot:krona                           -
[-        ] process > plot:generate_sankey_table           -
[-        ] process > plot:sankey                          -
Error executing process > 'detect:pprmeta (1)'

Caused by:
  Process `detect:pprmeta (1)` terminated with an error exit status (255)

Command executed:

  [ -d "pprmeta" ] && cp pprmeta/* .
  ./PPR_Meta assembly_renamed_filt1500bp.fasta assembly_pprmeta.csv

Command exit status:
  255

Command output:

  Using TensorFlow backend.
  Illegal instruction (core dumped)
  python predict.py model_a.h5: Illegal instruction

Command error:
  INFO:    Converting SIF file to temporary sandbox...
  WARNING: underlay of /etc/localtime required more than 50 (94) bind mounts

  Using TensorFlow backend.
  Illegal instruction (core dumped)
  python predict.py model_a.h5: Illegal instruction
  Error using dlmread (line 62)
  The file 'tmp/predict.csv' could not be opened because: No such file or directory

  Error in PPR (line 96)


  Error in PPR_Meta (line 49)

  MATLAB:dlmread:FileNotOpened
  INFO:    Cleaning up image...

Work dir:
  /home/chong/primal/work/2c/1d153586b94115460798aa673604d2

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

I noticed that there is no tmp folder, which matches the error: The file 'tmp/predict.csv' could not be opened because: No such file or directory

Do you have any suggestions to tackle this issue?

Best regards,
Chong
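
A hedged diagnostic sketch: an "Illegal instruction" crash from a prebuilt TensorFlow usually means the host CPU lacks an instruction set (typically AVX) that the binary was compiled for; the missing tmp/predict.csv is then only a consequence, since that file is written after the prediction succeeds. Checking the CPU flags on the execution host narrows this down.

# List which AVX variants the CPU advertises; an empty result would explain the crash.
grep -o -w -E 'avx|avx2|avx512f' /proc/cpuinfo | sort -u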

Fail to publish CheckV results Warning

I am running v1.0 on an HPC with SLURM and Singularity. Apparently, there is a problem publishing the output of the CheckV step:

WARN: Failed to publish file: /scratch/hoelzerm/projects/2023-01-15-VIRify-wastewater-Wyler/results-RNA116SPAdes-virome/work/6e/31afba1ef29fed4df415b69d74c2d0/low_confidence_viral_contigs; to: /scratch/hoelzerm/projects/2023-01-15-VIRify-wastewater-Wyler/results-RNA116SPAdes-virome/RNA-WW_220421_20-22hmix_cer_trizol_S34_L001/07-checkv/low_confidence_viral_contigs [copy] -- See log file for details
WARN: Failed to publish file: /scratch/hoelzerm/projects/2023-01-15-VIRify-wastewater-Wyler/results-RNA116SPAdes-virome/work/8a/8cdfebeb9a7dc05ed8658509414ba7/low_confidence_viral_contigs; to: /scratch/hoelzerm/projects/2023-01-15-VIRify-wastewater-Wyler/results-RNA116SPAdes-virome/RNA-WW_220421_20-22hmix_cer_trizol_S34_L001/07-checkv/low_confidence_viral_contigs [copy] -- See log file for details
WARN: Failed to publish file: /scratch/hoelzerm/projects/2023-01-15-VIRify-wastewater-Wyler/results-RNA116SPAdes-virome/work/39/17f263614b9a7d46ef6c768e5958e0/low_confidence_viral_contigs; to: /scratch/hoelzerm/projects/2023-01-15-VIRify-wastewater-Wyler/results-RNA116SPAdes-virome/RNA-WW_220421_20-22hmix_cer_trizol_S34_L001/07-checkv/low_confidence_viral_contigs [copy] -- See log file for details
WARN: Failed to publish file: /scratch/hoelzerm/projects/2023-01-15-VIRify-wastewater-Wyler/results-RNA116SPAdes-virome/work/dc/c4e9ab9e557386cc5c9f2a849faf8d/low_confidence_viral_contigs; to: /scratch/hoelzerm/projects/2023-01-15-VIRify-wastewater-Wyler/results-RNA116SPAdes-virome/RNA-WW_220421_20-22hmix_cer_trizol_S34_L001/07-checkv/low_confidence_viral_contigs [copy] -- See log file for details

Looking closer, the output files are present in the work dir, e.g.:

ls /scratch/hoelzerm/projects/2023-01-15-VIRify-wastewater-Wyler/results-RNA116SPAdes-virome/work/6e/31afba1ef29fed4df415b69d74c2d0/low_confidence_viral_contigs

complete_genomes.tsv  completeness.tsv  contamination.tsv  proviruses.fna  quality_summary.tsv  tmp  viruses.fna

and I also see the output published in the results dir:

ls /scratch/hoelzerm/projects/2023-01-15-VIRify-wastewater-Wyler/results-RNA116SPAdes-virome/RNA-WW_220421_20-22hmix_cer_trizol_S34_L001/07-checkv/low_confidence_viral_contigs

complete_genomes.tsv  completeness.tsv  contamination.tsv  proviruses.fna  quality_summary.tsv  tmp  viruses.fna

So why the warning?

Test Virsorter2 as a replacement for Virsorter

I already have a container for VS2 (where I can then also provide the Dockerfile so it can go to the correct Hub): docker pull nanozoo/virsorter2:2.2.3--855ce4e

VS2 can replace VS in a separate branch, and we would need to run some benchmarks.

Error - Cannot run program "sbatch"

Hi!

I am trying to run VIRify on an HPC cluster using Nextflow (22.10.4.5836), SLURM and Singularity (3.4.2).

This is the command in my sbatch script

nextflow run /CCAS/home/cpavloudi/.nextflow/assets/EBI-Metagenomics/emg-viral-pipeline/virify.nf --fasta "/christina/viruses/metaviralspades/*.fasta" --output /christina/VIRify_output --workdir /christina/work --cores 12 -profile slurm,singularity

but the job fails with this error

Error executing process > 'download_viphog_db:viphogGetDB' Caused by: java.io.IOException: Cannot run program "sbatch" (in directory "/christina/work/26/795584d61825df5f62d5176a0562cd"): error=2, No such file or directory

I have attached the output of the job so you can take a better look at it.
VIRify.2135431.output.txt
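
A hedged troubleshooting sketch: with -profile slurm, the Nextflow head job itself must be able to call sbatch to submit tasks, so this error usually means sbatch is not on PATH in the environment where the head job runs (for example, inside a job on a compute node without the SLURM client). Checking from within the same script clarifies this; the module name below is an assumption for sites that ship the client as a module.

# Confirm the head job's environment can actually submit SLURM jobs.
command -v sbatch || echo "sbatch not on PATH in this environment"
# module load slurm   # if your site provides the SLURM client this way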

Add scroll bar for ChromoMap plot

One can change via sed or awk the following line from

<div id="htmlwidget-f6c6b982908bd2088de7" class="chromoMap html-widget"    style="width:960px;height:500px;">

to

<div id="htmlwidget-f6c6b982908bd2088de7" class="chromoMap html-widget"    style="width:960px;height:500px;overflow:auto;">
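
For example, a hedged one-liner applying this change to a generated report ("report.html" is a placeholder; the htmlwidget id differs per report, so matching on the style attribute is safer than matching the id):

sed -i 's/style="width:960px;height:500px;"/style="width:960px;height:500px;overflow:auto;"/' report.html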

Thus, we don't need to reduce the number of plotted contigs or split them. Compare also with the implementation in WtP.

Docker containers in CWL.

Review CWL docker hints and the Dockerfiles.

Missing ones:

  • fasta rename/restore
  • hmmscan_format_table
  • IMG VR - Blast
  • Krona

PPR-Meta scoring

Currently, all sequences that PPR-Meta reports as "phage" are considered viruses. However, we could additionally filter by the "phage score" the tool provides:

Header,Length,phage_score,chromosome_score,plasmid_score,Possible_source
seq8,86578,0.658026557109837,0.323770475766357,0.0182029599535136,phage
seq11,63443,0.671362450565434,0.257167359821571,0.0714701900259453,phage
seq20,41715,0.945974168353953,0.0147801588566125,0.0392456778921355,phage
seq22,38841,0.999412552439551,1.51951318980135e-05,0.000572250124745809,phage
awk 'BEGIN{FS=","}{if($6=="phage" && $3>0.7){print $0}}' 01-viruses/pprmeta/kleiner_virome_2015_pprmeta.csv 

This is also done here

Remove pipeline.cwl warnings

cwltool --validate CWL/pipeline.cwl
INFO /home/mbc/miniconda3/envs/viral_pipeline/bin/cwltool 1.0.20190906054215
INFO Resolved 'CWL/pipeline.cwl' to 'file:///home/mbc/projects/emg_virify/CWL/pipeline.cwl'
WARNING Workflow checker warning:
  CWL/pipeline.cwl:82:9: Source 'high_confidence_contigs_genes' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:90:7: with sink 'aa_fasta_files' of type {"type": "array", "items": "File"} source has linkMerge method merge_flattened
  CWL/pipeline.cwl:83:9: Source 'low_confidence_contigs_genes' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:90:7: with sink 'aa_fasta_files' of type {"type": "array", "items": "File"} source has linkMerge method merge_flattened
  CWL/pipeline.cwl:84:9: Source 'prophages_contigs_genes' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:90:7: with sink 'aa_fasta_files' of type {"type": "array", "items": "File"} source has linkMerge method merge_flattened
  CWL/pipeline.cwl:82:9: Source 'high_confidence_contigs_genes' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:114:7: with sink 'input_fastas' of type {"type": "array", "items": "File"} source has linkMerge method merge_flattened
  CWL/pipeline.cwl:83:9: Source 'low_confidence_contigs_genes' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:114:7: with sink 'input_fastas' of type {"type": "array", "items": "File"} source has linkMerge method merge_flattened
  CWL/pipeline.cwl:84:9: Source 'prophages_contigs_genes' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:114:7: with sink 'input_fastas' of type {"type": "array", "items": "File"} source has linkMerge method merge_flattened
  CWL/pipeline.cwl:70:9: Source 'high_confidence_contigs' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:154:5: with sink 'high_confidence_contigs' of type "File"
  CWL/pipeline.cwl:82:9: Source 'high_confidence_contigs_genes' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:163:5: with sink 'high_confidence_faa' of type "File"
  CWL/pipeline.cwl:71:9: Source 'low_confidence_contigs' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:157:5: with sink 'low_confidence_contigs' of type "File"
  CWL/pipeline.cwl:83:9: Source 'low_confidence_contigs_genes' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:166:5: with sink 'low_confidence_faa' of type "File"
  CWL/pipeline.cwl:72:9: Source 'prophages_contigs' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:160:5: with sink 'parse_prophages_contigs' of type "File"
  CWL/pipeline.cwl:84:9: Source 'prophages_contigs_genes' of type ["null", "File"] may be incompatible
  CWL/pipeline.cwl:169:5: with sink 'prophages_faa' of type "File"
CWL/pipeline.cwl is valid CWL.
