cfia-ncfad / nf-flu

This project forked from peterk87/nf-flu

14.0 14.0 8.0 2.81 MB

Influenza genome analysis Nextflow workflow

License: MIT License

Nextflow 38.51% Python 36.48% Groovy 25.00%

nf-flu's People

Contributors

fmaguire rpetit3 ric-costa shvartsmanirina mattheww95 nhhaidee xiaoli-dong

nf-flu's Issues

[BUG]: Conda profile not enabling conda

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

Conda not actually being used/enabled when using -profile conda

Nextflow command-line

nextflow run CFIA-NCFAD/nf-flu --input samplesheet.csv --platform nanopore -profile conda

Error Message

ERROR ~ Error executing process > 'NF_FLU:NANOPORE:CHECK_SAMPLE_SHEET (1)'

Caused by:
  Process `NF_FLU:NANOPORE:CHECK_SAMPLE_SHEET (1)` terminated with an error exit status (1)

Command executed:

  check_sample_sheet.py samplesheet.csv nanopore samplesheet.fixed.csv

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/home/pkruczkiewicz/.nextflow/assets/CFIA-NCFAD/nf-flu/bin/check_sample_sheet.py", line 6, in <module>
      import typer
  ModuleNotFoundError: No module named 'typer'

Workflow Version

a38190e

Nextflow Executor

local

Nextflow Version

23.04.1

Java Version

openjdk version "20.0.1" 2023-04-18
OpenJDK Runtime Environment (build 20.0.1+9)
OpenJDK 64-Bit Server VM (build 20.0.1+9, mixed mode, sharing)

Hardware

local

Operating System (OS)

Arch

Conda/Container Engine

Conda

Additional context

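The conda profile in nextflow.config is missing conda.enabled = true, so the process Conda environments are never used. It should be: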

profiles {
  conda {
    params.enable_conda = true
    conda.createTimeout = "120 min"
    conda.enabled = true // <-- missing from the current profile, so Conda is never actually enabled
  }
}

Add custom references FASTA for CI tests

Custom sequences are not currently tested as part of CI. This is a big feature that needs testing.

Illumina subworkflow read mapping/variant calling

BWA MEM or Minimap2 short read mapping
variant calling with Freebayes

Update README and docs for running nf-flu test profiles for Nanopore and Illumina platforms

README is out-of-date and there is no test profile anymore. nf-flu can be run with either a test_illumina or test_nanopore profile. Docs and README need to be updated to reflect this.

See related issue peterk87#18

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

Test profile is not working

Nextflow command-line

nextflow run CFIA-NCFAD/nf-flu -profile test,docker

Error Message

N E X T F L O W  ~  version 23.04.3
Pulling CFIA-NCFAD/nf-flu ...
 downloaded from https://github.com/CFIA-NCFAD/nf-flu.git
Unknown configuration profile: 'test'

Workflow Version

3.3.5

Nextflow Executor

local

Nextflow Version

23.04.3.5875

Java Version

11.0.20.1

Hardware

Desktop

Operating System (OS)

Ubuntu 22.04

Conda/Container Engine

Docker

Additional context

No response

[BUG]: The fasta sequence does not match the REF allele at OQ234674.1:1946

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

Hello,

I have come across an edge case where the reference used as the top match for a sample contained a degenerate nucleotide causing the error described below. Essentially, clair3 changed the degenerate nucleotide to "N", which resulted in an error in bcftools as the reference allele no longer matched the reference sequence.

I believe I was able to mitigate this issue by making two changes:

In the clair3.nf module, I added the following flag to the clair3 command:
- --keep_iupac_bases True
- This tells clair3 not to convert IUPAC bases to N. However, this resulted in a new bcftools error related to the handling of non-ATGCN characters during the BCF_FILTER process.
To compensate for this, in bcftools.nf, I added the following flag to the bcftools norm command:
- -c w
- This corresponds to the bcftools norm --check-ref option, where w sets it to "warn". As far as I can tell, this should allow the degenerate nucleotide to remain without changing how the results are processed (see: bcftools norm).

In my testing, it has allowed me to successfully process the sample with the offending top match reference, though I would appreciate any feedback as to whether I have overlooked anything with my suggested parameter changes.
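The failure hinges on the reference containing an IUPAC ambiguity code (the Y in the REF .fa window of the error message below), which clair3 rewrote as N. As a minimal sketch, not part of nf-flu, such references could be flagged before variant calling by scanning for ambiguity codes. The function name find_ambiguous_bases is hypothetical.

```python
# Hypothetical helper (not in nf-flu): flag IUPAC ambiguity codes in a reference sequence.
# A, C, G, T and N are treated as unambiguous; every code in AMBIGUOUS is reported.
AMBIGUOUS = set("RYSWKMBDHV")


def find_ambiguous_bases(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, base) for every IUPAC ambiguity code in seq."""
    return [
        (pos, base)
        for pos, base in enumerate(seq.upper(), start=1)
        if base in AMBIGUOUS
    ]


if __name__ == "__main__":
    # REF .fa window from the error message; it starts at OQ234674.1:1946
    window = "TATTCAATAGCCTATATGCATCACCACAATTGGAAGGAYTTTCAGCAGAGTC"
    print(find_ambiguous_bases(window))  # [(39, 'Y')], i.e. position 39 of the window
```

How the workflow should react to a flagged reference (warn, skip it as a top match, or apply the clair3/bcftools flags above) is left open here.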

Nextflow command-line

nextflow nf-flu_v3.3.2/cfia-ncfad-nf-flu-3.3.2/workflow/main.nf --input IRVC20230720IH1_Part1_Complete_AF25.csv --platform nanopore --outdir IRVC20230720IH1_Part1_Complete_AF25_nf-flu_results --major_allele_fraction 0.25 -profile singularity,slurm -resume

Error Message

ERROR ~ Error executing process > 'NF_FLU:NANOPORE:BCF_CONSENSUS (RV00831-22-IAV|3_PA|OQ234674.1)'

Caused by:
  Process `NF_FLU:NANOPORE:BCF_CONSENSUS (RV00831-22-IAV|3_PA|OQ234674.1)` terminated with an error exit status (255)

Command executed:

  bgzip -c RV00831-22-IAV.Segment_3_PA.OQ234674.1.no_frameshifts.vcf > RV00831-22-IAV.Segment_3_PA.OQ234674.1.no_frameshifts.vcf.gz
  tabix RV00831-22-IAV.Segment_3_PA.OQ234674.1.no_frameshifts.vcf.gz
  
  # get low coverage depth mask BED file by filtering for regions with less than 10X
  zcat RV00831-22-IAV.Segment_3_PA.OQ234674.1.per-base.bed.gz | awk '$4<10' > low_cov.bed
  
  bcftools consensus \
    -f RV00831-22-IAV.Segment_3_PA.OQ234674.1.reference.fasta \
    -m low_cov.bed \
    RV00831-22-IAV.Segment_3_PA.OQ234674.1.no_frameshifts.vcf.gz > RV00831-22-IAV.Segment_3_PA.OQ234674.1.bcftools.consensus.fasta
  
  sed -i -E "s/^>(.*)/>RV00831-22-IAV_3_PA/g" RV00831-22-IAV.Segment_3_PA.OQ234674.1.bcftools.consensus.fasta
  
  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:NANOPORE:BCF_CONSENSUS":
      bcftools: $(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*$//')
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Command error:
  Note: the --sample option not given, applying all records regardless of the genotype
  The fasta sequence does not match the REF allele at OQ234674.1:1946:
     REF .vcf: [TATTCAATAGCCTATATGCATCACCACAATTGGAAGGANTTTCAGCAGAGTC]
     ALT .vcf: [T]
     REF .fa : [TATTCAATAGCCTATATGCATCACCACAATTGGAAGGAYTTTCAGCAGAGTC]AAGAAAACTGCTCCTTATTGTTCAGGCTCTTAGGGACAAACTCGAACCTGGGACTTTTGATCTTGGGGGGCTATATGAAGCAATTGAGGAGTGCCTGATTAATGATCCCTGGGTTTTGCTCAATGCGTCTTGGTTCAACTCCTTCCTGACACATGCACTAAAATAGTTATAGCAGTGCTACTATTTGTTATCCGTACTGTCCAAAAAAGTA

Work dir:
/work/b6/73a925fff13d66411ea7d7776773c1

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

Workflow Version

v3.3.2; revision: 91a5d05f86

Nextflow Executor

No response

Nextflow Version

23.04.1

Java Version

openjdk version "20-internal" 2023-03-21
OpenJDK Runtime Environment (build 20-internal-adhoc..src)
OpenJDK 64-Bit Server VM (build 20-internal-adhoc..src, mixed mode, sharing)

Hardware

HPC Cluster

Operating System (OS)

No response

Conda/Container Engine

None

Additional context

slurm-27454074_AfterClair3Change.txt
slurm-27454201_OriginalError.txt

[BUG]: no H/N subtypes assigned in subtyping reports with user-specified sequences

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

When user-specified sequences are the top matching sequences, the subtyping reports show N/A for the H and N subtypes, even though a subtype could easily be assigned from the majority subtype among high % identity, long alignment matches to sequences that do have subtype information.

There are also no total counts, proportions matching to certain subtypes, etc.
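One reading of the expected behaviour: keep only matches above identity and alignment-length thresholds, then take the most common H and N type among the matches that carry subtype information. Below is a minimal sketch with a hypothetical function name and illustrative thresholds (not nf-flu's defaults; pident is a percentage as in BLAST tabular output). It also returns the counts, so totals and proportions can be reported.

```python
import re
from collections import Counter


def majority_subtype(matches, min_pident=85.0, min_length=800):
    """Majority H and N types from (subtype, pident, alignment_length) matches.

    Matches below either threshold, or without subtype info (e.g. None or "nan"),
    are ignored. Returns (h_type, n_type, h_counts, n_counts); a type is None
    when no qualifying match carries it. Thresholds are illustrative assumptions.
    """
    h_counts, n_counts = Counter(), Counter()
    for subtype, pident, length in matches:
        if not isinstance(subtype, str) or pident < min_pident or length < min_length:
            continue
        h = re.search(r"H\d+", subtype)
        n = re.search(r"N\d+", subtype)
        if h:
            h_counts[h.group()] += 1
        if n:
            n_counts[n.group()] += 1
    h_type = h_counts.most_common(1)[0][0] if h_counts else None
    n_type = n_counts.most_common(1)[0][0] if n_counts else None
    return h_type, n_type, h_counts, n_counts


if __name__ == "__main__":
    example = [("H1N1", 99.2, 1701), ("H1N1", 98.7, 1701), ("H3N2", 97.5, 1701), ("nan", 99.9, 1701)]
    h, n, h_counts, _ = majority_subtype(example)
    total = sum(h_counts.values())
    # majority types plus the proportion of qualifying matches per H type
    print(h, n, {k: v / total for k, v in h_counts.items()})
```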

Nextflow command-line

nextflow run CFIA-NCFAD/nf-flu --input samplesheet.csv --platform nanopore -profile docker --ref_db DB.fasta -resume -r 3.3.2

Error Message

N/A

Workflow Version

3.3.2

Nextflow Executor

local

Nextflow Version

23.04.2

Java Version

No response

Hardware

desktop

Operating System (OS)

No response

Conda/Container Engine

Docker

Additional context

No response

[BUG]: Conda and Mamba profiles broken in 3.3.0

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

Conda/Mamba process envs not being created.

Nextflow command-line

nextflow run CFIA-NCFAD/nf-flu -r 3.3.0 -profile <conda/mamba> ...

Error Message

"IRMA" (and other process tools) command not found

Workflow Version

3.3.0

Nextflow Executor

local

Nextflow Version

22.04.5, build 5708 (15-07-2022 16:09 UTC)

Java Version

No response

Hardware

desktop

Operating System (OS)

No response

Conda/Container Engine

Conda

Additional context

No response

Read samplesheet.csv with fields as strings rather than inferred types

Related to #53

See #53 (comment)
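Reading every samplesheet field as a string avoids problems such as a sample name like 0001 losing its leading zeros, or a numeric-looking name such as 442878 being typed as an integer. A minimal sketch using Python's csv module, which never infers types; the column names in the example are assumptions. With pandas, the equivalent is read_csv(..., dtype=str); with Polars, read_csv(..., infer_schema_length=0) reads all columns as strings.

```python
import csv
import io


def read_samplesheet(handle):
    """Read a CSV samplesheet; csv.DictReader keeps every field as a str."""
    return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Hypothetical samplesheet; the column names are assumptions for illustration.
    sheet = io.StringIO("sample,reads\n0001,reads/0001.fastq.gz\n442878,reads/442878.fastq.gz\n")
    rows = read_samplesheet(sheet)
    print(rows[0]["sample"])  # 0001 (leading zeros kept)
```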

More robust handling and preprocessing of user input sequences

Users should be able to submit any sequences in FASTA format and the workflow should figure out if those sequences are valid input. The sequence names should NOT require any special formatting. Currently _Segment is required in the user provided sequence name:

nf-flu/bin/get_blastn_report.py

Line 83 in bdc8942

 df_blast_result["ref_name"] = df_blast_result["stitle"].str.extract('(.+?)_[sS]egment') 

But it is not necessary.

The full user sequence name should be preserved, not dropped:

nf-flu/bin/get_blastn_report.py

Line 86 in bdc8942

df_blast_result.drop(columns=["qaccver", "saccver", "stitle"], inplace=True)

The FASTA record description or comment should be preserved instead of stripped away/ignored:

nf-flu/bin/ref_fasta_check.py

Lines 27 to 29 in bdc8942

 seqid, sequence = record.id.strip(), record.seq 

 seq_record_id = re.sub(r"[()\"#/@;:<>{}`+=~|!?,]", "_", seqid) 

 outfile.write(f'>{seq_record_id}\n{sequence}\n')

The code should be:

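import re

def clean_seq_name(rec):
    seqid, seq = rec.id, rec.seq
    # assumes a Biopython SeqRecord: its description is the full FASTA header line, so drop the leading ID
    desc = rec.description[len(seqid):].strip() if rec.description.startswith(seqid) else rec.description.strip()
    # replace each run of characters other than word characters (letters, digits, _), periods or dashes with "_"
    new_seqid = re.sub(r'[^\w.\-]+', '_', seqid)
    # remove leading and trailing underscores
    new_seqid = re.sub(r'^_+|_+$', '', new_seqid)
    # preserve seq description and document changes in FULL seq name
    seq_name = f'{seqid}{" " + desc if desc else ""}'
    new_seq_name = f'{new_seqid}{" " + desc if desc else ""}'
    return seq_name, new_seq_name, seq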
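For the renaming table described in the list that follows (sequence index, old name, new name, with duplicates suffixed with -{seq index}), a minimal sketch could look like this. It assumes 1-based indices, reuses the same character substitution as the code above, checks for duplicates after sanitizing (so A/B and A_B also collide), and keeps the first occurrence unchanged; all of these are assumptions, and build_rename_table is a hypothetical name.

```python
import re
import warnings


def build_rename_table(names):
    """Return (index, old_name, new_name) rows for the given sequence names."""
    seen = set()
    table = []
    for idx, old in enumerate(names, start=1):
        # runs of characters other than word characters, "." or "-" become "_"
        new = re.sub(r"[^\w.\-]+", "_", old)
        new = re.sub(r"^_+|_+$", "", new)
        if new in seen:
            # duplicate after sanitizing: warn and append the 1-based sequence index
            warnings.warn(f"Duplicate sequence name {old!r} (index {idx}); renaming to {new}-{idx}")
            new = f"{new}-{idx}"
        seen.add(new)
        table.append((idx, old, new))
    return table


if __name__ == "__main__":
    for row in build_rename_table(["A/duck/Alberta/2023|4_HA", "seqX", "seqX"]):
        print(*row, sep="\t")
```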

A subworkflow should be created to validate user-specified sequences and ensure that they are valid input:

- Create a representative gene sequence DB: a non-redundant set of full-length gene segment sequences clustered with CD-HIT at a 95% identity threshold (or lower). Sequences with ambiguous bases should be excluded.
- User input sequences should all be uppercase ASCII and distinct; sequence names can be duplicated, but will be renamed.
- Run an all-against-all Edlib global alignment of user sequences against the representative gene segment sequences.
- Assign each user sequence to a gene segment, provided there are fewer than X differences between the user sequence and the representative sequence. If there is no match, fail immediately with an informative message telling the user that one or more of their sequences do not pass the threshold. This threshold would need to be determined with some testing.
- Format user sequence names in a pipeline-compatible way. The name changes should be documented in a table with 3 columns: sequence index, old name, new name. Duplicated sequence names should be handled with a warning and renamed with -{seq index} appended.
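The segment assignment step could be sketched as below. To keep the example dependency-free, it swaps in a plain-Python Levenshtein (global) edit distance for Edlib; with the edlib package, edlib.align(query, target, mode="NW")["editDistance"] should give the same global edit distance much faster. The function names, the list of (segment, sequence) pairs as input and the example max_diffs value are assumptions; the real threshold X still needs testing, as noted above.

```python
def edit_distance(a: str, b: str) -> int:
    """Global (Levenshtein) edit distance with unit costs, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def assign_segment(user_seq, rep_seqs, max_diffs):
    """Return (segment, distance) of the closest representative sequence.

    rep_seqs is a list of (segment, sequence) pairs. Returns None when even the
    closest representative has max_diffs or more differences ("fewer than X").
    """
    best = None
    for segment, rep in rep_seqs:
        dist = edit_distance(user_seq.upper(), rep.upper())
        if best is None or dist < best[1]:
            best = (segment, dist)
    if best is None or best[1] >= max_diffs:
        return None
    return best


if __name__ == "__main__":
    reps = [("4_HA", "ACGTAA"), ("6_NA", "TTTTTT")]  # toy sequences for illustration
    print(assign_segment("ACGTAC", reps, max_diffs=2))
```

A caller would collect every user sequence that returns None and fail with a message listing them, matching the fail-fast behaviour described in the list.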

Variant calling results should be used instead of BLAST to compare read mapping/variant calling consensus to reference sequences not necessary

Variant calling results are already produced and could simply be tabulated with bcftools stats to compute the number of SNPs, MNPs and indels, combined with depth information.
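A rough sketch of such a tabulation, assuming plain-text VCF records: each REF/ALT pair is classified by allele length only (equal length 1 is a SNP, equal length above 1 an MNP, different lengths an indel). This is a simplification; bcftools stats applies its own variant-type rules, which can differ, for example for alleles that share flanking bases. The function names are hypothetical and depth information is not included.

```python
from collections import Counter


def classify(ref: str, alt: str) -> str:
    """Simplified length-based classification of one REF/ALT pair."""
    if not alt.isalpha():  # symbolic alleles such as <DEL>, "*" or a missing "."
        return "other"
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "indel"


def count_variant_types(vcf_lines):
    """Count variant types across VCF data lines (header lines are skipped)."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):  # multi-allelic sites count once per ALT allele
            counts[classify(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    vcf = ["##fileformat=VCFv4.2", "seg\t10\t.\tA\tG", "seg\t20\t.\tAC\tGT", "seg\t40\t.\tTATT\tT"]
    print(count_variant_types(vcf))
```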

Edlib global alignment would be more appropriate for genome to genome comparison than BLAST local alignment, which can easily show inaccurate mismatch or gap results for a gene segment if there are coverage issues (e.g. no coverage in the middle of a gene segment leading to 2 BLAST alignments).

[BUG]: Unable to create subtyping report due to `NoDataError: empty CSV` in `parse_influenza_blast_results.py`

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

Unable to create subtyping report due to NoDataError: empty CSV in parse_influenza_blast_results.py. One of the BLAST result files from the BLAST search against user reference sequences was empty, causing Polars to raise an exception. Empty BLAST results should be handled more gracefully. The previous version, 3.1.6, did not produce this error.
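A minimal sketch of more graceful handling: check for zero-byte BLAST result files before parsing and skip them with a warning, so a sample with no hits against the user reference sequences does not abort the whole report. It uses the csv module rather than Polars; read_blast_results and the column list are hypothetical. With Polars, catching polars.exceptions.NoDataError around pl.read_csv would be an alternative.

```python
import csv
import sys
import tempfile
from pathlib import Path


def read_blast_results(paths, columns):
    """Read tab-separated BLAST results into dicts, skipping empty files with a warning."""
    rows = []
    for path in map(Path, paths):
        if path.stat().st_size == 0:
            print(f"WARNING: {path} is empty (no BLAST hits); skipping", file=sys.stderr)
            continue
        with path.open(newline="") as handle:
            for record in csv.reader(handle, delimiter="\t"):
                if record:  # ignore blank lines
                    rows.append(dict(zip(columns, record)))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        empty = Path(tmp, "SAMPLE-A.blastn.txt")
        empty.touch()
        hits = Path(tmp, "SAMPLE-B.blastn.txt")
        hits.write_text("SAMPLE-B_4\tref1\t99.5\n")  # toy record, hypothetical IDs
        print(read_blast_results([empty, hits], ["qaccver", "saccver", "pident"]))
```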

Nextflow command-line

nextflow run CFIA-NCFAD/nf-flu -r 3.2.0 --input samplesheet.csv --platform nanopore -profile docker --ref_db DB.fasta

Error Message

Error executing process > 'NF_FLU:NANOPORE:SUBTYPING_REPORT_BCF_CONSENSUS'

Caused by:
  Process `NF_FLU:NANOPORE:SUBTYPING_REPORT_BCF_CONSENSUS` terminated with an error exit status (1)

Command executed:

  parse_influenza_blast_results.py \
   --threads 1 \
   --flu-metadata genomeset.dat.gz \
   --top 3 \
   --excel-report iav-subtyping-report.xlsx \
   --pident-threshold 0.85 \
   SAMPLE-0239-2.blastn.txt SAMPLE-0239-1.blastn.txt SAMPLE-1370.blastn.txt SAMPLE-0238.blastn.txt SAMPLE-0233.blastn.txt SAMPLE-1096.blastn.txt SAMPLE-0052.blastn.txt
  ln -s .command.log parse_influenza_blast_results.log
  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:NANOPORE:SUBTYPING_REPORT_BCF_CONSENSUS":
     python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  │ │                                │   ('mismatch', UInt16),                 │ │
  │ │                                │   ('gapopen', UInt16),                  │ │
  │ │                                │   ('qstart', UInt16),                   │ │
  │ │                                │   ('qend', UInt16),                     │ │
  │ │                                │   ('sstart', UInt16),                   │ │
  │ │                                │   ('send', UInt16),                     │ │
  │ │                                │   ... +6                                │ │
  │ │                                ]                                         │ │
  │ │                       dtypes = {                                         │ │
  │ │                                │   'qaccver': ,             │ │
  │ │                                │   'saccver': ,             │ │
  │ │                                │   'pident': ,            │ │
  │ │                                │   'length': UInt16,                     │ │
  │ │                                │   'mismatch': UInt16,                   │ │
  │ │                                │   'gapopen': UInt16,                    │ │
  │ │                                │   'qstart': UInt16,                     │ │
  │ │                                │   'qend': UInt16,                       │ │
  │ │                                │   'sstart': UInt16,                     │ │
  │ │                                │   'send': UInt16,                       │ │
  │ │                                │   ... +6                                │ │
  │ │                                }                                         │ │
  │ │                     encoding = 'utf8'                                    │ │
  │ │                     eol_char = '\n'                                      │ │
  │ │                   has_header = False                                     │ │
  │ │                ignore_errors = False                                     │ │
  │ │          infer_schema_length = 100                                       │ │
  │ │                            k = 'stitle'                                  │ │
  │ │                   low_memory = False                                     │ │
  │ │ missing_utf8_is_empty_string = False                                     │ │
  │ │                       n_rows = None                                      │ │
  │ │                  null_values = None                                      │ │
  │ │        processed_null_values = None                                      │ │
  │ │                   quote_char = '"'                                       │ │
  │ │                      rechunk = True                                      │ │
  │ │               row_count_name = None                                      │ │
  │ │             row_count_offset = 0                                         │ │
  │ │                         self =                            │ │
  │ │                    separator = '\t'                                      │ │
  │ │                    skip_rows = 0                                         │ │
  │ │       skip_rows_after_header = 0                                         │ │
  │ │                       source = 'SAMPLE-0239-2.blastn.txt'    │ │
  │ │              try_parse_dates = False                                     │ │
  │ │                            v =                              │ │
  │ │            with_column_names = .with_column_names at    │ │
  │ │                                0x7f8596a0d1b0>                           │ │
  │ ╰──────────────────────────────────────────────────────────────────────────╯ │
  ╰──────────────────────────────────────────────────────────────────────────────╯
  NoDataError: empty CSV

Workflow Version

3.2.0

Nextflow Executor

local

Nextflow Version

22.04.5, build 5708 (15-07-2022 16:09 UTC)

Java Version

No response

Hardware

local

Operating System (OS)

No response

Conda/Container Engine

Docker

Additional context

No response

[BUG]: cat: can't open 'GEN23-RTPCR-0829-NTC-3-1b/amended_consensus/*.fa': No such file or directory

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

Hello,

I have come across an edge case related to the sequencing of negative control samples.

Occasionally, the read set for an NTC sample exceeds the minimum number of reads (default: 100) and is therefore processed by the pipeline. The issue is related to how IRMA handles samples with low numbers of reads.

The pipeline will crash when IRMA has an output as follows:

Loading config file 'irma_config.sh'
[2023-09-11 14:32:12] IRMA/FLU-minion started run 'GEN23-RTPCR-0829-NTC-3-1b'
[2023-09-11 14:32:12] IRMA/FLU-minion found 1306.5T free space, only needed ~6.0M
[2023-09-11 14:32:12] IRMA/FLU-minion pre-processed
[2023-09-11 14:32:12] IRMA/FLU-minion R1 started (253)
[2023-09-11 14:32:13] IRMA/FLU-minion R1 all-match with BLAT finished
[2023-09-11 14:32:13] IRMA/FLU-minion R1 consolidated & cleaned
[2023-09-11 14:32:13] IRMA/FLU-minion R1 aborted, no matches found
[2023-09-11 14:32:13] IRMA/FLU-minion converted back to fastq
[2023-09-11 14:32:13] IRMA/FLU-minion saved unmatched read patterns
[2023-09-11 14:32:13] IRMA/FLU-minion skipping final assembly, no reference files found
[2023-09-11 14:32:15] IRMA/FLU-minion moving project
[2023-09-11 14:32:15] IRMA/FLU-minion finished!

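Essentially, there are no influenza reads, so IRMA writes no consensus FASTA files. This affects lines 38-42 of the irma.nf file: the amended_consensus/ directory still exists, so the if check passes, but the *.fa glob matches no files and cat fails: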

IRMA $irma_module $reads $meta.id

  if [ -d "${meta.id}/amended_consensus/" ]; then
    cat ${meta.id}/amended_consensus/*.fa > ${meta.id}.irma.consensus.fasta
  fi

Resulting in the following error that crashes the pipeline:

cat: can't open 'GEN23-RTPCR-0829-NTC-3-1b/amended_consensus/*.fa': No such file or directory

However, when IRMA encounters a sample with (I'm assuming) at least one RP or read, the outputs generated are enough to allow the pipeline to proceed:

Loading config file 'irma_config.sh'
[2023-09-11 14:32:18] IRMA/FLU-minion started run 'GEN23-RTPCR-0829-NTC-3-1a'
[2023-09-11 14:32:18] IRMA/FLU-minion found 1306.5T free space, only needed ~5.7M
[2023-09-11 14:32:18] IRMA/FLU-minion pre-processed
[2023-09-11 14:32:18] IRMA/FLU-minion R1 started (164)
[2023-09-11 14:32:18] IRMA/FLU-minion R1 all-match with BLAT finished
[2023-09-11 14:32:19] IRMA/FLU-minion R1 consolidated & cleaned
[2023-09-11 14:32:19] IRMA/FLU-minion R1 sorted using BLAT
[2023-09-11 14:32:19] IRMA/FLU-minion R1 aborted, found fewer than 3 RPs or 3 reads for all templates
[2023-09-11 14:32:19] IRMA/FLU-minion moving project
[2023-09-11 14:32:19] IRMA/FLU-minion finished!

For now, to mitigate this problem, I added errorStrategy = 'ignore' to the modules_nanopore.config:

withName: 'IRMA' {
    // increased job time limit for IRMA process to accommodate large samples
    time = '24h'
    errorStrategy = 'ignore'
    publishDir = [
      [
        path: { "${params.outdir}/irma"},
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
        mode: params.publish_dir_mode
      ],
      [
        path: { "${params.outdir}/consensus/irma/" },
        pattern: "*.irma.consensus.fasta",
        mode: params.publish_dir_mode
      ]
    ]
  }

However, I'm not sure if this is the most robust way around this particular issue.
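A possibly more targeted alternative to errorStrategy = 'ignore' is to make the concatenation itself tolerate an empty amended_consensus/ directory, so only the missing-consensus case is handled and other IRMA failures still surface. In the module's bash script this could be a check that at least one *.fa file exists (for example with compgen -G) before running cat. The logic is sketched below in Python with a hypothetical concat_consensus function; whether downstream processes accept an empty consensus FASTA has not been checked here.

```python
import tempfile
from pathlib import Path


def concat_consensus(irma_dir, out_fasta):
    """Concatenate <irma_dir>/amended_consensus/*.fa into out_fasta.

    Returns the number of FASTA files found. When IRMA produced none (e.g. an NTC
    with no influenza reads, or no amended_consensus/ directory at all), an empty
    out_fasta is written instead of raising an error.
    """
    fasta_files = sorted(Path(irma_dir, "amended_consensus").glob("*.fa"))
    with open(out_fasta, "w") as out:
        for fa in fasta_files:
            out.write(fa.read_text())
    return len(fasta_files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ntc = Path(tmp, "NTC")
        (ntc / "amended_consensus").mkdir(parents=True)  # directory exists but is empty
        n = concat_consensus(ntc, Path(tmp, "NTC.irma.consensus.fasta"))
        print(f"{n} consensus FASTA file(s) found")
```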

Thank you very much for your support.

Nextflow command-line

nextflow nf-flu_v3.3.4/cfia-ncfad-nf-flu-3.3.4/workflow/main.nf --input IRVC20230831IHN-1.csv --platform nanopore --outdir IRVC20230831IHN-1_nf-flu_results --major_allele_fraction 0.25 -profile singularity,slurm -resume

Error Message

ERROR ~ Error executing process > 'NF_FLU:NANOPORE:IRMA (GEN23-RTPCR-0829-NTC-3-1b)'

Caused by:
  Process `NF_FLU:NANOPORE:IRMA (GEN23-RTPCR-0829-NTC-3-1b)` terminated with an error exit status (1)

Command executed:

  touch irma_config.sh
  echo 'SINGLE_LOCAL_PROC=16' >> irma_config.sh
  echo 'DOUBLE_LOCAL_PROC=8' >> irma_config.sh
  # default tmp in current working directory instead of defaulting to /tmp 
  # which may be restricted in size on HPC clusters
  echo 'ALLOW_TMP=1' >> irma_config.sh
  echo 'TMP=$PWD' >> irma_config.sh
  if [ true ]; then
    echo 'DEL_TYPE="NNN"' >> irma_config.sh
    echo 'ALIGN_PROG="BLAT"' >> irma_config.sh
  fi
  
  IRMA FLU-minion GEN23-RTPCR-0829-NTC-3-1b.merged.fastq.gz GEN23-RTPCR-0829-NTC-3-1b
  
  if [ -d "GEN23-RTPCR-0829-NTC-3-1b/amended_consensus/" ]; then
    cat GEN23-RTPCR-0829-NTC-3-1b/amended_consensus/*.fa > GEN23-RTPCR-0829-NTC-3-1b.irma.consensus.fasta
  fi
  ln -s .command.log GEN23-RTPCR-0829-NTC-3-1b.irma.log
  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:NANOPORE:IRMA":
     IRMA: $(IRMA | head -n1 | sed -E 's/^Iter.*IRMA\), v(\S+) .*/\1/')
  END_VERSIONS

Command exit status:
  1

Command output:
  Loading config file 'irma_config.sh'
  [2023-09-11 16:06:21]	IRMA/FLU-minion started run 'GEN23-RTPCR-0829-NTC-3-1b'
  [2023-09-11 16:06:21]	IRMA/FLU-minion found 1305.9T free space, only needed ~6.0M
  [2023-09-11 16:06:21]	IRMA/FLU-minion pre-processed
  [2023-09-11 16:06:21]	IRMA/FLU-minion R1 started (253)
  [2023-09-11 16:06:22]	IRMA/FLU-minion R1 all-match with BLAT finished
  [2023-09-11 16:06:23]	IRMA/FLU-minion R1 consolidated & cleaned
  [2023-09-11 16:06:23]	IRMA/FLU-minion R1 aborted, no matches found
  [2023-09-11 16:06:23]	IRMA/FLU-minion converted back to fastq
  [2023-09-11 16:06:23]	IRMA/FLU-minion saved unmatched read patterns
  [2023-09-11 16:06:23]	IRMA/FLU-minion skipping final assembly, no reference files found
  [2023-09-11 16:06:24]	IRMA/FLU-minion moving project
  [2023-09-11 16:06:24]	IRMA/FLU-minion finished!

Command error:
  cat: can't open 'GEN23-RTPCR-0829-NTC-3-1b/amended_consensus/*.fa': No such file or directory

Workflow Version

3.3.4, revision: bda4dc7

Nextflow Executor

No response

Nextflow Version

23.04.1

Java Version

openjdk version "17.0.3-internal" 2022-04-19
OpenJDK Runtime Environment (build 17.0.3-internal+0-adhoc..src)
OpenJDK 64-Bit Server VM (build 17.0.3-internal+0-adhoc..src, mixed mode, sharing)

Hardware

HPC Cluster

Operating System (OS)

Distributor ID: CentOS Description: CentOS Linux release 7.9.2009 (Core) Release: 7.9.2009 Codename: Core

Conda/Container Engine

Singularity

Additional context

slurm-27894603_mod.txt
nextflow_mod.log

[BUG]: ComputeError on SUBTYPING_REPORT

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

On running our own flu samples on the latest nf-flu release, we're getting the following error:

ComputeError: ValueError: Remapping keys for map_dict could not be converted to
  Utf8 without losing values in the conversion.

which seems to happen when querying the metadata file with polars.
I can't share the .gz files that we ran, but these same files ran successfully on previous nf-flu versions.
Your test samples run fine on the same versions.

Nextflow command-line

nextflow run main.nf -config ~/conf/credentials.config -profile docker --input ~/samplesheets/test1.csv --max_memory 9.GB --max_cpus 6

Error Message

Oct-26 13:01:28.571 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 9; name: NF_FLU:ILLUMINA:SUBTYPING_REPORT (1); status: COMPLETED; exit: 1; error: -; workDir: /nf-flu/work/95/e1a429083f7a0cca807a0a059be076]
Oct-26 13:01:28.571 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NF_FLU:ILLUMINA:SUBTYPING_REPORT (1); work-dir=/nf-flu/work/95/e1a429083f7a0cca807a0a059be076
  error [nextflow.exception.ProcessFailedException]: Process `NF_FLU:ILLUMINA:SUBTYPING_REPORT (1)` terminated with an error exit status (1)
Oct-26 13:01:28.579 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NF_FLU:ILLUMINA:SUBTYPING_REPORT (1)'

Caused by:
  Process `NF_FLU:ILLUMINA:SUBTYPING_REPORT (1)` terminated with an error exit status (1)

Command executed:

  parse_influenza_blast_results.py \
   --flu-metadata 2023-06-14-NCBI-Viruses-Orthomyxoviridae_utf8-influenza.csv \
   --top 3 \
   --excel-report nf-flu-subtyping-report.xlsx \
   --pident-threshold 0.85 \
   --samplesheet samplesheet.fixed.csv \
   442878.blastn.txt
  
  ln -s .command.log parse_influenza_blast_results.log
  
  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:ILLUMINA:SUBTYPING_REPORT":
     python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  │ │              │        ┆            ┆ .1       ┆          ┆   ┆ sapiens   │ │
  │ │              ┆ Missouri   ┆            ┆            │                    │ │
  │ │              │ …      ┆ …          ┆ …        ┆ …        ┆ … ┆ …         │ │
  │ │              ┆ …          ┆ …          ┆ …          │                    │ │
  │ │              │ 442878 ┆ 7          ┆ OQ462477 ┆ H1N1     ┆ … ┆ Homo      │ │
  │ │              ┆ USA:       ┆ 2022-12-05 ┆ 2023-02-26 │                    │ │
  │ │              │        ┆            ┆ .1       ┆          ┆   ┆ sapiens   │ │
  │ │              ┆ Maryland   ┆            ┆            │                    │ │
  │ │              │ 442878 ┆ 8          ┆ OX422577 ┆ nan      ┆ … ┆ Homo      │ │
  │ │              ┆ United     ┆ 2022-12-20 ┆ 2023-02-10 │                    │ │
  │ │              │        ┆            ┆ .1       ┆          ┆   ┆ sapiens   │ │
  │ │              ┆ Kingdom    ┆            ┆            │                    │ │
  │ │              │ 442878 ┆ 8          ┆ OX436875 ┆ nan      ┆ … ┆ Homo      │ │
  │ │              ┆ United     ┆ 2023-01-01 ┆ 2023-02-21 │                    │ │
  │ │              │        ┆            ┆ .1       ┆          ┆   ┆ sapiens   │ │
  │ │              ┆ Kingdom    ┆            ┆            │                    │ │
  │ │              │ 442878 ┆ 8          ┆ OX442452 ┆ nan      ┆ … ┆ Homo      │ │
  │ │              ┆ United     ┆ 2023-01-01 ┆ 2023-03-03 │                    │ │
  │ │              │        ┆            ┆ .1       ┆          ┆   ┆ sapiens   │ │
  │ │              ┆ Kingdom    ┆            ┆            │                    │ │
  │ │              └────────┴────────────┴──────────┴──────────┴───┴─────────… │ │
  │ ╰──────────────────────────────────────────────────────────────────────────╯ │
  │                                                                              │
  │ /usr/local/lib/python3.10/site-packages/polars/lazyframe/frame.py:1606 in    │
  │ collect                                                                      │
  │                                                                              │
  │   1603 │   │   │   common_subplan_elimination,                               │
  │   1604 │   │   │   streaming,                                                │
  │   1605 │   │   )                                                             │
  │ ❱ 1606 │   │   return wrap_df(ldf.collect())                                 │
  │   1607 │                                                                     │
  │   1608 │   def sink_parquet(                                                 │
  │   1609 │   │   self,                                                         │
  │                                                                              │
  │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
  │ │ common_subplan_elimination = False                                       │ │
  │ │                        ldf = <builtins.PyLazyFrame object at             │ │
  │ │                              0x7f12d0dd9530>                             │ │
  │ │            no_optimization = True                                        │ │
  │ │         predicate_pushdown = False                                       │ │
  │ │        projection_pushdown = False                                       │ │
  │ │                       self = <polars.LazyFrame object at 0x7F12D0D5ED10> │ │
  │ │        simplify_expression = True                                        │ │
  │ │             slice_pushdown = False                                       │ │
  │ │                  streaming = False                                       │ │
  │ │              type_coercion = True                                        │ │
  │ ╰──────────────────────────────────────────────────────────────────────────╯ │
  ╰──────────────────────────────────────────────────────────────────────────────╯
  ComputeError: ValueError: Remapping keys for map_dict could not be converted to 
  Utf8 without losing values in the conversion.

Workflow Version

v3.3.5 28183a8

Nextflow Executor

local

Nextflow Version

23.10.0

Java Version

openjdk version "18.0.2-ea" 2022-07-19
OpenJDK Runtime Environment (build 18.0.2-ea+9-Ubuntu-222.04)
OpenJDK 64-Bit Server VM (build 18.0.2-ea+9-Ubuntu-222.04, mixed mode, sharing)

Hardware

Desktop

Operating System (OS)

Ubuntu 22.04.2 LTS (GNU/Linux 5.15.90.1-microsoft-standard-WSL2 x86_64)

Conda/Container Engine

Docker

Additional context

No response

Add FASTQ preprocessing step to prevent IRMA from silently failing on paired-end Illumina reads when "1:N:..."/"2:N:..." are not present in the read headers

Problem description

IRMA assembly will silently fail when Illumina paired-end reads don't have "1:N:..."/"2:N:..." in their read headers. Reads straight off an Illumina sequencer will not have this problem, but reads extracted from NCBI SRA files with fasterq-dump, some modified/filtered reads and synthetic reads may have issues.

IRMA will log "Irregular header for fastQ read pairs" and output empty consensus sequences.

A process should be added to append "1:N:..."/"2:N:..." to forward/reverse reads if those reads do not have that text in their read headers. Basically these command-lines wrapped in a Nextflow process:

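# if paired-end Illumina reads, check header lines for 1:N... and 2:N...
# only every 4th line (starting at line 1) is a header; quality lines can also start with "@"
# grep -c exits 1 when it counts 0 matches, so "|| true" stops "bash -e" from aborting
fwd_reads_N_count=$(zcat ${reads[0]} | sed -n '1~4p' | grep -c "^@.* [12]:N:.*" || true)
rev_reads_N_count=$(zcat ${reads[1]} | sed -n '1~4p' | grep -c "^@.* [12]:N:.*" || true)
if [[ $fwd_reads_N_count == 0 && $rev_reads_N_count == 0 ]]; then
  zcat ${reads[0]} | sed -r '1~4 s/^(@.*)/\1 1:N:0./' | pigz -ck > ${meta.id}_R1.fixed.fastq.gz
  zcat ${reads[1]} | sed -r '1~4 s/^(@.*)/\1 2:N:0./' | pigz -ck > ${meta.id}_R2.fixed.fastq.gz
else
  # reads okay; symlink them unchanged (an empty else branch is a bash syntax error)
  ln -s ${reads[0]} ${meta.id}_R1.fixed.fastq.gz
  ln -s ${reads[1]} ${meta.id}_R2.fixed.fastq.gz
fi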
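The key detail in the header fix is that a FASTQ record is four lines and only the first line of each record is a header; a quality string may legitimately begin with "@". A minimal Python sketch of the same idea, useful for testing the logic on small inputs, is below. The function name `tag_headers` is hypothetical, and the appended suffix ` 1:N:0` / ` 2:N:0` is an assumption about what IRMA accepts, not something confirmed by IRMA's documentation.

```python
import re
from typing import Iterable, Iterator

# Same pattern the grep looks for: a space followed by "1:N:" or "2:N:"
PAIR_TAG = re.compile(r" [12]:N:")


def tag_headers(fastq_lines: Iterable[str], mate: int) -> Iterator[str]:
    """Append ' <mate>:N:0' to FASTQ header lines that lack a pair tag.

    Only every 4th line (index 0, 4, 8, ...) is treated as a header, so
    quality strings that happen to start with '@' are never modified.
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0 and not PAIR_TAG.search(line):
            line = line.rstrip("\n") + f" {mate}:N:0\n"
        yield line
```

For gzipped input, the generator can be fed from `gzip.open(path, "rt")` and written with `writelines` to a file opened via `gzip.open(out_path, "wt")`.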

[BUG]: IRMA/FLU-minion ERROR: needed ~38765.72M to execute, but only 26447.94M available on disk

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

Hello,

On rare occasions, we have sequencing runs comprising relatively few samples, resulting in large input .fastq files (>2GB compressed) that evidently cause IRMA to fail. I have attempted to modify the "base.config" file to increase the memory for "withLabel:process_high" (the label that governs the IRMA module) to >32GB to mitigate this issue.

e.g.,

However, the IRMA command executed doesn't appear to be influenced by changing parameters in the base.config:

e.g.,

Of course, it is possible to downsample the reads, though I would prefer not to. I'm not sure whether there are any other levers I can pull within the pipeline to overcome this issue. Any help would be appreciated.
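Note that the IRMA log line compares an estimated requirement against space *available on disk*, not memory, which may be why raising the memory for `process_high` has no visible effect. A minimal sketch for checking free space before a run, using only `shutil.disk_usage`, is below. Which directory matters is an assumption: it is whichever location IRMA writes its intermediate files to (possibly the task work dir or a temporary directory, depending on IRMA's configuration). Treating IRMA's "M" as MiB is also an assumption.

```python
import shutil


def free_mebibytes(path: str = ".") -> float:
    """Free space on the filesystem containing `path`, in MiB."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def has_enough_disk(needed_mb: float, path: str = ".") -> bool:
    """True if the filesystem holding `path` has at least `needed_mb` MiB free."""
    return free_mebibytes(path) >= needed_mb
```

For example, `has_enough_disk(38765.72, "/path/to/workdir")` would check the figure from the log above against a candidate scratch location.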

Nextflow command-line

sbatch -c 2 --mem=4GB -p OutbreakResponse --wrap="nextflow ${WORKFLOW_DIR} --input ${INPUT_SHEET} --platform ${PLATFORM} ${DATABASE} --outdir ${OUTDIR} -profile singularity,slurm -resume"

Note: platform = nanopore, and in this particular case, no user-defined database was used.

Error Message

Oops... Pipeline execution stopped with the following message: Loading config file 'irma_config.sh'
[2023-05-15 10:16:54]	IRMA/FLU-minion started run 'GEN23-0018-neat'
[2023-05-15 10:16:54]	IRMA/FLU-minion ERROR: needed ~38765.72M to execute, but only 26447.94M available on disk
[2023-05-15 10:16:54]	IRMA/FLU-minion ABORTED run: GEN23-0018-neat
[f7/e4e21a] NOTE: Process `NF_FLU:NANOPORE:IRMA (GEN23-0018-neat)` terminated with an error exit status (1) -- Execution is retried (1)
[ed/a49ef6] NOTE: Process `NF_FLU:NANOPORE:IRMA (GEN23-0018-neat)` terminated with an error exit status (1) -- Execution is retried (2)
Error executing process > 'NF_FLU:NANOPORE:IRMA (GEN23-0018-neat)'

Caused by:
  Process `NF_FLU:NANOPORE:IRMA (GEN23-0018-neat)` terminated with an error exit status (1)

Command executed:

  touch irma_config.sh
  echo 'SINGLE_LOCAL_PROC=16' >> irma_config.sh
  echo 'DOUBLE_LOCAL_PROC=8' >> irma_config.sh
  if [ true ]; then
    echo 'DEL_TYPE="NNN"' >> irma_config.sh
    echo 'ALIGN_PROG="BLAT"' >> irma_config.sh
  fi
  
  IRMA FLU-minion GEN23-0018-neat.merged.fastq.gz GEN23-0018-neat
  
  if [ -d "GEN23-0018-neat/amended_consensus/" ]; then
    cat GEN23-0018-neat/amended_consensus/*.fa > GEN23-0018-neat.irma.consensus.fasta
  fi
  ln -s .command.log GEN23-0018-neat.irma.log
  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:NANOPORE:IRMA":
     IRMA: $(IRMA | head -n1 | sed -E 's/^Iter.*IRMA\), v(\S+) .*/\1/')
  END_VERSIONS

Command exit status:
  1

Command output:
  Loading config file 'irma_config.sh'
  [2023-05-15 10:16:54]	IRMA/FLU-minion started run 'GEN23-0018-neat'
  [2023-05-15 10:16:54]	IRMA/FLU-minion ERROR: needed ~38765.72M to execute, but only 26447.94M available on disk
  [2023-05-15 10:16:54]	IRMA/FLU-minion ABORTED run: GEN23-0018-neat

Command wrapper:
  Loading config file 'irma_config.sh'
  [2023-05-15 10:16:54]	IRMA/FLU-minion started run 'GEN23-0018-neat'
  [2023-05-15 10:16:54]	IRMA/FLU-minion ERROR: needed ~38765.72M to execute, but only 26447.94M available on disk
  [2023-05-15 10:16:54]	IRMA/FLU-minion ABORTED run: GEN23-0018-neat

Work dir:
 /path/to/workdir/IRVC20230417IHN_analysis/20230417_samples_nf-flu_results/work/a5/67d3b6c885164cd99afcf1598e54ef

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Workflow Version

3.1.2; revision: 9473cbaed9

Nextflow Executor

slurm

Nextflow Version

22.10.1

Java Version

openjdk version "17.0.3-internal" 2022-04-19
OpenJDK Runtime Environment (build 17.0.3-internal+0-adhoc..src)
OpenJDK 64-Bit Server VM (build 17.0.3-internal+0-adhoc..src, mixed mode, sharing)

Hardware

HPC Cluster

Operating System (OS)

Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core

Conda/Container Engine

Singularity

Additional context

No response

[BUG]: NoDataError: empty CSV

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

Hello,

When performing analysis on a run today, I got an error at the subtyping report generation step near the end of the workflow. The error said "NoDataError: empty CSV". This was a sample that did not amplify well at all, so I assume its poor-quality data caused the problem during subtyping report generation. When I repeated the analysis with v3.1.6, the workflow ran to completion with no problem (I ran it exactly the same way except with -r 3.1.6).

Thanks,
Mat
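The traceback below shows polars raising `NoDataError: empty CSV` on `WIN-AH-2023-FAV-0239-2-OS.blastn.txt`, i.e. a BLAST output file with no hits. A minimal sketch of tolerating empty BLAST tabular files is below; it uses the standard library rather than polars and is not the nf-flu implementation. The column list is taken from the `-outfmt "6 ..."` string used by nf-flu's BLASTN command; the function names are hypothetical.

```python
import csv
import io
from pathlib import Path
from typing import Dict, Iterable, List

# Column order of nf-flu's `blastn -outfmt "6 ..."` command
BLAST_COLUMNS = [
    "qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "qlen", "slen", "qcovs", "stitle",
]


def parse_blast_table(text: str) -> List[Dict[str, str]]:
    """Parse BLAST tabular text; empty or whitespace-only input yields no rows."""
    if not text.strip():
        return []
    # QUOTE_NONE keeps any '"' characters in stitle as literal text
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(BLAST_COLUMNS, row)) for row in reader if row]


def read_blast_tables(paths: Iterable[str]) -> Dict[str, List[Dict[str, str]]]:
    """Map each file name to its parsed hits, keeping empty files as []."""
    return {Path(p).name: parse_blast_table(Path(p).read_text()) for p in paths}
```

A sample with an empty hit list could then be reported as having no subtyping result instead of stopping the whole report.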

Nextflow command-line

nextflow run CFIA-NCFAD/nf-flu -r 3.2.0 --input samplesheet_20230707_all.csv --platform nanopore -profile docker --ref_db /Zarls/users/Mat/2021-22_AIV-Outbreak_WGS-DB/230630_AIV-Outbreak-DB.fasta --outdir results_all_final

Error Message

Workflow execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'NF_FLU:NANOPORE:SUBTYPING_REPORT_BCF_CONSENSUS'

Caused by:
  Process `NF_FLU:NANOPORE:SUBTYPING_REPORT_BCF_CONSENSUS` terminated with an error exit status (1)

Command executed:

  parse_influenza_blast_results.py \
   --threads 1 \
   --flu-metadata genomeset.dat.gz \
   --top 3 \
   --excel-report iav-subtyping-report.xlsx \
   --pident-threshold 0.85 \
   WIN-AH-2023-FAV-0239-2-OS.blastn.txt WIN-AH-2023-FAV-0239-1-OS.blastn.txt WIN-AH-2022-FAV-1370-5-1ce4dpi-rpt.blastn.txt WIN-AH-2023-FAV-0238-OS.blastn.txt WIN-AH-2023-FAV-0233-1ce2dpi-rpt.blastn.txt WIN-AH-2022-FAV-1096-14-1ce4dpi.blastn.txt WIN-AH-2023-OTH-0052-6-OS.blastn.txt
  ln -s .command.log parse_influenza_blast_results.log
  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:NANOPORE:SUBTYPING_REPORT_BCF_CONSENSUS":
     python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  │ │                                │   ('mismatch', UInt16),                 │ │
  │ │                                │   ('gapopen', UInt16),                  │ │
  │ │                                │   ('qstart', UInt16),                   │ │
  │ │                                │   ('qend', UInt16),                     │ │
  │ │                                │   ('sstart', UInt16),                   │ │
  │ │                                │   ('send', UInt16),                     │ │
  │ │                                │   ... +6                                │ │
  │ │                                ]                                         │ │
  │ │                       dtypes = {                                         │ │
  │ │                                │   'qaccver': ,             │ │
  │ │                                │   'saccver': ,             │ │
  │ │                                │   'pident': ,            │ │
  │ │                                │   'length': UInt16,                     │ │
  │ │                                │   'mismatch': UInt16,                   │ │
  │ │                                │   'gapopen': UInt16,                    │ │
  │ │                                │   'qstart': UInt16,                     │ │
  │ │                                │   'qend': UInt16,                       │ │
  │ │                                │   'sstart': UInt16,                     │ │
  │ │                                │   'send': UInt16,                       │ │
  │ │                                │   ... +6                                │ │
  │ │                                }                                         │ │
  │ │                     encoding = 'utf8'                                    │ │
  │ │                     eol_char = '\n'                                      │ │
  │ │                   has_header = False                                     │ │
  │ │                ignore_errors = False                                     │ │
  │ │          infer_schema_length = 100                                       │ │
  │ │                            k = 'stitle'                                  │ │
  │ │                   low_memory = False                                     │ │
  │ │ missing_utf8_is_empty_string = False                                     │ │
  │ │                       n_rows = None                                      │ │
  │ │                  null_values = None                                      │ │
  │ │        processed_null_values = None                                      │ │
  │ │                   quote_char = '"'                                       │ │
  │ │                      rechunk = True                                      │ │
  │ │               row_count_name = None                                      │ │
  │ │             row_count_offset = 0                                         │ │
  │ │                         self =                            │ │
  │ │                    separator = '\t'                                      │ │
  │ │                    skip_rows = 0                                         │ │
  │ │       skip_rows_after_header = 0                                         │ │
  │ │                       source = 'WIN-AH-2023-FAV-0239-2-OS.blastn.txt'    │ │
  │ │              try_parse_dates = False                                     │ │
  │ │                            v =                              │ │
  │ │            with_column_names = .with_column_names at    │ │
  │ │                                0x7f8596a0d1b0>                           │ │
  │ ╰──────────────────────────────────────────────────────────────────────────╯ │
  ╰──────────────────────────────────────────────────────────────────────────────╯
  NoDataError: empty CSV

Work dir:
  /home/CSCScience.ca/mfisher/Desktop/Temp/2023-07-07-AIV-Diagnostic-Nanopore-Rapid96-Miso-Run-657/work/59/ef44f96032897f47586da09bdf1881

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

Workflow Version

v3.2.0

Nextflow Executor

local

Nextflow Version

22.04.5

Java Version

No response

Hardware

Desktop

Operating System (OS)

Ubuntu 18.04.6 LTS

Conda/Container Engine

Docker

Additional context

No response

[BUG]: Error executing process > 'NF_FLU:ILLUMINA:SUBTYPING_REPORT (1)'

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

Hi all,

First, thanks for all the work being done on this pipeline.

I'm having an issue running the pipeline at the SUBTYPING REPORT step where it throws an "Illegal instruction" error.

The same issue happens when using either Docker (24.0.6) or Podman (3.4.4); I haven't tried other container engines yet.

I suspect it may be related to a hardware compatibility issue, but I thought I'd post here to see if anyone has come across this as well. The server I am running on is an older Dell T7500 with a 6-core Intel Xeon X5650. (Note: This processor does not have AVX support, which I think may be causing the issue.)

This issue happens with any samples I've run so far, either FluA or FluB.

Thanks in advance!
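Exit status 132 is 128 + 4, which shells use to report a process killed by signal 4 (SIGILL, illegal instruction). That is consistent with the AVX suspicion: a compiled dependency (the tracebacks in these issues suggest the report script uses polars) may be built for newer instruction sets. A small diagnostic sketch is below; it only reads `/proc/cpuinfo` on Linux and decodes the exit status, and the flags it checks are an assumption about what matters. polars also publishes a `polars-lts-cpu` build aimed at older CPUs, which may be worth trying in the container.

```python
import signal
from pathlib import Path
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from /proc/cpuinfo text (first 'flags' line)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def signal_from_exit_status(status: int) -> str:
    """Shells report a process killed by signal N as exit status 128 + N."""
    if status > 128:
        return signal.Signals(status - 128).name
    return ""


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = cpu_flags(cpuinfo.read_text())
        for flag in ("avx", "avx2", "fma"):
            print(f"{flag}: {'present' if flag in flags else 'MISSING'}")
    print("exit status 132 ->", signal_from_exit_status(132))
```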

Nextflow command-line

nextflow run CFIA-NCFAD/nf-flu --input test_samplesheet_ab.csv --platform illumina --outdir testruns/test_a -profile podman

Error Message

ERROR ~ Error executing process > 'NF_FLU:ILLUMINA:SUBTYPING_REPORT (1)'

Caused by:
  Process `NF_FLU:ILLUMINA:SUBTYPING_REPORT (1)` terminated with an error exit status (132)

Command executed:

  parse_influenza_blast_results.py \
   --flu-metadata 41415333-influenza.csv \
   --top 3 \
   --excel-report nf-flu-subtyping-report.xlsx \
   --pident-threshold 0.85 \
   --samplesheet samplesheet.fixed.csv \
   FluB-pB-040523-MM00001U-Qc.blastn.txt

  ln -s .command.log parse_influenza_blast_results.log

  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:ILLUMINA:SUBTYPING_REPORT":
     python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  132

Command output:
  (empty)

Command error:
  .command.sh: line 8:    23 Illegal instruction     (core dumped) parse_influenza_blast_results.py --flu-metadata 41415333-influenza.csv --top 3 --excel-report nf-flu-subtyping-report.xlsx --pident-threshold 0.85 --samplesheet samplesheet.fixed.csv FluB-pB-040523-MM00001U-Qc.blastn.txt

Work dir:
  /home/vitalite-dev/nf-flu/work/ea/b95127acda6a0a5e626cc44dc3b0a2

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

Workflow Version

Workflow 3.3.4, revision: bda4dc7

Nextflow Executor

local

Nextflow Version

23.04.3

Java Version

openjdk version "11.0.20.1" 2023-08-24
OpenJDK Runtime Environment (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)

Hardware

Dell T7500

Operating System (OS)

Ubuntu 22.04

Conda/Container Engine

Podman

Additional context

nextflow.log

Update Influenza sequence DB

Problem description

Data at https://ftp.ncbi.nih.gov/genomes/INFLUENZA/ is out of date. New sequences since 2020-10-13 are available elsewhere on NCBI, e.g. https://ftp.ncbi.nlm.nih.gov/genomes/Viruses/AllNuclMetadata/

nf-flu should be using an updated DB of IAV and IBV seqs from NCBI.

Compatibility with cloud storage paths

Problem description

Hi!

I noticed that check_sample_sheet.py does not support cloud storage paths for the samples (such as az:// for Azure or s3:// for AWS buckets). We simply changed line 30:
if p.startswith("http") or p.startswith("ftp"):

to:
if p.startswith("http") or p.startswith("ftp") or p.startswith("az://") or p.startswith("s3://") or p.startswith("gs://"):

and it seems to access the samples successfully.
Thank you!
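An alternative to chaining `startswith` calls is to inspect the URI scheme with `urllib.parse.urlparse`, which also avoids treating a local file whose name merely starts with "http" as remote. The sketch below is hypothetical and is not the contents of check_sample_sheet.py; the scheme set and function names are assumptions.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical set of schemes treated as remote; extend as needed
REMOTE_SCHEMES = {"http", "https", "ftp", "s3", "az", "gs"}


def is_remote(path: str) -> bool:
    """True if `path` is a URI with one of the remote schemes above."""
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES


def check_reads_path(path: str) -> bool:
    """Local paths must exist; remote URIs are left for Nextflow to stage."""
    return is_remote(path) or Path(path).exists()
```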

[Question]: Support for Clair3 model r1041_e82_400bps_sup_g615 using R10.4.1 flowcell

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

Hi @peterk87,

We would like to ask whether the r1041_e82_400bps_sup_g615 model is compatible with the --clair3_variant_model option. Thank you very much!

Best regards,
Eddie

Nextflow command-line

Error Message

Workflow Version

v3.1.3

Nextflow Executor

No response

Nextflow Version

21.10.4.5656

Java Version

No response

Hardware

Desktop

Operating System (OS)

No response

Conda/Container Engine

Docker

Additional context

No response

Optional dehosting

Problem description

Implement dehosting of reads with Kraken2 prior to analysis, using a few common host indexes that could be downloaded from a fast source, e.g. human (T2T), chicken, and pig. Maybe also an index containing all three?

Related to #49
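One way to dehost with a host-only Kraken2 index is to treat every classified read as host and keep only unclassified reads. Kraken2's `--unclassified-out` option writes those reads directly; the sketch below shows the same filtering in Python from Kraken2's default per-read output (column 1 is "C"/"U", column 2 is the read ID). It assumes a host-only database and is illustrative, not a proposed nf-flu implementation; the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Set


def unclassified_read_ids(kraken_output: Iterable[str]) -> Set[str]:
    """Read IDs that Kraken2 left unclassified ("U" in column 1).

    Assumes a host-only database, so any classified read is host.
    """
    ids = set()
    for line in kraken_output:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "U":
            ids.add(fields[1])
    return ids


def filter_fastq(fastq_lines: Iterable[str], keep_ids: Set[str]) -> Iterator[str]:
    """Yield only the 4-line FASTQ records whose read ID is in keep_ids."""
    record: List[str] = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            # Read ID is the header text after '@' up to the first whitespace
            read_id = record[0][1:].split()[0]
            if read_id in keep_ids:
                yield from record
            record = []
```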

Clair3 not variant calling ends of each segment

Clair3 does not call variants in the first and last 16 bp of each reference sequence (HKU-BAL/Clair3#257).

Supplement Clair3 calls with bcftools mpileup/call?
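The supplementing idea can be sketched as: keep all Clair3 calls, and add bcftools calls only where they fall within 16 bp of either end of a segment. The sketch below works on (contig, 1-based position) pairs only and is hypothetical; a real merge would need to compare REF/ALT alleles and would more likely operate on full VCF records with bcftools itself.

```python
from typing import Dict, Iterable, List, Tuple

Variant = Tuple[str, int]  # (contig, 1-based position)


def in_segment_end(contig_len: int, pos: int, margin: int = 16) -> bool:
    """True if `pos` lies in the first or last `margin` bases of the contig."""
    return pos <= margin or pos > contig_len - margin


def merge_calls(
    clair3: Iterable[Variant],
    bcftools: Iterable[Variant],
    contig_lengths: Dict[str, int],
    margin: int = 16,
) -> List[Variant]:
    """All Clair3 calls plus bcftools calls from the segment ends Clair3 skips."""
    merged = set(clair3)
    for contig, pos in bcftools:
        if in_segment_end(contig_lengths[contig], pos, margin):
            merged.add((contig, pos))
    return sorted(merged)
```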

Emit concatenated, re-named .fastq files into subdirectory within the nf-flu results directory

Hello,

I was wondering about the feasibility of emitting the concatenated, re-named .fastq files into a subdirectory (e.g., fastq_files) within the nf-flu results directory. This would facilitate downstream processes such as uploading to various repositories (e.g., IRIDA).

Thank you for your consideration.

[BUG]: Illumina workflow failure on SRA sample ERR3338653

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

1. Cloned this GitHub repo
2. Downloaded the influenza database and metadata from NCBI as instructed in the repo README
3. Downloaded SRA sample ERR3338653
4. Checked the number of reads in the downloaded FASTQs (125,675 reads in each direction)
5. Ran the pipeline using the command below
6. Observed that the pipeline failed in the BLAST_BLASTN process

Nextflow command-line

nextflow run main.nf \
         -profile conda \
         --use_mamba \
         --ncbi_influenza_fasta ${HOME}/code/nf-flu/test_input/db/influenza.fna.gz \
         --ncbi_influenza_metadata ${HOME}/code/nf-flu/test_input/db/genomeset.dat.gz \
         --input ${HOME}/code/nf-flu/test_input/samplesheet_illumina.csv \
         --platform illumina \
         --outdir ${HOME}/code/nf-flu/test_output/illumina \
         -with-trace ${HOME}/code/nf-flu/trace_illumina.tsv \
         -with-report ${HOME}/code/nf-flu/report_illumina.html \
         -work-dir ${HOME}/scratch/work-nf-flu

Error Message

executor >  slurm (5)
[aa/85af98] process > NF_FLU:ILLUMINA:GUNZIP_NCBI_FLU_FASTA (influenza.fna.gz) [100%] 1 of 1 ✔
[99/85ad49] process > NF_FLU:ILLUMINA:BLAST_MAKEBLASTDB (influenza.fna)        [100%] 1 of 1 ✔
[88/56f970] process > NF_FLU:ILLUMINA:CHECK_SAMPLE_SHEET (1)                   [100%] 1 of 1 ✔
[-        ] process > NF_FLU:ILLUMINA:CAT_FASTQ                                -
[09/c5a1d9] process > NF_FLU:ILLUMINA:IRMA (ERR3338653)                        [100%] 1 of 1 ✔
[94/039244] process > NF_FLU:ILLUMINA:BLAST_BLASTN (ERR3338653)                [100%] 1 of 1, failed: 1 ✘
[-        ] process > NF_FLU:ILLUMINA:SUBTYPING_REPORT                         -
Oops... Pipeline execution stopped with the following message: BLAST engine error: Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data
Error executing process > 'NF_FLU:ILLUMINA:BLAST_BLASTN (ERR3338653)'

Caused by:
  Process `NF_FLU:ILLUMINA:BLAST_BLASTN (ERR3338653)` terminated with an error exit status (3)

Command executed:

  DB=`find -L ./ -name "*.ndb" | sed 's/.ndb//'`
  blastn \
      -num_threads 4 \
      -db $DB \
      -query ERR3338653.irma.consensus.fasta \
      -outfmt "6 qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qcovs stitle" -num_alignments 1000000 -evalue 1e-6 \
      -out ERR3338653.blastn.txt
  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:ILLUMINA:BLAST_BLASTN":
      blast: $(blastn -version 2>&1 | sed 's/^.*blastn: //; s/ .*$//')
  END_VERSIONS

Command exit status:
  3

Command output:
  (empty)

Command error:
  BLAST engine error: Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data

Work dir:
  /home/dfornika/scratch/work-nf-flu/94/0392446a487ae16a1cda1728cc211b

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Workflow Version

commit f0eb199

Nextflow Executor

slurm

Nextflow Version

21.10.4

Java Version

openjdk version "11.0.18" 2023-01-17 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.18.0.10-2.el8_7) (build 11.0.18+10-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.18.0.10-2.el8_7) (build 11.0.18+10-LTS, mixed mode, sharing)

Hardware

HPC cluster

Operating System (OS)

RHEL 8

Conda/Container Engine

Conda

Additional context

The contents of the irma/ERR3338653.irma.consensus.fasta output file is:

>ERR3338653_1

>ERR3338653_2

>ERR3338653_3

>ERR3338653_4

>ERR3338653_5

>ERR3338653_6

>ERR3338653_7

>ERR3338653_8
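All eight records in this consensus FASTA have headers but no sequence, which matches the eight "Sequence contains no data" warnings from BLAST. A minimal sketch of a pre-BLAST guard is below: it parses the FASTA and keeps only records with sequence, so a sample with none could skip BLAST and be reported as having no consensus. The function names and that handling are hypothetical, not existing nf-flu behaviour.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (header, sequence) pairs; empty sequences allowed."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    seq_parts: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_parts)))
            header, seq_parts = line[1:], []
        elif line and header is not None:
            seq_parts.append(line)
    if header is not None:
        records.append((header, "".join(seq_parts)))
    return records


def non_empty_records(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Only the records that actually contain sequence."""
    return [(h, s) for h, s in parse_fasta(lines) if s]
```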

[BUG]: Error executing process > 'NF_FLU:ILLUMINA:SUBTYPING_REPORT'

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

Hello,

First of all, thank you very much for your continued efforts to enhance the nf-flu workflow. It is truly appreciated.

I have been testing each release against Illumina and Nanopore datasets for both influenza A and B.

I encountered an error during the subtyping report generation step (logs attached) when running nf-flu on an Illumina-based influenza B dataset. I haven't been able to reproduce this error running release 3.3.0 on any other dataset. I'm inclined to think it is related to parsing the specific metadata associated with the BLAST results for this sample.

I'll be sure to update this thread with any new information.

Thank you in advance.

EDIT: It definitely appears to be sample-specific. Trying to isolate offending samples for further investigation.
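The traceback in this report shows `type_to_count = []` when the `IndexError` is raised, alongside `is_iav = True` and genotype counts of B/Victoria/Yamagata, which hints that an influenza B sample may have been handled as IAV for the N-type step. A minimal defensive sketch is below: choosing the most frequent H/N type returns `None` instead of indexing into an empty list. The function name and the `(type, count)` pair layout are assumptions, not the actual structure in parse_influenza_blast_results.py.

```python
from typing import List, Optional, Tuple


def top_type(type_to_count: List[Tuple[str, int]]) -> Optional[str]:
    """Most frequent H/N type, or None when no BLAST hit carried one."""
    if not type_to_count:
        return None
    # max() keeps the first entry among ties
    return max(type_to_count, key=lambda tc: tc[1])[0]
```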

Nextflow command-line

nextflow nf-flu_v3.3.0/cfia-ncfad-nf-flu-3.3.0/workflow/main.nf --input IRVC20230711_SK_Illumina_fluB_validation_setup.csv --platform illumina --outdir IRVC20230711_SK_Illumina_fluB_validation_setup_nf-flu_results -profile singularity,slurm

Error Message

[cd/928ea9] process > NF_FLU:ILLUMINA:SUBTYPING_R... [ 50%] 1 of 2, failed: 1...
[-        ] process > NF_FLU:ILLUMINA:SOFTWARE_VE... -
[d2/0af9b2] NOTE: Process `NF_FLU:ILLUMINA:SUBTYPING_REPORT` terminated with an error exit status (1) -- Execution is retried (1)
ERROR ~ Error executing process > 'NF_FLU:ILLUMINA:SUBTYPING_REPORT'

Caused by:
  Process `NF_FLU:ILLUMINA:SUBTYPING_REPORT` terminated with an error exit status (1)

Command executed:

  parse_influenza_blast_results.py \
   --flu-metadata 41415333-influenza.csv \
   --top 3 \
   --excel-report nf-flu-subtyping-report.xlsx \
   --pident-threshold 0.85 \
   sample36.blastn.txt sample46.blastn.txt sample38.blastn.txt sample28.blastn.txt sample30.blastn.txt sample44.blastn.txt sample26.blastn.txt sample6.blastn.txt sample32.blastn.txt sample40.blastn.txt sample14.blastn.txt sample10.blastn.txt sample22.blastn.txt sample18.blastn.txt sample24.blastn.txt sample42.blastn.txt sample16.blastn.txt sample34.blastn.txt sample8.blastn.txt sample20.blastn.txt sample4.blastn.txt sample2.blastn.txt sample12.blastn.txt
  ln -s .command.log parse_influenza_blast_results.log
  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:ILLUMINA:SUBTYPING_REPORT":
     python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  │ │                   ┆ null       ┆ null       ┆ Sequence   │               │ │
  │ │                   │           ┆            ┆        ┆        ┆   ┆       │ │
  │ │                   ┆            ┆            ┆ 7703 from  │               │ │
  │ │                   │           ┆            ┆        ┆        ┆   ┆       │ │
  │ │                   ┆            ┆            ┆ Patent     │               │ │
  │ │                   │           ┆            ┆        ┆        ┆   ┆       │ │
  │ │                   ┆            ┆            ┆ WO2007…    │               │ │
  │ │                   │ sample6_6 ┆ GN357980.1 ┆ 94.737 ┆ 57     ┆ … ┆ null  │ │
  │ │                   ┆ null       ┆ null       ┆ Sequence   │               │ │
  │ │                   │           ┆            ┆        ┆        ┆   ┆       │ │
  │ │                   ┆            ┆            ┆ 7744 from  │               │ │
  │ │                   │           ┆            ┆        ┆        ┆   ┆       │ │
  │ │                   ┆            ┆            ┆ Patent     │               │ │
  │ │                   │           ┆            ┆        ┆        ┆   ┆       │ │
  │ │                   ┆            ┆            ┆ WO2007…    │               │ │
  │ │                   │ sample6_6 ┆ GN357979.1 ┆ 93.333 ┆ 60     ┆ … ┆ null  │ │
  │ │                   ┆ null       ┆ null       ┆ Sequence   │               │ │
  │ │                   │           ┆            ┆        ┆        ┆   ┆       │ │
  │ │                   ┆            ┆            ┆ 7743 from  │               │ │
  │ │                   │           ┆            ┆        ┆        ┆   ┆       │ │
  │ │                   ┆            ┆            ┆ Patent     │               │ │
  │ │                   │           ┆            ┆        ┆        ┆   ┆       │ │
  │ │                   ┆            ┆            ┆ WO2007…    │               │ │
  │ │                   └───────────┴────────────┴────────┴────────┴───┴─────… │ │
  │ │  df_type_counts = shape: (0, 3)                                          │ │
  │ │                   ┌──────────┬────────┬────────┐                         │ │
  │ │                   │ Genotype ┆ counts ┆ N_type │                         │ │
  │ │                   │ ---      ┆ ---    ┆ ---    │                         │ │
  │ │                   │ str      ┆ u32    ┆ str    │                         │ │
  │ │                   ╞══════════╪════════╪════════╡                         │ │
  │ │                   └──────────┴────────┴────────┘                         │ │
  │ │          h_or_n = 'N'                                                    │ │
  │ │          is_iav = True                                                   │ │
  │ │ reg_h_or_n_type = '[Nn]'                                                 │ │
  │ │             seg = '6'                                                    │ │
  │ │     type_counts = shape: (3, 2)                                          │ │
  │ │                   ┌──────────┬────────┐                                  │ │
  │ │                   │ Genotype ┆ counts │                                  │ │
  │ │                   │ ---      ┆ ---    │                                  │ │
  │ │                   │ str      ┆ u32    │                                  │ │
  │ │                   ╞══════════╪════════╡                                  │ │
  │ │                   │ B        ┆ 86     │                                  │ │
  │ │                   │ Victoria ┆ 64     │                                  │ │
  │ │                   │ Yamagata ┆ 33     │                                  │ │
  │ │                   └──────────┴────────┘                                  │ │
  │ │       type_name = 'N_type'                                               │ │
  │ │   type_to_count = []                                                     │ │
  │ ╰──────────────────────────────────────────────────────────────────────────╯ │
  ╰──────────────────────────────────────────────────────────────────────────────╯
  IndexError: list index out of range

Work dir:
  validation_results/nf-flu_v3.3.0/Illumina_FluB/IRVC20230711_SK_Illumina_fluB_validation_setup_nf-flu_results/work/cd/928ea9d2849072e0835391fa054a0c

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
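The locals in the traceback may explain the IndexError. They show is_iav = True, seg = '6', h_or_n = 'N' and reg_h_or_n_type = '[Nn]', but type_counts only contains the IBV genotypes B, Victoria and Yamagata. Filtering those on '[Nn]' matches nothing, so type_to_count = []. If the code then takes the first element of that list, it raises this error. The sketch below is a minimal illustration of guarding that empty case. It assumes type_counts can be represented as a plain list of (genotype, count) tuples in place of the polars table shown above. The function names count_for_subtype and top_subtype are hypothetical and are not nf-flu's actual code.

```python
import re


def count_for_subtype(type_counts, pattern):
    """Return (genotype, count) pairs whose genotype matches `pattern`.

    `type_counts` is a list of (genotype, count) tuples standing in for the
    polars `type_counts` table in the traceback (hypothetical representation).
    """
    regex = re.compile(pattern)
    return [(genotype, count) for genotype, count in type_counts if regex.search(genotype)]


def top_subtype(type_counts, pattern):
    """Return the most frequent matching genotype, or None if nothing matches.

    Checking for an empty match list avoids the IndexError that indexing
    `matches[0]` would raise when, as here, only IBV genotypes are present.
    """
    matches = count_for_subtype(type_counts, pattern)
    if not matches:
        return None
    return max(matches, key=lambda pair: pair[1])[0]
```

With the counts from the traceback, top_subtype([("B", 86), ("Victoria", 64), ("Yamagata", 33)], "[Nn]") returns None rather than raising. The caller can then report the N subtype as undetermined. The apparent mismatch between is_iav = True and an IBV-only type_counts table may also be worth checking.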

Workflow Version

v3.3.0; revision: 91a5d05f86

Nextflow Executor

Slurm

Nextflow Version

23.04.1

Java Version

openjdk version "20-internal" 2023-03-21
OpenJDK Runtime Environment (build 20-internal-adhoc..src)
OpenJDK 64-Bit Server VM (build 20-internal-adhoc..src, mixed mode, sharing)

Hardware

HPC Cluster

Operating System (OS)

No response

Conda/Container Engine

Singularity

Additional context

slurm-26930422_mod.log
command.log
nextflow_mod.log

[BUG]: Mixed IAV and IBV run analysis reports all samples as IBV only

Is there an existing issue for this?

I have searched the existing issues

Description of the Bug/Issue

The pipeline is unable to resolve a mixed fluA and fluB run. In my analysis, most of the samples in the run were confirmed fluB and a handful were confirmed fluA. The subtyping report returned all samples as fluB.

Nextflow command-line

nextflow run CFIA-NCFAD/nf-flu -profile singularity,slurm --platform nanopore --input samplesheet_test.csv -r 3.3.2

Error Message

The subtyping report file gives the subtype of every sample as fluB, even when fluA samples are present.

Workflow Version

v3.3.2 and v3.3.3

Nextflow Executor

slurm

Nextflow Version

22.04.3

Java Version

No response

Hardware

cluster

Operating System (OS)

No response

Conda/Container Engine

Singularity

Additional context

No response

Error running newly installed pipeline

Research

I have searched the documentation and previously asked questions.

Question about nf-flu

Hi there,

I just installed the updated version of this pipeline and tried re-analyzing a previously successful batch. I got the following error:

Oops... Pipeline execution stopped with the following message: <run_path>/work/c1/6c9d454ee21fddbc49551f39ee88d3/.command.sh: line 14: IRMA: command not found
ERROR ~ Error executing process > 'NF_FLU:NANOPORE:IRMA (sample1)'

Caused by:
  Process `NF_FLU:NANOPORE:IRMA (sample1)` terminated with an error exit status (127)

Command executed:

  touch irma_config.sh
  echo 'SINGLE_LOCAL_PROC=8' >> irma_config.sh
  echo 'DOUBLE_LOCAL_PROC=4' >> irma_config.sh
  # default tmp in current working directory instead of defaulting to /tmp 
  # which may be restricted in size on HPC clusters
  echo 'ALLOW_TMP=1' >> irma_config.sh
  echo 'TMP=$PWD' >> irma_config.sh
  if [ true ]; then
    echo 'DEL_TYPE="NNN"' >> irma_config.sh
    echo 'ALIGN_PROG="BLAT"' >> irma_config.sh
  fi
  
  IRMA FLU-minion sample1.merged.fastq.gz sample1
  
  if [ -d "sample1/amended_consensus/" ]; then
    cat sample1/amended_consensus/*.fa > sample1.irma.consensus.fasta
  fi
  ln -s .command.log sample1.irma.log
  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:NANOPORE:IRMA":
     IRMA: $(IRMA | head -n1 | sed -E 's/^Iter.*IRMA\), v(\S+) .*/\1/')
  END_VERSIONS

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 14: IRMA: command not found

Work dir:
 <run_path>/work/c1/6c9d454ee21fddbc49551f39ee88d3

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

I'd appreciate any help you can provide! Thank you.
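Exit status 127 is the POSIX shell's code for "command not found". It means the shell running .command.sh could not find an IRMA executable on its PATH. One possible cause is that the task ran outside the environment that provides IRMA, for example without a container or conda profile. This is not confirmed in the thread, because the full nextflow run command line is not shown. The sketch below is one way to check this from the environment the task actually runs in. It is a minimal sketch using only the Python standard library. The helper names find_executable and shell_exit_status are hypothetical.

```python
import shutil
import subprocess


def find_executable(name):
    """Return the absolute path to `name` if it is on PATH, else None."""
    return shutil.which(name)


def shell_exit_status(command):
    """Run `command` through /bin/sh and return the shell's exit status.

    A POSIX shell returns 127 when it cannot find the command, which is the
    status reported for the IRMA process above.
    """
    result = subprocess.run(command, shell=True, capture_output=True)
    return result.returncode
```

If find_executable("IRMA") returns None in that environment, the 127 exit status is expected. The fix is then to make IRMA available to the process, not to change the IRMA command itself.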

	seqid, sequence = record.id.strip(), record.seq
	seq_record_id = re.sub(r"[()\"#/@;:<>{}`+=~\|!?,]", "_", seqid)
	outfile.write(f'>{seq_record_id}\n{sequence}\n')