
Comments (8)

daaaaande commented on June 1, 2024

The actc downgrade seems to fix the issue. A second SMRT Cell also ran without hiccups.


danielecook commented on June 1, 2024

@daaaaande I don't think the pysam version is an issue.

The error is happening here:

def __next__(self) -> List[pysam.libcalignedsegment.AlignedSegment]:
  if not self.keep_iter:
    raise StopIteration
  while self.keep_iter:
    try:
      read = next(self.bam_reader)
      if read.is_unmapped:
        continue
    except StopIteration:
      self.keep_iter = False
      break

The error suggests an issue with the BAM file itself, but I wonder if it could be a memory issue instead (see the sketch after the list below).

  1. Try to reduce --batch_zmws from 1000 to a smaller number in this line. We typically use 100.
  2. Do any of the chunks succeed? Can you examine the file sizes of the chunks that succeed vs. the ones that fail?
  3. Try to run with a larger number of chunks (e.g. 40) and see if the problem persists.
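
To tell the two apart, a minimal pysam-only scan of the aligned subreads BAM (so DeepConsensus and the model are out of the picture) should either finish cleanly or hit the same OSError. This is just a sketch; the path is a placeholder you would swap for one of your failing chunks:

# Sketch: decode every record of a subreads_to_ccs BAM with pysam alone.
# If this also dies with "OSError: error -3 while reading file", the BAM itself
# is the problem rather than DeepConsensus' memory use.
import pysam

bam_path = "chunk_ccs_lq_mapped_to_subreads.bam"  # placeholder: use a failing chunk

n_reads = 0
with pysam.AlignmentFile(bam_path, "rb", check_sq=False) as bam:
    try:
        for read in bam:
            n_reads += 1
    except OSError as err:
        print(f"decode failed after {n_reads} records: {err}")
    else:
        print(f"decoded all {n_reads} records without error")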


daaaaande commented on June 1, 2024

Thanks for the swift reply! :)
To address your suggestions:

  1. Made the parameter configurable with a default of 100. Currently testing this and will report back if it changes anything!
  2. Most chunks do succeed; so far I see no difference in file sizes between the chunks.
  3. I already tried chunking into 100 parts; sadly that resulted in the same error, just on more chunks.
    The resources given to the job are not the problem as far as I can tell, but just to make sure I increased the time and memory.
    Again, thanks for the feedback, and I will report back!


daaaaande commented on June 1, 2024

Update:

  • Tried 35 chunks with --batch_zmws 100: all jobs succeed except one
  • The one failing job has a ~10% bigger input file than the others
  • Resources like time and memory still do not look like the problem

I'm now trying even smaller chunks (100) and a ZMW batch size of 50.


daaaaande commented on June 1, 2024

Update II:

  • Even smaller chunks (100) with a ZMW batch size of 50 failed as well; the failing chunk is again the one with the biggest input.
  • Now doing 120 chunks with ZMW batches of 50, and also trying --batch_size 512 (I left the default in all prior runs).


danielecook commented on June 1, 2024

Thanks for the update @daaaaande - it might be helpful if you provide the command you are running.


daaaaande commented on June 1, 2024

@danielecook thanks for the hint, here is more information:

the logfile of deepconsensus:

I1120 19:59:39.677627 47130953445824 quick_inference.py:693] Example summary: ran model=3797 (75.83%; 7.357s) skip=1210 (24.17%; 0.148s) total=5007.
I1120 19:59:39.783081 47130953445824 quick_inference.py:770] Processed a batch of 20 ZMWs in 8.323 seconds
I1120 19:59:39.787414 47130953445824 quick_inference.py:931] Processed 15700 ZMWs in 8150.246 seconds
Traceback (most recent call last):
  File "/software/DeepConsensus/1.2.0/skylake/bin/deepconsensus", line 8, in <module>
    sys.exit(run())
  File "/software/DeepConsensus/1.2.0/icelake/lib/python3.9/site-packages/deepconsensus/cli.py", line 118, in run
    app.run(main, flags_parser=parse_flags)
  File "/software/DeepConsensus/1.2.0/icelake/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/software/DeepConsensus/1.2.0/icelake/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/software/DeepConsensus/1.2.0/icelake/lib/python3.9/site-packages/deepconsensus/cli.py", line 103, in main
    app.run(quick_inference.main, argv=passed)
  File "/software/DeepConsensus/1.2.0/icelake/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/software/DeepConsensus/1.2.0/icelake/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/software/DeepConsensus/1.2.0/icelake/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 977, in main
    outcome_counter = run()
  File "/software/DeepConsensus/1.2.0/icelake/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 912, in run
    for zmw, subreads, dc_config, window_widths in input_file_generator:
  File "/software/DeepConsensus/1.2.0/icelake/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 480, in stream_bam
    for input_data in proc_feeder():
  File "/software/DeepConsensus/1.2.0/icelake/lib/python3.9/site-packages/deepconsensus/preprocess/pre_lib.py", line 1309, in proc_feeder
    for read_set in subread_grouper:
  File "/software/DeepConsensus/1.2.0/icelake/lib/python3.9/site-packages/deepconsensus/preprocess/pre_lib.py", line 73, in __next__
    read = next(self.bam_reader)
  File "pysam/libcalignmentfile.pyx", line 1876, in pysam.libcalignmentfile.AlignmentFile.__next__
OSError: error -3 while reading file

Snakemake error message:

Error in rule run_deep:
    jobid: 12
    input: m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.bam, m54274Ue_230926_083040.subreads_4-of-120_ccs_lq.bam
    output: m54274Ue_230926_083040.subreads_4-of-120_deepconsensus.fastq
    conda-env: .snakemake/conda/94e59aa9aceab381a4abc6c7f412b7a1_
    shell:
        
       module load DeepConsensus &>m54274Ue_230926_083040.subreads_4-of-120_deepconsensus_logfile.log
       /software/DeepConsensus/1.2.0/skylake/bin/deepconsensus run --subreads_to_ccs=m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.bam --ccs_bam=m54274Ue_230926_083040.subreads_4-of-120_ccs_lq.bam --checkpoint=checkpoint --output=m54274Ue_230926_083040.subreads_4-of-120_deepconsensus.fastq --batch_zmws 20 --batch_size=512 --cpus=16 &>m54274Ue_230926_083040.subreads_4-of-120_deepconsensus_logfile.log
       
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 8691257.hpc-batch

Error executing rule run_deep on cluster (jobid: 12, external: 8691257.hpc-batch, jobscript: .snakemake/tmp.8cb82736/snakejob.run_deep.12.sh). For error details see the cluster log and the log files of the involved rule(s).


ls of the affected chunk:

ls -lth 4-of-120

-rwxrwx--- 1 daric102 users 282K Nov 20 19:59 m54274Ue_230926_083040.subreads_4-of-120_deepconsensus_logfile.log
-rwxrwx--- 1 daric102 users 5.8G Nov 20 16:23 m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.bam
-rwxrwx--- 1 daric102 users  19M Nov 20 16:23 m54274Ue_230926_083040.subreads_4-of-120_actc_.log
-rwxrwx--- 1 daric102 users 284M Nov 20 16:19 m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.fasta
-rwxrwx--- 1 daric102 users 344M Nov 19 13:17 m54274Ue_230926_083040.subreads_4-of-120_ccs_lq.bam
-rwxrwx--- 1 daric102 users 1.5K Nov 19 13:17 m54274Ue_230926_083040.subreads_4-of-120_ccs_lq.ccs_report.txt
-rwxrwx--- 1 daric102 users 2.5M Nov 19 13:17 m54274Ue_230926_083040.subreads_4-of-120_ccs_lq.zmw_metrics.json.gz
-rwxrwx--- 1 daric102 users 219K Nov 19 13:17 m54274Ue_230926_083040.subreads_4-of-120_ccs_lq.bam.pbi

ls of a typical, non-affected chunk:

ls -lth 14-of-120

-rwxrwx--- 1 daric102 users 540M Nov 20 19:55 m54274Ue_230926_083040.subreads_14-of-120_deepconsensus.fastq
-rwxrwx--- 1 daric102 users   88 Nov 20 19:54 m54274Ue_230926_083040.subreads_14-of-120_deepconsensus.fastq.inference.json
-rwxrwx--- 1 daric102 users 111K Nov 20 19:54 m54274Ue_230926_083040.subreads_14-of-120_deepconsensus.fastq.runtime.csv
-rwxrwx--- 1 daric102 users 272K Nov 20 19:54 m54274Ue_230926_083040.subreads_14-of-120_deepconsensus_logfile.log
-rwxrwx--- 1 daric102 users 5.5G Nov 20 16:23 m54274Ue_230926_083040.subreads_14-of-120_ccs_lq_mapped_to_subreads.bam
-rwxrwx--- 1 daric102 users  19M Nov 20 16:22 m54274Ue_230926_083040.subreads_14-of-120_actc_.log
-rwxrwx--- 1 daric102 users 276M Nov 20 16:20 m54274Ue_230926_083040.subreads_14-of-120_ccs_lq_mapped_to_subreads.fasta
-rwxrwx--- 1 daric102 users 332M Nov 19 12:30 m54274Ue_230926_083040.subreads_14-of-120_ccs_lq.bam
-rwxrwx--- 1 daric102 users 1.5K Nov 19 12:30 m54274Ue_230926_083040.subreads_14-of-120_ccs_lq.ccs_report.txt
-rwxrwx--- 1 daric102 users 2.4M Nov 19 12:30 m54274Ue_230926_083040.subreads_14-of-120_ccs_lq.zmw_metrics.json.gz
-rwxrwx--- 1 daric102 users 210K Nov 19 12:30 m54274Ue_230926_083040.subreads_14-of-120_ccs_lq.bam.pbi

actc used:

(test_2) [daric102@hilbert213 deepconsensus_snake]$ conda activate .snakemake/conda/94e59aa9aceab381a4abc6c7f412b7a1_
(.snakemake/conda/94e59aa9aceab381a4abc6c7f412b7a1_) [daric102@hilbert213 deepconsensus_snake]$ actc --version
actc 0.6.0

Using:
actc : 0.6.0 (commit v0.6.0)
pbbam : 2.4.99 (commit v2.4.0-23-g59248fe)
pbcopper : 2.3.99 (commit v2.3.0-28-ga9b1ffa)
boost : 1.81
htslib : 1.17
zlib : 1.2.13

checking the stats of the failed chunk:

samtools stats m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.bam >m54274Ue_230926_083040.subreads_4-of-120_mapped_stats_.out
Failure while decoding file

loading the deepconsensus module:

module load DeepConsensus

 CUDA Toolkit 11.4.3
 |--------------------------------------------------------------~o
   binaries      : /software/CUDA/11.4.3/bin
   libraries     : /software/CUDA/11.4.3/lib64
   includes      : /software/CUDA/11.4.3/include
   documentation : /software/CUDA/11.4.3/doc

 Intel oneAPI-2022.2
 |--------------------------------------------------------------~o
   C compiler         : icc
   C++ compiler       : icpc
   FORTRAN77 compiler : ifort
   FORTRAN90 compiler : ifort
   FORTRAN95 compiler : ifort

   runtime libraries  : /software/intel/oneAPI-2022.2/compiler/lib/intel64

 Intel Math Kernel Library (2022.2.1.0)
 |--------------------------------------------------------------~o
   libraries          : /software/intel/oneAPI-2022.2/mkl/2022.1.0/lib/intel64
   interfaces         : /software/intel/oneAPI-2022.2/mkl/2022.1.0/interfaces

 IntelMPI (2021.6.0)
 |--------------------------------------------------------------~o
   C compiler         : mpiicc
   C++ compiler       : mpiicpc
   FORTRAN77 compiler : mpiifort
   FORTRAN90 compiler : mpiifort
   FORTRAN95 compiler : mpiifort

   runtime libraries  : /software/intel/oneAPI-2022.2/mpi/2021.6.0/intel64/lib

 Intel Python Version 3.9.10
 |--------------------------------------------------------------~o
   binaries  : /software/intel/oneAPI-2022.2/intelpython/latest/bin
   libs       : /software/intel/oneAPI-2022.2/intelpython/latest/lib
   includes  : /software/intel/oneAPI-2022.2/intelpython/latest/include



 DeepConsensus 1.2.0

|--------------------------------------------------------------~o
   binaries : /software/DeepConsensus/1.2.0/skylake/bin

resources requested per deepconsensus chunk:

 "nodes" : 1,
      "time" : "8:59:59",
      "mem" : "72GB",
      "ncpus" : "16",
      "ngpus" : "0",
      "arch" : "skylake",

actc log:

...
 | 20231120 15:23:01.419 | DEBUG    | CLR parser     : m54274Ue_230926_083040/6751778/241410_263248
| 20231120 15:23:01.419 | DEBUG    | CLR parser     : m54274Ue_230926_083040/6751778/263277_269659
| 20231120 15:23:01.419 | FATAL    | CCS reader     : CCS ZMW 6751779 has multiple records. Ignoring ZMW!
...
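
Since actc complains about a ZMW with multiple CCS records, I could also count how many ZMWs in the ccs_lq.bam carry more than one record. Rough sketch only, assuming every record has the standard PacBio zm (hole number) tag:

# Rough sketch: count ZMWs with more than one CCS record in the ccs_lq.bam.
# Assumes every record carries the standard PacBio 'zm' (hole number) tag.
from collections import Counter
import pysam

ccs_bam = "m54274Ue_230926_083040.subreads_4-of-120_ccs_lq.bam"  # the failing chunk

zmw_counts = Counter()
with pysam.AlignmentFile(ccs_bam, "rb", check_sq=False) as bam:
    for read in bam:
        zmw_counts[read.get_tag("zm")] += 1

duplicated = {zmw: n for zmw, n in zmw_counts.items() if n > 1}
print(f"{len(duplicated)} ZMWs have multiple CCS records")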

ccs report:

                        Double-Strand Reads  Single-Strand Reads
Inputs                   :      41714 (94.03%)        5292 (5.965%)

Passed                   :      15604 (37.41%)        4553 (86.04%)
Failed                   :      26110 (62.59%)         739 (13.96%)

Tandem repeats           :        335 (1.283%)         128 (17.32%)

Exclusive failed counts
Below SNR threshold      :        715 (2.738%)           0 (0.000%)
Median length filter     :          0 (0.000%)           0 (0.000%)
Shortcut filters         :          0 (0.000%)           0 (0.000%)
Lacking full passes      :      24841 (95.14%)         638 (86.33%)
Coverage drops           :         69 (0.264%)          14 (1.894%)
Insufficient draft cov   :        101 (0.387%)          16 (2.165%)
Draft too different      :          0 (0.000%)           0 (0.000%)
Draft generation error   :        374 (1.432%)          68 (9.202%)
Draft above --max-length :          1 (0.004%)           0 (0.000%)
Draft below --min-length :          0 (0.000%)           0 (0.000%)
Reads failed polishing   :          0 (0.000%)           0 (0.000%)
Empty coverage windows   :          7 (0.027%)           1 (0.135%)
CCS did not converge     :          1 (0.004%)           2 (0.271%)
CCS below minimum RQ     :          1 (0.004%)           0 (0.000%)
Unknown error            :          0 (0.000%)           0 (0.000%)

Additional passing metrics
Missing adapters         :        408 (2.615%)         169 (3.712%)

samtools stats of the ccs_lq.bam:

This file was produced by samtools stats (1.18+htslib-1.18) and can be plotted using plot-bamstats
This file contains statistics for all reads.
The command line was:  stats m54274Ue_230926_083040.subreads_4-of-120_ccs_lq.bam
CHK, Checksum [2]Read Names   [3]Sequences    [4]Qualities
CHK, CRC32 of reads which passed filtering followed by addition (32bit overflow)
CHK     31679cf3        96079b4e        df23c499
Summary Numbers. Use `grep ^SN | cut -f 2-` to extract this part.
SN      raw total sequences:    20157   # excluding supplementary and secondary reads
SN      filtered sequences:     0
SN      sequences:      20157
SN      is sorted:      1
SN      1st fragments:  20157
SN      last fragments: 0
SN      reads mapped:   0
SN      reads mapped and paired:        0       # paired-end technology bit set + both mates mapped
SN      reads unmapped: 20157
SN      reads properly paired:  0       # proper-pair bit set
SN      reads paired:   0       # paired-end technology bit set
SN      reads duplicated:       0       # PCR or optical duplicate bit set
SN      reads MQ0:      0       # mapped and MQ=0
SN      reads QC failed:        0
SN      non-primary alignments: 0
SN      supplementary alignments:       0
SN      total length:   385773399       # ignores clipping
SN      total first fragment length:    385773399       # ignores clipping
SN      total last fragment length:     0       # ignores clipping
SN      bases mapped:   0       # ignores clipping
SN      bases mapped (cigar):   0       # more accurate
SN      bases trimmed:  0
SN      bases duplicated:       0
SN      mismatches:     0       # from NM fields
SN      error rate:     0.000000e+00    # mismatches / bases mapped (cigar)
SN      average length: 19138
SN      average first fragment length:  19138
SN      average last fragment length:   0
SN      maximum length: 46259
SN      maximum first fragment length:  46259
SN      maximum last fragment length:   0

checking the ccs_lq_mapped_to_subreads.bam again with samtools quickcheck:

samtools quickcheck m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.bam -vvvvv
verbosity set to 5
checking m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.bam
opened m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.bam
m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.bam is sequence data
m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.bam has 15706 targets in header.
m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.bam has good EOF block.

If you want, I can share loads of logs; just let me know what you think would be most useful!
Also, samtools stats failing on the ccs_lq_mapped_to_subreads.bam gives me the idea that this BAM file is corrupt, not according to spec, or otherwise confusing DeepConsensus, but quickcheck does not complain. Any idea there?
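
To test that idea I could decompress the whole file without interpreting any BAM records: BAM is BGZF, i.e. a series of concatenated gzip blocks, and as far as I understand quickcheck only inspects the header and the EOF block, so a corrupt block in the middle would pass quickcheck but fail a full decompression. A stdlib-only sketch:

# Sketch: decompress the entire BGZF/BAM file with the standard gzip module,
# independent of BAM record parsing. A corrupt block in the middle should
# surface here even though samtools quickcheck is happy.
import gzip
import zlib

path = "m54274Ue_230926_083040.subreads_4-of-120_ccs_lq_mapped_to_subreads.bam"

total = 0
try:
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(1 << 20)  # 1 MiB of decompressed data at a time
            if not chunk:
                break
            total += len(chunk)
except (OSError, EOFError, zlib.error) as err:
    print(f"decompression failed after {total} decompressed bytes: {err}")
else:
    print(f"decompressed {total} bytes cleanly")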


daaaaande commented on June 1, 2024

Mini-update:
It seems like I fixed the issue by downgrading actc to 0.2.0.
I still need to test more, but as soon as another SMRT Cell also works with this, I will close this issue.


