wtsi-hpag / scaff10x

Pipeline for scaffolding and breaking a genome assembly using 10x genomics linked-reads

License: MIT License

Shell 0.54% C 99.30% Makefile 0.16%
10xgenomics assembly bioinformatics breaking genome genomics scaffolding

scaff10x's People

Contributors

edharry, fg6, zning-sanger, zning01

scaff10x's Issues

segmentation fault scaff_bwa.c

Dear developers,

we are running Scaff10x V.5 with the following command:

scaff10x -nodes 30 -longread 1 -gap 100 -matrix 2000 -reads 10 -link 8 -score 20 -edge 50000  -block 50000 -data input.dat contigs-break.fasta output_scaffolds.fasta

My input.dat is:

q1=/mypath/to/fastq/NA24143_barcoded.part_001.fastq.gz
q2=/mypath/to/fastq/NA24143_barcoded.part_002.fastq.gz

and we obtain the following error message:

[M::mem_pestat] mean and std.dev: (272.64, 66.53)
[M::mem_pestat] low and high boundaries for proper pairs: (7, 539)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 28 reads in 0.045 CPU sec, 0.018 real sec
[main] Version: 0.7.17-r1198-dirty
[main] CMD: /mypath/Scaff10X/src/scaff-bin/bwa mem -p -t 30 tarseq.fastq -
[main] Real time: 2.431 sec; CPU: 2.265 sec
sh : ligne 1 : 11840 segmentation fault /mypath/Scaff10X/src/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out
Error running command: /mypath/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out
Input target assembly file2: /mypath/contigs-break.fasta
www: /mypath/input.dat input.dat
Input data file: /mypath/input.dat
/mypath/Scaff10X/src/scaff-bin/scaff_FilePreProcess -t 2 -n 1 /mypath/input.dat - |/mypath/Scaff10X/src/scaff-bin/bwa mem -p -t 30 tarseq.fastq -  | egrep tarseq_ | awk '($2<100)&&($5>=0){print $1,$2,$3,$4,$5}' | egrep -v '^@' > align.dat
/mypath/scaff10x/Scaff10X/src/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out

I launched it several times and got the same result each time.
We did not get any out-of-memory error.
I also tried passing the fastq files directly instead of input.dat. Here I ran it on the output of break10x, but I also tried another assembly.

To locate the error, we modified src/makefile to add the following flags:

-ggdb -fsanitize=address  -fno-omit-frame-pointer -static-libstdc++ -static-libgcc -static-libasan -lrt

like this:

# Makefile for scaff10x
CC= gcc
CFLAGS= -O2 -std=c11 -march=x86-64 -mtune=generic -ggdb -fsanitize=address -fno-omit-frame-pointer -static-libstdc++ -static-libgcc -static-libasan -lrt
LFLAGS= -lm -pthread -lz

We then ran the failing command in the tmp_rununik directory:

Scaff10X/src/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat

and we obtain:

AddressSanitizer:DEADLYSIGNAL
==41390==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7f8c98704721 bp 0x7ffd69f98740 sp 0x7ffd69f97ed8 T0)
==41390==The signal is caused by a READ memory access.
==41390==Hint: address points to the zero page.
#0 0x7f8c98704721 in __strlen_sse2_pminub (/lib64/libc.so.6+0x16f721)
#1 0x433f33 in __interceptor_strcpy ../../.././libsanitizer/asan/asan_interceptors.cpp:437
#2 0x406e52 in main /mypath/src/scaff_bwa.c:220
#3 0x7f8c985b7554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#4 0x407006  (/mypath/Scaff10X/src/scaff-bin/scaff_bwa+0x407006)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib64/libc.so.6+0x16f721) in __strlen_sse2_pminub

Do you have any idea how to resolve this problem?
It seems that your code (scaff_bwa.c, line 220) expects a _ in align.dat, but there is none.

Thank you!
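
The sanitizer trace points at a strcpy in scaff_bwa.c:220, and the post suspects a missing `_` in align.dat. One way to test that suspicion before touching the C source is to scan align.dat for rows that deviate from the five-column layout written by the awk step in the log (read name, flag, target name, position, mapping quality). The sketch below is in Python; `find_suspect_rows` is a hypothetical helper, and the assumption that every valid row carries a `tarseq_` target name comes from the `egrep tarseq_` filter in the logged command, not from the scaff_bwa source.

```python
import io


def find_suspect_rows(handle, max_report=10):
    """Return (line_number, text) for rows that do not look like
    'read flag tarseq_N position mapq' (assumed layout, see lead-in)."""
    suspects = []
    for number, line in enumerate(handle, start=1):
        fields = line.split()
        looks_valid = (
            len(fields) == 5
            and fields[1].isdigit()
            and fields[2].startswith("tarseq_")
            and fields[3].isdigit()
            and fields[4].isdigit()
        )
        if not looks_valid:
            suspects.append((number, line.rstrip("\n")))
            if len(suspects) >= max_report:
                break
    return suspects


# Tiny in-memory demo; pass open("align.dat") instead for a real file.
demo = io.StringIO(
    "readA 99 tarseq_12 1043 60\n"
    "readB 77 tarseq 0 0\n"
)
print(find_suspect_rows(demo))
```

If the scan reports nothing, the file matches the assumed layout and the cause probably lies elsewhere, for example in tarseq.tag or in how the read names are parsed.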

cannot find input.dat

Hello,

I would like to run Scaff10x, but after it has been running for a minute or so, I get this error:

[copettid@kp140-95 scaffolding_Rabv2]$ /data/dario/bin/Scaff10X/src/scaff10x -nodes 15 -longread 0 -gap 100 -matrix 2000 -reads 10 -score 30 -edge 10000 -link 10 -block 50000 -plot barcode_lengtg.png -data input.dat Rabiosa_genome_v2.2.fa Rabv2_Scaff10x_test
Input target assembly file2: /data/dario/scaffolding_Rabv2/Rabiosa_genome_v2.2.fa
cp input.dat align.dat
cp: cannot stat 'input.dat': No such file or directory
Error running command: cp input.dat align.dat

though the file is there and I can run stat on it:

[copettid@kp140-95 scaffolding_Rabv2]$ ls -lrth
total 4.8G
-rwx------+ 1 copettid mpb 4.8G Jul 17 17:35 Rabiosa_genome_v2.2.fa
-rw-rw----+ 1 copettid mpb  324 Jul 17 17:38 input.dat 
[copettid@kp140-95 scaffolding_Rabv2]$ stat input.dat
  File: input.dat
  Size: 324             Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 4299148105  Links: 1
Access: (0660/-rw-rw----)  Uid: (201168/copettid)   Gid: (217165/     mpb)
Access: 2019-07-17 17:38:42.055868594 +0200
Modify: 2019-07-17 17:38:42.055868594 +0200
Change: 2019-07-17 17:38:53.782943899 +0200
 Birth: -

can you help me figure out the issue?
Thanks,

Dario
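
One possible explanation, not confirmed in this thread, is that the failing `cp input.dat align.dat` runs from a different working directory than the one the command was launched in (other reports on this page mention a tmp_rununik directory), so the relative name no longer resolves. A minimal Python sketch of the workaround is to resolve every file argument to an absolute path before building the command line. `absolute_data_args` is a hypothetical helper; only the `-data` flag and the argument order are taken from the command above.

```python
import os


def absolute_data_args(data_file, assembly, output, cwd=None):
    """Return the '-data' argument list with every path made absolute.

    Paths that are already absolute are returned unchanged.
    """
    base = cwd or os.getcwd()

    def resolve(path):
        # os.path.join discards 'base' when 'path' is already absolute.
        return os.path.normpath(os.path.join(base, path))

    return ["-data", resolve(data_file), resolve(assembly), resolve(output)]


print(absolute_data_args("input.dat", "Rabiosa_genome_v2.2.fa", "Rabv2_Scaff10x_test"))
```

The returned list can be appended to the other options when launching scaff10x, for example through `subprocess.run`.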

scaff_reads: Segmentation fault

Hi there,

scaff_reads can only produce genome-BC_1.fastq.gz, but not genome-BC_2.fastq.gz. Here I list information as follows. Please help to fix it.

Thanks a lot

command line:

Scaff10X/src/scaff_reads -nodes 30 input.dat genome-BC_1.fastq.gz genome-BC_2.fastq.gz

# error information
58211 Segmentation fault Scaff10X/src/scaff-bin/scaff_BC-reads-2 MySample_S1_L001_R1_001.fastq.name MySample_S1_L001_R2_001.fastq MySample_S1_L001_R2_001.fastq.RC2 > try.out

CPUs & mem

#SBATCH --cpus-per-task=30
#SBATCH --mem=500G

input.dat

q1=MySample_S1_L001_R1_001.fastq (base size: 150Gb )
q2=MySample_S1_L001_R2_001.fastq (base size: 150Gb)

fastq

@CL100073098L1C001R001_7
CCCAATGGGACAATGGCAGGGCTGCCTATGGGGGAACCGGCATTGCTGTGAGGGTCGGGGGGACTATTGTATCTGTAAAGGATCAGCCATGGCCAGAAGTAGGTTTCTGAGCTGAGCGGTGACAGACTGTGCCCTTTTCCTGGCAGGAGG
+
@:GFFDFFFFF9FFFCFGFFDFGEFFE@FFFFFDEF@FF:F?DFGFF@FGFFFEDFEFFFGF;=FFFFGFFGGFFEBFCGE5F>BCFFBFFFFFFCF1BFAGFCDEFEF7FEFF,FGFADF3DBD=FFC84FFFGBFF7GF:DFDFF=EF
@CL100073098L1C001R001_9
CTGCGTTTCGCGGCATGCTTTCTAGAAGCTTAAGTTGTCTGTTTTTCCACCCTCCAAATTGTCTGACCACTTGTTGATAGTAGCAATTCCATTTTAATACCTTATGTCATAAGTATTTTAAGCAACCAAAAGATTCCTTTATTTTTTGCA
+
FFFGFFFFFGGB;FFGGFEGEGEEGGCFGEEE=GFFGEGEGFFGGCEGFGDFGBFFBBGFEGDFEFEGBEFBBGGG:GGBDFDFDGGGF?ECF@F@GEAEAEEEEEGF>GDFDEEEECFF,GFFFE1FGGBEGCEG@EAC?DCGEAEB5@

Scaffolding with parental reads

Hello and thank you for the excellent tool.

I have a PB + nanopore assembly and 10X linked reads from the parents for trio binning; however, I do not have linked reads for the individual sequenced with PB and nanopore (I have them for his littermate -- long story). Would you recommend combining the parental linked reads into single fastq files (i.e., cat maternal_R1.fastq.gz paternal_R1_fastq.gz ; cat maternal_R2.fastq.gz paternal_R2_fastq.gz ) and running scaff10X on the merged data, or could you see this leading to misassemblies?

Thanks,
Dustin

Pigz installation

The install.sh script failed with the following error:
./install.sh: line 89: cd: /scratch/tmp/amikhail/src/Scaff10X/src/pigz-2.6: No such file or directory
cp: cannot stat ‘pigz’: No such file or directory
!! Error: pigz not installed properly!

Pigz-2.6 can no longer be downloaded with wget. I had to replace it manually with Pigz-2.7 in the install.sh script. I am posting this in case other people hit the same problem.

scaff_screeN not found

Hi,

I'm trying to run Scaff10X (4.2) to assemble a set of contigs with 10X reads. However, right at the start I get the following problem:

Input:

./scaff10x -nodes 30 -longread 1 -gap 100 -matrix 2000 -reads 10 -score 20 -edge 50000 -link 8 -block 50000 -plot barcode_lengtg.png myassembly.fa R1.test.fq.gz R2.test.fq.gz output.test.fa

Output:

Input target assembly file2: /home/Scaff10X/src/myassembly.fa
Input read1 file: /home/Scaff10X/src/R1.test.fq.gz
Input read2 file: /home/Scaff10X/src/R2.test.fq.gz
sh: 1: ./scaff-bin/scaff_screeN: not found
Error running command: ./scaff-bin/scaff_screeN /home/Scaff10X/src/myassembly.fa cleaN.fasta > try.out

However, scaff_screeN is right where it should be, and even if I type ./scaff-bin/scaff_screeN the prompt delivers a usage instruction, signalling the executable works: Usage: ./scaff-bin/scaff_screeN <input fast/aq file> <output fasta file>

I tried this on two separate Linux machines, and both gave the same error. Do you have any idea what could be happening?

pre n50 = scaff10x n50

Hi,

I hope you can help me out with the following. I am running scaff10x on a pacbio draft assembly, and the metrics do not change after running the software with default parameters. The pipeline runs smoothly and produces all the expected files, but after looking at the barcode length distributions and coverage stats in the generated plot, I wonder whether my 10x data might not be of the best quality. In any case, I would love to get some benefit out of it. So my question is: can I re-run the pipeline with tweaked parameters using the output files generated in the first run? If yes, what would be the flag for reusing align.dat with different block and edge parameters?

Thanks in advance; I look forward to your answer.

Luis
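
Since the comparison here rests on N50, it can help to compute it the same way for the input and output FASTA files. The Python sketch below uses the usual definition: sort sequence lengths in descending order and report the length at which the running total first reaches half of the assembly size. `fasta_lengths` and `n50` are hypothetical helper names, and the parser assumes plain (uncompressed) FASTA.

```python
import io


def fasta_lengths(handle):
    """Return the sequence lengths of a plain-text FASTA stream."""
    lengths = []
    current = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length at which the descending cumulative sum reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


demo = io.StringIO(">a\nACGTAC\n>b\nGG\n>c\nTTTT\n")
print(n50(fasta_lengths(demo)))
```

Running this on both the draft assembly and the scaff10x output makes it easy to confirm whether any joins were made at all, independently of the tool that produced the original statistics.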

Segmentation fault

Hi,

I just downloaded and installed the program:

Downloading and installing BWA
 BWA succesfully installed!

Downloading and installing Smalt
 Smalt succesfully installed!
Downloading and installing pigz
 pigz succesfully installed!

Compiling scaff10X sources

Checking installation:
 Congrats: installation successful!

But when I execute it I get a
317644 Segmentation fault

What's wrong?
Thanks
F

How does scaff_reads work

Hi,
Can you please give some details on how "Scaff_reads" works on forward and reverse linked reads to remove adapters?
Thank you
Upendra

Floating point exception in break10x

Hi there,

I'm running break10x on my cluster as follows:

/nfs/research1/marioni/claumer/Scaff10X/src/break10x -nodes 40 -gap 100 -reads 5 -score 20 -cover 50 -ratio 15 /nfs/research1/marioni/claumer/DalyG_HiFi/Scaff10x/DalyG_HiFi_q10_redbean_all.ctg.fa /nfs/research1/marioni/claumer/DalyG_HiFi/fwdrev_all/Scaff10x/DalyG_10x_R1.fastq /nfs/research1/marioni/claumer/DalyG_HiFi/fwdrev_all/Scaff10x/DalyG_10x_R2.fastq DalyG_wtdbg2_break10x DalyG_wtdbg2_break10x_breakpoints

I've installed using a conda environment built to use gcc 7.1.0.

The program finishes, gives a "successful complete" to LSF, but when I check stderr I see these last couple of lines:

[main] Version: 0.7.17-r1198-dirty
[main] CMD: /nfs/research1/marioni/claumer/Scaff10X/src/scaff-bin/bwa mem -t 40 tarseq.fastq /nfs/research1/marioni/claumer/DalyG_HiFi/fwdrev_all/Scaff10x/DalyG_10x_R1.fastq /nfs/research1/marioni/claumer/DalyG_HiFi/fwdrev_all/Scaff10x/DalyG_10x_R2.fastq
[main] Real time: 15635.842 sec; CPU: 211509.263 sec
sh: line 1: 23754 Floating point exception/nfs/research1/marioni/claumer/Scaff10X/src/scaff-bin/scaff_barcode-cover -score 20 -cover 50 -ratio 15 align.length-sort break.dat cover.dat > break.out

And it looks like the breakpoints file is completely empty and the assembly that's output is the same as the input. So I assume this means break10x has failed. Can you advise on the cause and remedy?

Regards,
Chris L

File Handling and Core Dumps.

Would it be possible to use named pipes in the very first step, i.e. the scaff_names series? Also, where in the source code would I change the default behavior of copying the initial data? I ask because I'm currently working on a 1 Gb genome with a lane of data, so copying and extracting the data alone would likely take up around 500 Gb.

Cursory checks with mkfifo as well as /dev/stdin for gunzipping and providing the same to scaff_names works well!

Can you share the memory requirements as well? The only error is "Core Dump, segmentation fault" on the terminal, after which the program quits. Scaff10X also seems to give up on another genome of about 400 Mb with 100x coverage.

I'll share the log, currently I'm recompiling on another server.
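
The named-pipe idea mentioned above can be sketched as follows: a background thread decompresses the gzip file into a FIFO, and the consumer reads the FIFO path as if it were an ordinary uncompressed file, so no decompressed copy lands on disk. This is a minimal Python illustration for POSIX systems only. `stream_gzip_to_fifo` and `demo_roundtrip` are hypothetical names, and whether every Scaff10X step reads its input strictly sequentially (a requirement for pipes) is an assumption that is only reported for scaff_names here.

```python
import gzip
import os
import shutil
import tempfile
import threading


def stream_gzip_to_fifo(gz_path, fifo_path):
    """Create fifo_path and decompress gz_path into it on a background thread."""
    os.mkfifo(fifo_path)

    def pump():
        # Opening the FIFO for writing blocks until a reader opens it.
        with gzip.open(gz_path, "rb") as src, open(fifo_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread


def demo_roundtrip():
    """Write a one-record gzip FASTQ, stream it through a FIFO, return the bytes read."""
    record = b"@read1\nACGT\n+\nFFFF\n"
    with tempfile.TemporaryDirectory() as workdir:
        gz_path = os.path.join(workdir, "reads.fastq.gz")
        fifo_path = os.path.join(workdir, "reads.fastq")
        with gzip.open(gz_path, "wb") as out:
            out.write(record)
        writer = stream_gzip_to_fifo(gz_path, fifo_path)
        with open(fifo_path, "rb") as consumer:
            data = consumer.read()
        writer.join()
        return data


print(demo_roundtrip())
```

In practice the consumer would be the external program, given the FIFO path where it expects a fastq file. A pipe can be read only once, so any step that re-reads or seeks in its input would still need a real file.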

another seg fault issue

sorry..

step 1. binary executes, results in help menu.

(scaff10x) [macmanes@premise genome]$ scaff10x
Program: scaff10x - Genome Scaffolding using 10X Chromium Data
Version: 4.2

Usage: scaff10x -nodes 30 -longread 1 -gap 100 -matrix 2000 -reads 10 -score 20 -edge 50000 -link 8 -block 50000 -plot barcode_lengtg.png <input_assembly_fasta/q_file> <Input_read_1>> <Input_read_2> <Output_scaffold_file>
       nodes    (30)    - number of CPUs requested
       matrix   (2000)  - relation matrix size
       reads    (10)    - step 1 and 2: minimum number of reads per barcode
       link     (8)     - step 1 and 2: minimum number of shared barcodes
       read-s1 (8)     - step 1: minimum number of reads per barcode
       read-s2 (10)    - step 2: minimum number of reads per barcode
       link-s1  (8)     - step 1: minimum number of shared barcodes
       link-s2  (10)    - step 2: minimum number of shared barcodes
       edge     (50000) - edge length to consider for scaffolding
       score    (20)    - minimum average mapping score on a barcode covered area
       block    (50000) - length to determine for nearest neighbours
       longread (1)     - contigs were produced using PacBio or ONT reads
                (0)     - contigs were produced from short reads such as Illumina
       file     (0)     - do not output the sam file in order to save disk space
                (1)     - sam file from bwa is saved
       gap      (100)   - gap size in building scaffold
       sam      ()      - previously aligned sam file by bwa
       bam      ()      - previously aligned bam file by longrange
       plot     (barcode_lengtg.png) - output image file on barcode length distributions

step 2. seg fault with dat file

(scaff10x) [macmanes@premise genome]$ scaff10x -data reads.dat Neotomodon_alstoni_10x_v6.fasta Neotomodon_alstoni_10x_v7.fasta
Segmentation fault

here is the dat file

q1=/mnt/lustre/macmaneslab/shared/pero_genomes/Neotomodon_alstoni/reads/NEAL_S1_L006_R1_001.fastq.gz
q2=/mnt/lustre/macmaneslab/shared/pero_genomes/Neotomodon_alstoni/reads/NEAL_S1_L006_R2_001.fastq.gz

step 3. segfault with passing reads on the command line

(scaff10x) [macmanes@premise genome]$ scaff10x Neotomodon_alstoni_10x_v6.fasta ../reads/NEAL_S1_L006_R1_001.fastq.gz ../reads/NEAL_S1_L006_R2_001.fastq.gz  Neotomodon_alstoni_10x_v7.fasta
Segmentation fault

run fails at scaff_matrix

Hello,
I was able to run the alignment step, then the run died at the scaff_matrix step.
This is the stdout:

[...]
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 569628 reads in 2053.558 CPU sec, 114.390 real sec
[main] Version: 0.7.17-r1198-dirty
[main] CMD: /data/dario/bin/Scaff10X/src/scaff-bin/bwa mem -p -t 18 tarseq.fastq -
[main] Real time: 321077.765 sec; CPU: 5706537.235 sec
sh: line 1: 14303 Segmentation fault      (core dumped) /data/dario/bin/Scaff10X/src/scaff-bin/scaff_matrix -file 1 -matrix 2000 -link 10 -uplink 50 -longread 0 barcodes.clean tarseq.tag contig.dat > scaff.out
Error running command: /data/dario/bin/Scaff10X/src/scaff-bin/scaff_matrix -file 1 -matrix 2000 -link 10 -uplink 50 -longread 0 barcodes.clean tarseq.tag contig.dat > scaff.out

and this is the content of the tmp folder:

total 59G
-rw-rw----+ 1 copettid mpb  64M Jul 18 17:16 tarseq.tag
-rw-rw----+ 1 copettid mpb 9.5G Jul 18 17:16 tarseq.fastq
-rw-rw----+ 1 copettid mpb 4.8G Jul 18 18:12 tarseq.fastq.bwt
-rw-rw----+ 1 copettid mpb 1.2G Jul 18 18:14 tarseq.fastq.pac
-rw-rw----+ 1 copettid mpb  76M Jul 18 18:14 tarseq.fastq.ann
-rw-rw----+ 1 copettid mpb 2.4M Jul 18 18:14 tarseq.fastq.amb
-rw-rw----+ 1 copettid mpb 2.4G Jul 18 18:41 tarseq.fastq.sa
-rw-rw----+ 1 copettid mpb  31G Jul 22 11:52 align.dat
-rw-rw----+ 1 copettid mpb 3.3G Jul 22 11:55 align2.dat
-rw-rw----+ 1 copettid mpb 3.3G Jul 22 11:57 align.sort
-rw-rw----+ 1 copettid mpb 3.0G Jul 22 11:57 align.sort2
-rw-rw----+ 1 copettid mpb  73M Jul 22 11:58 barcodes.clust
-rw-rw----+ 1 copettid mpb   32 Jul 22 11:58 try.out
-rw-rw----+ 1 copettid mpb  54M Jul 22 11:58 barcodes.clean
-rw-rw----+ 1 copettid mpb  649 Jul 22 11:58 scaff.out

is the script looking for the contig.dat file and can't find it maybe?
If we find the issue, is it possible to restart from after the alignment step? It took a long time and I think we have the right files.
Thanks,
Dario

Break10x runtime error - missing file

Dear Francesca Giordano,

I am currently trying to optimize my 10x chromium de-novo assembly and am trying out different tools to check for assembly errors. The Scaff10x function seems to work perfectly. However, if I try to run the Break10x, I get the following error:

...
[M::mem_pestat] low and high boundaries for proper pairs: (1, 823)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 1535158 reads in 519.780 CPU sec, 20.325 real sec
[main] Version: 0.7.17-r1194-dirty
[main] CMD: /home/user/softwares/scaff10x/Scaff10X/src/scaff-bin/bwa mem -t 30 tarseq.fastq /home/user/my_data/N15_10x/raw_reads/read-BC_1.fastq /home/jderaad/my_data/N15_10x/raw_reads/read-BC_2.fastq
[main] Real time: 234729.355 sec; CPU: 201340.607 sec
sh: /home/user/softwares/scaff10x/Scaff10X/src/scaff-bin/scaff_barcode-length: No such file or directory
sort: cannot read: align.length-5: No such file or directory

There appears to be an error due to the missing "scaff_barcode-length" file and there is no file named "scaff_barcode-length" in the softwares/scaff10x/Scaff10X/src/scaff-bin directory. Is this an error in the script or am I missing a crucial file?

Kind regards,

Jordi

"Numbers of contigs are difference," error in "scaff_bwa-barcode"

Hey!

Thank you for your great tool.

I am trying to use some 10X linked reads to improve the contig assembly of a de novo genome completed with Oxford Nanopore long reads. I first aligned the 10X reads with longranger and then continued from there.

Command:

/hpf/tools/centos7/Scaff10X/4.1/src/scaff10x -nodes 25 -bam /hpf/largeprojects/mdwilson/dustin/new_genome/phase_link/SUB_2626M1/outs/possorted_bam.bam genome.fa male_output.fasta

The "genome.fa" is the genome fasta file found in the "refdata-assembly/fasta/" directory created by longranger mkref.

The error specifically was:
Error running command: /hpf/tools/centos7/Scaff10X/4.1/src/scaff-bin/scaff_bwa-barcode tarseq.tag align0.dat align.dat > try.out

The try.out file contained:
2751 409880285
Numbers of contigs: 2750 2751
Numbers of contigs are difference, please check reference assembly! 2750 2751

While not entirely sure what this meant, I did some digging and I noticed that one scaffold had 0 10-X reads aligning to it. Below is the summary of reads aligned per contig and the contig lacking reads.
[image: summary of reads aligned per contig]

I also noticed that the contig itself is on the shorter side.

Together, I have the following questions:

  1. Could the lack of alignment to a contig be responsible for this error?
  2. Should there be a contig length cutoff for the input assembly?
  3. If this error is coming elsewhere, do you happen to know the source?

Thanks so much!
Dustin
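
The observation that one contig has no aligned reads can be checked systematically by comparing the FASTA headers against the reference names that appear in the alignments. The Python sketch below does only that comparison; it does not establish that a read-free contig is what triggers the "Numbers of contigs" message. `fasta_names` and `contigs_without_alignments` are hypothetical helpers, and the list of aligned reference names is assumed to have been extracted separately (for example from the third column of a SAM file).

```python
import io


def fasta_names(handle):
    """Return sequence names (first word of each header) from a FASTA stream."""
    names = []
    for line in handle:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.append(words[0])
    return names


def contigs_without_alignments(fasta_handle, aligned_names):
    """List contigs, in FASTA order, that never appear among aligned_names."""
    seen = set(aligned_names)
    return [name for name in fasta_names(fasta_handle) if name not in seen]


demo_fasta = io.StringIO(">ctg1 len=4\nACGT\n>ctg2\nGG\n>ctg3\nTT\n")
print(contigs_without_alignments(demo_fasta, ["ctg1", "ctg3", "ctg1"]))
```

If the list contains exactly one contig when the counts differ by one, as with 2750 versus 2751 here, removing or masking that contig and re-running would be a quick way to test question 1.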

add gnuplot and inkscape to install.sh

Running Scaff10X v4.2 with -plot may fail due to missing dependencies (e.g. on clusters that require modules to be loaded explicitly). I may attempt to implement this, but compiling inkscape from source seems a bit tricky.

buffer overflow detected

Hello,
I tried to run scaff_reads on my 10X raw data with:
scaff_data reads.dat reads_1.fastq reads_2.fastq > try.out
where my reads.dat file contained:
q1 = /10X_RAWDATA/S1_L008_R1_001.fastq.gz
q2=/10X_RAWDATA/S1_L008_R2_001.fastq.gz

I get the following error:

buffer overflow detected: scaff_reads terminated
Aborted (core dumped)

Please let me know how to solve this issue. I am running it on a 40-core workstation with 125GB RAM. Alternatively, can I use the barcoded.fastq.gz file produced by 'longranger basic'?

Thanks in Advance,
AP
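
The reads.dat in this report writes the first entry as `q1 = /path` with spaces around `=`, while the other .dat examples on this page use `q1=/path`. Whether the Scaff10X parser tolerates the spaced form is not confirmed here, but it is cheap to rule out. The Python sketch below flags any line that is not strictly `q1=<path>` or `q2=<path>` with no whitespace; `lint_dat` is a hypothetical helper and the strict pattern is an assumption based only on those examples.

```python
import io
import re

# Assumed strict form: 'q1=' or 'q2=' followed by a path without whitespace.
ENTRY = re.compile(r"^q[12]=\S+$")


def lint_dat(handle):
    """Return (line_number, text) for non-empty lines that break the assumed form."""
    problems = []
    for number, line in enumerate(handle, start=1):
        text = line.rstrip("\n")
        if text and not ENTRY.match(text):
            problems.append((number, text))
    return problems


demo = io.StringIO(
    "q1 = /10X_RAWDATA/S1_L008_R1_001.fastq.gz\n"
    "q2=/10X_RAWDATA/S1_L008_R2_001.fastq.gz\n"
)
print(lint_dat(demo))
```

An empty result means every entry matches the form used in the working examples; any reported line is worth normalizing before re-running.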

segmentation fault

Hello,
I am running Scaff10x V.4.2 with the following command:
scaff10x -nodes 40 -longread 1 -plot 10x_coverage.png PacBioScaffolds.fasta 19-Bp_S1_L001_R1_001.fastq.gz 19-Bp_S1_L001_R2_001.fastq.gz Result_scaff10x.fasta

Scaff10x is running and also produces a couple of output files, of which align2.dat and try.out are empty:
0 Apr 30 17:40 align2.dat
28G Apr 29 22:56 align.dat
1.3G Apr 29 11:40 cleaN.fasta
2.9M Apr 29 22:56 core.61997
2.5G Apr 29 11:40 tarseq.fastq
19 Apr 29 11:58 tarseq.fastq.amb
1.6M Apr 29 11:58 tarseq.fastq.ann
1.3G Apr 29 11:58 tarseq.fastq.bwt
312M Apr 29 11:58 tarseq.fastq.pac
624M Apr 29 12:06 tarseq.fastq.sa
1.9M Apr 29 11:40 tarseq.tag
0 Apr 29 22:56 try.out

and I obtain the following error message:
...
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 545)
[M::mem_pestat] mean and std.dev: (192.88, 98.24)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 685)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 434416 reads in 662.155 CPU sec, 18.441 real sec
[main] Version: 0.7.17-r1198-dirty
[main] CMD: /cluster/software/scaff10x/Scaff10X-4.2/src/scaff-bin/bwa mem -t 40 tarseq.fastq R1_001.fastq.gz 19-Bp_S1_L001_R2_001.fastq.gz
[main] Real time: 38988.280 sec; CPU: 1509159.415 sec
sh: line 1: 61997 Segmentation fault (core dumped) /cluster/software/scaff10x/Scaff10X-4.2/src/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out
Error running command: /cluster/software/scaff10x/Scaff10X-4.2/src/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out

scaff_bwa segmentation fault

Hi there,

I am encountering a segmentation fault when running Scaff10X.

[main] Version: 0.7.17-r1188
[main] CMD: /home/sejoslin/miniconda3/envs/scaffold_10x/bin/Scaff10X/scaff-bin/bwa mem -t 16 tarseq.fastq /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/00-raw_data/data-10X_M/Male2_S63_L004_R1_001.fastq.gz /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/00-raw_data/data-10X_M/Male2_S63_L004_R2_001.fastq.gz
[main] Real time: 140745.891 sec; CPU: 2245179.580 sec
Segmentation fault
Error running command: /home/sejoslin/miniconda3/envs/scaffold_10x/bin/Scaff10X/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out

Here is the stderr file associated with the error:
test_assembly.j23957420.run_scaff10x.hi.err.txt

Here is the stdout file associated with the job:
test_assembly.j23957420.run_scaff10x.hi.out.txt

I have the following files in the tmp_rununik_10443 directory:

-rw-rw-r-- 1 sejoslin millermrgrp    0 Jul  3 06:46 align2.dat
-rw-rw-r-- 1 sejoslin millermrgrp  23G Jul  3 06:46 align.dat
-rw-rw-r-- 1 sejoslin millermrgrp 930M Jul  1 15:31 tarseq.fastq
-rw-rw-r-- 1 sejoslin millermrgrp  257 Jul  1 15:38 tarseq.fastq.amb
-rw-rw-r-- 1 sejoslin millermrgrp 201K Jul  1 15:38 tarseq.fastq.ann
-rw-rw-r-- 1 sejoslin millermrgrp 465M Jul  1 15:38 tarseq.fastq.bwt
-rw-rw-r-- 1 sejoslin millermrgrp 117M Jul  1 15:38 tarseq.fastq.pac
-rw-rw-r-- 1 sejoslin millermrgrp 233M Jul  1 15:40 tarseq.fastq.sa
-rw-rw-r-- 1 sejoslin millermrgrp 187K Jul  1 15:31 tarseq.tag
-rw-rw-r-- 1 sejoslin millermrgrp    0 Jul  1 15:31 try.out

and my align.dat file looks like this:

(base) sejoslin@farm:tmp_rununik_10443$ head align.dat
A00351:291:HVMC5DSXX:4:1101:1307:1000 69 tarseq_648 132029 0
A00351:291:HVMC5DSXX:4:1101:1398:1000 99 tarseq_462 123666 60
A00351:291:HVMC5DSXX:4:1101:1524:1000 81 tarseq_236 424566 0
A00351:291:HVMC5DSXX:4:1101:1597:1000 97 tarseq_874 104495 0
A00351:291:HVMC5DSXX:4:1101:1687:1000 99 tarseq_16 418743 60
A00351:291:HVMC5DSXX:4:1101:1940:1000 97 tarseq_2149 50794 0
A00351:291:HVMC5DSXX:4:1101:2284:1000 99 tarseq_338 173397 60
A00351:291:HVMC5DSXX:4:1101:2302:1000 83 tarseq_302 99246 60
A00351:291:HVMC5DSXX:4:1101:2483:1000 83 tarseq_5085 17812 60
A00351:291:HVMC5DSXX:4:1101:2591:1000 99 tarseq_102 116777 41
(base) sejoslin@farm:tmp_rununik_10443$ tail align.dat
A00351:291:HVMC5DSXX:4:2678:13132:36949 99 tarseq_1383 26746 60
A00351:291:HVMC5DSXX:4:2678:13277:36949 65 tarseq_1446 66935 60
A00351:291:HVMC5DSXX:4:2678:13313:36949 99 tarseq_569 12813 60
A00351:291:HVMC5DSXX:4:2678:13639:36949 83 tarseq_1401 40168 27
A00351:291:HVMC5DSXX:4:2678:13747:36949 99 tarseq_84 92647 60
A00351:291:HVMC5DSXX:4:2678:14091:36949 99 tarseq_445 203003 60
A00351:291:HVMC5DSXX:4:2678:14561:36949 99 tarseq_512 71920 11
A00351:291:HVMC5DSXX:4:2678:14597:36949 83 tarseq_411 18587 60
A00351:291:HVMC5DSXX:4:2678:14868:36949 97 tarseq_2112 1850 10
A00351:291:HVMC5DSXX:4:2678:15157:36949 83 tarseq_613 158418 60

I ran Scaff10X with the following parameters:

#SBATCH -J hi_scf10
#SBATCH -e slurm/test_assembly.j%j.run_scaff10x.hi.err
#SBATCH -o slurm/test_assembly.j%j.run_scaff10x.hi.out
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --mem=480G
#SBATCH --time=06-10:08:07
#SBATCH -p bigmemh

and used the following command to run scaff10x:

${scaf_bin}/scaff10x \
    -nodes ${threads} \
    -align bwa \
    -matrix 2000 \
    -reads 12 \
    -link 10 \
    -plot barcode_length.png \
    ${asm} ${R1} ${R2} ${output}

I didn't think this step needed much memory, and indeed it fails at the same place if I use a partition with less memory (62000M/node).

Please advise! Thank you for your time :)

segfault

If I run without parameters, I get the usage as expected.

Then, if I run with parameters:
scaff10x -nodes 30 funestus_redbean.purged.fa bamtofastq_S1_L000_R1_001.fastq.gz bamtofastq_S1_L000_R2_001.fastq.gz hybrid_redbean2.purged.scaff.fa

I get ./scaffme2.sh: line 1: 31890 Segmentation fault

I'm on farm4 in directory /lustre/scratch118/malaria/team222/hh5/projects/analysis/assembly/arabiensis
and created the fastq files from a longranger run vs a different assembly with 10x's bamtofastq which should put the reads in the same format as they were as input to longranger.

scaff10x still running after 22 days

Hi,

I am running scaff10x using the following command:

/opt/Scaff10X/src/scaff10x \
                -nodes 30 \
                -data input.dat \
                 scaffolds_FINAL.fasta scaffolds_FINAL_scaff10x.fasta

Started running on August 31st and the last output is on August 31st.
This is the last output:

[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 2150538 reads in 1615.708 CPU sec, 58.529 real sec

This is my input.dat

q1=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S2_L001_R1_002.fastq
q2=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S2_L001_R2_002.fastq
q1=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S2_L002_R1_002.fastq
q2=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S2_L002_R2_002.fastq
q1=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S2_L003_R1_002.fastq
q2=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S2_L003_R2_002.fastq
q1=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S2_L004_R1_002.fastq
q2=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S2_L004_R2_002.fastq
q1=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S4_L001_R1_001.fastq
q2=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S4_L001_R2_001.fastq
q1=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S4_L002_R1_001.fastq
q2=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S4_L002_R2_001.fastq
q1=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S4_L003_R1_001.fastq
q2=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S4_L003_R2_001.fastq
q1=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S4_L004_R1_001.fastq
q2=/Data/A673_10x_and_WGS/A673_10x/A673_10x_fastq/A673_combined_S4_L004_R2_001.fastq

I am wondering if there is anything I should have done differently. Thank you in advance for your help.

No scaffolding being done on the Scaff10xV4

Hi,

I have been trying to scaffold pacbio contigs with scaff10x, using the latest version (version 4, uploaded last month).

I can see that the align* series files are empty, but the tool doesn't throw an error, nor does it do scaffolding. Please see the attached screenshot.
[screenshot: scaff10x]

It seems during the update the tool names in the scaff-bin folder have been changed, but calling them in the main script seems to be done using the older names, creating the error. (I observed this for bwa).

This is the command I have been using:

~/Tools/Scaff10X/src/scaff10x -nodes 48 -longread 1 -gap 100 -matrix 2000 -reads 10 -score 20 -edge 50000 -link 8 -block 50000 final_MP_scaff.fa genome-BC2_1.fastq.gz genome-BC2_2.fastq.gz out.scaffolds.fa &

I'm also a bit unsure if this is because the genome is fragmented? Please see the stats below:

file final_MP_scaff.fa
num_seqs 20896
sum_len 896879817
min_len 2002
avg_len 42921.1
max_len 1276294
Q1 14596
Q2 21642
Q3 37521
sum_gap 1782762
N50 86627

Edit: On closer look, it seems that for some reason the bwa indices are not being created.

Recommended -block setting

Hi,

In the docs, you say:

The block value is very important. The default value of 2500 is very conservative and you may increase this value to say 5000 or 10000 to improve the length of scaffolds;

In your example command, you set: -block 50000 - bigger still.

Please, could you point me to a description of what this setting actually does, given that it is very important, and explain why the example is so much bigger than the recommendations?

Thanks!

Rich

Scaff10X error of path in scaff_reads

Hi all,

I tried to run scaff_reads from the 245d6e2 commit. I had an issue about scaff_BC-reads-1 and scaff_BC-reads-2 paths.

.../Scaff10X/src/scaff-bin/scaff-bin/scaff_BC-reads-1: no such file or directory
.../Scaff10X/src/scaff-bin/scaff-bin/scaff_BC-reads-2: no such file or directory

The problem is the duplicated scaff-bin/scaff-bin/ in the path. I changed lines 327 and 333 of scaff_reads.c to use the correct path for my environment, and it works. However, I do not know how to make a generic change that works in all environments.
