Hi @jason-weirather,
Thank you for this awesome tool. I want to try it on our PacBio, Illumina, and ONT data. However, I keep getting the error mentioned in the subject line regardless of what I try. Can you please help me figure it out?
I used the following script:
module load anaconda2
cd /stornext/General/data/user_managed/grpu_mritchie_1/Shani/long_read_benchmark/alignqc/
source activate alignqc
module load samtools/1.7
REFERENCE="/stornext/General/data/user_managed/grpu_mritchie_1/Shani/atac-seq/20190529_MiRCL_ATAC/references/genome.fa"
mkdir "/stornext/General/data/user_managed/grpu_mritchie_1/Shani/long_read_benchmark/alignqc_output/ont"
OUT_DIR="/stornext/General/data/user_managed/grpu_mritchie_1/Shani/long_read_benchmark/alignqc_output/ont"
# full BAM files took forever - so trying on the subsample
IN_LOC_ONT="/stornext/General/data/user_managed/grpu_mritchie_1/XueyiDong/long_read_benchmark/ONT/bam_subsample"
find ${IN_LOC_ONT} -name '*.bam' -print0 | while IFS= read -r -d '' BAM
do
    OUT_P=${BAM##*/};OUT_P=${OUT_P%%.sorted*};
    echo " ######## --------- processing $BAM in $OUT_P -------- #########################";
    seq-tools sort --bam ${BAM} -o ${BAM}.sorted.bam;
    samtools index ${BAM}.sorted.bam;
    mkdir ${OUT_DIR}/${OUT_P};
    echo " ######## --------- results saved in $OUT_DIR/$OUT_P -------- #########################";
    alignqc analyze ${BAM}.sorted.bam -g ${REFERENCE} --no_transcriptome --threads 8 --specific_tempdir ${OUT_DIR}/${OUT_P} -o ${OUT_DIR}/${OUT_P}/${OUT_P}.ont.alignqc.xhtml
done
The error I'm getting is as follows:
######## --------- processing /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode05.sorted.bam in barcode05 -------- #########################
######## --------- results saved in /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05 -------- #########################
Using Rscript version:
R scripting front-end version 3.6.1 (2019-07-05)
WARNING: No annotation specified. Will be unable to report feature specific outputs
Creating initial alignment mapping data
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/bam_preprocess.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode05.sorted.bam --minimum_intron_size 68 -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/temp/alndata.txt.gz --threads 8 --specific_tempdir /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/temp/
read basics
6257000
check for best set
6250000/6257982
combining results
6257982
Traverse bam for alignment analysis
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/traverse_preprocessed.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/temp/alndata.txt.gz -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/ --specific_tempdir /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/temp/ --threads 8 --min_aligned_bases 50 --max_query_overlap 10 --max_target_overlap 10 --max_target_gap 500000 --required_fractional_improvement 0.2
6257982 alignments 3844424 reads
Writing chromosome lengths from header
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/bam_to_chr_lengths.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode05.sorted.bam -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/chrlens.txt
Can we find any known read types
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/get_platform_report.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/lengths.txt.gz /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/special_report
Go through genepred best alignments and make a bed depth file
Generate the depth bed for the mapped reads
gpd_to_bed_depth.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/best.sorted.gpd.gz -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/depth.sorted.bed.gz --threads 8
Traceback (most recent call last):
File "/home/amarasinghe.s/.conda/envs/alignqc/bin/alignqc", line 11, in <module>
sys.exit(entry_point())
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/alignqc.py", line 47, in entry_point
main(args,operable_argv)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/alignqc.py", line 17, in main
analyze.external_cmd(operable_argv,version=version)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/analyze.py", line 88, in external_cmd
main(args)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/analyze.py", line 54, in main
prepare_all_data.external(args)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 844, in external
main(args)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 60, in main
make_data_bam(args)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 184, in make_data_bam
gpd_to_bed_depth(cmd)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/seqtools/cli/utilities/gpd_to_bed_depth.py", line 60, in external_cmd
main(args)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/seqtools/cli/utilities/gpd_to_bed_depth.py", line 27, in main
for covs in results:
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/multiprocessing/pool.py", line 271, in <genexpr>
return (item for chunk in result for item in chunk)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/multiprocessing/pool.py", line 673, in next
raise value
ValueError: Expected lines to be ordered but they appear not to be ordered on line 3362988
Then I used the seq-tools sort option to get the files sorted first, as you mentioned in this issue. However, it still doesn't seem to solve the problem, as seen from the output below.
######## --------- processing /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode01.sorted.bam in barcode01 -------- #########################
[bam_sort_core] merging from 0 files and 10 in-memory blocks...
######## --------- results saved in /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01 -------- #########################
Using Rscript version:
R scripting front-end version 3.6.1 (2019-07-05)
WARNING: No annotation specified. Will be unable to report feature specific outputs
Creating initial alignment mapping data
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/bam_preprocess.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode01.sorted.bam.sorted.bam --minimum_intron_size 68 -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/temp/alndata.txt.gz --threads 8 --specific_tempdir /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/temp/
read basics
5916000
check for best set
5910000/5916804
combining results
5916804
Traverse bam for alignment analysis
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/traverse_preprocessed.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/temp/alndata.txt.gz -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/ --specific_tempdir /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/temp/ --threads 8 --min_aligned_bases 50 --max_query_overlap 10 --max_target_overlap 10 --max_target_gap 500000 --required_fractional_improvement 0.2
5916804 alignments 3720827 reads
Writing chromosome lengths from header
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/bam_to_chr_lengths.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode01.sorted.bam.sorted.bam -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/chrlens.txt
Can we find any known read types
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/get_platform_report.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/lengths.txt.gz /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/special_report
Go through genepred best alignments and make a bed depth file
Generate the depth bed for the mapped reads
gpd_to_bed_depth.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/best.sorted.gpd.gz -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/depth.sorted.bed.gz --threads 8
Traceback (most recent call last):
File "/home/amarasinghe.s/.conda/envs/alignqc/bin/alignqc", line 11, in <module>
sys.exit(entry_point())
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/alignqc.py", line 47, in entry_point
main(args,operable_argv)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/alignqc.py", line 17, in main
analyze.external_cmd(operable_argv,version=version)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/analyze.py", line 88, in external_cmd
main(args)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/analyze.py", line 54, in main
prepare_all_data.external(args)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 844, in external
main(args)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 60, in main
make_data_bam(args)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 184, in make_data_bam
gpd_to_bed_depth(cmd)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/seqtools/cli/utilities/gpd_to_bed_depth.py", line 60, in external_cmd
main(args)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/seqtools/cli/utilities/gpd_to_bed_depth.py", line 27, in main
for covs in results:
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/multiprocessing/pool.py", line 271, in <genexpr>
return (item for chunk in result for item in chunk)
File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/multiprocessing/pool.py", line 673, in next
raise value
ValueError: Expected lines to be ordered but they appear not to be ordered on line 3364652
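To narrow down where the ordering breaks, a check like the one below could be run on the `best.sorted.gpd.gz` file named in the log. This is only a sketch: the function name `first_unordered_line` is made up for illustration, and it assumes the file is tab-separated genePred with the chromosome in column 3 and the start coordinate in column 5, and that "ordered" means byte-wise chromosome name first, then numeric start. The actual check inside seqtools may use different rules, so a mismatch reported here is a hint rather than a diagnosis.

```shell
# Hypothetical helper: print the first line of a gzipped genePred-like file
# whose (chromosome, start) key sorts before the key of the previous line.
# ASSUMPTIONS: tab-separated input, column 3 = chromosome, column 5 = start,
# chromosome names compared byte-wise (LC_ALL=C), starts compared numerically.
first_unordered_line() {
    local gpd_gz="$1"
    gzip -dc "$gpd_gz" | LC_ALL=C awk -F'\t' '
        {
            chr = $3 ""      # force string comparison for chromosome names
            start = $5 + 0   # force numeric comparison for start coordinates
        }
        NR > 1 && (chr < prev_chr || (chr == prev_chr && start < prev_start)) {
            printf "line %d: %s:%d follows %s:%d\n", NR, chr, start, prev_chr, prev_start
            found = 1
            exit
        }
        { prev_chr = chr; prev_start = start }
        END { if (!found) print "ordered" }
    '
}
```

It would be called as `first_unordered_line path/to/data/best.sorted.gpd.gz`, using whichever output directory was passed to `--specific_tempdir`. If the reported line is close to the one in the `ValueError`, the two records around it should show which chromosome or coordinate breaks the order.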
I'm attaching the genome file and a small sample of the BAM file here:
barcode05.sorted.bam.first_10_lines.bam.gz
The header of this .bam file is as follows:
@HD VN:1.5 SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr10 LN:133797422
@SQ SN:chr11 LN:135086622
@SQ SN:chr12 LN:133275309
@SQ SN:chr13 LN:114364328
@SQ SN:chr14 LN:107043718
@SQ SN:chr15 LN:101991189
@SQ SN:chr16 LN:90338345
@SQ SN:chr17 LN:83257441
@SQ SN:chr18 LN:80373285
@SQ SN:chr19 LN:58617616
@SQ SN:chr20 LN:64444167
@SQ SN:chr21 LN:46709983
@SQ SN:chr22 LN:50818468
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@SQ SN:chrM LN:16569
@SQ SN:GL000008.2 LN:209709
@SQ SN:GL000009.2 LN:201709
@SQ SN:GL000194.1 LN:191469
@SQ SN:GL000195.1 LN:182896
@SQ SN:GL000205.2 LN:185591
@SQ SN:GL000208.1 LN:92689
@SQ SN:GL000213.1 LN:164239
@SQ SN:GL000214.1 LN:137718
@SQ SN:GL000216.2 LN:176608
@SQ SN:GL000218.1 LN:161147
@SQ SN:GL000219.1 LN:179198
@SQ SN:GL000220.1 LN:161802
@SQ SN:GL000221.1 LN:155397
@SQ SN:GL000224.1 LN:179693
@SQ SN:GL000225.1 LN:211173
@SQ SN:GL000226.1 LN:15008
@SQ SN:KI270302.1 LN:2274
@SQ SN:KI270303.1 LN:1942
@SQ SN:KI270304.1 LN:2165
@SQ SN:KI270305.1 LN:1472
@SQ SN:KI270310.1 LN:1201
@SQ SN:KI270311.1 LN:12399
@SQ SN:KI270312.1 LN:998
@SQ SN:KI270315.1 LN:2276
@SQ SN:KI270316.1 LN:1444
@SQ SN:KI270317.1 LN:37690
@SQ SN:KI270320.1 LN:4416
@SQ SN:KI270322.1 LN:21476
@SQ SN:KI270329.1 LN:1040
@SQ SN:KI270330.1 LN:1652
@SQ SN:KI270333.1 LN:2699
@SQ SN:KI270334.1 LN:1368
@SQ SN:KI270335.1 LN:1048
@SQ SN:KI270336.1 LN:1026
@SQ SN:KI270337.1 LN:1121
@SQ SN:KI270338.1 LN:1428
@SQ SN:KI270340.1 LN:1428
@SQ SN:KI270362.1 LN:3530
@SQ SN:KI270363.1 LN:1803
@SQ SN:KI270364.1 LN:2855
@SQ SN:KI270366.1 LN:8320
@SQ SN:KI270371.1 LN:2805
@SQ SN:KI270372.1 LN:1650
@SQ SN:KI270373.1 LN:1451
@SQ SN:KI270374.1 LN:2656
@SQ SN:KI270375.1 LN:2378
@SQ SN:KI270376.1 LN:1136
@SQ SN:KI270378.1 LN:1048
@SQ SN:KI270379.1 LN:1045
@SQ SN:KI270381.1 LN:1930
@SQ SN:KI270382.1 LN:4215
@SQ SN:KI270383.1 LN:1750
@SQ SN:KI270384.1 LN:1658
@SQ SN:KI270385.1 LN:990
@SQ SN:KI270386.1 LN:1788
@SQ SN:KI270387.1 LN:1537
@SQ SN:KI270388.1 LN:1216
@SQ SN:KI270389.1 LN:1298
@SQ SN:KI270390.1 LN:2387
@SQ SN:KI270391.1 LN:1484
@SQ SN:KI270392.1 LN:971
@SQ SN:KI270393.1 LN:1308
@SQ SN:KI270394.1 LN:970
@SQ SN:KI270395.1 LN:1143
@SQ SN:KI270396.1 LN:1880
@SQ SN:KI270411.1 LN:2646
@SQ SN:KI270412.1 LN:1179
@SQ SN:KI270414.1 LN:2489
@SQ SN:KI270417.1 LN:2043
@SQ SN:KI270418.1 LN:2145
@SQ SN:KI270419.1 LN:1029
@SQ SN:KI270420.1 LN:2321
@SQ SN:KI270422.1 LN:1445
@SQ SN:KI270423.1 LN:981
@SQ SN:KI270424.1 LN:2140
@SQ SN:KI270425.1 LN:1884
@SQ SN:KI270429.1 LN:1361
@SQ SN:KI270435.1 LN:92983
@SQ SN:KI270438.1 LN:112505
@SQ SN:KI270442.1 LN:392061
@SQ SN:KI270448.1 LN:7992
@SQ SN:KI270465.1 LN:1774
@SQ SN:KI270466.1 LN:1233
@SQ SN:KI270467.1 LN:3920
@SQ SN:KI270468.1 LN:4055
@SQ SN:KI270507.1 LN:5353
@SQ SN:KI270508.1 LN:1951
@SQ SN:KI270509.1 LN:2318
@SQ SN:KI270510.1 LN:2415
@SQ SN:KI270511.1 LN:8127
@SQ SN:KI270512.1 LN:22689
@SQ SN:KI270515.1 LN:6361
@SQ SN:KI270516.1 LN:1300
@SQ SN:KI270517.1 LN:3253
@SQ SN:KI270518.1 LN:2186
@SQ SN:KI270519.1 LN:138126
@SQ SN:KI270521.1 LN:7642
@SQ SN:KI270522.1 LN:5674
@SQ SN:KI270528.1 LN:2983
@SQ SN:KI270529.1 LN:1899
@SQ SN:KI270530.1 LN:2168
@SQ SN:KI270538.1 LN:91309
@SQ SN:KI270539.1 LN:993
@SQ SN:KI270544.1 LN:1202
@SQ SN:KI270548.1 LN:1599
@SQ SN:KI270579.1 LN:31033
@SQ SN:KI270580.1 LN:1553
@SQ SN:KI270581.1 LN:7046
@SQ SN:KI270582.1 LN:6504
@SQ SN:KI270583.1 LN:1400
@SQ SN:KI270584.1 LN:4513
@SQ SN:KI270587.1 LN:2969
@SQ SN:KI270588.1 LN:6158
@SQ SN:KI270589.1 LN:44474
@SQ SN:KI270590.1 LN:4685
@SQ SN:KI270591.1 LN:5796
@SQ SN:KI270593.1 LN:3041
@SQ SN:KI270706.1 LN:175055
@SQ SN:KI270707.1 LN:32032
@SQ SN:KI270708.1 LN:127682
@SQ SN:KI270709.1 LN:66860
@SQ SN:KI270710.1 LN:40176
@SQ SN:KI270711.1 LN:42210
@SQ SN:KI270712.1 LN:176043
@SQ SN:KI270713.1 LN:40745
@SQ SN:KI270714.1 LN:41717
@SQ SN:KI270715.1 LN:161471
@SQ SN:KI270716.1 LN:153799
@SQ SN:KI270717.1 LN:40062
@SQ SN:KI270718.1 LN:38054
@SQ SN:KI270719.1 LN:176845
@SQ SN:KI270720.1 LN:39050
@SQ SN:KI270721.1 LN:100316
@SQ SN:KI270722.1 LN:194050
@SQ SN:KI270723.1 LN:38115
@SQ SN:KI270724.1 LN:39555
@SQ SN:KI270725.1 LN:172810
@SQ SN:KI270726.1 LN:43739
@SQ SN:KI270727.1 LN:448248
@SQ SN:KI270728.1 LN:1872759
@SQ SN:KI270729.1 LN:280839
@SQ SN:KI270730.1 LN:112551
@SQ SN:KI270731.1 LN:150754
@SQ SN:KI270732.1 LN:41543
@SQ SN:KI270733.1 LN:179772
@SQ SN:KI270734.1 LN:165050
@SQ SN:KI270735.1 LN:42811
@SQ SN:KI270736.1 LN:181920
@SQ SN:KI270737.1 LN:103838
@SQ SN:KI270738.1 LN:99375
@SQ SN:KI270739.1 LN:73985
@SQ SN:KI270740.1 LN:37240
@SQ SN:KI270741.1 LN:157432
@SQ SN:KI270742.1 LN:186739
@SQ SN:KI270743.1 LN:210658
@SQ SN:KI270744.1 LN:168472
@SQ SN:KI270745.1 LN:41891
@SQ SN:KI270746.1 LN:66486
@SQ SN:KI270747.1 LN:198735
@SQ SN:KI270748.1 LN:93321
@SQ SN:KI270749.1 LN:158759
@SQ SN:KI270750.1 LN:148850
@SQ SN:KI270751.1 LN:150742
@SQ SN:KI270752.1 LN:27745
@SQ SN:KI270753.1 LN:62944
@SQ SN:KI270754.1 LN:40191
@SQ SN:KI270755.1 LN:36723
@SQ SN:KI270756.1 LN:79590
@SQ SN:KI270757.1 LN:71251
@SQ SN:chrIS LN:10567884
@PG ID:minimap2 PN:minimap2 VN:2.17-r974-dirty CL:minimap2 -ax splice -uf -k14 --junc-bed /wehisan/home/allstaff/d/dong.x/annotation/HumanSequins/gencode.v33.sequins.junction.bed /wehisan/home/allstaff/d/dong.x/annotation/HumanSequins/GrCh38_sequins.fa /stornext/General/data/user_managed/grpu_mritchie_1/XueyiDong/long_read_benchmark/subsample/ONT010/barcode05.fq.gz
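One thing that may be worth checking in this header: a coordinate-sorted BAM lists its records in the order of the `@SQ` lines, which here is chr1, chr2, ..., chr9, chr10 rather than byte-wise string order. I have not confirmed that this is related to the error. The sketch below only reports the first `@SQ` name that would sort before its predecessor under plain string comparison. The function name `first_unsorted_sq` is hypothetical, and it expects a tab-separated SAM header on standard input.

```shell
# Hypothetical helper: read a SAM header on stdin and print the first @SQ
# sequence name that sorts byte-wise before the name on the previous @SQ line.
# Example input source (requires samtools): samtools view -H some.bam
first_unsorted_sq() {
    LC_ALL=C awk -F'\t' '
        $1 == "@SQ" {
            name = ""
            # Find the SN: tag; the SAM format does not fix tag order within a line.
            for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) name = substr($i, 4)
            if (seen && name < prev) {
                print name " follows " prev
                found = 1
                exit
            }
            prev = name
            seen = 1
        }
        END { if (!found) print "header order is byte-wise sorted" }
    '
}
```

A call such as `samtools view -H barcode05.sorted.bam | first_unsorted_sq` would print `chr10 follows chr9` for the header above, since "chr10" compares lower than "chr9" as a string.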
I'm also attaching the full BAM file and a zip file of all the output I got from running the script here:
https://drive.google.com/drive/folders/1HtuIZWOSCh-7PpmxLJyZNy37b8Uo9N6z?usp=sharing
I would really appreciate your help in figuring out what is going on.
Many thanks,
Shani