zyndagj / bsmapz

Updated and optimized fork of BSMAP

License: Other

C++ 13.76% Python 4.56% Makefile 1.34% C 69.19% HTML 0.50% Shell 0.08% TeX 0.37% Perl 7.26% Java 0.60% Roff 2.34%

bsmapz's Issues

Float point error in methratio.py

Hi,

I tried running methratio.py using the following command:
methratio.py -o CN_2_N_methratio.txt -d hg19.fa --pair -z -m 5 CN_2_N.bam

I got the following TypeError:
[methratio] @Wed Oct 21 16:13:32 2020 Using 90% of available memory (11870 MB) as limit
[methratio] @Wed Oct 21 16:13:32 2020 Presorting inputs
[methratio] @Wed Oct 21 16:13:32 2020 Processing 5 chromosomes at a time
Traceback (most recent call last):
  File "/usr/bin/methratio.py", line 565, in <module>
    main()
  File "/usr/bin/methratio.py", line 125, in main
    chromPool = ChromPool(maxChromProcs)
  File "/MGMSTAR1/SHARED/ANALYSIS/APPS/external/anoconda/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
    self._repopulate_pool()
  File "/MGMSTAR1/SHARED/ANALYSIS/APPS/external/anoconda/lib/python3.7/multiprocessing/pool.py", line 231, in _repopulate_pool
    for i in range(self._processes - len(self._pool)):
TypeError: 'float' object cannot be interpreted as an integer

Could you please help me with this?
All dependencies, such as samtools and the Python modules, are installed.
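
The traceback above ends inside `multiprocessing.Pool`, which passes the process count to `range()`, so a float count fails under Python 3. Below is a minimal sketch of that failure and one possible guard. `as_process_count` is a hypothetical helper, not part of methratio.py, and how `maxChromProcs` is actually computed is not shown in this report.

```python
def as_process_count(value):
    """Coerce a computed worker count to a positive int (hypothetical helper)."""
    count = int(value)  # under Python 3, "/" yields a float even for whole numbers
    return max(1, count)


# range() rejects floats, which is the error reported in the traceback.
try:
    range(5.0)
except TypeError as err:
    print(err)

# A count produced by true division becomes usable once it is coerced.
print(as_process_count(10 / 2))
```

If the count comes from a division, using `//` at that point would be an alternative to coercing afterwards.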

bsmap bigwig format

Can bsmap produce a bigWig file containing the methylation ratio for both the positive (F) and negative (R) strands?

Thanks,

"Illegal instruction" message when I launch bsmapz

Hi, I'm trying to install bsmapz through conda on the cluster where I usually work, but when I launch it I get the message "Illegal instruction". I can't install it any other way.
Can you help me?
Thanks
Francesca

methratio exception

I recently updated to bsmapz (from an older version of bsmap) and have consistently been getting the following error:

[methratio] @Tue Jul 14 15:37:26 2020 chr3 used 127582 reads
[methratio] @Tue Jul 14 15:37:26 2020 combining CpG methylation from both strands of chr3...
Traceback (most recent call last):
  File "/usr/local/bioinfo/bsmapz/1.1.2/bin/methratio.py", line 585, in <module>
    main()
  File "/usr/local/bioinfo/bsmapz/1.1.2/bin/methratio.py", line 138, in main
    ret = chromPool.map(chromWorker, argList, chunksize=1)
  File "/usr/local/bioinfo/bioconda/envs/samtools/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/local/bioinfo/bioconda/envs/samtools/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
UnboundLocalError: local variable 'depth1' referenced before assignment

This occurs after all the chromosomes have been read.

The command I am running is:
methratio.py -g -z -x CG -i no-action -I -p -u -d mm10/Sequence/WholeGenomeFasta/genome.fa Run3_C4_MyoD_B_2.bam

methratio.py error

Hi,
I am getting the following error with methratio.py:

" largestChromSize = max(chromDict.values())
ValueError: max() arg is an empty sequence"

The command I am using is as follows:

"python methratio.py --chr=chr5 --ref=Cavia_porcellus.cavPor3.dna_sm.toplevel.fa --out=methratio.txt SRR6519305_out.sam"

Any help would be highly appreciated!

best wishes,
Pashu

ZeroDivisionError

Hi @zyndagj
Thank you for providing this fantastic tool for methylation analysis.

I'm analyzing methylation ratios in Arabidopsis, and I tried to check how methratio.py behaves.
However, I received the ZeroDivisionError shown below.
What should I do to solve this problem?

GCF_000001735.4_TAIR10.1_genomic.fna: Arabidopsis genome downloaded from NCBI (TAIR10.1)
Col-0_sorted_marked.bam: mapped and sorted BAM file generated by BSMAP.
Environment: Python 2.7, Ubuntu 20.04 (WSL2)

↓ copied from terminal

methratio.py --ref GCF_000001735.4_TAIR10.1_genomic.fna --out Col0_methratio.txt Col-0_sorted_marked.bam
[methratio] @Tue May 18 21:19:28 2021 loading reference file: GCF_000001735.4_TAIR10.1_genomic.fna ...
[main_samview] incorrect number of arguments for -X option. Aborting.
[methratio] @Tue May 18 21:19:35 2021 read 0 lines
[methratio] @Tue May 18 21:19:35 2021 writing Col0_methratio.txt ...
Traceback (most recent call last):
  File "/anaconda3/envs/py27/bin/methratio.py", line 257, in <module>
    disp('total %d valid mappings, %d covered cytosines, average coverage: %.2f fold.' % (nmap, nc, float(nd)/nc))
ZeroDivisionError: float division by zero
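
The log above shows samtools aborting and methratio reporting "read 0 lines", so the covered-cytosine count `nc` is zero when the summary line is formatted. The sketch below guards that division. `average_coverage` is a hypothetical helper, not the script's own code, and it only hides the symptom: the empty input caused by the samtools error still has to be resolved.

```python
def average_coverage(total_depth, covered_cytosines):
    """Return mean depth per covered cytosine, or 0.0 when nothing was covered."""
    if covered_cytosines == 0:
        return 0.0
    return float(total_depth) / covered_cytosines


nmap, nc, nd = 0, 0, 0  # the counts implied by "read 0 lines"
print('total %d valid mappings, %d covered cytosines, average coverage: %.2f fold.'
      % (nmap, nc, average_coverage(nd, nc)))
```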

Segmentation fault - core dumped

Hi,

I am using BSMAPz in RRBS mode, and the reference I am using has a large number of contigs (368,060 contigs; https://www.ncbi.nlm.nih.gov/assembly/GCF_000233375.1/). I suspect this is why I am getting a segmentation fault. I tried aligning the reads with more memory, but I still get a segmentation fault (core dumped). It would be great if you could help me resolve this issue.

Thank you,
Geetha

methratio.py 'S not a valid CIGAR character'

When I run this script on my BAM file, I get a ValueError and the script ends with the message:
'S not a valid CIGAR character'. The script only handles the M, I and D operations and crashes on any other, but S, H and X are also allowed by the SAM file format.
https://drive5.com/usearch/manual/cigar.html
I added some print statements, and my seq and cigar look like this, for example:
seq: ATCCCAACAACACTCCAACCTCAACATAAACCAACCCCAACATAAACCAACCCCAACATAAACCTACCTCAACAT
cigar: 43M32S
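
One possible workaround, sketched here under assumptions, is to remove clipped bases before the read reaches code that only understands M, I and D. `strip_clipping` and `CIGAR_OP` are hypothetical names and are not part of methratio.py. The sketch relies on the SAM convention that soft-clipped bases (S) are present in SEQ while hard-clipped bases (H) are not.

```python
import re

CIGAR_OP = re.compile(r'(\d+)([MIDNSHP=X])')


def strip_clipping(seq, cigar):
    """Remove soft-clipped bases from seq and drop S/H operations from cigar."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips carry no bases in SEQ, so they are simply dropped.
    ops = [(n, op) for n, op in ops if op != 'H']
    if ops and ops[0][1] == 'S':       # leading soft clip
        seq = seq[ops[0][0]:]
        ops = ops[1:]
    if ops and ops[-1][1] == 'S':      # trailing soft clip
        seq = seq[:len(seq) - ops[-1][0]]
        ops = ops[:-1]
    return seq, ''.join('%d%s' % (n, op) for n, op in ops)


read = ('ATCCCAACAACACTCCAACCTCAACATAAACCAACCCCAACATAAACC'
        'AACCCCAACATAAACCTACCTCAACAT')
trimmed, trimmed_cigar = strip_clipping(read, '43M32S')
print(len(trimmed), trimmed_cigar)
```

Dropping the clipped bases keeps them out of the methylation counts, which seems the conservative choice since they are not aligned to the reference.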

pair orientation

Hi,

I am using this tool to test some data, and I see that it allows both inward and outward pair orientations. Would it be possible to add an option to filter on orientation?

thanks!

methdiff.py error

Hello!
I am learning the bisulfite pipeline, but when I get to methdiff.py it raises an IndexError: array index out of range.
This is the command:

python2 /home/alba/SOFTWARE/BSMAPz/methdiff.py -d Mus_musculus.GRCm39.dna.primary_assembly.fa -p 0.05 -b 10 -l WT,KO WT_1L,WT_2L,WT_3L KO_1L,KO_2L,KO_3L -o OUTPUT.txt

It seems to work but ends with the error, and the output is a text file that contains only the headers. This is what it prints:

[methdiff] @Thu Dec 2 14:21:09 2021 reading reference file: Mus_musculus.GRCm39.dna.primary_assembly.fa ...
[methdiff] @Thu Dec 2 14:21:22 2021 processing 1 dna:chromosome chromosome:GRCm39:1:1:195154279:1 REF ...
[methdiff] @Thu Dec 2 14:21:29 2021 processing 10 dna:chromosome chromosome:GRCm39:10:1:130530862:1 REF ...
[methdiff] @Thu Dec 2 14:21:34 2021 processing 11 dna:chromosome chromosome:GRCm39:11:1:121973369:1 REF ...
[methdiff] @Thu Dec 2 14:21:38 2021 processing 12 dna:chromosome chromosome:GRCm39:12:1:120092757:1 REF ...
[methdiff] @Thu Dec 2 14:21:43 2021 processing 13 dna:chromosome chromosome:GRCm39:13:1:120883175:1 REF ...
[methdiff] @Thu Dec 2 14:21:47 2021 processing 14 dna:chromosome chromosome:GRCm39:14:1:125139656:1 REF ...
[methdiff] @Thu Dec 2 14:21:52 2021 processing 15 dna:chromosome chromosome:GRCm39:15:1:104073951:1 REF ...
[methdiff] @Thu Dec 2 14:21:56 2021 processing 16 dna:chromosome chromosome:GRCm39:16:1:98008968:1 REF ...
[methdiff] @Thu Dec 2 14:22:00 2021 processing 17 dna:chromosome chromosome:GRCm39:17:1:95294699:1 REF ...
[methdiff] @Thu Dec 2 14:22:03 2021 processing 18 dna:chromosome chromosome:GRCm39:18:1:90720763:1 REF ...
[methdiff] @Thu Dec 2 14:22:07 2021 processing 19 dna:chromosome chromosome:GRCm39:19:1:61420004:1 REF ...
[methdiff] @Thu Dec 2 14:22:09 2021 processing 2 dna:chromosome chromosome:GRCm39:2:1:181755017:1 REF ...
[methdiff] @Thu Dec 2 14:22:16 2021 processing 3 dna:chromosome chromosome:GRCm39:3:1:159745316:1 REF ...
[methdiff] @Thu Dec 2 14:22:22 2021 processing 4 dna:chromosome chromosome:GRCm39:4:1:156860686:1 REF ...
[methdiff] @Thu Dec 2 14:22:28 2021 processing 5 dna:chromosome chromosome:GRCm39:5:1:151758149:1 REF ...
[methdiff] @Thu Dec 2 14:22:34 2021 processing 6 dna:chromosome chromosome:GRCm39:6:1:149588044:1 REF ...
[methdiff] @Thu Dec 2 14:22:39 2021 processing 7 dna:chromosome chromosome:GRCm39:7:1:144995196:1 REF ...
[methdiff] @Thu Dec 2 14:22:45 2021 processing 8 dna:chromosome chromosome:GRCm39:8:1:130127694:1 REF ...
[methdiff] @Thu Dec 2 14:22:50 2021 processing 9 dna:chromosome chromosome:GRCm39:9:1:124359700:1 REF ...
Traceback (most recent call last):
  File "/home/alba/SOFTWARE/BSMAPz/methdiff.py", line 135, in <module>
    cmp_chrom(cr)
  File "/home/alba/SOFTWARE/BSMAPz/methdiff.py", line 105, in cmp_chrom
    m[i] = meth[i][pos]
IndexError: array index out of range

I would really appreciate it if you could help me with this issue. Thank you in advance,

Alba
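
One thing that may be worth checking: the log prints the whole FASTA header ("1 dna:chromosome chromosome:GRCm39:1:1:195154279:1 REF") as the chromosome name. Whether that causes the IndexError is not established here. If the methylation ratio files use only the short name, a helper along these lines could normalize the headers before comparing. `chrom_id` is a hypothetical name, not a methdiff.py function.

```python
def chrom_id(fasta_header):
    """Return the first whitespace-delimited token of a FASTA header line."""
    parts = fasta_header.lstrip('>').split()
    return parts[0] if parts else ''


header = '>1 dna:chromosome chromosome:GRCm39:1:1:195154279:1 REF'
print(chrom_id(header))
```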

Significant difference in the number of covered cytosines between bsmap 2.90 and BSMAPz

Hi @zyndagj ,

Excellent work and really helpful tool!

Recently I ran into an issue with the number of covered cytosines reported by bsmap and BSMAPz on Arabidopsis EM-seq data. Below are my commands:

# ========== bsmap v2.90
bsmap -a arab.Flower-4-50ng-18PCR-EM.SRR11906626_trimmed.fq \
  -d GCF_000001735.4_TAIR10.1_genomic.fa \
  -o arab.Flower-4-50ng-18PCR-EM.SRR11906626_trimmed.bsmap.bam \
  -v 2 -w 1 -p 20
python2 ~/tools/bsmap-2.90/methratio.py \
  -o arab.Flower-4-50ng-18PCR-EM.SRR11906626_trimmed.bsmap.freq.txt \
  -d GCF_000001735.4_TAIR10.1_genomic.fa -r -u \
  -s ~/tools/bsmap-2.90/samtools \
  arab.Flower-4-50ng-18PCR-EM.SRR11906626_trimmed.bsmap.bam

# =========== BSMAPz
bsmapz -a arab.Flower-4-50ng-18PCR-EM.SRR11906626_trimmed.fq \
  -d GCF_000001735.4_TAIR10.1_genomic.fa \
  -o arab.Flower-4-50ng-18PCR-EM.SRR11906626_trimmed.bsmapz.bam \
  -v 2 -w 1 -p 20
python2 ~/tools/BSMAPz/methratio.py \
  -o arab.Flower-4-50ng-18PCR-EM.SRR11906626_trimmed.bsmapz.freq.txt \
  -d GCF_000001735.4_TAIR10.1_genomic.fa -u -r -N 20 \
  arab.Flower-4-50ng-18PCR-EM.SRR11906626_trimmed.bsmapz.bam

The results show that, although the .bam files from the two tools are almost the same size, methratio.py from bsmap reported 43067903 covered cytosines, whereas methratio.py from BSMAPz reported 36637231.

Do you happen to know what caused this big difference? FWIW, 43067903 corresponds to 99% of the cytosines in the Arabidopsis genome. This result is consistent with the EM-seq paper, but it looks way too high to me.

Best,
Peng

installation problem

Hello, I'm following the installation instructions.
I have already cloned the git repository
and run make bsmapz,
but when I run the test I get the following:
(BIT) pocho@DESKTOP-8JA094T:~/git-repos/BSMAPz$ sudo make test
OK - converted test_data/test_paired.sam to sorted and indexed BAM
Makefile:70: recipe for target 'test_data/test_paired.sam.bam.mr' failed
make: *** [test_data/test_paired.sam.bam.mr] Error 1

I have samtools installed:
(BIT) pocho@DESKTOP-8JA094T:~/git-repos/BSMAPz$ which samtools
/home/pocho/anaconda3/envs/BIT/bin/samtools
(BIT) pocho@DESKTOP-8JA094T:~/git-repos/BSMAPz$ samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.

Any idea what to do about this? Thanks in advance.

Segmentation Fault when Loading Reference

Hello,

I am trying to run BSMAPz with some watermelon data. The reference genome that I am using can be found here:

ftp://cucurbitgenomics.org/pub/cucurbit/genome/watermelon/WCG/v2/

My strong preference is to use the chromosome FASTA file as my reference (WCG_genome_v2.fa, from the link above). However, using this file leads to a segmentation fault. If I use the scaffold reference (WCG_scaffold_v2.fa) instead, my command runs just fine. My first guess was that this was caused by a memory issue (i.e., in the first case, the program tries to load the entire chromosome at once and cannot allocate that much memory, which is not a problem with the scaffolds due to their smaller size). However, I find this hard to believe since I am using an instance with 768 GB of memory, and the problem persists even if I run it with one core.

Any insights into this would be greatly appreciated (command and error below).

Thank you very much!
Angels

Command:
bsmapz -a ./data/SRR6328781_1.fastq -b ./data/SRR6328781_2.fastq -d ./data/WCG_genome_v2.fa -o SRR6328781.bam -p 8 -A AGATCGGAAGAGC -w 100 -r 0 -q 10

Error:
loading reference file: ./data/WCG_genome_v2.fa (format: FASTA)
Segmentation fault (core dumped)

SAM2BAM conversion not sucessful

Hi!

The following error appears when using the sam2bam.sh script:

"./BSMAPz/sam2bam.sh BSMAP_Output/epiRIL-368_4C_Rep1_L2.sam
Converting SAM to BAM ...
[W::sam_read1] Parse error at line 11
[main_samview] truncated file.
SAM2BAM conversion not sucessful.
BSMAP_Output/epiRIL-368_4C_Rep1_L2.sam remains unchanged."

Any help is much appreciated.

Thank you!
Best Regards

SC

OverflowError in methratio.py

Hi @zyndagj ,
I tried to run the command below, but it didn't work. (The same error seems to have occurred before.)
The BAM file contains Arabidopsis data, and the BSMAPz version is 1.1.3.


Things I've tried:

  • Different sample (some samples worked, but the others didn't.)
  • Tried with BSMAPz 1.1.2. (didn't work)

command
methratio.py --chr=chr2 -d chr2_arab.fasta -o chr2CYS1meth.txt CYS1_sorted_marked.bam


result
[methratio] @Wed Jun 9 15:06:12 2021 Using 90% of available memory (3859 MB) as limit
[methratio] @Wed Jun 9 15:06:12 2021 Presorting inputs
[methratio] @Wed Jun 9 15:06:12 2021 CYS1_sorted_marked.bam is already sorted
[methratio] @Wed Jun 9 15:06:12 2021 Processing 1 chromosomes at a time
[methratio] @Wed Jun 9 15:06:12 2021 Reading chr2 from CYS1_sorted_marked.bam with samtools

Traceback (most recent call last):
  File "/home/kent/anaconda3/envs/py27/bin/methratio.py", line 585, in <module>
    main()
  File "/home/kent/anaconda3/envs/py27/bin/methratio.py", line 143, in main
    ret = map(chromWorker, argList)
  File "/home/kent/anaconda3/envs/py27/bin/methratio.py", line 301, in chromWorker
    searchFunc(refseq, seq, depth, meth, convert, match, pos)
  File "/home/kent/anaconda3/envs/py27/bin/methratio.py", line 543, in searchFunc
    depth[ip] += 1
OverflowError: unsigned short is greater than maximum
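
The error message is the one Python's `array` module raises when a slot of type code 'H' (unsigned short) is pushed past its maximum, which suggests `depth` is such an array. That is an assumption, since the script's declaration is not shown here. A minimal sketch of a saturating increment follows. `increment_saturating` is a hypothetical helper, and the 65535 limit assumes the usual 2-byte unsigned short.

```python
from array import array

USHORT_MAX = 65535  # largest value a 2-byte array('H') slot can hold


def increment_saturating(counts, index):
    """Add 1 to counts[index] unless the slot is already at its maximum."""
    if counts[index] < USHORT_MAX:
        counts[index] += 1


depth = array('H', [USHORT_MAX - 1])
increment_saturating(depth, 0)  # reaches the maximum
increment_saturating(depth, 0)  # stays there instead of raising OverflowError
print(depth[0])
```

Capping the count loses information at very deep positions. Switching to a wider type code such as 'I' would avoid that at the cost of more memory.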

methratio.py: samtools view: invalid option -- 'X'

Hi,
A new issue came up when I tried to launch methratio.py, which worked perfectly about a month ago.

My command, with samtools v1.9 and bsmap v2.90:
(bioinfo) root@cloud61b:/home/Illumina# methratio.py -d /Reference/hg19.fa -m 1 -z -i skip -o SAMPLE.methylation_results.txt SAMPLE.bam
[methratio] @Tue Dec 11 10:17:21 2018 loading reference file: /Reference/hg19.fa ...
samtools view: invalid option -- 'X'

At first I thought there had been a change in a new version of samtools, but I tried downgrading to 1.8 and 1.7 and the error is still samtools view: invalid option -- 'X'.

Do you have any idea what's wrong and what I should do to fix this?

Thanks!
Sarka

methratio.py and Identification of methylated cytosine sites

Hi!
I have obtained the methylation ratio file with methratio.py, and I want to know: do I still need to use a binomial test to identify methylated sites?
And can CI_lower and CI_upper be used to identify methylated sites?

I saw this in "Single-base resolution methylomes of tomato fruit development reveal epigenome modifications associated with ripening"

Identification of methylated cytosine sites.
The binomial test was performed for each cytosine base in the tomato genome to check whether the cytosine site can be called a methylated cytosine site. Three types of potential errors are present in the BS-Seq data set, namely incomplete bisulfite conversion, nucleotide polymorphisms and sequencing errors. Bisulfite conversion rate was derived from the alignments to the tomato plastid genome. Nucleotide polymorphisms and sequencing errors were calculated as the frequency of mismatched bases in the uniquely aligned reads. For each cytosine site, binomial probability was calculated as B(x ≥ k; n, p), where n is the total number of reads covering the site (read depth), k is the total number of the methylated cytosines identified at the site and p is the error rate. The resultant binomial probability can be interpreted as the probability of at least k methylated cytosines obtained from n trials (read depth) owing to the errors in which the error rate is p. Binomial probability values were then adjusted for multiple tests using FDR [38]. Cytosine sites with FDR < 0.01 were defined as methylated cytosine sites.
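
The procedure quoted above can be sketched in a few lines. The tail probability B(x ≥ k; n, p) follows directly from the quoted definition. The passage only says "FDR", so the Benjamini-Hochberg adjustment used here is an assumption, and the site counts and error rate are made-up illustrative values.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


sites = [(9, 10), (1, 10), (0, 8)]  # hypothetical (k, n) pairs per cytosine
error_rate = 0.01                   # hypothetical combined error rate p
pvals = [binom_tail(k, n, error_rate) for k, n in sites]
fdr = benjamini_hochberg(pvals)
methylated = [q < 0.01 for q in fdr]
print(methylated)
```

`math.comb` needs Python 3.8 or later. For genome-scale data a vectorized implementation would be more practical than this pure-Python loop.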

Questions about the bam output

Hi,
Recently I tried using bsmap to align bisulfite sequencing data, and I found something I can't understand in the output BAM file:

ST-E00206:158:HVLMYCCXX:7:121	161	# chrM	156	255	106M	# chr18	29124951	0	ATTTATCACACCTACATTCAATATTACAAACAAACATACCTACTAAAATATATTAATTAATTAATACTTATAAAACATAATAATAACAATTAAATATCTACACAAC	AA<FFAAAFKKFAFKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKAKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK	NM:i:0	ZS:Z:--
ST-E00206:158:HVLMYCCXX:7:1211	129	# chrM	156	255	144M	# chr1	161734142	0	ATTTATCACAGCTACAGTCAATGTTACGAACAGACATACTTACTAAAATATATTAATTAATTAATACTTATGAAACATAATAATAACAATTAAATATCTACACAACCACTTTCCACACAAACATCATAACAAAAAATTTCCACC	A<FFFFF,FAFA,AFF,F<F<K<FKK,,F,,F,,,A<,,,A<,7,7AA7FAA,FA,FK,AF7,7,,7,F,F,,F,AF<FKKKAA<FKKFAFFFAF,FAAAFKKAFFKKKKFKFFAKFFKF7F7AAFFKKKKKFKKKKKF7AAFA	NM:i:7	ZS:Z:--
ST-E00206:158:HVLMYCCXX:7:1211	129	# chrM	156	255	144M	# chr18	1125144	0	AGTTATCACACCTACATTCAATATTACAAACAAACATACTTACTAAAATATATTAATTAATTAATACTTATAAAACATAATAATAACAATTAAATATCTACACAACCACTTTCCACACAAACATCATAACAAAACATTTCCACC	A<AFAA,F,,FAAFKKKF<<,FFF<A,<<FAAFFFFKFFKK<,FFFKKKKKFKKKKKKKKKKKKKFKFK7AKA7AFKFFFKFKKKKKKKKKFAKKKKFAKAFKKKFFKFFFKKAKFAFFKFKKKKKKFKKKKFK,FKKKFFAAA	NM:i:3	ZS:Z:--

The above is an excerpt from the BAM file. Could you please explain why different chromosomes appear in the same line? In my understanding, read 1 and read 2 should be on the same chromosome, is that right?
Thank you so much!
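
The second column of each record is the SAM FLAG, a bit field defined by the SAM specification. Decoding it shows how the aligner classified each read. The sketch below uses the standard bit values; the label strings are just illustrative names. For the two flags in the excerpt (161 and 129), the proper-pair bit (0x2) is not set. How BSMAP decides when to set that bit is not covered here.

```python
SAM_FLAGS = [
    (0x1, 'paired'),
    (0x2, 'proper_pair'),
    (0x4, 'unmapped'),
    (0x8, 'mate_unmapped'),
    (0x10, 'reverse'),
    (0x20, 'mate_reverse'),
    (0x40, 'first_in_pair'),
    (0x80, 'second_in_pair'),
    (0x100, 'secondary'),
    (0x200, 'qc_fail'),
    (0x400, 'duplicate'),
    (0x800, 'supplementary'),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


for flag in (161, 129):
    print(flag, decode_flag(flag))
```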

Methratio.py does not handle files

I am using Python 2 (2.7.5) on my UCSF cluster with BSMAPz-1.1.3. With Python 3 (3.8.10) I get a different assertion error, something about repopulating the pool.
I am getting the following error:
[methratio] @Thu May 5 11:40:26 2022 Using 90% of available memory (406157 MB) as limit
[methratio] @Thu May 5 11:40:26 2022 Presorting inputs
methratio.py does not handle files

Since it is a cluster, I am using the following script to run it:
"""
#!/bin/bash
REF="/wynton/scratch/agala/PASC/Homo_sapiens.GRCh38.dna.primary_assembly.fa"
INgzbam="${!SGE_TASK_ID}"
OUTbase="${INgzbam/.bam/}.txt"

. /etc/bashrc
module load CBI
module load Sali
module load samtools/1.9
module load anaconda
export PATH="/wynton/home/pillailab/agala/.local/bin/BSMAPz-1.1.3:$PATH"
export PATH="/wynton/home/pillailab/agala/.local/bin/BSMAPz-1.1.3/samtools:$PATH"

python2 /wynton/home/pillailab/agala/.local/bin/BSMAPz-1.1.3/methratio.py
-o "${OUTbase}" -d "${REF}" -g -r "${Ingzbam}"
"""

Can someone please help me with this?

When I run the script I get the error CMD: R --vanilla

I always get an error when I use the scripts to run the differential expression analysis:
CMD: R --vanilla -q < genes1.counts.matrix.F2n_vs_M2n.DESeq2.Rscript
sh: R: command not found
Error, cmd: R --vanilla -q < genes1.counts.matrix.F2n_vs_M2n.DESeq2.Rscript died with ret (32512) at /public/home/sundf/miniconda3/opt/trinity-2.1.1/Analysis/DifferentialExpression/run_DE_analysis.pl line 712.

query regarding methylation level from methylratio output file.

Hi Dr. Zynda,

I hope you are doing well.

This is not an issue, just a question about the BSMAPz output file. The file generated by methratio.py contains 12 columns.
I want to plot the methylation level around various genomic features. How can I extract the methylation level from this output file?

I would appreciate any help with this.

Thank you
Best regards

Saurabh
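
A minimal sketch of one way to pull per-site values out of a tab-separated file with a header line. The column names `chr`, `pos` and `ratio` are assumptions and should be checked against the header of the real file. The excerpt is made up and shows only a few of the 12 columns. `mean_ratio` is a hypothetical helper.

```python
import csv
import io

# Made-up excerpt; real output has 12 columns and possibly different names.
EXAMPLE = (
    "chr\tpos\tstrand\tcontext\tratio\n"
    "chr1\t101\t+\tCG\t0.800\n"
    "chr1\t250\t-\tCG\t0.400\n"
)


def mean_ratio(handle, chrom, start, end):
    """Average the 'ratio' column over sites with start <= pos <= end."""
    values = [
        float(row['ratio'])
        for row in csv.DictReader(handle, delimiter='\t')
        if row['chr'] == chrom and start <= int(row['pos']) <= end
    ]
    return sum(values) / len(values) if values else None


print(mean_ratio(io.StringIO(EXAMPLE), 'chr1', 1, 300))
```

For plotting around genomic features, the same function could be called once per window with an open file handle in place of the in-memory example, although an indexed or pre-binned representation would scale better.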

bsmapz -h returns exit-code 1

Help should return exit code 0, since everything went as expected: the help text was printed.
This might be an opinionated request, but it could help software maintainers test that the binary can be called (e.g. that it is on PATH) by having some command (help?) that returns 0.

Feel free to close this issue, if you have another opinion.
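
The kind of packaging check described above might look like the sketch below. `exits_cleanly` is a hypothetical helper. The demonstration uses the Python interpreter as a stand-in command so the example runs anywhere; a real check would pass `['bsmapz', '-h']`, which according to this report currently exits with 1.

```python
import subprocess
import sys


def exits_cleanly(command):
    """Return True when the command can be started and exits with status 0."""
    try:
        result = subprocess.run(command, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        return False  # the binary is not on PATH
    return result.returncode == 0


# Stand-in command; replace with ['bsmapz', '-h'] in a real packaging test.
print(exits_cleanly([sys.executable, '-c', 'pass']))
```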

methratio.py memory issue

I am running it in Slurm with: srun --nodes=1 --cpus-per-task=20 --ntasks-per-node=1 --mem=30000M

methratio.py -I --np 10 --mem 5000 -o SampleName_methratio.txt -d $genome_fasta bam.bam > SampleName.bam_mr.log

but I am getting the error: Only 3158 MB available, not 5000

As I am reserving 30 GB of memory, I did not expect this memory issue. Please help with this.

buffer overflow detected***

Hi!
I am trying to run BSMAPz using the following command line:

"BSMAPz/bsmapz -a /media/waqas/30004AFD3CB35412/All_Data/WGBS-Seq/epiRIL-368/Raw/epiRIL-368_QC_Reads_WGBS/Epiril368_4_C_R1_L001_R1_val_1.fq.gz -b /media/waqas/30004AFD3CB35412/All_Data/WGBS-Seq/epiRIL-368/Raw/epiRIL-368_QC_Reads_WGBS/Epiril368_4_C_R1_L001_R2_val_2.fq.gz -d /media/waqas/Chaudhary/Bsmap_Output/Ath_ChrAll.fa -o BSMAP_Output/epiRIL-368_Rep1_L1.bam -p 6"

The error appears as:

"[bsmapz] @Sat Apr 11 20:31:30 2020 loading reference file: /media/waqas/Chaudhary/Bsmap_Output/Ath_ChrAll.fa (format: FASTA)
[bsmapz] @Sat Apr 11 20:31:30 2020 7 reference seqs loaded, total size 119667750 bp. 0 secs passed
[bsmapz] @Sat Apr 11 20:31:33 2020 create seed table. 3 secs passed
[bsmapz] @Sat Apr 11 20:31:33 2020 Pair-end alignment(6 threads),
Input read file #1: /media/waqas/30004AFD3CB35412/All_Data/WGBS-Seq/epiRIL-368/Raw/epiRIL-368_QC_Reads_WGBS/Epiril368_4_C_R1_L001_R1_val_1.fq.gz (format: gzipped FASTQ)
Input read file #2: /media/waqas/30004AFD3CB35412/All_Data/WGBS-Seq/epiRIL-368/Raw/epiRIL-368_QC_Reads_WGBS/Epiril368_4_C_R1_L001_R2_val_2.fq.gz (format: gzipped FASTQ)
Output file: BSMAP_Output/epiRIL-368_Rep1_L1.bam*** buffer overflow detected ***: BSMAPz/bsmapz terminated
Aborted (core dumped)"

I need help to troubleshoot this error.
Any help is much appreciated.

Thanks

SC

ZeroDivisionError: float division by zero

Hello,

I am trying to run the methratio.py script with the following code:

#PBS -N Methratio_G1SI199
#PBS -l select=1:ncpus=12:mem=62gb:interconnect=fdr,walltime=24:00:00
#PBS -m abe
#PBS -j oe

cd $PBS_O_WORKDIR
python ~/miniconda2/envs/methratio/bin/methratio.py -o G1SI199_methratio.txt -d /scratch1/psubba/GCF_008822105.2_bTaeGut2.pat.W.v2_genomic.fna -z -x CG /home/psubba/BSMAP_output_BAM/G1SI199_interval.bam

OUTPUT/ERROR:

[methratio] @Wed May 12 09:05:20 2021 loading reference file: /scratch1/psubba/GCF_008822105.2_bTaeGut2.pat.W.v2_genomic.fna ...
[main_samview] incorrect number of arguments for -X option. Aborting.
[methratio] @Wed May 12 09:05:49 2021 read 0 lines
[methratio] @Wed May 12 09:05:49 2021 writing G1SI199_methratio.txt ...
Traceback (most recent call last):
  File "/home/psubba/miniconda2/envs/methratio/bin/methratio.py", line 257, in <module>
    disp('total %d valid mappings, %d covered cytosines, average coverage: %.2f fold.' % (nmap, nc, float(nd)/nc))
ZeroDivisionError: float division by zero

I am not sure how to fix this error because I have successfully used the same script in the past and it has worked. Please let me know if I am doing something wrong. Please also let me know if I can provide you with any other information. Thank you in advance!

"AssertionError: String length does not match CIGAR" when running methratio.py

Hi, I got this error when I ran methratio.py to parse a BAM file:

[methratio] @Mon Oct 10 11:52:28 2022 Using 90% of available memory (866 MB) as limit
[methratio] @Mon Oct 10 11:52:28 2022 Presorting inputs
[methratio] @Mon Oct 10 11:52:29 2022 Calling samtools sort on SRR800059.uniq.bam and using 36 MB of memory
[bam_sort_core] merging from 792 files and 24 in-memory blocks...
[methratio] @Mon Oct 10 12:04:41 2022 Processing 1 chromosomes at a time
[methratio] @Mon Oct 10 12:04:43 2022 Reading CM002885.2 from SRR800059.uniq.tmpSrt.bam with samtools
Traceback (most recent call last):
  File "methratio.py", line 588, in <module>
    main()
  File "methratio.py", line 143, in main
    ret = map(chromWorker, argList)
  File "methratio.py", line 293, in chromWorker
    options.rm_dup, options.trim_fillin, coverage, chromSize)
  File "methratio.py", line 467, in get_sam_alignment
    seq = parseCigar(seq, cigar)
  File "methratio.py", line 457, in parseCigar
    assert originalLen == index, "String length does not match CIGAR"
AssertionError: String length does not match CIGAR


So I checked the script and found this function (lines 432-458):

def parseCigar(seq, cigar):
    '''
    Delete insertions and dash "-" deletions
    >>> parseCigar('ACTAGAATGGCT','3M1I3M1D5M')
    'ACTGAA-TGGCT'
    >>> parseCigar('ACTG','2M')
    Traceback (most recent call last):
    ...
    AssertionError: String length does not match CIGAR
    '''
    index = 0 # index in seq
    originalLen = len(seq)
    cigarMatch = cigarRE.findall(cigar)
    for align in cigarMatch:
        length = int(align[:-1])
        op = align[-1]
        if op == 'M':
            index += length
        elif op == 'I':
            seq = seq[:index]+seq[index+length:]
        elif op == 'D':
            seq = seq[:index]+'-'*length+seq[index:]
            index += length
        else:
            raise ValueError("%c not a valid CIGAR character"%(op))
    assert originalLen == index, "String length does not match CIGAR"
    return seq

What seems strange to me is "assert originalLen == index". In my understanding, "index" equals the number of "M" bases plus the number of "D" bases, rather than "M" plus "I" (i.e. "originalLen"). So "originalLen" will not equal "index" unless the number of "I" bases equals the number of "D" bases.
Is this a bug, or have I misunderstood?
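
The arithmetic in question can be checked independently of the script. In the SAM specification, M, I, S, = and X consume read bases, while M, D, N, = and X consume reference bases. The sketch below computes both lengths from a CIGAR string. `cigar_lengths` and `CIGAR_OP` are hypothetical names, not part of methratio.py. In the quoted function, `index` advances on M and D, so it tracks the reference length, while `originalLen` is the read length.

```python
import re

CIGAR_OP = re.compile(r'(\d+)([MIDNSHP=X])')


def cigar_lengths(cigar):
    """Return (query_length, reference_length) implied by a CIGAR string."""
    query = reference = 0
    for count, op in CIGAR_OP.findall(cigar):
        count = int(count)
        if op in 'MIS=X':   # operations that consume bases of the read
            query += count
        if op in 'MDN=X':   # operations that consume bases of the reference
            reference += count
    return query, reference


print(cigar_lengths('3M1I3M1D5M'))  # the docstring example: 1 inserted, 1 deleted
print(cigar_lengths('3M2I3M'))      # insertions only
```

The docstring example has equal inserted and deleted lengths, so the two lengths agree there. For a read with only insertions they differ, which is the situation in which an assertion comparing the two would fail.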

methratio.py AssertionError

Hi,
I'm getting an error when running methratio.py.
Code: methratio.py -o methratio.txt -d Genome.fna -r bsmap.bam
Output:
[methratio] @Sat Apr 9 20:12:26 2022 Using 90% of available memory (11165 MB) as limit
[methratio] @Sat Apr 9 20:12:26 2022 Presorting inputs
[methratio] @Sat Apr 9 20:12:26 2022 Processing 8 chromosomes at a time
Traceback (most recent call last):
  File "/usr/local/bin/methratio.py", line 590, in <module>
    main()
  File "/usr/local/bin/methratio.py", line 136, in main
    chromPool = ChromPool(maxChromProcs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 319, in _repopulate_pool_static
    w = Process(ctx, target=worker,
  File "/usr/lib/python3.8/multiprocessing/process.py", line 82, in __init__
    assert group is None, 'group argument must be None for now'
AssertionError: group argument must be None for now

Any idea what is wrong? Thanks.

methratio.py OverflowError

I get this error when running methratio.py. The samples were mapped to GRCh37. It seems that the error usually occurs when chr2 is being read.

methratio.py -d GRCh37.fa -o output.txt -x CG -N 18 sample.bam

Things I've tried:

  • Different sample (doesn't work)
  • Removing all tmp files before running methratio.py (doesn't work)
  • Using the python 3 branch in this repository (no error but it just stops doing anything after 2 minutes)
  • Setting -N to 1 and setting -N to 25 (doesn't work)
  • BSMAPz has 76 GB memory available. I tried limiting this to 25GB with -M (doesn't work)
[methratio] @Thu May 14 14:22:53 2020   Using 90% of available memory (38621 MB) as limit
[methratio] @Thu May 14 14:22:53 2020   Presorting inputs
[methratio] @Thu May 14 14:22:53 2020   Calling samtools sort on mapped_reads/GRCh37.CFD1900266-NBL_R1_trimmed_BSMAP.bam and using 38621 MB of memory
[methratio] @Thu May 14 14:23:06 2020   Processing 1 chromosomes at a time
[part of log cut]
[methratio] @Thu May 14 14:27:48 2020   Reading 2 from mapped_reads/GRCh37.CFD1900266-NBL_R1_trimmed_BSMAP.tmpSrt.bam with samtools
Traceback (most recent call last):
  File "/apps/gent/CO7/skylake-ib/software/BSMAPz/1.1.1-intel-2019b-Python-2.7.16/bin/methratio.py", line 571, in <module>
    main()
  File "/apps/gent/CO7/skylake-ib/software/BSMAPz/1.1.1-intel-2019b-Python-2.7.16/bin/methratio.py", line 138, in main
    ret = map(chromWorker, argList)
  File "/apps/gent/CO7/skylake-ib/software/BSMAPz/1.1.1-intel-2019b-Python-2.7.16/bin/methratio.py", line 287, in chromWorker
    searchFunc(refseq, seq, depth, meth, convert, match, pos)
  File "/apps/gent/CO7/skylake-ib/software/BSMAPz/1.1.1-intel-2019b-Python-2.7.16/bin/methratio.py", line 529, in searchFunc
    depth[ip] += 1
OverflowError: unsigned short is greater than maximum

methratio.py -t end repair option does not work

Hello,

I am trying to extract the methylated cytosines. The program works perfectly, but if I use the end repair option (-t or "--trim-fillin") it fails. This is the output:

Traceback (most recent call last):
  File "/data/SHARED_SOFTWARE/anaconda3/envs/scMeth/bin/methratio.py", line 571, in <module>
    main()
  File "/data/SHARED_SOFTWARE/anaconda3/envs/scMeth/bin/methratio.py", line 138, in main
    ret = map(chromWorker, argList)
  File "/data/SHARED_SOFTWARE/anaconda3/envs/scMeth/bin/methratio.py", line 283, in chromWorker
    refseq = refCache.fetch(pos, pos2)
  File "/data/SHARED_SOFTWARE/anaconda3/envs/scMeth/bin/methratio.py", line 214, in fetch
    assert(pos >= self.start)
AssertionError

I am using Python 2.7.3.

Any suggestions?
Thanks in advance,
Tommaso

methratio.py ERROR in conda and multicores of CPU: list index out of range

Hi,
I call methratio.py from R with system(" conda run -n py27 python2 methratio.py -I -o ...") to process RRBS BAM files.
I work on a node with 32 CPU cores and 64 GB of memory. I replaced the "samtools" command inside the original script with "conda run samtools", and the runs failed with:

[methratio] @Fri Nov 24 15:29:11 2023 	Processing 64 chromosomes at a time
[methratio] @Fri Nov 24 15:29:12 2023 	Reading chr18_gl000207_random from batch1116/22T0007789.tmpSrt.bam with  conda run samtools
......
[methratio] @Fri Nov 24 15:29:16 2023 	Reading chr1 from batch1116/22T0007789.tmpSrt.bam with conda run samtools
Traceback (most recent call last):
  File "methratio.py", line 590, in <module>
    main()
  File "methratio.py", line 138, in main
    ret = chromPool.map(chromWorker, argList, chunksize=1)
  File "/public3/home/scg9946/miniconda3/envs/py27/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/public3/home/scg9946/miniconda3/envs/py27/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
IndexError: list index out of range
[methratio] @Fri Nov 24 15:29:39 2023 	Reading chrUn_gl000223 from batch1116/22T0007789.tmpSrt.bam with  conda run samtools
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe

However, the methratio script runs successfully when using only one CPU core.

methdiff.py error

Hello,
I work on whole-genome bisulfite sequencing in Arabidopsis thaliana.

I have a question about methdiff.py.
When I ran methdiff.py, the program produced the following error:

python methdiff.py -o diff.txt -b 1 -d GCF_000001735.3_TAIR10_genomic.fna WT.txt SI.txt
[methdiff] @Tue Jul 18 17:37:52 2017 reading reference file: ../test_1/GCF_000001735.3_TAIR10_genomic_1.fna ...
[methdiff] @Tue Jul 18 17:37:54 2017 processing NC_000932.1 ...
[methdiff] @Tue Jul 18 17:37:54 2017 processing NC_001284.2 ...
Traceback (most recent call last):
  File "/home/yhanda/software/bsmap-2.90/methdiff.py", line 133, in <module>
    cmp_chrom(cr)
  File "/home/yhanda/software/bsmap-2.90/methdiff.py", line 112, in cmp_chrom
    pval = get_pval(m[0], d[0], m[1], d[1])
  File "/home/yhanda/software/bsmap-2.90/methdiff.py", line 86, in get_pval
    l0, u0 = conf_intv(m0, d0, z0)
  File "/home/yhanda/software/bsmap-2.90/methdiff.py", line 81, in conf_intv
    span = z * (p * (1 - p) / d + z2 / (4 * d * d)) ** 0.5
ValueError: negative number cannot be raised to a fractional power

Based on the following page, I think this error may be related to the Python version (I use Python 2.7.5).
https://stackoverflow.com/questions/17747124/valueerror-negative-number-cannot-be-raised-to-a-fractional-power
However, I don't know how to modify methdiff.py.

Please let me know how to fix this.
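
The failing line raises a possibly negative quantity to the power 0.5, which Python 2 rejects. One way the expression under the root can go negative is a ratio `p` outside [0, 1], for example a methylated count larger than the depth. Whether that is what happened in this run is not known. The sketch below is a standalone Wilson score interval that clamps `p` first. It is an illustration under that assumption, not the code in methdiff.py, and `wilson_interval` is a hypothetical name.

```python
from math import sqrt


def wilson_interval(m, d, z=1.96):
    """Wilson score interval for m methylated reads out of depth d."""
    if d <= 0:
        return 0.0, 1.0  # no coverage: the interval is uninformative
    p = min(max(float(m) / d, 0.0), 1.0)  # clamp so p * (1 - p) is never negative
    z2 = z * z
    denom = 1 + z2 / d
    center = (p + z2 / (2 * d)) / denom
    span = z * sqrt(p * (1 - p) / d + z2 / (4 * d * d)) / denom
    return max(0.0, center - span), min(1.0, center + span)


print(wilson_interval(5, 10))
print(wilson_interval(12, 10))  # m > d is clamped to p = 1 instead of failing
```

If clamping is needed in practice, it is probably worth finding out why the input files contain counts that exceed the depth.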
