
nuttylogic / bsbolt


BiSulfite Bolt - A Bisulfite Sequencing Alignment and Processing Tool

License: MIT License

Python 100.00%
methylation bisulfite-sequencing dna-alignment

bsbolt's Introduction

BSBolt (BiSulfite Bolt)

A fast and safe bisulfite sequencing analysis platform

BiSulfite Bolt (BSBolt) is a fast and scalable bisulfite sequencing analysis platform. BSBolt is an integrated analysis platform that offers support for bisulfite sequencing read simulation, alignment, methylation calling, data aggregation, and data imputation. BSBolt has been validated to work with a wide array of bisulfite sequencing data, including whole genome bisulfite sequencing (WGBS), reduced representation bisulfite sequencing (RRBS), and targeted methylation sequencing data. BSBolt utilizes forked versions of BWA and WGSIM for read alignment and read simulation, respectively. BSBolt is released under the MIT license.

Publication

Farrell, C., Thompson, M., Tosevska, A., Oyetunde, A. & Pellegrini, M. BiSulfite Bolt: A BiSulfite Sequencing Analysis Platform. 2020.10.06.328559 (2020). doi:10.1101/2020.10.06.328559

Documentation

Documentation can be found at https://bsbolt.readthedocs.io.

Release Notes

  • v1.6.0
    • MethylDackel compatibility
    • Option to output alignment to stdout
  • v1.5.0
    • Improved thread handling for methylation / variant calling.
    • Experimental bisulfite aware SNP caller.
  • v1.4.8
    • Fixed a bug that ended alignment when the reference template end was greater than the reference boundary.
  • v1.4.7
    • Alignment stats fix.
  • v1.4.6
    • Alignment statistics are now output as they are generated.
    • Fixed bug where alignment would stop when observed mappability was low.
  • v1.4.5
    • Fixed a maximum read depth bug that prevented methylation calls at sites covered by more than 8000 reads.
    • Refactored build script, with experimental support for M1 Macs
  • v1.4.4
    • The default entry point for BSBolt has changed from BSBolt to bsbolt for conda compatibility

Installation

PyPI Installation

Pre-compiled binaries can be installed from PyPI. Binaries are available for python >=3.6 on Unix-like systems (macOS >=10.15 and Linux).

pip3 install bsbolt --user

Conda Installation

BSBolt can be installed with the conda package manager as shown below.

conda config --add channels bioconda
conda config --add channels conda-forge
conda install -c cpfarrell bsbolt

Installing from Source

Dependencies

  • zlib-devel >= 1.2.3-29
  • GCC >= 8.3.1
# clone the repository
git clone https://github.com/NuttyLogic/BSBolt.git
cd BSBolt
# compile and install package
pip3 install .

Installing from Source on macOS

Dependencies

  • autoconf
  • automake
  • homebrew
  • xcode

Installation from source requires the Xcode command line utilities, the Homebrew macOS package manager, autoconf, automake, and python (>=3.6). The full installation process is outlined below.

# install xcode utilities
xcode-select --install
# install homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# install autoconf
brew install autoconf
# install automake
brew install automake
# optionally install python > 3.5
brew install python3.8
# clone the repository
git clone https://github.com/NuttyLogic/BSBolt.git
cd BSBolt
# compile and install package
pip3 install -e .

Usage

After installation, BSBolt can be called with bsbolt Module (or python3 -m bsbolt Module).

python3 -m bsbolt
usage: bsbolt Module {Module Arguments}

BiSulfite Bolt v1.6.0

options:
  -h, --help            show this help message and exit

subcommands:
  Please invoke bsbolt module for help, see bsbolt.readthedocs.io for detailed
  documentation

  Index, Align, CallMethylation, AggregateMatrix, Simulate, Impute
    Align               Alignment
    Index               Index Generation
    CallMethylation     Methylation Calling
    AggregateMatrix     CGmap Matrix Aggregation
    Simulate            Read Simulation
    Impute              kNN Imputation
    Sort                BAM Sort
    BamIndex            BAM Index
    CallVariation       Genetic Variation Calling
    GenotypeMatrix      Variant Bed Matrix Aggregation
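For scripted pipelines, the Align call can be assembled in Python and handed to subprocess. This is a minimal sketch: build_align_cmd is a hypothetical helper (not part of BSBolt), and it only uses the Align flags that appear in commands elsewhere on this page (-DB, -F1, -F2, -O, -t).

```python
import sys
from typing import List, Optional


def build_align_cmd(db: str, read1: str, output_prefix: str,
                    read2: Optional[str] = None, threads: int = 1) -> List[str]:
    """Assemble a `python -m bsbolt Align` command line (hypothetical helper).

    Only flags seen in example commands on this page are used; check
    `bsbolt Align -h` for the full option list of your installed version.
    """
    cmd = [sys.executable, "-m", "bsbolt", "Align", "-DB", db, "-F1", read1]
    if read2 is not None:
        # Paired-end data: the second mate goes to -F2.
        cmd += ["-F2", read2]
    cmd += ["-O", output_prefix, "-t", str(threads)]
    return cmd
```

The list can then be run with subprocess.run(cmd, check=True), which raises CalledProcessError if BSBolt exits with a non-zero status. Building an argument list rather than a shell string avoids quoting problems with paths that contain spaces.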

bsbolt's People

Contributors

aborujerdpur, nuttylogic


bsbolt's Issues

Align Error

Hi,

I encountered an error when running the following command:
bsbolt Align -F1 trimed.trimmed.paired.R2.fq.gz -F2 trimed.trimmed.paired.R1.fq.gz -t 1 -OT 1 -O aligned -DB /gpfs/scratch/guoq03/Reference/sscrofa11.1/GCF_000003025.6/sscrofa11.1_bsbolt/ >> aligned.bsbolt.log 2>> aligned.bsbolt.log

The log and the error is below:

Tue May 7 16:16:16 2024 Done.
Tue May 7 16:16:16 2024 Sorting by read name.
Command: samtools sort -@ 1 -n -m 4G aligned.bam > aligned.nsrt.bam
Deleting aligned.bam
uoq03/anaconda3/envs/premethyst/lib/python3.10/site-packages/bsbolt/main.py", line 23, in launch_bsb
launcher(arguments)
File "/gpfs/home/guoq03/anaconda3/envs/premethyst/lib/python3.10/site-packages/bsbolt/Utils/Launcher.py", line 105, in launch_alignment
assert os.path.exists(f'{database}.opac'), f'-DB {arguments.DB} not complete, please re-index genome'
AssertionError: -DB /gpfs/scratch/guoq03/Reference/sscrofa11.1/GCF_000003025.6/sscrofa11.1_bsbolt/ not complete, please re-index genome
~

Do you have any idea what is causing this error?

Many thanks,
Zoe
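The traceback shows that Align asserts that f'{database}.opac' exists before aligning. How BSBolt derives database from -DB is not shown here; the sketch below assumes the prefix is <DB>/BSB_ref.fa, as in the bwa mem commands elsewhere on this page. missing_index_files is a hypothetical helper for checking an index directory before launching a long run.

```python
import os
from typing import Iterable, List


def missing_index_files(db_dir: str, prefix: str = "BSB_ref.fa",
                        suffixes: Iterable[str] = (".opac",)) -> List[str]:
    """Return the expected index files under db_dir that do not exist.

    Assumption: the index prefix is <db_dir>/BSB_ref.fa. Only the .opac
    suffix is confirmed by the traceback above; pass other suffixes if needed.
    """
    # os.path.join copes with a trailing slash on db_dir.
    base = os.path.join(db_dir, prefix)
    return [base + suffix for suffix in suffixes
            if not os.path.exists(base + suffix)]
```

A non-empty result suggests the Index step did not finish (or wrote to a different directory), which matches the "please re-index genome" message.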

Exception in thread Thread-3:

I am getting an error while running bsbolt CallMethylation.
Here is the error I get. I have attached my submission script. Can someone help me figure out what I am doing wrong?

Exception in thread Thread-3:
Traceback (most recent call last):
File "/salilab/diva1/home/anaconda/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/salilab/diva1/home/anaconda/anaconda3/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/salilab/diva1/home/anaconda/anaconda3/lib/python3.8/multiprocessing/pool.py", line 592, in _handle_results
cache[job]._set(i, obj)
File "/salilab/diva1/home/anaconda/anaconda3/lib/python3.8/multiprocessing/pool.py", line 778, in _set
self._error_callback(self._value)
File "/wynton/home/pillailab/agala/.local/lib/python3.8/site-packages/bsbolt/CallMethylation/ProcessMethylationContigs.py", line 134, in methylation_process_error
raise MethylationCallingError(error)
bsbolt.CallMethylation.ProcessMethylationContigs.MethylationCallingError: [Errno 20] Not a directory: '/wynton/scratch/agala/PTC_NC/Homo_sapiens.GRCh38.dna.primary_assembly.fa/1.pkl'

bsboltcallmeth_launch.txt
bsboltcallmeth_work.txt

provide intermediate results on mappability

I see that bsbolt is very sensitive to parameters and input data, while processing a whole dataset can take more than 10 hours. It would be nice if it output intermediate mappability results to stdout, so that I can stop the pipeline if I see that the intermediate mappability is zero.

Error with bsbolt index

Hi everyone, I get an error when I try to index with bsbolt Index. I am indexing the hg38 reference genome using only -G and -DB, but it shows:

index: invalid option -- 'b'.

The same error also appears with the bsbolt test data. Any suggestions on how I can solve this?

Simulation Result

Hello,

I am an undergraduate student using BSBolt's simulation functionality, and this might be a silly question: the results I got are the pickle files as follows, but I was expecting to get paired-end fastq files so that I can use them for alignment later on. Am I doing something wrong here? Thank you so much!

[Screenshot of the simulation output files, 2023-01-04]

BSBolt Align err

When we ran BSBolt to align reads to the reference genome, we got the following error:

/public/home/ppguan/soft/miniconda/miniconda2/envs/python3.6/lib/python3.6/site-packages/bsbolt/External/BWA/bwa mem -Y -A 1 -B 4 -D 0.5 -E 1,1 -L 30,30 -T 10 -U 17 -W 0 -c 500 -d 100 -k 19 -m 50 -r 1.5 -t 1 -w 100 -y 20 -O 6,6 -h 100,200 -e 0.1 -l 0.5 -n 5 -Z 0.95 ./BSBolt_index/hg19.fa ./SRR5392315_head1M_1.cut2.fastq
['/public/home/ppguan/soft/miniconda/miniconda2/envs/python3.6/lib/python3.6/site-packages/bsbolt/External/BWA/bwa', 'mem', '-Y', '-A', '1', '-B', '4', '-D', '0.5', '-E', '1,1', '-L', '30,30', '-T', '10', '-U', '17', '-W', '0', '-c', '500', '-d', '100', '-k', '19', '-m', '50', '-r', '1.5', '-t', '1', '-w', '100', '-y', '20', '-O', '6,6', '-h', '100,200', '-e', '0.1', '-l', '0.5', '-n', '5', '-Z', '0.95', './BSBolt_index/hg19.fa', './SRR5392315_head1M_1.cut2.fastq']
/public/home/ppguan/soft/miniconda/miniconda2/envs/python3.6/lib/python3.6/site-packages/bsbolt/External/BWA/bwa: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by /public/home/ppguan/soft/miniconda/miniconda2/envs/python3.6/lib/python3.6/site-packages/bsbolt/External/BWA/bwa)
/public/home/ppguan/soft/miniconda/miniconda2/envs/python3.6/lib/python3.6/site-packages/bsbolt/External/BWA/bwa: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by /public/home/ppguan/soft/miniconda/miniconda2/envs/python3.6/lib/python3.6/site-packages/bsbolt/External/BWA/bwa)
1
Traceback (most recent call last):
File "/public/home/ppguan/soft/miniconda/miniconda2/envs/python3.6/bin/BSBolt", line 11, in <module>
sys.exit(launch_bsb())
File "/public/home/ppguan/soft/miniconda/miniconda2/envs/python3.6/lib/python3.6/site-packages/bsbolt/main.py", line 23, in launch_bsb
launcher(arguments)
File "/public/home/ppguan/soft/miniconda/miniconda2/envs/python3.6/lib/python3.6/site-packages/bsbolt/Utils/Launcher.py", line 110, in launch_alignment
alignment failed, check options
align_bisulfite(bwa_cmd, arguments.O, arguments.OT)
File "/public/home/ppguan/soft/miniconda/miniconda2/envs/python3.6/lib/python3.6/site-packages/bsbolt/Utils/Launcher.py", line 43, in align_bisulfite
bs_alignment.align_reads()
File "/public/home/ppguan/soft/miniconda/miniconda2/envs/python3.6/lib/python3.6/site-packages/bsbolt/Align/AlignReads.py", line 70, in align_reads
raise BisulfiteAlignmentError
bsbolt.Align.AlignReads.BisulfiteAlignmentError

When we built the index for the reference genome, it appears that the reference index was created. However, the align command does not run successfully. Could you help us solve this problem? Thanks a lot.

GenotypeMatrix and External not included in packages argument to setup in setup.py

After pip installing bsbolt from source code in mamba, GenotypeMatrix and External are not copied to the site-packages/bsbolt folder, and bsbolt throws a module not found error for GenotypeMatrix. I added GenotypeMatrix and External to the list of packages in the setup.py setup function's packages argument, and it works fine.

conda install of BSBolt is missing two different glibc versions

Someone had a similar issue earlier, but I have the impression that it was not solved. I installed the conda package of BSBolt with micromamba, and now it throws two different glibc error messages:
/netscratch/grp_turck/lib/micromamba/envs/BS_pipeline/lib/python3.9/site-packages/bsbolt/External/BWA/bwa: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /netscratch/grp_turck/lib/micromamba/envs/BS_pipeline/lib/python3.9/site-packages/bsbolt/External/BWA/bwa)
/netscratch/grp_turck/lib/micromamba/envs/BS_pipeline/lib/python3.9/site-packages/bsbolt/External/BWA/bwa: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /netscratch/grp_turck/lib/micromamba/envs/BS_pipeline/lib/python3.9/site-packages/bsbolt/External/BWA/bwa)

Why is it asking for two different glibcs?
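libm.so.6 and libc.so.6 are both part of glibc, so the two messages do not ask for two separate glibc installations: they name two symbol versions (GLIBC_2.29 and GLIBC_2.33) that the bundled bwa binary was linked against, and a system glibc of at least 2.33 would satisfy both. A small sketch for checking the local version is below; glibc_satisfies and installed_glibc are illustrative helpers, and the version is compared numerically because a plain string comparison would wrongly rank "2.3" above "2.29".

```python
import platform
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.33' into (2, 33) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def glibc_satisfies(installed: str, required: str) -> bool:
    """True if the installed glibc version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def installed_glibc() -> Optional[str]:
    """Return the glibc version Python was linked against, or None if not glibc."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None
```

For the errors above, glibc_satisfies(installed_glibc(), "2.33") returning False on a glibc system would be consistent with the message; building from source, as reported in a later issue, compiles bwa against the local glibc instead.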

Mapping to original bottom strand

During read alignment of my single-end, directional RRBS data, I noticed that ~50% of reads mapped to Watson C2T while ~50% mapped to Crick C2T. However, it is my understanding that reads originating from the original bottom strand should map to Crick G2A, and Crick C2T should represent reads complementary to the original bottom strand. Is there something I'm missing here?

I have run through the provided test example and can confirm that all reads map to either Watson C2T or Crick C2T:

python3 -m bsbolt Index -G ~/tests/TestData/BSB_test.fa -DB ~/tests/TestData/BSB_Test_DB
python3 -m bsbolt Simulate -G ~/tests/TestData/BSB_test.fa -O ~/tests/TestSimulations/BSB_up
python3 -m bsbolt Align -DB ~/tests/TestData/BSB_Test_DB -F1 ~/tests/TestSimulations/BSB_up_1.fq -O ~/tests/BSB_up_test

Clipped output:

[main] Real time: 23.105 sec; CPU: 22.961 sec
Alignment Complete: Time 0:00:23

Total Reads: 313856
Mappability: 100.000 %

Reads Mapped to Watson_C2T: 156914
Reads Mapped to Crick_C2T: 156942
Reads Mapped to Watson_G2A: 0
Reads Mapped to Crick_G2A: 0

Unmapped Reads (Single / Paired Ends): 0
Bisulfite Ambiguous: 0

Excessive Amount of Bisulfite Ambiguous

I'm testing different bisulfite mappers for my data. BSBolt completely outperforms Bismark (Mappability 94% vs. 35%). However, 70M of 71M of my reads are "Bisulfite Ambiguous". Are the default mapping parameters too lax? Could this amount of ambiguous reads be because my isolate is somewhat different from the reference genome?

bwa mem -Y -A 1 -B 4 -D 0.5 -E 1,1 -L 30,30 -T 10 -U 17 -W 0 -c 500 -d 100 -k 19 -m 50 -r 1.5 -t 20 -w 100 -y 20 -O 6,6 -h 100,200 -e 0.1 -l 0.5 -n 5 -Z 0.95 genome/LT15-INDEX/BSB_ref.fa C1_r1.fq.gz C1_r2.fq.gz

Alignment Complete: Time 4:43:38

Total Reads: 71455594
Mappability: 94.301 %

Reads Mapped to Watson_C2T: 33763063
Reads Mapped to Crick_C2T: 33619994
Reads Mapped to Watson_G2A: 33769331
Reads Mapped to Crick_G2A: 33613726

Unmapped Reads (Single / Paired Ends): 8145074
Bisulfite Ambiguous: 70779193

I would appreciate any insights or suggestions on how to address this issue.
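To compare runs like this, the summary block printed after alignment can be parsed into numbers. This is a sketch that assumes the "Key: value" layout shown above; parse_alignment_stats and summarize are hypothetical helpers. In the paired-end output above, the four strand counts sum to 134,766,114, exactly twice Total Reads minus Unmapped Reads, which suggests (the page does not state it) that those counts are per read end.

```python
from typing import Dict

STRAND_KEYS = ("Reads Mapped to Watson_C2T", "Reads Mapped to Crick_C2T",
               "Reads Mapped to Watson_G2A", "Reads Mapped to Crick_G2A")


def parse_alignment_stats(text: str) -> Dict[str, float]:
    """Parse 'Key: value' lines from the alignment summary into floats."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator lines such as '------'
        try:
            stats[key.strip()] = float(value.replace("%", "").strip())
        except ValueError:
            continue  # e.g. 'Alignment Complete: Time 4:43:38'
    return stats


def summarize(stats: Dict[str, float]) -> Dict[str, float]:
    """Sum the strand counts and express Bisulfite Ambiguous relative to Total Reads."""
    total = stats.get("Total Reads", 0.0)
    ambiguous = stats.get("Bisulfite Ambiguous", 0.0)
    return {
        "mapped_ends": sum(stats.get(key, 0.0) for key in STRAND_KEYS),
        "ambiguous_fraction": ambiguous / total if total else 0.0,
    }
```

For the run above, summarize reports an ambiguous fraction of roughly 0.99 of Total Reads, which makes it easy to track whether parameter changes move that number.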

Can the software Bsbolt build the mm10 database?

After running the bsbolt Index command for a day, the mm10 database is still not built. During the bwa index step, it appears to be stuck constructing the SA (suffix array) file:

[bwt_gen] Finished constructing BWT in 614 iterations.
[bwa_index] 1979.83 seconds elapse.
[bwa_index] Update BWT... 23.93 sec
[bwa_index] Pack forward-only FASTA... 11.86 sec
[bwa_index] Construct SA from BWT and Occ...

issues with MeDIP Bs-Seq

I am having issues with MeDIP Bs-Seq, in particular https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1313979
Could you please clarify how to apply BSBolt to this type of sample? I get a mappability of 0%.

My current pipeline is https://github.com/antonkulaga/epigenetics/blob/main/methylation/bsbolt/bsbolt_map_run.wdl with inputs https://github.com/antonkulaga/epigenetics/blob/main/methylation/bsbolt/inputs/medip.json

It produces the following output:

/opt/conda/lib/python3.8/site-packages/bsbolt/External/BWA/bwa mem -Y -p -A 1 -B 4 -D 0.5 -E 1,1 -L 30,30 -T 10 -U 17 -W 0 -c 500 -d 100 -k 19 -m 50 -r 1.5 -t 12 -w 100 -y 20 -O 6,6 -h 100,200 -e 0.1 -l 0.5 -n 5 -Z 0.95 /cromwell-executions/bsbolt_map_run/8b64021a-dc68-4be5-af1c-3f2a75cf9917/call-align/inputs/419547341/Homo_sapiens/BSB_ref.fa /cromwell-executions/bsbolt_map_run/8b64021a-dc68-4be5-af1c-3f2a75cf9917/call-align/inputs/-1855285201/SRR1142023.fastq_cleaned.fastq.gz
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 2504546 sequences (120000004 bp)...
[M::process] 2504546 single-end sequences; 0 paired-end sequences
[M::process] read 2505444 sequences (120000084 bp)...
[M::mem_process_seqs] Processed 2504546 reads in 1419.179 CPU sec, 119.575 real sec
[M::process] 2505444 single-end sequences; 0 paired-end sequences
[M::process] read 2513636 sequences (120000015 bp)...
[M::mem_process_seqs] Processed 2505444 reads in 1410.863 CPU sec, 117.414 real sec
[M::process] 2513636 single-end sequences; 0 paired-end sequences
[M::process] read 2503810 sequences (120000050 bp)...
[M::mem_process_seqs] Processed 2513636 reads in 1426.470 CPU sec, 118.794 real sec
[M::process] 2503810 single-end sequences; 0 paired-end sequences
[M::process] read 2505598 sequences (120000016 bp)...
[M::mem_process_seqs] Processed 2503810 reads in 1425.033 CPU sec, 118.903 real sec
[M::process] 2505598 single-end sequences; 0 paired-end sequences
[M::process] read 2513340 sequences (120000000 bp)...
[M::mem_process_seqs] Processed 2505598 reads in 1426.481 CPU sec, 118.650 real sec
[M::process] 2513340 single-end sequences; 0 paired-end sequences
[M::process] read 2501888 sequences (120000046 bp)...
[M::mem_process_seqs] Processed 2513340 reads in 1434.959 CPU sec, 119.693 real sec
[M::process] 2501888 single-end sequences; 0 paired-end sequences
[M::process] read 2502828 sequences (120000078 bp)...
[M::mem_process_seqs] Processed 2501888 reads in 1426.232 CPU sec, 118.963 real sec
[M::process] 2502828 single-end sequences; 0 paired-end sequences
[M::process] read 2509258 sequences (120000032 bp)...
[M::mem_process_seqs] Processed 2502828 reads in 1417.316 CPU sec, 118.041 real sec
[M::process] 2509258 single-end sequences; 0 paired-end sequences
[M::process] read 1657692 sequences (79569209 bp)...
[M::mem_process_seqs] Processed 2509258 reads in 1422.571 CPU sec, 118.703 real sec
[M::process] 1657692 single-end sequences; 0 paired-end sequences
[M::mem_process_seqs] Processed 1657692 reads in 938.316 CPU sec, 78.338 real sec
[main] Version: BSB-1.2.1-BWA-fork-0.7.17
[main] CMD: /opt/conda/lib/python3.8/site-packages/bsbolt/External/BWA/bwa mem -Y -p -A 1 -B 4 -D 0.5 -E 1,1 -L 30,30 -T 10 -U 17 -W 0 -c 500 -d 100 -k 19 -m 50 -r 1.5 -t 12 -w 100 -y 20 -O 6,6 -h 100,200 -e 0.1 -l 0.5 -n 5 -Z 0.95 /cromwell-executions/bsbolt_map_run/8b64021a-dc68-4be5-af1c-3f2a75cf9917/call-align/inputs/419547341/Homo_sapiens/BSB_ref.fa /cromwell-executions/bsbolt_map_run/8b64021a-dc68-4be5-af1c-3f2a75cf9917/call-align/inputs/-1855285201/SRR1142023.fastq_cleaned.fastq.gz
[main] Real time: 1168.473 sec; CPU: 13770.827 sec
Alignment Complete: Time 0:19:29
------------------------------
Total Reads: 24218040
Mappability: 0.000 %
------------------------------
Reads Mapped to Watson_C2T: 0
Reads Mapped to Crick_C2T: 0
Reads Mapped to Watson_G2A: 0
Reads Mapped to Crick_G2A: 0
------------------------------
Unmapped Reads (Single / Paired Ends): 24218040
Bisulfite Ambiguous: 24218040

The command executed by Cromwell is:

 bsbolt Align -F1 /cromwell-executions/bsbolt_map_run/8b64021a-dc68-4be5-af1c-3f2a75cf9917/call-align/inputs/-1855285201/SRR1142023.fastq_cleaned.fastq.gz  -DB /cromwell-executions/bsbolt_map_run/8b64021a-dc68-4be5-af1c-3f2a75cf9917/call-align/inputs/419547341/Homo_sapiens -O GSM1313979_bsbolt -OT 12 -R '@RG ID:GSM1313979' -p \
-t 12

Using the CGmap without creating the aggregation matrix

Hello @NuttyLogic

Thanks for the wonderful and easy-to-use program. I am currently using this package, but I have just one sample, which was sequenced paired-end. The CGmap I obtained from the methylation calling step contains both forward and reverse strand methylation probabilities for each context. Which strand's information should be used in downstream analysis? If both strands are informative, how can I combine the methylation probabilities from both strands for each cytosine position? Thanks.
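One common approach for CpG sites, since CpG methylation is usually symmetric, is to pool the methylated and total counts of the two cytosines in each CpG and compute one level per CpG. The sketch below assumes a tab-separated CGmap layout in the BS-Seeker2 style (chrom, nucleotide, position, context, subcontext, level, methylated count, total count), with G in the nucleotide column marking a minus-strand cytosine; check this against your own CGmap before relying on it. pool_cpg_strands and cpg_levels are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]


def pool_cpg_strands(rows: Iterable[str]) -> Dict[Key, Tuple[int, int]]:
    """Combine + and - strand CpG counts into (methylated, total) per CpG.

    Assumed columns: chrom, nucleotide (C = + strand, G = - strand), position,
    context, subcontext, methylation level, methylated count, total count.
    """
    pooled: Dict[Key, List[int]] = defaultdict(lambda: [0, 0])
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        chrom, nucleotide, pos, context = fields[0], fields[1], int(fields[2]), fields[3]
        if context != "CG":
            continue
        # The minus-strand C of a CpG sits one base after the plus-strand C.
        key = (chrom, pos - 1 if nucleotide == "G" else pos)
        pooled[key][0] += int(fields[6])
        pooled[key][1] += int(fields[7])
    return {key: (meth, total) for key, (meth, total) in pooled.items()}


def cpg_levels(rows: Iterable[str]) -> Dict[Key, float]:
    """Pooled methylation level (methylated / total) per CpG, keyed by the + strand position."""
    return {key: meth / total
            for key, (meth, total) in pool_cpg_strands(rows).items() if total}
```

Pooling weights each strand by its coverage rather than averaging the two levels, so a strand covered by one read does not count as much as one covered by twenty.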

ZeroDivisionError: division by zero

Hi,

I get an error after running the following command.

Command :

python3 -m BSBolt Align -DB mouse_ref -F1 Sample_R1.fastq.gz -F2 Sample_R2.fastq.gz -O output_prefix -M -t 10

Error:

alignment failed, check options
Traceback (most recent call last):
File ".../anaconda3/envs/bsbolt/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File ".../anaconda3/envs/bsbolt/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File ".../anaconda3/envs/bsbolt/lib/python3.8/site-packages/BSBolt/main.py", line 27, in <module>
launch_bsb()
File ".../anaconda3/envs/bsbolt/lib/python3.8/site-packages/BSBolt/main.py", line 23, in launch_bsb
launcher(arguments)
File ".../anaconda3/envs/bsbolt/lib/python3.8/site-packages/BSBolt/Utils/Launcher.py", line 107, in launch_alignment
align_bisulfite(bwa_cmd, arguments.O)
File ".../anaconda3/envs/bsbolt/lib/python3.8/site-packages/BSBolt/Utils/Launcher.py", line 47, in align_bisulfite
mapping_stats = process_mapping_statistics(bs_alignment.mapping_statistics)
File ".../anaconda3/envs/bsbolt/lib/python3.8/site-packages/BSBolt/Utils/Launcher.py", line 55, in process_mapping_statistics
mappability = (mapping_dict['TotalAlignments'] - mapping_dict['Unaligned']) / mapping_dict['TotalAlignments']
ZeroDivisionError: division by zero

.../anaconda3/envs/bsbolt/lib/python3.8/site-packages/BSBolt/External/BWA/bwa mem -Y -M -A 1 -B 4 -D 0.5 -E 1,1 -L 30,30 -T 10 -U 17 -W 0 -c 500 -d 100 -k 19 -m 50 -r 1.5 -t 10 -w 100 -y 20 -O 6,6 -h 100,200 -e 0.1 -l 0.5 -n 5 -Z 0.95 BSB_ref.fa Sample_R1.fastq.gz Sample_R2.fastq.gz
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 666668 sequences (100000200 bp)...
[M::process] read 666668 sequences (100000200 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 253394, 4, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (315, 369, 427)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (91, 651)
[M::mem_pestat] mean and std.dev: (371.49, 88.15)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 763)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
Alignment Complete: Time 0:02:15

Is this problem due to my data or is it in the program?

Best regards.
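The traceback shows mappability computed as (TotalAlignments - Unaligned) / TotalAlignments, which raises ZeroDivisionError whenever no alignments were counted, so the real failure happened earlier (note the "alignment failed, check options" line). A guarded version is sketched below, assuming the same dictionary keys; safe_mappability is a hypothetical helper, not BSBolt's code.

```python
from typing import Mapping


def safe_mappability(mapping_dict: Mapping[str, int]) -> float:
    """Fraction of aligned reads; returns 0.0 instead of dividing by zero.

    Assumes the 'TotalAlignments' and 'Unaligned' keys seen in the traceback.
    """
    total = mapping_dict.get("TotalAlignments", 0)
    if total == 0:
        # Nothing was counted: the aligner most likely failed upstream.
        return 0.0
    return (total - mapping_dict.get("Unaligned", 0)) / total
```

Returning 0.0 keeps the summary printable, but a caller should still treat a zero total as an error worth reporting rather than as a genuine 0% mappability.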

Write output to stdout [Feature Request]

It would be extremely useful if there was an option to write to stdout for pipelining purposes.

I imagine a flag that, when set, either changes:

bam_compression = subprocess.Popen([stream_bam, '-@', str(self.output_threads), '-o', f'{self.output}.bam'],

to something like
bam_compression = subprocess.Popen([stream_bam, '-@', str(self.output_threads), '-o', '/dev/stdout'], or something similar.

or just has uncompressed header and SAM format reads written to stdout (would be the fastest for pipelines, since it skips compression/decompression entirely).
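The v1.6.0 release notes list an option to output alignments to stdout, but its flag name is not given on this page. On the consuming side, streaming another process's stdout line by line in Python can look like the sketch below; stream_lines is a hypothetical helper, and the command passed to it is a stand-in, not a BSBolt invocation.

```python
import subprocess
from typing import Iterator, List


def stream_lines(cmd: List[str]) -> Iterator[str]:
    """Yield stdout lines from cmd as they are produced, then check the exit status."""
    # universal_newlines=True decodes text and works on Python >= 3.6.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          universal_newlines=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
```

For SAM text, a consumer could skip header lines with `if not line.startswith("@")` and process records as they arrive, without an intermediate BAM file.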

List of methylated sites from simulated reads

Hello,
maybe I'm just missing this option, so apologies if it is my mistake, but is there an option, when simulating BS reads, to save the positions and methylation levels of the sites in the reference genome? I can see that the CIGAR contains this information, but it would be handy to have a function that extracts it automatically.

Thanks in advance
Andrea

EDIT: I can see that there are the _variants.pkl, .pkl and _values.pkl files, but I can't find a documentation to detail their contents.
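Since the contents of the _variants.pkl, .pkl, and _values.pkl files are not documented here, one way to explore them is to load each file and print a short summary of its top-level structure. This is a sketch: describe_pickle is a hypothetical helper, unpickling can execute arbitrary code (so only load files you generated yourself), and if the pickles contain BSBolt objects, bsbolt may need to be importable for loading to succeed.

```python
import pickle


def describe_pickle(path: str, max_items: int = 5) -> str:
    """Load a pickle and describe its top-level type and first few entries.

    Only unpickle trusted files: pickle.load can run arbitrary code.
    """
    with open(path, "rb") as handle:
        obj = pickle.load(handle)
    kind = type(obj).__name__
    if isinstance(obj, dict):
        return f"dict with {len(obj)} keys, first keys: {list(obj)[:max_items]!r}"
    if isinstance(obj, (list, tuple)):
        return f"{kind} of length {len(obj)}, first items: {list(obj[:max_items])!r}"
    return kind
```

Running it on each simulation pickle gives a starting point for working out which file holds per-site methylation values.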

Aligning G2A single read

There does not seem to be a way to explicitly align a single read with G2A conversion. I need to align my paired reads separately because they are bisulfite-converted HiC, and bwa mem does not handle the long insert distances well. I've found that the second read is automatically assumed to have C2T conversion if presented as -F1.

Is there a way to do this currently? I was thinking that maybe a combination of -UN and -SP might force it to consider G2A preferentially?

Could not retrieve index file for .bam

Hello,

When running

python3 -m bsbolt CallMethylation -I BSBolt_output/SRP097629/Alignment_test.bam -O ~/tests/BSB_pe_test -DB ~/tests/TestData/BSB_Test_DB -t 2

I get the following error:

[E::idx_find_and_load] Could not retrieve index file for 'BSBolt_output/SRP097629/Alignment_test.bam'
[E::hts_idx_push] NO_COOR reads not in a single block at the end 29 -1
[E::sam_index] Read 'SRR5195621.2' with ref_name='NC_000079.6', ref_length=120421639, flags=67, pos=89576430 cannot be indexed
Traceback (most recent call last):
File "/root/.local/lib/python3.7/site-packages/bsbolt/CallMethylation/ProcessMethylationContigs.py", line 70, in __init__
self.input_bam = pysam.Samfile(input_file, 'rb', require_index=True)
File "pysam/libcalignmentfile.pyx", line 742, in pysam.libcalignmentfile.AlignmentFile.__cinit__
File "pysam/libcalignmentfile.pyx", line 1007, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/.local/lib/python3.7/site-packages/bsbolt/main.py", line 27, in <module>
launch_bsb()
File "/root/.local/lib/python3.7/site-packages/bsbolt/main.py", line 23, in launch_bsb
launcher(arguments)
File "/root/.local/lib/python3.7/site-packages/bsbolt/Utils/Launcher.py", line 138, in launch_methylation_call
bedgraph_output=arguments.BG)
File "/root/.local/lib/python3.7/site-packages/bsbolt/CallMethylation/ProcessMethylationContigs.py", line 73, in __init__
pysam.index(input_file)
File "/root/.local/lib/python3.7/site-packages/pysam/utils.py", line 75, in __call__
stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "BSBolt_output/SRP097629/Alignment_test.bam"\n'

Could this be because the

  1. Sequences were not trimmed,
  2. Bad quality reads, or
  3. It has nothing to do with BSBolt & everything to do with samtools ?

Thanks!

conda install of BSBolt issue

I installed BSBolt via conda, but ran into the following issue:

bsbolt Index -G /mnt/e/refs/mm10/mm10.fa -DB /mnt/e/refs/mm10/ -IA
/home/oconnelb/anaconda3/lib/python3.8/site-packages/bsbolt/External/BWA/bwa: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /home/oconnelb/anaconda3/lib/python3.8/site-packages/bsbolt/External/BWA/bwa)

From what I found online, this seems to be a result of how the bundled BWA binary was compiled?

In any case, building from the github still seems to work alright.

Add the ability to pass -x (feature request)

It would be very useful to be able to pass -x intractg to the underlying bwa mem call. While the basic parameters can be adjusted manually (e.g. Mismatch score) the intractg flag also causes bwa mem to use a modified alignment algorithm that is useful for HiC and similar data.

BSBolt Aggregate Matrix error

This is my script:

#!/bin/bash
. /etc/bashrc
module load CBI
module load Sali
module load samtools/1.9
module load anaconda
export PATH="${PATH}:~/.local/bin"
BSBoltaggregate.bash.e32547.txt
BSBoltaggregate.txt

numcores="8"; minsitecvg="1"; minnumsamps="20"
keytsv="PTC_NC_samples.tsv"
numsamples="$(wc -l < "${keytsv}")"

outbase="Plex${numsamples}.bsbolt-grch38.sorted.CGmap-MQ20-BQ10-min1-max1000.sorted"
minsampfract="$( (echo 'scale=17'; echo "(${minnumsamps}+0.5)/${numsamples}") | bc)"

for ctx in CHG; do
smpnamescommasep="$(awk -F $'\t' '{ printf("%s%s", (NR==1)?"":",",$1); }' < "${keytsv}")"
mapfilescommasep="$(awk -F $'\t' '{ printf("%s%s.'"${ctx}"'map.gz", (NR==1)?"":",",$2); }' < "${keytsv}")"
options="-t ${numcores} -S ${smpnamescommasep} -F ${mapfilescommasep}"
options="${options} -min-coverage ${minsitecvg} -min-sample ${minsampfract}"
optdesn="${ctx}-mincvg${minsitecvg}-minsamps${minnumsamps}"

    python3 -m bsbolt AggregateMatrix -count ${options} -O "${outbase}".aggcount-"${optdesn}".tsv
    python3 -m bsbolt AggregateMatrix        ${options} -O "${outbase}".aggfrmth-"${optdesn}".tsv

done

This is the error I get:
Exception in thread Thread-6:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/salilab/diva1/home/anaconda/anaconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/wynton/home/pillailab/agala/.local/lib/python3.8/site-packages/bsbolt/Matrix/SiteAggregator.py", line 69, in collect_methylation_sites
matrix_key = {site: count for count, site in enumerate(matrix_sites)}
File "/wynton/home/pillailab/agala/.local/lib/python3.8/site-packages/bsbolt/Matrix/SiteAggregator.py", line 69, in <dictcomp>
matrix_key = {site: count for count, site in enumerate(matrix_sites)}
MemoryError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/salilab/diva1/home/anaconda/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/salilab/diva1/home/anaconda/anaconda3/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/salilab/diva1/home/anaconda/anaconda3/lib/python3.8/multiprocessing/pool.py", line 592, in _handle_results
cache[job]._set(i, obj)
File "/salilab/diva1/home/anaconda/anaconda3/lib/python3.8/multiprocessing/pool.py", line 778, in _set
self._error_callback(self._value)
File "/wynton/home/pillailab/agala/.local/lib/python3.8/site-packages/bsbolt/Utils/UtilityFunctions.py", line 104, in propagate_error
raise error
MemoryError

I had given ample memory for running the script but still cannot figure out what else is wrong.
PTC_NC_samples.txt

Here is an example of the file name, I am working on:
S8.sorted.deduped.CGmap-MQ20-BQ10-min1-max1000.sorted.CHGmap.gz

I have also attached the file from where it will obtain sample names and file names. I have changed the extension of the file from tsv to txt for uploading here

Alignment for fasta ?

I tested the software using a FASTA file. However, it reports errors such as:

[M::mem_process_seqs] Processed 148150 reads in 261.581 CPU sec, 132.685 real sec
[E::sam_parse1] SEQ and QUAL are of different length
[W::sam_read1] Parse error at line 153
truncated file.

My FASTA file is a normal FASTA file and passed validation checks.
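The "SEQ and QUAL are of different length" message suggests the SAM records produced from FASTA input lack matching quality strings. Whether BSBolt is meant to accept FASTA input is not stated on this page; one possible workaround, offered here as an assumption rather than a confirmed fix, is to convert the reads to FASTQ with a constant placeholder quality ('I', Phred 40 in Phred+33 encoding). read_fasta and fasta_to_fastq are hypothetical helpers.

```python
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(src: TextIO, dst: TextIO, qual_char: str = "I") -> int:
    """Write each FASTA record as FASTQ with a constant quality; return the record count."""
    count = 0
    for name, seq in read_fasta(src):
        dst.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
        count += 1
    return count
```

Because every base gets the same placeholder quality, any quality-based filtering downstream (for example base-quality thresholds in methylation calling) becomes meaningless for these reads.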
