iprada / circle-map
A method for circular DNA detection based on probabilistic mapping of ultrashort reads
License: MIT License
Hello,
I installed Circle-Map via conda and encountered the following problem:
Computing the coverage of the identified eccDNA
Merging intervals for coverage computation
Traceback (most recent call last):
File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/bin/Circle-Map", line 10, in <module>
sys.exit(main())
File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/circlemap/circle_map.py", line 1164, in main
run = circle_map()
File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/circlemap/circle_map.py", line 199, in __init__
output = coverage_object.compute_coverage(coverage_object.get_wg_coverage())
File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/circlemap/Coverage.py", line 98, in compute_coverage
for cov_dict,header_dict in cov_generator:
File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/circlemap/Coverage.py", line 60, in get_wg_coverage
merged_bed = self.bed.sort().merge()
File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/pybedtools/bedtool.py", line 917, in decorated
result = method(self, *args, **kwargs)
File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/pybedtools/bedtool.py", line 240, in not_implemented_func
raise NotImplementedError(help_str)
NotImplementedError: "sortBed" does not appear to be installed or on the path, so this method is disabled. Please install a more recent version of BEDTools and re-import to use this method.
I checked the file "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/pybedtools/bedtool.py" and sortBed is clearly defined in it, so I don't know why this problem occurs. The problem persists even after I explicitly installed the latest versions of bedtools and pybedtools via conda.
Thanks in advance.
Best,
Jia-Xing
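The NotImplementedError above says that pybedtools could not find the `sortBed` executable on the PATH. The Python wrapper being defined in bedtool.py does not mean that the binary itself is reachable. A minimal sketch for checking this from inside the same environment, using only the standard library (the tool list is an assumption based on the commands named in the logs in this thread, and `missing_tools` is a hypothetical helper):

```python
import shutil

# Executables named in the logs in this thread; adjust as needed (assumption).
REQUIRED_TOOLS = ["bedtools", "sortBed", "mergeBed", "samtools", "bwa"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH")
```

If a tool is reported as missing even though it was installed with conda, the environment that contains it is probably not the one that is active when Circle-Map runs.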
Dear Author:
I encountered a problem when using Circle-Map. When running Circle-Map Realign, the progress bar always stays at 0%, as shown below. At first I thought the problem was in building the database, but later I found that it was not. I followed the instructions in your tutorial, but it did not work. Could there be any other reason? The main difficulty is that no error is reported, so I have no clue. Looking forward to your reply!
Hello,
I am running Circle-Map Realign and receive an error message. Here is my output:
/bin/sh: bedtools: command not found
/bin/sh: mergeBed: command not found
[E::idx_find_and_load] Could not retrieve index file for '/scratch/mariacas_root/mariacas99/hartlama/NPA_sorted_qname.bam'
2021-03-22 15:39:40: Realigning reads using Circle-Map
2021-03-22 15:39:40: Clustering structural variant reads
2021-03-22 15:39:40: Splitting clusters to to processors
0%| | 0/300 [00:00<?, ?it/s]
[intermediate progress bar updates omitted]
100%|██████████| 300/300 [00:07<00:00, 39.97it/s]
2021-03-22 15:39:51: Writting final output to disk
Traceback (most recent call last):
File "/home/hartlama/.local/bin/Circle-Map", line 33, in <module>
sys.exit(load_entry_point('Circle-Map==1.1.5', 'console_scripts', 'Circle-Map')())
File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/circle_map.py", line 1165, in main
run = circle_map()
File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/circle_map.py", line 187, in __init__
output = merge_final_output(self.args.sbam, self.args.output, begin, self.args.split,
File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/utils.py", line 1266, in merge_final_output
second_merging_round = unparsed_pd.sort_values(['chrom', 'start', 'end']).reset_index()
File "/home/hartlama/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 5442, in sort_values
keys = [self._get_label_or_level_values(x, axis=axis) for x in by]
File "/home/hartlama/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 5442, in <listcomp>
keys = [self._get_label_or_level_values(x, axis=axis) for x in by]
File "/home/hartlama/.local/lib/python3.9/site-packages/pandas/core/generic.py", line 1684, in _get_label_or_level_values
raise KeyError(key)
KeyError: 'chrom'
Hi,
I am using Circle-Map to detect ecDNA from paired-end Circle-seq data.
Strangely, the output BED files show ecDNA only on Chr10 and Chr11 for one sample, and only on Chr10, 11, 12, 13, 14 and 15 for another. I don't know whether this is real or caused by an error. My input BAM files contain all the chromosomes. I checked the log files, and they show some warnings:
Could these warnings contribute to the results above?
Hi,
I read your interesting paper and wanted to try this pipeline. I have some circular DNA reads produced by the Circle-seq method and sequenced by nanopore. I am wondering if this pipeline would work for Nanopore sequencing data with relatively low coverage?
Thanks
Is there any way to speed up Circle-Map Realign? I'm running with 10 processors, 15GB each, on a server, but some iterations still take 1-3 hours. My sorted_read_candidates.bam file is 16GB. Any suggestions?
Hi,
I just wonder whether there is an easy way to mark the evidence reads so that they can be viewed in IGV. Thanks!
Naive question: you don't need to unzip the human reference to index or align with bwa, right?
The same goes for samtools indexing, right?
Chris
After successfully running Circle-Map as outlined in the tutorial, I get these runtime warnings in the job output file and am unsure how to interpret them:
/global/home/users/diplockn/.local/lib/python3.6/site-packages/circlemap/Coverage.py:150: RuntimeWarning: invalid value encountered in ulong_scalars
ext_array[0:(self.ilen + self.ext)])
/global/home/users/diplockn/.local/lib/python3.6/site-packages/circlemap/Coverage.py:151: RuntimeWarning: invalid value encountered in ulong_scalars
end_coverage_ratio = np.sum(region_array[-self.ilen:]) / np.sum(ext_array[-(self.ilen + self.ext):])
/global/home/users/diplockn/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3335: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/global/home/users/diplockn/.local/lib/python3.6/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/global/home/users/diplockn/.local/lib/python3.6/site-packages/numpy/core/_methods.py:217: RuntimeWarning: Degrees of freedom <= 0 for slice
keepdims=keepdims)
/global/home/users/diplockn/.local/lib/python3.6/site-packages/numpy/core/_methods.py:186: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
/global/home/users/diplockn/.local/lib/python3.6/site-packages/numpy/core/_methods.py:209: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
After running Circle-Map, I get a BED file of ECC locations. But how can I get the ECC reads? I don't think every read within an ECC location is an ECC read.
Hi Inigo,
I have multiple samples where Circle-Map will fail with:
"An error happenend during execution. Exiting"
stderr:
[E::idx_find_and_load] Could not retrieve index file for 'output/circle-map/namesort_bams/tissue1.namesort.bam'
0%| | 0/800 [00:00<?, ?it/s]
0it [00:00, ?it/s]
0%| | 1/800 [00:14<3:17:10, 14.81s/it]
0%| | 1/800 [00:14<3:18:24, 14.90s/it]
0it [00:14, ?it/s]
stdout:
2020-10-13 14:06:35: Realigning reads using Circle-Map
2020-10-13 14:06:35: Clustering structural variant reads
2020-10-13 14:25:21: Splitting clusters to to processors
2020-10-13 14:25:38: An error happenend during execution. Exiting
Perhaps this is shared with Issue #35 ? However, the speed isn't my issue, but rather the early exit / run failure. One of my samples exits very quickly, so maybe I'll be able to debug with that sample as my test case. I'll let you know if I have any luck.
When this error occurs, could the call to sys.exit() be modified to return exit code 1? Or any non-zero value? With the default return status of 0, wrapper job scripts around Circle-Map will not be able to detect that an error has occurred.
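Until the exit status changes, a wrapper can treat the error message itself as a failure signal. A minimal sketch, assuming the message text shown in the stdout above is printed whenever this failure occurs (`run_checked` is a hypothetical helper, not part of Circle-Map):

```python
import subprocess
import sys

# Message copied verbatim (including its spelling) from the stdout shown above.
ERROR_MARKER = "An error happenend during execution"


def run_checked(command):
    """Return a non-zero status if the command fails or prints the error marker."""
    result = subprocess.run(command, capture_output=True, text=True)
    # Pass the captured output through so the job log stays complete.
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    if result.returncode != 0:
        return result.returncode
    if ERROR_MARKER in result.stdout or ERROR_MARKER in result.stderr:
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    sys.exit(run_checked(sys.argv[1:]))
```

A job script would then call this wrapper with the usual Circle-Map command line appended and branch on its exit status.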
Hi Iñigo,
I was performing the Circle-Map Realign tutorial, and encountered a few warnings.
My command:
Circle-Map Realign -t 4 -i sort_circular_read_candidates.bam -qbam qname_unknown_circle.bam -sbam sorted_unknown_circle.bam -fasta $ref_genome -o my_unknown_circle.bed
Note that I am using the hg38-related reference GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna which differs from your tutorial slightly.
Output:
2020-09-17 11:16:14: Realigning reads using Circle-Map
2020-09-17 11:16:14: Clustering structural variant reads
2020-09-17 11:16:37: Splitting clusters to to processors
[E::idx_find_and_load] Could not retrieve index file for 'qname_unknown_circle.bam'
400it [00:14, 27.54it/s]███████████████████████████████████████████████████████████████████████████ | 396/400 [00:14<00:00, 33.09it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 400/400 [00:14<00:00, 27.54it/s]
2020-09-17 11:16:52: Writting final output to disk
2020-09-17 11:16:52: Finished!
2020-09-17 11:16:52: Circle-Map Realign finished indentifying circles in 0.6356727162996928
2020-09-17 11:16:52: Circle-Map has identified 1 circles
Computing the coverage of the identified eccDNA
Merging intervals for coverage computation
/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
self.stderr = io.open(errread, 'rb', bufsize)
/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
self.stderr = io.open(errread, 'rb', bufsize)
Computing coverage on interval chr7:143911106-143917553
The warning
[E::idx_find_and_load] Could not retrieve index file for 'qname_unknown_circle.bam'
seems to make sense, as I don't think you can index a qname-sorted BAM file.
The warnings:
/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
self.stderr = io.open(errread, 'rb', bufsize)
/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
self.stderr = io.open(errread, 'rb', bufsize)
seem to catch and correct themselves, but I am unsure what is causing them.
Is it safe to ignore these warnings?
If so, can they be suppressed or explained in the stdout to avoid panicking me as the user? :-)
Thanks,
Michael
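On the first warning: `samtools index` expects a coordinate-sorted BAM, so a queryname-sorted file normally has no index, and htslib reports that it could not retrieve one when the file is opened. A small standard-library sketch for checking the sort order declared in a BAM header. It relies on BGZF being readable as gzip and on the `@HD` line carrying an `SO:` tag, which not every file has; `bam_sort_order` is a hypothetical helper:

```python
import gzip
import struct


def bam_sort_order(path):
    """Return the SO: value from the @HD header line of a BAM file, or None."""
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file: %s" % path)
        # The magic is followed by a little-endian int32 with the header text length.
        (text_length,) = struct.unpack("<i", handle.read(4))
        header_text = handle.read(text_length).decode("utf-8", "replace")
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None
```

In the command above, the file passed to -qbam is the queryname-sorted one, which matches the file named in the warning.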
Dear Inigo,
I have some doubts about the output.
The discordants and split-reads columns of the output contain a large number of zeros. What do these zeros mean? Do they indicate that no discordant reads or split reads support these circular DNAs?
As shown in Figure 1 of the article, circular DNA is detected from discordant reads and soft-clipped reads, so I would expect each circular DNA to have at least one discordant read.
Is there anything I misunderstood?
I encountered the error:
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information
This seems to have changed with biopython 1.78
Installing biopython=1.77 fixed this issue for me.
Perhaps update the conda recipe to specify biopython=1.77?
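For code that has to run on both sides of that change, the import can be guarded instead of pinning the version. A minimal sketch, assuming `generic_dna` is the only name needed from `Bio.Alphabet` and that it is only passed as the second argument when building a sequence (`seq_args` is a hypothetical helper, not part of Circle-Map or Biopython):

```python
try:
    # Available in Biopython 1.77 and earlier.
    from Bio.Alphabet import generic_dna
except ImportError:
    # Biopython 1.78 and later raise ImportError here; so does a missing Biopython.
    generic_dna = None


def seq_args(sequence):
    """Return positional arguments for Bio.Seq.Seq that suit the installed version."""
    if generic_dna is None:
        return (sequence,)
    return (sequence, generic_dna)
```

A call site would then build sequences as `Seq(*seq_args("ACGT"))` rather than passing the alphabet unconditionally.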
Dear Inigo,
In the project I have analyzed so far, some rows in the output .bed file have no values in columns 10 and 11. How can this be explained? Is it caused by my specific analysis, or is it normal behavior of the software? I am very sorry to disturb you and look forward to your reply.
Hello iprada,
I ran into a problem when downloading the raw data. The commands are as follows:
wget https://raw.githubusercontent.com/iprada/Circle-Map/master/tutorial/unknown_circle_reads_1.fastq
wget https://raw.githubusercontent.com/iprada/Circle-Map/master/tutorial/unknown_circle_reads_2.fastq
The download fails with "Connection refused...". Is the data available now? Could you please help me with some suggestions? Thank you for your attention.
Best regards,
Gu
The conda install completes, but the Circle-Map command is not accessible from bash; it throws "command not found".
Traceback (most recent call last):
File "/home/iprada/miniconda3/bin/Circle-Map", line 11, in <module>
load_entry_point('Circle-Map==1.1.1', 'console_scripts', 'Circle-Map')()
File "/home/iprada/miniconda3/lib/python3.7/site-packages/Circle_Map-1.1.1-py3.7.egg/circlemap/circle_map.py", line 870, in main
File "/home/iprada/miniconda3/lib/python3.7/site-packages/Circle_Map-1.1.1-py3.7.egg/circlemap/circle_map.py", line 198, in __init__
AttributeError: 'Namespace' object has no attribute 'sbam'
The coverage calculation of the circles found with Circle-Map Repeats fails due to the inconsistencies in the variable names
Dear,
I recently read your excellent work on Circle-Map; thank you for your contribution. I have a small point of confusion. In the middle part of the paper you used Circle-seq data to test Circle-Map. This method enriches eccDNA before sequencing. Is this enrichment necessary for Circle-Map? If not, can data generated by a classic WGS workflow be used? If so: 1) will noise from other genomic regions have an adverse effect, and 2) do we need to adopt more relaxed filtering criteria for the Circle-Map output?
Best.
Dear iprada,
Sorry to bother you again. Today I updated conda and Circle-Map, but the latest version of Circle-Map reports the following error:
Circle-Map ReadExtractor -i qname_unknown_circle.bam -o circular_read_candidates.bam
Traceback (most recent call last):
File "/home/yjy/anaconda3/bin/Circle-Map", line 6, in <module>
from circlemap.circle_map import main
File "/home/yjy/anaconda3/lib/python3.8/site-packages/circlemap/circle_map.py", line 39, in <module>
from circlemap.simulations import sim_ecc_reads
File "/home/yjy/anaconda3/lib/python3.8/site-packages/circlemap/simulations.py", line 30, in <module>
from Bio.Alphabet import generic_dna
File "/home/yjy/anaconda3/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
./circle-map-half.sh: line 7: samtools: command not found
./circle-map-half.sh: line 8: samtools: command not found
./circle-map-half.sh: line 9: samtools: command not found
Traceback (most recent call last):
File "/home/yjy/anaconda3/bin/Circle-Map", line 6, in <module>
from circlemap.circle_map import main
File "/home/yjy/anaconda3/lib/python3.8/site-packages/circlemap/circle_map.py", line 39, in <module>
from circlemap.simulations import sim_ecc_reads
File "/home/yjy/anaconda3/lib/python3.8/site-packages/circlemap/simulations.py", line 30, in <module>
from Bio.Alphabet import generic_dna
File "/home/yjy/anaconda3/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
I have seen another issue indicating that installing biopython=1.77 may help, but it didn't work for me. So I am confused again (sigh).
Thanks for providing this handy program for calling circular DNA.
I am running Circle-Map Realign with the -t 40 option on an HPC cluster. In the beginning everything looks fine: the temp BED files keep updating and the CPU load is normal. However, at a certain point the node becomes idle, the program uses very little CPU, and the temp BED file stops updating.
As I write this issue, the program is still running, and the temp BED file has stayed the same size for around 10 hours. I will let you know whether the program ends up finishing or failing. Thanks!
When I run Circle-Map ReadExtractor in a conda environment using the following command:
Circle-Map ReadExtractor -i SRR6315400_q.bam -o SRR6315400_q_cir.bam
the following error occurs:
Traceback (most recent call last):
File "/usr/local/bin/Circle-Map", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/dist-packages/circlemap/circle_map.py", line 1164, in main
run = circle_map()
File "/usr/local/lib/python2.7/dist-packages/circlemap/circle_map.py", line 130, in __init__
object.extract_sv_circleReads()
File "/usr/local/lib/python2.7/dist-packages/circlemap/extract_circle_SV_reads.py", line 115, in extract_sv_circleReads
if (processed_reads/1000000).is_integer() == True:
AttributeError: 'int' object has no attribute 'is_integer'
I tried multiple approaches, but none of them worked. Do you have any suggestions?
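The paths in this traceback point to python2.7. Under Python 2, `/` between two integers performs integer division and returns an `int`, which has no `is_integer` method there, whereas under Python 3 it returns a `float`. Circle-Map appears to expect Python 3 (the other tracebacks in this thread all come from python3.x environments), so installing and running it under Python 3 should avoid the error. For illustration, a sketch of an equivalent check that does not depend on the division semantics (the function name is hypothetical):

```python
def is_multiple_of_million(processed_reads):
    """True when processed_reads is an exact multiple of 1,000,000."""
    return processed_reads % 1000000 == 0
```

The modulo form gives the same answer as the original expression under Python 3 and also works on a plain `int`.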
Dear iprada,
Recently I have been trying to use Circle-Map to perform eccDNA calling. I followed the tutorial you posted here using the Realign module. My questions are:
What is the difference between the Realign and Repeats modules, and which one is best for calling eccDNA from WGS data?
How can I speed up the last step of Realign? I used the following command:
/public/software/Anaconda3/bin/Circle-Map Realign \
-t 20 \
-i S1_sort_circular_read_candidates.bam \
-qbam S1_qname_unknown_circle.bam \
-sbam S1_sorted_unknown_circle.bam \
-fasta /public/genomes/Hsapiens/hg38/seq/hg38.fa \
-o S1_unknown_circle.bed \
&> S1_Circle-Map_detect.log
I used an HPC cluster; the single node I requested had "nodes=1:ppn=20,mem=64g".
I do get the BED files for eccDNA, but the last step always needs 5 days or more to run.
Should I downsample the BAM file?
Thanks so much!
Juncheng
Hi,
I am following the https://github.com/iprada/Circle-Map/wiki/Tutorial:-Identification-of-circular-DNA-using-Circle-Map-Realign to learn how to use this tool, but it failed at the third step.
$ Circle-Map ReadExtractor -i qname_unknown_circle.bam -o circular_read_candidates.bam
[E::idx_find_and_load] Could not retrieve index file for '/public/home/zhaoqi/test/Circle-Map/tutorial/qname_unknown_circle.bam'
Extracting circular structural variants
finished extracting reads. Elapsed time: 0.00025257269541422524 mins
Thanks for using Circle-Map
Any suggestions?
Hello,
May I ask what criteria you use to filter the output of Circle-Map Realign?
Circle-Map has reported 7102 circles in my plant genome, and the output seems to contain a lot of false positives. Should I mainly filter by read support, requiring at least 1 discordant read and 5 split reads at each site?
cat Circle-Map.output | awk '$4>=1 && $5>=5'
Best wishes,
Panpan
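The same filter can be written in Python when it needs to live inside a larger script. A minimal sketch, assuming a tab-separated output in which column 4 holds the discordant read count and column 5 the split read count, as the awk command above implies. The thresholds are the ones proposed in the question, not recommended values, and the example rows are made up:

```python
import csv
import io


def filter_circles(lines, min_discordants=1, min_split_reads=5):
    """Yield tab-separated rows whose columns 4 and 5 meet the thresholds."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or truncated lines
        try:
            discordants, split_reads = float(row[3]), float(row[4])
        except ValueError:
            continue  # skip header lines or non-numeric values
        if discordants >= min_discordants and split_reads >= min_split_reads:
            yield row


# Made-up rows for illustration only.
example = io.StringIO(
    "chr1\t100\t900\t2\t7\t350.0\n"
    "chr1\t5000\t5600\t0\t9\t120.0\n"
    "chr2\t40\t400\t3\t1\t60.0\n"
)
kept = list(filter_circles(example))
print(kept)
```

For a real file, an open file handle can be passed in place of the `io.StringIO` example.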
Dear iprada,
I extracted circular DNA fragments from human somatic tissue, sequenced them, and then tried to align the data to the reference genome (hg38) using bwa mem. But bwa mem always skips the FF, RF and RR orientations. Other posts report similar messages, but theirs are all followed by the hint "as there are not enough pairs". My output only says "skip orientation FF; skip orientation RF; skip orientation RR". I still get a full SAM file, so I wonder whether these skips matter. Do they affect the Circle-Map analysis?
Thanks a lot
Here is the head of the raw sequencing data:
@A00582:345:HFVCWDSXY:2:1101:1506:1094 1:N:0:CGTACTAG+TAGATCGC
CAATTAAAACTGACTACAAAAAGAAAATATTGCATTGTAAAATAATAAAAGCATGTAAATGCTTTATAAATTTTATAGGCTATTTTCTGAGTAACTTTCCCATGATTCCCCGGTTCTGTGCTATATGGTAGCATTGCTGGAACCGGAAGT
+
:F,FFFFFFFFFFF,F:F:FFFF:FF:FF:FF,,FFFFFF:,FFFFFF:F::FFFF:FFFFFF:FFF:FF,FFFFFFF:FFFFF:F:FFFFF,FFFFF,:FFFFFFF,,FF:FFF:FF:,,,FFFFFFFFF,:FF,FF:,:F:,FF::,F
@A00582:345:HFVCWDSXY:2:1101:3152:1094 1:N:0:CGTACTAG+TAGATCGC
GTGAAGAGCTGCATTAGGAATCTTAAGGTGGAGGTTGGGGTAGGTGGCTTGAGCTGTCTCTTATACACATCTCCCAGCCCACGAGACCGTACTAGATCGCGTATGCCGTCGTCTTCGTGCACAGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
FFF:FF,FFFFFFFFFFFFFFFF:FFFF::FFF,:FFF,F,FFFFFFFFFFFFFFF:FFFF:FFFFFFFFF:FFFFFFFFFFFFFFF:FF,FFFFFFF,FFFFFF,FFFF,:,,,,,:,,:,,,,:FFFFFFFFFFFFFFFFFFFFFFFF
@A00582:345:HFVCWDSXY:2:1101:4562:1094 1:N:0:CGTACTAG+TAGATCGC
GGATTAGAGGCAGTGATCTACACATTCATTAAAGAAGCATTGAAGTAAATTATGAATCCCGTGATGCATATTGAATCTGTCTCTTATACACATCTCCGAGCCCACGAGACCGTACTAGATCTCGTAAGCCGTCTTCTGCGTGAAAAGGGG
And here is the head of the run status:
https://ibb.co/4TgrMSz
Hello iprada,
I tried CIRCexplorer2 to detect eccDNA as mentioned in your paper, but it reported only a few eccDNA. I suspect I misused the CIRCexplorer2 command. Could you please tell me the exact command to run CIRCexplorer2 for eccDNA detection when the input is two FASTQ files or a SAM file?
Best wishes,
Gu
Hi sir:
I encountered an error:
bwa mem -q hg38.fa unknown_circle_reads_1.fastq unknown_circle_reads_2.fastq > unknown_circle.sam
mem: invalid option -- 'q'
Hi Inigo,
While running my samples, I encountered some warnings that I believe come from divide-by-zero during certain coverage calculations.
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/circlemap/Coverage.py:149: RuntimeWarning: invalid value encountered in ulong_scalars
start_coverage_ratio = np.sum(region_array[0:self.ilen]) / np.sum(
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/circlemap/Coverage.py:151: RuntimeWarning: invalid value encountered in ulong_scalars
end_coverage_ratio = np.sum(region_array[-self.ilen:]) / np.sum(ext_array[-(self.ilen + self.ext):])
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:233: RuntimeWarning: Degrees of freedom <= 0 for slice
ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:194: RuntimeWarning: invalid value encountered in true_divide
arrmean = um.true_divide(
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:226: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
The *.circle.bed output is still generated successfully.
Are these warnings expected?
Is it safe to ignore them?
Thanks,
Michael
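The `invalid value encountered` warnings at Coverage.py lines 149 and 151 are what NumPy emits when a division such as 0 / 0 produces NaN, which would happen if the extended window around a circle has no coverage at all. Whether that is the actual cause here is an assumption. A pure-Python sketch of a guarded ratio, reusing the names `ilen` and `ext` from the traceback with made-up coverage values (`coverage_ratio` is a hypothetical helper, not Circle-Map code):

```python
def coverage_ratio(region_values, extended_values):
    """Ratio of summed coverage, or NaN when the denominator sums to zero."""
    denominator = sum(extended_values)
    if denominator == 0:
        return float("nan")  # explicit NaN instead of a division warning
    return sum(region_values) / denominator


# Made-up per-base coverage values for illustration only.
ilen, ext = 3, 2
region_array = [4, 5, 6, 7, 8, 9]
ext_array = [0, 0, 4, 5, 6, 7, 8, 9, 0, 0]

start_ratio = coverage_ratio(region_array[0:ilen], ext_array[0:ilen + ext])
empty_ratio = coverage_ratio([0, 0, 0], [0, 0, 0, 0, 0])
print(start_ratio, empty_ratio)
```

Under this reading, rows with empty or NaN coverage columns would correspond to intervals where no reads overlap the window.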
Dear iprada,
After running Circle-Map Realign, I get many potential ecDNAs in the .bed file. Can Circle-Map also give me read counts for a specific ecDNA, or do I need other software? Thanks a lot.
I am new to eccDNA identification. After reading the two Circle-Map articles and the tutorials, I have some questions about the Repeats subcommand: why does Repeats not need discordant read pairs and split reads to be extracted before calling circular DNA? And how should I combine the Repeats and Realign commands for eccDNA calling?
Finally, the article in Nucleic Acids Research states that circular DNA intervals overlapping by more than 50% are merged iteratively. How is this process implemented? Is there software to do this? Sorry for so many questions. Looking forward to your reply!
Best wishes
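As an illustration of what iterative merging by overlap can look like, here is a generic sketch. It is not Circle-Map's implementation, and it assumes that "overlapping more than 50%" means the shared segment covers more than half of the longer interval. It only compares neighbours in sorted order and repeats until nothing changes; all names and example intervals are hypothetical:

```python
def reciprocal_overlap(a, b):
    """Fraction of the longer interval covered by the overlap of a and b."""
    if a[0] != b[0]:
        return 0.0  # different chromosomes never overlap
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    longest = max(a[2] - a[1], b[2] - b[1])
    return max(overlap, 0) / longest if longest > 0 else 0.0


def merge_iteratively(intervals, min_fraction=0.5):
    """Merge neighbouring (chrom, start, end) intervals until none qualify."""
    merged = sorted(intervals)
    changed = True
    while changed:
        changed = False
        result = []
        for interval in merged:
            if result and reciprocal_overlap(result[-1], interval) > min_fraction:
                previous = result.pop()
                result.append((previous[0],
                               min(previous[1], interval[1]),
                               max(previous[2], interval[2])))
                changed = True
            else:
                result.append(interval)
        merged = sorted(result)
    return merged


circles = [("chr1", 100, 200), ("chr1", 120, 210), ("chr1", 500, 600)]
print(merge_iteratively(circles))
```

In the example, the first two intervals share 80 of 100 bases and are merged, while the third is left untouched.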
Circle-Map Output File
Column #5, Note #2
I have submitted two Circle-Map batch jobs to the server with 16 and 8 CPUs, but they still have not finished. Could you please tell me what is wrong?
`JOBID USER    STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME   SUBMIT_TIME
35484 skabdul RUN  normal mn02      16*c02n08 *rcle-Map1 Jul 19 09:46
31157 skabdul RUN  normal mn02      8*c05n03  Circle-Map Jul 18 15:28`
My command is:
Circle-Map Realign -t 16 -i sort_circular_read_candidates.bam -qbam qname_M1_map.bam -sbam sorted_M1_map.bam -fasta Zea_mays.AGPv4.dna.toplevel.fa -o outcircle/my_circle_DNA.bed
Dear,
I have read your paper published in BMC Bioinformatics (https://doi.org/10.1186/s12859-019-3160-3), and thank you for your outstanding work. However, according to some literature reports, the sequences on ecDNA may not correspond to a continuous stretch of the reference genome. For example, ecDNA carrying EGFRvIII structurally lacks several exons (1,2). Can Circle-Map accurately identify this kind of variation and display it in the output?
When I used samtools index to index the BAM file, an error was reported. The species is Ginkgo biloba, whose chromosomes are longer than human chromosomes, so the default index cannot be built. I used the -c parameter to create a CSI index instead, but Circle-Map, the eccDNA identification software, cannot use it for further analysis and reports errors. Is there a solution?
The CSI file cannot be used for the next step of Circle-Map eccDNA identification.
Hi,
When I follow the tutorial for identification of circular DNA using Circle-Map Realign, I encounter an error.
My command:
Circle-Map Realign -i 019_sort_circular_read_candidates.bam -qbam 019_qname.bam -sbam 019_sort.bam -fasta 019_new.fna -o 019_circle.bed
The standard output gives an error:
2020-09-03 09:08:43: Realigning reads using Circle-Map
2020-09-03 09:08:43: Clustering structural variant reads
2020-09-03 09:08:44: Splitting clusters to to processors
[E::idx_find_and_load] Could not retrieve index file for '019_qname.bam'
0%| | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/beegfs/home/lxd/miniconda3/envs/pgcgap/lib/python3.7/site-packages/circlemap/realigner.py", line 222, in realign
plus_base_freqs = background_freqs(plus_coding_interval)
File "/beegfs/home/lxd/miniconda3/envs/pgcgap/lib/python3.7/site-packages/circlemap/utils.py", line 876, in background_freqs
return{nucleotide: seq.count(nucleotide)/len(seq) for nucleotide in 'ATCG'}
File "/beegfs/home/lxd/miniconda3/envs/pgcgap/lib/python3.7/site-packages/circlemap/utils.py", line 876, in <dictcomp>
return{nucleotide: seq.count(nucleotide)/len(seq) for nucleotide in 'ATCG'}
ZeroDivisionError: division by zero
/beegfs/home/lxd/miniconda3/envs/pgcgap/lib/python3.7/site-packages/circlemap/realigner.py:391: UserWarning: Failed on interval chrom 019_contig161
start 25534
end 25834
Name: 2, dtype: object due to the error division by zero
str(interval), str(e)))
1%|█▎ | 1/100 [00:01<02:59, 1.81s/it]
2020-09-03 09:08:46: An error happenend during execution. Exiting
0it [00:01, ?it/s]
Thanks a lot, hope to get your suggestions!
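The traceback shows `background_freqs` dividing by `len(seq)`, so the ZeroDivisionError means the sequence fetched for that interval was empty. Why it was empty for 019_contig161 is not visible in the log. A sketch of a guarded version of the same computation (this is not the project's fix; returning zeros for an empty sequence is an arbitrary choice made for illustration):

```python
def background_freqs(seq):
    """Return the frequency of each of A, T, C and G in seq."""
    if len(seq) == 0:
        # Empty interval: avoid dividing by zero (illustrative choice).
        return {nucleotide: 0.0 for nucleotide in "ATCG"}
    return {nucleotide: seq.count(nucleotide) / len(seq) for nucleotide in "ATCG"}
```

Checking that the contig names and lengths in the FASTA match those in the BAM header may help to find out why the fetched sequence is empty.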
Dear Inigo,
I use Circle-Map version 1.1.3 to identify eccDNA and obtained results with the Circle-Map Realign command. I would like to get the sequences of the split reads and the discordant reads supporting each eccDNA in the sample. Is there an option or a way to do this? I checked the Circle-Map ReadExtractor parameters, and it outputs reads indicating circular DNA structural variants, but discordant reads and split reads cannot be output separately. Is there a way to extract these two types of reads for each eccDNA from the *_sort_circular_read_candidates.bam file? Looking forward to your reply. Many thanks!
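One way to approach this outside Circle-Map is to classify alignments by their SAM flags and tags. A sketch on SAM text lines (for example the output of `samtools view`), assuming that a split read is an alignment that is supplementary or carries an SA tag, and that a discordant read is a paired read not flagged as a proper pair. These are generic SAM conventions; Circle-Map's own definitions may be stricter, and `classify_sam_line` is a hypothetical helper:

```python
PAIRED = 0x1
PROPER_PAIR = 0x2
SUPPLEMENTARY = 0x800


def classify_sam_line(line):
    """Return 'split', 'discordant' or 'other' for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    optional_tags = fields[11:]  # everything after the 11 mandatory columns
    has_sa_tag = any(tag.startswith("SA:Z:") for tag in optional_tags)
    if flag & SUPPLEMENTARY or has_sa_tag:
        return "split"
    if flag & PAIRED and not flag & PROPER_PAIR:
        return "discordant"
    return "other"
```

Restricting the input to one eccDNA interval first, for instance with a region query on the indexed candidates BAM, would give the reads for a single circle.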
Inigo,
This is possibly more a question about the structure of sam/bam files (I am a beginner in bioinformatics). If so, I apologize in advance.
I have been using Circle-Map to analyze my mtDNA sequencing runs, and also to see if I can identify circular DNA from other chr's in my circle-enriched preparations.
In my last prep, Circle-Map assigned some very high circle scores to regions with no discordant pairs (see orange shading). Also coverage was highly variable across the region. So you might want to consider assigning more scoring weight to coverage uniformity of potential circles.
I did find one other region with very low coverage but which had both discordant reads and split reads (see yellow shaded entry above). I wanted to see the details of these reads and so I inspected the information for the individual reads with IGV. Here is an IGV image of one of the discordant (and split) reads. The read information for the read bracketed in red is shown in the next image.
My confusion is that IGV says that the read is hard-clipped. I had the impression that Circle-Map would ignore hard-clipped reads. Is that correct? Or is it the case that Circle-Map only looks for supplementary alignments, whether the read is hard- or soft-clipped?
Thanks,
Chris
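Whether an alignment is hard- or soft-clipped can be read straight from its CIGAR string: `H` operations are hard clips (the clipped bases are absent from the record) and `S` operations are soft clips (the bases are kept in the sequence). As far as I know, bwa mem hard-clips supplementary alignments by default and soft-clips them when run with `-Y`. The sketch below only parses CIGAR strings; it says nothing about how Circle-Map treats either kind, and the function name `clipping` is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipping(cigar):
    """Return the total number of soft- and hard-clipped bases in a CIGAR string."""
    ops = CIGAR_OP.findall(cigar)
    soft = sum(int(length) for length, op in ops if op == "S")
    hard = sum(int(length) for length, op in ops if op == "H")
    return {"soft": soft, "hard": hard}


print(clipping("60H90M"))
print(clipping("60S90M"))
```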
Inigo,
I am developing a sample prep application for mtDNA sequencing that should also be useful for ecDNA discovery. I lyse cells in the sample well of an electrophoresis cassette (SageHLS, Sage Science, Inc.) and run DNA into the gel under conditions where all genomic DNA becomes trapped in the wall of the sample well, and smaller DNA continues into the gel. The smaller DNA is purified by electroelution, and we sequence it on a MiSeq.
In our first experiment, we got a lot of mtDNA sequence that aligned at high quality. I ran Circle-Map on our data and sorted the bed file for high scoring circles using column 6.
As expected, the mtDNA sequence stood out as the highest score (203000 in column 6) with very high numbers of discordant reads and split reads.
However, there was also one other high quality call on chr16. On further inspection, this potential circle would have been 11 Mb in size (and included the centromere of chr16). I am doubtful that DNA of this size would enter our gel cassette. On inspection of the sequence boundaries, it appears that the right boundary is a segmental duplication of the left boundary. So it is a kind of "false positive" signal, although it has the right general structure at the calculated joint to generate discordant reads and split reads that would be characteristic of a circular joint. As you pointed out in your tutorial, the poor coverage (0.82 uncovered sequence, column 11) would be a giveaway that something was not right. In my case, the size of the circle was also problematic, insofar as we were using a gel system of limited size capacity.
I bring this up because I suspect the boundaries of segmental duplications may cause false "circle" signals for other users. I don't have any hot ideas for how to solve this, except perhaps to incorporate coverage data (column 11) in the circle score (column 6).
Best regards,
Chris
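Until coverage is folded into the score, the same idea can be applied as a post-filter on the output BED file. The sketch below assumes the column layout described in this post (circle score in column 6, uncovered fraction in column 11) and tab-separated rows; the thresholds, the function name `filter_circles`, and the toy rows are hypothetical and would need tuning for real data.

```python
def filter_circles(lines, min_score=50.0, max_uncovered=0.5, max_size=None):
    """Keep BED rows with a high circle score, low uncovered fraction and plausible size."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        start, end = int(cols[1]), int(cols[2])
        score = float(cols[5])       # column 6: circle score
        uncovered = float(cols[10])  # column 11: fraction of uncovered bases
        if score < min_score or uncovered > max_uncovered:
            continue
        if max_size is not None and end - start > max_size:
            continue
        kept.append(cols)
    return kept


# Toy rows: only columns 1-3, 6 and 11 matter here, the rest are placeholders.
rows = [
    "chrM\t0\t16569\t0\t0\t203000\t0\t0\t0\t0\t0.0",
    "chr16\t30000000\t41000000\t0\t0\t300\t0\t0\t0\t0\t0.82",
]
for cols in filter_circles(rows, max_size=1_000_000):
    print(cols[0], cols[1], cols[2])
```

With these toy values the mitochondrial call passes, while the 11 Mb chr16 call is rejected on both its uncovered fraction and its size.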
When running Circle-Map Realign in a conda environment
Circle-Map Realign -t 24 -i sorted_candidates.A.bam -qbam qcircle.A.bam -sbam sorted_circle.A.bam -fasta ref.fa -o final_circle.A.bed
I get an OSError:
File "/global/home/users/diplockn/.conda/envs/circle/lib/python3.7/site-packages/pybedtools/bedtool.py", line 3256, in cat
TMP.write(str(f))
OSError: [Errno 28] No space left on device
This doesn't appear to be a RAM issue, since the cluster I'm running on does not give an out-of-memory error.
I get the same error when using python /global/home/users/diplockn/Circle-Map/circlemap/circle_map.py Realign -t 24 ...
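Errno 28 refers to disk space rather than memory, and the traceback shows pybedtools failing while writing a temporary file (`TMP.write`). A quick check is to look at the free space on the directories where temporary files are likely to land. The sketch below uses only the standard library; which directory pybedtools actually writes to on a given cluster is an assumption here (Python's default temporary directory, which follows the `TMPDIR` environment variable), and the helper name `free_gib` is hypothetical.

```python
import os
import shutil
import tempfile


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


for label, path in [("temp dir", tempfile.gettempdir()), ("working dir", os.getcwd())]:
    print(f"{label}: {path} has {free_gib(path):.1f} GiB free")
```

If the temporary directory turns out to be small, pointing `TMPDIR` at a larger scratch filesystem before launching the job is one thing to try.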
Hello,
I have followed the steps in the Realign tutorial using my data up to the last step. I am trying to run it as follows:
Circle-Map Realign -t 3 -i sorted_circular_read_candidates_NPA.bam -qbam NPA_sorted_qname.bam -sbam NPA_sorted_coord.bam -fasta mm10/mm10.fa -o NPA_circle.bed -dir /scratch/mariacas_root/mariacas99/hartlama/
But I keep getting the same error (below). I know the file is in this location, and I'm able to view it using samtools.
[E::hts_open_format] Failed to open file "sorted_circular_read_candidates_NPA.bam" : No such file or directory
2021-03-22 14:45:18: Realigning reads using Circle-Map
2021-03-22 14:45:18: Clustering structural variant reads
Traceback (most recent call last):
File "/home/hartlama/.local/bin/Circle-Map", line 8, in <module>
sys.exit(main())
File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/circle_map.py", line 1164, in main
run = circle_map()
File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/circle_map.py", line 137, in __init__
splitted, sorted_bam, begin = start_realign(self.args.i, self.args.output, self.args.threads,
File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/utils.py", line 1329, in start_realign
eccdna_bam = ps.AlignmentFile("%s" % circle_bam, "rb")
File "pysam/libcalignmentfile.pyx", line 742, in pysam.libcalignmentfile.AlignmentFile.__cinit__
File "pysam/libcalignmentfile.pyx", line 941, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] could not open alignment file sorted_circular_read_candidates_NPA.bam: No such file or directory
Is there something wrong with my file?
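The traceback shows pysam being handed the relative name `sorted_circular_read_candidates_NPA.bam`, so the file is looked up relative to whatever the working directory is at that point. If `-dir` changes the working directory before the BAM is opened (an assumption, not confirmed here), a relative name would be resolved inside that directory instead of the one the command was launched from. The sketch below checks where each input name resolves and whether it exists there; the function name `resolve_inputs` is hypothetical.

```python
import os


def resolve_inputs(paths, base_dir=None):
    """Map each input path to (absolute path, exists?) relative to base_dir."""
    base_dir = base_dir or os.getcwd()
    resolved = {}
    for path in paths:
        # os.path.join leaves absolute paths untouched.
        full = os.path.abspath(os.path.join(base_dir, path))
        resolved[path] = (full, os.path.exists(full))
    return resolved


inputs = ["sorted_circular_read_candidates_NPA.bam", "NPA_sorted_qname.bam"]
for name, (full, exists) in resolve_inputs(inputs).items():
    print(f"{name} -> {full} ({'found' if exists else 'missing'})")
```

Passing the absolute paths printed here to `-i`, `-qbam`, `-sbam` and `-fasta` removes the dependence on the working directory.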
A small subset of samples were failing with:
circlemap/extract_circle_SV_reads.py", line 123, in extract_sv_circleReads
if read.is_read2 and read.qname == read1.qname:
AttributeError: 'str' object has no attribute 'qname'
The loop structure for iterating over the bam file in both utils.py: insert_size_dist() and extract_circle_SV_reads.py: extract_sv_circleReads() assumes paired reads, alternating read1-read2-read1-read2.
The case of an unpaired read is handled correctly, as long as any read1 has already been encountered. The first read1 overwrites the initialization string read1 = '' to a pysam alignment.
However, if a read2 is encountered before any read1, then read1 is still an empty string with no attribute qname.
My fix was to edit utils.py and extract_circle_SV_reads.py such that:
if read.is_read1:
    read1 = read
# Checks for initialization read1 = '' by looking for read1 type == string.
# read1 type == pysam.Alignment after first read.is_read1 is encountered
# Condition should only be met if unpaired read2 begins BAM file
elif isinstance(read1, str):
    pass
else:
    if read.is_read2 and read.qname == read1.qname:
        ...
An alternative fix that would probably run faster would be to initialize read1 as type pysam.alignment with read1.qname = "fakename". This would remove the elif statement from each loop iteration, but I don't know which pysam function could be used to generate a fake read / qname like that.
If I find time, I'll try to roll my collection of small fixes into a PR.
Cheers,
Michael
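A variant of the same guard is to initialize read1 to None and test for it in the pairing condition itself. The sketch below is self-contained and uses a stand-in `Read` namedtuple instead of pysam alignments, so both `Read` and `pair_reads` are hypothetical; it only illustrates the control flow, not the actual Circle-Map loop.

```python
from collections import namedtuple

# Stand-in for a pysam alignment, with only the attributes the loop needs.
Read = namedtuple("Read", "qname is_read1 is_read2")


def pair_reads(reads):
    """Pair each read2 with the immediately preceding read1 of the same name."""
    read1 = None  # sentinel: no read1 seen yet
    pairs = []
    for read in reads:
        if read.is_read1:
            read1 = read
        elif read.is_read2 and read1 is not None and read.qname == read1.qname:
            pairs.append((read1, read))
    return pairs


reads = [
    Read("orphan", False, True),  # unpaired read2 at the start of the file
    Read("a", True, False),
    Read("a", False, True),
]
print(pair_reads(reads))
```

The orphan read2 at the start is skipped without raising an AttributeError, and the a/a pair is reported as before.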
Hello there,
I would like to ask about the steps before processing Circle-Seq data. In your workflow, the reads are aligned directly with bwa. Should I first use Picard to remove duplicates and merge samples from the raw data?
Looking forward to your answer.
Good luck!
Jiang Xiaoyu
Dear,
I am using Circle-Map to predict ecDNA. My data was downloaded from SRA (for example, https://www.ncbi.nlm.nih.gov/sra/?term=SRR9089606). I first downsampled the data to about 10X with seqtk like this:
seqtk sample -2 -s606 {input.r1} 103333333 | gzip -c > {output.out1}
seqtk sample -2 -s606 {input.r2} 103333333 | gzip -c > {output.out2}
But at the final step, Circle-Map Realign reports many warnings like this:
UserWarning: WARNING: Could not compute the probability for the mate interval priors [['chr4', 63670302, 63670344, 'DR', 'R', '0.999999'], ['chr4', 63670302, 63670344, 'DR', 'R', '0.999999']] due to the following error cannot convert float NaN to integer
You can download and review my pipeline, written in Snakemake, from https://send.firefox.com/download/ae3406fc3df56d2a/#MvYyGhVOUUgspILl8UWUAQ . Could you please give me some advice about this problem? Thanks for your attention and kind help.
Best.
Zhang.
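The cause of the NaN warning is not established in this thread. One thing worth ruling out after downsampling two FASTQ files separately is that the mates have gone out of sync; using the same seed for both files, as in the seqtk commands above, is meant to prevent that. The sketch below compares read names record by record in two FASTQ files. The function names and the handling of `/1` and `/2` suffixes are assumptions made for illustration.

```python
import gzip
import itertools


def read_names(path):
    """Yield read names from a FASTQ file (plain or gzipped), without /1 or /2."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0:  # header line of each 4-line record
                name = line[1:].split()[0]
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def mates_in_sync(r1_path, r2_path):
    """Return True if both files list the same read names in the same order."""
    pairs = itertools.zip_longest(read_names(r1_path), read_names(r2_path))
    return all(name1 == name2 for name1, name2 in pairs)
```

Calling `mates_in_sync("sample_R1.fastq.gz", "sample_R2.fastq.gz")` on the downsampled files (file names here are placeholders) returns False as soon as a name differs or one file is longer than the other.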
Hi,
It's a really interesting tool for identifying circular DNA from DNA-seq data. Recently, however, we have been concerned about the variety of circular DNA lengths and read insert lengths, which would influence the choice of quality control tools. The library sizes of circular DNA are similar to ATAC-seq results: the insert length of the DNA fragments can be less than 50 bp or more than 150 bp. In ATAC-seq quality control, we were advised to use NGmerge, which retains short reads, to cut adapters and filter reads, instead of fastqc (or other trimming tools in the standard pipeline), which remove short reads directly.
So we would also like to know which trimming tools we should use for quality control before alignment.
Thanks!
Hello,
I am following the Realign tutorial and am hoping to combine 3 samples in my analysis. My question is: at which step should I merge the .sam/.bam files? Right before I use the Circle-Map ReadExtractor command?
Thank you