iprada / circle-map

A method for circular DNA detection based on probabilistic mapping of ultrashort reads

License: MIT License

Python 100.00%

eccdna ecdna genotyping ngs structural-variation circular-dna circrna circrna-prediction circrnas microdna

circle-map's Issues

NotImplementedError: "sortBed"

Hello,

I installed Circle-Map via conda and encountered the following problem:

Computing the coverage of the identified eccDNA
Merging intervals for coverage computation
Traceback (most recent call last):
  File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/bin/Circle-Map", line 10, in <module>
    sys.exit(main())
  File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/circlemap/circle_map.py", line 1164, in main
    run = circle_map()
  File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/circlemap/circle_map.py", line 199, in __init__
    output = coverage_object.compute_coverage(coverage_object.get_wg_coverage())
  File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/circlemap/Coverage.py", line 98, in compute_coverage
    for cov_dict,header_dict in cov_generator:
  File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/circlemap/Coverage.py", line 60, in get_wg_coverage
    merged_bed = self.bed.sort().merge()
  File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/pybedtools/bedtool.py", line 917, in decorated
    result = method(self, *args, **kwargs)
  File "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/pybedtools/bedtool.py", line 240, in not_implemented_func
    raise NotImplementedError(help_str)
NotImplementedError: "sortBed" does not appear to be installed or on the path, so this method is disabled.  Please install a more recent version of BEDTools and re-import to use this method.

I checked the file "/public/jxyue/Projects/ecDNA/build/circlemap_conda_env/lib/python3.7/site-packages/pybedtools/bedtool.py" and sortBed is clearly defined in it, so I don't know why this problem occurs. The problem persists even after I explicitly installed the latest versions of bedtools and pybedtools via conda.

Thanks in advance.

Best,
Jia-Xing
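
The error text says that "sortBed" was not found on the path, which points at the BEDTools executables rather than at the Python method in bedtool.py. A minimal sketch, using only the Python standard library, that lists which executables are not visible from the current environment. The tool names in REQUIRED_TOOLS are an illustrative choice taken from the commands mentioned in the issues on this page, not a list published by Circle-Map:

```python
import shutil

# Illustrative list: executables named in the errors reported on this page.
REQUIRED_TOOLS = ("bedtools", "sortBed", "mergeBed", "samtools", "bwa")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Running this inside the same conda environment that runs Circle-Map shows whether that environment, rather than the login shell, can see the executables.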

About Circle-Map Realign calculation

Dear Author:
I encountered a problem when using Circle-Map. When running Circle-Map Realign, the progress bar always stays at 0%, as shown below. I first thought it was a problem with building the database, but that turned out not to be the cause. I followed the instructions in your tutorial, but it did not work. Could there be another reason? The main difficulty is that no error is reported, so I have no clue. Looking forward to your reply!

Realign KeyError

Hello,

I am running Circle-Map Realign and receive an error message. Here is my output:

/bin/sh: bedtools: command not found
/bin/sh: mergeBed: command not found
[E::idx_find_and_load] Could not retrieve index file for '/scratch/mariacas_root/mariacas99/hartlama/NPA_sorted_qname.bam'
2021-03-22 15:39:40: Realigning reads using Circle-Map

2021-03-22 15:39:40: Clustering structural variant reads

2021-03-22 15:39:40: Splitting clusters to to processors

0%| | 0/300 [00:00<?, ?it/s]
[intermediate progress-bar updates omitted]
100%|██████████| 300/300 [00:07<00:00, 39.97it/s]
2021-03-22 15:39:51: Writting final output to disk
Traceback (most recent call last):
  File "/home/hartlama/.local/bin/Circle-Map", line 33, in <module>
    sys.exit(load_entry_point('Circle-Map==1.1.5', 'console_scripts', 'Circle-Map')())
  File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/circle_map.py", line 1165, in main
    run = circle_map()
  File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/circle_map.py", line 187, in __init__
    output = merge_final_output(self.args.sbam, self.args.output, begin, self.args.split,
  File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/utils.py", line 1266, in merge_final_output
    second_merging_round = unparsed_pd.sort_values(['chrom', 'start', 'end']).reset_index()
  File "/home/hartlama/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 5442, in sort_values
    keys = [self._get_label_or_level_values(x, axis=axis) for x in by]
  File "/home/hartlama/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 5442, in <listcomp>
    keys = [self._get_label_or_level_values(x, axis=axis) for x in by]
  File "/home/hartlama/.local/lib/python3.9/site-packages/pandas/core/generic.py", line 1684, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'chrom'

ecDNA only from Chr10 and Chr11

Hi,
I am using Circle-Map to detect ecDNA from Circle-seq data (paired-end).
Strangely, the output BED files show ecDNAs only on Chr10 and Chr11 for one sample, and only on Chr10, 11, 12, 13, 14 and 15 for another. I don't know whether this is real or caused by an error. My input BAM files contain all chromosomes. I checked the log files, and they show some warnings:

Can these warnings contribute to the results above?

Would Circle-Map work for Nanopore reads?

Hi,
I read your interesting paper and wanted to try this pipeline. I have some circular DNA reads produced by the Circle-seq method and sequenced by nanopore. I am wondering if this pipeline would work for Nanopore sequencing data with relatively low coverage?
Thanks

Long runtime with Circle Realign?

Is there any way to speed up Circle Realign? I'm running with 10 processors, 15GB each, on a server, but I still have some iterations taking 1-3 hours. My sorted_read_candidates.bam file is 16GB. Any suggestions?

Is there a way to tag the split and discordant reads in the BAM file for each called ecDNA?

Hi,
I just wonder whether there is an easy way to mark those evidence reads so that they can be viewed in IGV. Thanks!

a question

Naive question. You don't need to unzip the human references to index or align with bwa, right?
Same with the samtools indexing, right?

Chris

Interpreting runtime warnings

After successfully running Circle-Map as outlined in the tutorial, I get these runtime warnings in the job output file and am unsure how to interpret them:

/global/home/users/diplockn/.local/lib/python3.6/site-packages/circlemap/Coverage.py:150: RuntimeWarning: invalid value encountered in ulong_scalars
ext_array[0:(self.ilen + self.ext)])
/global/home/users/diplockn/.local/lib/python3.6/site-packages/circlemap/Coverage.py:151: RuntimeWarning: invalid value encountered in ulong_scalars
end_coverage_ratio = np.sum(region_array[-self.ilen:]) / np.sum(ext_array[-(self.ilen + self.ext):])
/global/home/users/diplockn/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3335: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/global/home/users/diplockn/.local/lib/python3.6/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/global/home/users/diplockn/.local/lib/python3.6/site-packages/numpy/core/_methods.py:217: RuntimeWarning: Degrees of freedom <= 0 for slice
keepdims=keepdims)
/global/home/users/diplockn/.local/lib/python3.6/site-packages/numpy/core/_methods.py:186: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
/global/home/users/diplockn/.local/lib/python3.6/site-packages/numpy/core/_methods.py:209: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)

How to get ECC reads from the Circle-Map output file

After running Circle-Map, I get a BED file of ECC locations. But how can I get the ECC reads themselves? I assume that not all reads located within the ECC locations are ECC reads.

An error happenend during execution. Exiting

Hi Inigo,

I have multiple samples where Circle-Map will fail with:
"An error happenend during execution. Exiting"

stderr:

[E::idx_find_and_load] Could not retrieve index file for 'output/circle-map/namesort_bams/tissue1.namesort.bam'
  0%|          | 0/800 [00:00<?, ?it/s]
0it [00:00, ?it/s]
  0%|          | 1/800 [00:14<3:17:10, 14.81s/it]
  0%|          | 1/800 [00:14<3:18:24, 14.90s/it]
0it [00:14, ?it/s]

stdout:

2020-10-13 14:06:35: Realigning reads using Circle-Map

2020-10-13 14:06:35: Clustering structural variant reads

2020-10-13 14:25:21: Splitting clusters to to processors

2020-10-13 14:25:38: An error happenend during execution. Exiting

Perhaps this is shared with Issue #35 ? However, the speed isn't my issue, but rather the early exit / run failure. One of my samples exits very quickly, so maybe I'll be able to debug with that sample as my test case. I'll let you know if I have any luck.
When this error occurs, could the call to sys.exit() be modified to return exit code 1? Or any non-zero value? With the default return status of 0, wrapper job scripts around Circle-Map will not be able to detect that an error has occurred.
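
Until the exit status changes, a wrapper can look for the error message in the captured output as well as checking the return code. A minimal sketch under that assumption: the marker string is copied from the stdout shown above (including its spelling), and the function name run_and_check is hypothetical, not part of Circle-Map:

```python
import subprocess
import sys

# Message printed by Circle-Map in the log above (spelling as in the log).
ERROR_MARKER = "An error happenend during execution"


def run_and_check(command):
    """Run `command`; return True only if it exits 0 and prints no error marker."""
    completed = subprocess.run(command, capture_output=True, text=True)
    combined_output = completed.stdout + completed.stderr
    return completed.returncode == 0 and ERROR_MARKER not in combined_output


# Hypothetical usage inside a job script:
#   ok = run_and_check(["Circle-Map", "Realign", ...])
#   sys.exit(0 if ok else 1)
```

The check on the message is a workaround only; it breaks if the wording of the message ever changes, which is why a non-zero exit code in Circle-Map itself would be the more robust fix.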

Warnings in Circle-Map Realign tutorial

Hi Iñigo,

I was performing the Circle-Map Realign tutorial, and encountered a few warnings.

My command:

Circle-Map Realign -t 4 -i sort_circular_read_candidates.bam -qbam qname_unknown_circle.bam -sbam sorted_unknown_circle.bam -fasta $ref_genome -o my_unknown_circle.bed

Note that I am using the hg38-related reference GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna which differs from your tutorial slightly.

Output:

2020-09-17 11:16:14: Realigning reads using Circle-Map

2020-09-17 11:16:14: Clustering structural variant reads

2020-09-17 11:16:37: Splitting clusters to to processors

[E::idx_find_and_load] Could not retrieve index file for 'qname_unknown_circle.bam'
400it [00:14, 27.54it/s]███████████████████████████████████████████████████████████████████████████ | 396/400 [00:14<00:00, 33.09it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 400/400 [00:14<00:00, 27.54it/s]
2020-09-17 11:16:52: Writting final output to disk
2020-09-17 11:16:52: Finished!
2020-09-17 11:16:52: Circle-Map Realign finished indentifying circles in 0.6356727162996928

2020-09-17 11:16:52: Circle-Map has identified 1 circles

Computing the coverage of the identified eccDNA
Merging intervals for coverage computation
/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
Computing coverage on interval chr7:143911106-143917553

The warning
[E::idx_find_and_load] Could not retrieve index file for 'qname_unknown_circle.bam'
Seems to make sense, as I don't think you can index a qname-sorted bam file?

The warnings:

/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
/home/michael/.snakemake/conda/68e54810/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)

seem to catch and correct themselves, but I am unsure what is causing them.

Is it safe to ignore these warnings?
If so, can they be suppressed or explained in the stdout to avoid panicking me as the user? :-)

Thanks,
Michael
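
On the question of suppressing the messages: Python's standard warnings module can filter one specific warning by its message and category. The sketch below shows that mechanism only. It assumes the filter is installed in the same Python process before the warning is raised (for example from a wrapper script), and whether Circle-Map offers a place to do this is not stated in this thread. The function name is hypothetical:

```python
import warnings


def ignore_line_buffering_warning():
    """Silence only the subprocess 'line buffering' RuntimeWarning."""
    warnings.filterwarnings(
        "ignore",
        # Regular expression matched against the start of the warning text.
        message=r"line buffering \(buffering=1\) isn't supported in binary mode",
        category=RuntimeWarning,
    )
```

Other RuntimeWarnings, such as the coverage-related ones reported in other issues, are left untouched by this filter.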

Some doubts about the output of Circle-map

Dear Inigo,

I have some doubts about the output.
The discordants column and the split-reads column in the output contain a large number of zeros. What do these zeros mean? Do they indicate that no discordant reads or split reads support these circular DNAs?
As shown in Figure 1 of the article, circular DNA is detected from discordant reads and soft-clipped reads, so I would expect every circular DNA to have at least one discordant read.
Is there anything I misunderstood?

Latest biopython version breaks Alphabet dependency

I encountered the error:
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information

This seems to have changed with biopython 1.78
Installing biopython=1.77 fixed this issue for me.

Perhaps update the conda recipe to specify biopython=1.77?
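
A small sketch of a version check that follows from the observation above, namely that Bio.Alphabet is gone from Biopython 1.78 onward. The function names are hypothetical, and the lookup uses importlib.metadata from the standard library (Python 3.8 or later):

```python
from importlib import metadata


def has_bio_alphabet(biopython_version):
    """Return True if this Biopython version string predates 1.78."""
    major, minor = (int(part) for part in biopython_version.split(".")[:2])
    return (major, minor) < (1, 78)


def installed_biopython_has_alphabet():
    """Check the installed Biopython; return None if it is not installed."""
    try:
        return has_bio_alphabet(metadata.version("biopython"))
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_biopython_has_alphabet() from the interpreter that runs Circle-Map shows which Biopython that interpreter actually imports, which can differ from the one most recently installed with conda.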

Some columns in the output have no values

Dear Inigo,

In the project I have analyzed so far, some rows in the output file (.bed) have no values in columns 10 and 11. How can this be explained? Is it a problem caused by my specific analysis, or is it normal behavior of the software? I am sorry to disturb you and look forward to your reply.

Connection refused while downloading the raw data

Hello iprada,
I ran into a problem when downloading the raw data. The commands are as follows:

wget https://raw.githubusercontent.com/iprada/Circle-Map/master/tutorial/unknown_circle_reads_1.fastq
wget https://raw.githubusercontent.com/iprada/Circle-Map/master/tutorial/unknown_circle_reads_2.fastq

The output was "Connection refused...". Is the data available now? Could you please help me and give me some suggestions? Thank you for your attention.

Best regards.
Gu,

Circle-Map does not install properly

The conda install is done, and the Circle-Map command is not accessible from bash. Throws "command not found".

Error on coverage calculation of Circle-Map Repeats

Traceback (most recent call last):
  File "/home/iprada/miniconda3/bin/Circle-Map", line 11, in <module>
    load_entry_point('Circle-Map==1.1.1', 'console_scripts', 'Circle-Map')()
  File "/home/iprada/miniconda3/lib/python3.7/site-packages/Circle_Map-1.1.1-py3.7.egg/circlemap/circle_map.py", line 870, in main
  File "/home/iprada/miniconda3/lib/python3.7/site-packages/Circle_Map-1.1.1-py3.7.egg/circlemap/circle_map.py", line 198, in __init__
AttributeError: 'Namespace' object has no attribute 'sbam'

The coverage calculation of the circles found with Circle-Map Repeats fails due to the inconsistencies in the variable names

Is the classic WGS data well compatible with circle-seq

Dear author,
I recently read your excellent work on Circle-Map; thank you for your contribution. I have one point of confusion. In the middle part of the paper you used Circle-seq data to test Circle-Map, and this method enriches eccDNA before sequencing. Is this enrichment necessary for Circle-Map? If not, can data generated by a classic WGS sequencing process be used? If so: 1) will noise from other genomic regions have an adverse effect, and 2) do we need to adopt more relaxed screening criteria for the output of Circle-Map?
Best.

ImportError: Bio.Alphabet has been removed from Biopython.

Dear iprada
Sorry to bother you again. Today I updated conda and Circle-Map, but the latest version of Circle-Map reports the following error:

Circle-Map ReadExtractor -i qname_unknown_circle.bam -o circular_read_candidates.bam

Traceback (most recent call last):
  File "/home/yjy/anaconda3/bin/Circle-Map", line 6, in <module>
    from circlemap.circle_map import main
  File "/home/yjy/anaconda3/lib/python3.8/site-packages/circlemap/circle_map.py", line 39, in <module>
    from circlemap.simulations import sim_ecc_reads
  File "/home/yjy/anaconda3/lib/python3.8/site-packages/circlemap/simulations.py", line 30, in <module>
    from Bio.Alphabet import generic_dna
  File "/home/yjy/anaconda3/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
    raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
./circle-map-half.sh: line 7: samtools: command not found
./circle-map-half.sh: line 8: samtools: command not found
./circle-map-half.sh: line 9: samtools: command not found
Traceback (most recent call last):
  File "/home/yjy/anaconda3/bin/Circle-Map", line 6, in <module>
    from circlemap.circle_map import main
  File "/home/yjy/anaconda3/lib/python3.8/site-packages/circlemap/circle_map.py", line 39, in <module>
    from circlemap.simulations import sim_ecc_reads
  File "/home/yjy/anaconda3/lib/python3.8/site-packages/circlemap/simulations.py", line 30, in <module>
    from Bio.Alphabet import generic_dna
  File "/home/yjy/anaconda3/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
    raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

I have seen another issue suggesting that installing biopython=1.77 may help, but it didn't work for me. So I am confused again (sigh).

Realign process running at very low CPU Usage

Thanks for providing this handy program for calling circular DNA.
I am running Circle-Map Realign with the -t 40 option on an HPC node. In the beginning everything looks fine: the temp BED files keep updating and the CPU load is normal. However, at a certain point the node became idle, the program used very little CPU, and the temp BED file stopped updating.

As I write this issue, the program is still running, and the temp BED file has stayed the same size for around 10 hours. I will let you know whether the program ends up finishing or failing. Thanks!

AttributeError: 'int' object has no attribute 'is_integer'

When I run Circle-Map ReadExtractor in a conda environment using the following command:
Circle-Map ReadExtractor -i SRR6315400_q.bam -o SRR6315400_q_cir.bam
the following error occurs:
Traceback (most recent call last):
  File "/usr/local/bin/Circle-Map", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/circlemap/circle_map.py", line 1164, in main
    run = circle_map()
  File "/usr/local/lib/python2.7/dist-packages/circlemap/circle_map.py", line 130, in __init__
    object.extract_sv_circleReads()
  File "/usr/local/lib/python2.7/dist-packages/circlemap/extract_circle_SV_reads.py", line 115, in extract_sv_circleReads
    if (processed_reads/1000000).is_integer() == True:
AttributeError: 'int' object has no attribute 'is_integer'

I tried multiple approaches, but none of them worked. Do you have any suggestions?
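
The paths in the traceback point to python2.7, which is a likely cause: under Python 2, `/` between two integers is integer division and returns an int, which has no is_integer method there, whereas under Python 3 it returns a float, which does. A minimal sketch of an equivalent check that does not depend on the division result type. The function name is hypothetical, and this is not the code shipped in Circle-Map:

```python
def is_progress_checkpoint(processed_reads, step=1000000):
    """Return True when `processed_reads` is an exact multiple of `step`."""
    return processed_reads % step == 0
```

Running Circle-Map with a Python 3 interpreter, as the other tracebacks on this page do, avoids the error without any code change.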

How to speed up the last step of Circle-Map Realign?

Dear iprada,

Recently, I have been trying to use Circle-Map to call eccDNA. I followed the tutorial you posted here using the Realign module. My questions are:

1. What is the difference between the Realign and Repeats modules, and which one is better for calling eccDNA from WGS data?
2. How can I speed up the last step of Realign? I used the following command:

/public/software/Anaconda3/bin/Circle-Map Realign \
-t 20 \
-i S1_sort_circular_read_candidates.bam \
-qbam S1_qname_unknown_circle.bam \
-sbam S1_sorted_unknown_circle.bam \
-fasta /public/genomes/Hsapiens/hg38/seq/hg38.fa \
-o S1_unknown_circle.bed \
&> S1_Circle-Map_detect.log

I ran this on an HPC cluster, requesting a single node with "nodes=1:ppn=20,mem=64g".
I do get the BED files for eccDNA, but the last step always takes 5 days or more.

Should I downsample the bam file?

Thanks so much!

Juncheng

Could not retrieve index file for examples

Hi,

I am following the https://github.com/iprada/Circle-Map/wiki/Tutorial:-Identification-of-circular-DNA-using-Circle-Map-Realign to learn how to use this tool, but it failed in the 3rd step.

$ Circle-Map ReadExtractor -i qname_unknown_circle.bam -o circular_read_candidates.bam
[E::idx_find_and_load] Could not retrieve index file for '/public/home/zhaoqi/test/Circle-Map/tutorial/qname_unknown_circle.bam'
Extracting circular structural variants
finished extracting reads. Elapsed time: 0.00025257269541422524 mins
Thanks for using Circle-Map

Any suggestions?
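
The "Could not retrieve index file" message is consistent with the input being sorted by read name, since samtools index only works on coordinate-sorted BAM files. A minimal sketch for checking the declared sort order from the BAM header using only the standard library. It relies on BAM being gzip-compatible and on the header layout from the SAM/BAM specification (magic bytes, header text length, header text). The function name is hypothetical:

```python
import gzip
import struct


def bam_sort_order(path):
    """Return the SO value from the @HD header line, or None if absent."""
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file: " + path)
        # Little-endian 32-bit length of the plain-text SAM header.
        (text_length,) = struct.unpack("<i", handle.read(4))
        header_text = handle.read(text_length).decode("utf-8", errors="replace")
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None
```

A result of "queryname" for the -qbam input and "coordinate" for the -sbam input would match what the tutorial commands on this page expect. The header only records the declared order, so it does not prove that the records are actually sorted that way.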

Criteria for filtering

Hello,
May I ask what criteria you use to filter the output of Circle-Map Realign?
Circle-Map has reported 7102 circles in my plant genome, and the output seems to contain a lot of false positives. Should I mainly filter by read support, for example requiring at least 1 discordant read and 5 split reads at each site?

awk '$4 >= 1 && $5 >= 5' Circle-Map.output

Best wishes,
Panpan

Question: Tutorial step 2 skip orientation FF/RF/RR by bwa mem

Dear iprada

I extracted circular DNA fragments from human somatic tissue, sequenced them, and then tried to align the data to the reference genome (hg38) using bwa mem. But bwa mem always skips the orientations FF/RF/RR. Other posts report similar messages, but theirs are all followed by the hint "as there are not enough pairs". My messages were just "skip orientation FF; skip orientation RF; skip orientation RR". I still get a complete SAM file, so I wonder whether these skips matter. Do they affect the Circle-Map analysis?

Thanks a lot

Here is the head of raw sequencing data
@A00582:345:HFVCWDSXY:2:1101:1506:1094 1:N:0:CGTACTAG+TAGATCGC
CAATTAAAACTGACTACAAAAAGAAAATATTGCATTGTAAAATAATAAAAGCATGTAAATGCTTTATAAATTTTATAGGCTATTTTCTGAGTAACTTTCCCATGATTCCCCGGTTCTGTGCTATATGGTAGCATTGCTGGAACCGGAAGT
+
:F,FFFFFFFFFFF,F:F:FFFF:FF:FF:FF,,FFFFFF:,FFFFFF:F::FFFF:FFFFFF:FFF:FF,FFFFFFF:FFFFF:F:FFFFF,FFFFF,:FFFFFFF,,FF:FFF:FF:,,,FFFFFFFFF,:FF,FF:,:F:,FF::,F
@A00582:345:HFVCWDSXY:2:1101:3152:1094 1:N:0:CGTACTAG+TAGATCGC
GTGAAGAGCTGCATTAGGAATCTTAAGGTGGAGGTTGGGGTAGGTGGCTTGAGCTGTCTCTTATACACATCTCCCAGCCCACGAGACCGTACTAGATCGCGTATGCCGTCGTCTTCGTGCACAGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
FFF:FF,FFFFFFFFFFFFFFFF:FFFF::FFF,:FFF,F,FFFFFFFFFFFFFFF:FFFF:FFFFFFFFF:FFFFFFFFFFFFFFF:FF,FFFFFFF,FFFFFF,FFFF,:,,,,,:,,:,,,,:FFFFFFFFFFFFFFFFFFFFFFFF
@A00582:345:HFVCWDSXY:2:1101:4562:1094 1:N:0:CGTACTAG+TAGATCGC
GGATTAGAGGCAGTGATCTACACATTCATTAAAGAAGCATTGAAGTAAATTATGAATCCCGTGATGCATATTGAATCTGTCTCTTATACACATCTCCGAGCCCACGAGACCGTACTAGATCTCGTAAGCCGTCTTCTGCGTGAAAAGGGG

And here is the head of running status
https://ibb.co/4TgrMSz

Command help

Hello iprada,
I have tried CIRCexplorer2 to detect eccDNA, as mentioned in your paper, but I got very few eccDNAs in the output. I guess I misused the CIRCexplorer2 command. Could you please tell me the detailed command for running CIRCexplorer2 to detect eccDNA when the input is two FASTQ files or a SAM file?
Best wishes,
Gu

bwa not work

Hi,
I encountered an error:
bwa mem -q hg38.fa unknown_circle_reads_1.fastq unknown_circle_reads_2.fastq > unknown_circle.sam
mem: invalid option -- 'q'

Realign process seems to freeze

After starting the Realign command, the terminal gives feedback (via progress bar) for about 15-20 minutes, and then freezes. The size of the temporary files also appears constant after this freeze occurs. The terminal view after the freeze looks like this:

Is this normal behavior?

Divide by zero during coverage calculation?

Hi Inigo,

While running my samples, I encountered some warnings that I believe come from divide-by-zero during certain coverage calculations.

/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/circlemap/Coverage.py:149: RuntimeWarning: invalid value encountered in ulong_scalars
  start_coverage_ratio = np.sum(region_array[0:self.ilen]) / np.sum(
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/circlemap/Coverage.py:151: RuntimeWarning: invalid value encountered in ulong_scalars
  end_coverage_ratio = np.sum(region_array[-self.ilen:]) / np.sum(ext_array[-(self.ilen + self.ext):])
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:233: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:194: RuntimeWarning: invalid value encountered in true_divide
  arrmean = um.true_divide(
/home/michael/circles/.snakemake/conda/68e54810/lib/python3.8/site-packages/numpy/core/_methods.py:226: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

The *.circle.bed output is still generated successfully.

Are these warnings expected?
Is it safe to ignore them?

Thanks,
Michael

Suspicious feedback during coverage computation

Is this normal? If not, what do I need to do to avoid this?

how to use Circle-map to get counts for a specific ecDNA?

Dear iprada
After using Circle-Map Realign, I can get many potential ecDNAs in .bed file. But can I get counts of a specific ecDNA by Circle-Map? or I need other software? Thanks a lot

Questions of Circle-Map Repeats and Realignment

I am new to eccDNA identification. After reading the two Circle-Map articles and the tutorials, I have some questions about the Repeats subcommand. Why does Repeats not need discordant read pairs and split reads to be extracted before calling circular DNA? And how should I combine the Repeats and Realign commands for eccDNA calling?
Finally, the article in Nucleic Acids Research states that circular DNA intervals overlapping by more than 50% are merged iteratively. How is this process implemented? Is there software that does this? Sorry for so many questions; I look forward to your reply!
Best wishes
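
As an illustration of what "merging intervals that overlap by more than 50%, iteratively" can look like, here is a minimal sketch. It is not the authors' implementation. It assumes that overlap is measured relative to the shorter of the two intervals, and it only compares neighbouring intervals after sorting; both are simplifying assumptions, and the function names are hypothetical:

```python
def overlap_fraction(a, b):
    """Overlap length divided by the length of the shorter interval."""
    (chrom_a, start_a, end_a), (chrom_b, start_b, end_b) = a, b
    if chrom_a != chrom_b:
        return 0.0
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0
    return overlap / min(end_a - start_a, end_b - start_b)


def merge_iteratively(intervals, min_fraction=0.5):
    """Merge neighbouring (chrom, start, end) intervals until nothing changes."""
    merged = sorted(intervals)
    changed = True
    while changed:
        changed = False
        result = []
        for interval in merged:
            if result and overlap_fraction(result[-1], interval) > min_fraction:
                chrom, start, end = result[-1]
                # Replace the previous interval by the union of the two.
                result[-1] = (chrom, min(start, interval[1]), max(end, interval[2]))
                changed = True
            else:
                result.append(interval)
        merged = result
    return merged
```

The outer loop repeats because a merged interval is longer than its parts, which can change its overlap fraction with the next neighbour on a later pass.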

Circle-Map Output File

Column #5, Note #2

If you want to filter the output by read evidence, this is a good column to apply filters to as it provides a direct evidence of the amount of reads that cross the circular DNA breakpoint. In our experiments, we got reliable results applying a filter of at least 2 split reads. However, this is very dependent on the research question you want to answer. As a rule of thumb, we recommend setting a filter of 5-10 split reads if you want to minimize the number of false positives.
My question is: how can I apply such a filter to my output BED file, so that only circular DNAs with enough read evidence are kept?
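
A minimal sketch of such a filter in Python, assuming a tab-separated file in which column 5 holds the split-read count, as the note quoted above describes. The function name, the default threshold and the file names in the usage comment are illustrative:

```python
import csv


def filter_by_split_reads(lines, min_split_reads=2, split_column=5):
    """Yield rows whose split-read count (1-based column) meets the threshold."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < split_column:
            continue  # skip short or empty lines
        try:
            count = float(row[split_column - 1])
        except ValueError:
            continue  # skip header lines or non-numeric values
        if count >= min_split_reads:
            yield row


# Hypothetical usage:
#   with open("my_unknown_circle.bed") as source, open("filtered.bed", "w") as target:
#       for row in filter_by_split_reads(source, min_split_reads=5):
#           target.write("\t".join(row) + "\n")
```

Raising min_split_reads toward the 5-10 range mentioned in the note trades sensitivity for fewer false positives.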

ecDNA detected only in Chr10 and Chr11

Circle-Map takes longer on server

I have submitted two Circle-Map bash jobs to the server, with 16 and 8 CPUs, but they still have not finished. Can you tell me what is wrong?
`JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME

35484 skabdul RUN normal mn02 16*c02n08 *rcle-Map1 Jul 19 09:46

31157 skabdul RUN normal mn02 8*c05n03 Circle-Map Jul 18 15:28`

My pipeline is :

Circle-Map Realign -t 16 -i sort_circular_read_candidates.bam -qbam qname_M1_map.bam -sbam sorted_M1_map.bam -fasta Zea_mays.AGPv4.dna.toplevel.fa -o outcircle/my_circle_DNA.bed

Does circle-map work well if ecdna contains structural variation?

Dear,
I have read your paper published in BMC Bioinformatics (https://doi.org/10.1186/s12859-019-3160-3), and thank you for your outstanding work. However, according to some literature reports, the sequences on ecDNA may not correspond continuously to the reference genome. For example, structurally, ecDNA carrying EGFRvIII does not contain several exons (1,2). Can Circle-Map accurately identify this variation and display it in the output?

1. Turner, K., Deshpande, V., Beyter, D. et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature 543, 122–125 (2017).
2. Wu, S., Turner, K.M., Nguyen, N. et al. Circular ecDNA promotes accessible chromatin and high oncogene expression. Nature 575, 699–703 (2019).
Best.
Zhang
2020.3.21

about samtools index for bam

When I used samtools index to index the BAM file, an error was reported. The species is Ginkgo biloba, whose chromosomes are longer than human chromosomes, so the index cannot be built. I then used the -c parameter to create a CSI file instead, but the eccDNA identification software Circle-Map cannot continue the analysis with it and reports errors. Is there a solution?

The csi file cannot be used for the next step of Circle-Map eccDNA recognition
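
For context, the BAI index format cannot cover a single sequence longer than 2^29 bp (about 512 Mbp), which is why `samtools index` needs `-c` (CSI) for very long chromosomes. A minimal sketch for listing which sequences in a reference exceed that limit, assuming a `.fai` file as written by `samtools faidx` (name in column 1, length in column 2). The helper name and the file name in the usage comment are hypothetical.

```python
BAI_MAX_LENGTH = 2 ** 29  # longest sequence a BAI index can cover, in bp


def contigs_too_long_for_bai(fai_lines, limit=BAI_MAX_LENGTH):
    """Return (name, length) for every sequence longer than `limit`."""
    too_long = []
    for line in fai_lines:
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        if int(length) > limit:
            too_long.append((name, int(length)))
    return too_long


# Usage (file name is a placeholder):
# with open("reference.fa.fai") as fai:
#     print(contigs_too_long_for_bai(fai))
```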

UserWarning: Failed on interval chrom

Hi,
When I follow the tutorial for identifying circular DNA with Circle-Map Realign, I encounter an error.
My command:
Circle-Map Realign -i 019_sort_circular_read_candidates.bam -qbam 019_qname.bam -sbam 019_sort.bam -fasta 019_new.fna -o 019_circle.bed
The standard output gives an error:

2020-09-03 09:08:43: Realigning reads using Circle-Map

2020-09-03 09:08:43: Clustering structural variant reads

2020-09-03 09:08:44: Splitting clusters to to processors

[E::idx_find_and_load] Could not retrieve index file for '019_qname.bam'
  0%|             Traceback (most recent call last):                                                                                                | 0/100 [00:00<?, ?it/s]
  File "/beegfs/home/lxd/miniconda3/envs/pgcgap/lib/python3.7/site-packages/circlemap/realigner.py", line 222, in realign
    plus_base_freqs = background_freqs(plus_coding_interval)
  File "/beegfs/home/lxd/miniconda3/envs/pgcgap/lib/python3.7/site-packages/circlemap/utils.py", line 876, in background_freqs
    return{nucleotide: seq.count(nucleotide)/len(seq) for nucleotide in 'ATCG'}
  File "/beegfs/home/lxd/miniconda3/envs/pgcgap/lib/python3.7/site-packages/circlemap/utils.py", line 876, in <dictcomp>
    return{nucleotide: seq.count(nucleotide)/len(seq) for nucleotide in 'ATCG'}
ZeroDivisionError: division by zero
/beegfs/home/lxd/miniconda3/envs/pgcgap/lib/python3.7/site-packages/circlemap/realigner.py:391: UserWarning: Failed on interval chrom    019_contig161
start            25534
end              25834
Name: 2, dtype: object due to the error division by zero
  str(interval), str(e)))
  1%|█▎                                                                                                                                     | 1/100 [00:01<02:59,  1.81s/it]
2020-09-03 09:08:46: An error happenend during execution. Exiting
0it [00:01, ?it/s]

Thanks a lot, hope to get your suggestions!
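
The traceback shows `background_freqs` dividing by `len(seq)`, so the error arises when the sequence obtained for an interval is empty. The sketch below is a hypothetical guard for that case, not the project's own fix; the function name and the choice to return `None` for an empty sequence are assumptions made for illustration.

```python
def safe_background_freqs(seq):
    """Return A/T/C/G frequencies for `seq`, or None if `seq` is empty."""
    if not seq:
        # Nothing to count: let the caller decide how to treat this interval.
        return None
    return {nucleotide: seq.count(nucleotide) / len(seq) for nucleotide in "ATCG"}
```

A caller could then skip intervals for which the function returns `None` instead of failing on them.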

Sequence extraction of split reads and discordant reads

Dear Inigo:
Hello! I am using Circle-Map version 1.1.3 to identify eccDNA and have obtained results with the Circle-Map Realign command. I would like to get the sequences of the split reads and of the discordant reads for each eccDNA in the sample. Is there an option or another way to do this? I checked the Circle-Map ReadExtractor parameters: it outputs reads indicating circular DNA structural variants, but discordant reads and split reads cannot be output directly. Is there a way to extract the two types of read sequences corresponding to each eccDNA from the *_sort_circular_read_candidates.bam file? Looking forward to your reply. Many thanks!
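
One possible approach, sketched under assumptions: label each record of the candidates file from its SAM fields. Here "split" means the record carries an `SA:Z` tag, and "discordant" means the read is paired, both mates are mapped, and the proper-pair flag is not set. These are generic SAM-level definitions and may not match the criteria Circle-Map applies internally. The function name is hypothetical, and the input is SAM text such as the output of `samtools view`.

```python
def classify_sam_record(sam_line):
    """Return (read_name, sequence, labels) for one SAM text line.

    `labels` may contain "split" (an SA:Z tag is present) and/or
    "discordant" (paired, both mates mapped, not a proper pair).
    """
    fields = sam_line.rstrip("\n").split("\t")
    name, flag, seq = fields[0], int(fields[1]), fields[9]
    labels = set()
    # Optional tags start at the 12th field.
    if any(tag.startswith("SA:Z:") for tag in fields[11:]):
        labels.add("split")
    paired = flag & 0x1
    proper_pair = flag & 0x2
    unmapped = flag & 0x4
    mate_unmapped = flag & 0x8
    if paired and not proper_pair and not unmapped and not mate_unmapped:
        labels.add("discordant")
    return name, seq, labels
```

Restricting the input to one eccDNA interval (for example with a region argument to `samtools view` on the indexed, coordinate-sorted file) would give the reads for that circle. Note that hard-clipped records do not carry the clipped bases in the sequence field.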

How does Circle-Map use hard clipped reads?

Inigo,
This is possibly more a question about the structure of sam/bam files (I am a beginner in bioinformatics). If so, I apologize in advance.

I have been using Circle-Map to analyze my mtDNA sequencing runs, and also to see if I can identify circular DNA from other chromosomes in my circle-enriched preparations.

In my last prep, Circle-Map assigned some very high circle scores to regions with no discordant pairs (see orange shading). Also coverage was highly variable across the region. So you might want to consider assigning more scoring weight to coverage uniformity of potential circles.

I did find one other region with very low coverage but which had both discordant reads and split reads (see yellow shaded entry above). I wanted to see the details of these reads and so I inspected the information for the individual reads with IGV. Here is an IGV image of one of the discordant (and split) reads. The read information for the read bracketed in red is shown in the next image.

My confusion is that IGV says that the read is hard-clipped. I had the impression that Circle-Map would ignore hard-clipped reads. Is that correct? Or is it the case that Circle-Map only looks for supplementary alignments, whether the read is hard or soft clipped?

Thanks,
Chris

Segmental duplications give false circle signals?

Inigo,
I am developing a sample prep application for mtDNA sequencing that should also be useful for ecDNA discovery. I lyse cells in the sample well of an electrophoresis cassette (SageHLS, Sage Science, Inc.) and run DNA into the gel under conditions where all genomic DNA becomes trapped in the wall of the sample well, and smaller DNA continues into the gel. The smaller DNA is purified by electroelution, and we sequence it on a MiSeq.

In our first experiment, we got a lot of mtDNA sequence that aligned at high quality. I ran Circle-Map on our data and sorted the bed file for high scoring circles using column 6.

As expected, the mtDNA sequence stood out as the highest score (203000 in column 6) with very high numbers of discordant reads and split reads.

However, there was also one other high quality call on chr16. On further inspection, this potential circle would have been 11 Mb in size (and included the centromere of chr16). I am doubtful that DNA of this size would enter our gel cassette. On inspection of the sequence boundaries, it appears that the right boundary is a segmental duplication of the left boundary. So it is a kind of "false positive" signal, although it has the right general structure at the calculated joint to generate discordant reads and split reads that would be characteristic of a circular joint. As you pointed out in your tutorial, the poor coverage (0.82 uncovered sequence, column 11) would be a giveaway that something was not right. In my case, the size of the circle was also problematic, insofar as we were using a gel system of limited size capacity.

I bring this up because I suspect the boundaries of segmental duplications may cause false "circle" signals for other users. I don't have any hot ideas for how to solve this, except perhaps to incorporate coverage data (column 11) in the circle score (column 6).
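
Until something like that exists in the score itself, such calls can be flagged after the fact. A minimal sketch, assuming a tab-separated BED output with start and end in columns 2 and 3 and the uncovered fraction in column 11, as described above. The function name and both thresholds are arbitrary choices for illustration, not recommended values.

```python
def flag_suspect_circles(lines, max_uncovered=0.5, max_length=None):
    """Yield (line, reasons) for calls with poor coverage or implausible size."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        uncovered = float(fields[10])  # column 11 (1-based)
        reasons = []
        if uncovered > max_uncovered:
            reasons.append("uncovered fraction %.2f" % uncovered)
        if max_length is not None and end - start > max_length:
            reasons.append("length %d bp" % (end - start))
        if reasons:
            yield line, reasons
```

Setting `max_length` to the size limit of the preparation method (the gel system, in this case) would catch calls such as the 11 Mb one on chr16.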

Best regards,
Chris

Only got ec

OSError: [Errno 28] No space left on device

When running Circle-Map Realign in a conda environment
Circle-Map Realign -t 24 -i sorted_candidates.A.bam -qbam qcircle.A.bam -sbam sorted_circle.A.bam -fasta ref.fa -o final_circle.A.bed

I get an OSError:
File "/global/home/users/diplockn/.conda/envs/circle/lib/python3.7/site-packages/pybedtools/bedtool.py", line 3256, in cat
TMP.write(str(f))
OSError: [Errno 28] No space left on device

This doesn't appear to be a RAM issue, since the cluster I'm running on does not report an out-of-memory error.

I get the same error when using python /global/home/users/diplockn/Circle-Map/circlemap/circle_map.py Realign -t 24 ...
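
Errno 28 refers to disk space rather than memory, and the traceback shows it being raised while pybedtools writes a temporary file. A quick check, assuming those temporary files go to the directory reported by Python's `tempfile` module (which follows the `TMPDIR` environment variable); the helper name is hypothetical.

```python
import shutil
import tempfile


def free_space_gib(path=None):
    """Return free disk space in GiB for `path` (default: Python's temp directory)."""
    path = path or tempfile.gettempdir()
    return shutil.disk_usage(path).free / 1024 ** 3


# Usage:
# print(tempfile.gettempdir(), round(free_space_gib(), 1), "GiB free")
```

If the temporary directory turns out to be small, pointing `TMPDIR` at a larger scratch location before running the command is one thing to try.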

Fail to open alignment file

Hello,

I have followed the steps in the Realign tutorial using my data up to the last step. I am trying to run it as follows:

Circle-Map Realign -t 3 -i sorted_circular_read_candidates_NPA.bam -qbam NPA_sorted_qname.bam -sbam NPA_sorted_coord.bam -fasta mm10/mm10.fa -o NPA_circle.bed -dir /scratch/mariacas_root/mariacas99/hartlama/

But I keep getting the same error (below). I know the file is in this location, and I'm able to view it using samtools.

[E::hts_open_format] Failed to open file "sorted_circular_read_candidates_NPA.bam" : No such file or directory
2021-03-22 14:45:18: Realigning reads using Circle-Map

2021-03-22 14:45:18: Clustering structural variant reads

Traceback (most recent call last):
  File "/home/hartlama/.local/bin/Circle-Map", line 8, in <module>
    sys.exit(main())
  File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/circle_map.py", line 1164, in main
    run = circle_map()
  File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/circle_map.py", line 137, in __init__
    splitted, sorted_bam, begin = start_realign(self.args.i, self.args.output, self.args.threads,
  File "/home/hartlama/.local/lib/python3.9/site-packages/circlemap/utils.py", line 1329, in start_realign
    eccdna_bam = ps.AlignmentFile("%s" % circle_bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 742, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 941, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] could not open alignment file sorted_circular_read_candidates_NPA.bam: No such file or directory

Is there something wrong with my file?
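
One thing worth checking, offered as an assumption rather than a confirmed diagnosis: if `-dir` changes the directory against which relative input paths are resolved, a file that exists in the shell's current directory will not be found there. The sketch below reports where each input path actually exists; the function name is hypothetical.

```python
import os


def locate_inputs(paths, workdir):
    """Map each path to the bases ("cwd", "workdir") under which it exists."""
    report = {}
    for path in paths:
        found = []
        if os.path.exists(os.path.abspath(path)):
            found.append("cwd")
        if os.path.exists(os.path.join(workdir, path)):
            found.append("workdir")
        report[path] = found
    return report


# Usage (paths taken from the command above):
# print(locate_inputs(["sorted_circular_read_candidates_NPA.bam"],
#                     "/scratch/mariacas_root/mariacas99/hartlama/"))
```

If a file is found only under "cwd", passing absolute paths to `-i`, `-qbam`, `-sbam` and `-fasta` would rule this explanation in or out.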

ReadExtractor fails if the first read in bam file is an unpaired read2

A small subset of samples were failing with:

circlemap/extract_circle_SV_reads.py", line 123, in extract_sv_circleReads
    if read.is_read2 and read.qname == read1.qname:
AttributeError: 'str' object has no attribute 'qname'

The loop structure for iterating over the bam file in both utils.py: insert_size_dist() and extract_circle_SV_reads.py: extract_sv_circleReads() assumes paired reads, alternating read1-read2-read1-read2.

The case of an unpaired read is handled correctly, as long as any read1 has already been encountered. The first read1 overwrites the initialization string read1 = '' to a pysam alignment.

However, if a read2 is encountered before any read1, then read1 is still an empty string with no attribute qname.

My fix was to edit utils.py and extract_circle_SV_reads.py such that:

        if read.is_read1:
            read1 = read

        # Checks for initialization read1 = '' by looking for read1 type == string.
        # read1 type == pysam.Alignment after first read.is_read1 is encountered
        # Condition should only be met if unpaired read2 begins BAM file
        elif isinstance(read1, str):
            pass

        else:
            if read.is_read2 and read.qname == read1.qname:
                ...

An alternative fix that would probably run faster would be to initialize read1 as type pysam.alignment with read1.qname = "fakename". This would remove the elif statement from each loop iteration, but I don't know which pysam function could be used to generate a fake read / qname like that.
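
A variant that avoids building a fake read, sketched under assumptions: initialise `read1` to `None` and test for it in the same condition as the name comparison. `Read` is a hypothetical stand-in for a pysam alignment that has only the three attributes the loop uses, and `iter_read_pairs` is not a function in Circle-Map.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Minimal stand-in for a pysam alignment (illustration only)."""
    qname: str
    is_read1: bool
    is_read2: bool


def iter_read_pairs(reads):
    """Yield (read1, read2) pairs, ignoring a read2 seen before any read1."""
    read1 = None  # sentinel: no read1 encountered yet
    for read in reads:
        if read.is_read1:
            read1 = read
        elif read.is_read2 and read1 is not None and read.qname == read1.qname:
            yield read1, read
```

The `read1 is not None` test short-circuits before `read1.qname` is evaluated, so an unpaired read2 at the start of the file is skipped rather than raising `AttributeError`.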

If I find time, I'll try to roll my collection of small fixes into a PR.

Cheers,
Michael

Preprocessing circle-seq data before using circle-map

Hello there,
I would like to ask about preprocessing the circle-seq data. In your pipeline, the reads are aligned directly with bwa. Should I use Picard to remove duplicates and merge samples from the raw data first?
Looking forward to your answer.
Good luck!
Jiang Xiaoyu

WARNING: Could not compute the probability for the mate interval priors

Dear,
I am using Circle-Map to predict ecDNA. My data were downloaded from SRA (for example, https://www.ncbi.nlm.nih.gov/sra/?term=SRR9089606). I first downsampled the data to about 10X with seqtk, like this:
seqtk sample -2 -s606 {input.r1} 103333333 | gzip -c > {output.out1}
seqtk sample -2 -s606 {input.r2} 103333333 | gzip -c > {output.out2}
But at the final step, Circle-Map Realign reports many warnings like this:

UserWarning: WARNING: Could not compute the probability for the mate interval priors [['chr4', 63670302, 63670344, 'DR', 'R', '0.999999'], ['chr4', 63670302, 63670344, 'DR', 'R', '0.999999']] due to the following error cannot convert float NaN to integer

You can download and review my Snakemake pipeline from https://send.firefox.com/download/ae3406fc3df56d2a/#MvYyGhVOUUgspILl8UWUAQ . Could you please give me some advice about this problem? Thanks for your attention and kind help.
Best.
Zhang.

Should NGmerge or fastqc be used for quality control before alignment?

Hi,
It's a really interesting tool for identifying circular DNA from DNA-seq data. Recently, however, we have become concerned about the variability in circular DNA length and in read insert length, which would influence the choice of quality control tools. The library sizes for circular DNA resemble ATAC-seq results: the insert length of the DNA fragments can be less than 50 bp or more than 150 bp. For ATAC-seq quality control, we were advised to use NGmerge, which retains short reads, to cut adapters and filter reads, instead of fastqc (or other trimming tools in the standard pipeline), which remove short reads directly.
So we would like to know which trimming tools we should use for quality control before alignment.
Thanks!

combining samples before Circle-Map analysis

Hello,

I am following the Realign tutorial and am hoping to combine 3 samples in my analysis. My question is: at which step should I merge the .sam/.bam files? Right before I use the Circle-Map ReadExtractor command?

Thank you
