Hello, I have been using Mum&Co to call SVs between two homologo

Size of deletions (mumandco issue, open, 6 comments)

samtobam commented on May 30, 2024

Size of deletions

from mumandco.

Comments (6)

SAMtoBAM commented on May 30, 2024

Hi Linus,
Sorry for the late reply.

I have not used diploid or phased assemblies, as MUM&Co essentially assumes a 1-to-1 pairing, so unfortunately I have not come across this to any great extent before.

The gap in the query between the end of one alignment and the start of the next generally comes from the MUMmer alignment itself, i.e. sometimes the alignment halts before the actual breaksite.
However, in the deletion test set I do not see this problem: the greatest difference I have for the query coordinates is 5 bp (since the genomes are identical apart from the deletions).
Could you send me your output for the test dataset?

Are you using MUMmer4?

Coordinates are 1-based, with each value corresponding to a base pair.
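To make the 1-based convention concrete, here is a minimal Python sketch (not Mum&Co's own code) of how the reference and query gaps around a deletion can be computed from the two flanking alignment blocks. It assumes forward-strand blocks with 1-based inclusive coordinates, as printed by MUMmer's show-coords; the `Alignment` type and `gap_sizes` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One aligned block; 1-based inclusive coordinates, forward strand assumed."""
    ref_start: int
    ref_end: int
    query_start: int
    query_end: int


def gap_sizes(left: Alignment, right: Alignment) -> tuple:
    """Return (reference_gap, query_gap) between two consecutive blocks.

    With 1-based inclusive coordinates, the number of bases strictly
    between position a (last base of `left`) and b (first base of
    `right`) is b - a - 1.
    """
    ref_gap = right.ref_start - left.ref_end - 1
    query_gap = right.query_start - left.query_end - 1
    return ref_gap, query_gap


# Toy example: a clean 100 bp deletion in the query. The reference skips
# bases 1001-1100, while the query continues on the very next base.
left = Alignment(ref_start=1, ref_end=1000, query_start=1, query_end=1000)
right = Alignment(ref_start=1101, ref_end=2000, query_start=1001, query_end=1900)
print(gap_sizes(left, right))  # (100, 0)
```

If MUMmer halts the left block 3 bp before the breaksite (ending at position 997 in both sequences), the same call returns `(103, 3)`: a small residual query gap like the few-bp differences seen in the test set.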

Thanks
Samuel


linusdsv commented on May 30, 2024

Hi Samuel,

Thank you very much for your response.

I am aligning two chromosomes, one paternal and one maternal, which I guess does not differ much (in concept) from the data previously processed with Mum&Co, right?

What is curious to me is that when I call the SVs with Assemblytics (standard settings), which you also used for your benchmark tests, I did not get these extremely long query gap sizes for deletions. This suggests to me that the trend is rooted in the way Mum&Co processes the MUMmer output. I created some dotplots of the regions where these deletions are located, together with their flanking regions, on the two chromosomes. For the deletions with very long query gap sizes, I could not see any alignments of the flanking regions, while for the deletions with small query gap sizes, I could.
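One way to put a number on this trend is to tabulate the query gap sizes of all deletion calls and count how many exceed a cut-off. The sketch below assumes the calls have already been parsed into `(deletion_size, query_gap)` pairs in bp; the column layout of the Mum&Co TSV is not given in this thread, so parsing is left out. The helper name `summarize_query_gaps`, the 50 bp threshold, and the toy values are all illustrative assumptions.

```python
from statistics import median


def summarize_query_gaps(calls, threshold=50):
    """Summarise the query gap sizes (bp) of deletion calls.

    `calls` is an iterable of (deletion_size, query_gap) pairs; parsing them
    from the Mum&Co output is deliberately left out (hypothetical input).
    """
    gaps = [query_gap for _, query_gap in calls]
    if not gaps:
        return {"n": 0, "max": None, "median": None, "n_above": 0, "frac_above": 0.0}
    n_above = sum(1 for g in gaps if g > threshold)
    return {
        "n": len(gaps),
        "max": max(gaps),
        "median": median(gaps),
        "n_above": n_above,
        "frac_above": n_above / len(gaps),
    }


# Made-up values: three deletions with small query gaps, one with a very long one.
toy_calls = [(100, 0), (250, 3), (1200, 5), (800, 4500)]
print(summarize_query_gaps(toy_calls))
```

Running the same summary on the Mum&Co and Assemblytics calls for one chromosome pair would quantify the difference described above.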

Yes, I am using MUMmer4.

I added the test dataset output below. As mentioned, my results agree with your observation: the greatest difference for the query coordinates is 5 bp as well.

DEL100_test.SVs_all.zip

Best regards,
Linus


SAMtoBAM commented on May 30, 2024

Hi again,

Understood, I thought you meant aligning both homologous chromosomes against a reference at the same time.
You are right, this is the same as most generic situations.

First, for the test data, you said you saw the same problem:
"I also observed this in your yeast.tidy dataset, although to a much lesser extent"
Did you mean the differences of 1-5 bp? Otherwise I see no major differences.

For the large differences in your data, perhaps the clue is here:
"When I run my pairs of allelic pseudomolecules, I get the pattern described above for around 30% of the deletions. Additionally, if I align one contig (e.g. 2-4 Mb) to the complete allelic pseudomolecule, the size specified in the last two columns is always less than 50 bp and very often in the single digits. What may be the reason for this different outcome when aligning the full versus one part of the chromosome?"
Do you mean that when you do whole-genome alignments you find these large gaps in the query positions, but not when you align only one contig at a time to the whole genome?
Insertion and deletion calls are meant to lie within a single contig, but perhaps overlapping contigs are affecting the way the coordinates are determined, or something else is going on.
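To test the overlapping-contig idea, one could check whether any alignment blocks from the same query sequence overlap in query coordinates. This is a minimal sort-and-sweep sketch, assuming the blocks are given as 1-based inclusive `(query_start, query_end)` pairs (reverse-strand blocks, where start > end, are normalised first); `overlapping_query_blocks` is a hypothetical helper, not part of Mum&Co or MUMmer.

```python
def overlapping_query_blocks(blocks):
    """Return index pairs of blocks whose query intervals overlap.

    blocks: list of (query_start, query_end), 1-based inclusive.
    """
    # Normalise orientation and sort by start position.
    intervals = sorted((min(s, e), max(s, e), i) for i, (s, e) in enumerate(blocks))
    overlaps = []
    active = []  # (end, index) of earlier intervals that may still overlap
    for start, end, idx in intervals:
        # Inclusive coordinates: an earlier block overlaps if it ends at or after `start`.
        active = [(e, j) for e, j in active if e >= start]
        for _, j in active:
            overlaps.append(tuple(sorted((j, idx))))
        active.append((end, idx))
    return overlaps


blocks = [(1, 1000), (900, 2000), (2500, 3000), (2100, 2400)]
print(overlapping_query_blocks(blocks))  # [(0, 1)]
```

Because the intervals are processed in order of start position, only blocks still "open" at the current start need to be compared, which keeps the check cheap for typical alignment files.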
Could you send me your alignment files for both the whole genome and a contig that you used in your test?

Thanks, Linus
Samuel


linusdsv commented on May 30, 2024

Hi Samuel,

Yes, sorry for the confusion: I did not mean any pattern other than the deletions that sometimes have a query gap size of up to 5 bp. So the data produced for the test set should be the same as yours.

Yes, the query gap sizes seem to be smaller when I extract the longest contig from one chromosome and align it to the other chromosome. However, this may of course also be a side effect of selecting just a subset of the data, as there are still deletion gap sizes of >100 bp.
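The thread does not say how the contig was extracted. If the pseudomolecules are scaffolds in which contigs are separated by runs of N (an assumption), the longest contig could be pulled out with a sketch like this. `longest_contig` and its `min_gap` default of 10 are hypothetical; FASTA reading is omitted, and shorter N runs are treated as part of a contig.

```python
import re


def longest_contig(sequence, min_gap=10):
    """Return (start, end, contig) for the longest piece between N runs.

    Pieces are split at runs of at least `min_gap` N/n bases (assumed
    scaffolding convention). start/end are 1-based inclusive.
    """
    gap = re.compile("[Nn]{%d,}" % min_gap)
    bounds = []  # 0-based half-open [start, end) of each piece
    pos = 0
    for m in gap.finditer(sequence):
        bounds.append((pos, m.start()))
        pos = m.end()
    bounds.append((pos, len(sequence)))
    start, end = max(bounds, key=lambda b: b[1] - b[0])
    # Convert to 1-based inclusive coordinates.
    return start + 1, end, sequence[start:end]


seq = "ACGT" + "N" * 10 + "ACGTACGT" + "N" * 12 + "AC"
print(longest_contig(seq))  # (15, 22, 'ACGTACGT')
```

Keeping the contig's 1-based start makes the two runs comparable: a position p in the single-contig calls maps back to position start + p - 1 on the pseudomolecule.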

Below are the TSV files for both cases:
Mum&Co_Output.zip

If you want to look at delta files, here are the ones for the single contigs (the others are too large to upload).
Delta_Files_One_Contig.zip

If you need more files, feel free to ask for them.

Thanks a lot for the help,
Linus


SAMtoBAM commented on May 30, 2024

Hi Linus,
Sorry for the late reply.

I meant: could you send me the entire alignment files produced for a whole-genome alignment and for a single-contig alignment, for a case where there are large gaps in the contig coordinates in the whole-genome alignment and this large gap disappears when aligning the single contig?
Perhaps this happens in the longest contig?

Thanks


SAMtoBAM commented on May 30, 2024

Is this still a problem?
