Size of deletions about mumandco HOT 6 OPEN

samtobam commented on May 30, 2024
Size of deletions

from mumandco.

Comments (6)

SAMtoBAM commented on May 30, 2024

Hi Linus,
Sorry for the late reply,

I have not used diploid or phased assemblies, as MUM&Co essentially assumes a 1-to-1 pairing.
So unfortunately I have not come across this before to any great extent.

The difference between the beginning and end of alignments in the query generally comes from the MUMmer alignment, i.e. sometimes the alignment halts before the actual break site.
However, in the deletion test set I do not find this problem; the greatest difference I have for the query coordinates is 5 bp (since the genomes are identical apart from the deletions).
Could you send me your output for the test dataset?

Are you using MUMmer4?

Coordinates are 1-based, with each value corresponding to a base pair.
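
As a rough illustration of what 1-based coordinates imply for the query gap discussed above, here is a minimal Python sketch. It assumes both endpoints are inclusive (each coordinate names one base), and the function names `query_gap` and `span_length` are hypothetical helpers, not part of MUM&Co.

```python
def query_gap(query_start: int, query_stop: int) -> int:
    """Distance in bp between two 1-based query coordinates.

    Assumption: each coordinate names a single base, so two adjacent
    bases (e.g. 100 and 101) are 1 bp apart.
    """
    return abs(query_stop - query_start)


def span_length(start: int, stop: int) -> int:
    """Number of bases covered by a 1-based interval, both ends inclusive."""
    return abs(stop - start) + 1
```

Under this assumption, a clean deletion would leave the two query coordinates only a few bp apart, while a large `query_gap` would point to flanks that were not aligned up to the break site.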

Thanks
Samuel

linusdsv commented on May 30, 2024

Hi Samuel,

Thank you very much for your response.

I am doing the alignment between two chromosomes, with one being the paternal chromosome and the other the maternal, which I guess does not differ much (in concept) from the data that have been processed previously with Mum&Co, right?

What is curious to me is that when I call the SVs in Assemblytics (standard settings), which you also used for your benchmark tests, I did not get these extremely long query gap sizes for deletions, which indicates to me that this trend seems to be rooted in the way Mum&Co is processing the MUMmer output. I created some dotplots aligning the regions where these deletions are located and their flanking regions on the two chromosomes. For the deletions with very long query gap size, I could not see any alignments of the flanking regions, while for the deletions with small query gap sizes, I could observe the alignments.

Yes, I am using MUMmer4.

I added the test dataset output below. As mentioned, my results agree with your observation, the greatest difference for query coordinates is 5 bp as well.

DEL100_test.SVs_all.zip

Best regards,
Linus

SAMtoBAM commented on May 30, 2024

Hi again,

Understood, I thought you meant aligning both homologous chromosomes at the same time against a reference.
You are right, this is the same as most generic situations.

First, for the test data, you said you saw the same problem:
"I also observed this in your yeast.tidy dataset, although to a much lesser extent"
Did you mean the differences of 1-5 bp? Otherwise I see no major differences.

For the large differences in your data, perhaps the clue is here:
"When I run my pairs of allelic pseudomolecules, I get the pattern described above for around 30% of the deletions. Additionally, if I align one contig (e.g. 2-4 Mb) to the complete allelic pseudomolecule, the size specified in the last two columns is always less than 50 bp and very often in the single digits. What may be the reason for this different outcome when aligning the full versus one part of the chromosome?"
By this, do you mean that when you do whole-genome alignments you find these large gaps in the query positions, but not when you align contig by contig to a whole genome?
The calls for insertions and deletions are meant to be only within a single contig, but perhaps overlapping contigs are having an impact on the way the coordinates are determined...or something else...
Could you send me your alignment files for both the whole genome and a contig that you used in your test?

Thanks Linus
Samuel

linusdsv commented on May 30, 2024

Hi Samuel,

Yes, sorry for the confusion, I did not mean any other pattern than the deletions that sometimes have a query gap size of up to 5. So the data produced for the test set should be the same as yours.

Yes, the query gap sizes seem to be smaller when I extract the longest contig from one chromosome and align it to the other chromosome, however this of course may also happen as a side effect from selecting just a subset of the data, as there are still deletion gap sizes of >100 bp.
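
For anyone wanting to compare the two runs, the deletions with large query gaps could be pulled out of each TSV with something like the sketch below. The column names (`ref_chr`, `ref_start`, `SV_type`, `query_start`, `query_stop`), the `deletion` label, and the 100 bp threshold are assumptions for illustration; check them against the header of the actual MUM&Co output before use. The function name `large_query_gaps` is hypothetical.

```python
import csv
import io


def large_query_gaps(tsv_text, threshold=100):
    """Return (ref_chr, ref_start, gap) for deletions whose query gap exceeds threshold.

    Assumes a tab-separated table with a header row containing the
    columns ref_chr, ref_start, SV_type, query_start and query_stop.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged = []
    for row in reader:
        # Only deletions are of interest here; skip every other SV type.
        if row["SV_type"] != "deletion":
            continue
        gap = abs(int(row["query_stop"]) - int(row["query_start"]))
        if gap > threshold:
            flagged.append((row["ref_chr"], int(row["ref_start"]), gap))
    return flagged
```

Running this on the whole-chromosome TSV and on the single-contig TSV, then comparing the flagged reference positions, would show which deletions change their query gap between the two alignments.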

Below are the TSV files for both cases:
Mum&Co_Output.zip

If you want to look at delta files, here are the ones for the single contigs (the others are too large to upload).
Delta_Files_One_Contig.zip

If you need more files, feel free to ask for them.

Thanks a lot for the help,
Linus

SAMtoBAM commented on May 30, 2024

Hi Linus,
Sorry for the late reply,

I meant: can you send me the entire alignment file produced for a whole-genome alignment and for a single-contig alignment,
where there are large gaps in the contig coordinates in the whole-genome alignment and where this large gap disappears when aligning the single contig?
Perhaps this happens in the longest contig?

Thanks

SAMtoBAM commented on May 30, 2024

Is this still a problem?