Comments (6)
Hi Linus,
Sorry for the late reply,
I have not used diploid or phased assemblies, as MUM&Co essentially assumes a 1-1 pairing,
so unfortunately I have not come across this much before.
The difference between the beginning and end of alignments in the query generally comes from the MUMmer alignment itself, i.e. sometimes the alignment halts before the actual break site.
However, in the deletion test set I do not find this problem; the greatest difference I have for the query coordinates is 5 bp (since the genomes are identical apart from the deletions).
Could you send me your output for the test dataset?
Are you using MUMmer4?
Coordinates are 1-based. Each value corresponds to a base pair.
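To make the 1-based convention concrete, here is a minimal sketch of how a gap size between two adjacent alignments can be derived from such coordinates. It assumes the coordinates are also inclusive, and the function names are hypothetical; this is not taken from the MUM&Co code.

```python
def interval_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive interval (assumed convention)."""
    return end - start + 1


def gap_between(end_prev: int, start_next: int) -> int:
    """Unaligned bases strictly between two 1-based, inclusive positions.

    0 means the alignments are directly adjacent; a negative value
    means they overlap.
    """
    return start_next - end_prev - 1
```

For example, an alignment ending at query position 100 followed by one starting at 106 leaves positions 101-105 unaligned, i.e. a gap of 5 bp.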
Thanks
Samuel
from mumandco.
Hi Samuel,
Thank you very much for your response.
I am doing the alignment between two chromosomes, with one being the paternal chromosome and the other the maternal, which I guess does not differ much (in concept) from the data that have been processed previously with Mum&Co, right?
What is curious to me is that when I call the SVs in Assemblytics (standard settings), which you also used for your benchmark tests, I did not get these extremely long query gap sizes for deletions, which indicates to me that this trend seems to be rooted in the way Mum&Co is processing the MUMmer output. I created some dotplots aligning the regions where these deletions are located and their flanking regions on the two chromosomes. For the deletions with very long query gap size, I could not see any alignments of the flanking regions, while for the deletions with small query gap sizes, I could observe the alignments.
Yes, I am using MUMmer4.
I added the test dataset output below. As mentioned, my results agree with your observation, the greatest difference for query coordinates is 5 bp as well.
Best regards,
Linus
Hi again,
Understood, I thought you meant aligning both homologous chromosomes at the same time against a reference.
You are right, this is the same as most generic use cases.
First, for the test data, you said you saw the same problem:
"I also observed this in your yeast.tidy dataset, although to a much lesser extent"
Did you mean the differences of 1-5 bp? Otherwise I see no major differences.
For the large differences in your data, perhaps the clue is here:
"When I run my pairs of allelic pseudomolecules, I get the pattern described above for around 30% of the deletions. Additionally, if I align one contig (e.g. 2-4 Mb) to the complete allelic pseudomolecule, the size specified in the last two columns is always less than 50 bp and very often in the single digits. What may be the reason for this different outcome when aligning the full versus one part of the chromosome?"
By this, do you mean that when you do whole-genome alignments you find these large gaps in the query positions, but not when you align contig by contig against the whole genome?
The calls for insertions and deletions are meant to be only within a single contig, but perhaps overlapping contigs are having an impact on the way the coordinates are determined...or something else...
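One way to check this by hand is to compute query gaps only between consecutive alignments that share the same reference/query contig pair. The sketch below does that under stated assumptions: the `Aln` record and `query_gaps` function are hypothetical, alignments are assumed to be forward-strand with 1-based inclusive coordinates, and this is not how MUM&Co itself determines coordinates.

```python
from collections import defaultdict
from typing import NamedTuple


class Aln(NamedTuple):
    """One alignment block (hypothetical record, forward strand assumed)."""
    ref: str
    qry: str
    r_start: int
    r_end: int
    q_start: int
    q_end: int


def query_gaps(alns):
    """Query gap sizes between consecutive alignments, per (ref, qry) pair.

    Alignments on different contig pairs are never compared, so an
    overlapping contig cannot contribute to another contig's gaps.
    """
    by_pair = defaultdict(list)
    for a in alns:
        by_pair[(a.ref, a.qry)].append(a)

    gaps = {}
    for pair, group in by_pair.items():
        group.sort(key=lambda a: a.r_start)  # order along the reference
        gaps[pair] = [
            nxt.q_start - prev.q_end - 1
            for prev, nxt in zip(group, group[1:])
        ]
    return gaps
```

Running this on the alignments from the whole-genome run and from the single-contig run, and comparing the gap lists for the same contig, would show whether the large gaps only appear when other contigs are present.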
Could you send me your alignment files for both the whole genome and a contig that you used in your test?
Thanks Linus
Samuel
Hi Samuel,
Yes, sorry for the confusion, I did not mean any pattern other than the deletions that sometimes have a query gap size of up to 5 bp. So the data produced for the test set should be the same as yours.
Yes, the query gap sizes seem to be smaller when I extract the longest contig from one chromosome and align it to the other chromosome. However, this may of course also be a side effect of selecting just a subset of the data, as there are still deletions with query gap sizes of >100 bp.
Below are the TSV files for both cases:
Mum&Co_Output.zip
If you want to look at delta files, here are the ones for the single contigs (the others are too large to upload).
Delta_Files_One_Contig.zip
If you need more files, feel free to ask for them.
Thanks a lot for the help,
Linus
Hi Linus,
Sorry for the late reply,
I meant, can you send me the entire alignment files produced for a whole-genome alignment and for a single-contig alignment,
for a case where there are large gaps in the contig coordinates in the whole-genome alignment and where this large gap disappears when aligning the single contig?
Perhaps this happens in the longest contig?
Thanks
from mumandco.
Is this still a problem?