Comments (7)
Ok so that confirms what I was saying, that duplications only detect expansions within the query and not contractions.
Although I could technically invert the detection using the reciprocal alignment and find contractions...If I tried this modification would you like to try run it? and see if you find the alternate events either side?
Hmmm this inversion result is more worrying
It will most likely be due to the filtering process which gets rid of a lot of false positives due to the mapping of repetitive regions.
it requires that during the nucmer alignment filtering to find the global alignment, that in the absence of the inverted alignment found in the less strictly filtered alignment, that no other alignments can be found in this region. So perhaps it is because they are in repetitive regions?
Maybe it is currently a little strict when it comes to large inversions and could be relaxed a little more...
The difference is the sizes between the two genomes would most likely be as you said due to the differences in the haplotypes in these regions and/or obscurity in mapping at the boarders of the inversion, due to again repetitive regions. This later case is not unlikely considering inversion commonly form with repetitive material.
Let me know how the dotplots look for these inversions
Have a good day!
from mumandco.
Hi Gabriele,
Thanks for using MUM&Co
For duplications:
Technically that is how it should function, the duplications are to be from haplotype1. MUM&Co doesn't predict 'expansions' and 'contractions' so duplications only are the addition of material in the query. It some cases the contractions would appear as deletions but I cannot be too sure how it will respond in all cases.
Interestingly, now I am writing this, there isn't any reason why I could not find the opposite and look for expansion. Might add this as a feature.
The co-oridnates for the query chromosome positions of duplications can vary in accuracy I have noticed. Do you find this an issue?
Regarding inversions: correct again, any position in the opposite orientation would appear regardless of which haplotype is the query or the reference
Assuming all the chromosomes are highly co-linear, running one at a time should be ok however in most cases I would recommend running the whole genome vs whole genome.
Did you have a look at my pipeline for the phased haplotype assembly? I hope it can be of help.
Hope that helps
from mumandco.
Thanks for the quick reply.
As mentioned above, for duplications I performed the same process using haplotype2-chr1 as query and haplotype1-chr1 as reference (so inverting them). The results are reported below.
ref_chr | query_chr | ref_start | ref_stop | size | SV_type | query_start | query_stop | ref_chr | query_chr | ref_start | ref_stop | size | SV_type | query_start | query_stop | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr1 | chr1 | 1349462 | 1349653 | 191 | duplication | 1725124 | 1725127 | chr1 | chr1 | 4794578 | 4890703 | 96125 | duplication | 2549810 | 2549811 | |
chr1 | chr1 | 1678135 | 1682964 | 4829 | duplication | 1377849 | 1377851 | chr1 | chr1 | 6419492 | 6432519 | 13027 | duplication | 7035943 | 7193751 | |
chr1 | chr1 | 3156729 | 3186550 | 29821 | duplication | 4085015 | 4085016 | chr1 | chr1 | 11599672 | 12011191 | 411519 | duplication | 12790274 | 12790275 | |
chr1 | chr1 | 11331514 | 11384802 | 53288 | duplication | 10232990 | 10232990 | chr1 | chr1 | 13854641 | 13854779 | 138 | duplication | 14990751 | 14990787 | |
chr1 | chr1 | 11783930 | 11820088 | 36158 | duplication | 11051724 | 11051725 | chr1 | chr1 | 16046772 | 16112494 | 65722 | duplication | 17303064 | 17303064 | |
chr1 | chr1 | 11888669 | 11888776 | 107 | duplication | 11117719 | 11117799 | chr1 | chr1 | 17970320 | 17984232 | 13912 | duplication | 19237291 | 19237292 | |
chr1 | chr1 | 15899129 | 15931090 | 31961 | duplication | 14797193 | 14797194 | chr1 | chr1 | 18832015 | 18833967 | 1952 | duplication | 32598786 | 32598787 | |
chr1 | chr1 | 18225961 | 18226087 | 126 | duplication | 16968211 | 16968325 | chr1 | chr1 | 19519200 | 19561168 | 41968 | duplication | 19822394 | 19822394 | |
chr1 | chr1 | 25382432 | 25388788 | 6356 | duplication | 25545490 | 25545493 | chr1 | chr1 | 25641447 | 25647307 | 5860 | duplication | 25484693 | 25484694 | |
chr1 | chr1 | 25475440 | 25484693 | 9253 | duplication | 25647307 | 25647308 | chr1 | chr1 | 27616766 | 27616992 | 226 | duplication | 25978671 | 25978673 | |
chr1 | chr1 | 26899767 | 26899827 | 60 | duplication | 26233660 | 26233661 | chr1 | chr1 | 30293973 | 30294139 | 166 | duplication | 30904862 | 30904869 | |
chr1 | chr1 | 33461685 | 33641555 | 179870 | duplication | 33013693 | 33013694 | chr1 | chr1 | 31333977 | 31343066 | 9089 | duplication | 32013518 | 32013519 |
So, I can say that I have found 10 duplications on haplotype1-chr1 and 10 duplications on haplotype2-chr1, for a total a 20 different duplications. That fact that the coordinates for the query chromosome positions of duplications can vary in accuracy should not be a big issue.
As regards inversions, comparing haplotype1-chr1 vs. haplotype2-chr1 and haplotype2-chr1 vs. haplotype1-chr1, it seems that some inversions were identified only in one case or only in the other. The results are reported below.
ref_chr | query_chr | ref_start | ref_stop | size | SV_type | query_start | query_stop | ref_chr | query_chr | ref_start | ref_stop | size | SV_type | query_start | query_stop | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr1 | chr1 | 928392 | 936829 | 8437 | inversion | 922519 | 937228 | chr1 | chr1 | 1078963 | 2013971 | 935008 | inversion | 1078547 | 1725591 | |
chr1 | chr1 | 1078547 | 1725591 | 647044 | inversion | 1078963 | 2013971 | chr1 | chr1 | 2739937 | 3999333 | 1259396 | inversion | 2450920 | 3188513 | |
chr1 | chr1 | 3322217 | 4595464 | 1273247 | inversion | 4085015 | 4619181 | chr1 | chr1 | 4007253 | 4010944 | 3691 | inversion | 3196440 | 3200122 | |
chr1 | chr1 | 6718265 | 7022914 | 304649 | inversion | 6261822 | 6419491 | chr1 | chr1 | 4058381 | 4060924 | 2543 | inversion | 3283498 | 3298224 | |
chr1 | chr1 | 7694800 | 7732806 | 38006 | inversion | 6921248 | 6933932 | chr1 | chr1 | 4085015 | 4619181 | 534166 | inversion | 3322217 | 4595464 | |
chr1 | chr1 | 7997744 | 8008258 | 10514 | inversion | 7199138 | 7251443 | chr1 | chr1 | 5398638 | 5836884 | 438246 | inversion | 5195438 | 6293327 | |
chr1 | chr1 | 10914442 | 11723952 | 809510 | inversion | 10166357 | 10961509 | chr1 | chr1 | 6261822 | 6419491 | 157669 | inversion | 6718265 | 7022914 | |
chr1 | chr1 | 12790274 | 13200365 | 410091 | inversion | 12011191 | 12036824 | chr1 | chr1 | 6921248 | 6933932 | 12684 | inversion | 7694800 | 7732806 | |
chr1 | chr1 | 18915584 | 19153861 | 238277 | inversion | 17659104 | 17897649 | chr1 | chr1 | 9665694 | 10131689 | 465995 | inversion | 10422224 | 10879774 | |
chr1 | chr1 | 19349912 | 19459708 | 109796 | inversion | 18117940 | 18117941 | chr1 | chr1 | 10166357 | 10955599 | 789242 | inversion | 10914442 | 11729862 | |
chr1 | chr1 | 25365381 | 25374070 | 8689 | inversion | 25521330 | 25530347 | chr1 | chr1 | 16514121 | 16524182 | 10061 | inversion | 17772558 | 17781891 | |
chr1 | chr1 | 25796661 | 25805131 | 8470 | inversion | 25964739 | 25973163 | chr1 | chr1 | 17659104 | 17897649 | 238545 | inversion | 18915584 | 19153861 | |
chr1 | chr1 | 25819305 | 26699887 | 880582 | inversion | 25987337 | 26037960 | chr1 | chr1 | 25964739 | 25973163 | 8424 | inversion | 25796661 | 25805131 | |
chr1 | chr1 | 30808857 | 30829719 | 20862 | inversion | 30197948 | 30219007 | chr1 | chr1 | 25987337 | 26037960 | 50623 | inversion | 25819305 | 26699887 | |
chr1 | chr1 | 32655857 | 33014034 | 358177 | inversion | 32082789 | 32393776 | chr1 | chr1 | 30197948 | 30219007 | 21059 | inversion | 30808857 | 30829719 | |
chr1 | chr1 | 33016918 | 33053334 | 36416 | inversion | 32396648 | 32492220 | chr1 | chr1 | 31170104 | 31680072 | 509968 | inversion | 31774371 | 32309684 | |
chr1 | chr1 | 33218101 | 33325587 | 107486 | inversion | 32589793 | 32697994 | chr1 | chr1 | 32027721 | 32393776 | 366055 | inversion | 32710925 | 33014034 | |
chr1 | chr1 | 33641556 | 33855619 | 214063 | inversion | 33193505 | 33406455 | chr1 | chr1 | 32469931 | 32571206 | 101275 | inversion | 33165089 | 33199342 | |
chr1 | chr1 | 32589793 | 32697994 | 108201 | inversion | 33218101 | 33325587 | |||||||||
chr1 | chr1 | 33193505 | 33406455 | 212950 | inversion | 33641556 | 33855619 |
Is there a particular reason why this can happen? However, I can verify the veracity of these cases through a dot plot analysis. I noticed that even in cases of shared identification the size of the event can change, but I think this could be due to the presence of gaps and indels within the inverted region. Could it be plausible?
Thanks for all the information you will be able to give me!
Yes, I'll have a look on your pipeline. I wasn't aware of that, thank you again.
Gabriele
from mumandco.
Thanks for all this information!
I'm happy to run the program again once you have made the necessary changes and I'll let you know the results of the dot plots.
See you soon!
from mumandco.
Hi there,
Sorry for taking a couple days
Essentially I have just reverted the parsing of the alignment for duplications onto the reciprocal alignment instead, and therefore this should find, If I am not mistaken, contractions should be the opposite on duplications. (but it took me a few hours to re-understand all the intermediate files and filters etc)
So here is a draft version of the script
Is is inside a newer version I will release soon with a vcf output
However it is limited to use with MUMmer4 as it requires to be given the threads option (-t) at the end of the line (I have wrote this in the read me on how to use the MUMmer4 specific version)
Let me know how it goes (it's gzipped just so I can upload it here)
mumandco_v3_MUMmer4_multithreads_contractions.sh.gz
from mumandco.
Hi!
I tested your new code for duplications using haplotype1-chr1 vs. haplotype2-chr1 and then I compared the result with the old one. The results are reported below.
new mumandco_v3 haplotype1-chr1 vs. haplotype2-chr1:
ref_chr | query_chr | ref_start | ref_stop | size | SV_type | query_start | query_stop |
---|---|---|---|---|---|---|---|
chr1 | chr1 | 1349462 | 1349653 | 191 | duplication | 1725124 | 1725127 |
chr1 | chr1 | 1678135 | 1682964 | 4829 | duplication | 1377849 | 1377851 |
chr1 | chr1 | 3156729 | 3186550 | 29821 | duplication | 4085015 | 4085016 |
chr1 | chr1 | 11331514 | 11384802 | 53288 | duplication | 10232990 | 10232990 |
chr1 | chr1 | 11783930 | 11820088 | 36158 | duplication | 11051724 | 11051725 |
chr1 | chr1 | 11888669 | 11888776 | 107 | duplication | 11117719 | 11117799 |
chr1 | chr1 | 15899129 | 15931090 | 31961 | duplication | 14797193 | 14797194 |
chr1 | chr1 | 18225961 | 18226087 | 126 | duplication | 16968211 | 16968325 |
chr1 | chr1 | 25382432 | 25388788 | 6356 | duplication | 25545490 | 25545493 |
chr1 | chr1 | 25475440 | 25484693 | 9253 | duplication | 25647307 | 25647308 |
chr1 | chr1 | 26899767 | 26899827 | 60 | duplication | 26233660 | 26233661 |
chr1 | chr1 | 33461685 | 33641555 | 179870 | duplication | 33013693 | 33013694 |
chr1 | chr1 | 2549810 | 2549811 | 96125 | contraction | 4794578 | 4890703 |
chr1 | chr1 | 14990751 | 14990787 | 138 | contraction | 13854641 | 13854779 |
chr1 | chr1 | 17303064 | 17303064 | 65722 | contraction | 16046772 | 16112494 |
chr1 | chr1 | 19237291 | 19237292 | 13912 | contraction | 17970320 | 17984232 |
chr1 | chr1 | 19822394 | 19822394 | 41968 | contraction | 19519200 | 19561168 |
chr1 | chr1 | 25484693 | 25484694 | 5860 | contraction | 25641447 | 25647307 |
chr1 | chr1 | 25978671 | 25978673 | 226 | contraction | 27616766 | 27616992 |
chr1 | chr1 | 30904862 | 30904869 | 166 | contraction | 30293973 | 30294139 |
chr1 | chr1 | 32013518 | 32013519 | 9089 | contraction | 31333977 | 31343066 |
chr1 | chr1 | 32598786 | 32598787 | 1952 | contraction | 18832015 | 18833967 |
old mumandco_v2.4.2 haplotype2-chr1 vs. haplotype1-chr1 (inverted):
ref_chr | query_chr | ref_start | ref_stop | size | SV_type | query_start | query_stop |
---|---|---|---|---|---|---|---|
chr1 | chr1 | 4794578 | 4890703 | 96125 | duplication | 2549810 | 2549811 |
chr1 | chr1 | 6419492 | 6432519 | 13027 | duplication | 7035943 | 7193751 |
chr1 | chr1 | 11599672 | 12011191 | 411519 | duplication | 12790274 | 12790275 |
chr1 | chr1 | 13854641 | 13854779 | 138 | duplication | 14990751 | 14990787 |
chr1 | chr1 | 16046772 | 16112494 | 65722 | duplication | 17303064 | 17303064 |
chr1 | chr1 | 17970320 | 17984232 | 13912 | duplication | 19237291 | 19237292 |
chr1 | chr1 | 18832015 | 18833967 | 1952 | duplication | 32598786 | 32598787 |
chr1 | chr1 | 19519200 | 19561168 | 41968 | duplication | 19822394 | 19822394 |
chr1 | chr1 | 25641447 | 25647307 | 5860 | duplication | 25484693 | 25484694 |
chr1 | chr1 | 27616766 | 27616992 | 226 | duplication | 25978671 | 25978673 |
chr1 | chr1 | 30293973 | 30294139 | 166 | duplication | 30904862 | 30904869 |
chr1 | chr1 | 31333977 | 31343066 | 9089 | duplication | 32013518 | 32013519 |
As you can see, it appears that the program correctly attributed what were previously identified as duplications between haplotype2-chr1 vs. haplotype1-chr1 (inverted) as contractions in the new output. All but two duplications (second and third row of the last table), which are not among the contractions. Do you have an idea why this small lack? Thanks for your active support.
P.S. I have yet to verify the previously discussed inversions through dot plot analysis.
from mumandco.
Unfortunately I don't have any idea at the moment why some duplications are not labelled as contractions in the opposite as it should be performing the same analysis on essentially the same alignment.
I will try play with a little further before it becomes the latest version, but I'm not sure I will figure it out.
No worries about the inversions, I can wait. If you can, I'd be interested to know about the duplications/contractions too. Would be nice to see!
Thanks Gabriele
from mumandco.
Related Issues (20)
- How to understanding the inversion result? HOT 3
- SV call from multiple genomes comparison HOT 8
- multiple threads? HOT 7
- awk: cmd. line:1: fatal: division by zero attempted HOT 4
- syntax error near unexpected token `newline' HOT 6
- Return to 'multiple threads?' closed issue HOT 5
- --threads has no default value HOT 1
- What does it mean 'imprecise' calls for the VCF HOT 2
- MUMandCO v3.8, aborted using example files HOT 1
- support for other alingment formats HOT 1
- Aligning different size genomes HOT 1
- awk: cmd. line:1: fatal: division by zero attempted HOT 8
- The insertion problem in the final output HOT 7
- problems in inversion
- VCF header format error
- Excesively long execution time HOT 3
- Variable number of variants found over runs HOT 2
- What does the label 'complicated' mean? HOT 2
- No insertions but many deletions found HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mumandco.