Comments (5)
Hi,
The directions of these entries are different and the physical distances between them are too large. The last two entries are close enough, but their TE coordinates substantially overlap (4910-7166 vs 6988-8240), thus they cannot be considered a single element.
Thanks!
Shujun
from edta.
Hey Shujun,
Thanks for the clarification! So if a substantial overlap is detected, then they cannot be considered a single element.
However, it is still a bit unclear to me how this can translate into the final annotation of this region, that looks like this:
Chr5 EDTA Mutator_TIR_transposon 19872566 19873827 10111 - . ID=TE_homo_95784;Name=VANDAL21;classification=DNA/MULE-MuDR;sequence_ontology=SO:0002280;identity=0.963;method=homology;ID=TE_homo_98670;sequence_ontology=SO:0002280
Chr5 EDTA Mutator_TIR_transposon 19873825 19874206 3057 - . ID=TE_homo_95785;Name=VANDAL21;classification=DNA/Mutator;sequence_ontology=SO:0002280;identity=0.968;method=homology;ID=TE_homo_98671;sequence_ontology=SO:0002280
Chr5 EDTA Mutator_TIR_transposon 19873941 19877095 12213 - . ID=TE_homo_95786;Name=VANDAL21;classification=DNA/MULE-MuDR;sequence_ontology=SO:0002280;identity=0.966;method=homology;ID=TE_homo_98672;sequence_ontology=SO:0002280
Chr5 EDTA Mutator_TIR_transposon 19877284 19883063 18267 + . ID=TE_homo_95787;Name=VANDAL21;classification=DNA/Mutator;sequence_ontology=SO:0002280;identity=0.976;method=homology;ID=TE_homo_98673;sequence_ontology=SO:0002280
Chr5 EDTA Mutator_TIR_transposon 19883061 19884298 9665 + . ID=TE_homo_95788;Name=VANDAL21;classification=DNA/Mutator;sequence_ontology=SO:0002280;identity=0.964;method=homology;ID=TE_homo_98674;sequence_ontology=SO:0002280
Here, in at least two cases, the overlap is not substantial and the direction is the same.
Many thanks for your support Shujun! :)
from edta.
The GFF rows you pasted seem to contain extra information compared to the RM out rows. To combine rows, the physical coordinates, direction, TE coordinates, and divergence all need to be considered. If the physical coordinates, direction, and divergence meet the criteria but the TE coordinates overlap substantially, the rows are still considered two elements. If the TE coordinates are separated by a large distance but are in agreeable directions (the first piece has the smaller 5' coordinates), the rows are still considered a single element. In such a case, the annotated TE has a large deletion.
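As a rough illustration of the criteria described above, here is a minimal Python sketch. It is not the code of combine_RMrows.pl. The `Hit` class, `can_join`, and the `max_te_overlap` cutoff of 100 bp are hypothetical; the thread does not state what counts as a "substantial" overlap. The `maxgap` and `maxdiv` defaults are taken from the command quoted in this thread, and comparing divergence as an absolute difference is an assumption, as is the reversed ordering check for the minus strand.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One RepeatMasker .out row, reduced to the fields used here (hypothetical)."""
    family: str
    strand: str
    q_start: int   # genomic (query) begin
    q_end: int     # genomic (query) end
    te_start: int  # begin in the repeat consensus
    te_end: int    # end in the repeat consensus
    div: float     # percent divergence


def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap of two inclusive intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def can_join(a, b, maxgap=35, maxdiv=3.5, max_te_overlap=100):
    """Decide whether a (upstream) and b (downstream) look like one element.

    max_te_overlap is an assumed cutoff, not a value from the real script.
    """
    if a.family != b.family or a.strand != b.strand:
        return False
    if b.q_start - a.q_end - 1 > maxgap:      # physical gap too large
        return False
    if abs(a.div - b.div) > maxdiv:           # assumption: compare the difference
        return False
    if overlap(a.te_start, a.te_end, b.te_start, b.te_end) > max_te_overlap:
        return False                          # same part of the TE seen twice
    # Agreeable direction: the first piece carries the smaller TE coordinates
    # on the plus strand (assumed to be reversed on the minus strand).
    if a.strand == "+":
        return a.te_start <= b.te_start
    return a.te_start >= b.te_start


# The three VANDAL12 rows from the Chr3 example in this thread.
r78 = Hit("VANDAL12", "+", 17485555, 17489789, 1, 4200, 4.5)
r79 = Hit("VANDAL12", "+", 17489775, 17494536, 3442, 7944, 2.6)
r80 = Hit("VANDAL12", "+", 17494533, 17497540, 8849, 11860, 1.4)

print(can_join(r78, r79))  # False
print(can_join(r79, r80))  # True
```

Under these assumptions the sketch gives the same outcome as the thread's example: 64679 and 64680 are joined, while 64678 stays separate because its TE coordinates overlap those of 64679.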
Shujun
from edta.
Hi, Shujun
Sorry for jumping into this conversation. What we don't understand is why some rows are still not joined even though they meet all the criteria in the script.
Here is the code and small working example I used:
perl combine_RMrows.pl -rmout test -maxgap 35 -maxdiv 3.5
So rows from the same family, on the same strand, with a gap of less than 35 bp and a divergence difference of less than 3.5 will be joined, right?
But looking for these three rows:
# before joining
SW perc perc perc query position in query matching repeat position in repeat
score div. del. ins. sequence begin end (left) repeat class/family begin end (left) ID
30291 4.5 0.2 0.4 Chr3 17485555 17489789 (8669366) + VANDAL12 DNA/Mutator 1 4200 (9966) 64678 *
38777 2.6 0.5 0.2 Chr3 17489775 17494536 (8664619) + VANDAL12 DNA/Mutator 3442 7944 (4030) 64679
26487 1.4 0.2 0.0 Chr3 17494533 17497540 (8661615) + VANDAL12 DNA/Mutator 8849 11860 (114) 64680 *
# after joining
SW_score perc_div. perc_del. perc_ins. query_sequence query_begin query_end query_remain strand matching_repeat repeat_class/family repeat_begin repeat_end repeat_remain ID
30291 4.5 0.2 0.4 Chr3 17485555 17489789 8669366 + VANDAL12 DNA/Mutator 1 4200 (9966) 64678
34020 2.1 0.4 0.1 Chr3 17489775 17497540 8661615 + VANDAL12 DNA/Mutator 3442 11860 (114) 64679_64680
So 64679_64680 (the ID column) was joined, but why wasn't 64678 joined with 64679_64680?
✅ Same family (VANDAL12)
✅ Same Strand (+)
✅ Overlapped (17485555-17489789 with 17489775-17497540; an overlap of 15 bp). How large an overlap will this script ignore? We think it's not a substantial overlap.
✅ Divergence (4.5-2.1=2.4)
from edta.
For anyone interested in this merging: the case I pasted here didn't merge because of the overlap in the repeat consensus coordinates (among the last four columns). 1-4200 overlaps 3442-11860 by about 760 bp.
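To make the two kinds of overlap in this example concrete, the short Python check below counts them with inclusive coordinates. The helper name `inclusive_overlap` is only for illustration; the coordinates are those of rows 64678 and 64679 above.

```python
def inclusive_overlap(a, b):
    """Overlap length of two inclusive (start, end) intervals, 0 if disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


# Genomic (query) coordinates of rows 64678 and 64679.
genomic = inclusive_overlap((17485555, 17489789), (17489775, 17494536))

# Repeat consensus (TE) coordinates of the same two rows.
consensus = inclusive_overlap((1, 4200), (3442, 7944))

print(genomic, consensus)  # 15 759
```

The genomic overlap is small, but the two rows cover the same 759 bp stretch of the VANDAL12 consensus (positions 3442-4200), which is the overlap that kept them from being merged.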
from edta.