Comments (12)
Hi Luohao,
I tried LTR_retriever on fruit fly, mouse, micro- and megabats, and human, and it worked much as it does in plants, although most of these species have much lower LTR content in their genomes. For accurate evaluation, LAI requires the genome to contain at least 5% total LTR sequence and 0.1% intact LTR sequence, so you may need to check these two values.
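As a rough illustration of that check, here is a minimal shell sketch. The function name `lai_applicable` and the example percentages are hypothetical and not part of LTR_retriever; you would supply the two percentages from your own annotation summary.

```shell
# Hypothetical helper (not part of LTR_retriever): given the total-LTR and
# intact-LTR percentages of a genome, report whether both meet the minimums
# quoted above (5% total LTR, 0.1% intact LTR).
lai_applicable() {
    # $1 = total LTR content in percent, $2 = intact LTR content in percent
    awk -v total="$1" -v intact="$2" 'BEGIN {
        if (total + 0 >= 5 && intact + 0 >= 0.1)
            print "ok"
        else
            print "too_low"
    }'
}

# Made-up example values, for illustration only.
lai_applicable 12.0 0.8    # prints "ok"
lai_applicable 0.9 0.05    # prints "too_low"
```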
For classification of LTR superfamilies, LTR_retriever uses models trained on rice LTR classifications, so the same model may not be applicable to invertebrate genomes. However, the classification information is not the major factor in identifying LTR elements. You may need to do the classification yourself based on the identified LTR elements.
Best,
Shujun
from ltr_retriever.
Hi Shujun,
Thanks for your email. In amphioxus the LTR content seems to be less than 1%, so that might be the reason.
On another note, LTR_retriever annotated 25.27% LTRs (according to the .tbl file) in a tilapia genome, while the actual proportion should be about 4%. I wonder if it has a lot of false positives? The LAI score is also unexpectedly low for a PacBio assembly: 3.01. Below is the script I used. Would you have any suggestions for reducing false positives?
```shell
/apps/genometools/1.5.9/bin/gt suffixerator -db $genome -indexname gt_index/$g -suf -lcp -des -ssp -sds -dna
/apps/genometools/1.5.9/bin/gt ltrharvest -index gt_index/$g -maxlenltr 7000 -maxtsd 6 -mintsd 4 -seqids yes -vic 10 -similar 90 -seed 20 > $g.harvest.scn
/apps/genometools/1.5.9/bin/gt ltrharvest -index gt_index/$g -maxlenltr 7000 -maxtsd 6 -mintsd 4 -seqids yes -vic 10 -similar 90 -seed 20 -motif TGCA -motifmis 1 > $g.harvest.motif.scn
/scratch/luohao/software/LTR_Finder/source/ltr_finder -D 15000 -d 1000 -L 7000 -l 100 -p 20 -C -M 0.9 $genome > $g.finder.scn
perl /scratch/luohao/software/mgescan-1.1/mgescan/ltr/find_ltr.pl -seq=$genome -min-ltr=100 -max-ltr=7000 -min_iden=90
/scratch/luohao/software/LTR_retriever-2.0/LTR_retriever -genome $g -nonTGCA $g.harvest.scn -inharvest $g.harvest.motif.scn -infinder $g.finder.scn -threads=20
```
Thanks!
from ltr_retriever.
Hi Luohao,
Where did you download mgescan-1.1? Can you give me the URL? I have downloaded three mgescan packages, but none of them worked.
Thank you very much!
Zhennan
from ltr_retriever.
For MGEScan_LTR please refer to #8 and #19. Let me know if you need further help, thanks!
Shujun
from ltr_retriever.
Sorry for the delayed response (somehow I thought I had already replied).
Your commands look good, but I have no idea about the total LTR content of amphioxus. If you suspect a high proportion of false positives, you may manually curate a couple of them to verify (try NCBI BLAST and see what they are). If you do find some, please post example sequences here, extended by 100 bp both upstream and downstream, which would help with debugging.
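One possible way to pull out a candidate with its flanks is sketched below in plain shell and awk. The function name `extract_with_flanks` and its arguments are hypothetical, coordinates are assumed to be 1-based and inclusive, and the sequence ID is assumed to be the first word of the FASTA header. It loads the whole target sequence into memory, so for large genomes an indexed tool such as `samtools faidx` is likely a better fit.

```shell
# Hypothetical helper: print one region of a FASTA sequence, extended by a
# flank on each side and clipped to the sequence ends.
extract_with_flanks() {
    # $1 = FASTA file, $2 = sequence ID, $3 = start, $4 = end
    # (1-based, inclusive), $5 = flank length in bp (defaults to 100)
    awk -v id="$2" -v s="$3" -v e="$4" -v flank="${5:-100}" '
        /^>/ { name = substr($1, 2); next }   # ID = first word of the header
        name == id { seq = seq $0 }           # join wrapped sequence lines
        END {
            if (seq == "") exit 1             # sequence ID not found
            from = s - flank; if (from < 1) from = 1
            to = e + flank;   if (to > length(seq)) to = length(seq)
            print ">" id ":" from "-" to
            print substr(seq, from, to - from + 1)
        }
    ' "$1"
}
```

For example, `extract_with_flanks genome.fa chr1 120500 128900` (made-up file name and coordinates) would print the region 120400-129000 of `chr1` as a single FASTA record.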
If LTR content is too low, then LAI is not accurate. You may plot the regional LAI values in the *.LAI file to see whether their distribution is uneven. Using long reads does not guarantee assembly quality, which also depends on many other factors.
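Before plotting, a quick way to spot uneven regions is to list the windows whose LAI falls below some cutoff. The sketch below assumes the *.LAI file is tab-separated with one header line and the columns `Chr From To Intact Total raw_LAI LAI`, plus a `whole_genome` summary row; please verify that layout against your own file. The function name `low_lai_windows` and the cutoff argument are hypothetical.

```shell
# Hypothetical helper: list windows of a *.LAI file whose LAI is below a cutoff.
# Assumed layout (check your own file): tab-separated, one header line, columns
#   Chr  From  To  Intact  Total  raw_LAI  LAI
# with a "whole_genome" summary row followed by per-window rows.
low_lai_windows() {
    # $1 = *.LAI file, $2 = LAI cutoff
    awk -F'\t' -v cutoff="$2" '
        NR == 1 || $1 == "whole_genome" { next }   # skip header and summary row
        $7 + 0 < cutoff + 0 { print $1 "\t" $2 "\t" $3 "\t" $7 }
    ' "$1"
}
```

Calling `low_lai_windows genome.fa.out.LAI 5` (made-up file name and cutoff) would print the chromosome, window start, window end, and LAI of every window scoring below 5.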
Shujun
from ltr_retriever.
@lurebgi I am curious how the 4% LTR content in tilapia was estimated.
from ltr_retriever.
@lurebgi Repbase is a database of known TEs. LTR element sequences vary widely between species, so using other species' LTR sequences to identify tilapia LTR elements will likely give an underestimate. RepeatModeler is a general method for TE identification. It makes some attempt to classify TEs, but in our experience the classification is not accurate. RepeatModeler can work as a supplement after some good identifications, but Repbase is not a good approach for LTR finding.
from ltr_retriever.
@lurebgi Thanks for sharing the paper. I read the methods section. TE annotations were based on RepeatModeler or RepeatScout, so this is somewhat circular. Since both methods are copy-number based, low-copy-number TEs will be missed. You may try to figure out which new elements are annotated by LTR_retriever. I'll be happy to see how it works or fails.
from ltr_retriever.