invertebrate support about ltr_retriever HOT 12 CLOSED

oushujun avatar oushujun commented on May 20, 2024
invertebrate support

from ltr_retriever.

Comments (12)

oushujun avatar oushujun commented on May 20, 2024

Hi Luohao,

I tried LTR_retriever on fruit fly, mouse, microbats and megabats, and human, and it worked much as it does in plants, although most of these species have much lower LTR content in their genomes. For an accurate evaluation, LAI requires the genome to contain at least 5% total LTR sequence and 0.1% intact LTR sequence, so you may need to check these two values.
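As a rough illustration of the two thresholds above, here is a minimal sketch that turns base-pair counts into percentages and checks them against the 5% and 0.1% cut-offs. The function name `lai_prereq_check` and the example numbers are hypothetical and are not part of LTR_retriever; the base-pair counts would have to come from your own annotation summary.

```shell
# Hypothetical helper (not part of LTR_retriever): check the two LAI
# prerequisites quoted above.
# Arguments: genome size, total LTR bp, intact LTR bp (all in base pairs).
lai_prereq_check() {
    awk -v g="$1" -v total="$2" -v intact="$3" 'BEGIN {
        if (g <= 0) { print "error: genome size must be positive"; exit 1 }
        total_pct  = 100 * total / g     # total LTR content in percent
        intact_pct = 100 * intact / g    # intact LTR content in percent
        ok = (total_pct >= 5 && intact_pct >= 0.1)
        printf "total_LTR=%.2f%% intact_LTR=%.2f%% LAI_applicable=%s\n",
               total_pct, intact_pct, (ok ? "yes" : "no")
    }'
}

# Made-up numbers for illustration: 1 Gb genome, 8 Mb total LTR, 0.5 Mb intact LTR.
lai_prereq_check 1000000000 8000000 500000
```

With these made-up numbers both percentages fall below the cut-offs, so the sketch reports that LAI would not be applicable.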

For classifying LTR superfamilies, LTR_retriever uses models trained on rice LTR classifications, so the same models may not be applicable to invertebrate genomes. However, classification information is not the major factor in identifying LTR elements. You may need to classify the identified LTR elements yourself.

Best,
Shujun

lurebgi avatar lurebgi commented on May 20, 2024

Hi Shujun,

Thanks for your email. In amphioxus the LTR content seems to be less than 1%, so that might be the reason.

On another note, LTR_retriever annotated 25.27% LTRs (according to the .tbl file) in a tilapia genome, while the actual proportion should be about 4%. I wonder if it has a lot of false positives? The LAI score is also unexpectedly low for a PacBio assembly: 3.01. Below is the script I used. Would you have any suggestions for reducing false positives?

```
/apps/genometools/1.5.9/bin/gt suffixerator -db $genome -indexname gt_index/$g -suf -lcp -des -ssp -sds -dna
/apps/genometools/1.5.9/bin/gt ltrharvest -index gt_index/$g -maxlenltr 7000 -maxtsd 6 -mintsd 4 -seqids yes -vic 10 -similar 90 -seed 20 > $g.harvest.scn
/apps/genometools/1.5.9/bin/gt ltrharvest -index gt_index/$g -maxlenltr 7000 -maxtsd 6 -mintsd 4 -seqids yes -vic 10 -similar 90 -seed 20 -motif TGCA -motifmis 1 > $g.harvest.motif.scn

/scratch/luohao/software/LTR_Finder/source/ltr_finder -D 15000 -d 1000 -L 7000 -l 100 -p 20 -C -M 0.9 $genome > $g.finder.scn

perl /scratch/luohao/software/mgescan-1.1/mgescan/ltr/find_ltr.pl -seq=$genome -min-ltr=100 -max-ltr=7000 -min_iden=90

/scratch/luohao/software/LTR_retriever-2.0/LTR_retriever -genome $g -nonTGCA $g.harvest.scn -inharvest $g.harvest.motif.scn -infinder $g.finder.scn -threads=20
```

Thanks!

wangzhennan14 avatar wangzhennan14 commented on May 20, 2024

Hi Luohao,
Where did you download mgescan-1.1? Can you give me the URL? I have downloaded three MGEScan packages, but none of them worked.

Thank you very much!
Zhennan

lurebgi avatar lurebgi commented on May 20, 2024

oushujun avatar oushujun commented on May 20, 2024

@wangzhennan14

For MGEScan_LTR please refer to #8 and #19. Let me know if you need further help, thanks!

Shujun

oushujun avatar oushujun commented on May 20, 2024

@lurebgi

Sorry for the delayed response (somehow I thought I had replied).

Your commands look good, but I have no idea about the total LTR content of amphioxus. If you suspect a high proportion of false positives, you may manually curate a couple of them to verify (try NCBI BLAST and see what they are). If you do find some, please post example sequences here with 100 bp of flanking sequence on both the upstream and downstream sides, which would help with debugging.
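One way to get the 100 bp flanks is to widen each candidate's coordinates before extracting the sequence. The sketch below is only an illustration: `flank_region` is a hypothetical helper, it assumes 1-based inclusive coordinates, and the sequence name, positions, and sequence length are made up.

```shell
# Hypothetical helper: widen a 1-based, inclusive interval by a flank on both
# sides, clip it to the sequence boundaries, and print it as name:start-end.
# Arguments: seq_name start end seq_length [flank, default 100]
flank_region() {
    seq_name=$1; start=$2; end=$3; seq_len=$4; flank=${5:-100}
    new_start=$(( start - flank ))
    new_end=$(( end + flank ))
    # Do not run off either end of the sequence.
    if [ "$new_start" -lt 1 ]; then new_start=1; fi
    if [ "$new_end" -gt "$seq_len" ]; then new_end=$seq_len; fi
    printf '%s:%s-%s\n' "$seq_name" "$new_start" "$new_end"
}

# Made-up candidate at chr1:5000-12000 on a 20 Mb sequence.
flank_region chr1 5000 12000 20000000
```

The printed string is in the region form accepted by `samtools faidx`, so, assuming samtools is installed and the genome is indexed, something like `samtools faidx genome.fa "$(flank_region chr1 5000 12000 20000000)"` would print the extended sequence.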

If the LTR content is too low, LAI is not accurate. You may plot the regional LAI values in the *.LAI file to see whether the distribution is uneven. Using long reads does not guarantee assembly quality, which also depends on many other factors.
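Before plotting, a quick per-sequence summary can already hint at an uneven distribution. The sketch below rests on assumptions that should be checked against your own output: a tab-separated *.LAI file with one header line, the sequence name in column 1, the regional LAI value in column 7, and a genome-wide row labelled `whole_genome`. The function name `summarise_regional_lai` is hypothetical.

```shell
# Summarise regional LAI per sequence: window count, min, mean, max.
# ASSUMPTIONS (verify against your own file): tab-separated input with one
# header line, sequence name in column 1, LAI value in column lai_col
# (default 7), and a genome-wide row labelled "whole_genome".
summarise_regional_lai() {
    lai_file=$1
    lai_col=${2:-7}
    awk -F '\t' -v col="$lai_col" '
        NR == 1 || $1 == "whole_genome" { next }   # skip header and genome-wide row
        {
            v = $col + 0
            n[$1]++
            sum[$1] += v
            if (n[$1] == 1 || v < min[$1]) min[$1] = v
            if (n[$1] == 1 || v > max[$1]) max[$1] = v
        }
        END {
            for (s in n)
                printf "%s\t%d\t%.2f\t%.2f\t%.2f\n", s, n[s], min[s], sum[s] / n[s], max[s]
        }
    ' "$lai_file" | sort
}

# Usage (file name is a placeholder):
#   summarise_regional_lai my_genome.LAI
```

A sequence whose minimum is far below its mean, or whose mean sits well below that of the other sequences, would be a natural place to start looking.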

Shujun

lurebgi avatar lurebgi commented on May 20, 2024

oushujun avatar oushujun commented on May 20, 2024

@lurebgi I am curious: how was the 4% LTR content in tilapia estimated?

lurebgi avatar lurebgi commented on May 20, 2024

oushujun avatar oushujun commented on May 20, 2024

@lurebgi Repbase is a database of known TEs. LTR element sequences vary widely between species, so using other species' LTR sequences to identify tilapia LTR elements will likely give an underestimate. RepeatModeler is a general method for TE identification. It makes some attempt to classify TEs, but in our experience the classification is not accurate. RepeatModeler can work as a supplement after some good identifications, but Repbase is not a good approach for LTR finding.

lurebgi avatar lurebgi commented on May 20, 2024

oushujun avatar oushujun commented on May 20, 2024

@lurebgi Thanks for sharing the paper. I read the methods section. The TE annotations were based on RepeatModeler or RepeatScout, so this is somewhat circular. Since both methods are copy-number based, low-copy-number TEs will be missed. You may try to figure out which new elements are annotated by LTR_retriever. I'll be happy to see how it works or fails.
