Comments (10)

liu3zhenlab commented on May 20, 2024

(EDTA) v1.5
Successfully finished a maize genome:
TElib: 55Mb
Thanks Shujun.

from edta.

QiushiLi commented on May 20, 2024

Thanks for developing EDTA, and this is for your reference.

Fish genome ~480 Mb, EDTA.pl (commit 82b16f6) job finished in 7h.
Linux version 3.10.0-957.27.2.el7.x86_64 (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36)), run with -t 24

Step raw_fa size
TIR 32.3 Mb
MITE 20 Kb
LTR 988 Kb
Helitron 1.7 Mb
TElib 30.4 Mb

Cheers,
Qiushi

philippbayer commented on May 20, 2024

Hi

Finally, my rerun without splitting EDTA_raw.pl into subsets finished too :)

It's an amphidiploid plant genome, about 1GB in size, run with 15 threads, commit 757a96d (the one where MITEHunter is turned off)

It ran for 242.8 hours (10.1 days) if run sequentially (so no splitting up of EDTA_raw.pl steps)

Here's the whole log so you can see which steps took how long:

Wed Sep  4 21:07:28 AWST 2019   EDTA_raw: Check files and dependencies, prepare working directories.

Wed Sep  4 21:07:28 AWST 2019   Start to find LTR candidates.

Wed Sep  4 21:07:28 AWST 2019   Identify LTR retrotransposon candidates from scratch.

Tue Sep 10 00:34:18 AWST 2019   Finish finding LTR candidates.

Tue Sep 10 00:34:18 AWST 2019   Start to find TIR candidates.

Tue Sep 10 00:34:18 AWST 2019   Identify TIR candidates from scratch.

Species: others
rm: cannot remove `./TIR-Learner/*': No such file or directory
Finish finding TIR candidates.

Fri Sep 13 09:13:15 AWST 2019   Start to find MITE candidates.

Fri Sep 13 09:13:15 AWST 2019   Identify MITE candidates from scratch.

Fri Sep 13 09:13:15 AWST 2019   Warning: Because MITE-Hunter is too slow and only contribute limited new TIR candidates, it is taken down temporary until a better solution is found.

Fri Sep 13 09:13:15 AWST 2019   Finish finding MITE candidates.

Fri Sep 13 09:13:15 AWST 2019   Start to find Helitron candidates.

Fri Sep 13 09:13:15 AWST 2019   Identify Helitron candidates from scratch.

Fri Sep 13 23:32:36 AWST 2019   Finish finding Helitron candidates.

Fri Sep 13 23:32:36 AWST 2019   Execution of EDTA_raw.pl is finished!
Sun Sep 15 07:17:49 AWST 2019   EDTA basic and advcanced filters finished.

Sun Sep 15 07:17:49 AWST 2019   Perform EDTA final steps to generate a non-redundant comprehensive TE library:

                                Skip the RepeatModeler step (-sensitive 0).
Run EDTA.pl -step final -sensitive 1 if you want to use RepeatModeler.

Sun Sep 15 14:01:06 AWST 2019   EDTA final stage finished! Check out the final EDTA TE library: ragoo.fasta.EDTA.TElib.fa
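
For readers who want the per-step wall-clock times from the log above, here is a minimal Python sketch that subtracts consecutive timestamps. The event labels and the `LOG_EVENTS`, `parse_stamp` and `step_hours` names are made up for illustration. The "AWST" zone label is dropped (every entry shares it), and the end of the TIR search is taken from the timestamp of the next log entry because the "Finish finding TIR candidates." line carries none. Gaps between timestamps may differ from a total that was measured some other way.

```python
from datetime import datetime

# Timestamps copied from the log above, with the "AWST" label removed.
# Each entry marks the END of the named step; the first entry is the start.
LOG_EVENTS = [
    ("start",            "Wed Sep  4 21:07:28 2019"),
    ("LTR search",       "Tue Sep 10 00:34:18 2019"),
    ("TIR search",       "Fri Sep 13 09:13:15 2019"),  # taken from the next log entry
    ("Helitron search",  "Fri Sep 13 23:32:36 2019"),
    ("Filters",          "Sun Sep 15 07:17:49 2019"),
    ("Final stage",      "Sun Sep 15 14:01:06 2019"),
]


def parse_stamp(stamp):
    """Parse a `date`-style timestamp whose time-zone label was removed."""
    # Collapse the double space that `date` puts before single-digit days.
    return datetime.strptime(" ".join(stamp.split()), "%a %b %d %H:%M:%S %Y")


def step_hours(events):
    """Return (label, hours elapsed since the previous event) pairs."""
    times = [parse_stamp(stamp) for _, stamp in events]
    return [
        (events[i][0], (times[i] - times[i - 1]).total_seconds() / 3600)
        for i in range(1, len(events))
    ]


for label, hours in step_hours(LOG_EVENTS):
    print(f"{label:<16s} {hours:7.1f} h")
```

The skipped MITE step does not appear because its start and finish timestamps are identical.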

Sizes:

Step raw_fa size
TIR 55M
MITE 55M (copy of TIR due to no MITEHunter)
LTR 12M
Helitron 23M
TElib 46M

The final TElib has 22,921 sequences.

Running a final RepeatMasker run with the final TElib gives me about 75% of the genome being repeats, which makes me very happy :) I should check how many of these repeats are known within Dfam and Repbase. My guess is not that many, as a previous run with an older Repbase version gave me a much lower count of repeats.

MUCH LATER EDIT:

I've now rerun the final stage with -sensitive 1, which turns on RepeatModeler, which takes forever. With the same genome as above:

Mon Sep 30 17:04:54 AWST 2019   Perform EDTA final steps to generate a non-redundant comprehensive TE library:

                                Use RepeatModeler to identify any remaining TEs that are missed by structure-based methods.

Sat Oct  5 14:51:32 AWST 2019   EDTA final stage finished! Check out the final EDTA TE library: ragoo.fasta.EDTA.TElib.fa 

So yeah, adding this step to my pipeline added 5 days of runtime! What this did is add 4,186 repeats of type 'unknown' to my TElib.fa, but these are all tiny (the total filesize increased from 46MB to 47MB). @oushujun is this a problem on my end? Should RepeatModeler have assigned classes to those repeats?

elsemikk commented on May 20, 2024

Thank you so much for making this awesome tool! It makes TE annotation so much easier and I will for sure use it for my next genomes.

I tried it on a bird (~1.2Gb) with -sensitive 0 and a custom protein library and got this result:

Step              Threads  Real time  User time  Number
Raw Helitron      8        605m16s    521m34s    146
Raw TIR           8        1425m38s   8438m38s   5108
Raw LTR           8        49.51s     81m16s     157
Rest of pipeline  23       79m27s     1334m50s   5278

(It probably would have been faster if I had given more threads to the TIR search instead of splitting equally between the three searches!)
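
As a rough check on that thread-allocation point, the ratio of user CPU time to real (wall-clock) time approximates how many cores a step kept busy on average. The sketch below computes it for the Helitron and TIR rows of the table. The `to_seconds` and `cpu_to_wall` helpers and the `RUNS` dictionary are made-up names, and the ratio ignores system time, so treat it as an estimate only.

```python
import re


def to_seconds(duration):
    """Convert a `time`-style string such as '1425m38s' or '49.51s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def cpu_to_wall(real, user, threads):
    """Return (user/real ratio, that ratio divided by the thread count)."""
    ratio = to_seconds(user) / to_seconds(real)
    return ratio, ratio / threads


# (real time, user time, threads) copied from the table above.
RUNS = {
    "Raw Helitron": ("605m16s", "521m34s", 8),
    "Raw TIR": ("1425m38s", "8438m38s", 8),
}

for step, (real, user, threads) in RUNS.items():
    ratio, per_thread = cpu_to_wall(real, user, threads)
    print(f"{step:<13s} user/real = {ratio:.2f} ({per_thread:.0%} of {threads} threads)")
```

On these numbers the TIR search kept roughly six of its eight threads busy, while the Helitron search used less than one core's worth of user time, which is consistent with the suggestion that extra threads would have been better spent on the TIR search.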

Thank you,
Else

hkchi commented on May 20, 2024

Thank you for the awesome tool. Successfully finished three plant genomes (~3-3.5 Gb):

Step              Time      Size
Raw LTR           114-128h  542-695Mb
Raw TIR           56-99h    743-790kb
Raw Helitron      36-60h    186-196Mb
Rest of pipeline  52-53h    56-66Mb

Raw steps were run with EDTA v1.7.8 from bioconda, the rest with EDTA v1.8.5. All were run with 48 threads (Intel Platinum 8160F Skylake @ 2.1 GHz, RAM=192000M), with --sensitive 0 in the final step and CDS provided.

Cheers,
Kaichi

oushujun commented on May 20, 2024

Thank you all for using and promoting EDTA since its early stages. By now the program has gained some popularity, and work using EDTA has been appearing online, so this thread can be retired. Below are some studies citing EDTA:

Identification of key tissue-specific, biological processes by integrating enhancer information in maize gene regulatory networks

Homologous chromosomes in asexual rotifer Adineta vaga suggest automixis

The transposable elements of the Drosophila serrata reference panel

Chromosome-level Genome Assembly of a Regenerable Maize Inbred Line A188

The genome of the Xingu scale-backed antbird (Willisornis vidua nigrigula) reveals lineage-specific adaptations

Improved Brassica oleracea JZS assembly reveals significant changing of LTR-RT dynamics in different morphotypes

Wenwen012345 commented on May 20, 2024

Dear all, Shujun

Thank you for the awesome tool. Successfully finished one plant (Rhododendron) genome (~850M):

Step raw_fa size Percent
TIR 143M 15.7%
LTR 660M 43.2%
Helitron 15M 1.6%
TElib 16.9M 60.96% (Total)

command
EDTA.pl --genome genome.fa --anno 1 --threads 40 --step all --species others

The per-step timings are not very clear (the run was interrupted several times in the middle), and the total time was about 20h.

Sincerely,
Wen

sunnycqcn commented on May 20, 2024

I recently finished one plant genome of about 1.8 Gb.
But I am a little confused by the RepeatMasker summary statistics. It only reports LTR and DNA element content. I cannot see the content of any other element types, such as SINEs.

file name: STH.fa
sequences: 3037
total length: 1601564480 bp (1584403095 bp excl N/X-runs)
GC level: 41.61 %
bases masked: 1309747987 bp ( 81.78 %)

           number of      length   percentage
           elements*    occupied  of sequence

SINEs: 0 0 bp 0.00 %
ALUs 0 0 bp 0.00 %
MIRs 0 0 bp 0.00 %

LINEs: 0 0 bp 0.00 %
LINE1 0 0 bp 0.00 %
LINE2 0 0 bp 0.00 %
L3/CR1 0 0 bp 0.00 %

LTR elements: 1584413 1074016988 bp 67.06 %
ERVL 0 0 bp 0.00 %
ERVL-MaLRs 0 0 bp 0.00 %
ERV_classI 0 0 bp 0.00 %
ERV_classII 0 0 bp 0.00 %

DNA elements: 832180 278509658 bp 17.39 %
hAT-Charlie 0 0 bp 0.00 %
TcMar-Tigger 0 0 bp 0.00 %

Unclassified: 78986 11993249 bp 0.75 %

Total interspersed repeats:1364519895 bp 85.20 %

Small RNA: 0 0 bp 0.00 %

Satellites: 0 0 bp 0.00 %
Simple repeats: 0 0 bp 0.00 %
Low complexity: 0 0 bp 0.00 %

* most repeats fragmented by insertions or deletions
  have been counted as one element

The query species was assumed to be homo
RepeatMasker Combined Database: Dfam_3.0, RepBase-20170127

run with rmblastn version 2.9.0+
The query was compared to classified sequences in "STH.fa.EDTA.TElib.fa"

Wed May 6 11:50:19 EDT 2020 Dependency checking:
All passed!
Wed May 6 11:50:45 EDT 2020 Obtain raw TE libraries using various structure-based programs:
Thu May 7 03:10:34 EDT 2020 Obtain raw TE libraries finished.

Thu May 7 03:10:34 EDT 2020 Perform EDTA advcance filtering for raw TE candidates and generate the stage 1 library:

Thu May 7 06:13:03 EDT 2020 EDTA advcance filtering finished.

Thu May 7 06:13:03 EDT 2020 Perform EDTA final steps to generate a non-redundant comprehensive TE library:

			Use RepeatModeler to identify any remaining TEs that are missed by structure-based methods.

			RepeatModeler is finished, but no consensi.fa.classified files found.

			Skipping the CDS cleaning step (-cds [File]) since no CDS file is provided.

Sun May 10 02:08:29 EDT 2020 EDTA final stage finished! Check out the final EDTA TE library: STH.fa.EDTA.TElib.fa
Sun May 10 02:08:29 EDT 2020 Perform post-EDTA analysis for whole-genome annotation:

Sun May 10 09:50:57 EDT 2020 TE annotation using the EDTA library has finished! Check out:
Whole-genome TE annotation (total TE: 79.16%): STH.fa.EDTA.TEanno.gff
Low-threshold TE masking for MAKER gene annotation (masked: 48.06%): STH.fa.MAKER.masked

Sun May 10 09:51:05 EDT 2020 Evaluate the level of inconsistency for whole-genome TE annotation (slow step):

Mon May 18 06:27:53 EDT 2020 Evaluation of TE annotation finished! Check out these files:

			Overall: STH.fa.EDTA.TE.fa.stat.all.sum
			Nested: STH.fa.EDTA.TE.fa.stat.nested.sum
			Non-nested: STH.fa.EDTA.TE.fa.stat.redun.sum

Confusion matrix of STH.fa.EDTA.TE.fa.stat for the all category
DNA/D DNA/DTA DNA/DTC DNA/DTH DNA/DTM DNA/DTT DNA/Heeaning DNA/HelitrTM DNA/Helitron DNAa Gypsy L LT LT/Copia LTR/ LTR/Cocovering LTR/Copia LTR/Copig LTR/Coping LTR/G LTR/Gypnknown LTR/Gypof LTR/Gypsy LTR/ering LTR/u LTR/uly LTR/unTR/unknown LTR/ung LTR/unking LTR/unkn LTR/unknowR/unknown LTR/unknown LTR/unknownring LTTR/Gypsy LTy Lof Ly MITE/DTA MITE/DTC MITE/DTH MITE/DTM MITE/DTT covering Misclas_rate
DNA/D 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
DNA/DTA 0 115616 10068 2813 17086 57 0 0 18704 0 0 0 0 0 0 0 10275 0 0 0 0 0 6465 0 0 0 0 0 0 0 0 4218 0 0 0 0 0 2144 159 44 2403 2 0 0.3917
DNA/DTC 0 10936 61128 1089 7734 413 0 0 9215 0 0 0 0 0 0 0 3738 0 0 0 0 0 1445 0 0 0 0 0 0 0 0 2208 0 0 0 0 0 172 937 728 2618 16 0 0.4029
DNA/DTH 0 2528 1434 29335 3219 11 0 0 5818 0 0 0 0 0 0 0 1661 0 0 0 0 0 561 0 0 0 0 0 0 0 0 1438 0 0 0 0 0 275 1 699 471 0 0 0.3818
DNA/DTM 0 12351 6859 3702 348436 121 0 0 18988 0 0 0 0 0 0 0 25783 0 0 0 0 0 9673 0 0 0 0 0 0 0 0 6997 0 0 0 0 0 613 77 855 14894 59 0 0.2247
DNA/DTT 0 110 518 25 222 4388 0 0 202 0 0 0 0 0 0 0 138 0 0 0 0 0 3 0 0 0 0 0 0 0 0 709 0 0 0 0 0 0 0 2 856 4 0 0.3886
DNA/Heeaning 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
DNA/HelitrTM 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
DNA/Helitron 0 22063 11264 8713 96179 137 0 0 830187 0 0 0 0 0 0 0 154178 0 0 0 0 0 31254 0 0 0 0 0 0 0 0 20428 0 0 0 0 0 86 99 707 104423 93 0 0.3513
DNAa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
Gypsy 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
L 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1.0000
LT 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LT/Copia 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/Cocovering 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/Copia 0 9424 17496 2292 67110 66 0 0 29216 0 0 0 0 0 0 0 972675 0 0 0 0 0 40664 0 0 0 0 0 0 0 0 128559 0 0 0 0 0 1222 8 317 955 4 0 0.2341
LTR/Copig 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/Coping 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/Gypnknown 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/Gypof 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/Gypsy 0 10301 1774 622 26632 6 0 0 14217 0 0 0 0 0 0 0 49642 0 0 0 0 0 2675171 0 0 0 0 0 0 1 0 1111152 0 0 0 0 0 155 243 100 1210 1 0 0.3125
LTR/ering 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/u 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/uly 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/unTR/unknown 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/ung 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/unking 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/unkn 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/unknowR/unknown 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTR/unknown 0 6511 14195 3411 34422 135 0 0 28722 0 0 0 0 0 0 0 148343 0 0 0 0 0 799980 0 0 0 0 0 0 0 0 2486668 0 0 0 0 0 456 16 316 940 136 0 0.2944
LTR/unknownring 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTTR/Gypsy 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
LTy 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0000
Lof 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1.0000
Ly 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1.0000
MITE/DTA 0 2087 149 431 682 0 0 0 276 0 0 0 0 0 0 0 741 0 0 0 0 0 80 0 0 0 0 0 0 0 0 349 0 0 0 0 0 14944 0 14 503 0 0 0.2622
MITE/DTC 0 75 1473 0 79 0 0 0 105 0 0 0 0 0 0 0 23 0 0 0 0 0 103 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 6397 0 3 0 0 0.2264
MITE/DTH 0 41 226 930 666 1 0 0 508 0 0 0 0 0 0 0 129 0 0 0 0 0 30 0 0 0 0 0 0 0 0 79 0 0 0 0 0 6 0 5830 278 0 0 0.3317
MITE/DTM 0 998 1429 212 3682 232 0 0 534 0 0 0 0 0 0 0 598 0 0 0 0 0 400 0 0 0 0 0 0 0 0 854 0 0 0 0 0 362 2 511 33915 0 0 0.2244
MITE/DTT 0 1 0 0 43 73 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 0 0 0 0 0 1 0 0 0 327 0 0.2937
covering 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1.0000
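
The Misclas_rate column in the matrix above appears to be the off-diagonal share of each row, that is, 1 minus the count on the row's own label divided by the row total. I checked this against the DNA/DTA and MITE/DTT rows only and have not confirmed it against EDTA's source, so treat the Python sketch below as an assumption. The `misclassification_rate` helper and the dictionary names are made up, and zero-valued columns are omitted for brevity.

```python
def misclassification_rate(row, label):
    """Return the fraction of a row's counts that fall outside `label`.

    `row` maps column labels to counts for one row of the confusion matrix.
    """
    total = sum(row.values())
    if total == 0:
        raise ValueError("row has no counts")
    return 1 - row[label] / total


# Non-zero entries of the DNA/DTA row, copied from the matrix above.
DNA_DTA_ROW = {
    "DNA/DTA": 115616, "DNA/DTC": 10068, "DNA/DTH": 2813, "DNA/DTM": 17086,
    "DNA/DTT": 57, "DNA/Helitron": 18704, "LTR/Copia": 10275,
    "LTR/Gypsy": 6465, "LTR/unknown": 4218, "MITE/DTA": 2144,
    "MITE/DTC": 159, "MITE/DTH": 44, "MITE/DTM": 2403, "MITE/DTT": 2,
}

# Non-zero entries of the MITE/DTT row, copied from the matrix above.
MITE_DTT_ROW = {
    "DNA/DTA": 1, "DNA/DTM": 43, "DNA/DTT": 73, "DNA/Helitron": 2,
    "LTR/unknown": 16, "MITE/DTA": 1, "MITE/DTT": 327,
}

print(f"DNA/DTA  {misclassification_rate(DNA_DTA_ROW, 'DNA/DTA'):.4f}")
print(f"MITE/DTT {misclassification_rate(MITE_DTT_ROW, 'MITE/DTT'):.4f}")
```

Both values agree with the last column of the corresponding rows (0.3917 and 0.2937). Under this reading, a DNA/DTA element annotated as MITE/DTA counts as misclassified.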

yangyi09 commented on May 20, 2024

Hi,

Thanks for developing such cool software. I have successfully annotated the TEs of an insect genome.

I would like to know whether the library constructed by EDTA can be used as the consensus sequences for calculating the age of individual TEs.

Thank you,
Yang yi

oushujun commented on May 20, 2024
