Comments (5)
At least in human, (TTAGGG)n only occurs at the 3'-end of a chromosome. I assume Chr1_chal_sis is very long and you are only showing part of the sequence. If so, seqtk telo wouldn't consider (TTATTGGG)n part of a telomere because it is on the 5'-end.
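To make the orientation point concrete, here is a minimal Python sketch (not part of seqtk) that reverse-complements a motif. The helper name `revcomp` is hypothetical. It only illustrates that TTATTGGG read on one strand is CCCAATAA on the other, which is why the end at which a repeat appears matters.

```python
# Reverse-complement a DNA motif to see how it reads on the opposite strand.
# `revcomp` is an illustrative helper, not a seqtk function.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


print(revcomp("TTATTGGG"))  # CCCAATAA
print(revcomp("TTAGGG"))    # CCCTAA
```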
from seqtk.
Just checked the paper. They were using short reads. The inverted motif could be an assembly artifact. It would be more convincing if we had HiFi reads.
from seqtk.
We do have a HiFi assembly made with hifiasm using PacBio long reads. And, as you say, the test text was a heavily truncated view of the chromosome. Thanks for pointing out the folly of using it as a reliable test.
I found the 8-mer in the short-read paper, but confirmed its existence in our HiFi assembly using grep for the 8-mer.
I'll go back to that and get additional info.
Thank you.
from seqtk.
> its existence in our HiFi assembly using grep for the 8mer.
Where are TTATTGGG 8-mers? Are they located on the 3'-ends of contigs, or following (CCCAATAA)n on the 5'-end like the example you have shown?
from seqtk.
At the bottom, or very near the bottom.
I made an 8 Mbp test file named tst2, with CCCAATAA at the top and TTATTGGG at the bottom, and a reverse-complemented version, tst2_rc, to flip them. These were the results:
$ seqtk telo -m CCCAATAA tst2
Chr1r_chal_sis 0 960 8403000
960 8403000
$ seqtk telo -m CCCAATAA tst2_rc
Chr1r_chal_sis 8402040 8403000 8403000
960 8403000
$ seqtk telo -m TTATTGGG tst2
0 8403000
$ seqtk telo -m TTATTGGG tst2_rc
0 8403000
$ grep TTATTGGG tst2_rc -n
70019:GGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTG
70020:GGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTG
70021:GGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTG
70022:GGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTG
70023:GGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTG
70024:GGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTG
70025:GGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTG
70026:GGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTGGGTTATTG
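I used seqtk to fold the file at 120 characters per line, so the first TTATTGGG hit (grep line 70019, i.e. the 70018th sequence line after the header) starts about 8,402,040 bases in, which matches the start coordinate reported by seqtk telo above.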
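For reference, the conversion from a grep line number to a base offset can be sketched as below. This assumes a single-record FASTA with exactly one header line and every sequence line folded to the same width (120 here). The function name `line_to_offset` is hypothetical.

```python
def line_to_offset(grep_line: int, width: int = 120, header_lines: int = 1) -> int:
    """Return the 0-based offset of the first base on a 1-based file line.

    Assumes one FASTA record, `header_lines` header lines before the
    sequence, and fixed-width sequence lines.
    """
    seq_line_index = grep_line - header_lines - 1  # 0-based among sequence lines
    if seq_line_index < 0:
        raise ValueError("line number falls inside the header")
    return seq_line_index * width


line_start = line_to_offset(70019)
# Each matching line begins with "GG", so the first full motif is 2 bases in.
motif_col = "GGTTATTGGGTTATTGGG".find("TTATTGGG")

print(line_start)              # 8402040
print(line_start + motif_col)  # 8402042
```

With 70,026 lines in total (one header plus 70,025 sequence lines of 120 bases), the last 8 lines cover offsets 8,402,040 to 8,403,000, which agrees with the 960-base interval in the `seqtk telo -m CCCAATAA tst2_rc` output.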
I'm sure you have more important things to do; having tried seqtk telo on this non-standard case, I just thought I would pass it along. I can always use a squishy regex with grep to find these, as I have been doing with the vertebrate 6-mer.
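As an illustration of the regex approach, here is a minimal Python sketch that reports runs of tandem motif copies in an in-memory sequence string. The function name `tandem_runs` and the `min_copies` threshold are assumptions for the example, and this is not how seqtk telo scores telomeres. The motif is matched exactly; a looser ("squishy") pattern could be substituted for the escaped motif.

```python
import re


def tandem_runs(seq: str, motif: str, min_copies: int = 4):
    """Return (start, end) 0-based half-open intervals of tandem motif runs.

    A run is `min_copies` or more back-to-back exact copies of `motif`.
    """
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif), min_copies))
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


# Toy sequence: 4 filler bases, five copies of the 8-mer, 2 filler bases.
toy = "ACGT" + "TTATTGGG" * 5 + "AC"
print(tandem_runs(toy, "TTATTGGG"))  # [(4, 44)]
```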
As an aside, I do hope that telomere awareness can be used in assemblers and scaffolders as an option. We run a telomere script after every assembly or use of yahs and show the records and their telos as TOP, TOP_NEAR, BOTTOM, BOTTOM_NEAR or MIDDLE.
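The script itself is not shown here, so the following is only a sketch of how such labels might be assigned from a telomere interval. The label names come from the comment above. The function name `classify_telo`, the exact rules, and the 10 kb `near` window are assumptions.

```python
def classify_telo(start: int, end: int, seq_len: int, near: int = 10_000) -> str:
    """Label a 0-based half-open telomere interval by its position.

    The `near` window (10 kb by default) is an arbitrary illustrative choice.
    """
    if start == 0:
        return "TOP"
    if end == seq_len:
        return "BOTTOM"
    if start <= near:
        return "TOP_NEAR"
    if seq_len - end <= near:
        return "BOTTOM_NEAR"
    return "MIDDLE"


# Intervals taken from the seqtk telo output above.
print(classify_telo(0, 960, 8_403_000))                  # TOP
print(classify_telo(8_402_040, 8_403_000, 8_403_000))    # BOTTOM
```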
Thanks for taking the time.
from seqtk.