stsmall commented on August 29, 2024

I also noticed that contigs.reduced.fa has a double ">" in the header and that the final output is incomplete:
$ tail contigs.reduced.fa
>>flattened_line_165493:0-1609
CTGTGCCAGAAGTTATTCGAGGAACCAAGTTCTTTACCCTGTTTGAGAAAATGTAAAATTTTCCTCATTTTTGAGCAATGTAGCAAAAATTTTAAATGTGATATATTTTGATGGGAATGTGTTGCAATAAGTTTCTTTTCACTTAAATAGTGCTTTAAAGTAAAACAAGAATAAAAAATCAGAAAAAAAATTTTAAAAAATTTTCAACTCGAAAATTTCATAAGGGGTACCCCTTGCCATTTTTTTGAGAAATTTTGCAAAAAAATAAAATTGTTTTATTTTAACGCCAATGGGTTGCAATGAGTTTCTTTTCACTCTAATAATGCTTGAAATAAAAAAAAGCGTGAAAAATAATAAAAAAATCAAAAAATTCAAAAAAAAAAATGAAAATTTTCCTCATTTTTGCACCAACTTTCGCCTATAACTCGGTCGGTACCCAACCGATCGCCAATCTTTAACCTGTGGTCGATAGATGGCACCAATGGCTACATTTTCTTCTCGGACAGTCATGCCCTCAGATGTCTGTGCCAGAAGTTATTCGAGGAACCAAGTTCCTTACCCTGTTTGAGAAAATGTAAAATTTTCCTCAGTTTTGAGCAATGCAGCAAAAATTTTAAATGTGATATATTTTGATGGGAATTTGTTGCAATAAGTTTCTTTTCACTTAAATAGTGCTTTAAATTAAAAAAAGAATAAAAAATCAGAAAAAAATTTCAAAAAAATTTTCCACTCCAAAATTTCATAAGGCTTACCCCTTACGATTTTTTTAGGAAATTTTGCAAAAAAAAATAAAATTGTTTTATTTTAATCCCAATTGGTTGCAATGAGTTTCTTTTCACTCTAATAATGCTTGAAATAAAAAAGAGAGTGAAAAATAATAAAAAAATCAAAAATTTCAAAAGGTTTGAAAATTTCATGAGGTTATAGCCTTACTTTTTTCTTGGGAAATTTAGCAAAATTTTATGAATATACGTATTGGTCGTATTGACGTAAATTGACGAGTTGGTTTCCTGTCTGGGAACATCTGGAAACAGGGTCCTTATTGATCTCGTCATGTTCTACCAAAATGGAAACTTTTGTTAGTTCACGTGTTTCTTCCTAGGTGGCGCTTTGCCACATGTTCGTGTCCTAGCTAACTAGGATGCTCTGATCACCACAGTACTCCTCTCCTAGACCGGTCACCGCTGTTACCGATAGCGCAGAGGGATTATAACTCTGCGTTGTGATCGCAGTCGGTCGATATGTTATCGTCCTCCTTTTCTCCGGTCTCCGGTGATGCTGGACTGGACGTCGGCTCTTGGGTTCGAGATGGATACTCGGCCGATGGCGGGTTGTCCGTTGGTGTTTTTCGTCTTCGCGTCGGGTCGGGTGGATCGTCTGTTTTGTAGGCCGATTTCAGGCAATGTTGGCGGTTTCAGGCGGTTGTCCATTTATCAGAATCTTGAAAAACTTAAGGCTTCGCTGAAGTATCCAGTATGATCCTTGGTAGGGTGCTTCGGGCCAACGAGAAAAGTGGTCGATGATTGTCCGGCAGTAGCGGTTGCCGTTGCTGCCTGCAAAGGTCCGATGATGTCTTCGTTGACGTGAGAGAACTTTCGCTCATTGACAC
>%

from redundans.

lpryszcz commented on August 29, 2024

Hi, I suppose it's a wrongly formatted FASTA file. Can you send me the file?
Or at least run samtools faidx file.fasta or grep -n '^>' file.fasta and make sure all header lines are properly formatted?
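If samtools is not at hand, the same checks can be roughly approximated in Python. This is a minimal sketch, not part of redundans: the helper name `check_headers` is hypothetical, and it assumes the sequence ID is the first whitespace-delimited word after `>` (the usual FASTA convention). Like `grep -n '^>'`, it reports header line numbers, and it flags headers with an extra `>`, headers without an ID, and duplicated IDs.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_headers(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, header, problem) for suspicious FASTA headers.

    Hypothetical helper: assumes the sequence ID is the first
    whitespace-separated word after the leading '>'.
    """
    headers = []  # (line_number, header_line, seq_id)
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            headers.append((lineno, line, seq_id))

    counts = Counter(seq_id for _, _, seq_id in headers)
    problems = []
    for lineno, line, seq_id in headers:
        if line.startswith(">>"):
            problems.append((lineno, line, "extra '>' in header"))
        elif not seq_id:
            problems.append((lineno, line, "empty sequence ID"))
        if seq_id and counts[seq_id] > 1:
            problems.append((lineno, line, "duplicate ID " + seq_id))
    return problems


if __name__ == "__main__":
    example = [">ctg1\n", "ACGT\n", ">>ctg2\n", "ACGT\n", ">ctg1\n", "GG\n"]
    for lineno, header, problem in check_headers(example):
        print(lineno, header, problem)
    # 1 >ctg1 duplicate ID ctg1
    # 3 >>ctg2 extra '>' in header
    # 5 >ctg1 duplicate ID ctg1
```

For a real file, pass the open handle instead of the list, e.g. `check_headers(open("file.fasta"))`.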

stsmall commented on August 29, 2024

Hi,
You are right. samtools faidx complains that there are duplicate
sequence IDs. However, when I do 'grep -c ">" FOO.fa | wc -l | uniq' the
number of headers is the same as with just 'grep -c ">"'. So it seems to be
an issue with my FASTA files. I shared the FASTA file anyway, but tomorrow I will
test samtools 1.3 (I am running 1.2) and shorter header names.

https://www.dropbox.com/s/1seutpqnbiikxzn/Anfunestus_MW.redundans.in.fasta.gz?dl=0

thanks,
scott

Every gun that is made, every warship launched,
every rocket fired signifies, in the final sense,
a theft from those who hunger and are
not fed, those who are cold and are
not clothed. This world in arms is
not spending money alone. It is
spending the sweat of its
laborers, genius of its
scientists, the hopes
of its children.
--Dwight D. Eisenhower

scapella commented on August 29, 2024

Hi there,

Just to make your life easier when counting duplicated entries, you should
do something like "grep ">" INPUT.fa | sort -u | wc -l" vs "grep -c ">"
INPUT.fa". Without sorting, uniq will not remove duplicated headers that are not on adjacent lines.
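To see why sorting matters: `uniq` only collapses identical lines that are consecutive, while `sort -u` brings identical lines together before dropping repeats. A small Python illustration, where the functions `uniq` and `sort_u` are illustrative stand-ins for the shell tools, not real library calls:

```python
from itertools import groupby
from typing import List


def uniq(lines: List[str]) -> List[str]:
    """Mimic `uniq`: collapse only *adjacent* identical lines."""
    return [key for key, _ in groupby(lines)]


def sort_u(lines: List[str]) -> List[str]:
    """Mimic `sort -u`: sort, then keep one copy of every line."""
    return sorted(set(lines))


headers = [">ctg1", ">ctg2", ">ctg1", ">ctg3"]
print(len(headers))          # like grep -c ">"              -> 4
print(len(uniq(headers)))    # like grep ">" | uniq | wc -l  -> 4 (the duplicates are not adjacent)
print(len(sort_u(headers)))  # like grep ">" | sort -u | wc -l -> 3
```

If the last two counts differ from the first, the file contains duplicated headers.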

Hope it helps to clarify.

Cheers,

S

Salva

stsmall commented on August 29, 2024

Ah yes, of course, you are right. I always forget to sort before using
uniq; sort -u is a better choice.

lpryszcz commented on August 29, 2024

Thanks for the feedback, I'll try to add input checking and raise an exception for malformed FASTA entries. Stay tuned!

stsmall commented on August 29, 2024

Hi,
So I cleaned up the duplicates and samtools faidx completes without errors. When I run redundans.py, I still get the same IndexError as before:

Traceback (most recent call last):
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/redundans.py", line 403, in <module>
    main()
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/redundans.py", line 398, in main
    o.verbose, o.log)
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/redundans.py", line 263, in redundans
    limit = get_read_limit(reducedFname, readLimit, verbose, log)
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/redundans.py", line 99, in get_read_limit
    stats = fasta_stats(open(fasta))
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/fasta_stats.py", line 18, in fasta_stats
    faidx = FastaIndex(handle)
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/FastaIndex.py", line 37, in __init__
    self._generate_index()
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/FastaIndex.py", line 70, in _generate_index
    stats = self.get_stats(header, seq, offset)
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/FastaIndex.py", line 186, in get_stats
    linebases, linebytes = len(seq[0].strip()), len(seq[0])
IndexError: list index out of range

lpryszcz commented on August 29, 2024

Hi, I have reproduced your issue. You simply have empty sequences in your FASTA, as in the example below:

>NODE_243_length_56_cov_106_ID_485
GTTGTCCAGTTGGTTGTCCAGTTGGTTGTCCAGTTGGTTGTCCAGTTGGTTGTCCT
>NODE_244_length_56_cov_213_ID_487
>NODE_245_length_56_cov_149_ID_489
GGACAACCAACTGGACAACCAACTGGACAACCAACTGGACAACCAACAGGACAACC
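Why an empty record ends in `IndexError`: judging from the traceback, `get_stats` reads `seq[0]`, the first sequence line of a record. The sketch below assumes, without claiming this is how redundans' `FastaIndex` works internally, that `seq` is the list of lines collected between two headers. For NODE_244 that list is empty, so `seq[0]` fails. `group_records` and `line_lengths` are hypothetical names, and the explicit check shows one way to raise a descriptive error instead.

```python
from typing import Dict, List, Optional, Tuple


def group_records(lines: List[str]) -> Dict[str, List[str]]:
    """Collect the sequence lines that follow each header (hypothetical helper)."""
    records: Dict[str, List[str]] = {}
    header: Optional[str] = None
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return records


def line_lengths(header: str, seq: List[str]) -> Tuple[int, int]:
    """Return (bases, bytes) of the first sequence line, as in the traceback.

    Without the check below, seq[0] raises IndexError for an empty record.
    """
    if not seq:
        raise ValueError("empty sequence for record " + header)
    return len(seq[0].strip()), len(seq[0])


fasta = [
    ">NODE_243_length_56_cov_106_ID_485\n",
    "GTTGTCCAGTTGGTTGTCCAGTTGGTTGTCCAGTTGGTTGTCCAGTTGGTTGTCCT\n",
    ">NODE_244_length_56_cov_213_ID_487\n",
    ">NODE_245_length_56_cov_149_ID_489\n",
    "GGACAACCAACTGGACAACCAACTGGACAACCAACTGGACAACCAACAGGACAACC\n",
]
for name, seq in group_records(fasta).items():
    try:
        print(name, line_lengths(name, seq))
    except ValueError as err:
        print("skipping:", err)
```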

I have added a piece of code to handle it. Just pull the latest version:

git pull origin master

Nevertheless, you should remove them from your FASTA.
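One way to remove them is to filter out every record without bases before running redundans. A minimal sketch in plain Python, not part of redundans (`read_fasta` and `drop_empty` are hypothetical names). It keeps only records that have at least one non-blank sequence line and ignores any text before the first header:

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header_line, sequence_lines) for each FASTA record."""
    header: Optional[str] = None
    seq: List[str] = []
    for line in lines:
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif line.strip():  # skip blank lines; keep real sequence lines
            seq.append(line)
    if header is not None:
        yield header, seq


def drop_empty(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines of every record that has at least one sequence line."""
    for header, seq in read_fasta(lines):
        if seq:
            yield header
            yield from seq
```

Usage, with placeholder file names: `with open("in.fasta") as src, open("out.fasta", "w") as dst: dst.writelines(drop_empty(src))`. Streaming the records keeps memory use low even for large assemblies.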
