Comments (12)

chacalle commented on June 16, 2024

So NCBI and BioPython don't actually let you pull GenBank files via accession numbers. An issue was opened on the BioPython tracker to follow changes for this: biopython/biopython#926

from fauna.

pkundert commented on June 16, 2024

Any updates on the change-over from GI to accession numbers? I use a similar script to automate generating knock-out vectors. Even after updating xcode and biopython, I still see this error message:

  File "dictyko.py", line 624, in <module>
    gbrecord = locus_maps(gene, flank) ###returns outfile name to use in primer stuff
  File "dictyko.py", line 89, in locus_maps
    gi_id, ORF_start, ORF_end, strand = fetch_gene_coordinates(gene)
  File "dictyko.py", line 59, in fetch_gene_coordinates
    rec = Entrez.read(handle)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/__init__.py", line 376, in read
    record = handler.read(handle)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 205, in read
    self.parser.ParseFile(handle)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 513, in externalEntityRefHandler
    self.dtd_urls.append(url)
UnboundLocalError: local variable 'url' referenced before assignment

chacalle commented on June 16, 2024

@pkundert I'd recommend asking on the BioPython page. I haven't heard anything yet.
biopython/biopython#926

sidneymbell commented on June 16, 2024

I think this issue has now become pressing; running dengue_upload, my accessions list and query are being formed correctly, but this returns giList==[].
https://github.com/nextstrain/fauna/blob/master/vdb/parse.py#L195

I'm investigating now (starting with the biopython issue thread @chacalle mentioned above), but wanted to give people a heads up in the meantime.

sidneymbell commented on June 16, 2024

Seems like people are on it, but it's also a bit of a mess for the time being:
https://ncbiinsights.ncbi.nlm.nih.gov/2016/07/15/ncbi-is-phasing-out-sequence-gis-heres-what-you-need-to-know/comment-page-1/#comment-35754

chacalle commented on June 16, 2024

@sidneymbell Hey Sidney, I can try helping with this later. Do you know if running update on test_vdb (python vdb/zika_update.py -db test_vdb -v zika) is also failing? I feel like people would be creating issues on biopython if this were failing for others as well.

sidneymbell commented on June 16, 2024

Hey @chacalle - I wondered about that as well. I'm planning to spend the morning investigating in more detail, and will certainly start with the zika implementation to see if it's just something specific about the way my code interacts with the base scripts.

It's definitely failing at the step where it tries to run the query with GI numbers (the query itself is being created and formatted correctly), and it hasn't in the past, which makes me rather suspicious though. I'll update here with what I figure out today. Thanks!

sidneymbell commented on June 16, 2024

@chacalle --
So, the good news is it's a false alarm. It is failing on the esearch step (just returning an empty ID list with an error message that sounds a whole lot like it's a GI number issue), but luckily I don't think it's the case (I totally leapt to conclusions here).

The less awesome news is that I'm pretty sure it's related to the number of accessions. This doesn't make a whole lot of sense given that, according to the docs:

Increasing retmax allows more of the retrieved UIDs to be included in the XML output, up to a maximum of 100,000 records.

and retmax == 10**9 for our queries (in my case, n==6000). But, it's reproducible.

Shouldn't be hard to fix; I'll patch it and submit a PR for your thoughts. Thanks for looking at this, and sorry for the confusion!
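
For readers hitting the same empty ID list: the following is a minimal sketch of one way to work around it by splitting the accession list into smaller esearch queries. It is not the actual fauna patch. The helper names (chunked, build_term, search_in_batches, entrez_search), the batch size of 500, the " OR " query format, the nuccore database, and the placeholder email are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_term(accessions: Iterable[str]) -> str:
    """Join accessions into a single OR query (assumed query format)."""
    return " OR ".join(accessions)


def search_in_batches(
    accessions: List[str],
    search: Callable[[str], List[str]],
    batch_size: int = 500,  # hypothetical value; tune it empirically
) -> List[str]:
    """Run `search` once per batch and concatenate the returned IDs."""
    ids: List[str] = []
    for batch in chunked(accessions, batch_size):
        ids.extend(search(build_term(batch)))
    return ids


def entrez_search(term: str) -> List[str]:
    """One esearch call through Biopython.

    Biopython is imported lazily so the helpers above can be used and
    tested without it or a network connection.
    """
    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder; use your own address
    handle = Entrez.esearch(db="nuccore", term=term, retmax=100000)
    record = Entrez.read(handle)
    handle.close()
    return list(record["IdList"])
```

Calling search_in_batches(accessions, entrez_search) would then issue one request per batch instead of a single query containing all ~6000 accessions. Passing the search function in as an argument keeps the batching logic separate from the network call, so it can be checked with a stub.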

pawlowac commented on June 16, 2024

Has there been any solution to this? Entrez (efetch/epost) won't accept accession.version, but most of their results are given as an accession. Otherwise, is there a way to replace thousands of accession.version with GI numbers?

trvrb commented on June 16, 2024

After upgrading to biopython 1.68

pip install biopython --upgrade
Successfully installed biopython-1.68

--update_citations is working again for me. I don't know if the underlying bug is actually resolved, however.

chacalle commented on June 16, 2024

@trvrb --update_citations stopped working? It seems like biopython 1.68 was released in August 2016 (http://biopython.org/wiki/Download) so I don't think the underlying problem is fixed. I wonder why it wasn't working.

According to this comment (https://ncbiinsights.ncbi.nlm.nih.gov/2016/07/15/ncbi-is-phasing-out-sequence-gis-heres-what-you-need-to-know/comment-page-1/#comment-35754) they are supposed to blog about it when they do finally change things.

trvrb commented on June 16, 2024

Oh. This was entirely me then. Thanks for the update.
