Comments (12)
So NCBI and BioPython don't actually let you pull GenBank files via accession numbers. An issue was opened on the BioPython page to track changes for this: biopython/biopython#926
from fauna.
Any updates on the change-over from GI to accession numbers? I use a similar script to automate generating knock-out vectors. Even after updating xcode and biopython, I still see this error message:
  File "dictyko.py", line 624, in <module>
    gbrecord = locus_maps(gene, flank) ###returns outfile name to use in primer stuff
  File "dictyko.py", line 89, in locus_maps
    gi_id, ORF_start, ORF_end, strand = fetch_gene_coordinates(gene)
  File "dictyko.py", line 59, in fetch_gene_coordinates
    rec = Entrez.read(handle)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/__init__.py", line 376, in read
    record = handler.read(handle)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 205, in read
    self.parser.ParseFile(handle)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 513, in externalEntityRefHandler
    self.dtd_urls.append(url)
UnboundLocalError: local variable 'url' referenced before assignment
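For readers unfamiliar with this class of error: an UnboundLocalError generally means a local name was assigned on only some code paths and then read on a path where it was never set. The sketch below is a hypothetical illustration of that pattern and one way to guard against it. The function names and the "http://" check are assumptions made for the example; this is not the Biopython source.

```python
def pick_url(system_id):
    # Hypothetical example: `url` is bound only when the input starts
    # with "http://". Any other input reaches `return url` unbound.
    if system_id.startswith("http://"):
        url = system_id
    return url  # raises UnboundLocalError for inputs that skip the branch


def pick_url_safe(system_id):
    # Same logic, but the name always has a value and the unsupported
    # case fails with an explicit, descriptive error instead.
    url = None
    if system_id.startswith("http://"):
        url = system_id
    if url is None:
        raise ValueError("unsupported system id: %r" % system_id)
    return url
```

With the guarded version, an unexpected input produces a ValueError that names the offending value, which is usually easier to diagnose than the bare UnboundLocalError above.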
from fauna.
@pkundert I'd recommend asking on the BioPython page. I haven't heard anything yet.
biopython/biopython#926
from fauna.
I think this issue has now become pressing; running dengue_upload, my accessions list and query are being formed correctly, but this returns giList==[].
https://github.com/nextstrain/fauna/blob/master/vdb/parse.py#L195
I'm investigating now (starting with the biopython issue thread @chacalle mentioned above), but wanted to give people a heads up in the meantime.
from fauna.
Seems like people are on it, but it's also a bit of a mess for the time being:
https://ncbiinsights.ncbi.nlm.nih.gov/2016/07/15/ncbi-is-phasing-out-sequence-gis-heres-what-you-need-to-know/comment-page-1/#comment-35754
from fauna.
@sidneymbell Hey Sidney, I can try helping with this later. Do you know if running update on test_vdb is also failing? python vdb/zika_update.py -db test_vdb -v zika. I feel like people would be creating issues on biopython if this is failing for others as well.
from fauna.
Hey @chacalle - I wondered about that as well. I'm planning to spend the morning investigating in more detail, and will certainly start with the zika implementation to see if it's just something specific about the way my code interacts with the base scripts.
It's definitely failing at the step where it tries to run the query with GI numbers (the query itself is being created and formatted correctly), and it hasn't in the past, which makes me rather suspicious though. I'll update here with what I figure out today. Thanks!
from fauna.
@chacalle --
So, the good news is it's a false alarm. It is failing on the esearch step (just returning an empty ID list with an error message that sounds a whole lot like it's a GI number issue), but luckily I don't think it's the case (I totally leapt to conclusions here).
The less awesome news is that I'm pretty sure it's related to the number of accessions. This doesn't make a whole lot of sense given that, from the docs:
"Increasing retmax allows more of the retrieved UIDs to be included in the XML output, up to a maximum of 100,000 records."
and retmax == 10**9 for our queries (in my case, n==6000). But it's reproducible.
Shouldn't be hard to fix; I'll patch it and submit a PR for your thoughts. Thanks for looking at this, and sorry for the confusion!
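One possible shape for such a patch, sketched under assumptions: split the accession list into smaller batches, run one search per batch, and concatenate the returned IDs. The helper names, the " OR " join, and the batch size of 500 are all illustrative choices, not the actual fauna code or a documented NCBI limit. The `search` argument is a stand-in for whatever performs the real query, so the batching logic can be exercised without network access.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def search_in_batches(accessions, search, batch_size=500):
    """Run `search` once per batch of accessions and collect all returned IDs.

    `search` is a caller-supplied function (hypothetical here) that takes a
    query string and returns a list of IDs. `batch_size` is an assumed value
    to tune, not a known threshold.
    """
    ids = []
    for batch in chunked(list(accessions), batch_size):
        term = " OR ".join(batch)  # assumed query format for illustration
        ids.extend(search(term))
    return ids
```

In practice `search` could wrap Biopython, for example a small function that calls Entrez.esearch with the term and returns Entrez.read(handle)["IdList"]. Keeping that call outside the batching helper makes it easy to try different batch sizes until the empty result goes away.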
from fauna.
Has there been any solution to this? Entrez (efetch/epost) won't accept accession.version, but most of their results are given as an accession. Otherwise, is there a way to replace thousands of accession.version with GI numbers?
from fauna.
After upgrading to biopython 1.68:
pip install biopython --upgrade
Successfully installed biopython-1.68
--update_citations is working again for me. I don't know if the underlying bug is actually resolved, however.
from fauna.
@trvrb --update_citations stopped working? It seems like biopython 1.68 was released in August 2016 (http://biopython.org/wiki/Download) so I don't think the underlying problem is fixed. I wonder why it wasn't working.
According to this comment (https://ncbiinsights.ncbi.nlm.nih.gov/2016/07/15/ncbi-is-phasing-out-sequence-gis-heres-what-you-need-to-know/comment-page-1/#comment-35754) they are supposed to blog about it when they do finally change things.
from fauna.
Oh. This was entirely me then. Thanks for the update.
from fauna.