billgreenwald / pubmed-batch-download
Batch download articles based on PMID (PubMed ID)
License: MIT License
Mechanize, after Ruby version 1.9, throws the error
too many connection resets (due to end of file reached - EOFError) after 0 requests on 26040640
for some websites. A workaround is needed to be able to grab documents from particular websites.
** fetching of reprint 28341702 failed from error Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
Hi Bill,
Is it possible to use a pmf with the Ruby version of the script?
Thanks!
To enable using the script from another directory, it'd be good to change
require './pdfetch.rb'
to
require_relative './pdfetch.rb'
Thank you,
David
I have a list of PMIDs, and most of them return the error below.
failed from error Invalid URL 'DYO5YSKQsvZXXy6uuDK4U4OqcUzpL1eBPhVPgvooI9ZjD1OcNxvES35gEbcFgwaa': No scheme supplied. Perhaps you meant http://DYO5YSKQsvZXXy6uuDK4U4OqcUzpL1eBPhVPgvooI9ZjD1OcNxvES35gEbcFgwaa?
Any idea?
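The error text suggests that a string with no `http://` or `https://` scheme was passed along as if it were a link. A minimal sketch of a guard that rejects such strings before any request is made follows; the function name `looks_like_http_url` is hypothetical and is not part of the tool.

```python
from urllib.parse import urlparse


def looks_like_http_url(candidate: str) -> bool:
    """Return True only if the string has an http(s) scheme and a host."""
    parts = urlparse(candidate.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The token from the error message above has neither a scheme nor a host.
token = "DYO5YSKQsvZXXy6uuDK4U4OqcUzpL1eBPhVPgvooI9ZjD1OcNxvES35gEbcFgwaa"
print(looks_like_http_url(token))
print(looks_like_http_url("https://example.org/article.pdf"))
```

A finder could skip any candidate that fails this check and move on to the next one instead of raising.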
Hello!
I am having trouble downloading Elsevier papers, even though I can access them through my academic network. Here are the PMIDs:
30898248
29934065
28325353
28256256
I have many more. Any help you can give is greatly appreciated!
Hi,
I'm trying to do a test of the program and am using your test file.
$ python fetch_pdfs.py -pmf example_pmf.tsv -out test1
However I'm getting a connection error - it seems that eutils.ncbi.nlm.nih.gov is no longer available...
Trying to fetch pmid 27547345
** fetching of reprint 27547345 failed from error HTTPConnectionPool(host='eutils.ncbi.nlm.nih.gov', port=80): Max retries exceeded with url: /entrez/eutils/elink.fcgi?dbfrom=pubmed&id=27547345&retmode=ref&cmd=prlinks (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2aaab511fb38>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Thanks.
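The log shows the request going out over plain HTTP (port 80) and failing at name resolution, so the host may simply be unreachable from that machine. Separately, here is a sketch of building the same elink query over HTTPS. That the endpoint expects HTTPS is an assumption here, and `elink_prlinks_url` is a hypothetical helper, not a function from the script.

```python
from urllib.parse import urlencode

# Assumption: same host and path as in the log above, but with an https scheme.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def elink_prlinks_url(pmid: str) -> str:
    """Build the elink URL seen in the log for a single PMID."""
    params = {"dbfrom": "pubmed", "id": pmid, "retmode": "ref", "cmd": "prlinks"}
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


print(elink_prlinks_url("27547345"))
```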
I'm getting the following error. Has anyone else experienced this as well? Or is this likely a user error on my part?
Trying to fetch pmid 31619796
Trying genericCitationLabelled
Trying pubmed_central_v2
Trying acsPublications
Trying uchicagoPress
Trying nejm
Trying futureMedicine
Trying science_direct
** fetching of reprint 31619796 failed from error Invalid URL 'DirectEmailBox-inPage': No schema supplied. Perhaps you meant http://DirectEmailBox-inPage?
I'm using
$ ~/anaconda3/bin/conda --version
conda 4.8.3
$ git log
commit 75220d9 (HEAD -> master, origin/master, origin/HEAD)
Author: Bill Greenwald
Date: Sat Oct 12 14:50:03 2019 -0700
Update README.md
Hey, this is awesome!
Thanks for writing it :)
Have you considered adding a license?
Cheers
Hi, thank you for your program! With all of my PMIDs I get one of the following errors:
** fetching of reprint 33191945 failed from error Invalid URL 'voSN1zD2LAqLbgiL7dZrDuKtt2DeC6Ln3TW51UJm5FtsTdsf5zb1XYxjdjTAq5zn': No schema supplied. Perhaps you meant http://voSN1zD2LAqLbgiL7dZrDuKtt2DeC6Ln3TW51UJm5FtsTdsf5zb1XYxjdjTAq5zn?
Trying to fetch pmid 33013186
*
** fetching of reprint 30793269 failed from error Failed to parse: โ
Trying to fetch pmid 32388849
Trying genericCitationLabelled
Trying pubmed_central_v2
Trying acsPublications
Trying uchicagoPress
Trying nejm
Trying futureMedicine
Do you know how I can fix this?
I'm getting the same error regardless of what I do.
python fetch_pdfs.py -pmf example_pmf.tsv
Trying to fetch pmid 28514316
** Reprint 28514316 cannot be fetched as ovid is not supported by the requests package.
python fetch_pdfs.py -pmid 30374447
Trying to fetch pmid 28514316
** Reprint 28514316 cannot be fetched as ovid is not supported by the requests package.
The PubMed ID it requests is not even the one I passed in.
Hi, I'm new to Bash (working on Linux Mint 17.2) and not a Ruby user. I installed Ruby 2.1.2 via RVM and the installation went fine. I ran bash setup.sh and the pubmedid2pdf.rb script, but obtained the error below:
[12:33] ~/.../Pubmed-Batch-Download$ ruby pubmedid2pdf.rb 26830047,26728431
/home/joanna/.rvm/rubies/ruby-2.1.2/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- camping (LoadError)
from /home/joanna/.rvm/rubies/ruby-2.1.2/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
from /home/joanna/Dropbox/Sketchbook/ruby/Pubmed-Batch-Download/pdfetch.rb:27:in `<top (required)>'
from /home/joanna/.rvm/rubies/ruby-2.1.2/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
from /home/joanna/.rvm/rubies/ruby-2.1.2/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
from pubmedid2pdf.rb:37:in `<main>'
Would appreciate if you could suggest how this could be fixed or if I made a mistake somewhere. This is a great tool, and many thanks in advance.
Traceback (most recent call last):
File "fetch_pdfs.py", line 252, in <module>
if type(e)==requests.ConnectionError and '104' in e[0][1][0]:
TypeError: argument of type 'int' is not iterable
python fetch_pdfs.py -pmids 26633170,23682673,25040501,24628937,27174497,27547345,22610656,23858657,24998529,27859194,26991916,26742956,22268844,27547334,16299005,26658101,24458119,24850527,25859332,17522077,22739706,24628897,24232381,23127184,27329944,25480711,25253712,20574680,19333624,24131615,14761053,25704464,26507115,25754608,26655157,28308115,27551374,21777248,24372301,28568420,28309130,22711559,19874617,27777723,26199373,22680336,16004288,26949084,23624924,23339242,22074778,19763848,22666114,27680661,19324745,24138122,23603953,21833640,25002701,24933810,18724731,26070638,28312167,17750894,18707428,16670987,25664897,4066794,21546431,19663992,12803910,24800839,20636902,27038018,25948688,25165527,27648239,24266037,26482059,18593688,27146894,11222244,21636492,23002269,10860912,26987770,25002705,24743567,28311501,23294438,28310242,21237765,23134452,27870050,24372761,21653461,19704675,28565336,19367315,15271088,19910534,23963860,12858276,20576739,28564966,28565464,24287813,25272164,21484398,25347541,28313987,25130655,26817765,22151952,15255098,22652419,21134082,17652341,26573095,24766107,20408751,17711841,28313163,26578721,18289396,28547066,19131378,19121112,19324662,24317664,11080108,27767040,10205070,28310724,22805583,24193000,19412706,21642227,26878831,21632396,26421845,28309726,20592812,25903102,19218583,19001427,21789530,20345818,20047872,28310543,24464206,10568781,20676914,22438504,10431223,20954889,28547089,22519776,11607153,12659040,22156401,19429671,15596454,16371444,19398446,27851814,27714795,28307360,28308328,12437082,19654608,19050951,19516075,28593665,19153768,21636399,22476079,21170748,19126635,28312388,11539321,19218577,16615203,9299797,28565680,14652688,16133196,18637960,16866959,16593140,28564904,28568165,21669711,29673012,18761503,21669696,16866958,14551828,20961923,17879195,17416914,28312462,19443460,18707369,21755150,21636368,17427121,17300430,21665640,28698790,28309456,27864223,28312030,15696741,11222245,28311108,21642173,29880773,17203434,288
77178,18426489,20952615,19739370,18031491,29134400,28568788,19158031,29280577,28313078,28428861,21653420,15696748,15280895,11353709,10860920,12207039,28626040,15212378,29532921,28204486,29765587,28960844,29658115,29346506,29468326,28904775,28428199,27915467,28798863,28135774,28647753,28861252,28822496,29947735,29917223,28079938,28504871,29464694,29893413,29878057,29878055,29882762,29445017 -maxRetries 3
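The traceback shows the check `'104' in e[0][1][0]` failing because the indexed value is an `int`, not a string. A sketch of a more defensive test walks the exception, its arguments, and its chained causes instead of indexing into them. The helper name `is_connection_reset` is hypothetical, and the code compares against `errno.ECONNRESET` (104 on Linux) rather than the literal string.

```python
import errno


def is_connection_reset(exc: BaseException) -> bool:
    """Search an exception, its args, and its causes for a connection reset."""
    seen = set()
    stack = [exc]
    while stack:
        item = stack.pop()
        if id(item) in seen:
            continue
        seen.add(id(item))
        if isinstance(item, ConnectionResetError):
            return True
        if isinstance(item, OSError) and item.errno == errno.ECONNRESET:
            return True
        if isinstance(item, BaseException):
            # Nested exceptions can sit in args or in the cause/context chain.
            stack.extend(item.args)
            stack.extend(c for c in (item.__cause__, item.__context__) if c is not None)
        elif isinstance(item, (tuple, list)):
            stack.extend(item)
        # Plain ints and strings are ignored rather than indexed or searched.
    return False


reset = ConnectionResetError(errno.ECONNRESET, "Connection reset by peer")
wrapped = Exception(("Connection aborted.", reset))
print(is_connection_reset(wrapped))
print(is_connection_reset(Exception(104)))
```

Because bare integers are skipped, an exception whose arguments contain only a number no longer raises `TypeError`.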
Hey Bill,
The code is working fine and whenever possible, the files are getting downloaded. However, all of these pdfs seem corrupt.
To make sure I am not doing anything wrong, I created a virtual environment and downgraded all packages to the versions you were using when you developed this, but the issue persists.
Some PMIDs to replicate the issue would be the sample ones in your Readme.
Thanks in advance.
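One possible cause, offered as an assumption rather than a diagnosis, is that the server returned something other than a PDF (for example an HTML login or landing page) and it was saved with a `.pdf` extension. A quick sketch for checking a downloaded payload looks for the `%PDF-` header that PDF files start with. `looks_like_pdf` is a hypothetical helper name.

```python
def looks_like_pdf(data: bytes) -> bool:
    """True if the payload begins with the PDF header, ignoring leading whitespace."""
    return data.lstrip()[:5] == b"%PDF-"


print(looks_like_pdf(b"%PDF-1.4\n%..."))
print(looks_like_pdf(b"<!DOCTYPE html><html>...</html>"))
```

Running this on the first bytes of a file that will not open shows whether it is a real PDF or a mislabelled web page.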
I got the same error for all PMIDs I tried so far.
E.g., ** fetching of reprint 123 failed from error list index out of range
I am using pubmed-batch-download 3.0.0 with Python 3.7.4.
I am far from python savvy so if this is a simple error on my part, I apologize.
I have installed the necessary software and get the following error.
$ python3.7 ~/Repositories/Pubmed-Batch-Download/fetch_pdfs.py -pmids 31336898
Output directory of fetched_pdfs did not exist. Created the directory.
Trying to fetch pmid 31336898
** fetching of reprint 31336898 failed from error 'NoneType' object has no attribute 'readline'
Hello
I just installed the 2 required packages and tried to fetch a couple of refs (using either my PMID or the example_pmf.tsv) but I get the following errors:
Any suggestions?
thanks!
$ python fetch_pdfs.py -pmf example_pmf.tsv
~/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
utils.DeprecatedIn23,
Trying to fetch pmid 27547345
** fetching of reprint 27547345 failed from error 'NoneType' object has no attribute 'readline'
I have a list of article titles for which I want to look up the PMIDs from NCBI. Can I do that in one go?
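Nothing in this thread says whether the tool itself supports title lookup. As a sketch under that caveat, one could build one NCBI E-utilities `esearch` query per title and read the matching PMIDs from each response. The helper name `esearch_url_for_title` is hypothetical, and restricting the search with the `[Title]` field tag is an assumption about how exact a match is wanted.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url_for_title(title: str) -> str:
    """Build an esearch URL that looks up a PubMed record by its title."""
    params = {"db": "pubmed", "term": f"{title}[Title]", "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Placeholder titles, for illustration only.
titles = ["First article title", "Second article title"]
for title in titles:
    print(esearch_url_for_title(title))
```

Each URL can then be fetched in a loop, ideally with a short pause between requests.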
Hi Bill,
I get two other types of error messages for papers I can access if I click through from pubmed.
I suspect that the "badstatusline" error may relate to the fact that I am running the queries from within WSL.
Some example papers are
25176136 - an open NEJM paper
26030325 - a PubMedCentral paper
17074775 - a European heart journal paper
I have given an example of each type of error message
Trying to fetch pmid 25176136
Trying genericCitationLabelled
Trying pubmed_central
Trying acsPublications
Trying uchicagoPress
Trying science_direct
** fetching of reprint 25176136 failed from error Invalid URL '': No schema supplied. Perhaps you meant http://?
Trying to fetch pmid 26030325
** fetching of reprint 26030325 failed from error ('Connection aborted.', BadStatusLine("''",))
Trying to fetch pmid 17074775
** fetching of reprint 17074775 failed from error ('Connection aborted.', BadStatusLine("''",))
Hello,
I am getting an index out of range error for certain PMIDs.
I am able to download about 1 in 3 PMIDs I am seeking and the most common error is "index out of range"
Could the loop indexing be longer than the list of possible errors?
Many thanks
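An "index out of range" error typically means some list was empty when its first element was read. How the script's finders are structured is not shown in this thread, so the following is only a sketch with hypothetical names (`first_pdf_link`, `no_match`, `one_match`). It shows a loop that tries finders in order and treats an empty result as "not found" instead of indexing into it.

```python
from typing import Callable, Iterable, List, Optional

# A finder takes page HTML and returns candidate PDF links (possibly none).
Finder = Callable[[str], List[str]]


def first_pdf_link(page_html: str, finders: Iterable[Finder]) -> Optional[str]:
    """Return the first link any finder produces, or None if all come up empty."""
    for finder in finders:
        links = finder(page_html)
        if links:  # guard: links[0] on an empty list would raise IndexError
            return links[0]
    return None


def no_match(html: str) -> List[str]:
    return []


def one_match(html: str) -> List[str]:
    return ["https://example.org/paper.pdf"]


print(first_pdf_link("<html></html>", [no_match, one_match]))
print(first_pdf_link("<html></html>", [no_match]))
```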
When fetching the physiology articles, I get:
python fetch_pdfs.py -pmid 11045978
Trying to fetch pmid 11045978
Trying genericCitationLabelled
Trying pubmed_central_v2
Trying acsPublications
Trying uchicagoPress
Trying nejm
Trying futureMedicine
** fetching reprint using the 'future medicine' finder...
** fetching of reprint 11045978 failed from error HTTPSConnectionPool(host='www.physiology.orghttps', port=443): Max retries exceeded with url: //www.physiology.org/doi/pdf/10.1152/ajpheart.2000.279.5.H2405 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x10e25a588>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
This happens for virtually all of their papers. Can you help?
Thanks!
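The host in the error, `www.physiology.orghttps`, looks like a site prefix glued onto a link that was already absolute. A sketch of resolving links with `urllib.parse.urljoin`, which handles both absolute and relative hrefs, follows. The page URL is a hypothetical example, and `resolve_link` is not a function from the script.

```python
from urllib.parse import urljoin


def resolve_link(page_url: str, href: str) -> str:
    """Resolve href against the page it came from without double-prefixing."""
    return urljoin(page_url, href)


# Hypothetical article page; the PDF path is the one from the error message.
page = "https://www.physiology.org/doi/full/10.1152/ajpheart.2000.279.5.H2405"
absolute = "https://www.physiology.org/doi/pdf/10.1152/ajpheart.2000.279.5.H2405"
relative = "/doi/pdf/10.1152/ajpheart.2000.279.5.H2405"

print(resolve_link(page, absolute))
print(resolve_link(page, relative))
```

Both calls produce the same PDF URL, whereas string concatenation is only correct for the relative form.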