
ads's Introduction

A Python Module to Interact with NASA's ADS that Doesn't Suck™

If you're in astro research, then you pretty much need NASA's ADS. It's tried, true, and people go crazy on the rare occasions when it goes down.


Quickstart

>>> import ads
>>> ads.config.token = 'secret token'
>>> papers = ads.SearchQuery(q="supernova", sort="citation_count")
>>> for paper in papers:
>>>    print(paper.title)
[u'Maps of Dust Infrared Emission for Use in Estimation of Reddening and Cosmic Microwave Background Radiation Foregrounds']
[u'Measurements of Omega and Lambda from 42 High-Redshift Supernovae']
[u'Observational Evidence from Supernovae for an Accelerating Universe and a Cosmological Constant']
[u'First-Year Wilkinson Microwave Anisotropy Probe (WMAP) Observations: Determination of Cosmological Parameters']
[u'Abundances of the elements: Meteoritic and solar']

Running Tests

> cd /path/to/ads
> python -m unittest discover


ads's Issues

Make stub data accessible

There is stub data in the repository for both the expected SolrQuery and MetricsQuery responses. It would be useful to have it accessible via an import, e.g.:

from ads.tests import solr_stub_data

The context: we have a lot of users testing against the API directly and hitting their API rate limit, rather than using the stub data you've got available in the package.
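
In the meantime, the ads.sandbox module (used in another issue further down this page) already mocks API responses with packaged stub data; a minimal sketch:

import ads.sandbox as ads

papers = ads.SearchQuery(q="star")
for paper in papers:
    print(paper.title)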

'Article' object has no attribute 'id'

I didn't have much time to investigate, but this is the error I'm seeing with latest master:

In [2]: import ads
In [3]: ads.config.token = '.....'
In [4]: x = list(ads.SearchQuery(author='chyla', fl='orcid_pub', rows=10))
In [5]: a = x[0]
In [7]: a.author
---------------------------------------------------------------------------
APIResponseError                          Traceback (most recent call last)
<ipython-input-7-e176e79c7643> in <module>()
----> 1 a.author

/dvt/workspace/ads/python/lib/python2.7/site-packages/werkzeug/utils.pyc in __get__(self, obj, type)
     71         value = obj.__dict__.get(self.__name__, _missing)
     72         if value is _missing:
---> 73             value = self.func(obj)
     74             obj.__dict__[self.__name__] = value
     75         return value

/dvt/workspace/ads/ads/search.py in author(self)
    137     @cached_property
    138     def author(self):
--> 139         return self._get_field('author')
    140 
    141     @cached_property

/dvt/workspace/ads/ads/search.py in _get_field(self, field)
    117         """
    118         if not hasattr(self, "id") or self.id is None:
--> 119             raise APIResponseError("Cannot query an article without an id")
    120         sq = next(SearchQuery(q="id:{}".format(self.id), fl=field))
    121         # If the requested field is not present in the returning Solr doc,

APIResponseError: 'Cannot query an article without an id'

In [8]: a.id
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-5026dfbe4bec> in <module>()
----> 1 a.id

AttributeError: 'Article' object has no attribute 'id'

commit a new version to pypi?

Hi,

Thanks a lot for the very useful updates you made over the last few weeks. Is there any chance they could be reflected in a new version on PyPI? My colleagues are not that familiar with the GitHub versioning system, while pip install -U is so convenient.
Thanks a lot,
Christophe

All queries fail with SSLError

Hi Andy,

Just installed ads, but I can't seem to do anything yet. All the queries I try (even using the sandbox) end up with an SSLError: ('bad handshake: WantWriteError()',).

Expected Behavior

As per the documentation, I expect that the following will work:

>>> import ads.sandbox as ads
>>> q = ads.SearchQuery(q='star')
>>> for paper in q:
>>>     print(paper.title, paper.citation_count)

Current Behavior

But I get the following traceback:

WantWriteError                            Traceback (most recent call last)
/Users/tiago/anaconda/envs/python3/lib/python3.6/site-packages/requests/packages/urllib3/contrib/pyopenssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
    435             try:
--> 436                 cnx.do_handshake()
    437             except OpenSSL.SSL.WantReadError:

/Users/tiago/anaconda/envs/python3/lib/python3.6/site-packages/OpenSSL/SSL.py in do_handshake(self)
   1425         result = _lib.SSL_do_handshake(self._ssl)
-> 1426         self._raise_ssl_error(self._ssl, result)
   1427

/Users/tiago/anaconda/envs/python3/lib/python3.6/site-packages/OpenSSL/SSL.py in _raise_ssl_error(self, ssl, result)
   1150         elif error == _lib.SSL_ERROR_WANT_WRITE:
-> 1151             raise WantWriteError()
   1152         elif error == _lib.SSL_ERROR_ZERO_RETURN:

WantWriteError:

During handling of the above exception, another exception occurred:

followed by a few other SSLError: ('bad handshake: WantWriteError()',) errors.

Your Environment

  • ads from pip (v. 0.12.3)
  • Tried with anaconda in python 2.7.12 and 3.6.0 (all latest versions).
  • pyopenssl 16.2.0
  • requests 2.13.0

Extension of Query types

What is the general thought on extending the types of Query objects in the library? I see that @andycasey already added a commented line with LibraryQuery.

For example, there are a few other services that could be leveraged (if there is a need, which I'm not saying there necessarily is).

Non-lazy loading leads to missing properties in SearchQuery

A lazy query gives me normal, valid responses with all of the properties in place and seemingly valid. Adding an fl parameter, however, breaks some of those properties (example below), giving "Unknown author" and "Unknown year" responses.

Steps to Reproduce (for bugs)

bib = '2010ApJ...725L..91K'
papers = ads.SearchQuery(bibcode=bib)
for pp in papers:
    print(pp, pp.bibcode)

papers = ads.SearchQuery(bibcode=bib, fl=['bibcode'])
for pp in papers:
    print(pp, pp.bibcode)

Produces the output:

<Kelley, Luke Zoltan et al. 2010, 2010ApJ...725L..91K> 2010ApJ...725L..91K
<Unknown author Unknown year, 2010ApJ...725L..91K> 2010ApJ...725L..91K
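
A hedged workaround sketch: explicitly include the fields used by the Article repr (first_author and year, per the default field list discussed elsewhere on this page) alongside bibcode in fl:

papers = ads.SearchQuery(bibcode=bib, fl=['bibcode', 'first_author', 'author', 'year'])
for pp in papers:
    print(pp, pp.bibcode)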

Your Environment

print(ads.__version__)
print(sys.version)
> 0.12.3
> 3.5.2 |Anaconda custom (x86_64)| (default, Jul  2 2016, 17:52:12) 
  [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]

`ads.SearchQuery` returns a status 500 APIResponseError

As of a few days ago, any query executed via ads.SearchQuery is failing for me with a Status 500 (Internal Server Error). In contrast, queries executed directly via curl work fine. Can anyone reproduce this?

Queries via curl work fine:

curl -H "Authorization: Bearer XXXXXX" 'https://api.adsabs.harvard.edu/v1/search/query?q=supernova'
{"responseHeader":{"status":0,"QTime":1202,"params":{"q":"supernova","x-amzn-trace-id":"Root=1-5ac45571-9b4e04a76ed0f5a5582825ba","fl":"id","start":"0","rows":"10","wt":"json"}},"response":{"numFound":189894,"start":0,"docs":[{"id":"13974266"},{"id":"14024550"},{"id":"13449936"},{"id":"14287281"},{"id":"14287265"},{"id":"14221554"},{"id":"14862719"},{"id":"14287247"},{"id":"14287150"},{"id":"14496802"}]}}

Same query via ads.SearchQuery fails:

Python 3.6.2 |Anaconda custom (64-bit)| (default, Jul 20 2017, 13:51:32) 

In [1]: import ads

In [2]: q = ads.SearchQuery(q="supernova")

In [3]: for paper in q:
   ...:     print(paper.title)
   ...: 
---------------------------------------------------------------------------
APIResponseError                          Traceback (most recent call last)
<ipython-input-3-d676df0a2251> in <module>()
----> 1 for paper in q:
      2     print(paper.title)

/home/gb/dev/ads/ads/search.py in __next__(self)
    499         # explicitly
    500         if self.response is None:
--> 501             self.execute()
    502 
    503         try:

/home/gb/dev/ads/ads/search.py in execute(self)
    534 
    535         self.response = SolrResponse.load_http_response(
--> 536             self.session.get(self.HTTP_ENDPOINT, params=q)
    537         )
    538 

/home/gb/dev/ads/ads/base.py in load_http_response(cls, http_response)
     92         """
     93         if not http_response.ok:
---> 94             raise APIResponseError(http_response.text)
     95         c = cls(http_response)
     96         c.response = http_response

APIResponseError: '{"responseHeader":{"status":500,"QTime":0,"params":{"json":"x-amzn-trace-id=Root%3D1-5ac4550e-a56a60d4d40c08349e8c5556&rows=10&q=identifier%3A2017AJ....153..265W&start=0&wt=json&fl=id"}},"error":{"msg":"java.lang.String cannot be cast to java.util.Map","trace":"java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map\\n\\tat org.apache.solr.request.json.ObjectUtil.mergeObjects(ObjectUtil.java:108)\\n\\tat org.apache.solr.request.json.RequestUtil.mergeJSON(RequestUtil.java:262)\\n\\tat org.apache.solr.request.json.RequestUtil.processParams(RequestUtil.java:177)\\n\\tat org.apache.solr.util.SolrPluginUtils.setDefaults(SolrPluginUtils.java:180)\\n\\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:150)\\n\\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2213)\\n\\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)\\n\\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)\\n\\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)\\n\\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)\\n\\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\\n\\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\\n\\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\\n\\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\\n\\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\\n\\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\\n\\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\\n\\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\\n\\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\\n\\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\\n\\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\\n\\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\\n\\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\\n\\tat org.eclipse.jetty.server.Server.handle(Server.java:518)\\n\\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\\n\\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\\n\\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\\n\\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\\n\\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\\n\\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\\n\\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\\n\\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\\n\\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\\n\\tat java.lang.Thread.run(Thread.java:748)\\n","code":500}}\n'

Feature Request: search by arxiv-id

There are situations in which I have an arXiv ID and I would like to get the ADS entry, for example to retrieve the bibcode for the paper. Currently, I can query arXiv (using something like https://github.com/lukasschwab/arxiv.py), find the DOI from there, then query ADS using the DOI to finally get the entry and bibcode.

It would be awesome if there were a way to directly query by arXiv ID, like the HTML API (e.g. http://adsabs.harvard.edu/cgi-bin/bib_query?arXiv:XXXX.XXXXX).
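
In the meantime, a hedged workaround: the Solr index appears to expose arXiv IDs through the identifier field (an assumption about the index, not a documented guarantee), so something like the following may work:

import ads

# assumption: arXiv IDs are searchable via the Solr `identifier` field
papers = list(ads.SearchQuery(q='identifier:"arXiv:1609.02927"', fl=['bibcode']))
for p in papers:
    print(p.bibcode)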

Unexpected `APIResponseError` when accessing `article.pub`

I'm seeing the following strange behaviour in both Python 2 and 3, and I'm struggling to understand why. / cc @vsudilov

In [1]: import ads

In [2]: article = list(ads.SearchQuery(q="2015rasc.conf..166A", fl="*"))[0]

In [3]: print(article.id)
11058573

In [4]: print(article.pub)
---------------------------------------------------------------------------
APIResponseError                          Traceback (most recent call last)
<ipython-input-4-d5d34feddeac> in <module>()
----> 1 print(article.pub)

/home/gb/bin/anaconda/envs/p2/lib/python2.7/site-packages/werkzeug/utils.pyc in __get__(self, obj, type)
     69         value = obj.__dict__.get(self.__name__, _missing)
     70         if value is _missing:
---> 71             value = self.func(obj)
     72             obj.__dict__[self.__name__] = value
     73         return value

/home/gb/bin/anaconda/envs/p2/lib/python2.7/site-packages/ads-0.0.809-py2.7.egg/ads/search.pyc in pub(self)
    173     @cached_property
    174     def pub(self):
--> 175         return self._get_field('pub')
    176 
    177     @cached_property

/home/gb/bin/anaconda/envs/p2/lib/python2.7/site-packages/ads-0.0.809-py2.7.egg/ads/search.pyc in _get_field(self, field)
    119             raise APIResponseError("Cannot query an article without an id")
    120         sq = SearchQuery(q="id:{}".format(self.id), fl=field)
--> 121         value = next(sq).__getattribute__(field)
    122         self._raw[field] = value
    123         return value

/home/gb/bin/anaconda/envs/p2/lib/python2.7/site-packages/werkzeug/utils.pyc in __get__(self, obj, type)
     69         value = obj.__dict__.get(self.__name__, _missing)
     70         if value is _missing:
---> 71             value = self.func(obj)
     72             obj.__dict__[self.__name__] = value
     73         return value

/home/gb/bin/anaconda/envs/p2/lib/python2.7/site-packages/ads-0.0.809-py2.7.egg/ads/search.pyc in pub(self)
    173     @cached_property
    174     def pub(self):
--> 175         return self._get_field('pub')
    176 
    177     @cached_property

/home/gb/bin/anaconda/envs/p2/lib/python2.7/site-packages/ads-0.0.809-py2.7.egg/ads/search.pyc in _get_field(self, field)
    117         """
    118         if not hasattr(self, "id") or self.id is None:
--> 119             raise APIResponseError("Cannot query an article without an id")
    120         sq = SearchQuery(q="id:{}".format(self.id), fl=field)
    121         value = next(sq).__getattribute__(field)

APIResponseError: 'Cannot query an article without an id'
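
A hedged workaround sketch: request the needed fields explicitly rather than via fl="*" (explicit fl lists work in other examples on this page):

import ads

article = next(ads.SearchQuery(q="2015rasc.conf..166A", fl=["id", "pub"]))
print(article.pub)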

python 3 problem

When I install ads using Python 3.4, some modules are not found (see the error message below). I think the submodule imports should use relative paths.

Traceback (most recent call last):
  File "examples/top-cited-astronomers.py", line 7, in <module>
    import ads
  File "/home/mdevalbo/lib/python3.4/ads-0.0.808dev-py3.4.egg/ads/__init__.py", line 9, in <module>
ImportError: No module named 'network'
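
A minimal sketch of the suggested fix, assuming network.py sits inside the ads package:

# in ads/__init__.py: use a relative import so Python 3 can resolve the submodule
from .network import *  # instead of: from network import *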

Connection error when building citation tree

7d5dca1 from the examples fails in building a citation tree. Output below:

[willettk@x-131-212-237-196 citation-tree]$ python citation_tree.py 
Traceback (most recent call last):
  File "citation_tree.py", line 13, in <module>
    paper.build_citation_tree(depth=2)
  File "//anaconda/lib/python2.7/site-packages/ads/core.py", line 294, in build_citation_tree
    bibcode=article.bibcode), **kwargs) for article in level]
  File "//anaconda/lib/python2.7/site-packages/ads/core.py", line 405, in search
    return query(*args, **kwargs)
  File "//anaconda/lib/python2.7/site-packages/ads/core.py", line 340, in __init__
    r = requests.get(ADS_HOST + "/search/", params=metadata_payload)
  File "//anaconda/lib/python2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "//anaconda/lib/python2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "//anaconda/lib/python2.7/site-packages/requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "//anaconda/lib/python2.7/site-packages/requests/sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "//anaconda/lib/python2.7/site-packages/requests/adapters.py", line 415, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))

Requesting "bibstem" always returns None

When you do

papers = ads.SearchQuery(q="supernova", sort="citation_count", fl=['id', 'bibcode', 'title', 'bibstem'])

explicitly requesting that bibstem be returned, only None comes back.

Expected Behavior

Journal abbreviations are expected, rather than None.

Current Behavior

None is always returned

Steps to Reproduce (for bugs)

papers = ads.SearchQuery(q="supernova", sort="citation_count", fl=['id', 'bibcode', 'title', 'bibstem'])
for paper in papers:
    print(paper.bibstem)
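
A hedged client-side workaround while bibstem comes back as None: the journal abbreviation is embedded in the bibcode itself (standard layout YYYYJJJJJVVVVMPPPPA, with the journal code in characters 5-9, padded with dots), so it can be recovered by slicing:

import ads

papers = ads.SearchQuery(q="supernova", sort="citation_count", fl=['id', 'bibcode', 'title'])
for paper in papers:
    # e.g. '2010ApJ...725L..91K'[4:9] == 'ApJ..' -> 'ApJ'
    print(paper.bibcode[4:9].rstrip('.'))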

How to acknowledge use?

How would you like the ADS Python package to be acknowledged?
Could you please spell that out in the README?

Maybe it's possible to get an entry in http://ascl.net/ or write a page or two for http://joss.theoj.org/ ?

I'm writing up an ADASS conference proceedings paper for a project where we're using this ads Python package extensively. For now I'm just putting a link to https://github.com/andycasey/ads . Please let me know if there's something else you prefer, or if that is the best way.

search using both cursorMark and timeAllowed

Simple searches using SearchQuery return an APIResponseError: org.apache.solr.common.SolrException: Can not search using both cursorMark and timeAllowed.

Expected Behavior

>>> import ads
>>> papers = ads.SearchQuery(q="supernova", sort="citation_count")
>>> for paper in papers:
>>>    print(paper.title)

Current Behavior

APIResponseError: u'{"responseHeader":{"status":400,"QTime":15,"params":{"q":"supernova","fl":"id,author,first_author,bibcode,id,year,title","cursorMark":"*","sort":"citation_count desc,id desc","rows":"50","wt":"json"}},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"Can not search using both cursorMark and timeAllowed","code":400}}\n'

Your Environment

This error occurs with Python 2.7 and 3.6.

  • ads version used (hint: python -c "import ads; print(ads.__version__)")
    0.12.2

Issue when trying to avoid lazy loading

The following example still results in a lazy loading warning:

import ads

papers = ads.SearchQuery(bibcode='*ApJ*', title="Erratum*", rows=5, max_pages=1, fl=['title', 'bibcode'])

with open('errata.txt', 'w') as f:
    for p in papers:
        f.write('{0} {1}\n'.format(p.bibcode, p.title[0]))

The warning is:

/Users/tom/miniconda3/envs/dev35/lib/python3.5/site-packages/ads/utils.py:23: UserWarning: You are lazy loading attributes via 'title', and so are making multiple calls to the API. This will impact your overall rate limits.

Am I using fl= incorrectly?

Internal server error in downloading articles

58a5434 raises an HTTPError; I'm not sure which specific article it's failing at. Maybe add an error check in case an article can't be found?

[willettk@x-131-212-237-196 monthly-institute-publications]$ python stromlo_publications.py 
There were 20 articles found. Sorting and downloading..
Traceback (most recent call last):
  File "stromlo_publications.py", line 50, in <module>
    [ads.retrieve_article(article, output_filename=article.bibcode + ".pdf") for article in sorted_articles]
  File "//anaconda/lib/python2.7/site-packages/ads/core.py", line 535, in retrieve_article
    arxiv_r.raise_for_status()
  File "//anaconda/lib/python2.7/site-packages/requests/models.py", line 851, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error
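
A minimal sketch of the suggested error check, wrapping each download so one failing article doesn't abort the whole batch (sorted_articles as in the example script):

import requests
import ads

for article in sorted_articles:
    try:
        ads.retrieve_article(article, output_filename=article.bibcode + ".pdf")
    except requests.exceptions.HTTPError as e:
        print("Could not retrieve {0}: {1}".format(article.bibcode, e))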

trunk: set sane default `fl`

fl should have sane defaults (bibcode, author, first_author, ...), as the adsws-api returns only an internal integer id for the record if fl is not specified.

Bibtex issue

Expected Behavior

Hello,

I would like to extract the URL of the ADS page associated with each entry. I can get it from the bibtex entry, but bibtex always returns None when queried as a field. More specifically, querying with the following:
papers = list(ads.SearchQuery(author=myname, max_pages=6,fl=['citation_count', 'bibcode', 'title', 'bibtex','year','author']))

always returns None for papers[0].bibtex (as well as for the other entries).

Am I making some error?

Thanks!

Alex


[meta] tag pip releases

andycasey/ads should start tagging releases corresponding to pip versions. This allows for more straightforward bug reports and investigations, see e.g. #76, #77.

This issue does not necessarily relate to how often pip is updated, unless @andycasey wants to comment on that.

Expected Behavior

As a maintainer, I want to know which version of the code a bug/feature report stems from.

Current Behavior

We currently have no git tags, so correlating pip release with commit is extremely difficult.

Possible Solution

Every pip release should result in a git tag. Update the issues.md template file to ask for ads.__version__, which should also be updated on every pip release.

access the Physics abstracts

Hi,

Is there any way to ask that a SearchQuery be done including the Physics abstracts, like in the web form?
Thanks,
Christophe
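
A hedged sketch of how this might look, assuming the database filter is applied via the fq parameter, as seen in other queries on this page (e.g. database:astronomy):

import ads

# assumption: fq='database:physics' restricts the search the way the web form does
papers = ads.SearchQuery(q="star", fq="database:physics")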

Default fields used in SolrQuery

Currently, the default fields used in a SolrQuery are the following:

DEFAULT_FIELDS = ["author", "first_author", "bibcode", "id", "year",
                      "title", "reference", "citation"]

We've had several users run into rate-limit problems because they're requesting information that wasn't retrieved in the initial search and so is lazy loaded, for example "volume". Some ways to fix this are:

  1. Force them to fill in the 'fl' keyword
  2. Extend the default fields
  3. Update the help pages, and emphasise the use of 'fl'

Option 1 means new users can no longer simply start using the article object straight away; it requires some thought up front.

Option 2 isn't as appealing, as operations such as 'citation' can be lengthy and can lead to timeouts if your connection is terrible. This is a major annoyance if you simply need the bibcode.

Option 3 is the least invasive, but means they have to read the docs...

If you have no preference, we're happy to open a PR to address this.

Exception handling in Unicode representation of Articles

In the Article method __unicode__(), the article properties first_author, bibcode and year are used. This can raise an exception if the fields are not included in the original search query; generally for first_author, as no getter exists, or when deferred loading of bibcode and year fails.

Cf. pull request #55 for a more detailed discussion of the issue.

qmetrics: metrics also possible with query

I'm interested in the number of papers published on a certain subject.

Just to let you know, to get this working I added a function called qmetrics to my installed core.py (and hooked it up at line 22 and in __init__.py):

def qmetrics(query, dates=None, database="astronomy or physics", rows=20,
    metadata=False):
    """ Retrieves metrics for a given query. """

    payload = _build_payload(query=query, database=database, rows=rows)

The only problem is that it raises an error when zero results are returned. I'm currently using a try/except to catch that.

TypeError: 'module' object is not callable in top-cited-astronomers.py

I just forked the project and I am getting this error:


$ python top-cited-astronomers.py

Traceback (most recent call last):
  File "top-cited-astronomers.py", line 10, in <module>
    most_cited_papers = ads.search(sort="citations", filter="database:astronomy", rows=200)
TypeError: 'module' object is not callable

Rows parameter and 500 Server Errors

Peter Weilbacher (AIP) reports:

I noticed that your example script
https://github.com/andycasey/ads/blob/master/examples/monthly-institute-publications/stromlo_publications.py
does not run. I get

$ python stromlo_publications.py
There were 20 articles found. Sorting and downloading..
Traceback (most recent call last):
  File "stromlo_publications.py", line 50, in <module>
    [ads.retrieve_article(article, output_filename=article.bibcode + ".pdf") for article in sorted_articles]
  File "/home/pmw/.local/lib/python2.7/site-packages/ads/core.py", line 535, in retrieve_article
    arxiv_r.raise_for_status()
  File "/home/pmw/.local/lib/python2.7/site-packages/requests/models.py", line 831, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error

He also mentions the rows parameter is not behaving as expected. This may be due to a bug that has arisen in my code, or it may be due to the rollout of the new ADS API (which is expected around now).

Pre-requesting BibTeX field returns None

Over the years, my BibTeX library has been slowly mangled by using Papers. I've ended up with different styles of citekeys, and a host of other problems. I decided recently to take the nuclear option by managing my BibTeX library manually. Stumbling across this project was a boon, as I realized I could write up a simple Python script to take my existing database, grab the relevant bibcodes, and generate a clean database of BibTeX entries from ADS.

See my current attempt at https://github.com/lowderchris/ads-bib.

So, with a block of prose out of the way, I'm running into an issue with requesting the BibTeX data. I loop through a list of bibcodes, querying ADS as,

q = list(ads.SearchQuery(bibcode=bc))

I can then throw these BibTeX entries to file with,

with open('bib-lib.bib','a') as bibfile:
    for paper in q:
        print(paper.bibtex, file=bibfile)

While testing with a single bibcode, I noticed that it was warning about lazily loading attributes, notably the BibTeX attribute. Helpfully, the README for this project had just the solution, so I amended the query to:

q = list(ads.SearchQuery(bibcode=bc,fl=['author','title','bibtex']))

While running with this change, the code returned a BibTeX entry of None. Without requesting the BibTeX entry beforehand, everything was output as expected. I decided to ignore the warning temporarily and run the script over a sample of about 200 bibcodes as a test. Unfortunately I'm somehow running into a 'Rate limit exceeded' issue now, even though querying with my API token returned ~4700 remaining inquiries out of 5000... So I can't debug any more at the moment (at least until midnight?), but I figured I would inquire here. Thanks in advance!

I'm using Python 3 for this bit of code.
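
A hedged note: bibtex does not appear to be a Solr field, which would explain the None when it is requested via fl; the BibTeX comes from the export service instead. A minimal sketch using ads.ExportQuery (assuming the export endpoint accepts a list of bibcodes):

import ads

bibcodes = ['2010ApJ...725L..91K']  # hypothetical input list
bibtex = ads.ExportQuery(bibcodes=bibcodes, format='bibtex').execute()
with open('bib-lib.bib', 'a') as bibfile:
    print(bibtex, file=bibfile)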

Author field still "lazy loading" even after being explicitly added to fields

Expected Behavior

If specified in the fl= attribute, a field should not be lazy-loaded when it is accessed.

Current Behavior

The author field appears to still be lazy-loaded, even if it is passed in fl= (the warning message regarding lazy-loading still appears).

Possible Solution

It's possible the field is not being lazy-loaded even though the message indicates this is so.

Steps to Reproduce (for bugs)

  1. Form a query with the author field specified in fl=:

allpapers = ads.SearchQuery(bibcode='2011ApJ...727..107G',
            fl=['id', 'bibcode', 'author'])

  2. Attempt to access the author field, e.g.:

for paper in allpapers:
    print(paper.author)

Context

If the warning message is factual, the API limit is being reached much more quickly than it should be. Also, the number of queries to the server is being doubled, halving the speed of the script.

Your Environment

  • Python 3.5

Article equality operator inconsistencies and bibcode getter

In pull request #55, the following issue appeared:

The test fails for Python 3 only because of Python 2 and 3 differences in the hasattr() function used in the equality operator for articles. Since bibcode is now a deferred property, it will raise an APIResponseError as it cannot dynamically fetch the bibcode property of the LHS article in test_equals(), as that article is empty and hence has no ID set. This exception is, in contrast to Python 2, not caught by Python 3's hasattr() and propagates. The test expects a TypeError raised by the equality operator though.

I think the equality operator for articles should be fixed to behave identically for both Python 2 and 3.

Currently the test issue is resolved by explicitly setting bibcode to None, but the behaviour of the equality operator still differs between Python versions. PR #55 contains a suggested fix by @jonnybazookatone.
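
A minimal sketch of an equality operator that behaves identically on Python 2 and 3 by checking the raw Solr document instead of calling hasattr() over a deferred property (one possible approach, not necessarily the merged fix):

def __eq__(self, other):
    # compare raw bibcodes without triggering deferred loading; raise
    # TypeError consistently when either side is missing a bibcode
    if self._raw.get("bibcode") is None or other._raw.get("bibcode") is None:
        raise TypeError("articles cannot be compared without a bibcode")
    return self._raw["bibcode"] == other._raw["bibcode"]

def __ne__(self, other):
    return not self.__eq__(other)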

Migrate to adsws-api

Hi @andycasey and other contributors,

the ADS would like to update your client to be compatible with the new version of our API. The docs at https://github.com/adsabs/adsabs-dev-api are still 80% accurate, but some fields are different and the authentication mechanism will be slightly different. The docs will be updated to reflect these changes.

I was planning on doing the implementation myself and issue a PR for your review. Any concerns or objections to that?

Request for author metrics search return

Hi Andy,

I'm wondering if you would be able to incorporate more parameters into the metrics function in ads/core.py. Ideally this would behave like the ads.query function and return values based on constraints such as number of rows, start and end dates, database to use, etc.

Cheers,
Valerio

Including libraries in the API

Expected Behavior

I've been thinking about including libraries in the API, or at least deciding how it should function.

Possible Solution

Possible behaviour could be:

>>> lq = LibraryQuery()
>>> libraries = lq.execute()
>>> print(libraries)
[Library<'Library', 'Description', 'Public', 'id: f8dsuf8dsf'>,
Library<....>,
Library<....>,]

Each library would be a Library object with accessible attributes:

>>> lib = libraries[0]
>>> lib.name
'Library'

The attached bibcodes would be Article objects:

>>> lib.bibcodes
[Article<"Author et al., 2012">,
Article<"Author et al. 2014">

You could also request specific libraries if you know their id:

>>> lq = LibraryQuery(id='59fier9fre')
>>> my_library = lq.execute()
Library<'Lib', 'id:59fier9fre'>

If I have time, I'd also include add/removal of bibcodes in the first release:

>>> lib.remove(['2012MNRAS'])
200, success
>>> lib.add([Article<"Author et al. 2014">])
200, success

Context

Provide an API for ADS libraries with the following abilities:

  • Collect all of a user's libraries with bibcodes
  • Query specific libraries via their id
  • Allow add/remove

What the first release would not include:

  • modification of public/private
  • permission handling
  • description/title modification
  • fancy handling of errors from failed add/remove

Retrieve article through url?

I usually need to grab articles from my Zotero database and get their bibtex from ADS.

Is this package able to fetch the bibtex if I pass it the ADS URL of the article?

Big Queries supported?

Are big queries supported?
The examples are kind of blank regarding this.
I found some BIGQUERY_URL in the config.py but that's about it.

How to store Article objects locally?

I'm working on a gamma-ray data collection where we have data for a few hundred papers:
https://github.com/gammapy/gamma-cat

Now we'd like to build a custom webpage and Python client to browse and analyse that data.

As one part of this, we'd like to have basic information (bibcode, title, date, authors, maybe abstract) for these few hundred papers locally available, without making queries to ADS.

My idea was to use this Python package to do the queries and then, for each Article, store __dict__ in a JSON or YAML file so the data is available locally and accessible from the webpage.
Is that the best way, or would it be useful to have a to_dict (and maybe even from_dict) method on Article that does the serialisation and possibly other clever things?
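
A minimal to_dict-style sketch along those lines (the field list is an assumption; anything not requested via fl would otherwise lazy-load and cost extra API calls):

import json
import ads

fields = ["bibcode", "title", "year", "author", "abstract"]
papers = ads.SearchQuery(q="gamma-ray", fl=fields)  # hypothetical query
records = [{f: getattr(p, f) for f in fields} for p in papers]

with open("papers.json", "w") as fh:
    json.dump(records, fh, indent=2)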

Is it legal, or a violation of ADS terms, to extract that much information and store it locally / distribute it via a GitHub repo / second web page (http://gamma-sky.net/)?

Response status 500: TokenStream contract violation: close() call missing

I successfully ran my query earlier today, but now, despite restarting my Jupyter notebook, I am getting a server error response:

<ipython-input-5-0a2c53b10f62> in yearly_counts(query, years, acknowledgements)
     11         papers = ads.SearchQuery(q=full_query % year,
     12                                  fq=filter_query)
---> 13         papers.execute()
     14         count = int(papers.response.numFound)
     15         total_papers = ads.SearchQuery(q=modifiers % year)

/Users/nuneziglesiasj/anaconda/envs/ana3/lib/python3.5/site-packages/ads/search.py in execute(self)
    424         """
    425         self.response = SolrResponse.load_http_response(
--> 426             self.session.get(self.HTTP_ENDPOINT, params=self.query)
    427         )
    428 

/Users/nuneziglesiasj/anaconda/envs/ana3/lib/python3.5/site-packages/ads/base.py in load_http_response(cls, HTTPResponse)
     92         """
     93         if not HTTPResponse.ok:
---> 94             raise APIResponseError(HTTPResponse.text)
     95         c = cls(HTTPResponse.text)
     96         c.response = HTTPResponse
---------------------------------------------------------------------------
APIResponseError                          Traceback (most recent call last)
<ipython-input-7-db47aabb9c53> in <module>()
      1 results = {name: combine_results([yearly_counts(query)
      2                                   for query in queries])
----> 3            for name, queries in languages.items()}

<ipython-input-7-db47aabb9c53> in <dictcomp>(.0)
      1 results = {name: combine_results([yearly_counts(query)
      2                                   for query in queries])
----> 3            for name, queries in languages.items()}

<ipython-input-7-db47aabb9c53> in <listcomp>(.0)
      1 results = {name: combine_results([yearly_counts(query)
----> 2                                   for query in queries])
      3            for name, queries in languages.items()}

<ipython-input-5-0a2c53b10f62> in yearly_counts(query, years, acknowledgements)
     11         papers = ads.SearchQuery(q=full_query % year,
     12                                  fq=filter_query)
---> 13         papers.execute()
     14         count = int(papers.response.numFound)
     15         total_papers = ads.SearchQuery(q=modifiers % year)

/Users/nuneziglesiasj/anaconda/envs/ana3/lib/python3.5/site-packages/ads/search.py in execute(self)
    424         """
    425         self.response = SolrResponse.load_http_response(
--> 426             self.session.get(self.HTTP_ENDPOINT, params=self.query)
    427         )
    428 

/Users/nuneziglesiasj/anaconda/envs/ana3/lib/python3.5/site-packages/ads/base.py in load_http_response(cls, HTTPResponse)
     92         """
     93         if not HTTPResponse.ok:
---> 94             raise APIResponseError(HTTPResponse.text)
     95         c = cls(HTTPResponse.text)
     96         c.response = HTTPResponse

APIResponseError: '{"responseHeader":{"status":500,"QTime":6,"params":{"q":"Matlab year:2005","fl":"id,author,first_author,bibcode,id,year,title","start":"0","fq":["database:astronomy","property:refereed"],"rows":"50","wt":"json"}},"error":{"msg":"java.lang.IllegalStateException: TokenStream contract violation: close() call missing","trace":"java.lang.RuntimeException: java.lang.IllegalStateException: TokenStream contract violation: close() call missing\\n\\tat org.apache.solr.search.AqpExtendedDismaxQParser$AqpExtendedSolrQueryParser.getQuery(AqpExtendedDismaxQParserPlugin.java:1320)\\n\\tat org.apache.solr.search.AqpExtendedDismaxQParser$AqpExtendedSolrQueryParser.getAliasedQuery(AqpExtendedDismaxQParserPlugin.java:1150)\\n\\tat org.apache.solr.search.AqpExtendedDismaxQParser$AqpExtendedSolrQueryParser.getQueries(AqpExtendedDismaxQParserPlugin.java:1203)\\n\\tat org.apache.solr.search.AqpExtendedDismaxQParser$AqpExtendedSolrQueryParser.getAliasedQuery(AqpExtendedDismaxQParserPlugin.java:1118)\\n\\tat org.apache.solr.search.AqpExtendedDismaxQParser$AqpExtendedSolrQueryParser.getFieldQuery(AqpExtendedDismaxQParserPlugin.java:1032)\\n\\tat org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:542)\\n\\tat org.apache.solr.parser.QueryParser.Term(QueryParser.java:299)\\n\\tat org.apache.solr.parser.QueryParser.Clause(QueryParser.java:185)\\n\\tat org.apache.solr.parser.QueryParser.Query(QueryParser.java:107)\\n\\tat org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:96)\\n\\tat org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:152)\\n\\tat org.apache.solr.search.AqpExtendedDismaxQParser.parseEscapedQuery(AqpExtendedDismaxQParserPlugin.java:286)\\n\\tat org.apache.solr.search.AqpExtendedDismaxQParser.parse(AqpExtendedDismaxQParserPlugin.java:185)\\n\\tat org.apache.solr.search.QParser.getQuery(QParser.java:141)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.builders.AqpAdsabsSubQueryProvider$30.parse(AqpAdsabsSubQueryProvider.java:886)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.builders.AqpSubQueryTreeBuilder.build(AqpSubQueryTreeBuilder.java:30)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.builders.AqpSubQueryTreeBuilder.build(AqpSubQueryTreeBuilder.java:13)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.builders.AqpFunctionQueryNodeBuilder.build(AqpFunctionQueryNodeBuilder.java:20)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.builders.AqpFunctionQueryNodeBuilder.build(AqpFunctionQueryNodeBuilder.java:10)\\n\\tat org.apache.lucene.queryparser.flexible.core.builders.QueryTreeBuilder.processNode(QueryTreeBuilder.java:186)\\n\\tat org.apache.lucene.queryparser.flexible.core.builders.QueryTreeBuilder.process(QueryTreeBuilder.java:125)\\n\\tat org.apache.lucene.queryparser.flexible.core.builders.QueryTreeBuilder.process(QueryTreeBuilder.java:118)\\n\\tat org.apache.lucene.queryparser.flexible.core.builders.QueryTreeBuilder.process(QueryTreeBuilder.java:118)\\n\\tat org.apache.lucene.queryparser.flexible.core.builders.QueryTreeBuilder.build(QueryTreeBuilder.java:218)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.builders.AqpQueryTreeBuilder.build(AqpQueryTreeBuilder.java:69)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.builders.AqpQueryTreeBuilder.build(AqpQueryTreeBuilder.java:35)\\n\\tat org.apache.lucene.queryparser.flexible.core.QueryParserHelper.parse(QueryParserHelper.java:258)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.AqpQueryParser.parse(AqpQueryParser.java:237)\\n\\tat 
org.apache.solr.search.AqpAdsabsQParser.parse(AqpAdsabsQParser.java:285)\\n\\tat org.apache.solr.search.QParser.getQuery(QParser.java:141)\\n\\tat org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:148)\\n\\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)\\n\\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\\n\\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)\\n\\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)\\n\\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)\\n\\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)\\n\\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\\n\\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\\n\\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\\n\\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\\n\\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\\n\\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\\n\\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\\n\\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\\n\\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\\n\\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\\n\\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\\n\\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\\n\\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\\n\\tat org.eclipse.jetty.server.Server.handle(Server.java:368)\\n\\tat org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\\n\\tat org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)\\n\\tat org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)\\n\\tat org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)\\n\\tat org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)\\n\\tat org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)\\n\\tat org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)\\n\\tat org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)\\n\\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\\n\\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\\n\\tat java.lang.Thread.run(Thread.java:745)\\nCaused by: java.lang.IllegalStateException: TokenStream contract violation: close() call missing\\n\\tat org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)\\n\\tat org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:307)\\n\\tat org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:145)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.processors.AqpAdsabsExpandAuthorSearchProcessor.getSynonyms(AqpAdsabsExpandAuthorSearchProcessor.java:252)\\n\\tat 
org.apache.lucene.queryparser.flexible.aqp.processors.AqpAdsabsExpandAuthorSearchProcessor.doExpansion(AqpAdsabsExpandAuthorSearchProcessor.java:158)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.processors.AqpAdsabsExpandAuthorSearchProcessor.expandNodes(AqpAdsabsExpandAuthorSearchProcessor.java:127)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.processors.AqpAdsabsExpandAuthorSearchProcessor.postProcessNode(AqpAdsabsExpandAuthorSearchProcessor.java:86)\\n\\tat org.apache.lucene.queryparser.flexible.core.processors.QueryNodeProcessorImpl.processIteration(QueryNodeProcessorImpl.java:99)\\n\\tat org.apache.lucene.queryparser.flexible.core.processors.QueryNodeProcessorImpl.process(QueryNodeProcessorImpl.java:90)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.processors.AqpAdsabsExpandAuthorSearchProcessor.process(AqpAdsabsExpandAuthorSearchProcessor.java:64)\\n\\tat org.apache.lucene.queryparser.flexible.core.processors.QueryNodeProcessorPipeline.process(QueryNodeProcessorPipeline.java:90)\\n\\tat org.apache.lucene.queryparser.flexible.core.QueryParserHelper.parse(QueryParserHelper.java:255)\\n\\tat org.apache.lucene.queryparser.flexible.aqp.AqpQueryParser.parse(AqpQueryParser.java:237)\\n\\tat org.apache.solr.search.AqpAdsabsQParser.parse(AqpAdsabsQParser.java:285)\\n\\tat org.apache.solr.search.AqpExtendedDismaxQParser$AqpExtendedSolrQueryParser.getQuery(AqpExtendedDismaxQParserPlugin.java:1287)\\n\\t... 61 more\\n","code":500}}\n'
Steps to Reproduce (for bugs)

With a valid API key in ~/.ads/dev_key, run the following notebook:
https://github.com/jni/programming-languages-in-astronomy/blob/master/programming-languages-in-ADS.ipynb

Your Environment

  • Version used 0.12.1
  • Python version 3.5.2
  • ADS Token response:
< X-RateLimit-Limit: 5000
< X-RateLimit-Remaining: 4586
< X-RateLimit-Reset: 1477872000

Thanks!

How to get a pandas DataFrame with results?

I'd like to do some analysis with a list of articles obtained with:

articles = list(ads.SearchQuery(author='something'))
  • histogram of articles per year
  • most-cited articles
  • scatter plot of number of authors versus publication date

For this it would be really helpful to have articles as a pandas DataFrame.
I'm trying to get this now, but it's a bit painful with missing values coming back as None, etc.

If someone here has working examples, or even a utility function to put (part of?) the info in the list of articles into a pandas.DataFrame, it would be great if you could share it.
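
In case it helps, a minimal sketch (the column choice is an assumption; None values pass through and can be coerced afterwards):

import pandas as pd
import ads

fields = ["bibcode", "year", "citation_count", "author", "pubdate"]
articles = list(ads.SearchQuery(author="something", fl=fields))

df = pd.DataFrame([{f: getattr(a, f) for f in fields} for a in articles])
df["n_authors"] = df["author"].apply(lambda a: len(a) if a else None)
df["year"] = pd.to_numeric(df["year"], errors="coerce")

df["year"].hist()                         # histogram of articles per year
print(df.nlargest(10, "citation_count"))  # most-cited articles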

ADS_DEV_KEY environment variable and ads.config.token

Hi,

Thanks a lot for your code, it saves hours (days) of work for me and my colleagues!
It seems that once the ADS_DEV_KEY environment variable is defined, any manual change via ads.config.token is not taken into account. When dealing with various tokens, it would perhaps be easier to be able to change the token from ads.config.token, rather than updating the environment variable.
Best regards,
Christophe

Large query fails to return more than 2000 articles

I don't seem to get all the articles desired when executing a large query. Example:

In [1]: import ads
In [2]: qry = ads.SearchQuery(q='pub:"Monthly Notices of the Royal Astronomical Society" pubdate:"2015"', rows=9999999)
In [3]: articles = list(qry)
In [4]: len(articles)
Out[4]: 2000

The number 2000 is suspicious, and arguably wrong, because the API response says there are more:

In [5]: print(qry.response.numFound)
3140

I figured that SearchQuery.__next__() likely thought that the max_pages limit had been reached, but when I repeat the query with a high max_pages limit, I see an IndexError.

In [1]: import ads

In [2]: qry = ads.SearchQuery(q='pub:"Monthly Notices of the Royal Astronomical Society" pubdate:"2015"', rows=9999999, max_pages=99999999)

In [3]: articles = list(qry)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/home/gb/dev/ads/ads/search.py in __next__(self)
    388         try:
--> 389             cur = self._articles[self.__iter_counter]
    390             # If no more articles, check to see if we should query for the

IndexError: list index out of range

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-3-24c62670914a> in <module>()
----> 1 articles = list(qry)

/home/gb/dev/ads/ads/search.py in __next__(self)
    405             self.execute()
    406             print(self.__iter_counter)
--> 407             cur = self._articles[self.__iter_counter]
    408 
    409         self.__iter_counter += 1

IndexError: list index out of range

Can anyone reproduce?

(I am trying to count the fraction of articles that appear on arXiv as a function of journal and year, I hope that's ok.)

Incorporating highlights into the API

The search engine has the capability of returning highlighted pieces of text for searches, for example:

q='abstract:"Gamma-ray burst"'

when requested, Solr will return the relevant highlighted text that caused the document to match:

  "highlighting": {
    "401713": {
      "abstract": [
        "The hypothesis on the <em>γ-ray burst</em> generation in the process of the collapse of surpermassive bodies"
      ]
    },
 ....
 }

The form is highlights: {"id": ["the requested highlights: abstract, title, etc."]}. A few users have requested access to this.

Proposed API

The highlights are query dependent, so my first thought is to keep them connected to the SolrQuery class and not within the Article, as otherwise the Article class would hold state related to its parent query, of which it has no concept. So you could foresee something as simple as:

class SearchQuery(object):
    def __init__(self):
        self._highlights = {}
    def __next__(self):
        self._highlights = response['response']['highlights']
    def highlights(self, article):
        return self._highlights.get(article.id, None)

and then you would access it via the API as:

>>> q = ads.SearchQuery(q='star', hl=True, hl_fl=['abstract'])
>>> p = list(q)
>>> a = p[0]
>>>
>>> q.highlights(a)
["The hypothesis on the <em>γ-ray burst</em> generation in the process of the collapse of surpermassive bodies"]
>>> 
>>> for article in p:
>>>    print('bibcode: {}, query: {}, highlights: {}'.format(article.bibcode, q.query, q.highlights(article)))

Alternative options are welcome, such as a highlights class that is filled and attached to the SearchQuery class, or something else smarter that retains the above prerequisites.

Issues with Article class

Just as an FYI. It would be weird to have something like:

>>> q = ads.SearchQuery(q='star', hl=True, hl_fl=['abstract'])
>>> p = list(q)
>>> p[0].highlights
>>> ["The hypothesis on the <em>γ-ray burst</em> generation in the process of the collapse of surpermassive bodies"]

as this Article class could have many highlights depending on what the query was, so you'd have to keep track of both the query and the article.

Article.volume doesn't save a None value and has to reconnect to ADS

Hi,

I found some very strange behavior. The following code illustrates the problem:
https://github.com/Morisset/SNI_ads/blob/master/test_get_volume.py
I think the problem comes from werkzeug.utils.cached_property, which does not receive a correct value the first time the call to ADS is made. It should store None in the dict, but it stores nothing. It then has to reconnect to ADS to obtain the correct value (not sure this is clear...).
Ch.

Use SOLR cursors for pagination of large set of results

The search engine behind the ADS API is SOLR, which has performance problems with queries that require "deep pagination", i.e. the retrieval of records way down the list of results. The issues related to the degraded performance are explained here: https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

To improve the situation, SOLR implements the notion of cursors (also explained in the page above), which mitigate the pagination problem for queries generating a large set of results that need retrieving. I believe an implementation of multi-page retrieval based on cursors would be more robust for a few reasons:

  1. better performance: avoids recomputing long result list just to fetch items way down in the list (which could generate a timeout)
  2. consistent results: if the index is modified during the follow-up queries, some results might be duplicated or skipped using the current approach, whereas this should not happen with the use of cursors

I tested this approach against our search engine and it seems to work, but it does require sort to include the unique key id as a tie-breaker when creating the list of results (e.g. sort=date asc,id asc).
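
A rough sketch of what cursor-based retrieval could look like against the API endpoint (endpoint and token placeholder as in other examples on this page; whether the API passes nextCursorMark through is an assumption):

import requests

params = {"q": "star", "fl": "bibcode", "rows": 200,
          "sort": "date asc,id asc",  # unique key id as the tie-breaker
          "cursorMark": "*"}
while True:
    r = requests.get("https://api.adsabs.harvard.edu/v1/search/query",
                     headers={"Authorization": "Bearer XXXXXX"},
                     params=params).json()
    for doc in r["response"]["docs"]:
        print(doc["bibcode"])
    next_mark = r.get("nextCursorMark")
    if not next_mark or next_mark == params["cursorMark"]:
        break  # the same cursor returned twice means no further pages
    params["cursorMark"] = next_mark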

Accented characters in author names not encoded properly

Expected Behavior

Author names should have special characters encoded properly (e.g. accented characters like á, ñ, etc.).

Current Behavior

Author names are returned with characters mangled, e.g. "Hรถflich, Peter", "MĂŠndez, J.", "Gonzรกlez-Gaitรกn, S."

Possible Solution

It might be possible to post-process strings to encode them properly, but it's not clear what the encoding is currently.
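
One hedged possibility for post-processing is the third-party ftfy library, which is built to repair exactly this kind of mojibake (whether it recovers these particular strings is untested):

import ftfy  # pip install ftfy

for name in ["MĂŠndez, J.", "Gonzรกlez-Gaitรกn, S."]:
    print(ftfy.fix_text(name))  # attempts to undo mixed-encoding mangling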

Steps to Reproduce (for bugs)

  1. Retrieve the author field for a query where the author name contains an accent, an example publication for this is "2011ApJ...727..107G"

Context

Author names should have the proper accents.

Your Environment

  • Python 3.5

Get Bibtex

When working with LaTeX you'll need the BibTeX snippet that NASA/ADS provides. Is there a built-in command to obtain it? If not, could one be added?
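
As other issues on this page show, each Article exposes a bibtex property (it lazy-loads via one extra API call per article). A minimal sketch, reusing a bibcode from an example above:

import ads

paper = next(ads.SearchQuery(bibcode='2010ApJ...725L..91K'))
print(paper.bibtex)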

limit-request-line parameter

Hi
I am running the following query to get citations of a large list of papers

# list_of_ids looks like this ['arXiv:astro-ph/0206283', 'arXiv:1511.04236']
ads_results = ads.SearchQuery(q=' OR '.join(list_of_ids), fl=['citation_count', 'identifier'])
ads_results.max_pages = 100
for ads_item in ads_results:
    print ads_item.identifier, ads_item.citation_count

This all works, but if I make the list of papers very large, like 200 papers or so, I get the following error:

Traceback (most recent call last):
  ...
  raise APIResponseError(HTTPResponse.text)
ads.exceptions.APIResponseError: u'... <title>Bad Request</title> ... Request Line is too large (4607 > 4094) ...'

which Google tells me is caused by the server's --limit-request-line setting. Is there a way to work around this in this package?
thanks
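
One workaround that keeps the request line short is to batch the identifier list into smaller OR groups (the batch size here is a guess at what stays under the server's 4094-byte limit):

import ads

chunk = 25  # assumption: small enough to keep the request line under the limit
for i in range(0, len(list_of_ids), chunk):
    batch = list_of_ids[i:i + chunk]
    for item in ads.SearchQuery(q=' OR '.join(batch),
                                fl=['citation_count', 'identifier']):
        print(item.identifier, item.citation_count)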

SearchQuery choking on certain ArXiv queries

Expected Behavior

A valid bibcode passed to SearchQuery should return an Article object.

Current Behavior

Certain publications, in particular those with ArXiv bibcodes, seem to return nothing.

Steps to Reproduce (for bugs)

import os
import ads

path = 'ads.key'
if os.path.isfile(path):
    with open(path, 'r') as f:
        ads.config.token = f.read().splitlines()[0]

q = list(ads.SearchQuery(bibcode='2014ApJ...793...38A'))
print(q)

returns an article object, as expected, but change the query to the following bibcode:

q = list(ads.SearchQuery(bibcode='2016arXiv160902927L'))

and an empty list is returned.

Context

This prevents a large number of ADS queries from successfully completing (15 out of 72 that I tested).

Your Environment

Python version 3.5
