requests-cache / requests-cache

Persistent HTTP cache for python requests

Home Page: https://requests-cache.readthedocs.io

License: BSD 2-Clause "Simplified" License

Python 100.00%
cache dynamodb http mongodb performance redis requests sqlite web webscraping

requests-cache's Introduction

Summary

requests-cache is a persistent HTTP cache that provides an easy way to get better performance with the python requests library.

Complete project documentation can be found at requests-cache.readthedocs.io.

Features

  • 🍰 Ease of use: Keep using the requests library you're already familiar with. Add caching with a drop-in replacement for requests.Session, or install globally to add transparent caching to all requests functions.
  • 🚀 Performance: Get sub-millisecond response times for cached responses. When they expire, you still save time with conditional requests.
  • 💾 Persistence: Works with several storage backends including SQLite, Redis, MongoDB, and DynamoDB; or save responses as plain JSON files, YAML, and more
  • 🕗 Expiration: Use Cache-Control and other standard HTTP headers, define your own expiration schedule, keep your cache clutter-free with backends that natively support TTL, or any combination of strategies
  • ⚙️ Customization: Works out of the box with zero config, but with a robust set of features for configuring and extending the library to suit your needs
  • 🧩 Compatibility: Can be combined with other popular libraries based on requests

Quickstart

First, install with pip:

pip install requests-cache

Then, use requests_cache.CachedSession to make your requests. It behaves like a normal requests.Session, but with caching behavior.

To illustrate, we'll call an endpoint that adds a delay of 1 second, simulating a slow or rate-limited website.

This takes 1 minute:

import requests

session = requests.Session()
for i in range(60):
    session.get('https://httpbin.org/delay/1')

This takes 1 second:

import requests_cache

session = requests_cache.CachedSession('demo_cache')
for i in range(60):
    session.get('https://httpbin.org/delay/1')

With caching, the response will be fetched once, saved to demo_cache.sqlite, and subsequent requests will return the cached response near-instantly.
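
If you want to check whether a particular response came from the cache, responses returned by requests-cache carry a from_cache attribute:

from requests_cache import CachedSession

session = CachedSession('demo_cache')
response = session.get('https://httpbin.org/delay/1')
print(response.from_cache)  # False for a response fetched from the network, True for a cache hit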

Patching

If you don't want to manage a session object, or just want to quickly test it out in your application without modifying any code, requests-cache can also be installed globally, and all requests will be transparently cached:

import requests
import requests_cache

requests_cache.install_cache('demo_cache')
requests.get('https://httpbin.org/delay/1')
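
Patching can also be bypassed temporarily or removed entirely:

import requests
import requests_cache

requests_cache.install_cache('demo_cache')

# Temporarily bypass the cache for a block of code
with requests_cache.disabled():
    requests.get('https://httpbin.org/delay/1')

# Remove the patch completely
requests_cache.uninstall_cache()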

Headers and Expiration

By default, requests-cache will keep cached responses indefinitely. In most cases, you will want to use one of the two following strategies to balance cache freshness and performance:

Define exactly how long to keep responses:

Use the expire_after parameter to set a fixed expiration time for all new responses:

from requests_cache import CachedSession
from datetime import timedelta

# Keep responses for 360 seconds
session = CachedSession('demo_cache', expire_after=360)

# Or use timedelta objects to specify other units of time
session = CachedSession('demo_cache', expire_after=timedelta(hours=1))

See Expiration for more features and settings.
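
One of those features is per-URL expiration; a minimal sketch, assuming the urls_expire_after parameter and the NEVER_EXPIRE constant:

from requests_cache import NEVER_EXPIRE, CachedSession

session = CachedSession(
    'demo_cache',
    expire_after=360,  # Default expiration for everything else
    urls_expire_after={
        '*.site.com/static': NEVER_EXPIRE,  # Never expire static content
        'httpbin.org/delay': 60,            # Expire this endpoint after 60 seconds
    },
)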

Use Cache-Control headers:

Use the cache_control parameter to enable automatic expiration based on Cache-Control and other standard HTTP headers sent by the server:

from requests_cache import CachedSession

session = CachedSession('demo_cache', cache_control=True)

See Cache Headers for more details.

Settings

The default settings work well for most use cases, but there are plenty of ways to customize caching behavior when needed. Here is a quick example of some of the options available:

from datetime import timedelta
from requests_cache import CachedSession

session = CachedSession(
    'demo_cache',
    use_cache_dir=True,                # Save files in the default user cache dir
    cache_control=True,                # Use Cache-Control response headers for expiration, if available
    expire_after=timedelta(days=1),    # Otherwise expire responses after one day
    allowable_codes=[200, 400],        # Cache 400 responses as a solemn reminder of your failures
    allowable_methods=['GET', 'POST'], # Cache whatever HTTP methods you want
    ignored_parameters=['api_key'],    # Don't match this request param, and redact it from the cache
    match_headers=['Accept-Language'], # Cache a different response per language
    stale_if_error=True,               # In case of request errors, use stale cache data if possible
)

Next Steps

To find out more about what you can do with requests-cache, see the complete project documentation at requests-cache.readthedocs.io.

requests-cache's People

Contributors

aaron-mf1, andrewkittredge, borisdan, cenviity, chengguangnan, christopher-dg, craigls, dependabot[bot], edgarrmondragon, fdemmer, glensc, jkwill87, jsemric, jwcook, libbkmz, mbarkhau, meowcoder, mgax, mgedmin, michaelbeaumont, olivierdalang, olk-m, parkerhancock, reclosedev, rgant, shiftinv, thatguystone, themiurgo, underyx, yetanothernerd

requests-cache's Issues

Redis storage

I think Redis storage for this cache would be great. I am willing to write one in the future, but I'll probably wait for #7 to be resolved first. I am filing this issue as a reminder for myself / an opportunity for others to express support for this feature / an opportunity for others to implement it & contribute faster than me ;-)
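
(For reference, a Redis backend is available in current releases; a minimal sketch, assuming a Redis server running on the default localhost port and the redis client library installed:)

from requests_cache import CachedSession

# Store cached responses in Redis instead of SQLite
session = CachedSession('demo_cache', backend='redis')
session.get('https://httpbin.org/delay/1')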

Do not invalidate cache if update fails

Hi,

I started using your great library and had an idea. Right now it seems that when a cached request is found to be expired, it is deleted immediately and then re-fetched, but the re-fetch could fail, leaving us with no data at all. Would it be possible to delete the existing cache entry only if the update succeeds? That way we could still fall back to the older data.

Looking at the code in core.py lines 103-104, this looks like a trivial change. This behavior could even be optional.

What do you think?
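
(For reference, this behavior exists in current releases as the stale_if_error option shown in the Settings section above; a minimal sketch:)

from requests_cache import CachedSession

# If refreshing an expired response fails, fall back to the stale cached data
session = CachedSession('demo_cache', expire_after=300, stale_if_error=True)
session.get('https://httpbin.org/delay/1')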

Ignore some parameters

In my project I need a cache that ignores some parameters and hits even if those are different from a previous request. In my specific case, the parameter to be ignored is an api_key of an external service (requests with different api_keys return the same answer).

I am forking and adding this feature. Let me know if you'd like to see this upstream and I'll make a pull-request.
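
(For reference, this is the feature that exists in current releases as ignored_parameters, shown in the Settings section above; a minimal sketch:)

from requests_cache import CachedSession

# Requests that differ only by api_key share one cache entry, and the key is redacted from storage
session = CachedSession('demo_cache', ignored_parameters=['api_key'])
session.get('https://httpbin.org/get', params={'q': 'spam', 'api_key': 'key-1'})
session.get('https://httpbin.org/get', params={'q': 'spam', 'api_key': 'key-2'})  # Cache hit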

Thread safety for sqlite backend

The following exception is raised when requests_cache with the sqlite backend is used in a multithreaded environment:

ProgrammingError: SQLite objects created in a thread can only be used in that same thread....

expire_after parameter, Timestamp and Python 3 (vs Py2)

Hello,

I noticed something interesting but I can't say if it's a Pandas bug, a Numpy bug (about timedelta64) or a requests-cache bug:

import pandas as pd
td=pd.to_timedelta('00:00:15')
import requests_cache
session = requests_cache.CachedSession(cache_name='cache', expire_after=td)

This works fine with Python 2, but with Python 3 it raises:

TypeError: unsupported type for timedelta seconds component: numpy.timedelta64

see pandas-dev/pandas#9353

Kind regards
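
A hedged workaround, assuming pandas' Timedelta API: convert to a plain datetime.timedelta before handing it to requests-cache:

import pandas as pd
import requests_cache

td = pd.to_timedelta('00:00:15')
# to_pytimedelta() converts the pandas/NumPy value to a standard datetime.timedelta
session = requests_cache.CachedSession(cache_name='cache', expire_after=td.to_pytimedelta())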

Caching POST-requests does not take into account the POST-params

I'm using a web service that has no GET requests, only POSTs. The situation is that for the same payload, you get the same results back, so it would be very handy to be able to cache these POST requests.
This seems to be supported already by allowable_methods=('GET', 'POST'), but the POST params are not taken into account.
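
(For reference, a minimal sketch of enabling POST caching with current releases, where the request body is included in the cache key so different payloads get different entries:)

from requests_cache import CachedSession

session = CachedSession('demo_cache', allowable_methods=['GET', 'POST'])

# Different JSON payloads produce separate cache entries
session.post('https://httpbin.org/post', json={'query': 'spam'})
session.post('https://httpbin.org/post', json={'query': 'eggs'})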

Incompatible with requests > 2.4.1 due to new json param

This PR seems to have broken compatibility - https://github.com/kennethreitz/requests/pull/2258

  File "/Users/gkisel/.virtualenvs/my_venv/lib/python2.7/site-packages/requests/api.py", line 99, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/Users/gkisel/.virtualenvs/my_venv/lib/python2.7/site-packages/requests/api.py", line 49, in request
    response = session.request(method=method, url=url, **kwargs)
TypeError: request() got an unexpected keyword argument 'json'

Conditionally cache responses?

Hello,

I'm excited to find the requests-cache module! It's (almost) exactly what I've been looking for.

I have a sort of special use case for this and I'm not quite sure how to proceed.

I'm developing a client for a web service. In the event that my client should throw an exception I would like to associate the exception data, any logs I've accumulated and the cached response together into the sqlite database for debugging and unit testing purposes. I do not want to cache the responses for normal operation because the data is very volatile.

Essentially, I'm only trying to reproduce the environment in which the exception was thrown for unit testing and I think that requests-cache can do 90% of this work for me.

I'm considering doing the following:

  1. Sub-class CachedSession to only store responses but never retrieve them from the cache for normal operation.
  2. Sub-class one of the back-ends to store the extra fields that I need.

Is there an easier, or better way to do this? Any advice would be greatly appreciated.

Thank you.

Caching a query ignoring params order

Currently, the following query has to be cached twice due to python's non-deterministic dict ordering: requests.get("http://example.com/", params={"max": 10, "start": 0})

How can I get around this in requests-cache? I have very large queries with half a dozen params and I'm doing hundreds of them; every time I run my tests again, everything gets cached again and the db contains massive duplicated data.

Requests API changed

When using with requests==1.0.3, I got an error on import time:

...
import requests_cache
  File "/home/.../local/lib/python2.7/site-packages/requests_cache/__init__.py", line 28, in <module>
    from requests_cache.core import(
  File "/home/.../local/lib/python2.7/site-packages/requests_cache/core.py", line 22, in <module>
    _original_request_send = Request.send
AttributeError: type object 'Request' has no attribute 'send'

As Requests is now at version 1, which means the API should not change any more, I think requests-cache can be safely updated for the future. It is even possible that the API has changed in such a way that monkey patching is no longer necessary, but I have not explored it in detail.

Anyway, this is a problem that could (and should) be fixed immediately, at least by pinning install_requires=['requests'] in setup.py to the last version of Requests that works smoothly with requests-cache.

Using mongodb gridfs as a backend?

Mongo GridFS supports storing documents larger than 16 MB.

Here are some code snippets.

import gridfs
from pymongo import MongoClient

# `resp` and `doc` come from the surrounding scraping code
db = MongoClient().gridfs
fs = gridfs.GridFS(db)
fs.put(resp.content, url=doc['url'], status_code=resp.status_code)
fs.find_one({'url': doc['url'], 'status_code': 200})

Python 3.4 Not Working

Does this package work on 3.4? I am trying it, and it does not seem to cache anything when I use Python 3.4.

No expiry date on requests and no access/update times in the DB

Without an expiry date or last access/update time in the DB, I can't perform any housekeeping. I have a sqlite DB that is growing indefinitely, with stale requests being retained forever. Can we have a column added to the DB that records when the entry was accessed or when it will expire, updated each time the data is loaded from the cache? I could then set up a housekeeping cron job to discard invalid request items from the cache.

expire_after parameter should also accept datetime.timedelta

Hello,

It would be nice if the expire_after parameter could also accept a datetime.timedelta
(instead of just seconds).

import requests
import requests_cache
import datetime
requests_cache.install_cache('cache', backend='sqlite', expire_after=datetime.timedelta(hours=1))
requests.get('http://www.google.fr')
requests.get('http://www.google.fr')

You should modify core.py

    if self._cache_expire_after is not None:
        difference = datetime.utcnow() - timestamp
        if difference > timedelta(seconds=self._cache_expire_after):
            self.cache.delete(cache_key)
            return send_request_and_cache_response()

to

    if self._cache_expire_after is not None:
        difference = datetime.utcnow() - timestamp
        if difference > self._cache_expire_after:
            self.cache.delete(cache_key)
            return send_request_and_cache_response()

(assuming that self._cache_expire_after is a timedelta)

and

class CachedSession(OriginalSession):
    def __init__(self, ...):
        ...
        self._cache_expire_after = expire_after
        ...

to

class CachedSession(OriginalSession):
    def __init__(self, ...):
        ...
        try:
            self._cache_expire_after = timedelta(seconds=expire_after)
        except TypeError:
            self._cache_expire_after = expire_after
        ...

Kind regards

simplify HTTPBIN_URL definition

Hi,
in tests/test_cache.py this can be applied to simplify HTTPBIN_URL definition:

-if 'HTTPBIN_URL' not in os.environ:
-    os.environ['HTTPBIN_URL'] = 'http://httpbin.org/'
-
-HTTPBIN_URL = os.environ.get('HTTPBIN_URL')
+HTTPBIN_URL = os.getenv('HTTPBIN_URL', 'http://httpbin.org/')

Add __repr__ (and/or __str__) method

Hello,

It would be nice to provide a __repr__ (and/or __str__) method, at least for CachedSession. For example, it could display expire_after, backend, and cache_name in a "human" way.

This would help when working with requests-cache in IPython notebooks.

Kind regards
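
A minimal sketch of what such a method could look like; the attributes used here (cache_name, backend, expire_after) are hypothetical stand-ins, not the library's actual internals:

import requests


class CachedSession(requests.Session):
    """Illustration only: a session that knows how to describe itself."""

    def __init__(self, cache_name='cache', backend='sqlite', expire_after=None):
        super(CachedSession, self).__init__()
        self.cache_name = cache_name
        self.backend = backend
        self.expire_after = expire_after

    def __repr__(self):
        return '<CachedSession(cache_name={!r}, backend={!r}, expire_after={!r})>'.format(
            self.cache_name, self.backend, self.expire_after
        )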

Something breaks my cache file, then exceptions are encountered

I haven't yet worked out what triggers this, but from time to time my sqlite cache appears to break, after which all calls to requests.get fail with the exception below. Deleting the cache file obviously fixes the problem, but that's not ideal! Have you seen this before?

Traceback (most recent call last):
  File "tool/main.py", line 188, in <module>
    main()
  File "tool/main.py", line 62, in main
    for xls_url in get_excel_urls(download_url(INDEX_URL)):
  File "tool/main.py", line 101, in download_url
    response = requests.get(url)
  File "/home/venv/local/lib/python2.7/site-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/home/venv/local/lib/python2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/venv/local/lib/python2.7/site-packages/requests_cache/core.py", line 111, in request
    hooks, stream, verify, cert)
  File "/home/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 335, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/venv/local/lib/python2.7/site-packages/requests_cache/core.py", line 89, in send
    response, timestamp = self.cache.get_response_and_time(cache_key)
  File "/home/venv/local/lib/python2.7/site-packages/requests_cache/backends/base.py", line 64, in get_response_and_time
    if key not in self.responses:
  File "/home/venv/lib/python2.7/_abcoll.py", line 348, in __contains__
    self[key]
  File "/home/venv/local/lib/python2.7/site-packages/requests_cache/backends/storage/dbdict.py", line 171, in __getitem__
    return pickle.loads(bytes(super(DbPickleDict, self).__getitem__(key)))
  File "/home/venv/lib/python2.7/copy_reg.py", line 50, in _reconstructor
    obj = base.__new__(cls, state)
TypeError: ('dict.__new__(CaseInsensitiveDict): CaseInsensitiveDict is not a subtype of dict', <function _reconstructor at 0x7fe49d3039b0>, (<class 'requests.structures.CaseInsensitiveDict'>, <type 'dict'>, {'Accept-Encoding': 'gzip, deflate, compress', 'Accept': '*/*', 'User-Agent': 'python-requests/1.2.0 CPython/2.7.4 Linux/3.8.0-23-generic'}))

Regression: 0.4.1 appears to blow up when caching compressed objects

I've verified that this bug doesn't occur in 0.4.0.

Versions from pip freeze:

requests==1.2.3
requests-cache==0.4.1

Code to reproduce:

import requests
import requests_cache
requests_cache.install_cache()
requests.get('http://www.eurococoa.com/en/x/271/latest-stats')

Exception encountered:

Traceback (most recent call last):
  File "tool/main.py", line 156, in <module>
    sys.exit(main())
  File "tool/main.py", line 56, in main
    pdf_url = extract_pdf_url(download_url(INDEX_URL))
  File "tool/main.py", line 93, in download_url
    response = requests.get(url)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 111, in request
    hooks, stream, verify, cert)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 335, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 91, in send
    return send_request_and_cache_response()
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 85, in send_request_and_cache_response
    self.cache.save_response(cache_key, response)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/backends/base.py", line 40, in save_response
    self.responses[key] = self.reduce_response(response), datetime.utcnow()
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/backends/storage/dbdict.py", line 168, in __setitem__
    sqlite.Binary(pickle.dumps(item)))
cPickle.UnpickleableError: Cannot pickle <type 'zlib.Decompress'> objects

cPickle.UnpickleableError since 0.4.1

Since 0.4.1, I hit this error:

Traceback (most recent call last):
  File "/usr/bin/howdoi-python2.7", line 9, in <module>
    load_entry_point('howdoi==1.1.4', 'console_scripts', 'howdoi')()
  File "/usr/lib64/python2.7/site-packages/howdoi/howdoi.py", line 217, in command_line_runner
    print(howdoi(args).encode('utf-8', 'ignore'))
  File "/usr/lib64/python2.7/site-packages/howdoi/howdoi.py", line 176, in howdoi
    return get_instructions(args) or 'Sorry, couldn't find any help with that topic\n'
  File "/usr/lib64/python2.7/site-packages/howdoi/howdoi.py", line 143, in get_instructions
    links = get_links(args['query'])
  File "/usr/lib64/python2.7/site-packages/howdoi/howdoi.py", line 68, in get_links
    result = get_result(url)
  File "/usr/lib64/python2.7/site-packages/howdoi/howdoi.py", line 59, in get_result
    return requests.get(url, headers={'User-Agent': random.choice(USER_AGENTS)}, proxies=get_proxies()).text
  File "/usr/lib64/python2.7/site-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/lib64/python2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib64/python2.7/site-packages/requests_cache/core.py", line 111, in request
    hooks, stream, verify, cert)
  File "/usr/lib64/python2.7/site-packages/requests/sessions.py", line 335, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib64/python2.7/site-packages/requests_cache/core.py", line 91, in send
    return send_request_and_cache_response()
  File "/usr/lib64/python2.7/site-packages/requests_cache/core.py", line 85, in send_request_and_cache_response
    self.cache.save_response(cache_key, response)
  File "/usr/lib64/python2.7/site-packages/requests_cache/backends/base.py", line 40, in save_response
    self.responses[key] = self.reduce_response(response), datetime.utcnow()
  File "/usr/lib64/python2.7/site-packages/requests_cache/backends/storage/dbdict.py", line 168, in __setitem__
    sqlite.Binary(pickle.dumps(item)))
cPickle.UnpickleableError: Cannot pickle <type 'zlib.Decompress'> objects

The howdoi developers asked me to report this upstream.

Forcibly refresh existing request in cache

I'm using requests_cache to access a website that gave me a transient "We're down for scheduled maintenance!" message, which is now in the cache.

I'd like to retry if I get that message - which I could do by turning the cache off - but that doesn't fix the cache.

So I'd like to be able to say to requests-cache: "The cache is wrong: fetch a fresh version and cache that."

I suspect the best way of doing this is to pass a parameter to request (core:105), and then, if that's true, send (around core:90) should just return send_request_and_cache_response().

I don't feel confident that this approach wouldn't have subtle issues, so I'm mostly wondering whether this sounds like a reasonable approach?
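
One workaround that doesn't require touching core.py, sketched here on the assumption that the backend exposes delete_url(): drop the bad cached entry for that URL, then re-request it so a fresh response gets cached.

from requests_cache import CachedSession

session = CachedSession('demo_cache')
url = 'https://example.com/status-page'

# Remove the stale entry, then fetch and cache a fresh copy
session.cache.delete_url(url)
response = session.get(url)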

module object has no attribute _RawStore

Hi !
I was hit by this issue that I solved by deleting my caches :/ I'd like to know if there's a way to avoid that.
I was refactoring my code, moving a module up to another one. Suddenly when I ran my scripts I got:

File "/home/vince/.virtualenvs/abelujo/local/lib/python2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/home/vince/.virtualenvs/abelujo/local/lib/python2.7/site-packages/requests_cache/core.py", line 111, in request
    hooks, stream, verify, cert)
  File "/home/vince/.virtualenvs/abelujo/local/lib/python2.7/site-packages/requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/vince/.virtualenvs/abelujo/local/lib/python2.7/site-packages/requests_cache/core.py", line 89, in send
    response, timestamp = self.cache.get_response_and_time(cache_key)
  File "/home/vince/.virtualenvs/abelujo/local/lib/python2.7/site-packages/requests_cache/backends/base.py", line 64, in get_response_and_time
    if key not in self.responses:
  File "/home/vince/.virtualenvs/abelujo/lib/python2.7/_abcoll.py", line 388, in __contains__
    self[key]
  File "/home/vince/.virtualenvs/abelujo/local/lib/python2.7/site-packages/requests_cache/backends/storage/dbdict.py", line 171, in __getitem__
    return pickle.loads(bytes(super(DbPickleDict, self).__getitem__(key)))
AttributeError: 'module' object has no attribute '_RawStore'

That problem goes away if I delete my cache databases. It looks like the database had information about a module or a path and couldn't find it again when unpickling an object. Fortunately the caches aren't vital, but I'd still like to keep them. Do you have an idea of why this problem occurred, and how I could keep my cached content?

Regards,

requests.post()

Hello,

For some reason, when I initialize the requests_cache plugin, the following request fails:

requests_cache.install_cache(cache_name='seo_cache', backend='sqlite', expire_after=86400)
requests.post(url, json=data, headers=headers)

If I comment out the first line, it works fine... Any idea how to fix this?

Thanks!

Here is the error log:

2016-07-13 07:21:12 r = requests.post(URL, json=data, headers=headers)
2016-07-13 07:21:12   File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 94, in post
2016-07-13 07:21:12     return request('post', url, data=data, json=json, **kwargs)
2016-07-13 07:21:12   File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 49, in request
2016-07-13 07:21:12     return session.request(method=method, url=url, **kwargs)
2016-07-13 07:21:12 TypeError: request() got an unexpected keyword argument 'json'

Take into account content negotiation with GET-requests

POST-headers are apparently taken into account when caching requests, but not for GET-requests.

In the case of content negotiation with GET requests, it would be useful if the cache could take into account at least the Accept header, allowing responses in different formats to be cached for the same URI.
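
(For reference, current releases can do this via the match_headers option shown in the Settings section above; a minimal sketch:)

from requests_cache import CachedSession

# Cache a separate response per Accept header, so different representations of the same URI don't collide
session = CachedSession('demo_cache', match_headers=['Accept'])
session.get('https://httpbin.org/get', headers={'Accept': 'application/json'})
session.get('https://httpbin.org/get', headers={'Accept': 'application/xml'})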

Default Cache

Running requests on a machine without sqllite properly installed:

Python 2.7.5 (default, Jun 24 2013, 23:19:27)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests, requests_cache
>>> requests.get('http://www.google.com')
<Response [200]>
>>> requests_cache.install_cache('/tmp/requests_cache')
>>> requests.get('http://www.google.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 43, in request
    session = sessions.Session()
  File "/usr/local/lib/python2.7/site-packages/requests_cache/core.py", line 155, in <lambda>
    **backend_options)
  File "/usr/local/lib/python2.7/site-packages/requests_cache/core.py", line 63, in __init__
    (backend, ', '.join(backends.registry.keys())))
ValueError: Unsupported backend "sqllite" try one of: memory

From what I understand, 'sqllite' is being used as the default:

def install_cache(cache_name='cache', backend='sqlite', ...

However, sqllite is not guaranteed to be installed:

try:
    # Heroku doesn't allow the SQLite3 module to be installed
    from .sqlite import DbCache
    registry['sqlite'] = DbCache
except ImportError:
    DbCache = None

Couple ways to move forward:

  1. make memory default
  2. error out earlier when they specify an unsupported backend (as you can see from the example, it errors out on the request call, not the install_cache)
  3. other thoughts?

Can't create a backend using a backend instance.

According to the documentation, I should be able to pass an instance of a custom backend to install_cache (http://requests-cache.readthedocs.org/en/latest/user_guide.html#persistence).

However, install_cache calls backends.create_backend which only seems to support the registered names, so a ValueError is thrown:

def create_backend(backend_name, cache_name, options):
    if backend_name is None:
        backend_name = _get_default_backend_name()
    try:
        return registry[backend_name](cache_name, **options)
    except KeyError:
        raise ValueError('Unsupported backend "%s" try one of: %s' %
                         (backend_name, ', '.join(registry.keys())))

Am I missing something?
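
(For reference, current releases do accept a backend instance; a minimal sketch, assuming the SQLiteCache backend class:)

from requests_cache import CachedSession
from requests_cache.backends import SQLiteCache

backend = SQLiteCache('custom_cache.sqlite')
session = CachedSession(backend=backend)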

Request throttling example is broken in Python 3

I added () to the print statement in the throttling example here:
https://requests-cache.readthedocs.org/en/latest/user_guide.html#usage

(see https://gist.github.com/danvk/f8c4d9d5cf9fde627ebb)

I'm getting this error:

$ python cache-issue.py
Traceback (most recent call last):
  File "cache-issue.py", line 23, in <module>
    s.get('http://httpbin.org/delay/get')
  File "/Users/danvk/.virtualenvs/osm-segments/lib/python3.5/site-packages/requests/sessions.py", line 480, in get
    return self.request('GET', url, **kwargs)
  File "/Users/danvk/.virtualenvs/osm-segments/lib/python3.5/site-packages/requests_cache/core.py", line 126, in request
    **kwargs
  File "/Users/danvk/.virtualenvs/osm-segments/lib/python3.5/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/danvk/.virtualenvs/osm-segments/lib/python3.5/site-packages/requests_cache/core.py", line 99, in send
    return send_request_and_cache_response()
  File "/Users/danvk/.virtualenvs/osm-segments/lib/python3.5/site-packages/requests_cache/core.py", line 91, in send_request_and_cache_response
    response = super(CachedSession, self).send(request, **kwargs)
  File "/Users/danvk/.virtualenvs/osm-segments/lib/python3.5/site-packages/requests/sessions.py", line 582, in send
    r = dispatch_hook('response', hooks, r, **kwargs)
  File "/Users/danvk/.virtualenvs/osm-segments/lib/python3.5/site-packages/requests/hooks.py", line 31, in dispatch_hook
    _hook_data = hook(hook_data, **kwargs)
TypeError: hook() got an unexpected keyword argument 'proxies'

I'm using requests 2.9.1 and requests_cache 0.4.12.

Make redislite optional

Hi,

Would it be possible to make the dependency on redislite optional?
It can be difficult to compile on some platforms.

TypeError: HTTPHeaderDict is not a subtype of dict

It appears that some time in the last couple of months, something changed within the Python/Requests/requests-cache stack which causes this error to be thrown for all of my apps that use requests-cache and had caches created before the change.

Deleting all the caches caused the problems to go away, but cleaner handling would be nice. If the cache can't be made robust to changes to the underlying stack, perhaps at least a better error message could be given.

At a glance it appears that the problem is likely related to this PR & commit for urllib3 which changed the superclass of HTTPHeaderDict from dict to MutableMapping:
urllib3/urllib3#679
urllib3/urllib3@64adf9f
but it's not clear to me if the breaking change that they made was to a public API and whether it was avoidable, so I'm just going to dump my report here and let the two projects hash it out.

  File "/Users/tfmorris/anaconda/lib/python2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/tfmorris/anaconda/lib/python2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/Users/tfmorris/anaconda/lib/python2.7/site-packages/requests_cache/core.py", line 128, in request
    **kwargs
  File "/Users/tfmorris/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/tfmorris/anaconda/lib/python2.7/site-packages/requests_cache/core.py", line 99, in send
    response, timestamp = self.cache.get_response_and_time(cache_key)
  File "/Users/tfmorris/anaconda/lib/python2.7/site-packages/requests_cache/backends/base.py", line 69, in get_response_and_time
    if key not in self.responses:
  File "/Users/tfmorris/anaconda/lib/python2.7/_abcoll.py", line 388, in __contains__
    self[key]
  File "/Users/tfmorris/anaconda/lib/python2.7/site-packages/requests_cache/backends/storage/dbdict.py", line 163, in __getitem__
    return pickle.loads(bytes(super(DbPickleDict, self).__getitem__(key)))
  File "/Users/tfmorris/anaconda/lib/python2.7/copy_reg.py", line 50, in _reconstructor
    obj = base.__new__(cls, state)
TypeError: ('dict.__new__(HTTPHeaderDict): HTTPHeaderDict is not a subtype of dict', <function _reconstructor at 0x10073da28>, (<class 'requests.packages.urllib3._collections.HTTPHeaderDict'>, <type 'dict'>, {'access-control-max-age': ('Access-Control-Max-Age', '86400'), 'transfer-encoding': ('Transfer-Encoding', 'chunked'), 'access-control-allow-method': ('Access-Control-Allow-Method', 'GET, OPTIONS'), 'server': ('Server', 'nginx/1.1.19'), 'x-ol-stats': ('X-OL-Stats', '"IB 1 0.005 TT 0 0.005"'), 'connection': ('Connection', 'keep-alive'), 'date': ('Date', 'Thu, 03 Sep 2015 23:36:25 GMT'), 'access-control-allow-origin': ('Access-Control-Allow-Origin', '*'), 'content-type': ('Content-Type', 'application/json')}))

Race condition sqlite dbdict saving

When stress testing my site I found a small race condition: my log contained errors like IntegrityError: column key is not unique. This happens when concurrent users request the same URL.

I fixed it with this monkey patch for now.

from sqlite3 import IntegrityError

import requests_cache


def fixes_race_condition__setitem__(self, key, item):
    """
    Monkey patch to prevent a race condition when two identical keys are saved at the same time
    """
    with self.connection(True) as con:
        if con.execute("select key from `%s` where key=?" %
                       self.table_name, (key,)).fetchone():
            con.execute("update `%s` set value=? where key=?" %
                        self.table_name, (item, key))
        else:
            try:
                con.execute("insert into `%s` (key,value) values (?,?)" %
                            self.table_name, (key, item))
            except IntegrityError:
                # Will only happen when two identical keys are saved at exactly the same time,
                # so it is safe to just skip this
                pass

requests_cache.backends.storage.dbdict.DbDict.__setitem__ = fixes_race_condition__setitem__
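
An alternative sketch (same monkey-patch shape, same assumed DbDict internals as above) that avoids the read-then-write window entirely by letting SQLite resolve the conflict atomically:

def upsert__setitem__(self, key, item):
    """Insert or overwrite the row in one statement, so concurrent writers can't collide"""
    with self.connection(True) as con:
        con.execute("insert or replace into `%s` (key, value) values (?, ?)" %
                    self.table_name, (key, item))

requests_cache.backends.storage.dbdict.DbDict.__setitem__ = upsert__setitem__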

Choosing home folder for cache

Hi there!!

This isn't much of an issue, as I am making do by simply giving the cache files unique names, but I am working on a server that manages sessions for multiple users, so I'd like to store the cache files in their own user directories. I wonder if this is manageable.

edit: I'm using sqlite

Thank you
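
For what it's worth, with the sqlite backend the cache name can be a full path, so per-user locations can be given directly; a small sketch (the directory layout here is just an example):

import os

from requests_cache import CachedSession


def make_user_session(username):
    # Hypothetical layout: each user gets their own cache file under their directory
    cache_path = os.path.join('/srv/myapp/users', username, 'http_cache')
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    return CachedSession(cache_path, backend='sqlite')


session = make_user_session('alice')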

SQLAlchemy support / MySQL, PostgreSQL, Oracle, Microsoft SQL Server... support

Hello,

It would be nice if a SQLAlchemy connection (engine) could be passed to requests-cache (maybe via the backend argument). cache_name could then be the name of the table.
see http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html

It will provide more DB support (MySQL, PostgreSQL, Oracle, Microsoft SQL Server...)

Pandas is doing something similar:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.sql.read_sql.html

APScheduler also:
https://apscheduler.readthedocs.org/en/latest/

Kind regards

testing redis and mongo

Hi,
it seems the tests for redis and mongo just try to access those two applications on their default address/port (no specific connection is specified, nor is it possible to do so). I think the best way would be to create an ad-hoc instance of both redis and mongo just for the tests and destroy it right after they finish - what do you think?

What happens to expired items?

Great module. Works as expected and is very performant.

When using the SQLite backend, when a record expires, does the module purge those records from the underlying data store?

If so, is there an API for finding out when we should expect those cycles to be spent?
If not, it might be useful to post some code samples in the docs to help people manage expired items, since store sizes can get out of control pretty quickly with expired items.

Thanks!
Todd
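
For reference, there is an API for this; a minimal sketch, assuming the remove_expired_responses() helper (newer releases expose the same operation on the cache object itself):

from requests_cache import CachedSession

session = CachedSession('demo_cache', expire_after=3600)

# Explicitly purge anything past its expiration from the underlying store
session.remove_expired_responses()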

AttributeError: '_Store' object has no attribute 'read' when stream argument is passed to get

Hello,

Run this code twice:

import requests
import requests_cache

requests_cache.install_cache('cache', backend='sqlite', expire_after=60*5)

response = requests.get('http://finance.yahoo.com/d/quotes.csv?s=AAPL+F&f=l1srs7t1p2', stream=True)

for line in response.iter_lines():
    print(line)

It will run fine on the first try, but on the second it will raise AttributeError: '_Store' object has no attribute 'read'

Kind regards

TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

Hello,

I get this strange error
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

import requests
import requests_cache

filename_cache = "cache"
requests_cache.install_cache(filename_cache, backend='sqlite', expire_after=60) # expiration seconds
url = "http://www.google.fr"
response = requests.get(url)
print(response.text)

Any idea ?

Kind regards

No caching of 3xx responses

Hi & thanks for making requests-cache! I noticed that 3xx are not cached*, probably because they are handled internally by the requests module. Haven't looked into it due to lack of time, will possibly do so later.

  • Even though install_cache is called with allowable_codes=(200,403,404,301,302,303,401,500,503)
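
A hedged workaround: requests resolves redirects internally before requests-cache sees the final response, so one way to cache the 3xx itself is to disable redirect following for that call (allowable_codes as in the report above):

import requests
import requests_cache

requests_cache.install_cache('demo_cache', allowable_codes=(200, 301, 302, 303))

# With allow_redirects=False, the 3xx response itself is returned (and cached) instead of being followed
response = requests.get('https://httpbin.org/redirect/1', allow_redirects=False)
print(response.status_code)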

`pymongo (3.0.3)` does not have `Connection` class

Python 3.4.3 (default, Mar 25 2015, 17:13:50) 
[GCC 4.9.2 20150304 (prerelease)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests_cache.backends.mongo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/projects/citation/env/lib/python3.4/site-packages/requests_cache/backends/mongo.py", line 10, in <module>
    from .storage.mongodict import MongoDict, MongoPickleDict
  File "/root/projects/citation/env/lib/python3.4/site-packages/requests_cache/backends/storage/mongodict.py", line 15, in <module>
    from pymongo import Connection
ImportError: cannot import name 'Connection'
>>> import pymongo
>>> dir(pymongo)
['ALL', 'ASCENDING', 'CursorType', 'DESCENDING', 'DeleteMany', 'DeleteOne', 'GEO2D', 'GEOHAYSTACK', 'GEOSPHERE', 'HASHED', 'IndexModel', 'InsertOne', 'MAX_SUPPORTED_WIRE_VERSION', 'MIN_SUPPORTED_WIRE_VERSION', 'MongoClient', 'MongoReplicaSetClient', 'OFF', 'ReadPreference', 'ReplaceOne', 'ReturnDocument', 'SLOW_ONLY', 'TEXT', 'UpdateMany', 'UpdateOne', 'WriteConcern', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_cmessage', 'auth', 'bulk', 'client_options', 'collection', 'command_cursor', 'common', 'cursor', 'cursor_manager', 'database', 'errors', 'get_version_string', 'has_c', 'helpers', 'ismaster', 'message', 'mongo_client', 'mongo_replica_set_client', 'monitor', 'monotonic', 'network', 'operations', 'periodic_executor', 'pool', 'read_preferences', 'response', 'results', 'server', 'server_description', 'server_selectors', 'server_type', 'settings', 'son_manipulator', 'ssl_support', 'thread_util', 'topology', 'topology_description', 'uri_parser', 'version', 'version_tuple', 'write_concern']

Cache images

Hi,

Is it OK to cache requests that return images?
Or will sqlite just explode when it has 50k cached requests?

Thanks.
