
cachetools's Introduction

cachetools

[Badges: latest PyPI version · documentation build status · test coverage · Libraries.io SourceRank · license · code style: black]

This module provides various memoizing collections and decorators, including variants of the Python Standard Library's @lru_cache function decorator.

import urllib.request

from cachetools import cached, LRUCache, TTLCache

# speed up calculating Fibonacci numbers with dynamic programming
@cached(cache={})
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# cache least recently used Python Enhancement Proposals
@cached(cache=LRUCache(maxsize=32))
def get_pep(num):
    url = 'http://www.python.org/dev/peps/pep-%04d/' % num
    with urllib.request.urlopen(url) as s:
        return s.read()

# cache weather data for no longer than ten minutes
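# (owm is assumed to be an OpenWeatherMap client object, e.g. from pyowm)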
@cached(cache=TTLCache(maxsize=1024, ttl=600))
def get_weather(place):
    return owm.weather_at_place(place).get_weather()

For the purpose of this module, a cache is a mutable mapping of a fixed maximum size. When the cache is full, i.e. when adding another item would exceed its maximum size, the cache must choose which item(s) to discard based on a suitable cache algorithm.

This module provides multiple cache classes based on different cache algorithms, as well as decorators for easily memoizing function and method calls.

Installation

cachetools is available from PyPI and can be installed by running:

pip install cachetools

Typing stubs for this package are provided by typeshed and can be installed by running:

pip install types-cachetools


License

Copyright (c) 2014-2024 Thomas Kemmer.

Licensed under the MIT License.

cachetools's People

Contributors

alejandro-rivera, dependabot[bot], gabibguti, hugovk, kuraga, kurtmckee, mgorny, skirpichev, thetorpedodog, tkem, wimglenn, woutervh


cachetools's Issues

Refactor cache base class

The base Cache class should be made public (again), so that individual caching strategies can be built on top of it. Core features:

  • getsizeof should be a static method returning 1 by default
  • getsizeof should be called only once, at item insertion; this probably means the result has to be stored alongside the item's value
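
A minimal sketch of what such a base class could look like; the attribute names and FIFO eviction here are illustrative, not cachetools' actual internals:

from collections.abc import MutableMapping

class Cache(MutableMapping):

    @staticmethod
    def getsizeof(value):
        return 1  # default: every item counts as one unit

    def __init__(self, maxsize):
        self._data = {}  # maps key -> (value, size)
        self._currsize = 0
        self._maxsize = maxsize

    def __getitem__(self, key):
        return self._data[key][0]

    def __setitem__(self, key, value):
        size = self.getsizeof(value)  # called once, at insertion time only
        if key in self._data:
            del self[key]
        while self._data and self._currsize + size > self._maxsize:
            self.popitem()
        self._data[key] = (value, size)  # size stored alongside the value
        self._currsize += size

    def __delitem__(self, key):
        _, size = self._data.pop(key)
        self._currsize -= size

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def popitem(self):
        key = next(iter(self._data))  # FIFO here; subclasses would override
        value = self._data[key][0]
        del self[key]
        return key, value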

Add @ttl_cache decorator

Similar to the other function decorators, a @ttl_cache decorator should be added. ttl can have an arbitrary default value; 600 (10 minutes) looks good to me.
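
This decorator now exists as cachetools.func.ttl_cache; a short usage sketch (expensive_query is a placeholder, not part of the library):

from cachetools.func import ttl_cache

@ttl_cache(maxsize=128, ttl=600)  # cache results for ten minutes
def lookup(name):
    return expensive_query(name)  # placeholder for some slow operation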

Do not delete expired items in TTLCache.__getitem__

Deleting items in a normally non-mutating method may lead to unexpected behavior, e.g. a RuntimeError when iterating over the cache elements. By simply raising a KeyError for expired elements and leaving the underlying dict as is, this can be avoided. Expired items should be removed on the next mutating operation.
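
A sketch of the proposed __getitem__ behavior (attribute names like _data, _timer and expires are illustrative):

def __getitem__(self, key):
    item = self._data[key]     # raises KeyError for truly missing keys
    if item.expires < self._timer():
        # expired: report as missing, but leave the underlying dict
        # untouched; cleanup happens on the next mutating operation
        raise KeyError(key)
    return item.value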

Refactor unit tests

The CacheTestMixin class works well for cache implementations, but the state of decorator tests could be improved.

mapping doesn't handle dict as an input arg

When trying to use a cache on a function that accepts dict args, an error is raised because dict is not hashable.

>>> import cachetools
>>> @cachetools.ttl_cache(maxsize=1, ttl=1)
... def whatever(a, b):
...     pass
...
>>> whatever({1:"A"}, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cachetools/decorators.py", line 39, in wrapper
    result = cache[key]
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cachetools/ttlcache.py", line 69, in __getitem__
    link = cache_getitem(self, key)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cachetools/cache.py", line 39, in __getitem__
    return self.__mapping[key][0]
TypeError: unhashable type: 'dict'
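
With later cachetools versions, one workaround is to pass a custom key function that converts unhashable arguments into hashable ones before the cache sees them; a sketch (handles only flat dicts):

import cachetools
import cachetools.keys

def dict_safe_key(*args, **kwargs):
    # replace dict arguments with hashable frozensets of their items
    args = tuple(frozenset(a.items()) if isinstance(a, dict) else a
                 for a in args)
    return cachetools.keys.hashkey(*args, **kwargs)

@cachetools.cached(cachetools.TTLCache(maxsize=1, ttl=1), key=dict_safe_key)
def whatever(a, b):
    pass

whatever({1: "A"}, 2)  # no longer raises TypeError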

Add TTLCache

This module needs a cache that supports per-item time-to-live values.

Refactor LFUCache

Use a collections.Counter to keep track of usage counts. Though not the fastest approach possible, Counter has features that should make the implementation quite simple, and this may also serve as an example/blueprint for other cache implementors. Do not use the documented method for retrieving the least common elements, though, since it performs a full sort of the items; use a simple min instead, as sketched below.
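
A sketch along those lines, building on a Cache base class like the one sketched earlier (again illustrative, not the shipped implementation):

import collections

class LFUCache(Cache):

    def __init__(self, maxsize):
        super().__init__(maxsize)
        self._counter = collections.Counter()

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self._counter[key] += 1        # count each successful lookup
        return value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._counter[key] += 0        # ensure every cached key has a count

    def __delitem__(self, key):
        super().__delitem__(key)
        del self._counter[key]

    def popitem(self):
        # a plain min() scan finds the least frequently used item without
        # the full sort that Counter.most_common() would perform
        key, _ = min(self._counter.items(), key=lambda item: item[1])
        value = super().__getitem__(key)  # don't count this access
        del self[key]
        return key, value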

Provide generic @cache decorator

A generic @cachedfunc decorator should work with any mutable mapping, similar to @cachedmethod. One issue is cache_info() support for currsize and maxsize (which dict does not have), but this might be handled with optional "getter" parameters, e.g.:

def cachedfunc(cache_factory, typed=False, currsize=len, maxsize=None, lock=None): ...

Note that to be generally usable as a decorator, the first argument should be a factory function, which could be used like

@cachedfunc(dict)
def myfun(arg1, arg2):
    ...

In the @lru_cache compatible decorators, this could be passed a lambda, e.g.

def lru_cache(maxsize=128, typed=False, getsizeof=None, lock=RLock):
    return cachedfunc(lambda: LRUCache(maxsize, getsizeof),
                      typed,
                      currsize=operator.attrgetter('currsize'),
                      maxsize=operator.attrgetter('maxsize'),
                      lock=lock)

Alternatively, an info() method could be added to the Cache base class, tracking cache hits and misses, but this would probably conflict with how cache implementations use the base class's __getitem__. For example, TTLCache and LRUCache call Cache.__getitem__ during __setitem__, which would increment cache hits/misses.

On the other hand, providing currsize and maxsize for @cachedfunc might call for similar support in @cachedmethod, which has to be evaluated separately.

Allow cachedmethod.cache to return None

If the cache method provided to the @cachedmethod decorator returns None, the wrapped method should be called directly.
Also consider making the cache parameter a property of the wrapper.
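
A sketch of the proposed wrapper logic (simplified; no key function or locking):

import functools

def cachedmethod(cache):
    # `cache` is a callable that maps an instance to its cache, or to None
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args):
            c = cache(self)
            if c is None:
                return method(self, *args)  # no cache: call through
            try:
                return c[args]
            except KeyError:
                pass  # cache miss
            value = method(self, *args)
            try:
                c[args] = value
            except ValueError:
                pass  # value too large
            return value
        wrapper.cache = cache  # expose the cache parameter on the wrapper
        return wrapper
    return decorator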

Improve documentation

Add an example showing how to use getsizeof with sys.getsizeof() and/or len() for sequences/strings.
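
Something along these lines could serve as the example:

import sys
from cachetools import LRUCache

# limit the cache by total number of characters stored, using len()
text_cache = LRUCache(maxsize=100000, getsizeof=len)
text_cache['greeting'] = 'hello world'  # adds 11 to text_cache.currsize

# or limit by approximate memory usage, using sys.getsizeof()
object_cache = LRUCache(maxsize=1048576, getsizeof=sys.getsizeof)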

Add "missing" class decorator

The "missing" stuff is somewhat orthogonal to the rest of the library. Instead of providing this as a constructor argument, this should be made a class decorator, which could be applied to dict as well (details TBD).
Cache classes would only have to provide basic __missing__ support, with the default method raising a KeyError, and the constructor arguments should be deprecated. The documentation should get an extra chapter describing this as an "alternative way" of doing memoization.

Feature request: manually clearing cache

It seems that currently you can only pop individual items by a key.

It also looks like the cache object isn't saved onto the wrapped function by the cachedfunc decorator; that is only done in cachedmethod.
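
Note that the cache classes are mutable mappings, so clearing works out of the box when you hold a reference to the cache object:

from cachetools import LRUCache

cache = LRUCache(maxsize=32)
cache['answer'] = 42
cache.clear()  # inherited from MutableMapping; removes all items
assert len(cache) == 0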

Add proper locking for cache_clear(), cache_info()

Calling cache.clear() must be locked in cache_clear().
We might get away with unlocked access to cache.size and cache.maxsize in cache_info(), but proper locking should be added in case these are implemented as properties.
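
A sketch of the fix; cache, lock, hits, misses and the CacheInfo tuple stand in for what the decorator's closure would provide:

import threading
from collections import namedtuple

from cachetools import LRUCache

CacheInfo = namedtuple('CacheInfo', ['hits', 'misses', 'maxsize', 'currsize'])
cache = LRUCache(maxsize=128)
lock = threading.RLock()
hits = misses = 0

def cache_clear():
    with lock:  # cache.clear() mutates shared state, so it must be locked
        cache.clear()

def cache_info():
    # lock reads as well, in case currsize/maxsize are implemented as properties
    with lock:
        return CacheInfo(hits, misses, cache.maxsize, cache.currsize)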

Update README/introduction docs

cachetools now features much more than cache classes, so the introduction pycon samples should also show use of the decorators and maybe custom keys.

Code cleanup

Perform some code cleanup, to bring this more in line with PEP 8; for example, add underscores to functions that should not be exported.

Report cache.size in cache_info()

CacheInfo.currsize as reported by cache_info() is set to len(cache).
For caches that use getsizeof() to compute their size, this should be changed to cache.size.

Custom exception type for "size too large"

Derived from ValueError, raised if getsizeof(value) > maxsize.

May be caught in decorators to bypass caching. This way, maxsize=0 may be supported, for improved @lru_cache compatibility.
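
A sketch of the idea (the exception name and helper functions are hypothetical):

class CacheValueError(ValueError):
    """Raised when a single value could never fit into the cache."""

def checked_setitem(cache, key, value):
    # where the check would live, e.g. at the top of Cache.__setitem__
    if cache.getsizeof(value) > cache.maxsize:
        raise CacheValueError(key)
    cache[key] = value

def store(cache, key, value):
    # decorators could catch the error to bypass caching, which also
    # makes maxsize=0 usable for @lru_cache compatibility
    try:
        checked_setitem(cache, key, value)
    except ValueError:
        pass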

Evaluate __missing__ methods

  • Add a __missing__ method to the Cache base class, which will be called from __getitem__ if the key cannot be found. Make sure that __missing__ behaves as specified in collections and elsewhere, e.g. it is only called from __getitem__ and not from get. This may involve providing extra methods in the base class, and possibly in derived classes, e.g. __contains__.
  • By default, Cache.__missing__ raises KeyError.
  • Add missing key argument to Cache constructor. If provided, missing(key) will be called from Cache.__missing__, and the return value will be added to the cache (respecting maxsize and possibly discarding items).
  • Adapt other cache classes to support missing key argument. TTLCache will probably need to be modified the most; maybe need to provide ExpiredError support in the Cache base class.
  • Optional: adapt decorators to use missing instead of __setitem__.
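
A sketch of the proposed semantics, with storage simplified to a plain dict (names are illustrative):

from collections.abc import MutableMapping

class Cache(MutableMapping):

    def __init__(self, maxsize, missing=None):
        self._data = {}
        self._missing = missing
        self.maxsize = maxsize

    def __getitem__(self, key):
        try:
            return self._data[key]
        except KeyError:
            return self.__missing__(key)   # only __getitem__ calls this

    def __missing__(self, key):
        if self._missing is None:
            raise KeyError(key)            # the default behavior
        value = self._missing(key)
        self[key] = value                  # goes through __setitem__, so
        return value                       # maxsize is respected

    def __setitem__(self, key, value):
        while len(self._data) >= self.maxsize and key not in self._data:
            self._data.pop(next(iter(self._data)))  # simplistic eviction
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __contains__(self, key):
        return key in self._data           # must not trigger __missing__

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def get(self, key, default=None):
        return self._data.get(key, default)  # get() bypasses __missing__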

Refactor LRU/TTLCache

Regarding performance, it might be desirable to not derive TTLCache from LRUCache, but to duplicate the LRU implementation.

Refactor cache implementations

  • Separate values from support data structures in cache implementations, e.g. LRU and TTL links. This may require more space, but leads to cleaner implementation. Will also make __missing__ implementation more intuitive.
  • In Cache base class, separate cached size from value for similar reasons. If no getsizeof is given, this can be optimized away. Implementation details TBD.

Pickle improvements

Hi,

Thanks for your cache implementation. I was hitting the "maximum recursion limit" when pickling a large LRU cache. I think this is due to the recursive nature of the doubly linked list you use for ordering the values. My workaround was to pickle only this list using a loop, instead of the list+dict (this also saves space).

The code below was added to lru.py to fix the problem. I'm posting it here in case you're interested.

Thanks!
Marco

def __getstate__(self):
    # walk the circular doubly-linked list iteratively, collecting
    # (key, value) pairs from most to least recently used; pickling the
    # linked nodes directly recurses once per node and hits the limit
    c = self.__root
    ordering = []
    while c.prev is not self.__root:
        ordering.append((c.prev.key, c.prev.value))
        c = c.prev
    return {
        "currsize": self.currsize,
        "maxsize": self.maxsize,
        "ordering": ordering,
        "missing": getattr(self, "__missing", None),
        "getsizeof": getattr(self, "__getsizeof", None),
    }

def __setstate__(self, state):
    self.__init__(state["maxsize"], state["missing"], state["getsizeof"])
    # re-insert from least to most recently used to restore the ordering
    for key, value in reversed(state["ordering"]):
        self[key] = value

Move Python3 style decorators to sub module

If/when a generic @cachedfunc decorator gets added, the current function decorators should be moved to a sub module, and documented as add-ons.

Since we can't break the API before v2.0, they should still be available, i.e. imported, by the main module in v1.x.

Exclude expired items when iterating over TTLCache

It may be confusing if expired items show up when iterating over TTLCache.items(), or in its string representation. Therefore, a custom TTLCache.__iter__ method should be defined, which fast-forwards over expired items on start.

TBD:

  • handle expiration during iteration
  • how to handle TTLCache.currsize (maybe ignore for now)
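
A sketch of such an __iter__ (attribute names are illustrative; expiration during iteration is left open, as noted above):

def __iter__(self):
    now = self._timer()  # snapshot the clock once for the whole pass
    for key, item in self._data.items():
        if item.expires > now:  # skip items that have already expired
            yield key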

No lock provided in cachedmethod

In the cachedfunc implementation, you are providing a lock argument; why not in cachedmethod?

It seems that multiple threads could be sharing the same instance of an object and would therefore require a locking facility as well. Am I wrong?

BTW, shouldn't the locking mechanism be implemented in the Cache objects? (In that case, neither cachedfunc nor cachedmethod would need this argument.)
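
For reference, later cachetools releases do accept a lock argument in @cachedmethod; both cache and lock are callables that receive the instance:

import operator
import threading

from cachetools import LRUCache, cachedmethod

class Spam:

    def __init__(self):
        self.cache = LRUCache(maxsize=32)
        self.lock = threading.Lock()

    @cachedmethod(operator.attrgetter('cache'),
                  lock=operator.attrgetter('lock'))
    def egg(self, n):
        return n * n  # placeholder for an expensive computation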

Improve key functions

  • make the hashed tuple's __add__ and __radd__ insert marker objects
  • compute the hash value on demand (lazily)
  • "flatten" tuples in hashkey and typedkey
  • add docs on how to use hashkey with custom key functions, e.g. with a dict argument

TTLCache operations must be atomic/transactional

Some TTLCache operations may behave erratically if a cache item expires in the middle of the operation. This especially concerns the "mixin" methods such as get, pop and setdefault, which have been changed for __missing__ support to first check for item presence using __contains__.

If, for example, a cache item expires after TTLCache.get calls __contains__, but before __getitem__ is called, an exception will be raised unexpectedly.

So all operations should behave as if the timer was frozen at the start of the outermost method call. Since passing optional time arguments to each and every method call is unwieldy, at least for "special" methods, other implementation strategies need to be evaluated.

Note this does not affect normal cache operations, i.e. __getitem__, __setitem__ and popitem, and does not affect decorators.
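
One possible strategy (a sketch, not necessarily how cachetools solves it) is a timer wrapper that can be "frozen" for the duration of the outermost call:

import time

class FrozenTimer:
    """Returns a fixed time while one or more frozen blocks are active."""

    def __init__(self, timer=time.monotonic):
        self._timer = timer
        self._nesting = 0
        self._now = None

    def __call__(self):
        return self._timer() if self._nesting == 0 else self._now

    def __enter__(self):
        if self._nesting == 0:
            self._now = self._timer()  # snapshot once, at the outermost call
        self._nesting += 1
        return self._now

    def __exit__(self, *exc):
        self._nesting -= 1

# a TTLCache method could then wrap its body like this (sketch):
#
#     def get(self, key, default=None):
#         with self._timer:          # freeze time for the whole call
#             if key in self:        # __contains__ and __getitem__ now
#                 return self[key]   # agree on what "now" means
#             return default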

Change Changes to RST

The Changes file is already (almost) in RST format, so it should be renamed to CHANGES.rst, and the link in README.rst should be updated accordingly.

Rewrite @cachedmethod example to use per-object cache

The main advantage of @cachedmethod over the function decorators is that cache properties can be set at runtime. Therefore, the example class should have a constructor with a cachesize parameter. This may also avoid questions regarding locking, though these should be addressed at some point.
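
A possible shape for the rewritten example (the class is hypothetical, but it uses the real @cachedmethod API):

import operator
import urllib.request

from cachetools import LRUCache, cachedmethod

class CachedPEPs:

    def __init__(self, cachesize):
        # each instance gets its own cache, sized at construction time
        self.cache = LRUCache(maxsize=cachesize)

    @cachedmethod(operator.attrgetter('cache'))
    def get(self, num):
        """Retrieve text of a Python Enhancement Proposal."""
        url = 'http://www.python.org/dev/peps/pep-%04d/' % num
        with urllib.request.urlopen(url) as s:
            return s.read()

peps = CachedPEPs(cachesize=10)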

Save hash value for tuples returned by makekey()

Since hashes of tuples need to be recomputed each time, objects returned by makekey could compute the hash once and store it in a slot. See Python 3's lru_cache implementation for details (using list).
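
A sketch of such a tuple subclass, computing the hash lazily and storing it on the instance:

class _HashedTuple(tuple):
    """A tuple that computes its hash value at most once."""

    __hashvalue = None

    def __hash__(self, hash=tuple.__hash__):
        hashvalue = self.__hashvalue
        if hashvalue is None:
            # compute once, then reuse on subsequent dictionary lookups
            self.__hashvalue = hashvalue = hash(self)
        return hashvalue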

Evict callback

Hi,

I feel what is missing from this library is an evict callback. When an item is evicted, you may want to do something with it, for example persist it to storage somewhere, because it might have changed during its lifetime in the cache.

Do you have something like this planned?
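
Until something like that lands, an eviction callback can be approximated by overriding popitem, which the caches call to make room (a sketch; explicit deletions via del would not trigger it):

from cachetools import LRUCache

class NotifyingLRUCache(LRUCache):

    def __init__(self, maxsize, on_evict):
        super().__init__(maxsize)
        self._on_evict = on_evict

    def popitem(self):
        # popitem is what the cache uses to make room, so this fires on
        # evictions (but not on explicit deletions)
        key, value = super().popitem()
        self._on_evict(key, value)
        return key, value

cache = NotifyingLRUCache(maxsize=2, on_evict=lambda k, v: print('evicted', k))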

Update README.rst

Should use same wording as documentation/index.rst for introductory section.
