pymemoize's Introduction

Memoize

This is a (relatively) simple Python memoizing module (i.e. a function cache) in which any dict-like object can be used as the actual storage.

Basics

Let's walk through a simple example. First we need somewhere to cache values. We will use a plain dictionary for this, but you can use anything with a dict-like interface (see the Store Interface section below for the specification), including persistent storage mechanisms.

# Make a store.
store = {}

Now we create the Memoizer object itself. Any keyword arguments will be stored as options to the "default" region (more on this later).

# Initialize the cache object.
from memoize import Memoizer
memo = Memoizer(store)

There is a direct API for retrieving a value. We pass it the key we want, and a function that is used to calculate it.

def basic_func():
    print 'called'
    return 123

# Manually retrieve a value.
memo.get('basic', basic_func)
# stdout > called
# return > 123

# If we ask for the same key it will not run the function again.
memo.get('basic', basic_func)
# return > 123

You can check for a given key, and manually remove something from the cache if you want:

memo.exists('basic')
# return > True

memo.delete('basic')
memo.exists('basic')
# return > False

memo.get('basic', basic_func)
# stdout > called
# return > 123

Function arguments

You can specify the positional and keyword arguments to call the function with:

def adder_func(a, b):
    print 'called'
    return a + b

# Passing args...
memo.get('adder-1', adder_func, (1, 2))
# stdout > called
# return > 3

memo.get('adder-1', adder_func, (1, 2))
# return > 3

# And kwargs...
memo.get('adder-2', adder_func, (), {'a':3, 'b':4})
# stdout > called
# return > 7

Expiring values

Values need not live forever; we can easily supply maximum ages or expiry times for the data.

memo.get('expires', basic_func, max_age=1)
# stdout > called
# return > 123

memo.get('expires', basic_func, max_age=1)
# return > 123

# Let us wait for this to expire...
import time
time.sleep(1.1)

# The function will get called again!
memo.get('expires', basic_func, max_age=1)
# stdout > called
# return > 123

Alternatively, you can pass an expire value, which is the explicit Unix time at which the data should expire.
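The relationship between the two options can be sketched as follows. This is only an illustration, not the library's code: the helper name `expiry_time`, the `now` parameter, and the choice to let an explicit expire value win over max_age are all assumptions.

```python
import time

def expiry_time(max_age=None, expire=None, now=None):
    """Return the absolute Unix time at which a value expires, or None.

    Hypothetical helper: assumes an explicit `expire` takes precedence
    over a relative `max_age`.
    """
    if now is None:
        now = time.time()
    if expire is not None:
        # Already an absolute Unix timestamp.
        return float(expire)
    if max_age is not None:
        # Relative age in seconds, measured from now.
        return now + max_age
    # Neither option given: the value never expires.
    return None
```

Under these assumptions, max_age=60 evaluated at time 1000.0 yields an expiry of 1060.0, while passing neither option yields None.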

You can see how much time is left until something expires, or set an expiration manually:

memo.get('expires', basic_func, max_age=60)
# stdout > called
# return > 123

memo.ttl('expires')
# return > something slightly less than 60

# Wait a bit and see what happens...
time.sleep(1)
memo.ttl('expires')
# return > something slightly less than 59

# Manually set the remaining ttl of the data.
memo.expire('expires', 10)
memo.ttl('expires')
# return > something slightly less than 10

# Set an explicit expiry time.
memo.expire_at('expires', time.time() + 3600)
memo.ttl('expires')
# return > something slightly less than 3600

Etags

If you need to regenerate your content based off an external resource that is not reflected by the arguments of the function when called, then you can use etags.

An etag is any object that represents the current state of the resources that will be drawn upon to generate the final value. If the etag changes then it is assumed that the value is out of date. Note that not providing an etag will not trigger a regeneration.

store = {}
memo = Memoizer(store)

memo.get('etagged', basic_func, etag='a')
# stdout > called

# Doesn't call the function again if the etag is the same.
memo.get('etagged', basic_func, etag='a')

# Change the etag.
memo.get('etagged', basic_func, etag='b')
# stdout > called

You can also supply a function that is called with the same arguments that the function will be called with, and its return value is used as the etag.

state = []
def get_etag():
    return len(state)

memo.get('etagged', basic_func, etagger=get_etag)
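The etag rules described in this section can be summarised in a small standalone sketch. It does not use the library. The function names are hypothetical, and treating None as "no etag supplied" is an assumption.

```python
def current_etag(etag=None, etagger=None, args=(), kwargs=None):
    """Pick the etag for a call (hypothetical helper).

    If an etagger is given, it is called with the same arguments the
    cached function would receive, and its return value is the etag.
    """
    if etagger is not None:
        return etagger(*args, **(kwargs or {}))
    return etag

def needs_regeneration(stored_etag, etag=None):
    """Decide whether a cached value is stale on etag grounds alone."""
    if etag is None:
        # Not providing an etag never triggers a regeneration.
        return False
    return etag != stored_etag
```

With this rule, a value stored under etag 'a' is reused for etag 'a', regenerated for etag 'b', and reused when no etag is passed at all.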

Decoration

The Memoizer object can be applied as a decorator to a function, which will automatically cache its return values keyed on the function name and the arguments provided. This is only reliable as long as the repr of the arguments is deterministic (i.e. no dicts, whose order can change).

The inclusion of arguments into the key is the primary difference between the direct get method, and using the decorator; functions have effectively been memoized!
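To make the idea of keying on the function name and the repr of its arguments concrete, here is a minimal sketch. The helper `make_key` is hypothetical and is not the library's implementation. The `module.name(args)` layout is inferred from the `__main__.func()` key shown in the Namespaces section, and sorting keyword arguments is an assumption made to keep the key deterministic.

```python
def make_key(func, args=(), kwargs=None):
    """Build a cache key from a function and its call arguments."""
    kwargs = kwargs or {}
    # Positional arguments are represented by their repr, in order.
    parts = [repr(a) for a in args]
    # Keyword arguments are sorted by name so the key is stable.
    parts.extend('%s=%r' % (name, kwargs[name]) for name in sorted(kwargs))
    return '%s.%s(%s)' % (func.__module__, func.__name__, ', '.join(parts))
```

Because the key depends on repr, two arguments that are equal but have different reprs would land under different keys, which is why a deterministic repr matters.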

You can manually specify a key as a positional argument if there will be a name collision by another function with the same name.

Note: Most options we have used previously are perfectly valid in the decorator declaration, with the obvious exception of the arguments, as those are supplied by calling the function. Features that we will explain using decorators (e.g. regions, locks, etc.) apply equally to the direct get method.

Basic usage:

@memo
def cached_func():
    print 'called'
    return 456

cached_func()
# stdout > called
# return > 456

cached_func()
# return > 456

Using options:

# This should be obvious as to what it does...
@memo(max_age=60)
def one_minute():
    return 'value'

one_minute()
# return > 'value'

Many of the Memoizer methods have been applied to the wrapped function as well!

one_minute.exists()
# return > True

one_minute.ttl()
# return > slightly less than 60

one_minute.expire(10)
one_minute.ttl()
# return > slightly less than 10

one_minute.delete()
one_minute.exists()
# return > False

Since the cache for the function is keyed by the arguments, you must provide all of the positional and keyword arguments that you are checking against to these methods.

@memo
def adder(a, b):
    return a + b

adder(1, 2)
# return > 3

# This will not work as expected!
adder.exists()
# return > False

# But this does...
adder.exists((1, 2))
# return > True

Care has been taken to key this keeping in mind how the arguments will be accepted by the function. Ergo you can specify positional arguments by a keyword and it will still use the same key.

# Note that this results in the same argument values.
adder.exists((2, ), {'a': 1})
# return > True
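One way to get this kind of equivalence is to bind the call arguments to the function's signature before building the key. The sketch below uses the standard library's `inspect.signature`, which requires Python 3. The helper name is hypothetical and the library's own approach may differ.

```python
import inspect

def normalized_arguments(func, args=(), kwargs=None):
    """Map call arguments to parameter names, in declaration order.

    Hypothetical helper: calls that pass the same values positionally
    or by keyword produce the same result.
    """
    bound = inspect.signature(func).bind(*args, **(kwargs or {}))
    # Fill in defaults so omitted and explicit defaults compare equal.
    bound.apply_defaults()
    return tuple(bound.arguments.items())
```

For a function `adder(a, b)`, calling this with `(1, 2)` or with `(1,)` and `{'b': 2}` gives the same tuple, so both calls could share one cache key.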

Namespaces

A namespace is simply a string prefix for the final key. The prefixed key is only ever seen by the store itself.

store = {}
memo = Memoizer(store, namespace='master')

memo.get('key', str)
store.keys()
# return > ['master:key']

store.clear()
memo.get('key2', str, namespace='2')
store.keys()
# return > ['2:key2']

store.clear()
@memo(namespace='3')
def func():
    pass
func()
store.keys()
# return > ['3:__main__.func()']
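The prefixing shown in these examples amounts to joining the namespace and the key with a colon. A minimal sketch, using a hypothetical helper name and assuming that a missing namespace leaves the key unchanged:

```python
def namespaced_key(key, namespace=None):
    """Prefix a key with its namespace, as seen by the store."""
    if namespace is None:
        # Assumption: no namespace means the key is stored as-is.
        return key
    return '%s:%s' % (namespace, key)
```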

Regions

Regions are sets of default values. A region named "default" is initially created and populated with keyword arguments passed to the constructor.

A region is simply a dictionary within the Memoizer.regions dictionary (mapped by name). It is referenced by name as a kwarg in all methods.

store = {}
memo = Memoizer(store)
memo.regions['short'] = {'max_age': 60}
memo.regions['long'] = {'max_age': 3600}

memo.get('key1', str)
memo.get('key2', str, region='short')
memo.get('key3', str, region='long')

memo.ttl('key1')
# return > None
memo.ttl('key2')
# return > ~60
memo.ttl('key3')
# return > ~3600

A simple form of region inheritance is available: a region may name another region as its "parent" (again, by name). Please see the tests for a demonstration.
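As a rough illustration of what parent-based inheritance could look like, the sketch below walks the parent chain and lets child options override inherited ones. The function name and the exact merge rules are assumptions, so the project's tests remain the authority on the real behaviour.

```python
def resolve_region(regions, name):
    """Merge a region's options with those of its ancestors."""
    chain = []
    seen = set()
    while name is not None:
        if name in seen:
            raise ValueError('circular region parent: %r' % name)
        seen.add(name)
        region = regions[name]
        chain.append(region)
        # Follow the "parent" link, if any, by name.
        name = region.get('parent')
    options = {}
    # Apply the most distant ancestor first so children override it.
    for region in reversed(chain):
        options.update(region)
    options.pop('parent', None)
    return options
```

For example, a 'short' region with `{'parent': 'default', 'max_age': 60}` would keep every option of 'default' except max_age.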

Alternative stores

You can specify the store to use on a global, per-region, or per-call basis.

primary = {}
secondary = {}
tertiary = {}
memo = Memoizer(primary)
memo.regions['secondary'] = {'store': secondary}

memo.get('key1', str)
memo.get('key2', str, region='secondary')
memo.get('key3', str, store=tertiary)

primary.keys()
# return > ['key1']
secondary.keys()
# return > ['key2']
tertiary.keys()
# return > ['key3']

Locking

It is possible that two threads will attempt to generate content at the same time, but that can be a waste of resources. Ergo, you (or the store) can provide a locking implementation to attempt to stem this waste.

If the store has a lock method, that will be used. You can also provide a lock function as an option that will override the store's native lock.

You may provide a float timeout option as well to override the default.

The lock constructor will be called with the key for which a value is about to be calculated. The function MUST return an object with an acquire(timeout) and a release() method.

The acquire method will be called with a single float representing the maximum amount of time to block waiting for a lock, and the boolean value of the return value MUST indicate if the lock was acquired. Any exceptions thrown will not be caught. The release method will be called only if the lock was acquired.
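A lock function meeting this contract could be built on `threading.Lock`, as in the sketch below. The names `KeyLock` and `make_lock` are hypothetical, the per-key registry is one possible design, and the timeout argument to `Lock.acquire` requires Python 3.2 or later. This only protects threads within one process.

```python
import threading

class KeyLock(object):
    """Lock object exposing acquire(timeout) and release()."""

    def __init__(self, lock):
        self._lock = lock

    def acquire(self, timeout):
        # Returns True if the lock was acquired within `timeout` seconds.
        return self._lock.acquire(True, timeout)

    def release(self):
        self._lock.release()

_locks = {}
_locks_guard = threading.Lock()

def make_lock(key):
    """Lock function: called with the key, returns a lock object."""
    with _locks_guard:
        # One shared threading.Lock per key.
        lock = _locks.setdefault(key, threading.Lock())
    return KeyLock(lock)
```

Two lock objects created for the same key contend for the same underlying lock, so a second acquire times out and returns False until the first holder releases.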

Redis

We have provided a small wrapper and lock implementation for use with Redis.

from redis import Redis
db = Redis()

import memoize.redis
store = memoize.redis.wrap(db)

memo = memoize.Memoizer(store)

# Use!

Django's cache framework

If you want to integrate PyMemoize in a Django project, you can use Django's cache framework as a store for memoized values. This way, you can leverage any cache that has a Django integration, such as Redis, Memcached, or even the database cache.

import memoize.djangocache

store = memoize.djangocache.Cache('default')
memo = memoize.Memoizer(store, namespace='memoize')

If you want to use another cache declared in your Django settings, replace 'default' with the name of that cache.

Store Interface

There is a relatively minimal interface that a store must offer to be used with this package.

On keys

Keys are ALWAYS strings.

On stored values

The values stored are ALWAYS tuples. The first item is an integer representing the current protocol version (currently 1). For protocol 1 the fields are:

0: protocol version (always 1, thus far)
1: creation time
2: expiry time
3: etag
4: value

There are constants in memoize.core that hold the index values so you will not have to hardcode these (e.g. CREATION_INDEX and ETAG_INDEX).

Method: Store.__getitem__(key)

Return the requested data tuple. MUST raise a KeyError, or return None, if the key does not exist.

Method: Store.get(key)

Return the requested data tuple. MUST return None if the key does not exist.

Method: Store.__setitem__(key, data_tuple)

Store the data tuple. This may optionally set an expiry time with the store's native method. If the native method does not support float times, then round up to the next usable time so that the store does not expire a value before we intend it to.

For example, if using a store that has second resolution, set the expiry time to: math.ceil(expiry).
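The rounding rule can be written as a one-line helper. The name `native_expiry` is hypothetical, and the sketch assumes the store accepts whole-second Unix timestamps.

```python
import math

def native_expiry(expiry):
    """Round a float Unix expiry time up to the next whole second.

    Rounding up means the store never drops a value before the
    intended expiry time.
    """
    return int(math.ceil(expiry))
```

An expiry of 1700000000.2 becomes 1700000001, while an expiry that is already a whole second is left unchanged.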

Method: Store.__delitem__(key)

Delete the data tuple. MAY raise a KeyError.

Method: Store.lock(key) optional

Return a lock object as specified by the Locking section above.

Method: Store.ttl(key) optional

Return the time-to-live of the given key as a float, or None if it does not exist or will not expire. If the native expiry mechanism does not support float times then take care that the returned value is less than the "real" expiry time.

For example, if using a store that has second resolution, return: native_ttl - 1.
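Putting the interface together, a minimal in-memory store might look like the sketch below. The class name `SimpleStore` is hypothetical. The expiry index of 2 is taken from the protocol 1 field list, but it is defined locally here rather than imported, and storing None for "never expires" is an assumption. The optional lock method is omitted.

```python
import time

EXPIRY_INDEX = 2  # position of the expiry time in a protocol 1 tuple

class SimpleStore(object):
    """In-memory store offering the methods described above."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        # Raises KeyError if the key does not exist.
        return self._data[key]

    def get(self, key):
        # Returns None if the key does not exist.
        return self._data.get(key)

    def __setitem__(self, key, data_tuple):
        self._data[key] = data_tuple

    def __delitem__(self, key):
        del self._data[key]

    def ttl(self, key):
        data = self._data.get(key)
        if data is None or data[EXPIRY_INDEX] is None:
            # Missing key, or a value that never expires.
            return None
        return data[EXPIRY_INDEX] - time.time()
```

This store does not evict expired values itself. It only reports the remaining time to live from the expiry field of the stored tuple.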

pymemoize's People

Contributors

e1ven, eliotberriot, mikeboers, swayf, wolf0403

pymemoize's Issues

Etagger decorator

state = []

@memo
def summarize_state():
    # Do something with `state`.
    pass

@summarize_state.etagger
def summarize_state():
    return len(state)

Redis store cache warming

When using the redis store provider, it would be nice to have a way to "warm the cache" that uses redis bulk operations.

I'm using the memo decorator basically to cache database lookups -- It's kinda slow to store one result at a time.

Add django's cache integration

PyMemoize has Redis support, but I think it could be a good addition to have Django's cache framework support, meaning you can plug it directly into a Django project regardless of the cache used internally by Django (redis, memory, file...).

Imaginary API:

import memoize.djangocache
# django can have various cache, each one having a different name
store = memoize.djangocache.Cache('default')
memo = memoize.Memoizer(store, namespace='memoize')

I might submit a PR soon implementing this.

Python2.5

With the addition of coverage to the test suite, Python2.5 is no longer supported.

Does anyone care if it gets dropped?

Inject memo instance

Hey,

is there a way to inject a memo instance into a class?

I don't like the idea of either accessing it globally or having to create an instance in every file

Is the cache "thread-safe"?

One of the possible scenarios:

  1. One thread is calling the supplier function to calculate the value.
  2. Another thread is requesting the value of the cache afterwards but the first call of the supplier function has not returned a value yet.
    Is the function called a second time? (I hope not.)

`__memokey__` for special keying

Right now, every arg/kwarg passes through repr to get a string for use in the cache key. This may put too much overloading on __repr__ of classes.

Perhaps, a special __memokey__ could help for when it is too difficult to do everything in __repr__.

Clearly document use on methods

Right now it is not documented well (or at all) how to decorate class methods.

This is particularly tricky with regard to how keys are generated from instances (they are repr'd).

Overload etag/etagger (and other potential pairs, e.g.: key/keyer)

If we restrict the types that are allowed to be used as etags (and/or user-specified keys, see #4), then we could allow either a scalar etag or a function to create said etag to be passed via the same etag kwarg.

Potential set of types could be:

  • strings
  • ints
  • bools
  • None
  • tuples of these

Basically, hashable builtin types.

Keying function

It could be nice to be able to provide a function that returns the specific key to use much like etagger does:

def keyer(*args, **kwargs):
    id_ = kwargs.get('id')
    if id_ is not None:
        return id_
    else:
        raise NotImplementedError()

@memo(keyer=keyer)
def render_blog_post(content, **kwargs):
    # Some expensive process to render latex, Markdown, etc.
    pass

# ...

render_blog_post(post.body, id=post.id)

Decorator should fail safe

Scenario: Using redis store, and the store is remote

Expected behavior: If the store is unavailable, catch exceptions, log a warning, and just run the normal function.

Actual behavior: exceptions bubble up

Release latest version to PyPI

Can you release the latest version/current HEAD to PyPI? With the version on PyPI (0.1.1) I cannot memoize methods of class-instances (python 2.7), while this does work with the latest version from master (52d00e8 at the moment.)

Thank you! Great work by the way!

Python3

I ran 2to3 on this, and it was able to run under Python 3.2 nearly unchanged.
It looked like the biggest change it made was replacing iteritems() with items().

Can we do this on your build as well? ;)

Need annotations support

This code:

@memo
def get_by_guid(guid: uuid.UUID) -> Dict[str, str]:
    pass

raises an error on Python 3.5.1:

ValueError: Function has keyword-only arguments or annotations, use getfullargspec() API which can support them
