whitfin / cachex

A powerful caching library for Elixir with support for transactions, fallbacks and expirations

Home Page: https://hexdocs.pm/cachex/

License: MIT License

Topics: caching, transactions, lru, distributed-systems, expiration

cachex's Introduction

Cachex


Cachex is an extremely fast in-memory key/value store with support for many useful features:

  • Time-based key expirations
  • Maximum size protection
  • Pre/post execution hooks
  • Proactive/reactive cache warming
  • Transactions and row locking
  • Asynchronous write operations
  • Distribution across app nodes
  • Syncing to a local filesystem
  • Idiomatic cache streaming
  • Batched write operations
  • User command invocation
  • Statistics gathering

All of these features are optional and are off by default so you can pick and choose those you wish to enable.


Installation

As of v0.8, Cachex is available on Hex. You can install the package via:

def deps do
  [{:cachex, "~> 3.6"}]
end

Usage

In the most typical use of Cachex, you only need to add your cache as a child of your application. If you created your project via Mix (passing the --sup flag) this is handled in lib/my_app/application.ex. This file will already contain an empty list of children to add to your application - simply add entries for your cache to this list:

children = [
  {Cachex, name: :my_cache_name}
]

If you wish to start a cache manually (for example, in iex), you can just use Cachex.start_link/2:

Cachex.start_link(name: :my_cache)
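
Once a cache is running, reads and writes go through the main interface; for example, with the v3 API:

{ :ok, true } = Cachex.put(:my_cache, "key", "value")
{ :ok, "value" } = Cachex.get(:my_cache, "key")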

For anything else, please see the documentation.

Benchmarks

There are some very trivial benchmarks available using Benchee in the benchmarks/ directory. You can run the benchmarks using the following command:

# default benchmarks, no modifiers
$ mix bench

# enable underlying table compression
$ CACHEX_BENCH_COMPRESS=true mix bench

# use a state instead of a cache name
$ CACHEX_BENCH_STATE=true mix bench

# use a lock write context for all writes
$ CACHEX_BENCH_TRANSACTIONS=true mix bench

Any combination of these environment variables is also possible, to allow you to test and benchmark your specific workflows.

Contributions

If you feel something can be improved, or have any questions about certain behaviours or pieces of implementation, please feel free to file an issue. Proposed changes should be taken to issues before any PRs to avoid wasting time on code which might not be merged upstream.

If you do make changes to the codebase, please make sure you test your changes thoroughly, and include any unit tests alongside new or changed behaviours. Cachex currently uses the excellent excoveralls to track code coverage.

$ mix test # --exclude=distributed to skip slower tests
$ mix credo
$ mix coveralls
$ mix coveralls.html && open cover/excoveralls.html

cachex's People

Contributors

adkron, aerosol, apelsinka223, artkay, camilleryr, comboy, drpandemic, ejscunha, elvanja, esse, feld, george124816, hypno2000, imaxmelnyk, ivan, jannikbecher, jung-hunsoo, kianmeng, legoscia, maples7, masashiyokota, princemaple, qwerescape, rrrene, sneako, suprafly, whitfin, zorbash


cachex's Issues

Ensure consistency across :ok/:missing/:loaded

The main CRUD functions deal with the following:

  • :ok if the key was found
  • :missing if the key was not found
  • :loaded if the key was found via fallback

We should expand this to everywhere it's applicable; the use case which triggered this issue is take/3 always returning :ok, but I'm pretty sure it applies to others too.
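
For illustration, the tuple shapes in question look like this (the &lookup/1 fallback is a hypothetical stand-in):

{ :ok, value } = Cachex.get(:my_cache, "present_key")
{ :missing, nil } = Cachex.get(:my_cache, "absent_key")
{ :loaded, value } = Cachex.get(:my_cache, "absent_key", fallback: &lookup/1)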

Automatic discovery of nodes

Hi,

First of all, thanks for this amazing project!

I was wondering whether we could discover nodes automatically using Phoenix Presence, which also solves service discovery across nodes.

Example in the README wrong

The example in the README does not work:

{ :ok, information } = Cachex.get(:info_cache, "/api/v1/packages", fallback: fn(key, db) ->
  Database.query_all_packages(db)
end)

The problem is that the first time, Cachex returns { :loaded, information }, and so it crashes because it doesn't match.

It could be changed to either { status, information } or information = Cachex.get!(...).

It's very hard to match a successful status when both :ok and :loaded indicate success...
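
Applying the first suggestion, the example would match either status:

{ status, information } = Cachex.get(:info_cache, "/api/v1/packages", fallback: fn(key, db) ->
  Database.query_all_packages(db)
end)

# status is :ok on a cache hit and :loaded when the fallback ran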

Add an option to disable write concurrency

Right now it's possible for writes to happen whilst you're working with Transactions.

It would be nice if we could add an option to allow for sequential writes, to avoid them clashing - for example, each write finishing before the next begins, so you can guarantee that your Transaction is executing alone.

This would have to be a cache level setting, because otherwise it'd make no sense. I can't imagine this would be massively used, but it should be fairly easy to implement.
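
For comparison, the existing transactional path looks roughly like this (a sketch assuming the v3-style Cachex.transaction/3, which locks the listed keys for the duration of the function):

Cachex.transaction(:my_cache, [ "key" ], fn(worker) ->
  { :ok, value } = Cachex.get(worker, "key")
  Cachex.put(worker, "key", value + 1)
end)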

Refactor module structure to nest modules appropriately

Less of a concern, but there are some things which currently need moving to make more sense logically. As an example, notifier.ex should live under the hook directory as a child module, because it's only used by Hooks.

Investigate asynchronous fallback actions

Fallbacks are currently executed inside the worker itself, which sucks when you have a long-running fallback. We should investigate how to deal with them asynchronously.

I think that maybe returning { :deferred, task } to the calling process is the way to go, and then you can just await on the task. The only problem with this is that async: true calls will never receive this reply... so it needs thinking about.

This would naturally be optional - it might be that you want to block the worker.

Implement max_size and eviction policies

Based on #49.

Due to #51 being merged, we're now able to implement size limits on a cache without any race conditions. This will first make an appearance in v2.0.

Implementations should use hooks to avoid bloating the Cachex core. Policies to implement should be LRW (default) and LRU. More may come in future, but these are likely the two most popular. Policies based on frequency would be extremely difficult to implement without introducing either performance hits or memory costs.
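
As a sketch only, the configuration shape for such a limit might look like the following; the option names here are illustrative rather than a released API:

Cachex.start_link(:my_cache, [ max_size: 500, eviction_policy: :lrw ])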

Add pooling support to workers

Previously (prior to v1.0.0), Cachex had support for worker pooling. This was stripped out because it was decided that a single GenServer was fast enough. This is true for all ETS-related actions; however, it's not strictly true when the user is working with fallback functions which hit things like remote datastores.

As an example, say you're using Cachex as a local cache of a Redis cache (very common). Your fallback is then going to be a call to Redis. Even though Redis is fast, this may still take (for example) 10ms. Due to the fact that these fallbacks are executed in the worker, it means all other cache actions are blocked for 10ms.

Re-introducing (optional) pooling would help this as you can tune appropriately.

Strip out Mnesia usage in favour of plain ETS/Eternal

Mnesia is overkill for what we now provide; we can do everything (and in a faster way) using plain ETS. The only thing we currently require Mnesia for is Transactions, and so this issue is blocked by #64.

Once #64 is resolved, we should absolutely strip out Mnesia and just use vanilla ETS. This will be a lot of work because a lot of places (mainly tests) use Mnesia specific function calls.

Expiration based deletes should register as purge actions

Right now when we expire on-demand, it fires off as a delete action to any registered hooks. It should technically be a purge. We can make use of the :via flag introduced in #8 to change this easily.

I'm going to mark this as a bug as it technically makes cache stats look invalid (evictions rather than expirations), so it gives you false ideas of what your cache is doing.

Add additional operations such as List/Set operations

Not that this is a bad thing by itself at all, but I am wondering if there is any plan to support List and Set operations in the future?

I believe it would open up many more use cases for Cachex, which arguably would go beyond just caching.

Update documentation and specs on all functions

Right now there are chunks of documentation which are out of date, they need to be updated ASAP. In addition, there are a number of missing @spec tags which need adding. We might open up the documentation on Hex in v2.x, so makes sense to start preparing.

Define cache size limit

I guess we should have a way to set a cache size limit, to avoid reaching the server's RAM limit. I could not find any options to define this.

Can't replicate data to another node

Can I replicate data across two connected nodes? I created a simple Elixir project to play with the cache and then started two Elixir nodes in separate IEx sessions. I connected the nodes, start_link'ed both of them, and called Cachex.add_node(...), but I can only see cached values which were stored on their respective nodes. "key"'s value is not visible from node 'two' and "key2"'s value is not visible from node 'one'. Screen captures of these sessions are attached for reference. What could be the issue?

[screenshots: the two IEx sessions, captured 2016-04-27]

Convert Hook messages from { :func, args... } to { :func, [ args ] }

After speaking to @martinsvalin on Slack, I realised I should've done this to begin with.

Rather than { :get, :my_cache, "key" }, we can do { :get, [ :my_cache, "key" ] }. This means that you can match on the action much more easily in function heads, which is always super fun.

This might actually be cheaper in terms of performance too, so win-win!

Naturally this drops in v2.0 as it's totally breaking for hooks. Need to make sure to update documentation and migration guides. Also results in rewriting a fair few of the internal hook implementations.
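
To illustrate the benefit, a hook could then match on the action directly in a function head; handle_notify/3 is used here as an assumed callback shape:

def handle_notify({ :get, [ _cache, key ] }, _results, state) do
  IO.puts("get called for #{key}")
  { :ok, state }
end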

Enum.group_by/3 with a map/dictionary as second element is deprecated

Just upgraded to Elixir 1.3.0

warning: Enum.group_by/3 with a map/dictionary as second element is deprecated, please use a map instead
  (elixir) lib/enum.ex:1000: Enum.group_by/3
  (cachex) lib/cachex/hook.ex:68: Cachex.Hook.hooks_by_type/2
  (cachex) lib/cachex/options.ex:94: Cachex.Options.setup_hooks/2
  (cachex) lib/cachex/options.ex:44: Cachex.Options.parse/1
  (cachex) lib/cachex.ex:1050: Cachex.parse_options/1

Line: https://github.com/zackehh/cachex/blob/master/lib/cachex/hook.ex#L62
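
For reference, the non-deprecated form on Elixir 1.3+ is Enum.group_by/2 (or /3 with a value function), which returns a map directly; a quick sketch against a hypothetical hook list:

hooks = [ %{ type: :pre }, %{ type: :post } ]

# returns %{ pre: [...], post: [...] } without the deprecated dict argument
Enum.group_by(hooks, & &1.type)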

All delegate methods should be forked out inside the cache Worker

Right now there's a blend of delegates in that some delegate inside the Worker (expire_at/persist) and some delegate in Cachex (decr).

It should be ensured that all of these delegates do take place inside the Worker. It adds a bit of extra and arguably redundant code, but it allows the hook notifications to contain the correct function. As it stands right now, if you call Cachex.decr(:my_cache, "key") your notification becomes { :incr, "key" }, which totally sucks (and is why this is marked as a bug). Anyone listening for decr calls will just never get them.

Alternatively, perhaps we could add a recognisable :from or :via to all the worker functions, so if you call something like Cachex.incr(:my_cache, "key", via: :decr), it notifies using { :decr, "key" } (overriding the function name).

Migrate Hooks to live as GenServer and not GenEvent

GenEvent really doesn't do much for us here; the Hooks operate more as GenServers. There's a fair bit of bloat to Hook setup because of this, so we should just fall back to using GenServer. In future, perhaps GenStage, but servers will suffice for now.

We should be able to keep the same interface, but this is going in v2 just in case.

Merge Worker.Local and Worker.Remote into Worker.Actions

We can infer whether we wish to use Remote actions vs non-remote actions at runtime, which means that we can delegate internally instead of having to maintain two different action sets.

This is a step towards removing the distributed interfaces.

Add a stop function to terminate a cache

As much as I don't think this will be used often, it'd be nice to provide a way to terminate everything related to a cache in a single call - e.g. kill Janitors, Hooks, Tables, etc.

Shouldn't be too hard, so should do it for v2.

Overhauling of the remote connection and replication implementations

Right now there's a neat little bug with remote connections (in that they don't work at all). This is just down to a bad Mnesia config and can easily be fixed.

However, I found this during a run through of various replication scenarios and realised that we don't handle node dropping very well, at all. I have no good ideas at this point of how to fix, but I'm dumping all thoughts into this issue to get them on "paper".

Ok, so things to do out of this:

  • Add a Cachex.add_node/2 function to add nodes dynamically (this also helps from remote consoles)
  • Add a way to update a working state, for example if a local node becomes remote
  • Ensure that tables are replicated around - this will stop crashes on node drops

Neaten up how Cachex tasks are executed

Right now we have to alias Tasks internally if we want a Cachex context. It would be better if we could just execute mix cachex <task> to provide a context.

This would remove a chunk of the internal codebase whilst also expanding the tasks we support running in context.

Post purge hooks?

Is there a way to be notified when entries are purged? Really great project btw, using it more and more.

Adopt the use of GenDelegate

Recently I split out the macros inside gen_server.ex into their own module named gen_delegate. Cachex should use this module as a dependency rather than keeping a copy internally.

The migration should be as simple as use GenDelegate and requiring it as a dependency.

Add the ability to touch keys on read

A common use case for in-memory key/value is LRU caching. We can support this quite easily, but it would be more convenient if Cachex provided a way to "touch" a key automatically. This can currently be done using the refresh/3 function, but it would be nice to provide it as an option on all retrieval commands.

I'm thinking :refresh_on_read as a cache-based option, so you always touch, and then a :refresh option on each of the read operations (which defaults to false).

Perhaps s/refresh/touch on those above options, need to think about it more.
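
Using the (still hypothetical) option names proposed above, usage might look like:

# touch on every read for this cache
Cachex.start_link(:my_cache, [ refresh_on_read: true ])

# or opt in per call
Cachex.get(:my_cache, "key", refresh: true)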

Revisit the Stats hook to ensure that we're correctly representing operations

Right now we just track get/set and a couple of other things. It would be better if the Stats hook were modified to track all CRUD groups (at least).

Perhaps it would be nice to track the count of results in some form. We could keep track of results on a per-action basis (this is just noted for my memory).

%{
  "clear" => 500,
  "exists?" => %{
    "true" => 10,
    "false" => 20
  },
  "get" => %{
    "ok" => 10,
    "missing" => 20,
    "loaded" => 15
  }
}

Maybe then we could support stats retrieval on a per-command basis which allows us to provide additional info about that stat.

Cachex.stats(:my_cache)                 # very high level stats (op count, etc)
Cachex.stats(:my_cache, :get)           # include things like hit rate using info in the `:get` key
Cachex.stats(:my_cache, [ :get, :set ]) # same as above but for both actions
Cachex.stats(:my_cache, :raw)           # everything we have collected (the struct above)

Hooks should be provided access to the Worker state

As it currently stands, there is no way to call a cache from within a hook callback. This could prove valuable for synchronous hooks - i.e. you might want to retrieve a key quickly before it's removed.

As of v0.9.0, the Cachex interface accepts a worker and does not have to pass between procs to execute actions. This means that we can allow cache calls inside hooks using the worker, and everything should flow smoothly.

I'd imagine the best way to do this is to pass the worker into init/1 and allow the developer to choose whether to keep it around or not.
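
A minimal sketch of that shape, assuming the worker is simply handed to the hook at startup:

def init({ worker, opts }) do
  # keep the worker around so callbacks can call back into the cache
  { :ok, %{ worker: worker, opts: opts } }
end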

Convert the default Hook type to be :post

It's less frequent that you want to fire a hook before the action (I'd think). The default in the struct should be :post going forwards. This makes it less confusing when you attempt to use results: true and nothing happens.

:random module deprecated

This happens while compiling cachex:

==> cachex
Compiling 20 files (.ex)
warning: random:uniform/1: the 'random' module is deprecated; use the 'rand' module instead
  lib/cachex/notifier.ex:46
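
The fix is a one-line swap to the 'rand' module:

:random.uniform(10) # deprecated
:rand.uniform(10)   # replacement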

Expand the options available in Cachex.Inspector

Right now we just have basics such as checking memory usage and retrieving state, we should add some of the following (this list will be updated as things become useful):

  • total size of the expired keyspace
  • last time the ttl loop ran (if applicable)
  • total run time of the last ttl loop (if applicable)

Integrate code linting into CI builds

Recently I've been using Credo in other projects and it seems like a good idea to set it up with Cachex so that people have a guide when contributing. It should also be embedded into Travis builds to make sure all code fits the lint guides.

Open up the documentation for every module

Right now only the main interface goes into the Hex documentation. Realistically we should open all of it, just with a note stating that they're for internal use only.

Reimplement Transactions without using Mnesia

Now that we're no longer using distribution via Mnesia, we should jump to using straight ETS tables.

This presents a challenge, because of transaction support, which we don't want to remove - so we need an implementation which functions in the same way (although we don't need to provide abort/1 I think), without using Mnesia.

This will take shape by using a special lock table per cache, which allows for either a global lock, or key locks. If the key being operated on is locked, then calls to that key just sit until it's ready. This needs heavy documentation to make it clear that this will block all calls (if you block the table, obviously).

There's naturally a slight performance hit to this even if you're not using transactions. Due to this, transactions should be a flag you can turn on or off. If they're disabled, then you can't use them (and thus there's zero hit). If it's enabled, you can use them, with the understanding that there's likely a couple of microseconds perf hit on every command. I haven't yet decided whether it should be enabled by default or not (I'm leaning towards not).

It's very possible that this won't ever be done/merged, due to complexity. Also, the above is a scramble of thoughts in my head - I'll neaten up the notes here once it's more solid.
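
For what it's worth, a minimal sketch of the per-key lock table described above (the module name and API are hypothetical, not Cachex internals):

defmodule LockTable do
  def setup, do: :ets.new(:locks, [ :named_table, :public ])

  # insert_new/2 is atomic, so only one process can hold a key at a time
  def acquire(key) do
    case :ets.insert_new(:locks, { key, self() }) do
      true ->
        :ok
      false ->
        :timer.sleep(1) # key is held elsewhere; wait and retry
        acquire(key)
    end
  end

  def release(key), do: :ets.delete(:locks, key)
end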

Move to storing state in ETS tables and execute in the calling process

This is a big win because we get to strip out the internal GenServer, which is awesome because sending the messages is like 90% of the actual overhead of Cachex.

This issue can't be resolved without either a minor (likely 1.2), or more likely a major (2.0), so no timeline on when this will be done. It's not yet guaranteed if this will actually be done, I'm just considering it.

The basic idea is to move all state into ETS itself so we read from ETS instead of storing in a GenServer (see the sketch after the lists below). There are a lot of factors to this, and they're described in #39.

Reasons for doing this:

  • Any retrieval actions block the caller anyway, so this is no different.
  • A Cachex state changes only rarely, so we're pretty much just reading memory.
  • Huge performance boosts when moving to this model (right now you're looking at 9 microseconds, vs < 1 microsecond if using ETS).
  • The GenServer becomes an unnecessary bottleneck when using fallback functions.
  • It's much easier for the user to configure their own Tasks and Concurrency.
  • There should actually be very limited impact to the interface, only that of execute/3.

Things to consider:

  • Async behaviours are removed, because everything executes in the calling process. Not sure if this means a Major bump is required. (done, and no - it can be v1.2)
  • Timeouts are removed, so same as above because they do nothing. (done)
  • We'll need Macros to avoid constantly getting/setting state. (not macros, but wrappers - done)
  • What should the ETS table look like? (just cache -> state - done)
  • What happens if we crash? (done via Eternal)
  • Synchronous hooks need to maintain their behaviour - they might clash with messages being passed by the user. (no change needed).
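
A minimal sketch of the model, assuming a public state table (names hypothetical):

# a public table holding one state record per cache
:ets.new(:cachex_states, [ :named_table, :public, { :read_concurrency, true } ])
:ets.insert(:cachex_states, { :my_cache, %{ table: :my_cache, options: [] } })

# any calling process can now read the state without a GenServer hop
[ { :my_cache, state } ] = :ets.lookup(:cachex_states, :my_cache)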

Provide a way to iterate through a cache

Although I wouldn't recommend this as a heavily-used operation, it does make sense to have a way to iterate the cache (similar to Redis' SCAN command family).

It'd be neat if there were a function of Cachex.stream(cache, options) which can be used to iterate a cache in such fashion:

:my_cache |> Cachex.stream! |> Enum.to_list

This would allow various filtering and whatnot, so that people can filter cache values and operate on them appropriately. For example, if you wanted to delete all keys beginning with "a":

list_of_a =
  :my_cache
  |> Cachex.stream!
  |> Enum.filter(fn { key, _value } ->
       String.starts_with?(key, "a")
     end)

Cachex.execute(:my_cache, fn(worker) ->
  Enum.each(list_of_a, fn { key, _value } -> Cachex.del(worker, key) end)
end)

Remove the notion of distributed nodes

Going forward, Cachex is going to operate as a local cache only. The distribution can be replaced by using something like Redis and fallback operations. This decision was taken to allow extra flexibility in caches (such as #49) and remove several race conditions.

Naturally this will hit in 2.x as it's breaking.

Add the ability to reset a cache to an initial state

I have recently hit a use case where it would be nice to have an option to reset a cache, e.g. Cachex.reset/2.

Reset would re-initialize all hooks and empty the cache.

I'm thinking of three options:

  • async: the default option which casts instead of calling
  • only: a list of things to reset, e.g. [ :cache, :hooks ]
    • so only: :hooks would reset hook states but not the cache itself
  • hooks: a list of hooks to reset, if you wish to only reset certain hooks.

Here are a couple of examples:

Cachex.reset(:my_cache) # wipes cache, resets all hooks
Cachex.reset(:my_cache, only: :hooks) # only resets the hooks of a cache, leaves it filled
Cachex.reset(:my_cache, hooks: [ Cachex.Stats ]) # only resets the stats hook

Run through tests and improve comments/descriptions/coverage

The tests have 100% line coverage, but I'm not convinced that all cases are covered. We should go through and improve them as it makes sense to - and make sure to comment exactly what each test is supposed to be covering (sometimes the title is not enough).

All hooks should link to the Cachex supervisor, not the main process

As it currently stands, hooks are started in the main process rather than inside the Cachex Supervisor. This means that hooks can crash without being restarted, causing backlogs in message queues and eventual memory issues.

This needs resolving ASAP as starting a cache from a spawn will have all your hooks die on startup, even though your cache will still be alive.

Allow an option to disable on-demand expiration

In cases where you're not overly concerned about 100% accuracy, it might be ok to skip checking TTLs on reads (i.e. leaving everything to the Janitor) in order to speed up reads on the whole (by avoiding TTL comparisons on every call).

For this case, there should be an option to disable on-demand expirations in reads - naturally it would default to being enabled (and will probably usually be enabled), but it would be a nice to have.

Synchronous hook timeouts should be implemented server side

Right now if there's a hook timeout, we just stop waiting for a reply. This is bad because the server might actually be blocked. We should have the server implement the timeout so it can throw away messages.

This is pretty easy, just spawn out into a Task, await it until the timeout - if it hasn't come back in time, kill it.
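
A minimal sketch of that server-side timeout using Task (notify_hook/1 and timeout are hypothetical stand-ins):

task = Task.async(fn -> notify_hook(msg) end)

case Task.yield(task, timeout) || Task.shutdown(task, :brutal_kill) do
  { :ok, result } -> result
  nil -> :timeout # the hook overran, so the task has been killed
end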

Perform a performance pass on each action

There are several places the code is bloated and runs slower than it could/should. Should run over each action and profile to see if there's anything we can touch up. Some of the util functions are bloated.

This isn't super important, but would be nice to optimise where possible.

Unify as many actions in the Worker as possible

Based on commit 81dc6aa, it's a pain to fix a bug in N places (where N is the number of worker sets).

I have previously thought about enforcing very plain actions in the Workers, i.e. read/write, etc. This would allow the main Worker interface to define most of the actual cache-specific actions, and the Workers pretty much just become CRUD ops on a cache.

Not proposing overkill on this, but just the basics of both read/write would be nice to start.

Require the name of a cache as the first argument at startup

It doesn't make sense to have a required argument as an option, so we should adjust Cachex.start_link/1 to be Cachex.start_link/2. We can provide backwards compatibility so it can go in v1.2.

Example:

Cachex.start_link([ name: :test, record_stats: true ])
Cachex.start_link(:test, [ record_stats: true ])

Add the ability to touch a key's write time

This is a prerequisite to having good LRU.

Slightly different to refresh in that we need to calculate any potential TTL deltas to make sure that it still expires at the same time.

For example:

# assume now is 1000ms

set({ "key", 1000, 500, "value" }) # would expire in 500ms

:timer.sleep(100)

touch("key") # this should become { "key", 1100, 400, "value" } so it still expires in 400ms
