
groupcache's Introduction

groupcache

A modified version of groupcache with support for context.Context, Go modules, and explicit key removal and expiration. See the CHANGELOG for a complete list of modifications.

Summary

groupcache is a caching and cache-filling library, intended as a replacement for memcached in many cases.

For API docs and examples, see http://godoc.org/github.com/mailgun/groupcache/v2

Modifications from original library

  • Support for explicit key removal from a group. Remove() requests are first sent to the peer that owns the key, then the remove request is forwarded to every other peer in the groupcache. NOTE: This is a best-effort design, since a temporary network disruption could result in remove requests never making it to their peers. In practice this scenario is very rare and the system remains very consistent. In case of an inconsistency, placing an expiration time on your values will ensure the cluster eventually becomes consistent again.

  • Support for expired values. SetBytes(), SetProto() and SetString() now accept an optional time.Time which represents a time in the future when the value will expire. If you don't want expiration, pass the zero value for time.Time (for instance, time.Time{}). Expiration is handled by the LRU cache when a Get() on a key is requested, so no network coordination of expired values is needed. However, this does require that clocks on all nodes in the cluster are synchronized for consistent expiration of values. (A short sketch of expiration and removal follows this list.)

  • Now always populating the hotcache. A more complex algorithm is unnecessary when the LRU cache will ensure the most used values remain in the cache. The evict code ensures the hotcache never overcrowds the maincache.
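
A minimal sketch of an expiring value and an explicit removal, using only the APIs described above; the group name, key, and the fetchFromDB helper (returning []byte) are placeholders, and ctx handling is abbreviated:

group := groupcache.NewGroup("sketch", 1<<20, groupcache.GetterFunc(
    func(ctx context.Context, key string, dest groupcache.Sink) error {
        value, err := fetchFromDB(ctx, key) // placeholder loader
        if err != nil {
            return err
        }
        // Expire 10 minutes from now; pass time.Time{} for no expiration.
        return dest.SetBytes(value, time.Now().Add(10*time.Minute))
    },
))

// Later, explicitly drop the key from the owner and every peer.
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()
if err := group.Remove(ctx, "some-key"); err != nil {
    log.Printf("remove failed: %v", err)
}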

Comparing Groupcache to memcached

Like memcached, groupcache:

  • shards by key to select which peer is responsible for that key

Unlike memcached, groupcache:

  • does not require running a separate set of servers, thus massively reducing deployment/configuration pain. groupcache is a client library as well as a server. It connects to its own peers.

  • comes with a cache filling mechanism. Whereas memcached just says "Sorry, cache miss", often resulting in a thundering herd of database (or whatever) loads from an unbounded number of clients (which has resulted in several fun outages), groupcache coordinates cache fills such that only one load in one process of an entire replicated set of processes populates the cache, then multiplexes the loaded value to all callers.

  • does not support versioned values. If key "foo" is value "bar", key "foo" must always be "bar".

Loading process

In a nutshell, a groupcache lookup of Get("foo") looks like:

(On machine #5 of a set of N machines running the same code)

  1. Is the value of "foo" in local memory because it's super hot? If so, use it.

  2. Is the value of "foo" in local memory because peer #5 (the current peer) is the owner of it? If so, use it.

  3. Amongst all the peers in my set of N, am I the owner of the key "foo"? (e.g. does it consistent hash to 5?) If so, load it. If other callers come in, via the same process or via RPC requests from peers, they block waiting for the load to finish and get the same answer. If not, RPC to the peer that's the owner and get the answer. If the RPC fails, just load it locally (still with local dup suppression).
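
To make step 3 concrete, here is a small sketch of two concurrent callers asking for the same key; assuming a group and ctx set up as in the example below (plus the standard sync and log packages), the Getter runs once and both callers receive the same bytes:

// Sketch only: concurrent Get calls for the same key are de-duplicated.
var wg sync.WaitGroup
for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        var data []byte
        if err := group.Get(ctx, "foo", groupcache.AllocatingByteSliceSink(&data)); err != nil {
            log.Printf("get failed: %v", err)
            return
        }
        log.Printf("got %d bytes", len(data))
    }()
}
wg.Wait()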

Example

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "time"

    "github.com/mailgun/groupcache/v2"
)

func ExampleUsage() {

    // NOTE: It is important that the peer URL passed to `NewHTTPPoolOpts` (here `http://192.168.1.1:8080`)
    // is also provided to `pool.Set()` so the pool can identify which of the peers is our instance.
    // The pool will not operate correctly if it can't identify which peer is our instance.
    
    // Pool keeps track of peers in our cluster and identifies which peer owns a key.
    pool := groupcache.NewHTTPPoolOpts("http://192.168.1.1:8080", &groupcache.HTTPPoolOptions{})

    // Add more peers to the cluster. You MUST ensure our instance is included in this list, else
    // determining who owns the key across the cluster will not be consistent, and the pool won't
    // be able to determine if our instance owns the key.
    pool.Set("http://192.168.1.1:8080", "http://192.168.1.2:8080", "http://192.168.1.3:8080")

    server := http.Server{
        Addr:    "192.168.1.1:8080",
        Handler: pool,
    }

    // Start an HTTP server to listen for peer requests from the groupcache
    go func() {
        log.Printf("Serving....\n")
        if err := server.ListenAndServe(); err != nil {
            log.Fatal(err)
        }
    }()
    defer server.Shutdown(context.Background())

    // Create a new group cache with a max cache size of 3MB
    group := groupcache.NewGroup("users", 3000000, groupcache.GetterFunc(
        func(ctx context.Context, id string, dest groupcache.Sink) error {

            // Returns a protobuf struct `User`
            user, err := fetchUserFromMongo(ctx, id)
            if err != nil {
                return err
            }

            // Set the user in the groupcache to expire after 5 minutes
            return dest.SetProto(&user, time.Now().Add(time.Minute*5))
        },
    ))

    var user User

    ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond*500)
    defer cancel()

    if err := group.Get(ctx, "12345", groupcache.ProtoSink(&user)); err != nil {
        log.Fatal(err)
    }

    fmt.Printf("-- User --\n")
    fmt.Printf("Id: %s\n", user.Id)
    fmt.Printf("Name: %s\n", user.Name)
    fmt.Printf("Age: %d\n", user.Age)
    fmt.Printf("IsSuper: %t\n", user.IsSuper)

    // Remove the key from the groupcache
    if err := group.Remove(ctx, "12345"); err != nil {
        log.Fatal(err)
    }
}

Note

The call to groupcache.NewHTTPPoolOpts() is a bit misleading. NewHTTPPoolOpts() creates a new pool internally within the groupcache package, where it is utilized by any groups created. The pool returned is only a pointer to the internally registered pool, so the caller can update the peers in the pool as needed.
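
For example, if your discovery mechanism changes the set of peers at runtime, you can keep calling Set() on the returned pool. A rough sketch, where discoverPeers() is a placeholder for DNS, Kubernetes, or any other discovery mechanism and must always include this instance's own URL:

// Sketch only: refresh the peer list periodically.
go func() {
    for range time.Tick(30 * time.Second) {
        pool.Set(discoverPeers()...)
    }
}()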

groupcache's People

Contributors

adg, bradfitz, costela, davidmcleish, derekperkins, develar, edwardbetts, elimisteve, evie404, fumin, haraldnordgren, icholy, kevinburke, lorneli, luciferous, luit, marnixbouhuis, mattharr013, mdentonskyport, nathanejohnson, nf, nsheridan, pierrre, rneillj, ryanslade, shawnps, thrawn01, tippjammer, two, udhos


groupcache's Issues

[Feature Request]: Redis SetNX() equivalent

Thank you for providing your updated version!

Do you think it is possible to implement a safe Redis SetNX() equivalent (only set the key if it does not already exist)?

How would you achieve this?

Groupcache fails to distribute across peers

I have the following application:

package main

import (...)

var store = map[string]string{}

var group = groupcache.NewGroup("cache1", 64<<20, groupcache.GetterFunc(
	func(ctx groupcache.Context, key string, dest groupcache.Sink) error {

		v, ok := store[key]
		if !ok {
			return fmt.Errorf("key not set")
		} else {
			if err := dest.SetBytes([]byte(v), time.Now().Add(10*time.Minute)); err != nil {
				log.Printf("Failed to set cache value for key '%s' - %v\n", key, err)
				return err
			}
		}

		return nil
	},
))

func main() {
	addr := flag.String("addr", ":8080", "server address")
	peers := flag.String("pool", "http://localhost:8080", "server pool list")
	flag.Parse()

	p := strings.Split(*peers, ",")
	pool := groupcache.NewHTTPPoolOpts(*addr, &groupcache.HTTPPoolOptions{})
	pool.Set(p...)

	http.HandleFunc("/set", func(w http.ResponseWriter, r *http.Request) {
		key := r.FormValue("key")
		value := r.FormValue("value")
		store[key] = value
	})

	http.HandleFunc("/cache", func(w http.ResponseWriter, r *http.Request) {
		key := r.FormValue("key")

		fmt.Printf("Fetching value for key '%s'\n", key)

		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		defer cancel()

		var b []byte
		err := group.Get(ctx, key, groupcache.AllocatingByteSliceSink(&b))
		if err != nil {
			http.Error(w, err.Error(), http.StatusNotFound)
			return
		}
		w.Write(b)
		w.Write([]byte{'\n'})
	})

	go func() {
		if err := http.ListenAndServe(*addr, nil); err != nil {
			log.Fatalf("Failed to start HTTP server - %v", err)
		}
	}()

	termChan := make(chan os.Signal)
	signal.Notify(termChan, syscall.SIGINT, syscall.SIGTERM)
	<-termChan
}

I run two instances of this application:

go run main.go -addr=:8081 -pool=http://127.0.0.1:8081,http://127.0.0.1:8082
go run main.go -addr=:8082 -pool=http://127.0.0.1:8081,http://127.0.0.1:8082

Can you explain why I see this behavior?

curl -X GET "localhost:8081/set?key=key1&value=val1"
curl -X GET "localhost:8081/cache?key=key1"
> val1
curl -X GET "localhost:8082/cache?key=key1"
> error "key not set"

but, if I reverse the operations and issue them against the second server I get

curl -X GET "localhost:8082/set?key=key1&value=val1"
curl -X GET "localhost:8082/cache?key=key1"
> val1
curl -X GET "localhost:8081/cache?key=key1"
> val1

Why is this? Shouldn't groupcache return the value previously read by any peer (i.e. the cache is shared by the cache group) as long as no error happened with the dest.SetBytes(..) call?

[feature-request] Provide an option to set http client options, like headers or maybe update the http req.

I would like to pass contextual data to the Getter interface Get function via ctx.

For example, using tracing tools like Datadog, the client HTTP headers need to be added before the HTTP request is sent and pulled off when the loading function is run. That could allow complete application visibility:
https://docs.datadoghq.com/tracing/setup_overview/custom_instrumentation/go/#distributed-tracing

It could also allow the caller to pass other context-related information into the fill function. It seems like the receiving side is already possible by providing a custom Context func:
https://github.com/mailgun/groupcache/blob/master/http.go#L74

However, it does not look like there is any way currently to add onto the HTTP client request:
https://github.com/mailgun/groupcache/blob/master/http.go#L244
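
For reference, the receiving side mentioned above might look roughly like the sketch below; this assumes the pool's Context hook has the upstream-style signature func(*http.Request) context.Context, withTraceID and the header name are hypothetical placeholders, and the matching hook on the outgoing client request (what this issue actually asks for) would still be missing:

// Sketch only: enrich the context handed to the Getter with data pulled off
// the incoming peer request.
pool := groupcache.NewHTTPPoolOpts("http://192.168.1.1:8080", &groupcache.HTTPPoolOptions{})
pool.Context = func(r *http.Request) context.Context {
    return withTraceID(r.Context(), r.Header.Get("X-Example-Trace")) // placeholder helper
}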

Question(s)

Hello,

Thank you for making this fork available!

Since there is very little documentation available on groupcache I thought I might get some insight from someone who is actually running it in production.

Basically, I want to know what happens when one or more groupcache nodes go offline temporarily. Will the groupcache cluster be able to recover and continue operating properly?

Support to Attach to Already Running Server

Hi,

There is already an HTTP server running on a port xxxxx. Is it possible to attach groupcache to the same server? I see that it's sending a request to "http://_groupcache//". Should we handle this and call the groupcache get in each node? I hope this wouldn't go in a loop. Any help on the same is much appreciated.

rgds
Baljai Kamal Kannadassan

Remove only works with the last peer of the list

This code captures only the last iteration of the for loop. When the goroutines finally run, the variable peer has already changed to its last value.

		// Asynchronously clear the key from all hot and main caches of peers
		for _, peer := range g.peers.GetAll() {
			// avoid deleting from owner a second time
			if peer == owner {
				continue
			}

			wg.Add(1)
			go func() {
				errs <- g.removeFromPeer(ctx, peer, key)
				wg.Done()
			}()
		}

You can see the same effect here: https://play.golang.org/p/I82KNfRkqSP
Go vet will also highlight the error:

./prog.go:16:19: loop variable peer captured by func literal

The solution here is easy though. The smallest working change is to pass the peer as an argument to the goroutine: https://play.golang.org/p/0mNV3nSkGUy
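
Applied to the loop above, a minimal fix is to give each goroutine its own copy of peer, either by passing it as an argument (as the linked playground does) or by shadowing the loop variable as sketched here:

		// Asynchronously clear the key from all hot and main caches of peers
		for _, peer := range g.peers.GetAll() {
			// avoid deleting from owner a second time
			if peer == owner {
				continue
			}

			peer := peer // copy the loop variable so the goroutine sees this iteration's value
			wg.Add(1)
			go func() {
				errs <- g.removeFromPeer(ctx, peer, key)
				wg.Done()
			}()
		}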

Provide example for configuring a Kubernetes pool

It's mentioned here that you could use Kubernetes as a discovery mechanism, but then the example links to a different mailgun project.

It would be nice if there was an example for how to use Kubernetes while integrated with groupcache, not gubernator.

Move global state to explicit var to allow deallocation?

Hi,

I have a use case for, while not restarting the application, bringing down groupcache allocated resources (let the GC release them), then recreate those resources with new configurations.

I see the global state is the obvious roadblock for that scenario.

So I started to sketch a change to allow recreation of groupcache resources. I am keeping existing APIs unchanged and adding some alternative APIs with support for an extra argument for explicit state ("workspace") provided by the caller. Hence the caller would be able to drop references to workspace in order to allow the garbage collector to get rid of its resources.

Draft PR: #65

What do you think? Would this be generally useful for a wider audience?

Cheers,
Everton

Deadlocks after panic inside Getter

If the Getter's Get method panics, the singleflight mechanism is left in an unclean state (missing a Waitgroup.Done() call), which causes all subsequent Gets of the same key to block indefinitely.

MR with a solution suggestion incoming.
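
Until this is fixed in the library, one caller-side mitigation is to make sure the Getter never panics, for example by wrapping it so a panic becomes an ordinary error. A rough sketch (not the fix proposed in the MR), assuming the usual context, fmt, and groupcache imports:

// Sketch only: convert panics inside the real getter into errors so the
// singleflight bookkeeping always completes.
func recoveringGetter(inner groupcache.GetterFunc) groupcache.GetterFunc {
    return func(ctx context.Context, key string, dest groupcache.Sink) (err error) {
        defer func() {
            if r := recover(); r != nil {
                err = fmt.Errorf("getter panicked for key %q: %v", key, r)
            }
        }()
        return inner(ctx, key, dest)
    }
}

Wrapping the function passed to NewGroup with recoveringGetter keeps a panicking load from wedging other callers of the same key, though the proper fix belongs in the library itself.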

Authorization to refactor this library

Hello @thrawn01,

Thanks for the great piece of work of cloning the original version and enhancing it. I would like to know whether I can copy this library, refactor it and publish using the same license, however under a different module name.

getgroup from different main instances will it work ?

func NewGroupCacheHandler(cfg *viper.Viper) (*GroupCacheHandler, error) {
var (
endpoint = cfg.GetString("storage.kv.groupcache.endpoint")
ports = cfg.GetStringSlice("storage.kv.groupcache.ports")
cacheTTL = cfg.GetInt64("storage.kv.groupcache.cacheTTL")
cacheServers []*http.Server
group *groupcache.Group
)

var kvHandler KVHandler
var getterFunc groupcache.GetterFunc
kvType := cfg.GetString("storage.kv.groupcache.db")
getterFunc = func(ctx context.Context, key string, dest groupcache.Sink) error {

	resp, err := kvHandler.Get(key)
	if err != nil {
		return err
	}
	if resp == nil {
		return fmt.Errorf("key not found in etcd: %s", key)
	}
	jsonResp, _ := json.Marshal(resp)
	dest.SetBytes(jsonResp, time.Now().Add(time.Duration(cacheTTL)*time.Minute))
	return nil

}
switch kvType {
case "etcd":
	etcdHandler, err := NewEtcdHandler(cfg)
	if err != nil {
		return nil, err
	}
	kvHandler = etcdHandler
case "badger":
	badgerHandler, err := NewBadgerHandler(cfg)
	if err != nil {
		return nil, err
	}
	kvHandler = badgerHandler
default:
	return nil, fmt.Errorf("unsupported kvType: %s", kvType)
}

pool := groupcache.NewHTTPPool("")
for _, port := range ports {
	pool.Set("http://" + endpoint + ":" + port)
	server := &http.Server{
		Addr:    endpoint + ":" + port,
		Handler: pool,
	}
	cacheServers = append(cacheServers, server)
	go func(srv *http.Server) {
		log.Printf("Serving cache server %s \n", srv.Addr)
		if err := srv.ListenAndServe(); err != nil {
			log.Fatal(err)
		}
	}(server)
}

group = groupcache.GetGroup("data")
if group == nil {
	fmt.Println("Error getting group from groupcache:")
	group = groupcache.NewGroup("data", 3000000, getterFunc)
}

handler := &GroupCacheHandler{
	KVHandler: kvHandler,
	Group:     group,
	cacheTTL:  cacheTTL,
}

return handler, nil

}

I am running multiple main instances that use this function. What I need is to be able to reach the cache on one instance from another instance. The problem is that when I'm doing GetGroup it doesn't work. Does anyone have a solution?

Peer pods autodiscovery within kubernetes cluster

Hi,

I am experimenting with a simple standalone package to automatically keep pod peers up-to-date within a kubernetes cluster:

https://github.com/udhos/kubegroup

Recipe currently looks like this:

groupcachePort := ":5000"

// 1. get my groupcache URL
myURL, errURL = kubegroup.FindMyURL(groupcachePort)

// 2. spawn groupcache peering server
pool := groupcache.NewHTTPPool(myURL)
server := &http.Server{Addr: groupcachePort, Handler: pool}
go func() {
    log.Printf("groupcache server: listening on %s", groupcachePort)
    err := server.ListenAndServe()
    log.Printf("groupcache server: exited: %v", err)
}()

// 3. spawn peering autodiscovery
go kubegroup.UpdatePeers(pool, groupcachePort)

// 4. create groupcache groups, etc: groupOne := groupcache.NewGroup()

Does it look reasonable!?

Deadlock in groupcache lookup

func main() {
                pool := groupcache.NewHTTPPoolOpts("http://x.x.x.x:8380", &groupcache.HTTPPoolOptions{})
                pool.Set("http://y.y.y.y:8280")
                server := http.Server{
                        Addr:    "localhost:8380",
                        Handler: pool,
                }
                go func() {

When the other instance at y.y.y.y is configured the same way, with its pool Set to x.x.x.x, it goes into a deadlock while looking up the db if the entry is not found. If on y.y.y.y I don't Set x.x.x.x as a peer, things work fine.

Hence, precisely:

A -> B
B -> A

Is the above not valid? Should we make sure the configuration doesn't happen this way, i.e. avoid the circular loop?

rgds
Kamal

Add the ability to warm the cache

Right now each time we deploy services using groupcache we lose all the data that's warm and it's like we are running a cold start again.

A mechanism to support the warming of the cache of known items would be super convenient!

Prometheus metrics

Hi!

Is there any publicly available code for easily exposing groupcache metrics for Prometheus?

I am sketching up the package below, but maybe I am reinventing the wheel here?!

Please advise!

// Usage example
{
    metricsRoute := "/metrics"
    metricsPort := ":3000"

    log.Printf("starting metrics server at: %s %s", metricsPort, metricsRoute)

    mailgun := mailgun.New(cache)
    labels := map[string]string{
        "app": appName,
    }
    namespace := ""
    collector := groupcache_exporter.NewExporter(namespace, labels, mailgun)

    prometheus.MustRegister(collector)

    go func() {
        http.Handle(metricsRoute, promhttp.Handler())
        log.Fatal(http.ListenAndServe(metricsPort, nil))
    }()
}

Full details: https://github.com/udhos/groupcache_exporter

Support batching multiple `Get()` requests in a single HTTP request

This would work just like mailgun/gubernator's PeerClient, which waits a few microseconds and batches requests together into a single request. This technique has dramatically improved the throughput of gubernator; I imagine groupcache would also benefit. In production we have seen batch sizes of 1k in a single request when queueing for only 500 microseconds.

context deadline on specific keys

Hello,
I'm trying to integrate the library into my application. Everything works well, except that sometimes a key gets "broken" and keeps failing with:

"err":"context canceled",
"key":"52",
"level":"error",
"msg":"error retrieving key from peer 'http://1580b064e10c42e48e58193dffe3fe88.app.backend.testing.:80/_groupcache/'",

I'm not sure how to reproduce this exactly; however, it usually seems to happen upon the creation of a new key.
Once it has failed, I cannot fetch the key anymore; it keeps failing with context canceled.

My code is following:

	s.cache = groupcache.NewGroup("settings", 3000000, groupcache.GetterFunc(
		func(ctx context.Context, key string, dest groupcache.Sink) error {
			ID, err := strconv.ParseUint(key, 10, 64)
			if err != nil {
				return err
			}

			res, err := s.CreateOrGet(ID)
			if err != nil {
				return err
			}
			b, err := json.Marshal(res)
			if err != nil {
				return err
			}

			return dest.SetBytes(b, time.Now().Add(time.Minute*5))
		}))

And later fetching it with the:

	ctx, cancel := context.WithTimeout(context.Background(), time.Second*2)
	defer cancel()

	b := []byte{}
	if err := s.cache.Get(ctx,
		strconv.FormatUint(ID, 10),
		groupcache.AllocatingByteSliceSink(&b),
	); err != nil {
		return nil, err
	}

Where 2 seconds of timeout should be more than enough.
Any idea for possible reasons?

Not sure if this is related, but we are using the DNS discovery and calling pool.Set(...) every 30 seconds.

Upgrade the Protobuf API version

The Protobuf API used in this project is already outdated and cannot be used for Protobuf definitions that are compiled using a recent version of protoc-gen-go which uses this package instead.

Is there any plan to upgrade this? I can create a PR for this as I already upgraded it in my fork although I understand that this may break backward compatibility with the existing mailgun/groupcache/v2 API.

HTTP Error Handling Problem

Proposal

When a requested value is not found in the local or hot cache within groupcache, the system performs a consistent hash on the key to determine which instance owns the requested value. If it determines that the value is owned by the local instance, it calls the GetterFunc() to retrieve the value. If GetterFunc() returns an error during the local call, groupcache propagates that error to the caller of group.Get(), allowing the caller to handle the error appropriately.

However, if groupcache determines that the value is owned by a remote instance, it makes an HTTP call to that instance, which invokes GetterFunc() on the remote instance. If the GetterFunc() returns an error during the remote call, groupcache returns an http.StatusInternalServerError error to the caller. When this happens, the calling instance of groupcache logs this error (if a logger is set) and proceeds to fall back to calling GetterFunc() locally in an attempt to retrieve the value.

This situation is suboptimal for a few reasons

  1. In the case where the GetterFunc returns a not found error during a remote HTTP call, it makes no sense for the calling instance to fall back to calling GetterFunc locally which will likely result in also returning not found. Especially if the GetterFunc is retrieving the value from a common database.
  2. The actual error is lost when making remote calls.
  3. If any remote call has an error, then a local call will be made as a result. This could result in duplicate GetterFunc calls, exacerbating the underlying problem which caused the error.

Solution

groupcache should add an ErrNotFound error to the library. This error is provided so that implementors of GetterFunc can return it to indicate the call failed to find the requested value. Remote calls via HTTP will reflect this error by returning http.StatusNotFound.

All other errors returned via HTTP will use http.StatusServiceUnavailable instead of the current http.StatusInternalServerError. This differentiates between an internal error (something is wrong with groupcache internals) and an error returned by the GetterFunc.

The groupcache instance which made the HTTP call will look for these codes and avoid making a local GetterFunc call, and instead will propagate the error back to the caller as either ErrNotFound or ErrRemoteCall, which contains the string value of the error returned by GetterFunc.
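
If the proposal lands, a GetterFunc might signal a miss roughly like this. This is a sketch of the proposed usage only, since ErrNotFound does not exist in the current API; the errNotInDB sentinel and fetchUserFromMongo are placeholders, and the errors package is assumed imported:

var errNotInDB = errors.New("not found in the database") // placeholder sentinel from the data layer

group := groupcache.NewGroup("users", 3000000, groupcache.GetterFunc(
    func(ctx context.Context, id string, dest groupcache.Sink) error {
        user, err := fetchUserFromMongo(ctx, id)
        if errors.Is(err, errNotInDB) {
            // Proposed sentinel: lets the remote side answer http.StatusNotFound
            // and lets the caller skip the local fallback.
            return groupcache.ErrNotFound
        }
        if err != nil {
            return err
        }
        return dest.SetProto(&user, time.Now().Add(time.Minute*5))
    },
))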

README example doesn't demonstrate using HTTP Pool

Could the example get updated to actually utilize the HTTP pool?

As it stands now, it doesn't seem like the pool is used at all, but perhaps I'm misunderstanding how this library works.

I'm just not sure how the group could be using the peers when the pool doesn't seem to be referenced anywhere.

Can you update key?

Can you update a key's value directly? Or do you have to delete the key first and then set the key to the new value?
