massenz / go-statemachine
A basic implementation of a Finite State Machine in Go.
License: Apache License 2.0
We currently keep the Protobuf definitions (and the gRPC service definitions) inside this repository and regenerate them every time a build is run.
This is not really great practice, and it makes overall versioning complicated (and untenable if/when we have clients, possibly in other languages, relying on them).
The `proto` folder needs to move out; the `api` package should be imported from there (and versioned accordingly), and the `api` package refactored to remove the ad-hoc functions which depend on the object definitions living inside the package.
As someone trying to debug why a transition is not occurring, I'd like the logs to include the states the FSM was trying to transition from and to, in addition to the SM ID and event ID already logged. For example: unexpected event transition from open to re-open [....
At the moment, our imports are all over the place for the following packages:
`github.com/massenz/go-statemachine/api`
`github.com/massenz/slf4go/logging`
`github.com/massenz/statemachine-proto/golang/api`
Sometimes we use `.` for the "local" `api` and `protos` for the PBs, sometimes we don't; and `logging` is sometimes aliased `log`, sometimes left as `logging`.
This needs fixing: let's write down the rules, and refactor the code accordingly.
This is pretty moot once we move to Zerolog (#97), but it's a weird bug (as we check for `nil` pointers).
We currently just use the `id` for all the objects stored in Redis; we should instead move to something more like:
configurations#orders:v1
fsm:orders#12345
events#9876-aadfe
This would make the semantics of the stored objects clearer.
No other external API would need to change; however, `GET /api/v1/statemachines/{id}` should be changed to something like `GET /api/v1/statemachines/{type}/{id}`:
GET /api/v1/statemachines/orders/12345
As the URI is returned in the `Location` header after a `POST`, clients could either store it; retain the information (which is anyway part of the `config_id` field); or generate it dynamically at runtime (as they would know what `type` of statemachine they are trying to retrieve).
NOTE: the `version` of the `Configuration` is explicitly excluded from both the key and the API URI.
We have now updated the data model (and will soon update the API too), so the README needs updating.
Implement a GH Action which, upon merging to a `release` branch, will generate a tag.
This should also:
This currently stands at ~30%, and we should bring it up to >70%:
└─( make cov
ok github.com/massenz/go-statemachine/api 0.016s coverage: 30.9% of statements
ok github.com/massenz/go-statemachine/grpc 0.039s coverage: 89.1% of statements
ok github.com/massenz/go-statemachine/pubsub 0.918s coverage: 75.6% of statements
ok github.com/massenz/go-statemachine/server 0.014s coverage: 72.4% of statements
ok github.com/massenz/go-statemachine/storage 0.013s coverage: 72.3% of statements
Now that we have automated the creation of the container (see #18), it would also be a good time to create the necessary Kubernetes specifications to deploy the service (along with its dependencies, which may optionally be deployed too).
In keeping with `gitops` best practice, those should be kept in a separate `statemachine-configs` repository.
We have currently only enabled the gRPC API to obtain events' outcomes; we need to add the REST API.
We also need to add the `/api/v1/events` endpoint.
Once the HTTP API is removed (#64) there won't be an "easy" way to determine the health of the server; this should be replaced by a gRPC method and/or an alternative way to determine server health (and connectivity, especially to the Redis store).
This is important for when a user wants to deploy SM as a Kubernetes Pod.
While the outcome of sending an `Event` may be inferred indirectly from the `current_state` of an FSM, this is cumbersome at best.
Errors are also posted to the SQS queue configured (optionally) via the `-notifications` argument; however, this too is cumbersome (especially for clients using the gRPC API).
We should (in addition to optionally posting error events on the `-notifications` queue) also store events' outcomes in Redis, possibly with a (`-retention`) TTL, and add a gRPC API (`GetEvent(eventId)`) to retrieve the outcome.
Currently, even though it runs goroutines and uses channels to distribute the workload, the entire application is (essentially) single-threaded, as we only start one goroutine per task (listener, pub/sub, etc.).
We should explore running the workers concurrently; however, this opens up the possibility of race conditions for events which, even if arriving in order from SQS, may be processed out of order (causing unwanted issues, with transitions being incorrectly deemed invalid).
It is non-obvious how to "single-thread" per FSM while keeping independent event streams running concurrently.
Currently, error messages are handled inconsistently across the various parts of the system (gRPC, REST, SQS messages).
We should use Protobuf everywhere (and the same one), and use it to serialize across SQS messages (as we do for Events).
Additionally, we should adopt gRPC error handling and use its status codes (and gRPC calls should return errors consistently).
Redis has a handy script to generate test certs for running tests (see here), and we can enable the `docker-compose` setup to run Redis with TLS enabled:

./src/redis-server --tls-port 6379 --port 0 \
    --tls-cert-file ./tests/tls/redis.crt \
    --tls-key-file ./tests/tls/redis.key \
    --tls-ca-cert-file ./tests/tls/ca.crt
As of #47, we can enable our SM-Server to use TLS, so we should also test this functionality.
Currently, running multiple SM servers behind a gateway or reverse proxy, all connected to the same Redis cluster, could cause races and inconsistencies if events for the same FSM are processed out of order.
We should at least consider a leader/follower architecture, with the replicas being read-only (possibly having the primary be event-processing only).
Alternatively, the followers could be standby only, ready to take over in case of primary failure, but otherwise not processing any requests at all.
This needs careful consideration, and also a clear definition of use cases and requirements.
We currently allow the insertion of new configs with the same ID as existing ones (upon which State Machines may be running); this may cause confusion, bugs and other undesirable side effects.
We should make Configurations immutable and prevent them from being changed; if a POST (or PutConfig) call is received for a Config ID that already exists, we should return a 409 CONFLICT error.
Use Zerolog as A Better Log
Currently the build is handled via the `Makefile`; it should be refactored so that only the necessary steps are taken, and so that it correctly rebuilds the server/container if any of the underlying files change.
We currently store all events forever; we should allow a `Configuration` to specify how many `Event`s to keep, as well as a TTL.
We currently only support the gRPC protocol, which works fine but may be limiting for clients/users looking to use JSON encoding.
The Connect protocol promises to add support for gRPC-Web and plain HTTP, and could conceivably be added alongside our "native" gRPC.
We should explore the desirability, feasibility and effort of implementing it.
We should only have the `notification` topic, used to notify asynchronously of issues with `Events`, and nothing else.
Also remove the `acks` topic, as with the new data model (including event outcomes) it is possible to query the state of the FSM and the event's successful (or otherwise) outcome.
Redis supports a TTL expiration time for entries; we currently do not use it (we set it to 0, which means forever).
There should be a default `retention` period for events (and outcomes), further configurable by the user.
We do not plan to enable a per-event TTL, although that would not be unreasonable.
Currently the TLS cryptographic material (certificate and keys) lives in a fixed location (`/etc/statemachine/certs`) and we only have the `-insecure` flag to disable TLS.
We should have a more flexible setup: continue to enable TLS by default and keep the default location, but also allow:

- `-insecure` flag to disable TLS;
- `-tlsCertFile` for the X509 cert;
- `-tlsKeyFile` for the certificate private key;
- `-tlsCAFile` for an (optional) Certificate Authority root file (self-signed certs).

If not configured, the defaults will be:

- `/etc/statemachine/certs/server.pem` for the certificate
- `/etc/statemachine/certs/server-key.pem` for the private key
- No CA file (roots as installed by the host OS)
We currently support only AWS SQS as the event bus; we should implement support for Kafka, and make the server configuration/startup independent of the actual event framework used.
For the `1.0` release we will only support the gRPC API and async Events; we need to deprecate the REST API.
We currently configure SM via an extensive set of runtime flags; we should also map (the same options) to an external configuration file (e.g., YAML) which would be used for those values not specified at launch via CLI flags.
This would be particularly useful for Kubernetes deployments, as we could use a `ConfigMap` to abstract the configuration away (and this would also, indirectly, facilitate the use of a Helm chart).
It was always meant as a "quick & dirty" way of running the SM server, but given how easy it is to run a Redis container locally, it makes little sense to keep it around (with the associated double effort of implementing features & tests).
This should be removed.
For resiliency and scalability, adding support for connecting to Redis running in cluster mode would be a great addition.
Testcontainers-go should be used to clean the tests and avoid having to start the supporting services (LocalStack and Redis) manually.
`Configuration` already has a "natural" ID (`GetVersionId()`), and this is also what the FSM uses to configure itself; we should not allow arbitrary IDs for configs, as this may cause confusion and makes the code unnecessarily complex.
Change `store.PutConfig(id, config)` to the simpler `store.PutConfig(config)`.
Loggers in `massenz/slf4go` can be redirected to specific writers, and Ginkgo offers a `GinkgoWriter` that only emits output when a test fails and is otherwise quiet, which is exactly what we want.
We currently only support a very simplistic data model; we want to improve support for queries about Statemachines and Configurations.
We want to be able to support the following queries:

- all `Configuration` `name`s
- all `Configuration` versions for a given `name`
- all FSMs' IDs for a given `Configuration`
- all FSMs' IDs for a given `Configuration`, in a given `state`
We currently only test individual packages/functions; we should add integration tests that exercise the full lifecycle:

- a `Configuration`
- a `Statemachine` for that configuration
- `Events`
Currently, the handling of a number of error conditions (permanent and/or transient) and the retry logic cannot be tested, because it is not possible to easily simulate Redis errors during tests.
Alongside introducing test containers (see #26), we should also use go/mocks to inject error conditions into the `client` in `RedisStore`.
In general, we expect Events (if carrying information of interest) to be stored elsewhere in the system, with the `EventId` acting as the "bridge" to retrieve them as necessary.
However, under certain circumstances, Statemachine's users may not want to use a separate store for events' metadata; there are currently two possible options:

- `metadata` as `bytes` alongside the `Event` in the FSM's `History`;
- `event_id` from the FSM's `history` to retrieve the associated Event data.

There are pros/cons to both approaches, and both should be considered before making a decision.
We currently only support "point" queries (given the ID, return the entity).
We would like to be able to conduct queries such as:

- `"test.orders"`
- `test.orders:v3` and whose state is `pending`

I am not sure whether Redis supports range queries (and it certainly won't with the current encoding), but we should explore ways to do so.
Error handling for SQS and Redis is not really testable today, as it is not possible to "inject" error conditions into externally running containers (even less into individual API calls).
We should instead explore using mocks to inject errors, so we can test proper error handling.
When enabling "events" outbound mode, allow those to go into a separate queue from the "notifications" one, which can then be dedicated to errors/bad transitions. This is useful so that one can alert on any message landing in that queue.