massenz / go-statemachine
A basic implementation of a Finite State Machine in Go.
License: Apache License 2.0
We currently keep the Protobuf definitions (and the gRPC service definitions) inside this repository and regenerate them every time a build is run.
This is not really great practice, and it makes overall versioning complicated (and untenable if/when we have clients, possibly in other languages, relying on them).
The `proto` folder needs to move out; the `api` package should be imported from there (and versioned accordingly), and the `api` package refactored to remove the ad-hoc functions which depend on the object definitions living inside the package.
As someone trying to debug why a transition is not occurring, I'd like the logs to include the states the FSM was trying to transition from and to, in addition to the SM ID and event ID already logged. For example: unexpected event transition from open to re-open [....
At the moment, our imports are all over the place for the following packages:
`github.com/massenz/go-statemachine/api`
`github.com/massenz/slf4go/logging`
`github.com/massenz/statemachine-proto/golang/api`
Sometimes we use `.` for the "local" `api` and `protos` for the PBs, sometimes we don't; and `logging` is sometimes aliased `log`, sometimes left as `logging`.
This needs fixing: let's write down the rules, and refactor the code accordingly.
This is pretty moot once we move to Zerolog (#97), but it's a weird bug (as we check for `nil` pointers).
We currently just use the `id` for all the objects stored in Redis; we should instead move to something more like:
configurations#orders:v1
fsm:orders#12345
events#9876-aadfe
This would make the semantics of the stored objects clearer.
No other external API would need to change; however, `GET /api/v1/statemachines/{id}` should be changed to something like `GET /api/v1/statemachines/{type}/{id}`:
GET /api/v1/statemachines/orders/12345
As the URI is returned in the `Location` header after a `POST`, clients could either store it; retain the information (which is anyway part of the `config_id` field); or generate it dynamically at runtime (as they would know what `type` of statemachine they are trying to retrieve).
NOTE: the `version` of the `Configuration` is explicitly excluded from both the key and the API URI.
We have now updated the data model (and will soon update the API too), so the README needs updating.
Implement a GH Action which, upon merging to a `release` branch, will generate a tag.
This should also:
This currently stands at ~30%, and we should bring it up to >70%:
└─( make cov
ok github.com/massenz/go-statemachine/api 0.016s coverage: 30.9% of statements
ok github.com/massenz/go-statemachine/grpc 0.039s coverage: 89.1% of statements
ok github.com/massenz/go-statemachine/pubsub 0.918s coverage: 75.6% of statements
ok github.com/massenz/go-statemachine/server 0.014s coverage: 72.4% of statements
ok github.com/massenz/go-statemachine/storage 0.013s coverage: 72.3% of statements
Now that we have automated the creation of the container (see #18), it would also be a good time to create the necessary Kubernetes specifications to deploy the service (along with its dependencies, which may optionally be deployed too).
In keeping with `gitops` best practice, those should be kept in a separate `statemachine-configs` repository.
We have currently only enabled the gRPC API to obtain events' outcomes; we need to add the REST API.
We also need to add the `/api/v1/events` endpoint.
Once the HTTP API is removed (#64) there won't be an "easy" way to determine the health of the server; this should be replaced by a gRPC method and/or an alternative way to determine server health (and connectivity, especially to the Redis store).
This is important for when a user wants to deploy SM as a Kubernetes Pod.
While the outcome of sending an `Event` may be inferred indirectly from the `current_state` of an FSM, this is cumbersome at best.
Errors are also posted to the SQS queue configured (optionally) via the `-notifications` argument; however, this too is cumbersome (especially for clients using the gRPC API).
We should (in addition to optionally posting error events on the `-notifications` queue) also store events' outcomes in Redis, possibly with a (`-retention`) TTL, and add a gRPC API (`GetEvent(eventId)`) to retrieve the outcome.
Currently, even though it runs goroutines and uses channels to distribute the workload, the entire application is (essentially) single-threaded, as we only start one goroutine per task (listener, pub/sub, etc.).
We should explore running the workers concurrently; however, this opens up the possibility of race conditions for events which, even if arriving in order from SQS, may be processed out of order (causing unwanted issues, with transitions being incorrectly deemed invalid).
It is non-obvious how to "single-thread" per FSM while keeping independent event streams running concurrently.
Currently, error messages are handled inconsistently across the various parts of the system (gRPC, REST, SQS messages).
We should use Protobuf everywhere (and the same one), and use it to serialize across SQS messages (as we do for Events).
Additionally, we should adopt gRPC error handling and use its status codes (and gRPC calls should return errors consistently).
Redis has a handy script to generate test certs for running tests (see here), and we can enable the `docker-compose` setup to run Redis with TLS enabled:

./src/redis-server --tls-port 6379 --port 0 \
    --tls-cert-file ./tests/tls/redis.crt \
    --tls-key-file ./tests/tls/redis.key \
    --tls-ca-cert-file ./tests/tls/ca.crt
As of #47, we can enable our SM-Server to use TLS, so we should also test this functionality.
Currently, running multiple SM servers behind a gateway or reverse proxy, all connected to the same Redis cluster, could cause races and inconsistencies if events for the same FSM are processed out of order.
We should at least consider a leader/follower architecture, with the replicas being read-only (possibly having the primary be event-processing only).
Alternatively, the followers could be standby only, ready to take over in case of primary failure, but otherwise not processing any requests at all.
This needs careful consideration, and also a clear definition of use cases and requirements.
We currently allow the insertion of new configs with the same ID as existing ones (upon which State Machines may be running); this may cause confusion, bugs and other undesirable side effects.
We should make Configurations immutable and prevent them from being changed; if a POST (or PutConfig) call is received for a Config ID that already exists, we should return a 409 CONFLICT error.
Use Zerolog as A Better Log
Currently the build is handled via the `Makefile`; it should be refactored so that only the necessary steps are taken, and so that it correctly rebuilds the server/container if any of the underlying files change.
We currently store all events forever; we should allow a `Configuration` to specify how many `Event`s to keep, as well as a TTL.
We currently only support the gRPC protocol, which works fine but may be limiting for clients/users looking to use JSON encoding.
The Connect protocol promises to add support for gRPC-Web and plain HTTP, and could conceivably be added alongside our "native" gRPC.
We should explore the desirability, feasibility and effort of implementing it.
We should only have the `notification` topic, used to notify asynchronously of issues with `Events`, and nothing else.
Also remove the `acks` topic, as with the new data model (including event outcomes) it is possible to query the state of the FSM and the event's successful (or otherwise) outcome.
Redis supports a TTL expiration time for entries; we currently do not use it (we set it to 0, which means forever).
There should be a default `retention` period for events (and outcomes), further configurable by the user.
We do not plan to enable a per-event TTL, although that would not be unreasonable.
Currently the TLS cryptographic material (certificate and keys) lives in a fixed location (`/etc/statemachine/certs`) and we only have the `-insecure` flag to disable TLS.
We should have a more flexible setup: continue to enable TLS by default and keep the default location, but also allow:

- `-insecure` flag to disable TLS;
- `-tlsCertFile` for the X509 cert;
- `-tlsKeyFile` for the certificate private key;
- `-tlsCAFile` for an (optional) Certificate Authority root file (self-signed certs).

If not configured, the defaults will be:

- `/etc/statemachine/certs/server.pem` for the certificate
- `/etc/statemachine/certs/server-key.pem` for the private key
- No CA file (roots as installed by the host OS)
We currently support only AWS SQS as the event bus; we should implement support for Kafka, and make the server configuration/startup independent of the actual event framework used.
For the `1.0` release we will only support the gRPC API and async Events; we need to deprecate the REST API.
We currently configure SM via an extensive set of runtime flags; we should also map (the same options) to an external configuration file (e.g., YAML) which would be used for those values not specified at launch via CLI flags.
This would be particularly useful for Kubernetes deployments, as we could use a `ConfigMap` to abstract the configuration away (and this would also, indirectly, facilitate the use of a Helm chart).
It was always meant as a "quick & dirty" way of running the SM server, but given how easy it is to run a Redis container locally, it makes little sense to keep it around (with the associated double effort of implementing features & tests).
This should be removed.
For resiliency and scalability, adding support for connecting to Redis running in cluster mode would be a great addition.
Testcontainers-go should be used to clean the tests and avoid having to start the supporting services (LocalStack and Redis) manually.
`Configuration` already has a "natural" ID (`GetVersionId()`), and this is also what the FSM uses to configure itself; we should not allow arbitrary IDs for configs, as this may cause confusion and makes the code unnecessarily complex.
Change `store.PutConfig(id, config)` to the simpler `store.PutConfig(config)`.
Loggers in `massenz/slf4go` can be redirected to specific writers, and Ginkgo offers a `GinkgoWriter` that only emits output when a test fails and is otherwise quiet, which is exactly what we want.
We currently only support a very simplistic data model; we want to improve support for queries about Statemachines and Configurations.
We want to be able to support the following queries:

- all `Configuration` `name`s
- all `Configuration` versions for a given `name`
- all FSMs' IDs for a given `Configuration`
- all FSMs' IDs for a given `Configuration`, in a given `state`
We currently only test individual packages/functions; we should add integration tests that exercise the full lifecycle:

- a `Configuration`
- a `Statemachine` for that configuration
- `Events`
Currently, the handling of a number of error conditions (permanent and/or transient) and the retry logic cannot be tested, because it is not possible to easily simulate Redis errors during tests.
Alongside introducing test containers (see #26), we should also use go/mocks to inject error conditions into the `client` in `RedisStore`.
In general, we expect Events (if carrying information of interest) to be stored elsewhere in the system, with the `EventId` acting as the "bridge" to retrieve them as necessary.
However, under certain circumstances, Statemachine's users may not want to use a separate store for events' metadata; there are currently two possible options:

- `metadata` as `bytes` alongside the `Event` in the FSM's `History`;
- `event_id` from the FSM's `history` to retrieve the associated Event data.

There are pros/cons to both approaches, and both should be considered before making a decision.
We currently only support "point" queries (given the ID, return the entity).
We would like to be able to conduct queries such as:

- `"test.orders"`
- `test.orders:v3` and whose state is `pending`

I am not sure whether Redis supports range queries (and it certainly won't with the current encoding), but we should explore ways to do so.
Error handling for SQS and Redis is not really testable today, as it is not possible to "inject" error conditions into externally running containers (even less into individual API calls).
We should instead explore using mocks to inject errors, so we can test proper error handling.
When enabling "events" outbound mode, allow those to go into a separate queue from the "notifications" one, which can then be dedicated to errors/bad transitions. This is useful so that one can alert on any message landing in that queue.