Comments (8)

fubuloubu commented on August 27, 2024

Related to #18

from silverback.

mikeshultz commented on August 27, 2024

Still not an expert on taskiq but learned a decent amount now. This is my understanding:

  • "state" is something stored in memory, unique to that process (runner, worker, whatever). It's useful for setting up things like connections to other services or keeping some kind of state of that process. Storing anything here is ephemeral.
  • "Results backends" are just dumb storage of data blobs returned from the worker handlers (and some job metadata). I believe this is a critical component when runner and worker are split. Looks like the broker would normally store results here, and without configuring it the runner would never receive the results. Results backends can be configured to be permanent stores, but the redis one defaults to "store until read" like a message queue.
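
To make the distinction concrete, here is a minimal pure-Python sketch. It is not taskiq's API: `ProcessState` and `ReadOnceResultStore` are hypothetical names, the first standing in for per-process state and the second for a results backend that drops a result once it has been read.

```python
from typing import Any, Dict


class ProcessState:
    """Hypothetical stand-in for per-process state held only in memory."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}


class ReadOnceResultStore:
    """Hypothetical stand-in for a "store until read" results backend."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def set_result(self, task_id: str, result: Any) -> None:
        self._results[task_id] = result

    def get_result(self, task_id: str) -> Any:
        # Reading removes the entry, so a second read finds nothing.
        return self._results.pop(task_id, None)


state = ProcessState()
state.values["connection"] = "db-handle"  # lost when the process exits

results = ReadOnceResultStore()
results.set_result("task-1", {"block_number": 100})
first_read = results.get_result("task-1")
second_read = results.get_result("task-1")
```

In this sketch neither structure can report the last processed block after a restart: the state dies with the process and the result is gone after the first read.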

Neither of these is an especially good fit for our use case. I think that leaves us with only one option: implementing our own state storage for Silverback. That comes with the bonus of ultimate flexibility, but it leaves me wondering what other use cases the design should account for.

My initial thought was that we could use files in ~/.config. I think Ape already leverages this design. However, that requires filesystem persistence and that wouldn't be ideal in containerized services, like ApePay. Redis might be a good option if we're expecting to utilize it for a results backend. Rolling a DB would be major overkill to store a block number cursor.
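
For reference, a file-backed cursor could be as small as the sketch below. `FileCursorStore`, the JSON layout, and the `silverback/cursor.json` path are assumptions made for illustration, and the demo writes to a temporary directory rather than ~/.config.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class FileCursorStore:
    """Hypothetical file-backed store for a single block number cursor."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> Optional[int]:
        # No file yet means no checkpoint was ever written.
        if not self.path.exists():
            return None
        return json.loads(self.path.read_text())["block_number"]

    def save(self, block_number: int) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file first, then rename it into place, so a
        # crash mid-write does not leave a truncated cursor file behind.
        tmp_path = self.path.with_name(self.path.name + ".tmp")
        tmp_path.write_text(json.dumps({"block_number": block_number}))
        tmp_path.replace(self.path)


# The demo uses a temporary directory in place of ~/.config.
with tempfile.TemporaryDirectory() as tmp_dir:
    store = FileCursorStore(Path(tmp_dir) / "silverback" / "cursor.json")
    before = store.load()
    store.save(1234)
    after = store.load()
```

This only survives restarts if the directory does, which is exactly the problem in a container without a mounted volume.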

@fubuloubu I could use your input and future-think on this. Is there other state we might be interested in storing? Other databases or storage options we might leverage for other things in Silverback? Do we want to offer options to users or go for just what we need right now?

mikeshultz commented on August 27, 2024

After writing all that up, I'm kind of leaning towards Redis being required for persistence in Silverback. Maybe remove results backend settings and just offer up Redis connection settings to the user. Then we either have Redis to do what we want with, or the whole configuration is ephemeral.

fubuloubu commented on August 27, 2024

Hmm, so there are a few ways to look at this:

  1. Reading results database to gain metrics from worker execution e.g. see the last successfully processed block. Persistence would be required for this, but instead of redis persistence we could migrate the redis result store into either a memory store in the runner or a more permanent datastore like postgres/sqlite (configurable via sqlalchemy) or Mongo.
  2. The runner itself should implement a REST API to communicate metrics, status, etc. See #39. This result data is critical for filling in metrics for running apps and will be an important part of the Silverback SaaS platform. However, there should be a simple implementation of this, which is what I'm planning to do e.g. the built-in runners host a FastAPI app using stored metrics in the runner implementation.
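
As a rough illustration of "stored metrics in the runner implementation", the sketch below keeps counters in memory and produces a JSON-serializable snapshot that a status endpoint could return. `RunnerMetrics` and all of its fields are hypothetical, and no web framework is involved here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RunnerMetrics:
    """Hypothetical in-memory metrics kept by the runner."""

    tasks_handled: int = 0
    tasks_failed: int = 0
    last_block_number: Optional[int] = None
    total_execution_time: float = 0.0

    def record(self, block_number: int, execution_time: float, is_err: bool) -> None:
        self.tasks_handled += 1
        self.total_execution_time += execution_time
        if is_err:
            self.tasks_failed += 1
        elif self.last_block_number is None or block_number > self.last_block_number:
            # Only successful tasks advance the "last processed block" metric.
            self.last_block_number = block_number

    def snapshot(self) -> Dict[str, object]:
        """Return a JSON-serializable view of the current metrics."""
        average = (
            self.total_execution_time / self.tasks_handled
            if self.tasks_handled
            else 0.0
        )
        return {
            "tasks_handled": self.tasks_handled,
            "tasks_failed": self.tasks_failed,
            "last_block_number": self.last_block_number,
            "average_execution_time": average,
        }


metrics = RunnerMetrics()
metrics.record(block_number=100, execution_time=0.5, is_err=False)
metrics.record(block_number=101, execution_time=0.25, is_err=True)
status = metrics.snapshot()
```

Because the counters live in the runner's memory, they reset on restart unless they are also written to a persistent store.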

P.S. The data is read by the runner via the _handle_result method.

mikeshultz commented on August 27, 2024

> Reading results database to gain metrics from worker execution e.g. see the last successfully processed block. Persistence would be required for this, but instead of redis persistence we could migrate the redis result store into either a memory store in the runner or a more permanent datastore like postgres/sqlite (configurable via sqlalchemy) or Mongo.

The results backend isn't going to cut it for us. We can't query it for what we want (e.g. last executed job). There's also a chance that other results backends are more friendly to this (e.g. if there's a postgres one with metadata cols). I'll look today. But I suspect we'll need to move to our own persistence layer.

> This result data is critical for filling in metrics for running apps and will be an important part of the Silverback SaaS platform. However, there should be a simple implementation of this, which is what I'm planning to do e.g. the built-in runners host a FastAPI app using stored metrics in the runner implementation.

Does this mean you're implementing the persistence layer or that you would leverage it as well?

mikeshultz commented on August 27, 2024

Here's some roughing I did of what I'm thinking, with some pseudo-code.

We could have some kind of storage setup. There might be something like this already we can use on top of mongo or redis or whatever k/v storage we find most fitting.

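from abc import ABC, abstractmethod

from pydantic import BaseModel


class BaseStorage(ABC):
    @abstractmethod
    def get(self, k: str) -> BaseModel:
        """Fetch the model stored under key `k`."""
        ...

    @abstractmethod
    def store(self, k: str, v: BaseModel) -> None:
        """Persist model `v` under key `k`."""
        ...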
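
As a sanity check on the interface, here is a minimal in-memory implementation. It is only a sketch: it uses stdlib dataclasses instead of Pydantic models to stay dependency-free, returns None for a missing key (an assumption, since the interface does not say what happens then), and `InMemoryStorage` and `Cursor` are hypothetical names.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Optional


class BaseStorage(ABC):
    @abstractmethod
    def get(self, k: str) -> Optional[Any]:
        ...

    @abstractmethod
    def store(self, k: str, v: Any) -> None:
        ...


class InMemoryStorage(BaseStorage):
    """Hypothetical dict-backed storage; contents are lost on exit."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, k: str) -> Optional[Any]:
        # Assumption: a missing key yields None rather than raising.
        return self._data.get(k)

    def store(self, k: str, v: Any) -> None:
        self._data[k] = v


@dataclass
class Cursor:
    """Hypothetical value type standing in for a Pydantic model."""

    block_number: int


storage = InMemoryStorage()
missing = storage.get("my-instance:runner_state")
storage.store("my-instance:runner_state", Cursor(block_number=42))
loaded = storage.get("my-instance:runner_state")
```

A Redis- or Mongo-backed class would implement the same two methods, so the runner would not need to know which one it is talking to.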

A possible model of silverback instance state:

class RunnerState(BaseModel):
    instance: str # UUID, or tag, or deployment/network name?
    network: str
    block_number: int
    updated: datetime

A rough model of handler results:

from typing import Any, Dict


class HandlerResult(BaseModel):
    return_value: Any  # handler return type is not known ahead of time
    labels: Dict[str, str]
    execution_time: float
    network: str
    block_number: int
    instance: str
We'd still need something to handle relations. We need a way to query "give me all events for this contract" which this doesn't cover. Maybe we can tag results with block and event data that might allow us to fetch. Or we add a contract model that has a list of event IDs or something. Depends on the storage, really.
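
One way to cover the "all events for this contract" query on plain key/value storage is to maintain a second index next to the results. The sketch below assumes a key layout modeled on the runner example in this comment; `IndexedResultStore` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


class IndexedResultStore:
    """Hypothetical key/value store that also keeps a per-contract index."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        # Maps a contract address to the result keys recorded for it.
        self._by_contract: Dict[str, List[str]] = defaultdict(list)

    def store_event_result(
        self,
        instance: str,
        block_number: int,
        contract_address: str,
        event_name: str,
        result: Any,
    ) -> str:
        key = f"{instance}:event:{block_number}:{contract_address}:{event_name}:result"
        if key not in self._data:
            # Index each key once, even if the result is later overwritten.
            self._by_contract[contract_address].append(key)
        self._data[key] = result
        return key

    def results_for_contract(self, contract_address: str) -> List[Any]:
        keys = self._by_contract.get(contract_address, [])
        return [self._data[k] for k in keys]


store = IndexedResultStore()
store.store_event_result("app", 10, "0xabc", "Transfer", {"ok": True})
store.store_event_result("app", 11, "0xabc", "Approval", {"ok": True})
store.store_event_result("app", 11, "0xdef", "Transfer", {"ok": False})
abc_results = store.results_for_contract("0xabc")
unknown_results = store.results_for_contract("0x000")
```

The cost is that every write touches two structures, and how to keep them consistent depends on what the chosen storage offers.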

Rough example of how it might work in the runner:

    def _checkpoint(self, block_number: int) -> int:
        """Set latest checkpoint block number"""
        if block_number > self.latest_block_number:
            logger.debug(f"Checkpoint block #{block_number}")
            self.latest_block_number = block_number

            self._storage.store(
                f"{self.instance}:runner_state",
                RunnerState(
                    instance=self.instance,
                    network=self.network,
                    block_number=block_number,
                    updated=datetime.utcnow(),
                ),
            )

        return self.latest_block_number

    def _handle_result(self, task_type: str, result: TaskiqResult):
        # NOTE: `result.block_number` and `contract_event` are placeholders in
        # this sketch; where they come from is left open.
        store_key: str

        if task_type == "block":
            store_key = f"{self.instance}:block:{result.block_number}:result"
        elif task_type == "event":
            store_key = (
                f"{self.instance}:event:{result.block_number}:"
                f"{contract_event.contract.address}:{contract_event.name}:result"
            )
        else:
            raise ValueError(f"Unknown task type: {task_type}")

        self._storage.store(
            store_key,
            HandlerResult(
                instance=self.instance,
                network=self.network,
                block_number=result.block_number,
                execution_time=result.execution_time,
                labels=result.labels,
                return_value=result.return_value,
            ),
        )

@fubuloubu let me know if this sounds ok to you or if I'm barking up the wrong tree.

mikeshultz commented on August 27, 2024

Beanie looks interesting if we're itching to go the Mongo route. It seems to fit in well with FastAPI and Pydantic.

mikeshultz commented on August 27, 2024

Raw draft PR up at #45 for early feedback.
