
Comments (11)

GetContented commented on May 25, 2024

For anyone finding this in the future: we ended up jettisoning the idea of using SQL and relational databases entirely, and switched to an approach similar to the one Unison uses for its entity "storage": immutability, plus content-addressable hashing for the identity of types, data and code. The system can then copy things around freely (which is effectively caching) without any problem. This handles "change" very easily, because there is no such thing as change.


dmjio commented on May 25, 2024

With the remote module you can separate your acid-state data store from your web server process.
Regarding not keeping all of the data in RAM: the docs do mention storing keys that refer to on-disk storage, where you could use mmap to retrieve the values. In that case acid-state wouldn't retrieve the data for you, just store the keys; you'd have to write the code to mmap from disk yourself.

Regarding sharing web handlers between the client and server, I'd really check out the servant project. It has a way to generate client handlers (in GHCJS) that automatically query the generated API web service.
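
To make that concrete, here's a minimal sketch of the servant idea; the "items" endpoint and its types are made up for illustration, not taken from any real project:

{-# LANGUAGE DataKinds, TypeOperators #-}

import Data.Proxy (Proxy (..))
import Servant.API

-- One API type, shared between the server and the (GHCJS) client.
type ItemsAPI = "items" :> Get '[JSON] [String]

itemsAPI :: Proxy ItemsAPI
itemsAPI = Proxy

-- On the server side, servant-server's `serve` implements this API from handlers;
-- on the client side, servant-client (or its GHCJS variant) derives the querying
-- functions from the same `itemsAPI` value, so handlers and clients can't drift apart.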

For long-term storage I'd use PostgreSQL (there are many good projects for it). Then, if you want, use acid-state as an in-memory caching mechanism (in place of something like redis). acid-state doesn't have the ability to expire keys like redis does, but it's just as fast. You can also compress data before saving it into acid-state.

If you want a different backing for acid-state you could write acid-state-s3. There has been talk about this. I have the ability to do it, but not the time currently.

Also, unless I'm mistaken, acid-state will be undergoing significant changes regarding a replication backend after GSoC. @lemmih can say more :)


GetContented commented on May 25, 2024

@dmjio dang... nice reply! :) Thanks.

I discovered the cloud-haskell project shortly after writing this, which is interesting.

I guess I can use hint (the runtime Haskell interpreter) to load code / new types that I've persisted into my live running system, but what form do I store them in?

The trouble with postgres is that its types aren't Haskell's types, and I have no idea how I'd store code in it (text?) such that I can perform the computation I'll need to perform on it, but without it being in memory.

I also found the whole Hadoop ecosystem quite interesting... viewing a DB as message streams is a step in the right direction, but it feels like overkill for what I want to use it for.

I'll have a bit more of a think about this. Thanks. I wish I knew more about what's possible for transporting code and types between the frontend and backend.


dmjio commented on May 25, 2024

@GetContented the beauty of acid-state is exactly that: the marshaling code is so natural, since it's language-specific serialization. The con is that this language-specific (cereal / binary) serialization locks you into a Haskell-specific database for good, and a data migration later on can potentially be hairy. There has been talk of using a standardized serialization spec, but no action on this AFAIK; Duncan Coutts might know more.
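
As a rough sketch of what that Haskell-native marshaling looks like in practice (the Counter type and its events are made up for illustration): acid-state serializes each update event with SafeCopy/cereal, appends it to a transaction log on disk, and replays the log on startup.

{-# LANGUAGE DeriveDataTypeable, TemplateHaskell, TypeFamilies #-}

import Control.Monad.Reader (ask)
import Control.Monad.State (modify)
import Data.Acid
import Data.SafeCopy
import Data.Typeable

newtype Counter = Counter Int deriving (Typeable)
$(deriveSafeCopy 0 'base ''Counter)

-- An update event: its argument and effect are serialized into the transaction log.
incrementBy :: Int -> Update Counter ()
incrementBy n = modify (\(Counter c) -> Counter (c + n))

-- A query event: read-only, runs against the in-memory state.
peek :: Query Counter Int
peek = do Counter c <- ask; return c

$(makeAcidic ''Counter ['incrementBy, 'peek])

main :: IO ()
main = do
  st <- openLocalState (Counter 0)   -- replays any existing on-disk log
  update st (IncrementBy 2)          -- event is serialized (cereal) and appended to disk
  n <- query st Peek
  print n
  closeAcidState st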

Postgres is probably the nicest it gets in terms of mapping Haskell types to DB types and back. It's far worse with redis and elasticsearch (everything is stored and returned as a ByteString). There is a project (relational-record w/ HDBC) that uses a quasi-quoter to read your database schema (it actually performs I/O at compile time), generates the ADTs and the functions to compose queries on those types, and handles the serialization/deserialization for you. So if your schema changes, your code changes. But I'm not sure there is thread pool support.

Another thing to know with acid-state is that you have to keep track of your migration history, otherwise you can forget how the on-disk state maps to what you have in the code. This is where a schema is actually a good thing IMO.
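
For reference, the usual way to keep that history in code is SafeCopy's version/migration machinery. A sketch, with made-up types (an Account record that gained a field in a later version):

{-# LANGUAGE TemplateHaskell, TypeFamilies #-}

import Data.SafeCopy

-- The old shape of the state, kept around so old on-disk logs can still be read.
data Account_v0 = Account_v0 Int
$(deriveSafeCopy 0 'base ''Account_v0)

-- The current shape, marked as an extension of the previous version.
data Account = Account Int String
$(deriveSafeCopy 1 'extension ''Account)

instance Migrate Account where
  type MigrateFrom Account = Account_v0
  migrate (Account_v0 n) = Account n ""   -- how old on-disk values become new ones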

In regards to storing code, I would use a free monad. A free monad lets you construct an expression that can be serialized, stored on disk, and later retrieved and evaluated/interpreted by any number of interpreters/evaluators you have defined. I worked on a project once where financial trading strategies were stored as expressions. So you define a grammar, a free monad lets you turn that grammar into an expression, and then you can interpret/evaluate (and optionally serialize, since the free-monad grammar is pure data) this expression.

So you can store user defined settings in postgres, then at runtime read these settings and construct a free monad expression from them that lives in memory. From here anything is possible.
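
A minimal sketch of that pattern, using the free package; the Log/SetLimit grammar is invented for illustration, not taken from any real project:

{-# LANGUAGE DeriveFunctor #-}

import Control.Monad.Free (Free (..), liftF)

-- A tiny made-up grammar; each constructor carries the "rest of the program".
data CmdF next
  = Log String next
  | SetLimit Int next
  deriving (Functor)

type Cmd = Free CmdF

logMsg :: String -> Cmd ()
logMsg msg = liftF (Log msg ())

setLimit :: Int -> Cmd ()
setLimit n = liftF (SetLimit n ())

-- A pure expression: it describes what to do without doing it,
-- so it can be stored and interpreted later.
program :: Cmd ()
program = do
  logMsg "starting strategy"
  setLimit 100

-- One possible interpreter; others (a pretty-printer, a serializer,
-- a test evaluator) can walk the same structure.
runIO :: Cmd a -> IO a
runIO (Pure a)                 = return a
runIO (Free (Log msg next))    = putStrLn msg >> runIO next
runIO (Free (SetLimit n next)) = putStrLn ("limit := " ++ show n) >> runIO next

main :: IO ()
main = runIO program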

Postgres is pretty solid. Its indexing and query planning are sophisticated (the planner works from gathered statistics), and frequently accessed data is often already cached in RAM via its shared buffers and the OS page cache.


dmjio commented on May 25, 2024

There are also projects to convert postgres tables into streams (pipes/conduits).


GetContented commented on May 25, 2024

@dmjio Again, thank you so much for your input and time. It's very helpful.

FWIW I'm happy to be locked into Haskell. Until we get some form of data representation that can also express generic algorithms, I think it's as good as we've got so far. (Maybe one could say Scheme, but as far as my understanding goes it's not expressive enough WRT types or other more general things like typeclasses; the language can't easily express what is inexpressible in it, fairly obviously. Then one could ask "what about JSON, XML, EDN, or others?", but those don't express functionality. How about Ometa/JS or one of its other mother-language-symbiotic variants? Well, they require a base language anyway.)

I'm a bit unsure how the marshalling of, say, a "Maybe Int", a "[[String]]" or a "[(Int, String)]" goes back and forth to postgres data types. Could you possibly make this concrete for me? Sorry if this is a hassle, I'm just having a bit of a tough time trying to understand it. Does it go to text? In which case, how can one sort, search or filter on it?


dmjio commented on May 25, 2024

So postgresql has a raw binary wire protocol, but to my knowledge there don't exist pure Haskell bindings for it (maybe hasql has this). The way most Haskell postgresql libs communicate with the postgres server is through FFI bindings to libpq, a C library (https://hackage.haskell.org/package/postgresql-libpq). It gets tricky in multi-threaded scenarios (because a lot of C code isn't thread-safe), but libpq advertises itself as thread safe (the term is "re-entrant", loosely similar to the concept of referential transparency), which allows us to use Haskell green threads with it (check the docs for postgresql-libpq on hackage for more info).

Once we've written FFI bindings to this C library we just call them. So what we'd do is encode things like [[String]] to a ByteString, use useAsCString to convert the ByteString to a CString, then invoke the C function. Using the Storable typeclass we can get access to the C heap directly.
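
A toy sketch of that marshaling shape, binding C's strlen rather than an actual libpq function, just to show useAsCString and a foreign import:

{-# LANGUAGE ForeignFunctionInterface #-}

import qualified Data.ByteString.Char8 as BS
import Foreign.C.String (CString)
import Foreign.C.Types (CSize (..))

-- A stand-in foreign import; a real binding would target libpq functions instead.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Marshal a Haskell ByteString to a NUL-terminated CString and call into C.
byteLength :: BS.ByteString -> IO CSize
byteLength bs = BS.useAsCString bs c_strlen

main :: IO ()
main = byteLength (BS.pack "select 2 + 2") >>= print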

If we were to write our own pure Haskell bindings, we'd write a Haskell parser for the binary protocol specification, using attoparsec, bytestring and the network package. (The hedis package does this.)
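
To illustrate what such a parser looks like, here is a toy attoparsec sketch for a made-up framing (a one-byte tag, a big-endian 32-bit length, then the payload); it is not the actual postgres wire format:

import           Data.Attoparsec.ByteString (Parser, anyWord8)
import qualified Data.Attoparsec.ByteString as A
import           Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import           Data.Word (Word8)

data Message = Message
  { msgTag     :: Word8
  , msgPayload :: BS.ByteString
  } deriving (Show)

-- Parse a big-endian 32-bit length field byte by byte.
word32be :: Parser Int
word32be = do
  a <- anyWord8; b <- anyWord8; c <- anyWord8; d <- anyWord8
  return (  fromIntegral a `shiftL` 24
        .|. fromIntegral b `shiftL` 16
        .|. fromIntegral c `shiftL` 8
        .|. fromIntegral d )

-- One framed message: tag byte, length, then exactly that many payload bytes.
message :: Parser Message
message = do
  tag <- anyWord8
  len <- word32be
  Message tag <$> A.take len

main :: IO ()
main = print (A.parseOnly message (BS.pack [1, 0, 0, 0, 2, 104, 105]))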


GetContented commented on May 25, 2024

I don't see how using the raw binary protocol would let you query the data, given that the shape of the data wouldn't be tables of columns of known types, but rather that its shape (i.e. its types) would use typeclasses that hadn't yet been expressed at the time the FFI binding was written?


dmjio commented on May 25, 2024

So at some level we eventually need to send raw bytes which get transmitted over TCP, right?

{-# LANGUAGE OverloadedStrings #-}

import Database.PostgreSQL.Simple

hello :: IO Int -- the result type determines which FromField instance gets used
hello = do
  conn <- connectPostgreSQL "" -- calls libpq to open a connection using default settings
  [Only i] <- query_ conn "select 2 + 2" -- sends the query over that connection and decodes the single-row, single-column result
  return i

Now the return value needs to be parsed into something meaningful. This is why we have the FromRow & FromField typeclasses; there are default instances for converting Haskell types to postgresql types and back.

At some point it's just ByteString -> IO ByteString. The first ByteString is our query, the second is our result, and we can use typeclasses (with associated type families) to relate the encoding/decoding of the query type to the return type. So we really want code that looks like the hello example above. The Query type, along with the FromField/ToField and FromRow/ToRow typeclasses, is a nicer interface for dealing with the encoding / decoding to ByteString in a type-safe way.
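
To tie this back to the earlier question about things like Maybe Int: with postgresql-simple, each column is decoded via FromField and each row via FromRow, and NULLable columns map to Maybe. A sketch (the users table and its columns are made up for illustration):

{-# LANGUAGE OverloadedStrings #-}

import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.FromRow

data User = User
  { userName :: String       -- a text column
  , userAge  :: Int          -- an integer column
  , userNick :: Maybe String -- a NULLable text column: NULL becomes Nothing
  } deriving (Show)

instance FromRow User where
  fromRow = User <$> field <*> field <*> field

-- Sorting/filtering still happens in SQL, on the database's own column types;
-- the Haskell types only describe how each column is decoded on the way out.
fetchAdults :: Connection -> IO [User]
fetchAdults conn =
  query conn "select name, age, nickname from users where age >= ? order by name" (Only (18 :: Int))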


dmjio commented on May 25, 2024

https://hackage.haskell.org/package/postgresql-simple-0.4.10.0/docs/Database-PostgreSQL-Simple-FromRow.html


GetContented commented on May 25, 2024

Sorry about this... I think, perhaps, I haven't expressed myself clearly enough.

You seem to be explaining encoding/decoding to and from ByteStrings. That's fine if we just want to load all our data upfront and do all of the querying in memory, but that's not good practice if our server has tight memory constraints (as Heroku does).

I need a lazy-loading, queryable mechanism for Haskell data types, and also a way to load the data types themselves in advance of the data. In other words, the system would start up with nothing much, load the data types (and therefore also the typeclasses) in, then load the root pieces of data and code and begin serving them, which in turn would load whatever they need, and so on (lazily).

I currently have a somewhat-working system based on this model (running in another language) serving a number of websites, which uses pre-described types... but I'd like a way to store types that aren't known at system compile time.

My major problem has been that RDBMSs only let you store (and query) certain data types, so I've only been able to store those. This isn't very flexible. So far I haven't found a DBMS that lets you extend the data types.

Hence why I was asking about acid-state: this, coupled with something like hint (if I understand it correctly, it makes it possible to compile and load code at runtime), would let me extend the data types based on data stored in the backend.

However, it's not going to work too well for me if the backend isn't able to lazily load this data and code in dynamically...

Again, I really appreciate the time and effort you've taken to respond, and I realise this might sound ridiculous or almost impossible, but my proof of concept so far is incredibly exciting, working really fast, and quite good (we're actually able to build websites much much faster than any other method I've ever used, and we haven't even got types or code in yet, as I've described).

