Add ACID guarantees to any serializable Haskell data structure

acid-state's Introduction

acid-state

Unplug your machine, restart and have your app recover to exactly where it left off. Acid-State spares you the need to deal with all the marshalling, consistency, and configuration headache that you would have if you used an external DBMS for this purpose.

How does it work?

Acid-state does not write your data types to disk every time you change them. Instead, it keeps a history of all the functions (along with their arguments) that have modified the state. Thus, recreating the state after an unforeseen error is as simple as rerunning the functions in the history log.
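The replay idea can be sketched in a few lines of plain Haskell. This is only a toy model under stated assumptions, not acid-state's implementation: the names Event, apply, and replay are hypothetical, the state is a single Int, and the log is an in-memory list rather than a serialized file on disk.

```haskell
-- Toy model of event-log recovery; not acid-state's actual implementation.
-- All names here (Event, apply, replay) are hypothetical.

-- | Each constructor stands for one state-modifying function plus its arguments.
data Event
  = Deposit Int
  | Withdraw Int
  deriving (Show, Eq)

-- | Apply a single logged event to the state (here, an account balance).
apply :: Int -> Event -> Int
apply balance (Deposit n)  = balance + n
apply balance (Withdraw n) = balance - n

-- | Rebuild the current state by rerunning every logged event, oldest first,
-- starting from the initial state.
replay :: Int -> [Event] -> Int
replay = foldl apply
```

For instance, replay 0 [Deposit 10, Withdraw 3, Deposit 5] evaluates to 12: the state is never stored directly, only recomputed from the initial value and the log.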

Keep in mind that acid-state does not provide schema migrations. If you plan on changing the definition of your data-type during the lifetime of your application (you most likely do), you can either use a fixed schema such as XML or JSON, or you can use safecopy. As of version 0.4, safecopy is the default serialization path but using XML or JSON is still a possibility.

FAQ

  • Will my data still be accessible with future versions of AcidState?
    • Yes, all future versions will be compatible with, or easily upgradeable from, older versions.
  • Is AcidState thread-safe?
    • Yes, using AcidState from multiple threads will only increase performance.
  • Does AcidState work on Windows?
    • Yes, as of version 0.5.1, acid-state works on Windows.
  • Is using Template Haskell (makeAcidic) recommended?
    • Yes.
  • Is it necessary to use Template Haskell?
    • No, all instances and data-types can be declared by hand. See the NoTH examples.
  • Can two processes access the same AcidState store?
    • Using Data.Acid.Remote you can create a UnixSocket, and one process can communicate with another through a ‘file’ on the file system. Check out examples/Proxy.hs in the github source for acid-state.
  • Can two machines access the same AcidState store?
    • Data.Acid.Remote can do that as well. See RemoteClient / RemoteServer. At the moment, zero security is provided. But that is quite fixable.
  • Does AcidState have a mechanism for interactive queries?
    • Using Data.Acid.Remote, you should be able to fire up ghci and run existing queries interactively. You can then munge that data using normal Haskell functions. Obviously, this is not a total solution. It would be nice to see some generics based stuff added to make this process friendlier and more powerful.
  • My process seems to be consuming a bunch of CPU even though it is idle, why is that?
    • Run your application with +RTS -I0 (which can be done via compile time options in your .cabal). This disables idle-time garbage collection which is almost always the cause of CPU usage while idle in acid-state based apps.
  • How does AcidState deal with the presence of very big data structures which may not fit in memory?
    • There are a few answers to this depending on the situation. One solution is to just buy more RAM. After all, you can buy machines with 1TB of RAM these days. But, obviously, not everyone has that kind of budget. (Interestingly, Facebook keeps something like 90% of their working data in RAM using TBs of memcached servers.)

    • A related solution is to buy more RAM, but spread it across multiple machines using replication/sharding. That area of acid-state is still in development. happstack-state has had a few experimental replication implementations. Lemmih is working on a new approach for acid-state.

    • Another potential solution is to create a special data-structure like IxSet which stores only the keys in RAM, and uses mmap and/or enumerators to transparently read the values from disk without loading them all into RAM at once.

    • You could also do something similar but in a more manual/controlled way, where you store parts of your data structure in acid-state and use another system to store key/value blobs on disk, which you load explicitly. This gives your app very fine control over what, when, and how data gets loaded into RAM, flushed to disk, etc., rather than having a general-purpose system try to guess what data should be in RAM.

    • It’s a cost vs speed tradeoff. The only reason anyone would choose to use disk over RAM is because it is cheaper. But it is also 100x slower (or more). Obviously, in many cases much slower is still fast enough. However, acid-state still aims to beat traditional SQL in terms of development and maintenance effort.

Robustness

  • How do I recover a corrupted acid-state database?
    • acid-state provides an acid-state-repair tool. Build it with stack build in the root of this repository; you can then call acid-state-repair from a directory holding a corrupted database to repair it. In most cases it should only drop the last entry, presumably a partial write caused by a crash, but if the database is heavily corrupted, it may restore the database to an older state. Your database is always saved in .bak files, so the repair operation is reversible.
  • How well does acid-state deal with errors?
    • List of Error Scenarios.

acid-state's People

Contributors

adamgundry, andreasabel, aslatter, aspiwack, bmillwood, bsima, dag, ddssff, dimchansky, dmjio, dougburke, edsko, fumieval, gertcuykens, gromakovsky, jmcarthur, kirstin-rhys, lemmih, markus1189, meteficha, nikita-volkov, parsonsmatt, ryanglscott, schell, sdroege, sdx23, stepcut, utdemir, veprbl, ysangkok

acid-state's Issues

Very slow loading of state?

I have an acid-state db of a string-record key/value store that is several million entries and about 2gb on disk. I checkpointed right after creation, however, the state takes about 20 minutes to load into ram and it seems to consume 11gb during this process (after which it returns to ~2gb).

The record data type is

data R = R {
    a :: [Maybe Text],
    b :: Maybe Text,
    c :: Maybe Text,
    d :: Maybe Text,
    e :: Maybe Text,
    f :: [[Maybe Text]],
    g :: [Maybe Text],
    h :: [[Maybe Text]],
    i :: ReleaseId,
    j :: [Maybe Text]
  } deriving (Show, Eq, Generic)

Can I optimize this somehow?

template-haskell-2.10: unbound type variables

With ghc-7.10.1, the module below fails to compile:

{-# LANGUAGE KindSignatures, TemplateHaskell #-}
import Control.Monad.Reader (ask)
import Control.Monad.State (modify)
import Data.Acid (Update, Query, makeAcidic)
import Data.Map as Map (insert, lookup, Map)

putValue :: Ord key => key -> val -> Update (Map key val) ()
putValue key val = modify $ Map.insert key val

lookValue :: Ord key => key -> Query (Map key val) (Maybe val)
lookValue key = ask >>= return . Map.lookup key

$(makeAcidic ''Map ['putValue, 'lookValue])

The messages:

Bug.hs:14:3:
    The exact Name ‘val_a8m3’ is not in scope
      Probable cause: you used a unique Template Haskell name (NameU), 
      perhaps via newName, but did not bind it
      If that's it, then -ddump-splices might be useful

Bug.hs:14:3:
    The exact Name ‘val_a8m3’ is not in scope
      Probable cause: you used a unique Template Haskell name (NameU), 
      perhaps via newName, but did not bind it
      If that's it, then -ddump-splices might be useful

Bug.hs:14:3:
    The RHS of an associated type declaration mentions ‘key_a8m2’
      All such variables must be bound on the LHS

Bug.hs:14:3:
    The exact Name ‘val_a8m5’ is not in scope
      Probable cause: you used a unique Template Haskell name (NameU), 
      perhaps via newName, but did not bind it
      If that's it, then -ddump-splices might be useful

Bug.hs:14:3:
    The RHS of an associated type declaration mentions ‘key_a8m4’
      All such variables must be bound on the LHS

Can't open checkpoint after a program crash

Here is how my database directory looks:

image

After a program crash due to an unrelated bug, my program can no longer open/load the database, giving me this error:

myapp: db/checkpoints-0000000106.log: hGetBuf: invalid argument (Invalid argument)

It may be of note that:

  • I call 'createArchive' right after loading the database for the first time during application boot.
  • I call createSnapshot quite often (every minute or so). The program is a continuously running and looping data collection/computation application.
  • While the total data is likely well under 1GB, the workload is very update-heavy with thousands of (some large-ish) updates flying around every minute.
  • I've noticed my checkpoints get larger and larger. Aren't they supposed to only contain the data itself without any history?

I'd appreciate any direction as far as handling this situation goes. Unfortunately, I will have to switch to something like redis if we can't be sure of acid-state's reliability under this kind of workload.

Records for configuring Remote

TLDR: Should I use separate types for client and server configuration, or unify the configuration in a shared type?

I'm working on replacing the arguments to acidServer and openRemoteState with data records instead, as this will allow us to provide default configurations and the ability to add new options without necessarily breaking existing code.

However, I'm stuck at a design decision: should there be a single unified record type that's used both for servers and clients, or should there be separate record types?

Single type:

  • More convenient in simple cases: we only need one nullConf-like function, we only need one sharedSecretConf function, and the user only needs to make one configuration instance that they can use both for starting their server and for connecting to it.
  • The function pairs for authentication are unified in a single configuration, reflecting the fact that they should be used together.
  • More complex setups are still possible.

Separate types:

  • Added type safety makes it impossible to set up a configuration intended for clients that simply skips server authentication, and then accidentally use it to configure the server and end up running it unprotected. This problem can be avoided with a single type too, as long as the user relies on the provided functions for generating configurations rather than using the record constructors directly.

  • Even though the function pairs for authentication should be used together, their configuration can in fact differ.

    When using shared secret authentication, the server can be configured to accept a set of secrets, but the client only needs one of them to authenticate. Thus with a unified configuration, we end up having to either provide two functions for generating configuration for shared secret authentication anyway, or a single but weird function Set Secret -> Secret -> Config. It is weird because you can now make a shared configuration with server-side secrets that differ from the one secret the client offers.

    In the case where you only use one secret on the server, a unified configuration would make more sense because you could just provide a single function that simply takes the single secret as an argument.

In my own usage so far, I just use the remote backend to share a state between processes on the same system and as the same user, and as a way to inspect the state of a running server. In this type of use, a single configuration is more convenient and even makes more sense: things like the port number are kept in one place.

However, the fact that sharedSecretCheck takes a set of secrets suggests that acid-state wants to support more complex setups where you have multiple clients connecting from multiple systems. In this type of use, I think separate configurations better reflect reality, at the expense of complicating the simpler cases a bit.
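To make the trade-off concrete, here is a minimal sketch of the "separate types" option. Every name in it (ServerConf, ClientConf, sharedSecretPair, accepts) is hypothetical and does not exist in acid-state; secrets are plain strings and the server's set of secrets is modeled as a list to keep the example dependency-free.

```haskell
-- Hypothetical sketch of the "separate types" option; none of these names
-- exist in acid-state.

-- | Server side: may accept any one of several secrets.
data ServerConf = ServerConf
  { serverPort      :: Int
  , acceptedSecrets :: [String]
  } deriving (Show)

-- | Client side: offers at most one secret.
data ClientConf = ClientConf
  { clientHost    :: String
  , clientPort    :: Int
  , offeredSecret :: Maybe String
  } deriving (Show)

-- | Build a matching pair for the simple single-secret case, so the port and
-- the secret are still written down only once.
sharedSecretPair :: String -> Int -> String -> (ServerConf, ClientConf)
sharedSecretPair host port secret =
  ( ServerConf { serverPort = port, acceptedSecrets = [secret] }
  , ClientConf { clientHost = host, clientPort = port, offeredSecret = Just secret }
  )

-- | Would this server accept this client? A client that offers no secret is
-- always rejected.
accepts :: ServerConf -> ClientConf -> Bool
accepts server client =
  maybe False (`elem` acceptedSecrets server) (offeredSecret client)
```

A helper like sharedSecretPair is one way to recover most of the single-type convenience: the simple case stays a one-liner, while the types still prevent a client configuration from being passed where a server configuration is expected.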

I'm probably over-thinking this, but I'm not confident making design decisions like this for a project that isn't mine. I'm experiencing some cognitive dissonance because I originally wanted to do this in part to simplify the user-facing API, but at the same time I think separate types are probably the more "correct" option.

What sayeth you?

CC @stepcut

SerializeError "too few bytes\nFrom:\tdemandInput\n\n"

There is a bug in acid-state that makes acidServer throw an error when a client disconnects.

Steps to reproduce:

./RemoteServer&
./RemoteClient
quit

RemoteServer then prints an unwanted message:

RemoteServer: SerializeError "too few bytes\nFrom:\tdemandInput\n\n"

The bug was reproduced on:

  • ghc 7.6.3
  • network-2.4.1.2
  • cereal-0.3.5.2
  • acid-state-0.11.4 (also on current git master)

Fails to build with cereal-0.4.0.0

Latest acid-state fails to build with the latest cereal:

Downloading acid-state-0.12.0...
Configuring acid-state-0.12.0...
Building acid-state-0.12.0...
Preprocessing library acid-state-0.12.0...
[ 1 of 15] Compiling Data.Acid.CRC    ( src/Data/Acid/CRC.hs, dist/dist-sandbox-808f8932/build/Data/Acid/CRC.o )
[ 2 of 15] Compiling Paths_acid_state ( dist/dist-sandbox-808f8932/build/autogen/Paths_acid_state.hs, dist/dist-sandbox-808f8932/build/Paths_acid_state.o )
[ 3 of 15] Compiling Data.Acid.Archive ( src/Data/Acid/Archive.hs, dist/dist-sandbox-808f8932/build/Data/Acid/Archive.o )

src/Data/Acid/Archive.hs:65:19:
    Constructor `Serialize.Fail' should have 2 arguments, but has been given 1
    In the pattern: Serialize.Fail msg
    In a case alternative: Serialize.Fail msg -> Fail msg
    In the expression:
      case result of {
        Serialize.Done entry rest
          | Strict.null rest -> Next entry (worker more)
          | otherwise -> Next entry (worker (rest : more))
        Serialize.Fail msg -> Fail msg
        Serialize.Partial cont
          -> case more of {
               [] -> check (cont Strict.empty) []
               (x : xs) -> check (cont x) xs } }
Failed to install acid-state-0.12.0
cabal: Error: some packages failed to install:
acid-state-0.12.0 failed during the building phase. The exception was:
ExitFailure 1

Provide function to close the current events log, create a new checkpoint and start a new events log

Currently the events log seems to be appended to indefinitely until the local state is closed; only then is a new events log created, the next time the state is loaded.

It would be useful to have something like createCheckpoint that additionally starts a new events log. It seems that createCheckpoint only creates the checkpoint, while the old events log stays in use and keeps growing.

My problem here is that the events log can become quite big, while the actual living dataset is rather small all the time.
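The requested behaviour can be sketched as a toy model. This is not acid-state's implementation; Store, logEvent, currentState, and checkpointAndRotate are hypothetical names, the state is a single Int, and each event is simply an increment.

```haskell
-- Toy model of the requested operation; not acid-state's implementation.
-- Store, logEvent, currentState and checkpointAndRotate are hypothetical names.

-- | A checkpoint (snapshot of the state) plus the events logged since then.
data Store = Store
  { checkpoint :: Int     -- state as of the last checkpoint
  , eventLog   :: [Int]   -- increments logged after that checkpoint, oldest first
  } deriving (Show, Eq)

-- | Append one event to the current log.
logEvent :: Int -> Store -> Store
logEvent n store = store { eventLog = eventLog store ++ [n] }

-- | The live state: the checkpoint with all later events replayed on top.
currentState :: Store -> Int
currentState store = foldl (+) (checkpoint store) (eventLog store)

-- | Snapshot the live state and start a fresh, empty event log.
checkpointAndRotate :: Store -> Store
checkpointAndRotate store =
  Store { checkpoint = currentState store, eventLog = [] }
```

The property that matters is that currentState is the same before and after checkpointAndRotate, while the log shrinks to nothing, so the on-disk size would track the live dataset rather than its history.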

Better support for throwing exceptions

At the moment acid-state uses

newtype Update st a = Update { unUpdate :: State st a }
    deriving (Monad, Functor, MonadState st)

newtype Query st a  = Query { unQuery :: Reader st a }
    deriving (Monad, Functor, MonadReader st)

Neither of these monads is particularly suitable for throwing exceptions. The only possibility is to use fail, but neither the Reader monad nor the (lazy) State monad is strict enough for that to work. In

foo = do
    fail "The query will continue perfectly fine after this fail"
    restOfQuery

the fail does not abort the computation: restOfQuery still gets executed, which is not what one would expect. Ideally, there would be explicit support for exceptions (ErrorT, perhaps), with the expected strictness properties.
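Until something like that exists, one possible workaround is to report failure in the event's result instead of calling fail. The sketch below models this with plain state-transition functions standing in for the Update monad; withdraw and withdrawTwice are hypothetical names, and the state is a single Int balance.

```haskell
-- Sketch of one workaround: report failure in the result instead of calling fail.
-- Plain state-transition functions stand in for acid-state's Update monad;
-- withdraw and withdrawTwice are hypothetical names.

-- | Withdraw from a balance. On failure the state is returned unchanged and
-- the error is carried in the result.
withdraw :: Int -> Int -> (Either String (), Int)
withdraw amount balance
  | amount <= 0      = (Left "amount must be positive", balance)
  | amount > balance = (Left "insufficient funds", balance)
  | otherwise        = (Right (), balance - amount)

-- | Run two withdrawals in sequence, stopping at the first failure so the
-- rest of the transaction is skipped and the original balance is kept.
withdrawTwice :: Int -> Int -> Int -> (Either String (), Int)
withdrawTwice a b balance =
  case withdraw a balance of
    (Left err, _)        -> (Left err, balance)
    (Right (), balance') ->
      case withdraw b balance' of
        (Left err, _) -> (Left err, balance)  -- roll back the first withdrawal too
        ok            -> ok
```

With a starting balance of 100, withdrawTwice 30 50 100 gives (Right (), 20), while withdrawTwice 30 80 100 gives (Left "insufficient funds", 100): the second failure leaves the state exactly as it was. The cost of this style is that the short-circuiting has to be written out by hand.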

Doesn't build with GHC 7.8.2

When I try to build into a clean sandbox with 7.8.2, I get:

Resolving dependencies...
Configuring acid-state-0.12.1...
Building acid-state-0.12.1...
Preprocessing library acid-state-0.12.1...
[ 1 of 15] Compiling Data.Acid.CRC    ( src/Data/Acid/CRC.hs, dist/dist-sandbox-e493cd96/build/Data/Acid/CRC.o )
[ 2 of 15] Compiling Paths_acid_state ( dist/dist-sandbox-e493cd96/build/autogen/Paths_acid_state.hs, dist/dist-sandbox-e493cd96/build/Paths_acid_state.o )
[ 3 of 15] Compiling Data.Acid.Archive ( src/Data/Acid/Archive.hs, dist/dist-sandbox-e493cd96/build/Data/Acid/Archive.o )
[ 4 of 15] Compiling FileIO           ( src-unix/FileIO.hs, dist/dist-sandbox-e493cd96/build/FileIO.o )
[ 5 of 15] Compiling Data.Acid.Core   ( src/Data/Acid/Core.hs, dist/dist-sandbox-e493cd96/build/Data/Acid/Core.o )
[ 6 of 15] Compiling Data.Acid.Log    ( src/Data/Acid/Log.hs, dist/dist-sandbox-e493cd96/build/Data/Acid/Log.o )
[ 7 of 15] Compiling Data.Acid.Common ( src/Data/Acid/Common.hs, dist/dist-sandbox-e493cd96/build/Data/Acid/Common.o )
[ 8 of 15] Compiling Data.Acid.Abstract ( src/Data/Acid/Abstract.hs, dist/dist-sandbox-e493cd96/build/Data/Acid/Abstract.o )

src/Data/Acid/Abstract.hs:24:15: Warning:
    In the use of type constructor or class ‘Typeable1’
    (imported from Data.Typeable, but defined in Data.Typeable.Internal):
    Deprecated: "renamed to 'Typeable'"

src/Data/Acid/Abstract.hs:106:15: Warning:
    In the use of type constructor or class ‘Typeable1’
    (imported from Data.Typeable, but defined in Data.Typeable.Internal):
    Deprecated: "renamed to 'Typeable'"

src/Data/Acid/Abstract.hs:109:13: Warning:
    In the use of type constructor or class ‘Typeable1’
    (imported from Data.Typeable, but defined in Data.Typeable.Internal):
    Deprecated: "renamed to 'Typeable'"
[ 9 of 15] Compiling Data.Acid.TemplateHaskell ( src/Data/Acid/TemplateHaskell.hs, dist/dist-sandbox-e493cd96/build/Data/Acid/TemplateHaskell.o )

src/Data/Acid/TemplateHaskell.hs:260:22:
    Couldn't match expected type ‘m0 Type -> DecQ’
                with actual type ‘Q Dec’
    The function ‘tySynInstD’ is applied to three arguments,
    but its type ‘Name -> TySynEqnQ -> DecQ’ has only two
    In the expression:
      tySynInstD ''MethodResult [structType] (return resultType)
    In the third argument of ‘instanceD’, namely
      ‘[tySynInstD ''MethodResult [structType] (return resultType),
        tySynInstD ''MethodState [structType] (return stateType)]’

src/Data/Acid/TemplateHaskell.hs:260:48:
    Couldn't match type ‘[TypeQ]’ with ‘Q TySynEqn’
    Expected type: TySynEqnQ
      Actual type: [TypeQ]
    In the second argument of ‘tySynInstD’, namely ‘[structType]’
    In the expression:
      tySynInstD ''MethodResult [structType] (return resultType)
    In the third argument of ‘instanceD’, namely
      ‘[tySynInstD ''MethodResult [structType] (return resultType),
        tySynInstD ''MethodState [structType] (return stateType)]’

src/Data/Acid/TemplateHaskell.hs:261:22:
    Couldn't match expected type ‘m1 Type -> DecQ’
                with actual type ‘Q Dec’
    The function ‘tySynInstD’ is applied to three arguments,
    but its type ‘Name -> TySynEqnQ -> DecQ’ has only two
    In the expression:
      tySynInstD ''MethodState [structType] (return stateType)
    In the third argument of ‘instanceD’, namely
      ‘[tySynInstD ''MethodResult [structType] (return resultType),
        tySynInstD ''MethodState [structType] (return stateType)]’

src/Data/Acid/TemplateHaskell.hs:261:48:
    Couldn't match type ‘[TypeQ]’ with ‘Q TySynEqn’
    Expected type: TySynEqnQ
      Actual type: [TypeQ]
    In the second argument of ‘tySynInstD’, namely ‘[structType]’
    In the expression:
      tySynInstD ''MethodState [structType] (return stateType)
    In the third argument of ‘instanceD’, namely
      ‘[tySynInstD ''MethodResult [structType] (return resultType),
        tySynInstD ''MethodState [structType] (return stateType)]’
Failed to install acid-state-0.12.1

cereal version bump

Just an FYI: cereal recently had a major bug fix for the Alternative instance with lazy state. I suggest bumping the dependency to cereal 0.4.1 for the next acid-state release.

Connect Windows client to Linux Server

I have a remote acid-state server running on a Linux box and served over TLS. I can successfully connect a Linux client to this remote server, but with a Windows client I get an interesting error.

When the binary is run, it simply hangs. After shutting down the server, I receive a thread-blocked error. This seems strange, since I only have one client connecting. I'm using Cygwin, which emulates Unix sockets. Any ideas?

 ./dist/dist-sandbox-9589282d/build/worker/worker.exe
Initializing client
worker.exe: user error (Pattern match failure in do expression at src\Data\Acid\Remote.hs:507:20-30)
worker.exe: thread blocked indefinitely in an MVar operation

Line 507 is the line that begins with forkIO:

scheduleRemoteUpdate :: UpdateEvent event => RemoteState (EventState event) -> event -> IO (MVar (EventResult event))
scheduleRemoteUpdate (RemoteState fn _shutdown) event
  = do let encoded = runPutLazy (safePut event)
       parsed <- newEmptyMVar
       respRef <- fn (RunUpdate (methodTag event, encoded))
       forkIO $ do Result resp <- takeMVar respRef
                   putMVar parsed (case runGetLazyFix safeGet resp of
                                      Left msg -> error msg
                                      Right result -> result)
       return parsed

I have opened the proper ports on both instances for communication, and believe I have configured Windows inbound and outbound traffic correctly.

Make acid-state useable as a mobile/web app state synchronization library

Consider the problem of a distributed chat system:

A accounts/users, C client connections, X chat channels.

The state in the X chat channels is large, and we don't want to keep it all on the clients. For clients that have multiple connections, we want the account state to be in sync.

Now imagine that we use event sourcing as an architecture. Thus each client will send mutation events to a server for processing.

This is quite similar to the acid-state remote setup, and without knowing too much about acid-state, there might be some differences:

  1. For each update received by a server, multiple (remote client) states should be updated, with some async queueing to handle updates for partially connected clients.
  2. A client might want to choose between replaying updates, receiving a serialized copy of the full state, or doing a remote query, in order to access the current state.
  3. It seems like the machinery to serialize queries, data structures, and updates is already in acid-state.

Some problems:

  1. acid-state depends on network which is incompatible with GHCJS. We are using GHCJS extensively for mobile development, and this makes acid-state unsuitable.
  2. I am not sure if I want to use acid-state as a storage mechanism on the server itself - maybe I want to use postgres or something else. I don't know if there is a nice way of transforming acid-state updates to postgres updates.

Provide lens compatible typeclasses.

(This is a followup to my question on SO: http://stackoverflow.com/questions/20686215/how-to-zoom-in-acid-state )

It would be great if you could provide instances to make acid-state's Update and Query monads compatible with the lens ecosystem. Specifically the Zoomed instance, but there are probably other useful things in there.

J Abrahamson (on SO) pointed out that it is not possible to do this without exposing some of acid-state's internals or having it depend on lens (which should be avoided). Maybe you could cooperate with the lens team to find a better solution?

No error when removing field from data structure without migration

Hi,

I have the following code:

data File = File {
    fileName     :: String
  , filePath     :: FilePath
  , fileContents :: ByteString
}
$(deriveSafeCopy 0 'base ''File)

data Database = Database [File]
$(deriveSafeCopy 0 'base ''Database)

addFile :: File -> Update Database ()
addFile file = do
    Database files <- get
    put $ Database (file:files)

viewFiles :: Int -> Query Database [File]
viewFiles limit = do
    Database files <- ask
    return $ take limit files

$(makeAcidic ''Database ['addFile, 'viewFiles]) 

I did some tests and in one test I removed the fileName field. Then I ran a ViewFiles 10 query and no error showed up, but the resulting data was incorrect.

For example:
I inserted File "a" "b" "c" and, after removing the field, I got File "a" "\NUL" back.

It was also possible to add new Files to the state. The new Files were ok when retrieved by a query. (A migration solved the problem too, of course).

Is this behavior intentional or a bug? I would expect an error, because the structure doesn't match the persisted structure.
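A likely explanation, as far as I understand safecopy, is that serialized values are tagged with a version number rather than with a description of their structure, so changing the layout while keeping version 0 gives the decoder no way to notice the mismatch. The toy model below illustrates why bumping the version and supplying a migration avoids the silent misread. It is not the safecopy API: Stored, FileV0, migrate, and decodeFile are hypothetical names, and file contents are a plain String.

```haskell
-- Toy model of version-tagged decoding; not the safecopy API.
-- Stored, FileV0, migrate and decodeFile are hypothetical names.

-- | What sits on disk: a version number followed by the raw fields.
data Stored = Stored Int [String]

-- | Old layout (version 0), kept only so that old data can still be read.
data FileV0 = FileV0 String FilePath String   -- name, path, contents

-- | Current layout (version 1): the name field has been removed.
data File = File
  { filePath     :: FilePath
  , fileContents :: String
  } deriving (Show, Eq)

-- | Convert the old layout to the new one by dropping the name.
migrate :: FileV0 -> File
migrate (FileV0 _name path contents) = File path contents

-- | Decode by version tag; anything that does not match a known layout is an
-- error rather than silently misread data.
decodeFile :: Stored -> Either String File
decodeFile (Stored 1 [path, contents])       = Right (File path contents)
decodeFile (Stored 0 [name, path, contents]) = Right (migrate (FileV0 name path contents))
decodeFile (Stored v fields) =
  Left ("cannot decode version " ++ show v ++ " with " ++ show (length fields) ++ " fields")
```

Here an old record such as Stored 0 ["a", "b", "c"] decodes to File "b" "c", because the version tag routes it through the old layout first.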

Questions on remote state

Lemmih,

I'm trying to use acid-state remotely. In the docs it says,

"...when working remotely, the entire result will be serialized and sent to the remote client. Hence, it is good practice to create queries and updates that will only return the required data."

I'm a little worried that the queries/updates I have will attempt to send the entire state across the wire. Wanted to run my scenario by you to see if this was the case.

I have a client machine that is connecting to an acid-state server remotely to update a record. The state contains a lot of other data that could get large, like maps, hash-maps, etc.

The client will call the function (updateJobByUserId) against a remote state. Since getJobsByUserId is a query, I'm assuming the data (Maybe Jobs) will be serialized on the server and sent back to the client, which is good, not much data. What I'm not sure about is the use of " Backend a b c d e jmap q <- get."

Since the client is calling get, will this send the entire state from the remote acid-server to the client, perform the update on the client, then send it back to the server to update the log?

-- NOTE: UTCTime is assumed for the 'time' argument; its type is not shown above.
updateJobByUserId :: UserId -> JobId -> UTCTime -> Update Backend ()
updateJobByUserId uid jib time = do
    jobs <- liftQuery $ getJobsByUserId uid
    case jobs of
      Nothing -> return ()
      Just joblist -> do
        let j = find ((== jib) . jobId) joblist
        forM_ j $ \job -> do
          -- replace the matching job with its finished version
          let jobs' = job { status = Success
                          , finished = Just time }
                      : delete job joblist
          -- does 'get' here grab the whole state from the remote server?
          db <- get
          put $ db & userJobs %~ M.insert uid jobs'

getJobsByUserId :: UserId -> Query Backend (Maybe Jobs)
getJobsByUserId uid = do
  Backend{..} <- ask
  return $ M.lookup uid _jobs

If it's not clear what I'm saying, I'll try to elaborate.

RFE: openLocalStateIO

One initializes acid by opening up a checkpoint of the state, and providing an initial value if there's no checkpoint, with

openLocalState :: Sane st => st -> IO (AcidState st)

This works well for applications with a sensible static default state; but in general programs may want to do some computation to produce an initial state. In particular, for random number generator seeding, one often has to do some IO to pick a suitable initial seed. One would like to avoid doing this IO unless it's necessary (in the case of seeding a PRNG the call may block until enough entropy is available, for example, and in any case consumes a limited resource). Something like the following is already possible:

main = do
    seed <- unsafeInterleaveIO pickASeed
    acid <- openLocalState seed
    -- etc.

However, this is a bit awkward; uses of unsafe* are naturally a point of contention. A more ideal solution would be to have an initializer with a type like

openLocalStateIO :: Sane st => IO st -> IO (AcidState st)

available in the acid-state API, which promises to execute its argument only when the state can't be found on disk. IO-variants of other similar functions might be welcome for other users, too.
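The intended semantics can be sketched without acid-state at all. In this toy model, a Maybe stands in for "is there a state on disk?", and an IORef counter records whether the initializer ran. openWithIO and pickASeed are hypothetical names, and the seed value 42 is arbitrary.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Toy model of the proposed initializer; not part of the acid-state API.
-- openWithIO and pickASeed are hypothetical names.

-- | Return the stored state when there is one; otherwise run the supplied
-- action to produce the initial state. The action never runs in the first case.
openWithIO :: Maybe st -> IO st -> IO st
openWithIO (Just stored) _      = return stored
openWithIO Nothing       mkInit = mkInit

-- | Stand-in for an expensive seeding action; counts how often it has run.
pickASeed :: IORef Int -> IO Int
pickASeed counter = do
  modifyIORef counter (+ 1)
  return 42
```

Calling openWithIO (Just 7) (pickASeed counter) leaves the counter untouched, whereas openWithIO Nothing (pickASeed counter) increments it once. Passing the initializer as an IO action makes the "only when needed" guarantee explicit, with no reliance on lazy IO.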

seize.it is down - deadlinks

Seemingly seize.it is down, so both the homepage link and the examples reference in Hackage don't work anymore.

Support an operation to compact the history without writing an archive

This would be semantically equivalent to createArchive, but instead of storing the archive on the disk, just delete it.

I have a pathological case where my working set is a few kB but my archive folder is several gigabytes. I could include my own removeDirectory (xapianDir </> "Archive") call but it feels hacky and arbitrary. Can't acid-state do this as well?

In fact, why doesn't acid-state do this out of the box? It feels like acid-state is designed to fail eventually by running the host out of disk space for history from years ago.

Decouple makeAcidic calls while retaining single on-disk state

A call to makeAcidic will generate acid events for a given state. I have 82 functions querying/updating 30 fields. I'd like to keep it all in one state for simplicity, but I would like to split the makeAcidic call into several smaller calls to reduce code smell, instead of having all the names in one 82-line list.

$(makeAcidic ''Backend [ -- * Auth
                         'saveAuthUser
                       , 'byUserId
                       , 'destroyU
                       , 'allLogins
                       , 'byLogin
                       , 'byRememberToken
                       , 'getUserByResetToken 
                       -- .... and on and on

Instead have something like the following:

$(makeAcidic ''Backend [ 'saveAuthUser ])
$(makeAcidic ''Backend [ 'byUserId ])

That could be split into separate files (without creating multiple states). I know this would create duplicate instances for Backend, which is invalid Haskell. Is my only option to write out the instances manually?

No-FS Version - Pondering / Question

I'm grappling with a requirement: We're currently using Heroku for deployment (using CLJS on front end, RoR on backend - groan).

I'd like very much to switch to Haskell and use ACID-state to store data in our application on the back end.

I'd also like a way to communicate typed data between the backend and the frontend. I'd like them both to be written in Haskell (ie GHC Haskell on server, GHCJS JS on client).

Would it be possible to separate the marshalling code such that a different backend than the local filesystem could be used?

Could we then also possibly use that marshalling mechanism as a transport for the typed information to the front end?

(I'd like to keep my data separate from my source code, and Heroku blows the image of the app away after each deployment and once or twice a day, anyway, so I can't use the local FS as storage).

Ideally I'd also like to use ACID-state as a caching mechanism at two points for my typed data, but not have the entire data set in memory all the time.

If ACID-state was pulled apart in this way, would it be possible to inject something like Apache Samza in between the marshalling / demarshalling so we could use it as an efficient transport?

I'd like to cache and stream the data at various points (i.e. composable pub/sub streams and small, processed, cached chunks, but as Haskell typed data rather than generic messages, JSON, or text; ideally I'd like the ability to transfer and store code, too). The larger "big data" solutions seem like massive overkill here, but there doesn't seem to be any good way to store actual Haskell typed data other than ACID-state, and that only uses the local filesystem.

Does ACID-state allow the storing of code? I'd obviously have to use something like HLINT to eval any code I had stored... for things like specifying types not known at compile-time.

I realise you'll probably "shoot me down" here... but I just wondered about the separation.

UPDATE: Since asking this, I realised that there is https://github.com/GaloisInc/cereal - but I'm not really sure how one would go about organising a transport or adjusting so that ACID-state uses a different backing store.

<interactive>: SerializeError "too few bytes\nFrom:\tdemandInput\n\n"

Why do I get the following error using openRemoteState? <interactive>: SerializeError "too few bytes\nFrom:\tdemandInput\n\n"

{-# LANGUAGE OverloadedStrings #-}
import Control.Exception     ( bracket )
import Data.Acid             ( AcidState, createCheckpoint, closeAcidState )
import Data.Acid.Advanced    ( query', update' )
import Data.Acid.Local       ( createArchive, openLocalState )
import Data.Acid.Remote      ( openRemoteState, sharedSecretPerform )
import Data.ByteString.Char8 ( pack )
import Data.Map              ( empty )
import Network               ( PortID(PortNumber) )
import Table                 ( UserMap(..), User(..), InsertKey(..), LookupKey(..))

main :: IO ()
main = do
    --acid <- openLocalState $ UserMap empty
    acid <- openRemoteState (sharedSecretPerform $ pack "12345") "localhost" (PortNumber 8080) :: IO (AcidState UserMap)
    --_ <- update' acid (InsertKey "123" (User "" "" "" ""))
    --_ <- query' acid (LookupKey "123")
    createCheckpoint acid
    closeAcidState acid

import Control.Exception     ( bracket )
import Data.Acid             ( AcidState, closeAcidState )
import Data.Acid.Local       ( openLocalState )
import Data.Acid.Remote      ( acidServer, sharedSecretCheck )
import Data.ByteString.Char8 ( pack )
import Data.Map              ( empty )
import Data.Set              ( singleton )
import Network               ( PortID(PortNumber) )
import Table                 ( UserMap(..) )

openAcidState :: IO (AcidState UserMap)
openAcidState = openLocalState $ UserMap empty

runAcidState :: AcidState UserMap -> IO ()
runAcidState = acidServer (sharedSecretCheck (singleton $ pack "12345")) (PortNumber 8080)

main :: IO ()
main = bracket openAcidState closeAcidState runAcidState

{-# LANGUAGE DeriveDataTypeable, TypeFamilies, TemplateHaskell #-}
module Table where
import Control.Lens ((?=), at, from, makeIso, view)
import Data.Acid (Update, Query, makeAcidic)
import Data.SafeCopy (deriveSafeCopy, base)
import Data.Text (Text)
import Data.Typeable (Typeable)
import qualified Data.Map as Map (Map)

data User = User {city::Text
                 ,country::Text
                 ,phone::Text
                 ,email::Text} deriving (Show, Typeable)

$(deriveSafeCopy 0 'base ''User)

newtype UserMap = UserMap (Map.Map String User) deriving (Show, Typeable)

$(deriveSafeCopy 0 'base ''UserMap)

$(makeIso ''UserMap)

insertKey :: String -> User -> Update UserMap ()
insertKey k v = (from userMap.at k) ?= v

lookupKey :: String -> Query UserMap (Maybe User)
lookupKey k = view (from userMap.at k)

$(makeAcidic ''UserMap ['insertKey, 'lookupKey])

Example of Memory.Pure

I'm writing some tests of something that uses acid-state's Memory.Pure variant. The interface is different enough that I've gotten lost. I was wondering if someone could write up a quick example for the examples dir that shows how to use it.

acid-state disk space usage

% ./StressTest query
State value: 0
% du -hs state
 12K    state
% ./StressTest poke
Issuing 100k transactions... Done
./StressTest poke  5,13s user 0,10s system 99% cpu 5,267 total
% du -hs state
5,4M    state
% ./StressTest query
State value: 100000
% ./StressTest checkpoint
% du -hs state
5,4M    state
% ./StressTest poke
Issuing 100k transactions... Done
./StressTest poke  5,14s user 0,10s system 99% cpu 5,277 total
% du -hs state
 11M    state

Is it just me, or does disk space grow linearly with the operation count? There should be some operation that squashes all previous events, leaving only the resulting value. It looks like createCheckpoint should be doing this.

  • ghc 7.6.3
  • cereal-0.3.5.2
  • acid-state-0.11.4 (also on current git master)

BlockedIndefinitelyOnMVar on GHC 7.8

(I know I may be asking a bit too much on this issue, please let me know if I should close it and look for help elsewhere.)

I'm setting up Travis-CI for the yesodweb/serversession repo and I just got this:

Failures:

  1) AcidStorage on local storage stress test: one 100 MiB value

       uncaught exception: BlockedIndefinitelyOnMVar (thread blocked indefinitely in an MVar operation)

  2) AcidStorage on local storage stress test: one 1 MiB key

       uncaught exception: BlockedIndefinitelyOnMVar (thread blocked indefinitely in an MVar operation)

  3) AcidStorage on local storage stress test: key with all possible Unicode code points and value with all possible byte values

       uncaught exception: BlockedIndefinitelyOnMVar (thread blocked indefinitely in an MVar operation)

(Source: https://travis-ci.org/yesodweb/serversession/jobs/64816620#L1305)

These tests worked fine for acid-state using the Memory storage. They also work fine on my machine. I've run this test many times already and I've never seen BlockedIndefinitelyOnMVar. Also, I've tried googling "acid-state BlockedIndefinitelyOnMVar" and didn't find anything relevant.

So, any thoughts about what could be causing this? Anything I could do to help debug it?

*** Exception: Data.Acid: Invalid subtype cast: RemoteState -> LocalState

Can you make createArchive also work from remote for easy maintenance without cluttering the server code please?

import Control.Exception     ( bracket )
import Data.Acid             ( AcidState, createCheckpoint, closeAcidState )
import Data.Acid.Local       ( createArchive, openLocalState )
import Data.Acid.Remote      ( openRemoteState, sharedSecretPerform )
import Data.ByteString.Char8 ( pack )
import Data.Map              ( empty )
import Network               ( PortID(PortNumber) )
import Table                 ( UserMap(..) )

main :: IO ()
main = do
    --acid <- openLocalState $ UserMap empty
    acid <- openRemoteState (sharedSecretPerform $ pack "12345") "localhost" (PortNumber 8080) :: IO (AcidState UserMap)
    createCheckpoint acid
    createArchive acid
    closeAcidState acid

acid-state may not restart if killed with kill -9

Acid-state uses a PID file to ensure that it is a singleton. Unfortunately, the following sequence does not allow for an automatic recovery:

  • kill the process instance with kill -9
  • have another, unrelated process take the same process ID while the program using acid-state is down
  • restart the program using acid-state

(to simulate this, one can manually modify the PID in the lock file to usurp the one of an existing, unrelated process)

The occurrence of this sequence of events is unfortunately high at machine reboot: it happened to us in production during a disaster recovery test that simulates a power outage.

A possibility would be to use a unix socket rather than a PID file, or alternatively to use flock(2). This seems to be possible, since the win32 and linux implementations are already different, but I have not checked in much detail.

Some discussion around this topic can be found here:

https://stackoverflow.com/questions/220525/ensure-a-single-instance-of-an-application-in-linux/221159
https://stackoverflow.com/questions/5339200/how-to-create-a-single-instance-application-in-c-or-c

Unexpectedly using the initial state for an existing local disk state component

The behaviour of openLocalState is such that if there is existing on-disk state but no checkpoint yet, then it replays the events on top of the initial state passed to this call of openLocalState, rather than on top of the initial state used to create the on-disk state originally.

This means that if you use code like:

now <- getCurrentTime
db <- openLocalState (initDB now)

then we get (arguably) wrong results. The user expects the initial state to be used only once, when the database is first created. After that we already have a db and it really should use the saved state. If you don't look carefully it looks like data loss, as if we're reverting to the initial state.

Given the mechanism that acid-state's Local driver uses, I think the best fix would be to write the initial state into that first checkpoints-0000000000.log rather than having it as empty.
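The effect described above can be reproduced with a small pure model. This is only a sketch: DB, Event and replay are hypothetical names for illustration, not acid-state's API, and the "log" is just a list of increments. It shows that replaying the same events on top of a different initial state yields a different final state.

```haskell
import Data.List (foldl')

-- Hypothetical state: a creation timestamp plus a counter.
data DB = DB { createdAt :: Int, counter :: Int }
  deriving (Eq, Show)

-- Hypothetical logged event.
newtype Event = Increment Int

applyEvent :: DB -> Event -> DB
applyEvent db (Increment n) = db { counter = counter db + n }

-- Without a checkpoint, the logged events are applied on top of
-- whatever initial state the caller passes in on this run.
replay :: DB -> [Event] -> DB
replay = foldl' applyEvent

main :: IO ()
main = do
  let logged    = [Increment 1, Increment 2]
      firstRun  = replay (DB 100 0) logged  -- initial state built at time 100
      secondRun = replay (DB 200 0) logged  -- same log, initial state built at time 200
  print firstRun
  print secondRun
  -- The counters agree, but createdAt silently follows the new initial state.
  print (createdAt firstRun == createdAt secondRun)
```

Writing the initial state into the first checkpoint, as suggested, would correspond to replay always starting from the DB value of the first run.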

Performance tuning via reduced durability

I have some test code that right now takes 582s of wall time and 51s of CPU time to complete. If I completely remove fsync calls by editing the Unix FileIO, I get 16s of wall time and 35s of CPU time. That's a 36-fold improvement in wall time!

I'm not suggesting we should not call fsync as that would rename the package to aci-state. However, many DBMSs provide knobs to adjust performance vs durability. For example:

  • SQLite provides three different compromises.
  • Redis allows you to never fsync, to fsync every second or fsync every query.
  • PostgreSQL has many knobs. You can disable fsync entirely. You can keep fsync but return query results before fsync completes.

I'm not proposing anything concrete that should be done, I'd just like to start a conversation about possible tradeoffs.
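To make the conversation a little more concrete, here is a minimal sketch of what such a knob could look like. Everything in it is hypothetical: SyncPolicy, shouldSync and syncCount do not exist in acid-state, and the policy is count-based rather than time-based (unlike Redis's "every second" mode) to keep the example pure.

```haskell
-- Hypothetical durability setting; none of these names exist in acid-state.
data SyncPolicy
  = SyncEveryWrite   -- fsync after every logged event (full durability)
  | SyncEveryN Int   -- fsync after every n-th logged event
  | SyncNever        -- leave flushing to the operating system
  deriving (Eq, Show)

-- Decide whether the k-th write (1-based) should be followed by an fsync.
shouldSync :: SyncPolicy -> Int -> Bool
shouldSync SyncEveryWrite _ = True
shouldSync (SyncEveryN n) k = n > 0 && k `mod` n == 0
shouldSync SyncNever      _ = False

-- Count how many fsync calls a run of `total` writes would issue.
syncCount :: SyncPolicy -> Int -> Int
syncCount policy total = length (filter (shouldSync policy) [1 .. total])

main :: IO ()
main =
  mapM_ (\p -> putStrLn (show p ++ ": " ++ show (syncCount p 1000)))
        [SyncEveryWrite, SyncEveryN 100, SyncNever]
```

With SyncEveryN, a crash could lose up to n - 1 acknowledged events, which is the durability being traded for speed.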

Configurable serialisation layer?

For a client project I've been looking at the feasibility of replacing the use of safecopy in acid-state with an alternative serialisation layer, based on the cborg/serialise packages. (We use an external approach to data migration as described in the api-tools package.)

I think the least disruptive way to do this would be to replace the uses of SafeCopy with a new class for encoding/decoding with default implementations that use the SafeCopy equivalents. For example, Method might change from

class ( Typeable ev, SafeCopy ev
      , Typeable (MethodResult ev), SafeCopy (MethodResult ev)) =>
      Method ev where

to something like

class Serialisable a where
    encode :: a -> Lazy.ByteString
    default encode :: SafeCopy a => a -> Lazy.ByteString
    encode = runPutLazy . safePut

    decode :: Lazy.ByteString -> Either String a
    default decode :: SafeCopy a => Lazy.ByteString -> Either String a
    decode = runGetLazy safeGet

class ( Typeable ev, Serialisable ev
      , Typeable (MethodResult ev), Serialisable (MethodResult ev)) =>
      Method ev where

This means that in the common case, the only change required to client code would be adding (or generating using TH) empty instances of Serialisable.

Does this approach sound plausible? Would a PR doing this be of interest?

Can't migrate when you have 2 consecutive fields of the same type and then one of them changes type

Take the following program:

{-# LANGUAGE
TemplateHaskell,
TypeFamilies,
OverloadedStrings,
ScopedTypeVariables
  #-}

import Data.SafeCopy
import Data.Acid

newtype X = X String

data Foo = Foo X X

deriveSafeCopy 0 'base ''X
deriveSafeCopy 0 'base ''Foo

makeAcidic ''Foo []

main = do
  db :: AcidState Foo <- openLocalStateFrom "acid-hs/" (Foo (X "a") (X "b"))
  putStrLn "loaded the database successfully"
  createCheckpoint db
  putStrLn "created checkpoint"
  closeAcidState db
  putStrLn "closed db"

Introduce a type Y and try to go from Foo X X to Foo X Y:

{-# LANGUAGE
TemplateHaskell,
TypeFamilies,
OverloadedStrings,
ScopedTypeVariables
  #-}

import Data.SafeCopy
import Data.Acid

newtype X = X String
newtype Y = Y String               -- new

data Foo = Foo X Y                 -- was Foo X X

deriveSafeCopy 0 'base ''X
deriveSafeCopy 1 'extension ''Y
deriveSafeCopy 0 'base ''Foo

instance Migrate Y where           -- migration from X to Y
  type MigrateFrom Y = X
  migrate (X a) = Y a

makeAcidic ''Foo []

main = do
  db :: AcidState Foo <- openLocalStateFrom "acid-hs/" undefined
  putStrLn "loaded the database successfully"
  createCheckpoint db
  putStrLn "created checkpoint"
  closeAcidState db
  putStrLn "closed db"

Try to run the 1st version and then load the checkpoint in the 2nd version:

$ runghc acid1.hs
loaded the database successfully
created checkpoint
closed db

$ runghc acid2.hs
acid2.hs: Could not parse saved checkpoint due to the following error: 
Failed reading: safecopy: Char: Cannot find getter associated with this version number: 
Version {unVersion = 1627389952}
From:   Main.Foo:
    Main.X:

Ouch.

I think the reason it happens is that for Foo X X we generate the following safePut:

      putCopy (Foo arg_a9dA arg_a9dB)
        = contain
            (do { safePut_X_a9dC <- getSafePut;
                  safePut_X_a9dC arg_a9dA;
                  safePut_X_a9dC arg_a9dB;
                  return () })

It will write the version tag for X only once (since writing the tag is done in getSafePut). However, when reading Foo X Y we will try to read the version tag twice:

      getCopy
        = contain
            (Data.Serialize.Get.label
               "Main.Foo:"
               (do { safeGet_X_a9fJ <- getSafeGet;
                     safeGet_Y_a9fK <- getSafeGet;
                     (((return Foo) <*> safeGet_X_a9fJ) <*> safeGet_Y_a9fK) }))

Ouch.
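The mismatch can be modelled without safecopy at all. The sketch below is a simplification under stated assumptions: Token, writeFooXX and readFooXY are hypothetical names, and the stream is a list of labelled tokens, whereas real safecopy output is raw bytes (which is why the real failure shows up as a nonsensical version number rather than a clean error).

```haskell
-- Simplified token stream; real safecopy output is raw bytes.
data Token = VersionTag Int | Payload String
  deriving (Eq, Show)

-- Writer for `Foo X X`: both fields share one type, so a single
-- version tag is written, followed by both payloads.
writeFooXX :: String -> String -> [Token]
writeFooXX a b = [VersionTag 0, Payload a, Payload b]

-- Reader for `Foo X Y`: two distinct field types, so two version
-- tags are expected before the payloads.
readFooXY :: [Token] -> Either String (String, String)
readFooXY (VersionTag _ : VersionTag _ : Payload a : Payload b : _) = Right (a, b)
readFooXY (VersionTag _ : Payload _ : _) =
  Left "expected a second version tag, found payload data"
readFooXY _ = Left "unexpected input"

main :: IO ()
main = print (readFooXY (writeFooXX "a" "b"))
```

In this model the reader fails on the second tag because the writer never emitted one, which mirrors the one-tag-written, two-tags-read behaviour of the generated putCopy and getCopy.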

(My actual usecase, by the way, involves migrating from something like Foo X X to Foo (X Int) (X Bool), where Int and Bool are phantom (i.e. X is defined as newtype X a = X Text).)

seeing lots of errors: gendata: hGetBuf: invalid argument (Invalid argument)

seeing lots of errors:

gendata: dist/db/events-0000000330.log: hGetBuf: invalid argument (Invalid argument)

I am trying to download a bunch of data and store it as an IxSet in the database, but I regularly see this error during program start.

I don't think it's due to the 2 GB limit. Here are some call stacks from the profiler:

*** Exception (reporting due to +RTS -xc): (THUNK_1_1), stack trace:                      
  FileIO.readLock,                                                                        
  called from FileIO.checkLock,                                                           
  called from FileIO.obtainPrefixLock,                                                    
  called from Data.Acid.Local.resumeLocalStateFrom,                                       
  called from Data.Acid.Local.openLocalStateFrom,                                         
  called from Util.DB.new,                                                                
  called from GenData.gendata                                                             
*** Exception (reporting due to +RTS -xc): (THUNK_1_1), stack trace:                      
  FileIO.readLock,                                                                        
  called from FileIO.checkLock,                                                           
  called from FileIO.obtainPrefixLock,                                                    
  called from Data.Acid.Local.resumeLocalStateFrom,                                       
  called from Data.Acid.Local.openLocalStateFrom,                                         
  called from Util.DB.new,                                                                
  called from GenData.gendata                                                             
*** Exception (reporting due to +RTS -xc): (THUNK_2_0), stack trace:                      
  FileIO.readLock,                                                                        
  called from FileIO.checkLock,                                                           
  called from FileIO.obtainPrefixLock,                                                    
  called from Data.Acid.Local.resumeLocalStateFrom,                                       
  called from Data.Acid.Local.openLocalStateFrom,                                         
  called from Util.DB.new,                                                                
  called from GenData.gendata                                                             
*** Exception (reporting due to +RTS -xc): (THUNK), stack trace:                          
  FileIO.breakLock,                                                                       
  called from FileIO.maybeBreakLock,                                                      
  called from FileIO.checkLock,                                                           
  called from FileIO.obtainPrefixLock,                                                    
  called from Data.Acid.Local.resumeLocalStateFrom,                                       
  called from Data.Acid.Local.openLocalStateFrom,                                         
  called from Util.DB.new,                                                                
  called from GenData.gendata                                                             
*** Exception (reporting due to +RTS -xc): (THUNK), stack trace:                          
  FileIO.breakLock,                                                                       
  called from FileIO.maybeBreakLock,                                                      
  called from FileIO.checkLock,                                                           
  called from FileIO.obtainPrefixLock,                                                    
  called from Data.Acid.Local.resumeLocalStateFrom,                                       
  called from Data.Acid.Local.openLocalStateFrom,                                         
  called from Util.DB.new,                                                                
  called from GenData.gendata
*** Exception (reporting due to +RTS -xc): (THUNK_1_1), stack trace:                      
  FileIO.readLock,                                                                        
  called from FileIO.checkLock,                                                           
  called from FileIO.obtainPrefixLock,                                                    
  called from Data.Acid.Local.resumeLocalStateFrom,                                       
  called from Data.Acid.Local.openLocalStateFrom,                                         
  called from Util.DB.new,                                                                
  called from GenData.gendata                                                             
*** Exception (reporting due to +RTS -xc): (THUNK_1_1), stack trace:                      
  FileIO.readLock,                                                                        
  called from FileIO.checkLock,                                                           
  called from FileIO.obtainPrefixLock,                                                    
  called from Data.Acid.Local.resumeLocalStateFrom,                                       
  called from Data.Acid.Local.openLocalStateFrom,                                         
  called from Util.DB.new,                                                                
  called from GenData.gendata                                                             
*** Exception (reporting due to +RTS -xc): (THUNK_2_0), stack trace:                      
  FileIO.readLock,                                                                        
  called from FileIO.checkLock,                                                           
  called from FileIO.obtainPrefixLock,                                                    
  called from Data.Acid.Local.resumeLocalStateFrom,                                       
  called from Data.Acid.Local.openLocalStateFrom,                                         
  called from Util.DB.new,                                                                
  called from GenData.gendata                                                             
*** Exception (reporting due to +RTS -xc): (THUNK), stack trace:                          
  FileIO.breakLock,                                                                       
  called from FileIO.maybeBreakLock,                                                      
  called from FileIO.checkLock,                                                           
  called from FileIO.obtainPrefixLock,                                                    
  called from Data.Acid.Local.resumeLocalStateFrom,                                       
  called from Data.Acid.Local.openLocalStateFrom,                                         
  called from Util.DB.new,                                                                
  called from GenData.gendata                                                             
*** Exception (reporting due to +RTS -xc): (THUNK), stack trace:                          
  FileIO.breakLock,                                                                       
  called from FileIO.maybeBreakLock,                                                      
  called from FileIO.checkLock,                                                           
  called from FileIO.obtainPrefixLock,                                                    
  called from Data.Acid.Local.resumeLocalStateFrom,                                       
  called from Data.Acid.Local.openLocalStateFrom,                                         
  called from Util.DB.new,                                                                
  called from GenData.gendata 

Maybe the lock handle is getting closed?

Unable to build with ghc 7.7.20131121

Here's the error message from cabal build

Preprocessing library acid-state-0.12.1...
[ 1 of 15] Compiling Data.Acid.CRC    ( src/Data/Acid/CRC.hs, dist/build/Data/Acid/CRC.o )
[ 2 of 15] Compiling Paths_acid_state ( dist/build/autogen/Paths_acid_state.hs, dist/build/Paths_acid_state.o )
[ 3 of 15] Compiling Data.Acid.Archive ( src/Data/Acid/Archive.hs, dist/build/Data/Acid/Archive.o )
[ 4 of 15] Compiling FileIO           ( src-unix/FileIO.hs, dist/build/FileIO.o )
[ 5 of 15] Compiling Data.Acid.Core   ( src/Data/Acid/Core.hs, dist/build/Data/Acid/Core.o )
[ 6 of 15] Compiling Data.Acid.Log    ( src/Data/Acid/Log.hs, dist/build/Data/Acid/Log.o )
[ 7 of 15] Compiling Data.Acid.Common ( src/Data/Acid/Common.hs, dist/build/Data/Acid/Common.o )
[ 8 of 15] Compiling Data.Acid.Abstract ( src/Data/Acid/Abstract.hs, dist/build/Data/Acid/Abstract.o )

src/Data/Acid/Abstract.hs:21:34:
    Module ‛Data.Typeable’ does not export ‛Typeable1’

Using

ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.7.20131121

Info on this change can be found in the release notes for 7.8.1 (e.g. https://ghc.haskell.org/trac/ghc/browser/ghc/docs/users_guide/7.8.1-notes.xml), where Typeable is now poly-kinded.

thread blocked indefinitely in an MVar operation

I am trying to build a server based on your example. When using closeAcidState I can shut down the server with ^C with no errors.

Using closeLocalState I get the following error by doing ^C

module Main (main) where

import Control.Exception     ( bracket )
import Data.Acid             ( AcidState, closeAcidState )
import Data.Acid.Local       ( openLocalState, createCheckpointAndClose, createArchive )
import Data.Acid.Remote      ( acidServer, sharedSecretCheck )
import Data.ByteString.Char8 ( pack )
import Data.Map              ( empty )
import Data.Set              ( singleton )
import Network               ( PortID(PortNumber) )
import Table                 ( UserMap(..) )

closeLocalState :: AcidState UserMap -> IO ()
closeLocalState s = createCheckpointAndClose s >> createArchive s

main :: IO ()
main = bracket
    (openLocalState $ UserMap empty )
    closeLocalState
    (acidServer (sharedSecretCheck (singleton $ pack "12345")) (PortNumber 8080))

^C
server: FileLog has been closed
server: thread blocked indefinitely in an MVar operation

Examples don't compile with GHC-7.8.3.

The acid-state example in The Happstack Book and some of the acid-state repository examples don't compile with GHC-7.8.3. The error messages say coercion is impossible because Data.SafeCopy.SafeCopy.Kind is nominal while two arguments differ. Simon Peyton-Jones explains what changed with GHC-7.8.x in a video presentation at "https://skillsmatter.com/skillscasts/5296-safe-zero-cost-coercions-in-haskell".
The compiler message says "Possible fix: use a standalone 'deriving instance' declaration . . ." Has anyone had any success implementing this suggestion? I wanted to use acid-state, but the example code doesn't compile with the new GHC, and I don't want to fall back to an earlier GHC 7.x because the problem is not a bug in GHC-7.8.3; the problem stems from a bug having been fixed (see the Jones presentation). How are people dealing with this situation? Is the solution so trivial that nobody has bothered to mention it here? Are people removing "newtype" from their code, or using "unsafeCoerce", or what?

Limit RAM usage with a config option?

Hello,

I really quite like many of the concepts in acid-state. But I am reluctant to use it (e.g. for server applications) because I know it needs to keep all data in RAM all the time. The server might not have many users at first, but I am concerned that if it gains them later on, RAM might become an issue.

Would it be possible to configure acid-state in such a way to tell it "Use at most 2 GB of RAM and not a bit more!"? And what I would expect acid-state to do when it reaches 2GB is to drop the least accessed bits from memory at once, and re-read them from disk iff/when they are accessed from other parts of the code, on demand.

Would this be possible? And if so, could this be implemented easily?

Can't open state after crash: too few bytes

If I run a bunch of updates on my state (local, on disk), then press Ctrl-C during some update, then try to open state (openLocalStateFrom in stack ghci), sometimes I get the following error:

too few bytes
From:   demandInput

I tried to debug it on my own and can provide some extra information.
There are two checkpoint files with two consecutive numbers. The last one is much smaller than the other one, so it's probably malformed. There are also events files, but they seem to be irrelevant.
The error happens in the newestEntry function. To be more precise, it happens when the last (probably malformed) checkpoint is being read. If I go deeper, I can say that this runGetPartial fails.
My guess is that the last checkpoint wasn't dumped properly because the program crashed, but the previous checkpoint also exists and it should be fine. The problem is that newestEntry fails with an error if the last checkpoint is malformed, when instead it should try to read another checkpoint (if one exists). It's just my guess, and I may be wrong, because I don't know this code.
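The fallback suggested here can be sketched as follows. This is a hedged illustration only: Decoded and newestValid are hypothetical names, not the actual newestEntry code, and each checkpoint's parse result is modelled as an Either, with the list ordered newest first.

```haskell
import Data.Either (rights)
import Data.Maybe (listToMaybe)

-- Hypothetical stand-in for the result of decoding one checkpoint file:
-- Left carries the parse error, Right the decoded state.
type Decoded a = Either String a

-- Given decode results ordered newest first, return the first one that
-- parsed successfully, skipping malformed (e.g. truncated) checkpoints.
newestValid :: [Decoded a] -> Maybe a
newestValid = listToMaybe . rights

main :: IO ()
main = do
  let results = [Left "too few bytes", Right "older checkpoint"] :: [Decoded String]
  print (newestValid results)
```

Note that falling back to an older checkpoint is only safe if the events logged after that checkpoint are still on disk, so that they can be replayed on top of it.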

I can also provide an example of a database which can't be read because of this bug. Unfortunately, this example is quite heavy, but maybe someone will look into it.
Here is database (zip archived): wallet-db.zip
Definition of this data type can be found in this repository. Please use f97db74cbf09e7d2aa403d2c47a7fe37f7583e8f revision (just in case).
Just run in stack ghci:

ghci> import qualified Data.Acid as A
ghci> import qualified RSCoin.User as U
ghci> st <- A.openLocalStateFrom "wallet-db" U.emptyWalletStorage

and you will get that error.

7.8.2 TH Issue

On fresh install w/ OSX Mavericks using 7.8.2

src/Data/Acid/TemplateHaskell.hs:261:48:
    Couldn't match type [TypeQ] with Q TySynEqn
    Expected type: TySynEqnQ
      Actual type: [TypeQ]
    In the second argument of tySynInstD, namely [structType]
    In the expression:
      tySynInstD ''MethodState [structType] (return stateType)
    In the third argument of instanceD, namely
      [tySynInstD ''MethodResult [structType] (return resultType),
        tySynInstD ''MethodState [structType] (return stateType)]

makeMethodInstance eventName eventType
    = do let preds = [ ''SafeCopy, ''Typeable ]
             ty = AppT (ConT ''Method) (foldl AppT (ConT eventStructName) [ VarT tyvar | PlainTV tyvar <- tyvars ])
             structType = foldl appT (conT eventStructName) [ varT tyvar | PlainTV tyvar <- tyvars ]
         instanceD (cxt $ [ classP classPred [varT tyvar] | PlainTV tyvar <- tyvars, classPred <- preds ] ++ map return context)
                   (return ty)
                   [ tySynInstD ''MethodResult [structType] (return resultType)
                   , tySynInstD ''MethodState  [structType] (return stateType)
                   ]
    where (tyvars, context, _args, stateType, resultType, _isUpdate) = analyseType eventName eventType
          eventStructName = mkName (structName (nameBase eventName))
          structName [] = []
          structName (x:xs) = toUpper x : xs

makeAcidic yields "‘Typeable’ is applied to too many type arguments" with GHC-7.8.2

The following module compiles with GHC-7.6.3 but fails to compile with GHC-7.8.2.

{-# LANGUAGE UnicodeSyntax #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

module Test where

import Control.Monad.State.Strict

import Data.Acid
import Data.Data
import Data.IxSet
import Data.SafeCopy (SafeCopy(..), deriveSafeCopy, base)

data Entry κ = Entry
    { key ∷ !κ
    , val ∷ !Int
    }
    deriving (Eq, Ord, Typeable, Data)

deriveSafeCopy 0 'base ''Entry

newtype IxSetStore κ = IxSetStore { ixSetStore ∷ (IxSet (Entry κ)) }
    deriving (Eq, Typeable, Data)

deriving instance (SafeCopy κ, Typeable κ, Ord κ) ⇒ SafeCopy (IxSetStore κ)

instance (Ord κ, Typeable κ) ⇒ Indexable (Entry κ) where
    empty = ixSet [ ixFun $ \x → [key x]
                  , ixFun $ \x → [val x]
                  ]

insertStore
    ∷ (Ord κ, Typeable κ)
    ⇒ Entry κ
    → Update (IxSetStore κ) (Entry κ)
insertStore item = do
    modify $ \(IxSetStore s) → IxSetStore $ updateIx (key item) item s
    return item

makeAcidic ''IxSetStore [ 'insertStore ]

The error message is:

Test.hs:43:1:
    ‘Typeable’ is applied to too many type arguments
    In the instance declaration for ‘UpdateEvent (InsertStore κ_a6Xo)’

Test.hs:43:1:
    ‘Typeable’ is applied to too many type arguments
    In the instance declaration for
      ‘acid-state-0.12.2:Data.Acid.Core.Method (InsertStore κ_a6Xo)’

Test.hs:43:1:
    ‘Typeable’ is applied to too many type arguments
    In the instance declaration for ‘SafeCopy (InsertStore κ_a6Xo)’

Test.hs:43:1:
    ‘Typeable’ is applied to too many type arguments
    In the instance declaration for ‘IsAcidic (IxSetStore κ_a6Xn)’

These are the generated splices (note the constraints of the form Typeable (GHC.Prim.*) κ_a6Xn):

Test.hs:1:1: Splicing declarations
    makeAcidic ''IxSetStore ['insertStore]
  ======>
    Test.hs:43:1-40
    instance (SafeCopy κ_a6Xn,
              Typeable κ_a6Xn,
              Ord κ_a6Xn,
              Typeable (GHC.Prim.*) κ_a6Xn) =>
             IsAcidic (IxSetStore κ_a6Xn) where
      acid-state-0.12.2:Data.Acid.Common.acidEvents
        = [acid-state-0.12.2:Data.Acid.Common.UpdateEvent
             (\ (InsertStore arg_a7B8) -> insertStore arg_a7B8)]
    newtype InsertStore κ_a6Xo
      = InsertStore (Entry κ_a6Xo)
      deriving (Typeable)
    instance (SafeCopy κ_a6Xo,
              Ord κ_a6Xo,
              Typeable (GHC.Prim.*) κ_a6Xo) =>
             SafeCopy (InsertStore κ_a6Xo) where
      putCopy (InsertStore arg_a7B7)
        = safecopy-0.8.3:Data.SafeCopy.SafeCopy.contain
            (do { safecopy-0.8.3:Data.SafeCopy.SafeCopy.safePut arg_a7B7;
                  return () })
      getCopy
        = safecopy-0.8.3:Data.SafeCopy.SafeCopy.contain
            ((return InsertStore)
             Control.Applicative.<*>
               safecopy-0.8.3:Data.SafeCopy.SafeCopy.safeGet)
    instance (SafeCopy κ_a6Xo,
              Typeable κ_a6Xo,
              Ord κ_a6Xo,
              Typeable (GHC.Prim.*) κ_a6Xo) =>
             acid-state-0.12.2:Data.Acid.Core.Method (InsertStore κ_a6Xo) where
      type acid-state-0.12.2:Data.Acid.Core.MethodResult (InsertStore κ_a6Xo) = Entry κ_a6Xo
      type acid-state-0.12.2:Data.Acid.Core.MethodState (InsertStore κ_a6Xo) = IxSetStore κ_a6Xo
    instance (SafeCopy κ_a6Xo,
              Typeable κ_a6Xo,
              Ord κ_a6Xo,
              Typeable (GHC.Prim.*) κ_a6Xo) =>
             UpdateEvent (InsertStore κ_a6Xo)

Large RAM consumption on acid-state during checkpointing / archiving

I have a state that contains two strict hashmaps with roughly 320,000 entries

HashMap Text Entry 

where Entry is like:

data Entry = Entry Text Text Text Text

When loaded from disk, ekg reports that the state consumes 195 MB (see image). The on-disk state size is 24.5 MB. Once most of the thunks are evaluated (a call to writeJSON sends the elements to the browser), memory use drops to 66 MB.

The scary part is when I call createCheckpoint or createArchive: memory use bloats to 500 MB, sometimes even 700 MB.

[Image: screenshot 2014-05-02 17 44 34]
[Image: screenshot 2014-05-02 17 46 44]

My question is two-fold.

What are the recommended GHC settings for using acid-state (w/ remote module)?
Why do createArchive and createCheckpoint cause temporary massive RAM bloat?

Default function for grabbing the current state in IO monad

I am looking for something with a type signature like this:

grabAcidState :: AcidState as -> IO as

Basically, I just want to return my data from the database abstractly, or based on a class.

The reason is that I want the state to be an instance of a class, so that I can implement my library like so:

data LibraryCoolData = LibraryCoolData T.Text

class HasCoolData a where
     getCoolDataList :: a -> [LibraryCoolData]

grabStoredList :: HasCoolData as => AcidState as -> IO [LibraryCoolData]
grabStoredList as = do
    cs <- grabAcidState as
    return $ getCoolDataList cs

libraryActionOnCoolData :: HasCoolData as => AcidState as -> IO (Maybe String)
libraryActionOnCoolData as = doSomethingWithIt =<< grabStoredList as

and users could implement it like so:

data UserDefinedData = Something | OtherSomething

data AppState = AppState {
    someCoolData :: [LibraryCoolData],
    personalData :: UserDefinedData
}

instance HasCoolData AppState where
    getCoolDataList = someCoolData

However, implementing this simple behavior has proven quite difficult. Maybe I am overcomplicating this and should just have a separate AcidState datatype for my library.

Any help would be appreciated.
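
One possible approach, sketched under assumptions rather than taken from the acid-state documentation: acid-state reads state through query events, so a fully generic grabAcidState :: AcidState as -> IO as has no event to run. A common workaround is for each application to define a query that returns the whole state (using ask) and pass the resulting "grab" function to the library. In the sketch below, peekAppState, grabAppState and demo are hypothetical names, the grab function is passed as an extra argument to grabStoredList, and String stands in for T.Text to keep the dependencies small.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE TemplateHaskell    #-}
{-# LANGUAGE TypeFamilies       #-}

import Control.Monad.Reader (ask)
import Data.Acid            (AcidState, Query, closeAcidState, makeAcidic, query)
import Data.Acid.Memory     (openMemoryState)
import Data.SafeCopy        (base, deriveSafeCopy)
import Data.Typeable        (Typeable)

-- Library side: the data the library cares about (String instead of T.Text).
newtype LibraryCoolData = LibraryCoolData String
  deriving (Eq, Show, Typeable)
$(deriveSafeCopy 0 'base ''LibraryCoolData)

class HasCoolData a where
  getCoolDataList :: a -> [LibraryCoolData]

-- The library takes the "grab" function as an argument (an assumption of this
-- sketch), because only the application knows which query returns its state.
grabStoredList :: HasCoolData st
               => (AcidState st -> IO st) -> AcidState st -> IO [LibraryCoolData]
grabStoredList grab acid = getCoolDataList <$> grab acid

-- Application side: the user's state and its class instance.
data AppState = AppState { someCoolData :: [LibraryCoolData] }
  deriving (Show, Typeable)
$(deriveSafeCopy 0 'base ''AppState)

instance HasCoolData AppState where
  getCoolDataList = someCoolData

-- Hypothetical query that returns the entire state.
peekAppState :: Query AppState AppState
peekAppState = ask

$(makeAcidic ''AppState ['peekAppState])

-- The state-specific equivalent of the requested grabAcidState.
grabAppState :: AcidState AppState -> IO AppState
grabAppState acid = query acid PeekAppState

-- Small demonstration against an in-memory state (nothing is written to disk).
demo :: IO [LibraryCoolData]
demo = do
  acid <- openMemoryState (AppState [LibraryCoolData "a", LibraryCoolData "b"])
  xs   <- grabStoredList grabAppState acid
  closeAcidState acid
  return xs
```

The cost of this design is one extra argument per library function; the benefit is that the library never needs to know the concrete state type or which events the application has registered with makeAcidic.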

acid-state build breaks on GHC 7.8.1

I'm running OSX Mavericks w/ GHC 7.8.1

Glasgow Haskell Compiler, Version 7.8.1, stage 2 booted by GHC version 7.6.3

and executing the following:

cabal unpack acid-state
cd acid-state
cabal sandbox init
cabal install

yields:

src/Data/Acid/Common.hs:88:61:
    Could not deduce (MonadState
                        (EventState ev)
                        (StateT st1 transformers-0.3.0.0:Data.Functor.Identity.Identity))
      arising from a use of get
    from the context (st1 ~ MethodState ev, QueryEvent ev)
      bound by a pattern with constructor
                 QueryEvent :: forall ev.
                               QueryEvent ev =>
                               (ev -> Query (EventState ev) (EventResult ev))
                               -> Event (EventState ev),
               in an equation for worker
      at src/Data/Acid/Common.hs:88:19-31
    In a stmt of a 'do' block: st <- get
    In the expression:
      do { st <- get;
           return (runReader (unQuery $ fn ev) st) }
    In the first argument of Method, namely
      (\ ev
          -> do { st <- get;
                  return (runReader (unQuery $ fn ev) st) })
Failed to install acid-state-0.12.1
cabal: Error: some packages failed to install:
acid-state-0.12.1 failed during the building phase. The exception was:
ExitFailure 1

For reference it's at this line:

eventsToMethods :: [Event st] -> [MethodContainer st]
eventsToMethods = map worker
    where worker :: Event st -> MethodContainer st
          worker (UpdateEvent fn) = Method (unUpdate . fn)
          worker (QueryEvent fn)  = Method (\ev -> do st <- get
                                                      return (runReader (unQuery $ fn ev) st)
                                           )
