DEPRECATED (Cloud Haskell Platform) in favor of distributed-process-extras, distributed-process-async, distributed-process-client-server, distributed-process-registry, distributed-process-supervisor, distributed-process-task and distributed-process-execution

Home Page: http://haskell-distributed.github.com

License: BSD 3-Clause "New" or "Revised" License

Haskell 99.91% Shell 0.09%

distributed-process-platform's People

Contributors

Stargazers

Watchers

Forkers

hyperthunk rodlogic davidsd robstewart57 birdseyesoftware mboes alanz jeremyjh metok wewela imalsogreg carletes jepst abbradar qnikst agentm neuroradiology jccampagne

distributed-process-platform's Issues

Contributors Guide

Sync with d-p issue 94.

GenProcess doesn't drain its mailbox properly

If you pass infoHandlers = [] then no info messages are consumed and the process' mailbox will just keep on growing. So a standard handler that applies the default policy is probably a good idea.

estimate: 10 mins

GenProcess improvements

There are (3) main bits to this....

better support for invariants and pre/post conditions
pre/post processing filter chains
dynamic upgrade support

Items (1, 2) are motivated by this kind of code:

-- | Start a counter server
startCounter :: Int -> Process ProcessId
startCounter startCount =
  let server = defaultProcess {
     dispatchers = [
          handleCallIf (state (\count -> count > 10))   -- invariant
                       (\_ (_ :: Increment) ->
                            noReply_ (TerminateOther "Count > 10"))

        , handleCall handleIncrement
        , handleCall (\count (_ :: Fetch) -> reply count count)
        , handleCast (\_ Fetch -> continue 0)
        ]
    } :: ProcessDefinition State
  in spawnLocal $ start startCount init' server >> return ()
  where init' :: InitHandler Int Int
        init' count = return $ InitOk count Infinity

Quite what this 'pre/post' condition API will look like will hopefully emerge soon...

Item (3) is not a /hot code upgrade/ as it will only work with what's in the current running image, but it does provide a way to change the process definition on the fly, which is quite useful. This would, for example, allow you to dynamically assign tasks to workers in a pool, or allow you to turn on and off support for certain application level protocols. It's also quite simple to implement, as the upgrades branch demonstrates. We send a Closure (Process TerminateReason) to the server (bundled in a message) and an automatically registered handler returns ProcessUpgrade fun and that 'fun' gets evaluated and becomes the main loop. The code that uses this mechanisms calls become newState and the upgradeHandler in newState gets called with oldDefinition -> oldState.

Async Channel support

Once we've cracked issue #8 (and have passing tests) then we should really extend the Control.Distributed.Platform.Async module to support Channels too. For this extension, I think it actually does make sense to make the async task definition a function whose type is aware of the reply channel

-- NB: AsyncTask is a replacement for SpawnAsync

-- | A task to be performed asynchronously. This can either take the
-- form of an action that runs over some type @a@ in the @Process@ monad,
-- or a tuple that adds the node on which the asynchronous task should be
-- spawned - in the @Process a@ case the task is spawned on the local node
type AsyncTask a = Process a | (NodeId, Process a)

-- | A task to be performed asynchronously using typed channels
type AsyncChanTask a = SendPort a -> Process () | (NodeId, SendPort a -> Process ())

``

It is **possible** that the implementation of `Async` (in issue #8) could change to use channels internally anyway, at which point this might become a moot point.

expose profiling data for test runs

Initially so that we can investigate issue #18. This probably feeds into issue #12 as well.

implement timer module

so we can

sendAfter
runAfter
tick
cancelTimer

Provide linkOnFailure API

See distributed-process issue 75 for details.

developer mode build support

I've been using virthualenv to install development libraries (esp. the HEAD dependency on d-process) and automating that would be useful, if only so that we can get travis-ci up and running.

Provide versions of handle{Cast|Call}_ that work in the Process monad

Consider these two cast handlers:

handleCastIf_ (\(c :: String, _ :: Delay) -> c == "timeout")
                            (\("timeout", Delay d) -> timeoutAfter_ d),
handleCast    (\s' ("ping", pid :: ProcessId) ->
                                 send pid "pong" >> continue s')

There's no great reason why we shouldn't be able to skip the state in the latter as we do with the former, but the continue_ function returns (s -> Process (ProcessAction s)) which means it can't be used in monadic code like this.

Process Groups, Group Leaders

Channel vs Process based GenServer

I don't know if a full split is necessary or even a good idea, but for handleInfo and in particular in order to deal with messages sent to the process by monitors, we cannot be entirely oriented around typed channels.

supervision trees

policy control

sibling management (1-4-1, 1-4-all, left/right neighbours)
seq/par shutdown (per branch)
timeout/kill (per process?)
exit policy (i.e., transient, temp, perm, etc)
restart intensity (limits)
alternating restart policies (exit type based, global, etc)
backoff/timeout in response to restart limits

questions...

how do we define the startup procedure?
what type does a start spec have???
how can we cleanly represent error signals in a generic way?

Elevator Demo

Following on from issue #32, it might be nice to copy the Erlang/OTP elevator demo: http://erlang.org/pipermail/erlang-questions/2013-January/071488.html. This depends on various other issues being resolved first.

semantic versioning

As per http://semver.org - I think this should be a given, although it probably isn't vitally important yet but once we get to a conceptual 0.1.0 it'll probably start to matter. Right now we're 0.0.1 IMO.

Logging infrastructure

OTP provides some nice library and application support for logging. This is not distributed logging mind you, but just deals with logging on the local node.

Should we think about doing the same, building perhaps on the core CH APIs to add more features?

Unify Async APIs

Trying to unify the APIs for AsyncChan and AsyncSTM in d-p-platform. I've thought about doing this with type (or data) families but ... we need to handle AsyncChan a or AsyncSTM a so...

class Async a where
  type AsyncHandle a :: *
  poll :: AsyncHandle a -> Process (AsyncResult b)
  wait :: AsyncHandle a -> Process (AsyncResult b)
  ... etc

but ...

instance Async (STM.AsyncSTM a) where
  type AsyncHandle (STM.AsyncSTM a) = STM.AsyncSTM a
  poll = STM.poll
  wait = STM.wait

this isn't quite what I want, as the types cannot be inferred properly. And I couldn't quite figure out whether declaring data AsyncHandle a in the type class and having the instance as data AsyncHandle (STM.AsyncSTM a) = HStm (STM.AsyncSTM a) was the right approach either, but of course you can't leave a gap in the type signature without telling the compiler what the relation is between the two:

instance Async (STM.AsyncSTM a) where
  data AsyncHandle (STM.AsyncSTM a) = HStm (STM.AsyncSTM a)
  poll (HStm h) = STM.poll h  -- illegal because we don't know that the `a` in the associated type is related to the `a` in the instance declaration.

I wondered about functional dependencies here, but couldn't see how to apply them. Anyhow, I realise this is probably a pretty basic question, but I don't see a neat way to handle it. Should I just live with having the two APIs without a common method of interchanging between them? At the moment, for the most part all you need to do to switch from AsyncChan to AsyncSTM is change your imports.

Test cases are racy!

I've seen this fail intermittently several times now. Have also seen testSendAfter and testFlushTimer bomb out a couple of times. The typical symptom of this is that the tests never complete - an apparent deadlock.

In the timer tests, this is often due to the ProcessMonitorNotification in testCancelTimer not arriving. For testTimerFlush I'm less certain of what's happening as we're using timeouts for all of the cases here AFAICT.

I'm not really convinced that there isn't a race in Cloud Haskell's monitoring implementation at the moment, but without further evidence, I'm not going to point the finger either. Part of resolving this bug is getting the right profiling tools enabled so we can inspect what is happening when the tests do start hanging, so I'm opening another bug to look into that.

Application Concept

OTP is heavily dependent on the concept of an application: a top level process that launches the root supervisor and provides one or two other goodies.

Applications are a central unit of reuse, control (e.g., initialisation, upgrade) and dependency in OTP. I suspect we'd do well to borrow this idiom.

What might we do differently in CH to OTP? Can we improve in that model (which isn't without it's flaws and was designed decades ago) and what kind of feature set do we want?

Control.Distributed.Platform.Async

Updated title to reflect name change

Refactor tests to get some code sharing with distributed-process

This will involve collaboration with the CH maintainers, but is worthwhile IMO. Updating title to reflect change of focus.

ProcessId wrapper for hot-loading/migration

An obstacle to migrating processes between nodes is how to deal with ProcessIds.

If we assume that any process can be migrated to another node at any time, ProcessIds are no longer reliable. Instead, we need another way to refer to migratable processes.

This problem may be separable into two sub-problems:

Within an Application which will be migrated as a whole, processes may refer to each other. Here, it may be possible to substitute ProcessIds with correct values as part of migration.
References by ProcessId to the Application from elsewhere will need to be updated.

It might make sense to establish a wrapper type, MigratableProcessId, that provides this additional layer of indirection. Sending a message to a MigratableProcessId will resolve to the actual current ProcessId of the given process.

Misc/Administrative

Things we have to do that do not find their way into releases (like github admin tasks and so on).

Development Process and Branching Model

In terms of naming branches, I would like to keep master as one of the mainlines so I'd propose either:

Production: stable
Development: master

Or as an alternative, we could use the master branch for production and have something like this:

Production: master
Development: development

I think I prefer the latter but I'm open to having a conversation about it.

In either case, issues should be labelled with the appropriate tag so we can differentiate. Once we've written up the merge/rebase procedures for this, we should be able to work on things concurrently without stomping on one another's changes because

@rodlogic - if we can come up with a version of this scheme that works for us both and assign issues to ourselves, then as long as we're able to split up the work so we don't clash too often them I'm willing to set you up as a collaborator and grant you commit access to this project. If you'd prefer to continue working in a fork via pull requests, that's absolutely fine as well.

@edsko - would it make sense to move this project into the haskell-distributed organisation? Github organisations make it easier to manage collaborators and configure access rights, and I'm comfortable opening it up to that wider community (which looks like mostly well-typed folks?) if you think that's appropriate.

Fix Travis-CI Install Procedure

GenProcess `call` interrupt results in chaos

I'm pretty sure this is a bug. If a callback decides to terminate mid-way through a gen-call it can create havoc for the client if they're monitoring the server. The bit that really worries me is this:

Mon Jan 21 17:34:13 UTC 2013 pid://127.0.GenServerTests: 0.too few bytes. Failed reading at byte position 651
:8080:0:6: terminating....
Mon Jan 21 17:34:13 UTC 2013 pid://127.0.0.1:8080:0:6: terminating counter when state = 10 because TerminateOther "Count > 10"
Mon Jan 21 17:34:13 UTC 2013 pid://127.0.0.1:8080:0:58: call: remote process died: DiedNormal
  exceed counter limits: [Failed]
ERROR: thread blocked indefinitely in an MVar operation

Now if we ignore the odd stdio interleaving, the key thing I'm concerned about is that the decoding of some message has blown up. Quite why this is happening, I'm not sure, though This might be to do with the use of terminate or (previously) fail in the async call worker process, but it's pretty awkward to debug.

Distributed Transactions / Leader Election

We should build this. Paxos is the reference architecture starting point, though there are numerous variations on the same. See also...

https://github.com/jepst/distributed-process-global
issue #48 (a corollary feature)

what are OTP ideas that probably aren't a good fit?

(i'm super green / known nothing on the erlang & OTP side, and that'd be handy for understanding and looking at the OTP libs for ideas)

Any roadmap for the project?

Is there anything like subj or just a list of issues with no particular milestones?
I think such a project need some info in wiki (which seems to be empty at the moment)

[DPP-58] Remove dependency on Template Haskell

[Imported from JIRA. Reported by Tim Watson [Administrator] @hyperthunk) as DPP-58 on 2013-01-18 13:53:39]
It is entirely possible to install distributed-process without requiring Template Haskell. We should not therefore, IMO, force consumers of d-p-platform to use Template Haskell if they wish not to.

This is quite easy to do for the most part, although we will need to un-template the uses of mkClosure and remotable in Call.hs first.

Examples/Demos

We should have some of these. I suspect we will need to break up this repo into a multi-project repo.

We could then absorb other projects (like logging) that are related but not core to the library.

Should we just administer them at the top level, or use hit sub modules, or git subtree merging? I favour the latter, as it makes it easy to push back and forth, but perhaps just putting everything into a single repo is fine. I'd appreciate some reasoned opinions on this.

Maintainers' Guide

How to be a good citizen...

Project Homepage

For now, I've set up a simple project page, generated from github pages. We should probably move from a totally generated page to a jekyll site if we're going to stick with github pages, but I just wanted to get started with something for now.

Repository admins can simply click on 'settings > github-pages > automatic page generator' to edit what is already there. Please feel free if you have the itch.

[DPP-54] Control.Distributed.Process.Platform.Task

[Imported from JIRA. Reported by Tim Watson [Administrator] @hyperthunk) as DPP-54 on 2013-01-12 18:02:26]
Building on the facilities available in Control.Distributed.Process.Platform.Async, it would be good to be able to pass a handle to an Async to another process or send it to another node. That's currently impossible because we use typed channels and STM in the two Async implementations, so we'll need a central server to track the task state. This will open up lots other possibilities, including sharing async results with multiple consumers and using task pools of varying configurations to create async processing units.

Mirrored Supervisor

See https://github.com/rabbitmq/rabbitmq-server/blob/master/src/mirrored_supervisor.erl. Depends on issue #3.

Configuration APIs

Following on from issues #30 and #31 (though many more features may depend on it), it is fairly commit to need static configuration in a distributed system. It'd be nice to have a standard API for managing this, though I suspect there will be as many opinions about which format is best as there are developers using the tools.

[DPP-50] SNMP Agent

[Imported from JIRA. Reported by Tim Watson [Administrator] @hyperthunk) as DPP-50 on 2013-01-09 10:51:56]

See http://www.erlang.org/doc/apps/snmp/index.html for reference.

Worker Pool

A supervised pool of asynchronous tasks. These come in at least two distinct flavours.

a coordinator process spawns tasks on demand
a coordinator process that pre-allocates taks that have complex (time consuming) setup logic

The first variety benefits from having a configurable max workers limit, which can be used to throttle over-active producers and prevent resource starvation.

Atomic Broadcast / Guaranteed Multicast

See RabbitMQ's implementation and ZaB as reference architectures:

remove Main.hs

This is a library, not an executable. I'd like to remove Main.hs and have done so in the timer branch as you can see.

Any objections to this? Once we've got the tests properly lined up, I don't see any need to Main.hs.

testCancelTimer deadlocks!

see the test case in the timer-deadlock branch for details. We end up with

Timer Send:
  testSendAfter: [OK]
  testRunAfter: [OK]
  testCancelTimer: [Failed]
ERROR: thread blocked indefinitely in an MVar operation
  testPeriodicSend: [OK]
  testTimerReset: [OK]
  testTimerReset: [OK]

         Test Cases  Total      
 Passed  5           5          
 Failed  1           1          
 Total   6           6          
Test suite TestTimer: FAIL

But this does not happen in the timer branch and I'm struggling to see where the problem has been introduced. The two candidates I can see at the moment are

the fact that we've enabled some new GHC options in the cabal file or
having made Tick and instance of Eq

I'll be dumbfounded if it's down to (2) and I've applied the GHC options (mentioned in (1) above) to the timer branch and the test case still passes there. Hmn.

Unified API for making AsyncTask

At the moment we have asyncDo but that's pretty ugly, leaving a default local spawn looking like this:

h <- async $ asyncDo $ do .....

Urgh. I suspect type synonym families might help here.

Multiprocess Threadscope integration

"Profiling of multi-process or distributed Haskell systems such as client/server or MPI programs" and distributed-process-platform.

There are additional references here:

http://www.haskell.org/haskellwiki/Parallel_GHC_Project

Group Services

A la Apache ZooKeeper perhaps, and I'm quite fond of ZAB. It would be sensible however to consider using global to implement this based on the existing cluster protocols initially, looking to more complex (but perhaps reliable) forms that use atomic broadcast later on - see RabbitMQ's implementation for example.

Break up the test suites

I don't like TestMain.hs - I prefer the way we do things in distributed-process, where there are multiple independent test suites and the cabal configuration ensures they're all run. I do want to keep the code they all rely on shared though.

Support for automating derived/composed RemoteTables

An enabler for issue #31 is the ability to generate an initial RemoteTable from a bunch of dependencies.

Why do we want this?

Consider an application that uses both d-p-platform and d-p-global and another imaginary application d-p-consensus which add support for distributed transactions built on d-p-global's locking. The code in d-p-global will build its own RemoteTable by composing d-p-platform's, and d-p-consensus, if it wants to use d-p-global will compose that RemoteTable with its own definitions. This is simple for the user, because all they need to do is compose the d-p-consensus RemoteTable with their own, however there may be other libraries and/or definitions which are included in the application as dependencies but not part of d-p-consensus.

An example of this might be an application composed of (illustrative imagined applications such as) d-p-monitoring, d-p-pool, d-p-admin and so on. Each time the application developer wants to use a library/component, any __remoteTable definitions for the components will need to be composed. It would be good if we could automate this process to some extent.

What would this look like?

I'm not so sure about this. Perhaps build-time support for building a CH Application that reads the set of inter-dependent components and generates the RemoteTable properly. Perhaps just having some template haskell support that can be invoked by the main application module would do.

$(deriveRemoteTables dependencyList)

rename intervalToMs and timeToMs

Other options on a postcard, but I'm not feeling so keen about these now. The intervalToMs function could, for example, become

fromInterval
intervalToMillis(econds)
toMillis(econds)

How I hate choosing names for things.... Sigh.

Monitoring/Stats

Erlang/OTP provides rich support for monitoring and profiling the state of a running system. This support can be enabled at any time, doesn't require special compilation and has negligible impact on runtime performance.

To what extent can we provide similar facilities for CH? Do they belong in the platform or core CH layers? What would such metrics look like an what kind of APIs would we want to expose for reading them?

Distributed.Process.Management Toolset

In line with CH issue 97 there are times when you'd like to get an overview of running processes in your system. Issue #89 is already starting to provide some of the primitives required to achieve something like this. There are other things a system administrator might want to do, such as manually running exit to terminate on a given process, set up links/monitors or even manually injecting messages (although that last one's a bit far-out).

Whilst the idea behind Network.Transport.Management is just about providing an API so we can build tools, the tools for this feature already exist by and large (viz getProcessInfo, exit, kill, link, etc) so this issue is really about providing the actual tools themselves.

Now this is the kind of thing that makes administrators and tech support love rather than hate your application.

Both command line and web based interfaces would be nice.

I suspect this should be a top level project. Whether it belongs here or bundled in with -platform is another thing to consider.

support for handleCast

It is not necessary to always reply to a sender at all and we should support that notion.

GenServer

See issue #1 and https://gist.github.com/4025934

haskell-distributed / distributed-process-platform Goto Github PK

distributed-process-platform's People

Contributors

Stargazers

Watchers

Forkers

distributed-process-platform's Issues

Why do we want this?

What would this look like?

Recommend Projects

Recommend Topics

Recommend Org

Jobs