This text serves as an introduction to the topic of concurrent asynchronous effects in Scala, based on the cats-effect library.
However, many of the concepts presented here are applicable not only to other Scala effect libraries, but also to any system or programming language that deals with concurrency and asynchronous programming.
Note: the intention of this text is not to provide "better documentation". First, existing material on the subject is already quite good (you can find some links in the References section); second, I don't consider myself anything remotely near being an expert in the field. These are simply the notes I kept while exploring the topic, shared with whoever might find them useful.
All code snippets are based on cats-effect 2, since cats-effect 3 wasn't yet out at the time of writing this text.
- Introduction
- Asynchronous boundary
- Threading
- Cats IO basics
- Fibers
- Cats-effect 3
- Fibers outside of Scala
- References
First, some useful definitions and descriptions:
- Blocking: A thread that executes a blocking task will wait on the action until it succeeds or fails.
- Non-blocking: A thread that executes a non-blocking task will initiate it and immediately continue with another task, without waiting. When the first task is done, its result may or may not be processed by the same thread that initiated it.
- Synchronous: The thread will complete the task, either by success or failure, before reaching any line after it. This often involves blocking, because task N+1 cannot continue until task N has finished, which might include waiting for the result from another thread.
- Asynchronous: The task is started by one thread, but another thread (either logical or physical) will complete it and return the result using a callback. Task N+1 can continue even if task N hasn't finished yet.
- Concurrency: A state in which there are multiple logical threads of control, whose tasks are interleaved.
- Parallelism: A state in which computations are physically performed in parallel, on separate CPU cores.
Note that concurrency isn't the same as parallelism: we could have concurrency without parallelism (effects are interleaved, but everything is done by one single CPU core), and we could have parallelism without concurrency (multiple cores are running multiple standalone threads that don't interleave). In this text, we are interested in the concurrent aspect of our programs, and we don't care whether some of the tasks are done in parallel or not.
Also, there seems to be a lot of confusion around blocking vs synchronous and non-blocking vs asynchronous. Some sources (especially ones related to NodeJS) often assume synchronous = blocking and asynchronous = non-blocking, while other sources point out that blocking is always synchronous, but synchronous doesn't always mean blocking (spinlock mechanisms etc.). For all intents and purposes, this text assumes that all synchronous tasks are blocking, and all asynchronous tasks are non-blocking.
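As a tiny illustration of the blocking vs. non-blocking distinction in plain Scala (a sketch using the standard library `Future`; the global `ExecutionContext` is just for demonstration):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

val task: Future[Int] = Future(40 + 2)

// synchronous & blocking: the current thread waits for the result
val result: Int = Await.result(task, 1.second)

// asynchronous & non-blocking: register a callback and continue
// immediately; some thread will invoke it once the task completes
task.onComplete {
  case Success(value) => println(s"Got $value")
  case Failure(error) => println(s"Failed: $error")
}
```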
Concurrency means that a single logical thread can have its tasks distributed across different threads. Or, from the thread perspective, that a single thread can execute interleaved tasks coming from different logical threads.
In order to be able to do so, we have to use asynchronous operations. Once the task has been initiated, it crosses the asynchronous boundary, and it resumes somewhere else.
There are a couple of somewhat famous quotes:
"A Future represents a value, detached from time" - Viktor Klang (link)
"Asynchronous process is a process that continues its execution in a different place or time than the one it started in" - Fabio Labella (link)
"A logical thread offers a synchronous interface to an asynchronous process" - Fabio Labella (link)
Bear in mind that the phrase "synchronous interface" comes from the fact that all our code is basically synchronous: we write commands one after another, and they are executed one after another. So when you see the term "synchronous code", pay attention to the context - maybe someone means "code that doesn't use asynchronous effects (e.g. it blocks on every asynchronous call)". In this text, however, "synchronous code" means just any code, because all the code we write is synchronous in nature. It is through the usage of asynchronous effects in our code that we get to cross the asynchronous boundary and model asynchronously executed operations.
All of the above quotes revolve around the same concept of the asynchronous boundary: after our computation passes that boundary, we cannot be sure when it is going to finish, which is why we should not block while waiting for it. We're also not sure where it is going to finish - on another OS thread, on another node in our physical network, somewhere in the cloud, etc. This text deals with the details of execution on OS threads (hence the relevance of the third quote), and it will not touch upon any other scenario of asynchronous execution, such as distributing tasks over nodes in a cluster.
When we take a look at how each task is executed by OS thread(s), we can see the importance of concurrency.
After crossing the asynchronous boundary, the tasks get interleaved across threads in the thread pool. Some of them might even get executed on some thread from another thread pool. This is where the property of concurrency comes into play.
Threads in the JVM map 1:1 to the operating system's native threads. When the CPU stops executing one thread and starts executing another, the OS needs to store the state of the earlier task and restore the state of the current one. This context switch is expensive and sacrifices throughput. In an ideal world we would have a fixed number of tasks and at least the same number of CPU threads; then every task would run on its own thread and throughput would be maximal, because context switches wouldn't exist. However, in the real world there are things to consider:
- there will be external requests from the outside that need to be served
- even if there are no external requests, no I/O etc. (e.g. mining bitcoin), some extra threads are bound to run and cause context switches anyway, for example the garbage collector on the JVM
This is why it's useful to sacrifice some throughput to achieve fairness. High fairness makes sure that all tasks get their share of the CPU time and no task is left waiting for too long.
Asynchronous operations can be divided into three groups based on their thread pool requirements:
- Non-blocking asynchronous operations, e.g. HTTP requests, database calls
- Blocking asynchronous operations, e.g. reading from the file system
- CPU-heavy operations, e.g. mining bitcoin
These three types of operations require significantly different thread pools to run on:
- Non-blocking asynchronous operations: A bounded pool with a very low number of threads (maybe even just one), with very high priority. These threads will basically just sit idle most of the time and keep polling for new async IO notifications. The time these threads spend processing a request maps directly into application latency, so it's very important that no other work gets done in this pool apart from receiving notifications and forwarding them to the rest of the application.
- Blocking asynchronous operations: An unbounded cached pool. Unbounded because a blocking operation can (and will) block a thread for some time, and we want to be able to serve other I/O requests in the meantime. Cached because we could run out of memory by creating too many threads, so it's important to enable the reuse of existing threads.
- CPU-heavy operations: A fixed pool with the number of threads equal to the number of CPU cores. This is pretty straightforward. Back in the day the "golden rule" was number of threads = number of CPU cores + 1, but the "+1" came from the fact that one extra thread was always reserved for I/O (as explained above, we now have separate pools for that).
Remember: whenever you are in doubt about which thread pool best suits your needs, the optimal solution is to benchmark.
In Java, thread pools are modeled through the `Executor` / `ExecutorService` interfaces. The difference is that the latter provides termination capabilities and some utility functions.
The two most commonly used `Executor` implementations are:
- Java 5 `ThreadPoolExecutor`: A thread pool that executes each submitted task using one of possibly several pooled threads.
- Java 7 `ForkJoinPool`: A work-stealing thread pool that tries to make use of all your CPU cores by splitting up larger chunks of work and assigning them to multiple threads. If one of the threads finishes its work, it can steal tasks from other threads that are still busy. You can set the number of threads to be used in the pool, bounded by some configured minimum and maximum.
There are many online sources on the difference between the two - I personally like this one.
Utility methods for obtaining various `Executor` implementations are available in the Executors class.
Here are some recommendations on which implementation to use for each of the scenarios described earlier (a small setup sketch follows after the list):
- Non-blocking asynchronous operations: `newFixedThreadPool`
- Blocking asynchronous operations: `newCachedThreadPool`
- CPU-heavy operations:
  - For many small tasks: `new ForkJoinPool`
  - For long-running tasks: `newFixedThreadPool`
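To make this concrete, here is a minimal sketch of how the three pools could be constructed and wrapped into Scala `ExecutionContext`s (the size of the non-blocking pool is just an illustrative assumption):

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// small, high-priority pool for non-blocking async notifications
val nonBlockingEc = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))

// unbounded cached pool for blocking operations
val blockingEc = ExecutionContext.fromExecutor(Executors.newCachedThreadPool())

// fixed pool sized to the number of CPU cores for CPU-heavy work
val cpuEc = ExecutionContext.fromExecutor(
  Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors())
)
```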
Now that the thread pools are set up, we are ready to start submitting tasks to their respective thread pools. In our program, we will simply submit a task for execution, and it will get executed at some point when it's assigned a thread. This assignment is done by the scheduler.
There are two ways scheduling can be achieved:
- Preemptive scheduling: The scheduler suspends the currently running task in order to execute another one.
- Cooperative scheduling (or cooperative yielding): Tasks suspend themselves - the currently running task at some point voluntarily suspends its own execution, so that the scheduler can give the thread to other tasks.
The role of the scheduler is played by the `ExecutionContext` (docs). Every `ExecutionContext` schedules threads within one assigned thread pool.
In Scala, there is one global `ExecutionContext`, available as `ExecutionContext.global`. The global EC is backed by a `ForkJoinPool`.
Whichever underlying Java executor you rely on, your level of granularity is going to be threads. Threads don't cooperate. They execute their given set of commands, and the operating system makes sure that they all get some chunk of the CPU time. That's why I can be typing this text, listening to music and compiling some code, all at the same time.
So, in order to allow tasks submitted to the `ExecutionContext` to use the principle of cooperative yielding, we have to explore the concept of fibers.
Later we will also look at cooperative yielding in more detail.
Note that fibers belong to a more general concept of green threads which share cooperative yielding as one of the characteristics.
The type `IO` (docs) is used for encoding side effects as pure values. In other words, it allows us to model operations from the other side of the asynchronous boundary in our synchronous code.
There are two main groups of `IO` values - those that model:
- synchronous computations
- asynchronous computations
A somewhat different definition of `IO`'s capabilities is that it contains:
- an FFI for side-effectful asynchronous functions (e.g. `async` and `cancelable`; see section Asynchronous methods (FFI))
- combinators defined either directly inside `IO` or coming from type classes (`pure`, `map`, `flatMap`, `delay`, `start` etc.)
The most common methods for modeling synchronous computations with `IO` are the following ways of producing `IO` values in a "static" way:
object IO {
...
def pure[A](a: A): IO[A]
def delay[A](body: => A): IO[A] // same as apply()
def suspend[A](thunk: => IO[A]): IO[A]
...
}
There is one more important method that we will be using heavily, but this one is not static; it's defined as a class method on values of type `IO`:
class IO[A] {
...
def start(implicit cs: ContextShift[IO]): IO[Fiber[IO, A]]
...
}
Pure:
Wraps an already computed value into the `IO` context, for example `IO.pure(42)`. It comes from the Applicative type class (note: from cats, not cats-effect).
Delay:
We will be using `delay` / `apply` in the form of `IO(value)`, which desugars into `IO.apply(value)`. It comes from the Sync type class and is used for suspension of synchronous side effects.
Suspend:
Method `suspend` also comes from Sync, hence it also suspends a synchronous side effect, but this time it's an effect that produces an `IO`.
Note that:
IO.pure(x).flatMap(f) <-> IO.suspend(f(x))
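As a quick illustration of the difference between `pure` and `delay` (a minimal sketch; reading the clock stands in for an arbitrary side effect):

```scala
import cats.effect.IO

// pure: the argument is evaluated eagerly, once, before being wrapped
val now: IO[Long] = IO.pure(System.currentTimeMillis())

// delay: evaluation is suspended and re-run on every execution
val later: IO[Long] = IO.delay(System.currentTimeMillis())

// running `now` twice yields the same timestamp;
// running `later` twice re-reads the clock each time
```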
`IO` also serves as an FFI - a Foreign Function Interface. The most common usage of the term FFI is for a translation layer between different programming languages, but here `IO` translates side-effectful asynchronous Scala methods into pure, referentially transparent values.
Translating such operations into the `IO` world is done primarily via these two methods:
object IO {
...
def async[A](k: (Either[Throwable, A] => Unit) => Unit): IO[A]
def cancelable[A](k: (Either[Throwable, A] => Unit) => CancelToken[IO]): IO[A]
...
}
Async:
Method `async` is used for modeling asynchronous operations in our cats-effect code. It comes from the `Async` type class:
def async[A](k: (Either[Throwable, A] => Unit) => Unit): F[A]
(Note: in cats-effect 3, `Async` will contain more methods.)
It provides us with a way to describe an asynchronous operation (that is, an operation that happens on the other side of the asynchronous boundary) in our synchronous code.
Let's say that there is some callback-based method `fetchUser` which queries a database for some user and possibly returns an error in case something goes wrong (e.g. the connection to the database is closed). The user of this method will provide a callback which will do something with the user and react to the error, if it happens. They could look something like this:
def fetchUser(userId: UserId): Future[User]
def callback(result: Try[User]): Unit
How do we now model an asynchronous operation in synchronous code? That's what methods like `onComplete` are for (see the Future Scaladoc). We say that `onComplete` models an operation that happens on the other side of the asynchronous boundary, and it serves as an interface to our synchronous code.
Let's use `onComplete` to implement a helper function that, given a `Future`, provides us with a synchronous model of the underlying asynchronous process:
def asyncFetchUser(fetchResult: Future[User])(callback: Try[User] => Unit): Unit =
fetchResult.onComplete(callback)
We can say that `onComplete` is a method for providing a description (or a model) of some asynchronous process by translating it across the asynchronous boundary to our synchronous code.
So finally, what `Async` gives us is a method from such a description to an effect type `F`, in our case `IO`. We could therefore explain the signature of `async` with the following simplification in pseudocode:
def async[A](k: (Either[Throwable, A] => Unit) => Unit): F[A]
which translates to
def async[A](k: Callback => Unit): F[A]
which further translates to
def async[A](k: AsyncProcess): F[A]
For example, if the method `fromFuture` weren't already implemented for `IO`, we could implement it as:
def fromFuture[A](future: => Future[A]): IO[A] =
Async[IO].async { cb =>
future.onComplete {
case Success(a) => cb(Right(a))
case Failure(e) => cb(Left(e))
}
}
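For completeness, here is a quick way to try it out (a sketch, assuming an implicit `ExecutionContext` is in scope for the `Future`):

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val io: IO[Int] = fromFuture(Future(40 + 2))
println(io.unsafeRunSync()) // prints 42
```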
We don't care about what the callback `cb` really does; that part is handled by the implementation of `async`. The purpose of `cb` is to provide users of the `async` method with a way to signal that the asynchronous process has completed.
You can find the full code to play around with in the code repository.
Cancelable:
Method `cancelable` is present in `IO`, just like `async`, and they have similar signatures. Just like `async`, it creates an `IO` instance that executes an asynchronous process on evaluation. But unlike `async`, which comes from the Async type class, `cancelable` comes from the Concurrent type class.
If you understand `async`, `cancelable` is simple.
The difference is:
- `async` models an asynchronous computation and puts it inside an `IO` effect. We use the callback to signal the completion of the asynchronous computation.
- `cancelable` models a cancelable asynchronous computation and puts it inside an `IO` effect. We use the callback to signal the completion of the asynchronous computation, and we provide an `IO` value containing the code that should be executed if the asynchronous computation gets canceled. This value is of type `IO[Unit]`, declared in the signature using the type alias `CancelToken[IO]`.
It's important to emphasize that `cancelable` does not produce an `IO` that is cancelable by the user.
You cannot say:
val io = IO.cancelable(...)
io.cancel // or something like that
Instead, what `cancelable` does is take a foreign asynchronous computation (one that comes from outside of our IO world) that is cancelable in its nature, and put it in the `IO` context. So, it allows us to model asynchronous computations in the same fashion that `async` does, but with the extra ability to define an effect that will be executed if that asynchronous computation gets canceled. For example, such an asynchronous computation could be a running thread, a database connection, a long poll to some HTTP API etc., and by using `cancelable` we can translate that foreign computation into the `IO` world and define what should happen if it gets canceled (e.g. somebody kills the database connection).
This is how we could modify our previous `async` example to include the cancelation feature:
def fromFutureCancelable[A](future: => Future[A]): IO[A] =
IO.cancelable { cb =>
future.onComplete {
case _ => // don't use the callback!
}
IO(println("Rollback the transaction!"))
}
Notice how we don't call the `cb` callback any more. Remember, `cb` is used to denote the completion of the asynchronous computation that we are modelling. By not calling `cb`, we can emulate a long-running operation (one that we have enough time to cancel).
If you run the code, you will see that the computation runs indefinitely. But once you kill the process, you will see the printout "Rollback the transaction!". If you use `async` instead of `cancelable` (also provided in the code repository), you will notice that there is no transaction rollback printout.
Resource handling refers to the concept of acquiring some resource (e.g. opening a file, connecting to the database etc.) and releasing it after usage.
Cats-effect models resource handling via the Resource type. Here's an example (pretty much copy-pasted from the cats-effect website):
def mkResource(s: String): Resource[IO, String] = {
val acquire = IO(println(s"Acquiring $s")) *> IO.pure(s)
def release(s: String) = IO(println(s"Releasing $s"))
Resource.make(acquire)(release)
}
val r = for {
outer <- mkResource("outer")
inner <- mkResource("inner")
} yield (outer, inner)
override def run(args: List[String]): IO[ExitCode] =
r.use { case (a, b) => IO(println(s"Using $a and $b")) }.map(_ => ExitCode.Success)
The for-comprehension that builds `r` operates at the `Resource` level (this works because `Resource` is a monad). We can easily compose multiple `Resource`s by flatMapping through them.
The output of the above program is:
Acquiring outer
Acquiring inner
Using outer and inner
Releasing inner
Releasing outer
As you can see, `Resource` takes care of the LIFO (last-in-first-out) order of acquiring / releasing.
If something goes wrong, already-acquired `Resource`s are still released:
val r = for {
outer <- mkResource("outer")
_ <- Resource.liftF(IO.raiseError(new Throwable("Boom!")))
inner <- mkResource("inner")
} yield (outer, inner)
results in
Acquiring outer
Releasing outer
java.lang.Throwable: Boom!
and
val r = for {
outer <- mkResource("outer")
inner <- mkResource("inner")
_ <- Resource.liftF(IO.raiseError(new Throwable("Boom!")))
} yield (outer, inner)
results in
Acquiring outer
Acquiring inner
Releasing inner
Releasing outer
java.lang.Throwable: Boom!
For more details on resource handling in cats-effect, refer to this excellent blog post.
You can think of fibers as lightweight threads which use cooperative scheduling, unlike actual threads which use preemptive scheduling.
Note that in some contexts / languages fibers are also known as coroutines; "fiber" is usually used in the system-level context, while coroutines are used in the language-level context. However, in Scala "fiber" is the preferred term.
Unlike OS and JVM threads which are managed in the kernel space, fibers are managed in the user space.
Fibers map to CPU / JVM threads many-to-few, similarly to how threads map to processes. Multiple fibers can run on multiple thread pools, or on the same set of threads from one thread pool, or even on the same thread. In the case of a single thread, they will take turns executing their code using the available thread. Depending on your code, as well as the library that you are using, you can decide to have them cooperate more often, thus achieving fairness, or to cooperate less often, thus sacrificing some fairness for more throughput (concepts of fairness and throughput have been introduced earlier).
Fibers are much more lightweight than threads: they consume much less memory, have growable and shrinkable stacks, and can be garbage collected. Also, blocking a fiber doesn't block the underlying thread. As a consequence, we don't have to be as careful when creating new fibers, because the upper limit is determined by the available memory. When using threads, on the other hand, we're primarily constrained by the number of cores, but also by other factors.
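As a sketch of just how cheap fibers are (assuming an implicit `ContextShift[IO]` is in scope), spawning a hundred thousand of them is perfectly viable, which would be unthinkable with raw threads:

```scala
import cats.effect.IO
import cats.implicits._

// start 100,000 fibers and then join them all; with threads this
// would exhaust memory, with fibers it's cheap
val manyFibers: IO[Unit] =
  (1 to 100000).toList
    .traverse(i => IO(i * 2).start) // IO[List[Fiber[IO, Int]]]
    .flatMap(fibers => fibers.traverse_(_.join))
```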
I want to explicitly point out that fiber in Scala is a concept, not some native resource like process or thread. Project Loom (also see this blogpost) is aiming to introduce fibers as native JVM constructs, but until that happens, fibers in Scala will continue to be a manually-implemented thing. This also means that they might have some minor differences in implementation across different libraries (e.g. cats-effect vs ZIO).
In cats-effect, a fiber is a construct with `cancel` and `join`:
trait Fiber[F[_], A] {
def cancel: F[Unit]
def join: F[A]
}
Joining a fiber can be thought of as blocking for completion, but only on a semantic level. Remember, blocking a fiber doesn't really block the underlying thread, since it can keep running other fibers.
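Here is a minimal sketch of both operations (assuming an implicit `ContextShift[IO]` built from the global `ExecutionContext`; `start` itself is explained just below):

```scala
import scala.concurrent.ExecutionContext
import cats.effect.{ContextShift, IO}

implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)

val program: IO[Int] = for {
  fiber   <- IO(40 + 2).start // run the computation on a new fiber
  result  <- fiber.join       // semantically wait for its result
  forever <- IO.never.start   // a fiber that would never complete...
  _       <- forever.cancel   // ...so we cancel it instead of joining
} yield result
```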
A program defined as an `IO` value can be executed on a fiber. The `IO` type uses the method `start` (available as long as there is an implicit `ContextShift[IO]` defined) to start its execution on a fiber. `ContextShift` will be explained later; for now, think of it as the cats-effect version of `ExecutionContext`. It's basically a reference to the desired thread pool that should execute the fiber.
By using `start`, we can make two or more `IO`s run in parallel. Note that it is also perfectly possible to describe a whole `IO` program without ever invoking `start`; this simply means that the whole program will run on a single fiber.
Here is some very simple code that demonstrates how `IO` describes side effects and runs them on a single fiber (there will be more examples in the ContextShift section):
import cats.effect.{ExitCode, IO, IOApp}
object MyApp extends IOApp {
def io(i: Int): IO[Unit] =
IO(println(s"Hi from $i!")) // short for IO.delay or IO.apply
val program1 = for {
_ <- io(1)
_ <- io(2)
} yield ExitCode.Success
override def run(args: List[String]): IO[ExitCode] = program1
}
You will notice that the main object extends `IOApp`. This is a very useful cats-effect trait that allows us to describe our programs as `IO` values, without having to run them manually using `unsafeRunSync` or similar methods. Remember how we said earlier that invoking `start` on some `IO` requires an implicit instance of `ContextShift[IO]` in order to define the `ExecutionContext` (and hence the thread pool) to run on? Well, `IOApp` comes with a default `ContextShift` instance (which you can override if you want to). This is why we didn't have to explicitly define any implicit `ContextShift[IO]` instance in our code.
The for-comprehension works at the `IO` layer; we could flatMap `io(1)` into a bunch of other `IO`s, for example reading some stuff from the console, then displaying some more stuff, then doing an HTTP request, then talking to a database a bit, etc.
Now let's see what happens if we want to run some `IO` on a separate fiber:
...
val program2 = for {
fiber <- io(1).start
_ <- io(2)
_ <- fiber.join
} yield ExitCode.Success
...
We define a chain of `IO`s, and then at some point we run part of that chain on a separate fiber. That's the only difference from the previous program - the parts where we `start` and `join` the fiber. Invoking `io(1).start` produces an `IO[Fiber[IO, Unit]]`, which means we get a handle over the new fiber, which we can join later, cancel on error, keep running until some external mechanism tells us to cancel it, etc.
It's important to realize which exact instructions in the example above get executed on which fiber. After we started the execution of `io(1)` on a separate fiber, everything we did afterwards stayed on the original fiber and ran in parallel with the new one. We say that the code captured in `io(1)` was the source for the new fiber.
As a small exercise, try adding a small sleep to the method `io`:
def io(i: Int): IO[Unit] = IO({
Thread.sleep(3000)
println(s"Hi from $i!")
})
If you now measure the execution times of `program1` and `program2`, you will see that `program1` runs in six seconds, while `program2` runs in slightly over three seconds. You can find this code in the repository.
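If you want to measure it yourself, here is a hypothetical little helper (the name `timed` is just an illustration, not a cats-effect API):

```scala
// wraps any IO and prints how long its execution took
def timed[A](io: IO[A]): IO[A] =
  for {
    start  <- IO(System.currentTimeMillis)
    result <- io
    end    <- IO(System.currentTimeMillis)
    _      <- IO(println(s"Took ${end - start} ms"))
  } yield result

// usage: timed(program1), timed(program2)
```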
In the previous section, we saw that a fiber runs a set of instructions (the source). Any `IO` can be run in a fiber by calling `.start` on it, as long as there is a `ContextShift` instance available for it; the `ContextShift` determines which thread pool the fiber runs on.
Basically, this is what a fiber really is under the hood - it's a continuation with a scheduler.
- Continuation is a stack of function calls that can be stopped and stored in the heap at some point (with yield) and restarted afterward (with run). We just saw how we can build up the continuation as a series of flatMapped instructions wrapped in an `IO`.
- Scheduler schedules fibers on a thread pool so that the execution of a fiber's code can be carried out by multiple worker threads. Once we call `.start` on an `IO`, we start it in a separate fiber and the scheduler schedules it on the thread pool. In cats-effect, the role of the scheduler is performed by `ContextShift`, which uses the underlying `ExecutionContext`.
Each fiber is associated with a run loop that executes the instructions from the source one by one.
The run loop needs access to two JVM resources:
- execution context (forking & yielding)
- scheduled executor service (sleeping before getting submitted again)
The run loop builds up a stack of tasks to be performed. We could model the stack in Scala code like this:
sealed trait IO[+A]
case class FlatMap[B, +A](io: IO[B], k: B => IO[A]) extends IO[A]
case class Pure[+A](v: A) extends IO[A]
case class Delay[+A](eff: () => A) extends IO[A]
Now, let's say we have the following program:
val program = for {
_ <- IO(println("What's up?"))
input <- IO(readLine)
_ <- IO(println(s"Ah, $input is up!"))
} yield ExitCode.Success
The run loop stack for the program above would then look like this:
FlatMap(
FlatMap(
Delay(() => println("What's up?")),
(_: Unit) => Delay(() => readLine)
),
input => Delay(() => println(s"Ah, $input is up!"))
)
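To make the run loop idea concrete, here is a minimal (and deliberately non-stack-safe) interpreter for the toy ADT above - just an illustration, not the real cats-effect run loop:

```scala
// walks the structure, executing suspended effects one by one
def run[A](io: IO[A]): A = io match {
  case Pure(v)           => v
  case Delay(eff)        => eff()
  case FlatMap(inner, k) => run(k(run(inner)))
}
```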
(Here is the link to the actual cats-effect `IO` run loop.)
When the program is run (at the "end of the world", in our case using `unsafeRunSync()`), the stack is submitted to the scheduler. Is it submitted all at once? Or is it submitted one layer of FlatMap at a time, each time yielding back when the task is completed, allowing other fibers to run?
Actually, this is up to the library / construct being used:
- Scala Future: Yields back on every `flatMap`
- IO: Yields back when the whole stack is completed
- Monix Task: Yields back every N operations (at the moment of writing this, I believe N = 1024)
This means that Scala `Future` optimizes for fairness, `IO` for throughput, and Monix takes the middle approach.
Note that we could prevent `Future` from yielding all the time by using a non-shifting instance of `ExecutionContext`, and we could force `IO` to shift manually whenever we want it to. You can think of Scala `Future` as "fairness opt-out" and `IO` as "fairness opt-in".
Quick side note: when a cancelation command has been issued for some running `IO`, it can only be canceled at two particular points. One such point is inserted by the library on every 512 flatMaps in the run loop stack; the other is at the asynchronous boundary (see the Context shift section).
The relationship between fibers and threads is the following:
- It is possible to have each fiber running on its dedicated thread
- It is possible to have all fibers running on only one thread
- It is possible to have one fiber switching between multiple threads
- Usually, you will want to have M to N mapping (M = fibers, N = threads), with M > N
- Whenever multiple fibers need to compete for the same thread, they will cooperatively yield to each other, thus allowing each fiber to run a bit of its work and then allow some other fiber to take the thread
- How often and at which point the fiber yields depends on the underlying implementation; in cats-effect, it won't yield until you tell it to, which allows fine-tuning between fairness and throughput
Because the number of fibers is usually higher than the number of threads in the given thread pool, fibers yield control to each other in order to make sure that all fibers get their piece of the CPU time. For example, in a pool with two threads that's running three fibers, one fiber will be waiting at any given point.
In cats-effect 2, cooperative yielding is controlled via `ContextShift`. Note that `ContextShift` is most likely going to be removed in cats-effect 3 (see the cats-effect 3 section). However, the core principles explained here are still worth understanding, because they will remain relevant in the next version.
In cats-effect, submitting the fiber to a thread pool is done via the `ContextShift` type. It has two main abilities: to run the continuation on some `ExecutionContext`, and to shift it to a different `ExecutionContext`.
Here's the trait:
trait ContextShift[F[_]] {
def shift: F[Unit]
def evalOn[A](ec: ExecutionContext)(f: F[A]): F[A]
}
You can think of it as a type class, even though it is not really a valid type class because it doesn't have the coherence restriction - a type can implement it in more than one way. For example, you might want to have a bunch of instances of `ContextShift[IO]` lying around, each constructed using a different `ExecutionContext` and representing a different thread pool (one for blocking I/O, one for CPU-heavy stuff, etc.). Constructing an instance of `ContextShift[IO]` is easy: `val cs = IO.contextShift(executionContext)`.
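As a sketch of `evalOn` (the file path and pool choices are illustrative assumptions), this is how we could run a blocking effect on a dedicated pool:

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import cats.effect.{ContextShift, IO}

val mainCs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)
val blockingEc = ExecutionContext.fromExecutor(Executors.newCachedThreadPool())

// the file read runs on blockingEc; afterwards, execution shifts back
// to the pool that mainCs was created with (the global EC here)
val readFile: IO[String] =
  mainCs.evalOn(blockingEc)(IO(scala.io.Source.fromFile("data.txt").mkString))
```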
Method `shift` is how we achieve fairness. Every fiber will be executed synchronously until shifted, at which point other fibers get the chance to advance their work.
Don't confuse `shift` from `ContextShift` with `IO.shift`. The semantics are the same, but they come in slightly different forms. The `IO` version has the following two overloads of the `shift` method:
def shift(implicit cs: ContextShift[IO]): IO[Unit]
def shift(ec: ExecutionContext): IO[Unit]
These two methods are similar in nature - they both shift to the desired thread pool, one by taking Scala's `ExecutionContext`, the other by taking a `ContextShift`. It is recommended to use `ContextShift` by default, and to provide an `ExecutionContext` only when you need fine-grained control over the thread pool in use. Note that you can simply provide the same `ContextShift` / `ExecutionContext` that you're already running on, which has the effect of cooperatively yielding to other fibers on the same thread pool, same as `shift` from the type class (you can even invoke it simply as `IO.shift`, as long as you have your `ContextShift` available implicitly).
So, just to repeat: `ContextShift` can perform a "shift" which either moves the computation to a different thread pool or sends it to the current one for re-scheduling. The point at which the shift happens is often referred to as the asynchronous boundary. The concept of an asynchronous boundary was described in the Asynchronous boundary section, and now it has been re-introduced in the cats-effect context. The asynchronous boundary is one of two places at which an `IO` can be canceled (the other is every 512 flatMaps in the run loop; see the Run loop section).
Method `shift` will be demonstrated with two examples. First, we will use a thread pool with only one thread, and we will start two fibers on that thread. Note that I'm removing some boilerplate to save space (`IOApp` etc.), but you can find the full code in the repository. Also note that `Executors.newSingleThreadExecutor` and `Executors.newFixedThreadPool(1)` are two alternative ways of declaring the same thing. I will use the latter, simply for consistency with the examples that use multi-threaded pools.
val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))
val cs: ContextShift[IO] = IO.contextShift(ec)
def loop(id: String)(i: Int): IO[Unit] =
for {
_ <- IO(printThread(id))
_ <- IO(Thread.sleep(200))
result <- loop(id)(i + 1)
} yield result
val program = for {
_ <- loop("A")(0).start(cs)
_ <- loop("B")(0).start(cs)
} yield ExitCode.Success
where `printThread` is a println statement that includes the thread identifier, for extra clarity:
def printThread(id: String) = {
val thread = Thread.currentThread.getName
println(s"[$thread] $id")
}
The code is pretty straightforward - we have a recursive loop that goes on forever, and all it does is print out some ID (e.g. "A" or "B"). What gets printed is an endless stream of "A", because the first fiber never shifts (that is, never cooperatively yields), so the second fiber never gets a chance to run.
Now let's add the shifting to the above code snippet:
def loop(id: String)(i: Int): IO[Unit] = for {
_ <- IO(printThread(id))
_ <- IO.shift(cs) // <--- now we shift!
result <- loop(id)(i + 1)
} yield result
What gets printed out in this case is an alternating sequence of "A"s and "B"s:
[pool-1-thread-1] A
[pool-1-thread-1] A
[pool-1-thread-1] B
[pool-1-thread-1] A
[pool-1-thread-1] B
[pool-1-thread-1] A
...
Even though we have only one thread, there are two fibers running on it, and by telling them to `shift` after every iteration, they can work together cooperatively. At any given point only one fiber is running on the thread, but soon afterward it backs away and gives the other fiber the opportunity to run on the same thread.
In the second example, we will have the same two fibers, but this time each fiber will get its own single thread.
val ec1 = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))
val ec2 = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))
val cs1: ContextShift[IO] = IO.contextShift(ec1)
val cs2: ContextShift[IO] = IO.contextShift(ec2)
def loop(id: String)(i: Int): IO[Unit] = for {
_ <- IO(printThread(id))
_ <- if (i == 10) IO.shift(cs1) else IO.unit
result <- loop(id)(i + 1)
} yield result
val program = for {
_ <- loop("A")(0).start(cs1)
_ <- loop("B")(0).start(cs2)
} yield ExitCode.Success
We get:
[pool-1-thread-1] A
[pool-2-thread-1] B
[pool-1-thread-1] A
[pool-2-thread-1] B
[pool-1-thread-1] A
[pool-2-thread-1] B
...
So this time each fiber has the opportunity to run, because each is running on its own thread (it's the operating system's job to make sure the CPU runs a little bit of each thread all the time). We would observe the same behaviour if we had used a single pool with two threads, e.g. `Executors.newFixedThreadPool(2)` (try it out!).
Now, pay attention to the shift that happens on the 10th iteration:
...
_ <- if (i == 10) IO.shift(cs1) else IO.unit
...
At the 10th iteration of the loop, each `IO` shifts to thread pool number one. This means that, from that point on, both fibers are scheduled on the same single thread from that pool, and there will be no subsequent shifts. So soon after the initial "ABAB..." we suddenly stop seeing "B"s:
[pool-1-thread-1] A
[pool-2-thread-1] B
[pool-1-thread-1] A
[pool-2-thread-1] B
[pool-1-thread-1] A
[pool-1-thread-1] A
[pool-1-thread-1] A
[pool-1-thread-1] A
...
If we kept shifting (e.g. by saying `i > 10` instead of `i == 10`), we would keep getting "A"s and "B"s interchangeably, like before. But we only shifted once, both loops to the same `ContextShift` (that is, to the same single-threaded thread pool), and we stopped shifting at that point. So both fibers ended up stuck on a single-threaded thread pool, and without further shifts, whichever fiber isn't currently holding the thread will starve.
What do you think happens if we swap `IO` for a `Future` in the cases we saw earlier? Let's start with the single-thread example:
implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))
def printThread(id: String) = Future {
val thread = Thread.currentThread.getName
println(s"${LocalDateTime.now} [$thread] $id")
}
def loop(id: String)(i: Int): Future[Unit] =
for {
_ <- printThread(id)
_ <- Future(Thread.sleep(200))
result <- loop(id)(i + 1)
} yield result
val program = for {
_ <- loop("A")(0)
_ <- loop("B")(0)
} yield ExitCode.Success
Await.result(program, Duration.Inf)
As expected, this prints out `[pool-1-thread-1] A` indefinitely. But what happens if we now change to `Executors.newFixedThreadPool(2)`? In the case of fibers, both "A" and "B" would be executed concurrently. But with `Future`s, we get:
[pool-1-thread-1] A
[pool-1-thread-2] A
[pool-1-thread-1] A
[pool-1-thread-2] A
[pool-1-thread-2] A
[pool-1-thread-1] A
Note how the threads are taking turns, but both are executing "A". Why does this happen? On every `flatMap` call (`map` too), `Future` needs to have access to the `ExecutionContext`. Its signature is literally:
def flatMap[S](f: T => Future[S])(implicit executor: ExecutionContext): Future[S]
So in every step of the for-comprehension, `Future` will dispatch its computation back to our two-threaded pool (note: I heard that the implementation of `Future` might change in this regard, with calls to the `ExecutionContext` being "batched" to improve performance, but I couldn't find any official source for this at the time of writing). This explains why we see alternating threads taking turns computing "A".
But why is there no "B"? Because there are no fibers. Remember, with `IO` we ran two separate fibers on the same `ContextShift` (that is, on the same thread pool) by using `.start`, and we shifted from one to another whenever we invoked `shift`. And because `IO` is lazy, `loop` didn't run endlessly over and over again inside the first step of the for-comprehension before even getting to the second one. Instead, we lazily defined two (endless) `IO` computations and declared that we wanted to run them on separate fibers, either on the same thread pool or on separate ones (we had both situations throughout the examples). Then, once we executed the full program, we observed the behaviour of two fibers running on the thread pool(s), either in a cooperative way or in a selfish way, depending on whether we shifted or not.
But with `Future`s, there is no concept of a fiber. This means that, instead of defining two separate fibers in our two-step for-comprehension, we simply defined a chain of two computations, both infinitely recursive. So what happens is that loop "A" runs indefinitely, forever calling itself recursively, and our code never even gets to loop "B". But on each recursive call of the "A" loop, the underlying `ExecutionContext` delegates the computation to one of the two available threads, which is why we saw them alternating.
Note that we would have observed the same behaviour with `IO` if we hadn't started the two loops on separate fibers:
val program = for {
_ <- loop("A")(0) // .start(cs)
_ <- loop("B")(0) // .start(cs)
} yield ExitCode.Success
Homework: Try to run the example without `.start`, that is, without spawning any separate fibers, but keep the `shift` inside the loop. What happens? What gets printed out?
In real-world scenarios, you want to join started fibers once you're done with them (unless you cancel them). But there's a lurking danger when you're using multiple fibers:
val f1 = for {
f1 <- IO(Thread.sleep(1000)).start
_ <- f1.join
_ <- IO(println("Joined f1"))
} yield ()
val f2 = for {
f2 <- IO.raiseError[Unit](new Throwable("boom!")).start
_ <- f2.join
_ <- IO(println("Joined f2"))
} yield ()
val program = (f1, f2).parMapN {
case _ => ExitCode.Success
}
In the above example, we will not only never see "Joined f2", but we will also never see "Joined f1". Fiber `f1` will leak.
Fibers should therefore always be used within a safe allocation mechanism, otherwise they might leak resources when cancelled.
In the Resource handling section, one such mechanism was shown, using the `Resource` construct. Here is an example of using that mechanism to ensure safety upon fiber cancelation:
def safeStart[A](id: String)(io: IO[A]): Resource[IO, Fiber[IO, A]] =
Resource.make(io.start)(fiber => fiber.cancel >> IO(println(s"Joined $id")))
val r1 = safeStart("1")(IO(Thread.sleep(1000)))
val r2 = safeStart("2")(IO.raiseError[Unit](new Throwable("boom!")))
val program = (r1.use(_.join), r2.use(_.join)).parMapN {
case _ => ExitCode.Success
}
This time you will notice both the "Joined 1" and "Joined 2" printout, which means that both fibers got joined and didn't leak.
The type class `ContextShift` gives us two abilities: to execute an effect on a desired thread pool via `evalOn`, by providing the corresponding `ExecutionContext`, and to `shift`, which re-schedules the fiber on the current thread pool and enables cooperative yielding. We get the same two abilities in `IO`, but there `shift` takes the thread pool as a parameter, either as an `ExecutionContext` or as an (implicit) `ContextShift`. This means that we can cooperatively yield to another fiber within the same thread pool by passing a reference to the current one, and we can also shift to a different one.
In the case of `Future`, there are no fibers. We pass an `ExecutionContext` to each map / flatMap call, which means that every `Future` computation might be executed on a different thread (this is up to the passed `ExecutionContext` and how it decides to schedule the work). What we cannot do with `Future`s, however, is define two concurrent computations that reuse the same thread cooperatively.
At the time of writing this text, cats-effect 3 was still in the proposal phase.
Here are some important changes that are happening (there are many more, but I'm focusing on those that directly affect the things explained in this text):
1. `ContextShift` is being removed. Even though cats-effect 3 still isn't out, the decision to remove `ContextShift` has already been made. But that doesn't mean that the principles explained in the previous couple of sections are becoming deprecated and irrelevant. First of all, `evalOn` will still exist; we need the ability to run a fiber on a given thread pool. It will simply take an `ExecutionContext` as a parameter instead of a `ContextShift`. However, it is now constrained so that it moves all of the actions to the given thread pool, reverting back to the enclosing thread pool when finished (as opposed to cats-effect 2, which reverts back to the default pool). This is explained further in point 3.
Method
shift
is being removed.In cats-effect 2, method
shift
has two main roles:- shifting to a desired thread pool, which will be done by
Async#evalOn
described in the previous point - cooperative yielding, which will be done by
yield
/yielding
/cede
/pass
/ whatever name is eventually agreed upon, and which will be part of theConcurrent
type class (this method will actually be a bit more general, but that's more of an implementation detail).
Note that this means removing
shift
from three different places:ContextShift#shift
is being removed completely (see point 1)Async#shift(executionContext)
is being replaced byAsync[F[_]]#evalOn(f, executionContext)
(note that the former is the companion object, while the latter is the type class)IO#shift(executionContext)
andIO#shift(contextShift)
are being replaced byAsync[IO]#evalOn(executionContext)
(although there might be anIO#evalOn(executionContext)
for convenience)
- shifting to a desired thread pool, which will be done by
3. The `Async` type class will now hold a reference to the running `ExecutionContext`. This will enable fallback to the parent `ExecutionContext` once a fiber has terminated. Consider the following cats-effect 2 code:

```scala
val ec1 = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))
val ec2 = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))
val ec3 = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))

val cs: ContextShift[IO] = IO.contextShift(ec1)

def io(s: String) = IO(println(s"$s: ${Thread.currentThread.getName}"))

val r = cs.evalOn(ec2)(io("A").flatMap(_ =>
  cs.evalOn(ec3)(io("B1")).flatMap(_ => io("B2"))
))
```

There are three distinct `IO`s, which we can refer to as "A", "B1" and "B2". Our intention is to run "A" on `ec2` and then chain it into a mini-chain "B1" -> "B2". Thread pools are assigned as follows:
   - `ContextShift` is running on `ec1`
   - "A" is running on `ec2`
   - "B1" -> "B2" is a flatMapped chain that follows after "A", and the first part of that chain ("B1") runs on `ec3`
   The million dollar question is: which thread pool does "B2" run on? Answer: `ec1`. This is very unintuitive. It would feel more natural if, after finishing "B1" on `ec3`, the follow-up "B2" ran on whatever "A" was running on. Instead, we fall back all the way to the default `ExecutionContext` that our `ContextShift` was initialised with. In cats-effect 3, this will be fixed.
The way of working with concurrent effects described so far relies on the concept of fibers as implemented by Scala libraries such as cats-effect, ZIO and Monix. There is an initiative, however, to move fibers from custom library code into the virtual machine itself.
Project Loom is a proposal for adding fibers to the JVM. This way, fibers would become native-level constructs which would exist on the call stack instead of as objects on the heap.
In Project Loom, fibers are called virtual threads. If you take a look at the basic description of a virtual thread, you will see that:
"It is a continuation and a scheduler that, together, make up a virtual thread. "
You might recall that we said the same thing in the continuations section:
This is what a fiber really is under the hood - it's a continuation with a scheduler.
Even though Loom's virtual threads are based on the same principles as the fiber mechanisms we explored in this article, there are still some implementation-specific details you would need to become familiar with. At the time of writing this text, the latest update on Project Loom had some interesting information about that.
Fibers are an implementation of green threads. Green threads are present in many languages. In some of them they are very similar to fibers, in some a bit different, but they all fit under the umbrella of "lightweight threads that are scheduled by a runtime library or a virtual machine, managed in the user space instead of in the kernel space, usually using cooperative instead of preemptive scheduling".
Here are some examples:
- Kotlin Coroutines (this is a good doc)
- Go Goroutines
- Haskell green threads (don't have a better link)
- Erlang processes
- Julia Tasks
- Common Lisp via green-threads library
- And many others
- Cats-effect documentation: https://typelevel.org/cats-effect/
- Cats-effect repo: https://github.com/typelevel/cats-effect
- Cats-effect 3 proposal: typelevel/cats-effect#634
- Monix Task documentation: https://monix.io/docs/current/eval/task.html
- Fabio Labella - How do Fibers work: https://www.youtube.com/watch?v=x5_MmZVLiSM
- Pawel Jurczenko - Modern JVM Multithreading: https://pjurczenko.github.io/modern-jvm-multithreading.html
- Bartłomiej Szwej - Composable resource management in Scala: https://medium.com/@bszwej/composable-resource-management-in-scala-ce902bda48b2
- Daniel Spiewak's gist: https://gist.github.com/djspiewak/46b543800958cf61af6efa8e072bfd5c
- Java Executors: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html
- Fork Join Pool vs Thread Pool Executor: http://www.h-online.com/developer/features/The-fork-join-framework-in-Java-7-1762357.html
- Adam Warski about Loom: https://blog.softwaremill.com/will-project-loom-obliterate-java-futures-fb1a28508232?gi=c5487dba95ec
- Loom update: http://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part1.html
- Loom-fiber repo: https://github.com/forax/loom-fiber