lewissbaker / cppcoro

A library of C++ coroutine abstractions for the coroutines TS

License: MIT License

CoffeeScript 2.26% C# 0.09% Batchfile 0.01% C++ 96.69% Shell 0.52% Python 0.43%

async async-await asynchronous-programming asyncio clang coroutines coroutines-ts cplusplus cpp linux msvc windows

cppcoro's People

plumpmath alarouche yfinkelstein lunastorm mknejp gevarakelyan vincentlao osdi foreverhy mediabuff blapid yet afakihcpr neo5167 perrog richmondx sailfish009 modocache priteshrnandgaonkar think-cell bgidolov samehelnikety mwinterb toahabhuiyan qis kerorv selmar tavi-cacina bigo2050 yehezkelshb channgo2203 xpan magicod firejox linecode goudan-er bayonetta5 madmongo1 feng-y wanglch chentdxh buckaroo-pm woaitubage xxyyboy radhanathdas zhb1990 loopunit lofcek joemalle comfanter swpuzhang sg-lunch cwyark mmha msvetkin denchat mwang-lifesize alacazar anydream potatogim chinchilla-forest kevinmiles aimoonchen un1tz3r0 alinshans xc42 gaybro8777 qianqian121 yaoxinliu yshalabi zkas ssoft-hub dimitarpg13 tankijong mitsutaka-takeda davidwin gomez-addams brinkqiang2cpp igordzreyev rainmark akj7 andreadrian feiyunwill slxu wnxd anlongfei zhouchengming99 saurik vazgriz yangbo254 garcia6l20 maxihuesito tearshark powerudo renestein crixalis2013 ty-sutton blockspacer glynos d-walker

cppcoro's Issues

Deprecate and remove 'eager' task classes `task` and `shared_task`

Once we have a way to synchronously block waiting for a task in #27 and when_all in #10 there shouldn't be any need for the eagerly started task types any more and we can limit use to just the lazy task types (ie. lazy_task and shared_lazy_task)

One of the big motivations for getting rid of eagerly-started tasks is that it is difficult to write exception-safe code with use of eagerly-started tasks that aren't immediately co_awaited.

We don't want to leave dangling tasks/computation as this can lead to ignored/uncaught exceptions or unsynchronised access to shared resources.

For example, consider the case where we want to execute two tasks concurrently and wait for them both to finish before continuing, using their results:

task<A> do1();
task<B> do2();

task<> do1and2concurrently_unsafe()
{
  // The calls to do1() and do2() start the tasks eagerly but are potentially executed
  // in any order and either one of them could fail after the other has already started
  // a concurrent computation.
  auto [a, b] = co_await cppcoro::when_all(do1(), do2());

  // use a and b ...
}

To implement this in an exception-safe way we'd need to modify the function as follows:

task<> do1and2concurrently_safe()
{
  // This implicitly starts executing the task and it executes concurrently with the
  // rest of the logic below until we co_await the task.
  task<A> t1 = do1();

  // Now we need to start the do2() task but since this could fail we need
  // to start it inside a try/catch so we can wait for t1 to finish before it
  // goes out of scope (we don't want to leave any dangling computation).
  task<B> t2;
  std::exception_ptr ex;
  try
  {
    // This call might throw, since it may need to allocate a coroutine frame
    // which could fail with std::bad_alloc.
    t2 = do2();
  }
  catch (...)
  {
    // Don't want to leave t1 still executing but we can't co_await inside catch-block
    // So we capture current exception and do co_await outside catch block.
    ex = std::current_exception();
  }

  if (ex)
  {
    // Wait until t1 completes before rethrowing the exception.
    co_await t1.when_ready();
    std::rethrow_exception(ex);
  }

  // Now that we have both t1 and t2 started successfully we can use when_all() to get the results.
  auto [a, b] = co_await when_all(std::move(t1), std::move(t2));

  // use a, b ...  
}

Compare this with the lazy_task version, which is both concise and exception-safe:

lazy_task<A> do1();
lazy_task<B> do2();

lazy_task<> do1and2concurrently()
{
  // The calls to do1() and do2() can still execute in any order but all they
  // do is allocate coroutine frames, they don't start any computation.
  // If the second call fails then the normal stack-unwinding will ensure the
  // first coroutine frame is destroyed. It doesn't need to worry about waiting
  // for the first task to complete. Either they both start or none of them do.
  auto [a, b] = co_await cppcoro::when_all(do1(), do2());

  // use a and b ...
}

With a lazy_task, the task is either being co_awaited by some other coroutine or it is not executing (it has either not yet started, or has completed executing) and so the lazy_task is always safe to destruct and will free the coroutine frame it owns.

A side-benefit of using lazy_task everywhere is that it can be implemented without the need for std::atomic operations to synchronise coroutine completion and awaiter. This can have some potential benefits for performance by avoiding use of atomic operations for basic sequential flow of execution.

Add when_all() function for waiting for multiple tasks to complete

Add support for Windows thread-pool scheduler/io_context

The io_service class is currently a thin wrapper around Win32 IO completion-port and requires clients to manually manage their own threads to call process_events().

This can be fairly easily used to build a fixed-size thread-pool by spawning N threads and having each thread call io_service::process_events().
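
The "spawn N threads, each calling process_events()" pattern can be sketched roughly as follows. This is only an illustration: `mini_event_loop`, `post()`, `stop()` and `run_on_fixed_pool()` are hypothetical stand-ins built on a mutex-protected queue, not cppcoro's io_service or the Win32 completion-port API.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an event source such as io_service (not cppcoro's API).
class mini_event_loop {
public:
  // Queue a unit of work and wake one thread blocked in process_events().
  void post(std::function<void()> work) {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_queue.push_back(std::move(work));
    }
    m_cv.notify_one();
  }

  // Ask all threads to return once the queue has been drained.
  void stop() {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_stopped = true;
    }
    m_cv.notify_all();
  }

  // Blocks, running queued work until stop() was called and the queue is empty.
  void process_events() {
    for (;;) {
      std::function<void()> work;
      {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_stopped || !m_queue.empty(); });
        if (m_queue.empty()) return;  // stopped and drained
        work = std::move(m_queue.front());
        m_queue.pop_front();
      }
      work();  // run outside the lock so other threads can make progress
    }
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::deque<std::function<void()>> m_queue;
  bool m_stopped = false;
};

// Runs `jobs` counter increments on a fixed pool of `threadCount` threads.
int run_on_fixed_pool(int threadCount, int jobs) {
  mini_event_loop loop;
  std::atomic<int> counter{0};
  std::vector<std::thread> pool;
  for (int i = 0; i < threadCount; ++i)
    pool.emplace_back([&loop] { loop.process_events(); });
  for (int i = 0; i < jobs; ++i)
    loop.post([&counter] { counter.fetch_add(1, std::memory_order_relaxed); });
  loop.stop();
  for (auto& t : pool) t.join();
  return counter.load();
}
```

The pool size is fixed at construction time here, which is exactly the limitation that an OS-managed thread-pool would avoid.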

However, there are advantages to using the built-in Win32 thread-pool facilities to implement the thread-pool instead of manually spawning/managing your own pool of threads.

One benefit is that the OS is able to dynamically spin up more threads or shutdown idle threads as required.
Another benefit is that multiple libraries/frameworks can share the same underlying pool of threads without needing to go through the same API.

Otherwise, if your application uses, say 3 libraries, each of which create their own thread-pools, then you could end up with 3N threads being created across 3 separate thread-pools. If all of those libraries instead used the OS thread pool then a single pool of threads can be used to multiplex execution of tasks from each of those libraries.

Documentation clarification

In task.hpp there is:

/// \brief
	/// A lazy task represents an asynchronous operation that is not started
	/// until it is first awaited.
	///
	/// When you call a coroutine that returns a task, the coroutine
	/// simply captures any passed parameters and returns execution to the
	/// caller. Execution of the coroutine body does not start until the
	/// coroutine is first co_await'ed.
	///
	/// Comparison with task<T>
	/// -----------------------
	/// The lazy task has lower overhead than cppcoro::task<T> as it does not
	/// require the use of atomic operations to synchronise potential races
	/// between the awaiting coroutine suspending and the coroutine completing.
	///
	/// The awaiting coroutine is suspended prior to the task being started
	/// which means that when the task completes it can unconditionally
	/// resume the awaiter.
	///
	/// One limitation of this approach is that if the task completes
	/// synchronously then, unless the compiler is able to perform tail-calls,
	/// the awaiting coroutine will be resumed inside a nested stack-frame.
	/// This can lead to stack-overflow if long chains of tasks complete
	/// synchronously.
	///
	/// The task<T> type does not have this issue as the awaiting coroutine is
	/// not suspended in the case that the task completes synchronously.
	template<typename T = void>
	class task

The section Comparison with task<T> is not clear: what is being compared with task<T>? This IS the task<T> definition after all.

Btw, what is the best venue to ask similar questions in the future?

Make when_all usable with arbitrary awaitables rather than only for task<T> and shared_task<T>

For example, allow passing a cppcoro::file_read_operation into the variadic when_all overload.

Also see if we can get rid of the need for intrusive get_starter() methods on task and shared_task and instead just create N temporary coroutines within when_all to use as the continuation for each of the awaitable operations.

If possible, I'd still like to retain the behaviour such that if when_all successfully returns a task without throwing std::bad_alloc that then you can guarantee that co_awaiting the returned task will start each of the async operations and wait until they all complete.

Add blocking `io_service::process_events_until_complete(task)` function

Once eager tasks are eliminated in #29 it will no longer be possible to start executing a task on a single thread that enters the io_service::process_events() event loop.

The only way to start a task will be sync_wait(task) introduced in #27, however you can't then (easily) enter the process_events() event loop to process I/O completion events that are raised.

Adding an io_service::process_events_until_complete(task) function would allow starting a lazy_task and then entering the io_service event loop in such a way that it will exit from the event loop once the provided task completes.
eg.

lazy_task<> run(io_service& io);

int main()
{
  io_service io;
  io.process_events_until_complete(run(io));
  return 0;
}

Add ability to asynchronously wait for a duration of time

Add the ability to do something like:

cppcoro::task<> do_something(cppcoro::io_context ioCtx)
{
  using namespace std::chrono_literals;
  co_await ioCtx.schedule_after(30s);
}

io_service question

It would be good to cover the vision and strategy behind io_service in some doc or wiki. This is needed in order to have an organized approach to supporting a generic platform.

Why program a platform-specific IO polling backend directly in this project instead of integrating with one of the cross-platform IO polling libraries such as libevent, libev, libuv, or boost asio?
Re boost in particular: as I understand it, the Networking TS is based on boost asio, and it has been accepted for C++20(?). So, would it not be best to integrate with the boost io_service from this TS?
IO polling is inevitably related to multithreading. There are several popular models: single-threaded event loop (ex: libevent), single-threaded polling with multithreaded event processing (ASIO), fully multithreaded polling and processing (GRPC). Where does this project stand in this regard? Is there some specific set of goals in mind? Perhaps there is a perimeter beyond which this project should not go, leaving the rest for custom integrations?

Add a .natvis file to aid Visual Studio debugging of tasks, etc.

Make when_all_ready return task<tuple<expected<T>...>>

Rather than returning a collection (tuple/vector) of the original awaitable objects from when_all_ready, we should consider returning a collection of the results encapsulated in some kind of expected<T, std::exception_ptr> type.

This would allow us to generalise when_all_ready to support arbitrary awaitables rather than being restricted to just supporting task and shared_task awaitables.

It would also have the benefit of giving the caller synchronous access to the results without needing to subsequently re-co_await the awaitables to extract the values. This would allow use with awaitables that don't support being co_awaited multiple times.
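
As a rough illustration of the "result or exception_ptr" idea, the sketch below captures the outcome of a call so it can be inspected synchronously later. The names `result_or_error`, `capture()` and `parse_positive()` are hypothetical, and a std::variant is used as a stand-in for whatever expected<T, std::exception_ptr> type the library might eventually adopt.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <utility>
#include <variant>

// Hypothetical minimal stand-in for expected<T, std::exception_ptr>.
template<typename T>
class result_or_error {
public:
  explicit result_or_error(T value)
    : m_state(std::in_place_index<0>, std::move(value)) {}
  explicit result_or_error(std::exception_ptr error)
    : m_state(std::in_place_index<1>, std::move(error)) {}

  bool has_value() const noexcept { return m_state.index() == 0; }

  // Returns the stored value, or rethrows the captured exception.
  T& value() {
    if (!has_value()) std::rethrow_exception(std::get<1>(m_state));
    return std::get<0>(m_state);
  }

private:
  std::variant<T, std::exception_ptr> m_state;
};

// Invokes f() and stores either its result or the exception it threw.
// Assumes f returns by value (a reference result would need extra handling).
template<typename F>
auto capture(F&& f) -> result_or_error<decltype(f())> {
  using result_t = result_or_error<decltype(f())>;
  try {
    return result_t(f());
  } catch (...) {
    return result_t(std::current_exception());
  }
}

// Hypothetical operation that may fail.
int parse_positive(int x) {
  if (x < 0) throw std::invalid_argument("negative");
  return x;
}
```

A when_all_ready built along these lines could hand back a tuple of such objects, so the caller decides per element whether to read the value or handle the error.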

Add support for std::optional<T> coroutine that allows unwrapping optional values using co_await

See https://github.com/toby-allsopp/coroutine_monad for inspiration.

std::optional<int> parse_int(const std::string_view& s);

std::optional<std::tuple<int, int>> parse_int_pair(const std::string_view& a, const std::string_view& b)
{
  co_return std::make_tuple(co_await parse_int(a), co_await parse_int(b));
}

Add recursive_generator<T> coroutine type

Add a new recursive_generator<T> type that allows for efficient enumeration over a sequence that is defined recursively.

This type would allow you to pass either a T or a recursive_generator<T> to the co_yield expression.
The increment operator on the iterator would then directly resume the leaf-most coroutine rather than having to resume each coroutine in the stack until the leaf is resumed and then later suspend every coroutine on the stack for each item.

eg.

generator<directory_entry> list_directory(std::string path);

recursive_generator<directory_entry> recursive_list_directory(std::string path)
{
  for (auto& entry : list_directory(path))
  {
    co_yield entry;
    if (entry.is_directory())
    {
      co_yield recursive_list_directory(entry.path());
    }
  }
}

void usage()
{
  for (auto& entry : recursive_list_directory("foo/bar"))
  {
    std::cout << entry.path() << "\n";
  }
}

Add expected<T, E> type

This is needed for #41.

Also consider adding support for using co_await to unwrap an expected<T,E> value inside a function that returns expected<U,E>.
See https://github.com/toby-allsopp/coroutine_monad for inspiration.

Add generator<T> coroutine type

Add a new type cppcoro::generator<T> that allows you to write a coroutine that yields a sequence of values procedurally from within the coroutine using the co_yield keyword.

eg.

cppcoro::generator<int> range(int n)
{
  for (int i = 0; i < n; ++i) co_yield i;
}

// Outputs: 0 1 2 3 4 5 6 7 8 9
void usage()
{
  for (auto i : range(10))
  {
    std::cout << i << " ";
  }
}

Add .editorconfig and .clang-format files to help with consistent formatting by multiple contributors

From comment in #45.

Add blocking 'sync_wait(task)' function for synchronously waiting for a task to complete

Add support for asynchronously awaiting for a duration of time

Add io_context::schedule_after(std::chrono::duration<rep,ratio> duration) to suspend awaiter until specified duration has elapsed and resume on io context.

Add support for building with Clang under Windows

Add shared_task<T> and shared_lazy_task<T> classes

The ability to have multiple consumers wait on the result of a task is required for some scenarios.
eg. where you want to pass a prerequisite task into multiple sub-tasks that each need to await that task.

The task<T> and lazy_task<T> classes are move-only and support only a single awaiting coroutine at a time.

This issue is proposing to add a shared_task<T> class and a shared_lazy_task<T> class that support copy-construction and assignment with reference-counting semantics and support multiple concurrent awaiting coroutines.

It should be possible to implement in a lock-free fashion using std::atomic pointers.

Use a proper unit-testing system framework

The tests for cppcoro are currently written using plain functions and standard library asserts.
While these are functional, it would be nice to make use of a system that eliminated the boiler-plate code, made it easier to split tests across different source files, provided better reporting of tests and failures, and offered command-line options for running individual tests.

Mac platform support

I tried to compile this library on Mac using the latest clang/c++ and cmake.
After excluding file io and win32 stuff similarly to how it's done for Linux, the only module that does not compile is lightweight_manual_reset_event.cpp, because futex does not exist on Mac. A natural solution for this is to use std::condition_variable and std::mutex. It's not particularly "lightweight" but at least there won't be any compatibility issues, which may be useful for other platforms.
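
A minimal sketch of that fallback might look like the following. The class name `portable_manual_reset_event` and the helper `wait_for_value()` are made up for illustration, and the set()/reset()/wait() surface is an assumption about what the futex-based class needs to provide.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical portable fallback built only on the standard library.
class portable_manual_reset_event {
public:
  explicit portable_manual_reset_event(bool initiallySet = false)
    : m_isSet(initiallySet) {}

  // Marks the event as set and wakes every waiting thread.
  void set() {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_isSet = true;
    }
    m_cv.notify_all();
  }

  // Returns the event to the 'not set' state.
  void reset() {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_isSet = false;
  }

  // Blocks the calling thread until the event is set.
  void wait() {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cv.wait(lock, [this] { return m_isSet; });
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  bool m_isSet;
};

// Another thread publishes a value, then signals the event.
int wait_for_value() {
  portable_manual_reset_event ready;
  int value = 0;
  std::thread producer([&] {
    value = 42;
    ready.set();
  });
  ready.wait();  // the mutex inside set()/wait() orders the write before this read
  producer.join();
  return value;
}
```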

I can contribute this code if this is seen as useful. But the other part of this puzzle is making the cake build work on Mac. Btw, is there any particular reason cake is used for this project instead of cmake or, say, scons, which is similar ideologically as far as I can tell?

build problem: stddef.h not found

Sorry if this is not the right venue for this question, but it looks like a problem with the cppcoro build.
On ubuntu 16.04,
I installed clang 6 using apt and manually built libc++ per your installation instructions.

yfinkelstein@ubuntu16:~/cppcoro$ ls -la /usr/lib/llvm-6.0/
total 48
drwxr-xr-x   8 root root  4096 Aug 23 13:49 .
drwxr-xr-x 166 root root 12288 Aug 23 16:19 ..
drwxr-xr-x   2 root root  4096 Aug 23 13:49 bin
drwxr-xr-x   2 root root  4096 Aug 23 13:49 build
lrwxrwxrwx   1 root root    14 Aug  6 21:32 cmake -> lib/cmake/llvm
drwxr-xr-x   4 root root  4096 Aug 23 15:31 include
drwxr-xr-x   4 root root 12288 Aug 23 15:31 lib
drwxr-xr-x   2 root root  4096 Aug 23 13:49 libexec
drwxr-xr-x   7 root root  4096 Aug 23 13:49 share

libc++ is installed under llvm:

yfinkelstein@ubuntu16:~/cppcoro$ ls -la /usr/lib/llvm-6.0/include/c++
total 12
drwxr-xr-x 3 root root 4096 Aug 23 15:31 .
drwxr-xr-x 4 root root 4096 Aug 23 15:31 ..
drwxr-xr-x 6 root root 4096 Aug 23 15:31 v1

But while building cppcoro I inevitably get this error:

/usr/bin/clang: failed with exit code -2
In file included from lib/async_auto_reset_event.cpp:6:
In file included from ./include/cppcoro/async_auto_reset_event.hpp:8:
In file included from /usr/lib/llvm-6.0/bin/../include/c++/v1/experimental/coroutine:50:
In file included from /usr/lib/llvm-6.0/bin/../include/c++/v1/new:89:
In file included from /usr/lib/llvm-6.0/bin/../include/c++/v1/exception:81:
/usr/lib/llvm-6.0/bin/../include/c++/v1/cstddef:44:15: fatal error: 'stddef.h' file not found
#include_next <stddef.h>
              ^~~~~~~~~~

I decided to also copy stdc++ headers under /usr/include just in case but that does not help.

Below are all versions of stddef.h that I have:

finkelstein@ubuntu16:~/cppcoro$ find /usr -name stddef.h
/usr/lib/llvm-6.0/lib/clang/6.0.0/include/stddef.h
/usr/lib/llvm-6.0/include/c++/v1/stddef.h
/usr/lib/gcc/x86_64-linux-gnu/5/include/stddef.h
/usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h
/usr/include/linux/stddef.h
/usr/include/c++/v1/stddef.h
/usr/src/linux-headers-4.4.0-92/include/uapi/linux/stddef.h
/usr/src/linux-headers-4.4.0-92/include/linux/stddef.h

I left the 2 key properties in config.cake intact:

  clangInstallPrefix = '/usr'

  # Set this to the install-prefix of where libc++ is installed.
  # You only need to set this if it is not installed at the same
  # location as clangInstallPrefix.
  libCxxInstallPrefix = None # '/path/to/install'

Is there something wrong with my setup?

Thanks!

P.S.
Your project is quite exciting and I'm really curious what kind of performance I would get with thousands of coroutines and multiple threads.

Add when_any() for waiting for at least one task to complete

The difficult part of designing when_any() will be how to handle cancellation of the co_await operations of the other tasks.

Currently, the task<T> and shared_task<T> types don't allow the caller to cancel the co_await operation once it has been awaited. We need to wait for the task to complete before the awaiting coroutine returns.

If the tasks themselves are cancellable, we could hook something up using cancellation_tokens.
eg. If we pass the same cancellation_token into each task then concurrently await all of the tasks and when any task completes, we then call request_cancellation() on the cancellation_source to request the other tasks to cancel promptly. Then we could just use when_all() to wait for all of the tasks.

To do this more generally, we'd need to be able to cancel the await operation on a task without necessarily cancelling the task itself. This would require a different data-structure in the promise object for keeping track of awaiters to allow unsubscribing an awaiter from that list in a lock-free way.
Maybe consider a similar data-structure to that used by cancellation_registration?

Add ability to specify scheduler that coroutine should be resumed on when awaiting various synchronisation operations

task<> f(some_scheduler& scheduler, async_mutex& mutex)
{
  auto lock = co_await mutex.scoped_lock_async();

  // Coroutine is now potentially executing on whatever execution context the
  // prior call to mutex.unlock() that released the mutex occurred.
  // We don't have any control over this here.

  // We can manually re-schedule ourselves for execution on a particular execution context.
  // This means that the mutex.unlock() call has resumed this coroutine only to immediately
  // suspend it again.
  co_await scheduler.schedule();

  // Also when the lock goes out of scope here and mutex.unlock() is called
  // we will be implicitly resuming the next coroutine that is waiting to
  // acquire the mutex. If that coroutine then unlocks the mutex without
  // suspending then it will recursively resume the next waiting coroutine, etc.,
  // blocking further execution of this coroutine until one of the lock holders
  // coroutines suspends its execution.
}

Some issues:

1. It could be more efficient to directly schedule the coroutine for resumption on the scheduler rather than resuming it and suspending it again.
2. This unconditionally re-schedules the coroutine, which may not be necessary if we were already executing on the right execution context before acquiring the lock and we acquired the lock synchronously.

You could do something like this now to (mostly) solve (2):

task<> f(some_scheduler& scheduler, async_mutex& mutex)
{
  if (!mutex.try_lock())
  {
    // This might still complete synchronously if the lock was released
    // between call to try_lock and lock_async.
    co_await mutex.lock_async();

    // Only reschedule if we (probably) didn't acquire the lock synchronously.
    // NOTE: This needs to be noexcept to be exception-safe.
    co_await scheduler.schedule();
  }

  async_mutex_lock lock(mutex, std::adopt_lock);
}

For solving (1) I'm thinking of something like this:

task<> f(some_scheduler& scheduler, async_mutex& mutex)
{
  auto lock = co_await mutex.scoped_lock_async().resume_on(scheduler);

  // Or possibly more simply
  auto lock2 = co_await mutex.scoped_lock_async(scheduler);
}

It may be possible to make this a general facility that is applicable to other awaitables:

auto lock = co_await mutex.scoped_lock_async() | resume_on(scheduler);
// or
auto lock = co_await resume_on(scheduler, mutex.scoped_lock_async());

Add async_generator<T> type

Add a coroutine type that allows both co_yield and co_await.

Windows 8/Server 12 is minimum due to WaitOnAddress/WakeByAddressAll

I was doing some test integration of cppcoro into our product but linker errors meant we could not proceed due to the use of WaitOnAddress/WakeByAddressAll. We have a product with a significant number of users on post XP/2K3 but pre Windows 8 and we need to support those platforms.

NOTE: I will code a solution at some point if nothing official is implemented but I wanted to raise this as a potential issue.

Make async_mutex::lock_async behaviour more explicit by introducing a scoped_lock_async() method

The async_mutex::lock_async() method currently has the dual behaviour of allowing you to lock the mutex in such a way that requires manual unlocking (ie. no RAII) as well as allowing you to encapsulate the lock in an async_mutex_lock object that will ensure the lock is released when the lock object goes out of scope.

The behaviour is currently chosen based on whether the caller assigns the result of the co_await m.lock_async() expression to an async_mutex_lock variable.

task<> f(async_mutex& m)
{
  // Needs manual unlocking
  co_await m.lock_async(); 
  m.unlock();
}

task<> g(async_mutex& m)
{
  // Lock encapsulated in RAII object
  async_mutex_lock lock = co_await m.lock_async(); 
}

This seems like a potentially error-prone way of doing things.

Consider having lock_async() always require manual unlocking and introduce a new scoped_lock_async() that returns an async_mutex_lock object.

Don't yield sequence of mutable T& from generator<T> and async_generator<T>

Currently, the generator<T> and async_generator<T> classes return a mutable reference to the yielded value.

This can lead to unexpected consequences:

generator<int> ints(int end)
{
  for (int i = 0; i < end; ++i) co_yield i;
}

void usage()
{
  for (auto&& x : ints(100))
  {
    x *= 3; // Whoops, this just modified the 'i' variable in ints()
  }
}

This behaviour is potentially unexpected and could lead to bugs.

Perhaps a generator<T> should instead yield a sequence of const T&, or possibly just T?

I was initially avoiding having operator* returning a prvalue since it potentially penalises use-cases where the consumer does not need to take a copy of the value and where the generator would otherwise be able to reuse storage for the value. eg.

generator<std::string> read_lines(text_stream source)
{
  std::string value;
  while (true)
  {
    for (char c : source.chars())
    {
      if (c == '\n') break;
      value.push_back(c);
    }
    co_yield value;
    value.clear(); // clear contents, retaining capacity of string.
  }
}

void find_long_lines()
{
  int i = 1;
  // Don't want to force a copy/move of the string for each line here.
  for (const auto& line : read_lines(get_stream()))
  {
    if (line.size() > 80)
      std::cout << i << ": " << line << std::endl;
    ++i;
  }
}

However, making it return const T& could also penalise the consumers that do want to obtain the elements since it prevents the use of move-constructor. Perhaps in such cases the function should return a generator<T&&> which yields a sequence of T&& which would allow the consumer to decide whether or not to move the result.

Allow 'co_return task' as way of implementing tail-recursion of tasks

If a coroutine wants to return the value of another task as its result then you currently need to co_await the other task first and then co_return the result of that. eg.

lazy_task<T> foo(int n);

lazy_task<T> bar()
{
  co_return co_await foo(123);
}

task<> usage()
{
  lazy_task<T> t = bar();
  T result = co_await t;
}

However, this means that the coroutine frame of bar() remains alive until foo() completes as the bar() coroutine is registered as the continuation of the foo() task.

If the coroutine frame of bar() is large then this means we're not releasing the memory/resources of the bar coroutine frame as soon as we could which can lead to additional memory pressure in applications.

It is also potentially problematic for recursive tasks where the recursion depth could be arbitrarily deep, the memory consumption of the call chain could be unbounded.

It would be nice if instead we could perform a tail-recursion here by using co_return task instead of co_return co_await task. This would effectively have the semantics of moving the returned task into the lazy_task<T> object that is being awaited at the top of the call chain. ie. that the result of the returned task becomes the result of the current coroutine.

This would allow the current coroutine frame to be freed before resuming the returned task. This means that in a purely-tail-recursive set of tasks that we'd have at most 2 coroutine frames in existence at one time, even if the recursion had unbounded depth.

eg.

lazy_task<T> foo(int n);

lazy_task<T> bar()
{
  co_return foo(123); // Don't co_await foo() task before returning.
}

task<> usage()
{
  lazy_task<T> t = bar();
  T result = co_await t;
}

The co_return foo(123); statement basically moves the lazy_task<T> value returned by foo() call into the t local variable being awaited in usage(). This frees the bar() coroutine frame before then resuming the foo() coroutine.

Make build system automatically run unit-tests during a build

Having to run each test .exe manually is error-prone and tedious.

Support running async cleanup operations when an async_generator is destroyed early

@ericniebler's recent blog entry, Ranges, Coroutines, and React: Early Musings on the Future of Async in C++, contained the following code-snippet that showed how async_generator<T> could be used to read a file in chunks:

auto async_file_chunk( const std::string& str ) -> async_generator<static_buf_t<1024>&>
{
  static_buf_t<1024> buffer;
 
  fs_t openreq;
  uv_file file = co_await fs_open(uv_default_loop(), &openreq, str.c_str(), O_RDONLY, 0);
  if (file > 0)
  {
    while (1)
    {
      fs_t readreq;
      int result = co_await fs_read(uv_default_loop(), &readreq, file, &buffer, 1, -1);
      if (result <= 0)
        break;

      buffer.len = result;
      co_yield buffer;
    }
    fs_t closereq;
    (void) co_await fs_close(uv_default_loop(), &closereq, file);
  }
}

Unfortunately, this snippet contains a flaw in that if the consumer stops consuming the elements of the async_generator before it reaches the end() of the sequence then the file handle will never be closed.

Currently, when the async_generator object is destructed it calls coroutine_handle<>::destroy() which destroys the coroutine frame and any object that was in-scope at the co_yield statement where the producer coroutine last produced a value. The coroutine does not get a chance to execute any code to clean up resources other than via running destructors. This allows you to perform synchronous cleanup operations via usage of RAII types but it means you can't perform any async cleanup operations (like the async fs_close example above, or gracefully shutting down a socket connection).

What if, instead of destroying the coroutine frame when the async_generator is destroyed we resume the generator coroutine but instead make the co_yield expression return an error-code (or throw a generation_cancelled exception)? This would then allow the coroutine to respond to the cancellation request and perform any cleanup operations.

For example, if the co_yield expression were to return an enum value of generator_op::move_next or generator_op::cancel then the main-loop of the above snippet could have been modified thus:

auto async_file_chunk( const std::string& str ) -> async_generator<static_buf_t<1024>&>
{
  static_buf_t<1024> buffer;
 
  fs_t openreq;
  uv_file file = co_await fs_open(uv_default_loop(), &openreq, str.c_str(), O_RDONLY, 0);
  if (file > 0)
  {
    while (1)
    {
      fs_t readreq;
      int result = co_await fs_read(uv_default_loop(), &readreq, file, &buffer, 1, -1);
      if (result <= 0)
        break;

      buffer.len = result;

      // The next two lines are the only ones that have changed.
      generator_op yieldResult = co_yield buffer;
      if (yieldResult == generator_op::cancel) break;
    }
    fs_t closereq;
    (void) co_await fs_close(uv_default_loop(), &closereq, file);
  }
}

If the coroutine subsequently tried to execute another co_yield expression after it had been cancelled then the co_yield expression would complete immediately again with the generator_op::cancel result.

There are a couple of issues with this approach, however:

1. The syntax for writing a correct producer coroutine is now more complicated (you need to check the return-value of the co_yield expression).
   - This could be somewhat alleviated by throwing an exception instead (which I also don't like much, since I don't like to use exceptions for expected/normal control-flow).
   - Or it could be an opt-in behaviour (eg. by constructing some scoped object on the stack, or otherwise communicating with the promise that you want to continue after a cancellation request).
2. The continued execution of the coroutine now represents a detached computation that you cannot synchronise against, or otherwise know when it will complete. eg. If it is closing a file asynchronously I may still want to know when the file has been closed so that I can delete it or open it again once the file lock has been released.

The second issue is perhaps the greater one here.

This could possibly be mitigated by requiring the caller to explicitly start the production of the generator by awaiting a task<> returned by the async_generator<T>::produce() member function.
With such a task, the generator would not start executing until both the produce() task and the task returned by begin() had been awaited. The caller would need to use something like when_all() to concurrently execute both tasks.

The producer task would not complete until the generator coroutine has run to completion.
The consumer task is free to destroy the async_generator and complete early without consuming the entire sequence. This would send a cancel request to the producer task so it can terminate quickly.

There are potential issues with composability of such an async_generator that I haven't yet worked through.

Add async network/socket capability

Extend I/O support to include support for sockets (at least tcp/ip and udp/ip protocols) using winsock and I/O completion ports on top of cppcoro::io_service.

Needs async methods for: accept, connect, disconnect, send/sendto, recv/recvfrom
Ideally also support gather-send and scatter-recv operations.

This will also need some abstraction for dealing with IP address (IPv4 + IPv6).
Need to look into what the networking TS provides towards this.

Add async_semaphore

Add async_manual_reset_event

Add a more general version of single_consumer_event that supports multiple concurrent awaiters.

Add 'transform' operator that can be applied to various monad types

Support applying a function that transforms value of type A to value of type B to:

task<A> to produce task<B> (etc. for other task types)
generator<A> to produce generator<B> (etc. for other generator types)

Example syntax:

B a_to_b(const A& a);
cppcoro::task<A> make_an_a();

cppcoro::task<B> b = make_an_a() | cppcoro::transform(a_to_b);

For task<T> this is kind of similar to the proposed std::future<T>::then() method.
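To make the pipe syntax above concrete, here is a minimal sketch of how a transform adaptor and operator| could fit together. It deliberately avoids coroutines: sketch::lazy<T> is a hypothetical stand-in for a lazily started task, and sketch::transform and transform_adaptor are illustrative names, not cppcoro's implementation.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical stand-in for a lazy task: nothing runs until get() is called.
template <typename T>
class lazy {
public:
  explicit lazy(std::function<T()> f) : f_(std::move(f)) {}
  T get() const { return f_(); }

private:
  std::function<T()> f_;
};

// Carries the mapping function until it meets a source on the left of '|'.
template <typename F>
struct transform_adaptor {
  F func;
};

template <typename F>
transform_adaptor<std::decay_t<F>> transform(F&& f) {
  return {std::forward<F>(f)};
}

// lazy<A> | transform(a_to_b) yields lazy<B>; the function is applied
// only when the resulting lazy<B> is evaluated.
template <typename A, typename F>
auto operator|(lazy<A> source, transform_adaptor<F> t) {
  using B = std::invoke_result_t<const F&, A>;
  return lazy<B>(
      [source = std::move(source), func = std::move(t.func)]() -> B {
        return func(source.get());
      });
}

}  // namespace sketch
```

A real version for task<A> would presumably be a coroutine that does co_return func(co_await source), but the adaptor-plus-operator| shape would be the same.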

Remove use of std::atomic in task and async_generator once symmetric transfer is available

The current implementations of task and async_generator make use of atomics to arbitrate a potential race between the awaiting consumer suspending while waiting for a value to be produced and the producer completing the next value.

An alternative approach which doesn't require use of atomic operations is to suspend the awaiting coroutine first, attach its coroutine_handle as the continuation of the task/async_generator and then resume the task/async_generator. When the task/async_generator produces a value, it calls .resume() on the coroutine handle. It can do this without needing any conditionals or atomic operations since the continuation is attached before the coroutine starts executing.

The problem with this alternative approach is that it can lead to stack-overflow if the coroutine produces its value synchronously and the awaiting coroutine awaits many tasks in a row that all complete synchronously. This is because the awaiting coroutine resumes the producer coroutine inside an await_suspend() method by calling coroutine_handle::resume(). If the producer coroutine produces a value synchronously then inside the call to coroutine_handle::resume() it will call await_suspend() at either a co_yield or final-suspend point and then resume the continuation by calling coroutine_handle::resume() on the awaiting coroutine.

For example, if you have a coroutine foo that continually awaits task<> bar() which completes synchronously then you can end up with a stack-trace that looks like:

foo [resume]
task::promise_type::final_awaiter::await_suspend()
bar [resume]
task::await_suspend()
foo [resume]
task::promise_type::final_awaiter::await_suspend()
bar [resume]
task::await_suspend()
foo [resume]
etc...

Under Clang optimised builds these calls can be performed as tail-calls which avoids the stack-overflow issue (they are all void-returning functions).
However this isn't guaranteed - Clang debug and both MSVC optimised and debug builds are not able to perform tail-calls here.

To work around this, the current implementation first resumes the task's coroutine and waits until it suspends. It then checks whether the task completed synchronously and, if so, returns false from await_suspend() to continue execution of the awaiting coroutine without incurring an extra stack-frame. However, this means we then need to use atomics to decide the race between the producer running to completion on another thread and the consumer suspending on the current thread.

@GorNishanov has recently added an extension to Clang that allows returning a coroutine_handle from await_suspend(). The returned coroutine is resumed immediately, but this time with a guarantee that the compiler will perform the resumption as a tail-call. This is called "symmetric transfer" and allows suspending one coroutine and resuming another without consuming any additional stack-space.

Once this capability is fully implemented in clang and is also available in MSVC we can get rid of the use of atomics in task and async_generator and make use of symmetric transfer to avoid stack-overflow instead. This should improve performance of these classes.

Add support for compiling with Visual Studio 2017

The Cake build system currently relies on the registry to determine whether a particular Visual Studio version is installed and if so where it is installed.

Once lewissbaker/cake#8 has been addressed in Cake then we need to update the config.cake for cppcoro to detect whether VS 2017 is available and if so then configure a new MsvcCompiler tool that makes use of it.

Add 'bind' operator for task types

Basic usage:

cppcoro::task<A> makeA();
cppcoro::task<B> foo(A a);

cppcoro::task<B> b = makeA() | cppcoro::bind(foo);
cppcoro::task<B> b2 = cppcoro::bind(foo, makeA());

Variadic usage:

cppcoro::task<A> makeA();
cppcoro::task<B> makeB();
cppcoro::task<C> func(A a, B b);

cppcoro::task<C> c = cppcoro::bind(func, makeA(), makeB());

Or alternatively, should this be done with an apply operator composed with bind and when_all?

cppcoro::task<C> c =
  cppcoro::when_all(makeA(), makeB())
  | cppcoro::bind(cppcoro::apply(func));

// Equivalent to
cppcoro::task<C> c = [](cppcoro::task<A> a, cppcoro::task<B> b) -> cppcoro::task<C>
{
  co_return co_await std::apply(func, co_await cppcoro::when_all(std::move(a), std::move(b)));
}(makeA(), makeB());

Make sync_wait() support arbitrary awaitables not just task/shared_task

Add support for building with clang

@GorNishanov has been working on adding C++ coroutine support to clang and llvm.

The build system needs to be updated to add support for building this library with clang so that it can be tested on compilers other than MSVC.

As far as I can tell, not all of the changes have been upstreamed to clang yet, so I'll need to use a custom build of clang in the meantime.

Make lazy_task safe to use in a loop when the task completes synchronously

The lazy_task implementation currently unconditionally suspends the awaiter before starting execution of the task and then unconditionally resumes the awaiter when it reaches the final_suspend point.

This approach means that we don't need any synchronisation (ie. std::atomic usages) to coordinate awaiter and task.

However, it has the downside that if the lazy_task completes synchronously then the awaiter is recursively resumed. This can potentially consume a little bit of stack space every time a coroutine awaits a lazy_task that completes synchronously if the compiler is not able to perform tail-call optimisation on the calls to void await_suspend() and void coroutine_handle<>::resume() (note that MSVC is not currently able to perform this tail-call optimisation and Clang only does this under optimised builds).

If the coroutine is awaiting lazy_task values in a loop and can possibly have a large number of these tasks complete synchronously then this could lead to stack-overflow.

eg.

lazy_task<int> async_one() { co_return 1; }

lazy_task<int> sum()
{
  int result = 0;
  for (int i = 0; i < 1'000'000; ++i)
  {
    result += co_await async_one();
  }
  co_return result;
}

The lazy_task implementation needs to be modified to not recursively resume the awaiter in the case that it completes synchronously. Unfortunately, this means it's going to need std::atomic to decide the race between the awaiter suspending and the task completing.

Split documentation into separate files/sections

The documentation is currently all sitting in the README which is getting pretty long now.

The README should be kept pretty high-level and contain a table-of-contents with some motivating examples, status and build instructions.

API docs for individual classes/abstractions should be moved out to separate files under a top-level docs/ folder.

Add concurrency abstraction for server connection-handling workloads

The when_all concurrency primitive currently requires that all of the tasks are created up front whereas with server workloads we'll typically be dynamically starting new tasks to handle the connections as clients connect.

Consider adding some kind of task_pool or possibly when_all_ready overload that takes async_generator<task<>> and returns a task<> that can start new tasks as they are created but still ensure there is no potential for dangling tasks.

Decide on approach for handling memory allocation failure of coroutine frame

The default approach is for the call to the coroutine function itself to throw an exception (typically std::bad_alloc) if allocation of the coroutine frame fails.

For coroutine types that return a task, should we instead implement the promise_type::get_return_object_on_allocation_failure method to return a special task value that defers throwing the std::bad_alloc exception until the returned task is awaited?

This would make it possible to declare coroutine functions as noexcept (provided captured parameter copies all have noexcept copy/move constructors).

Nested coroutines possible??

Hi there. Whilst trying to get my head around how coroutines work I had a naive program that ended up being completely broken. I was wondering if cppcoro has the ability, or could be made to have the ability, to build nested coroutines? Here's a toy, pseudo code example to highlight the concept I was trying to achieve.

SleepAwaitable sleep_some()
{
    return SleepAwaitable{1s};    // await_suspends launches thread and sleeps, then coro.resume()
}

SleepAwaitable sleep_launcher()
{
    co_await sleep_some();
}
DoAwaitable do_thing()
{
    co_await sleep_launcher();
    co_await sleep_launcher();   // never executed
}

int main()
{
    do_thing();
}

It's obvious now, but the SleepAwaitable{} thread resumes after co_await sleep_some then returns back to the thread. The do_thing progress has completely gone.

My main reason for asking is that I intend to have a coroutine of data transfer, but at some point I might need to pass control to another thread pool, have some work done (potentially with its own co_awaits on other async operations) and then return to the original caller.

Thanks

Add support for Linux

While most of cppcoro is platform agnostic, some of the thread-pooling and I/O code is OS-specific.

This issue is for providing an implementation of the cppcoro I/O abstractions for Linux.

Will need to get it building under Clang as a first step. See #3.
Then:

Look into writing an io_service implementation based on an epoll I/O event loop.
Port file and socket (once written) interfaces to Linux.
Add build-system support for Linux

Split async I/O functionality into a separate 'cppcoro_io' library

As discussed in #46 (comment) the async I/O facilities provided by cppcoro should be split out into a separate library to allow applications to make use of the generic, core components of cppcoro (task, async_generator, when_all etc.) without pulling in the platform-specific async I/O subsystems.

This will allow applications to more easily use other I/O frameworks like libuv, boost::asio or the Networking TS in conjunction with cppcoro.

Add an async disruptor/ring-buffer queue abstraction for buffered communication between producer/consumer coroutines

We have async_generator<T> that can be used for producer/consumer coroutines to communicate, however it has no buffering capability which means that the elements can only be generated and processed one at a time.

We should look at something similar to https://github.com/lewissbaker/disruptorplus but that uses co_await and coroutine suspension as a wait-strategy for handling the full/empty buffer cases.

Add support for a single-threaded producer as well as a multi-threaded producer.
Add a sequence_barrier abstraction.
Needs integration with schedulers to allow producer to resume consumer asynchronously.

I have some code kicking around in a side-project that I can port to cppcoro.
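For reference, the buffering idea can be sketched without coroutines as a single-producer/single-consumer ring indexed by monotonically increasing sequence numbers. This is only an illustration under assumed names (spsc_ring, try_push, try_pop); it is neither the disruptorplus design nor the side-project code mentioned above. The points where try_push and try_pop fail are where a coroutine-based version would suspend instead.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical single-producer/single-consumer ring buffer.
template <typename T, std::size_t N>
class spsc_ring {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
  // Called only by the producer. Returns false when the buffer is full;
  // an async version would suspend the producer coroutine here.
  bool try_push(const T& value) {
    const std::size_t w = write_.load(std::memory_order_relaxed);
    if (w - read_.load(std::memory_order_acquire) == N) return false;
    slots_[w & (N - 1)] = value;
    write_.store(w + 1, std::memory_order_release);  // publish the slot
    return true;
  }

  // Called only by the consumer. Returns nullopt when the buffer is empty;
  // an async version would suspend the consumer coroutine here.
  std::optional<T> try_pop() {
    const std::size_t r = read_.load(std::memory_order_relaxed);
    if (r == write_.load(std::memory_order_acquire)) return std::nullopt;
    T value = slots_[r & (N - 1)];
    read_.store(r + 1, std::memory_order_release);  // free the slot
    return value;
  }

private:
  std::array<T, N> slots_{};
  std::atomic<std::size_t> write_{0};  // next sequence to be written
  std::atomic<std::size_t> read_{0};   // next sequence to be read
};
```

A multi-producer variant needs an extra claim step on the write sequence, which is roughly the role a sequence_barrier abstraction would play.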

Add async_auto_reset_event

Add an implementation of async_auto_reset_event that allows multiple concurrent awaiters.

It differs from async_manual_reset_event in that a call to set() releases at most one pending waiter rather than releasing all pending waiters.
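The difference in set() semantics can be modelled with plain callbacks standing in for suspended coroutines. The two classes below are hypothetical, single-threaded toys (no atomics, no coroutine handles) meant only to show which waiters each kind of event releases; they are not a proposed implementation.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Toy model: set() releases one waiter, or stays set if none is pending.
class toy_auto_reset_event {
public:
  void wait(std::function<void()> waiter) {
    if (set_) {
      set_ = false;  // consuming the signal resets the event
      waiter();
    } else {
      waiters_.push_back(std::move(waiter));
    }
  }

  void set() {
    if (waiters_.empty()) {
      set_ = true;
      return;
    }
    auto next = std::move(waiters_.front());
    waiters_.pop_front();
    next();  // exactly one pending waiter is released
  }

private:
  bool set_ = false;
  std::deque<std::function<void()>> waiters_;
};

// Toy model: set() releases every waiter and stays set until reset().
class toy_manual_reset_event {
public:
  void wait(std::function<void()> waiter) {
    if (set_) {
      waiter();
    } else {
      waiters_.push_back(std::move(waiter));
    }
  }

  void set() {
    set_ = true;
    auto ready = std::move(waiters_);
    waiters_.clear();
    for (auto& waiter : ready) waiter();
  }

  void reset() { set_ = false; }

private:
  bool set_ = false;
  std::deque<std::function<void()>> waiters_;
};
```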

Use custom allocator for when_all_ready_task objects in when_all_ready(std::vector<AWAITABLE>)

With the when_all_ready(std::vector<AWAITABLE>) overloads we need to allocate N coroutine frames, one for each awaitable in the list. As the number of coroutine frames that needs to be allocated is not known at compile time, the compiler will be unable to elide the allocation of the coroutine frames.

Rather than perform N separate memory allocations, it might be useful to use a custom allocator that allocates space for all N coroutine frames up front so that we only perform a single heap allocation.

We can't know the size of the coroutine frame at compile time so we'll need to initialise the allocator with the number of tasks that will need to be allocated. Then when the first task is created the allocator will be passed the coroutine frame size. We can then allocate a pool of size N * coroutine frame size and allocate subsequent coroutine frames from that pool.
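The allocation scheme described above could look roughly like the following sketch. frame_pool is a hypothetical name, and the code assumes that all N frames have the same size and need no more than the default operator new alignment. How the pool would be wired into the promise type's operator new is left out.

```cpp
#include <cstddef>
#include <new>

// Hypothetical pool: constructed with the number of frames, sized lazily
// when the first frame reports how many bytes it needs.
class frame_pool {
public:
  explicit frame_pool(std::size_t count) noexcept : count_(count) {}
  frame_pool(const frame_pool&) = delete;
  frame_pool& operator=(const frame_pool&) = delete;

  // Frames are released all together when the pool is destroyed.
  ~frame_pool() { ::operator delete(buffer_); }

  void* allocate(std::size_t frame_size) {
    if (buffer_ == nullptr) {
      // First request: round the frame size up to the maximum fundamental
      // alignment and perform the single heap allocation for all frames.
      constexpr std::size_t align = alignof(std::max_align_t);
      stride_ = (frame_size + align - 1) / align * align;
      buffer_ = static_cast<unsigned char*>(::operator new(stride_ * count_));
    }
    // Assumption: later frames are no larger than the first one.
    if (frame_size > stride_ || used_ == count_) throw std::bad_alloc{};
    return buffer_ + stride_ * used_++;
  }

private:
  std::size_t count_;
  std::size_t stride_ = 0;
  std::size_t used_ = 0;
  unsigned char* buffer_ = nullptr;
};
```

One consequence of this shape is that the pool has to outlive every coroutine frame allocated from it, so it would need to be owned by whatever object when_all_ready returns.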
