cgarciae / pypeln

Concurrent data pipelines in Python >>>

Home Page: https://cgarciae.github.io/pypeln

License: MIT License

Languages: Python 99.38%, Shell 0.51%, Dockerfile 0.11%

pypeln's Introduction

Pypeln

Pypeln (pronounced as "pypeline") is a simple yet powerful Python library for creating concurrent data pipelines.

Main Features

  • Simple: Pypeln was designed to solve medium data tasks that require parallelism and concurrency where using frameworks like Spark or Dask feels exaggerated or unnatural.
  • Easy-to-use: Pypeln exposes a familiar functional API compatible with regular Python code.
  • Flexible: Pypeln enables you to build pipelines using Processes, Threads and asyncio.Tasks via the exact same API.
  • Fine-grained Control: Pypeln allows you to have control over the memory and cpu resources used at each stage of your pipelines.

For more information take a look at the Documentation.

[diagram]

Installation

Install Pypeln using pip:

pip install pypeln

Basic Usage

With Pypeln you can easily create multi-stage data pipelines using three types of workers:

Processes

You can create a pipeline based on multiprocessing.Process workers by using the process module:

import pypeln as pl
import time
from random import random

def slow_add1(x):
    time.sleep(random()) # <= some slow computation
    return x + 1

def slow_gt3(x):
    time.sleep(random()) # <= some slow computation
    return x > 3

data = range(10) # [0, 1, 2, ..., 9] 

stage = pl.process.map(slow_add1, data, workers=3, maxsize=4)
stage = pl.process.filter(slow_gt3, stage, workers=2)

data = list(stage) # e.g. [5, 6, 9, 4, 8, 10, 7]

At each stage you can specify the number of workers. The maxsize parameter limits the maximum number of elements that the stage can hold simultaneously.
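
For instance, a small maxsize acts as backpressure: the source iterable is only consumed as fast as downstream workers drain the stage's queue. A minimal sketch of the idea (the print statements and sleep durations are only illustrative):

import time
import pypeln as pl

def source():
    for i in range(10):
        print(f"producing {i}")  # appears gradually, not all at once
        yield i

def slow_consumer(x):
    time.sleep(0.5)  # the downstream worker is slow
    return x

# With maxsize=2 at most 2 elements sit in this stage's queue,
# so the source is pulled lazily as the worker drains it.
stage = pl.thread.map(slow_consumer, source(), workers=1, maxsize=2)
print(list(stage))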

Threads

You can create a pipeline based on threading.Thread workers by using the thread module:

import pypeln as pl
import time
from random import random

def slow_add1(x):
    time.sleep(random()) # <= some slow computation
    return x + 1

def slow_gt3(x):
    time.sleep(random()) # <= some slow computation
    return x > 3

data = range(10) # [0, 1, 2, ..., 9] 

stage = pl.thread.map(slow_add1, data, workers=3, maxsize=4)
stage = pl.thread.filter(slow_gt3, stage, workers=2)

data = list(stage) # e.g. [5, 6, 9, 4, 8, 10, 7]

Here we have the exact same situation as in the previous case, except that the workers are Threads.

Tasks

You can create a pipeline based on asyncio.Task workers by using the task module:

import pypeln as pl
import asyncio
from random import random

async def slow_add1(x):
    await asyncio.sleep(random()) # <= some slow computation
    return x + 1

async def slow_gt3(x):
    await asyncio.sleep(random()) # <= some slow computation
    return x > 3

data = range(10) # [0, 1, 2, ..., 9] 

stage = pl.task.map(slow_add1, data, workers=3, maxsize=4)
stage = pl.task.filter(slow_gt3, stage, workers=2)

data = list(stage) # e.g. [5, 6, 9, 4, 8, 10, 7]

Conceptually similar, but here everything runs in a single thread and Task workers are created dynamically. If the code is already running inside an async coroutine, you can use await on the stage instead to avoid blocking:

import pypeln as pl
import asyncio
from random import random

async def slow_add1(x):
    await asyncio.sleep(random()) # <= some slow computation
    return x + 1

async def slow_gt3(x):
    await asyncio.sleep(random()) # <= some slow computation
    return x > 3


async def main():
    data = range(10) # [0, 1, 2, ..., 9] 

    stage = pl.task.map(slow_add1, data, workers=3, maxsize=4)
    stage = pl.task.filter(slow_gt3, stage, workers=2)

    data = await stage # e.g. [5, 6, 9, 4, 8, 10, 7]

asyncio.run(main())

Sync

The sync module implements all operations using synchronous generators. This module is useful for debugging or when you don't need to perform heavy CPU or IO tasks but still want to retain element order information that certain functions like pl.*.ordered rely on.

import pypeln as pl

def slow_add1(x):
    return x + 1

def slow_gt3(x):
    return x > 3

data = range(10) # [0, 1, 2, ..., 9] 

stage = pl.sync.map(slow_add1, data, workers=3, maxsize=4)
stage = pl.sync.filter(slow_gt3, stage, workers=2)

data = list(stage) # [4, 5, 6, 7, 8, 9, 10]

Common arguments such as workers and maxsize are accepted by this module's functions for API compatibility purposes but are ignored.

Mixed Pipelines

You can create pipelines that mix different worker types, choosing the type best suited to each task, to get the maximum performance out of your code:

data = get_iterable()
data = pl.task.map(f1, data, workers=100)
data = pl.thread.flat_map(f2, data, workers=10)
data = filter(f3, data)
data = pl.process.map(f4, data, workers=5, maxsize=200)

Notice that here we even used a regular Python filter. Since stages are iterables, Pypeln integrates smoothly with any Python code; just be aware of how each stage behaves.

Pipe Operator

In the spirit of being a true pipeline library, Pypeln also lets you create your pipelines using the pipe | operator:

data = (
    range(10)
    | pl.process.map(slow_add1, workers=3, maxsize=4)
    | pl.process.filter(slow_gt3, workers=2)
    | list
)

Run Tests

A sample script is provided to run the tests in a container (either Docker or Podman is supported). To run the tests:

$ bash scripts/run-tests.sh

This script can also receive a Python version to run the tests against, e.g.

$ bash scripts/run-tests.sh 3.7

Related Stuff

Contributors

License

MIT

pypeln's People

Contributors

0xflotus, bryant1410, cgarciae, charlielito, chscheller, davidnet, isaacjoy, lalo, promyloph, quarckster, simonbiggs


pypeln's Issues

freeze_support() RuntimeError on Mac OS

Hi,

I'm trying to simply import pypeln

import pypeln as pl

on Mac OS Catalina 10.15.5, Python 3.8, pypeln 0.4.3 and I get the following error:

Traceback (most recent call last):
  File "/Users/.../python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/.../python3.8/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/.../python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/Users/.../python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/.../python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/.../python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/.../main.py", line 1, in <module>
    import pypeln as pl
  File "/Users/pisonir/anaconda3/envs/.../python3.8/site-packages/pypeln/__init__.py", line 4, in <module>
    from . import thread
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/site-packages/pypeln/thread/__init__.py", line 34, in <module>
    from .api.concat import concat
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/site-packages/pypeln/thread/api/concat.py", line 8, in <module>
    from .to_stage import to_stage
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/site-packages/pypeln/thread/api/to_stage.py", line 5, in <module>
    from ..stage import Stage
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/site-packages/pypeln/thread/stage.py", line 7, in <module>
    from . import utils
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/site-packages/pypeln/thread/utils.py", line 11, in <module>
    MANAGER = multiprocessing.Manager()
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/multiprocessing/managers.py", line 579, in start
    self._process.start()
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/Users/pisonir/anaconda3/envs/.../lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.
        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:
            if __name__ == '__main__':
                freeze_support()
                ...
        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Any help or suggestion?

I saw a similar previous issue, but mine occurs on Mac OS.

Thank you!
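
For readers landing here: on platforms that use the spawn start method (Windows, and macOS since Python 3.8), the standard idiom is to keep all executable code under a main guard, roughly as sketched below. Note that if the installed pypeln version creates multiprocessing state at import time, as the traceback above suggests, the guard alone may not be enough and upgrading pypeln may be required; this is only the usual first step:

import time
from random import random

import pypeln as pl

def slow_add1(x):
    time.sleep(random())
    return x + 1

def main():
    stage = pl.process.map(slow_add1, range(10), workers=3)
    print(list(stage))

if __name__ == "__main__":
    # On spawn-based platforms child processes re-import this module,
    # so pipeline code must not run at import time.
    main()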

Add demo gif to README

Disclaimer: This is a bot

It looks like your repo is trending. The github_trending_videos Instagram account automatically shows demo gifs of trending repos on GitHub.

Your README doesn't seem to have any demo gifs. Add one and the next time the parser runs it will pick it up and post it on its Instagram feed. If you don't want to, just close this issue and we won't bother you again.

[Bug] pytest-cov in dependencies

Describe the bug
pytest-cov should be specified in tool.poetry.dev-dependencies of pyproject.toml; I'm pretty sure that pytest-cov is not needed for running pypeln.

Serialization of unpickable object on multiprocessing (e.g. cloudpickle)

One inherent hassle of python multiprocessing is pickling. Currently, pypeln does not consider this case, hence an error:

In [2]: list(([lambda i: i] * 2) | pl.process.map(lambda x:x, workers=2))   
Traceback (most recent call last):
  File "..../lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "..../lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7f255a027f80>: attribute lookup <lambda> on __main__ failed
... (omitted) ...

Out [2]: []

(1) Why does it return a valid list? It should throw an Error rather than returning a "wrong" output.

(2) Can you add cloudpickle support to handle non-pickleable objects?

For example, joblib supports cloudpickle: https://joblib.readthedocs.io/en/latest/auto_examples/serialization_and_wrappers.html
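
Until such support lands, the usual workaround is to pass module-level functions instead of lambdas, since the standard pickler serializes functions by qualified name rather than by value. A minimal sketch:

import pypeln as pl

def identity(x):
    # A module-level function is picklable by reference,
    # unlike a lambda defined in __main__ or inside another function.
    return x

if __name__ == "__main__":
    stage = pl.process.map(identity, range(4), workers=2)
    print(list(stage))  # e.g. [0, 2, 1, 3]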

BrokenPipeError [Errno 32] when using process

First of all, love pypeln and thank you for your work.

Submitting this issue because even the most basic scripts using process, like your Process example, raise a BrokenPipeError. I've tried pypeln versions 0.3.3 down to 0.2.0 in a clean venv with only pypeln and its requirements installed.

[Errno 32] Broken pipe
Process Process-3:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/MYUSERNAME/.virtualenvs/pypeln-testl/lib/python3.7/site-packages/pypeln/process/stage.py", line 109, in run
    worker_namespace.done = True
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 1127, in __setattr__
    return callmethod('__setattr__', (key, value))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 818, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

Please let me know if I can provide any further details. Unfortunately, I am not skilled enough to assist in the fix, hence why I lean on pypeln for multiprocessing and queuing :)

Showing progress with tqdm not working for pl.*.each

Hi,

I would like to use pl.thread.each as my stage is returning nothing but I would like to see some progress:

import pypeln as pl
from tqdm import tqdm

total = 3000


def f(x):
    i = 0
    for _ in range(5000):
        i += 1


stage = range(total)
stage = pl.thread.each(f, stage, workers=8, maxsize=2000)
stage = tqdm(stage, total=total, desc="pipeline")

pl.process.run(stage)

but progress is not working, showing:
pipeline: 0%| | 0/3000 [00:00<?, ?it/s]

With pl.thread.map it is working fine.
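
A likely explanation is that each is meant for side effects and emits nothing downstream, so the tqdm wrapper never receives elements to count. A workaround sketch, assuming the goal is only to see progress, is to use map with a pass-through return value and consume the stage yourself:

import pypeln as pl
from tqdm import tqdm

total = 3000

def f(x):
    i = 0
    for _ in range(5000):
        i += 1
    return x  # pass the element through so it can be counted downstream

stage = pl.thread.map(f, range(total), workers=8, maxsize=2000)

for _ in tqdm(stage, total=total, desc="pipeline"):
    pass  # consuming the iterable drives both the pipeline and the bar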

yield ordered items that finish in order immediately

For reference the description of ordered from the docs:
Creates a stage that sorts its elements based on their order of creation on the source iterable(s) of the pipeline. This stage will not yield until it accumulates all of the elements from the previous stage, use this only if all elements fit in memory.
It would be nice if ordered would yield items that finish in order immediately.

on_done is not called with on_start args

Hello Cristian,

In your last release you changed the way the callback functions work. The return values of on_start are no longer passed to on_done as input arguments. I hope you didn't do this on purpose; it makes it hard to close open connections when a worker has finished.

Your old code:

args = params.on_start(worker_info)
params.on_done(stage_status, *args)

Your new code:

f_kwargs = self.on_start(**on_start_kwargs)
on_done_kwargs = {}
done_resp = self.on_done(**on_done_kwargs)

Task timeout

Hi there,

Great project, thanks for your work!

Do you have any way to force the timeout on long running tasks?

pr.map(fn, stage, timeout=3)  # fn would time out after 3 seconds and skip the computation
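
If the installed version exposes no such parameter, one portable workaround for the task module is to enforce the timeout inside the coroutine itself with asyncio.wait_for. A sketch (fn and the 3-second limit are placeholders):

import asyncio
import pypeln as pl

async def fn(x):
    await asyncio.sleep(x)  # stands in for the real long-running work
    return x

async def fn_with_timeout(x):
    try:
        # Abandon any element that takes longer than 3 seconds.
        return await asyncio.wait_for(fn(x), timeout=3)
    except asyncio.TimeoutError:
        return None  # or log and skip

stage = pl.task.map(fn_with_timeout, [1, 2, 5], workers=3)
print(list(stage))  # the 5-second element comes back as None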

could this lib support python 3.6?

install by poetry:

[SolverProblemError]
The current project must support the following Python versions: ^3.6

Because pypeln (0.3.0) requires Python >=3.7,<4.0

asyncio_task example fails on Jupyter Notebook

Maybe pypeln interferes with Jupyter's own event loop, or maybe I did something wrong. Do you have any idea?

RuntimeError: Task <Task pending coro=<_run_task() running at /opt/conda/lib/python3.7/site-packages/pypeln/asyncio_task.py:203> cb=[gather.<locals>._done_callback() at /opt/conda/lib/python3.7/asyncio/tasks.py:691]> got Future <Future pending> attached to a different loop

[Question] Could you help me with using tqdm

When I try to use tqdm with pypeln it's not working for me. Could you please help?

from aiohttp import ClientSession, TCPConnector
import asyncio
import pypeln as pl
from tqdm.asyncio import trange, tqdm

limit = 1
users = list(range(1,10))

async def fetch(users, session):
    None

async def main():
    async for i in trange(len(users)):
        pl.task.each(
            fetch,
            users,
            workers=limit,
            on_start=lambda: dict(session=ClientSession(connector=TCPConnector(limit=None, ssl=False))),
            on_done=lambda session: session.close(),
            run=True,
        )

asyncio.run(main())

When I run the code the progress bar does not progress. Could you tell me why this happens?

[Bug] How to handle exceptions raised in parallelized function

Describe the bug
I would like to know how I can handle any exception that would occur in the function that I'm trying to parallelize

Minimal code to reproduce

#!/usr/bin/env python3

import pypeln as pl

def compute(x):
    if x == 3:
        raise ValueError("Value 3 is not supported")
    else:
        return x*x

data = [1, 2, 3, 4, 5]
stage = pl.process.map(compute, data, workers=4)

for x in stage:
    print(f"Result: {x}")

Results

Result: 1
Result: 4
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    for x in stage:
  File "/home/wenzel/local/test_python/pypeln/venv/lib/python3.8/site-packages/pypeln/process/stage.py", line 83, in to_iterable
    for elem in main_queue:
  File "/home/wenzel/local/test_python/pypeln/venv/lib/python3.8/site-packages/pypeln/process/queue.py", line 48, in __iter__
    raise exception
ValueError: 

('Value 3 is not supported',)

Traceback (most recent call last):
  File "/home/wenzel/local/test_python/pypeln/venv/lib/python3.8/site-packages/pypeln/process/worker.py", line 99, in __call__
    self.process_fn(
  File "/home/wenzel/local/test_python/pypeln/venv/lib/python3.8/site-packages/pypeln/process/worker.py", line 186, in __call__
    self.apply(worker, elem, **kwargs)
  File "/home/wenzel/local/test_python/pypeln/venv/lib/python3.8/site-packages/pypeln/process/api/map.py", line 27, in apply
    y = self.f(elem.value, **kwargs)
  File "test.py", line 7, in compute
    raise ValueError("Value 3 is not supported")
ValueError: Value 3 is not supported

Expected behavior
I have no specific expected behavior; instead, I was looking for a way to use the API and get some error recovery. In this situation the whole pipeline is broken, and I'm not sure how to recover.

I'm trying to see if I can switch to your library, coming from concurrent.futures.

This is the operation i would like to do (demo with concurrent.futures):

class Downloader(AbstractContextManager):

    def __init__(self):
        # let Python decide how many workers to use
        # usually the best decision for IO tasks
        self._logger = logging.getLogger(f"{self.__class__.__module__}.{self.__class__.__name__}")
        self._dl_pool = ThreadPoolExecutor()
        self._future_to_obj: Dict[Future, FutureData] = {}
        self.stats = Counter()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._dl_pool.shutdown()

    def submit(self, url: str, callback: Callable):
        user_data = (url, )
        future_data = FutureData(user_data, callback)
        future = self._dl_pool.submit(self._download_url, *user_data)
        self._future_to_obj[future] = future_data
        future.add_done_callback(self._on_download_done)
        self.stats["submitted"] += 1

    def _download_url(self, url: str) -> str:
         # this function might raise multiple network errors
         # .....
         return r.read()

    def _on_download_done(self, future: Future):
        try:
            future_data: FutureData = self._future_to_obj[future]
        except KeyError:
            self._logger.debug("Failed to find obj in callback for %s", future)
            self.stats["future_fail"] += 1
            return
        else:
            # call the user callback
            url, *rest = future_data.user_data
            try:
                data = future.result()
            except Exception:   # Here we have error recovery
                self._logger.debug("Error while fetching resource: %s", url)
                self.stats["fetch_error"] += 1
            else:
                future_data.user_callback(*future_data.user_data, data)
            finally:
                self.stats["total"] += 1

⬆️ TL;DR: I'm using add_done_callback in order to chain my futures into the next function and create a pipeline. But as I'm dealing with Future objects, their exception is only raised when you try to access their result() (which is not the case with pypeln).
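
One way to approximate that recovery pattern with pypeln is to catch exceptions inside the mapped function and emit a status-tagged value, so the pipeline itself never breaks. A hedged sketch reusing the compute example above (this is not an official pypeln error-handling API):

import pypeln as pl

def compute(x):
    if x == 3:
        raise ValueError("Value 3 is not supported")
    return x * x

def safe_compute(x):
    # Mimic checking Future.result(): carry the exception as data
    # instead of letting it tear down the whole pipeline.
    try:
        return ("ok", compute(x))
    except Exception as e:
        return ("error", e)

if __name__ == "__main__":
    for status, value in pl.process.map(safe_compute, [1, 2, 3, 4, 5], workers=4):
        if status == "ok":
            print(f"Result: {value}")
        else:
            print(f"Recovered from: {value!r}")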

Library Info
0.4.6


Thanks for your library, it looks amazing !

Usage as traditional pipe

Hi there 👍

I come from F#. Can I use the pipeline from your library as a replacement for it?
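
For context, the pipe operator shown in the README composes stages much like F#'s |>. A small sketch:

import pypeln as pl

result = (
    range(10)
    | pl.thread.map(lambda x: x * 2)     # roughly `|> Seq.map ...` in F#
    | pl.thread.filter(lambda x: x > 10)
    | list
)
print(result)  # e.g. [12, 14, 16, 18]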

Not working with python 3.9

I tried the Tasks example code from the pypeln README but it fails:

Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 790, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/Users/sebastian/test/venv/lib/python3.9/site-packages/pypeln/__init__.py", line 4, in <module>
    from . import thread
  File "/Users/sebastian/test/venv/lib/python3.9/site-packages/pypeln/thread/__init__.py", line 34, in <module>
    from .api.concat import concat
  File "/Users/sebastian/test/venv/lib/python3.9/site-packages/pypeln/thread/api/concat.py", line 8, in <module>
    from .to_stage import to_stage
  File "/Users/sebastian/test/venv/lib/python3.9/site-packages/pypeln/thread/api/to_stage.py", line 5, in <module>
    from ..stage import Stage
  File "/Users/sebastian/test/venv/lib/python3.9/site-packages/pypeln/thread/stage.py", line 8, in <module>
    from .queue import IterableQueue, OutputQueues
  File "/Users/sebastian/test/venv/lib/python3.9/site-packages/pypeln/thread/queue.py", line 17, in <module>
    class PipelineException(tp.NamedTuple, BaseException):
  File "/usr/local/Cellar/[email protected]/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/typing.py", line 1820, in _namedtuple_mro_entries
    raise TypeError("Multiple inheritance with NamedTuple is not supported")
TypeError: Multiple inheritance with NamedTuple is not supported
python-BaseException

If I'm correct this has to do with python/cpython#19363

Version 0.4.0+ doesn't work on Python 3.6

You have added "from typing import Protocol" in the latest versions, and Protocol is not available in Python before version 3.8.
This is breaking some of our code that runs on Python 3.6, which until now assumed that as long as the version is 0.* everything would be fine.

Feature Request: Performance Tuning Output

I've been wondering if you had a way to help evaluate which stages were the bottlenecks. Something like if the queue fills up in a stage, report that in some performance tuning mode. Perhaps more workers need to be allocated to that stage (or fewer if there is too much context switching), more CPU should be allocated to the container, more RAM, etc. Currently it requires some manual intervention to attempt to ascertain where the bottlenecks are.
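
In the meantime, a crude way to locate a bottleneck is to splice identity "probe" stages between the real stages and watch their counters diverge, the same trick as the tqdm progress bars used in the maxsize reports below. A sketch:

import time
import pypeln as pl
from tqdm import tqdm

N = 300
bar_a = tqdm(total=N, desc="after stage A")
bar_b = tqdm(total=N, desc="after stage B")

def stage_a(x):
    time.sleep(0.01)
    return x

def stage_b(x):
    time.sleep(0.1)  # the bottleneck
    return x

def probe(bar):
    def update(x):
        bar.update()  # counts elements flowing past this point
        return x
    return update

data = range(N)
data = pl.thread.map(stage_a, data, workers=1)
data = pl.thread.map(probe(bar_a), data, workers=1)
data = pl.thread.map(stage_b, data, workers=1)
data = pl.thread.map(probe(bar_b), data, workers=1)
list(data)  # bar_a races ahead of bar_b, pointing at stage_b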

RuntimeError: Timeout context manager should be used inside a task

installed pypeln==0.3.3, running on Python 3.7, macOS 10.15.5

...
  File "/Users/ggao/miniconda3/lib/python3.7/site-packages/pypeln/task/api.py", line 552, in each
    for _ in stage:
  File "/Users/ggao/miniconda3/lib/python3.7/site-packages/pypeln/task/stage.py", line 234, in to_iterable
    raise error
RuntimeError: 

Original Traceback (most recent call last):
  File "/Users/ggao/miniconda3/lib/python3.7/site-packages/pypeln/task/stage.py", line 80, in run
    **{key: value for key, value in kwargs.items() if key in self.f_args}
  File "/Users/ggao/miniconda3/lib/python3.7/site-packages/pypeln/task/stage.py", line 53, in process
    await tasks.put(task)
  File "/Users/ggao/miniconda3/lib/python3.7/site-packages/pypeln/task/utils.py", line 58, in join
    await asyncio.gather(*self.tasks)
  File "/Users/ggao/miniconda3/lib/python3.7/asyncio/tasks.py", line 442, in wait_for
    return fut.result()
  File "/Users/ggao/miniconda3/lib/python3.7/site-packages/pypeln/task/api.py", line 471, in apply
    y = await y
  File "/Users/ggao/github/intuit/finpal-qb/lambda-notifications/eventbus.py", line 105, in send
    self.URL, data=data, raise_for_status=True
  File "/Users/ggao/miniconda3/lib/python3.7/site-packages/aiohttp/client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "/Users/ggao/miniconda3/lib/python3.7/site-packages/aiohttp/client.py", line 426, in _request
    with timer:
  File "/Users/ggao/miniconda3/lib/python3.7/site-packages/aiohttp/helpers.py", line 579, in __enter__
    raise RuntimeError('Timeout context manager should be used '

I have my code as:

self.session = aiohttp.ClientSession(
    headers=strategy.header, connector=aiohttp.TCPConnector(limit=None)
)

def event_gen(self) -> Iterator[str]:
    for i, row in enumerate(self.flo):
        data = make_event(row, self.timestamp, self.uuid)
        yield data

async def send(self, data: str, session: aiohttp.ClientSession) -> str:
    try:
        async with session.post(
            self.URL, data=data, raise_for_status=True
        ) as response:
            return await response.text()
    except (aiohttp.ClientConnectionError, aiohttp.ServerConnectionError) as e:
        self.logger.error(f"Unable to send {data} due to exception: {e}")
        raise

def run(self):
    data = event_gen()
    pl.task.each(
        send,
        data,
        workers=self.batch_size,
        on_start=lambda: dict(session=self.session),
        on_done=lambda session: session.close(),
        timeout=5,
        run=True,
    )

maxsize not being respected for process.map

Hello.
First of all. Let me just say that you changed my world yesterday when I found pypeln. I've wanted exactly this for a very long time. Thank you for writing it!!

Since I'm a brand new user, I might be misunderstanding, but I think I may have found a bug. I am running the following

  • conda python 3.6.8
  • pypeln==0.4.4
  • Running in Jupyter Lab with the following installed to view progress bars
pip install ipywidgets
jupyter labextension install @jupyter-widgets/jupyterlab-manager

Here is the code I am running

from tqdm.auto import tqdm
import pypeln as pyp
import time

in_list = list(range(300))
bar1 = tqdm(total=len(in_list), desc='stage1')
bar2 = tqdm(total=len(in_list), desc='stage2')
bar3 = tqdm(total=len(in_list), desc='stage3')

def func1(x):
    time.sleep(.01)
    bar1.update()
    return x

def func2(x):
    time.sleep(.2)
    return x
    
def func2_monitor(x):
    bar2.update()
    return x
    
def func3(x):
    time.sleep(.6)
    bar3.update()
    return x

(
    in_list
    | pyp.thread.map(func1, maxsize=1, workers=1)
    | pyp.process.map(func2, maxsize=1, workers=2)
    | pyp.thread.map(func2_monitor, maxsize=1, workers=1)
    | pyp.thread.map(func3, maxsize=1, workers=1)
    | list
    
);

This code runs the stages while showing progress bars of when each node has processed data. Here is what I am seeing.

[screenshot of the three progress bars: stage1 completes far ahead of stage2 and stage3]

It appears that the first stage is consuming the entire source without respecting the maxsize argument. If this is expected behavior, I would like to understand more.

Thank you.

[Bug] maxsize not being respected for thread.map

Describe the bug
maxsize is not being respected, so I encountered memory leaks because I was using it to load some big numpy arrays.

Minimal code to reproduce

import time

import numpy as np
import pypeln as pl
from tqdm import tqdm

N = 100000
bar1 = tqdm(total=N, desc="get arrays")


def get_arr(x):
    time.sleep(0.05)
    bar1.update()
    return np.random.random((1000, 1000, 3))


task = pl.thread.map(get_arr, list(range(N)), maxsize=3)

for image in tqdm(task, total=N, desc="Processing array"):
    # process array
    time.sleep(0.5)

Library Info
pypeln version: 0.4.6

[Bug] TypeError: Multiple inheritance with NamedTuple is not supported

Describe the bug
Exception during import pypeln.
To reproduce, I am creating a new virtual environment and installing pypeln into it using Poetry.

Minimal code to reproduce

% poetry new test_pypeln
Created package test_pypeln in test_pypeln

% cd test_pypeln 

% poetry add pypeln
Creating virtualenv test-pypeln-vKtIMPjL-py3.9 in /home/lks/.cache/pypoetry/virtualenvs
Using version ^0.4.7 for pypeln

Updating dependencies
Resolving dependencies... (0.8s)

Writing lock file

Package operations: 12 installs, 0 updates, 0 removals

  • Installing pyparsing (2.4.7)
  • Installing attrs (20.3.0)
  • Installing more-itertools (8.7.0)
  • Installing packaging (20.9)
  • Installing pluggy (0.13.1)
  • Installing py (1.10.0)
  • Installing wcwidth (0.2.5)
  • Installing coverage (5.5)
  • Installing pytest (5.4.3)
  • Installing pytest-cov (2.11.1)
  • Installing stopit (1.1.2)
  • Installing pypeln (0.4.7)

% poetry run python
Python 3.9.2 (default, Feb 20 2021, 18:40:11) 
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pypeln
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lks/.cache/pypoetry/virtualenvs/test-pypeln-vKtIMPjL-py3.9/lib/python3.9/site-packages/pypeln/__init__.py", line 4, in <module>
    from . import thread
  File "/home/lks/.cache/pypoetry/virtualenvs/test-pypeln-vKtIMPjL-py3.9/lib/python3.9/site-packages/pypeln/thread/__init__.py", line 34, in <module>
    from .api.concat import concat
  File "/home/lks/.cache/pypoetry/virtualenvs/test-pypeln-vKtIMPjL-py3.9/lib/python3.9/site-packages/pypeln/thread/api/concat.py", line 8, in <module>
    from .to_stage import to_stage
  File "/home/lks/.cache/pypoetry/virtualenvs/test-pypeln-vKtIMPjL-py3.9/lib/python3.9/site-packages/pypeln/thread/api/to_stage.py", line 5, in <module>
    from ..stage import Stage
  File "/home/lks/.cache/pypoetry/virtualenvs/test-pypeln-vKtIMPjL-py3.9/lib/python3.9/site-packages/pypeln/thread/stage.py", line 8, in <module>
    from .queue import IterableQueue, OutputQueues
  File "/home/lks/.cache/pypoetry/virtualenvs/test-pypeln-vKtIMPjL-py3.9/lib/python3.9/site-packages/pypeln/thread/queue.py", line 15, in <module>
    class PipelineException(tp.NamedTuple, BaseException):
  File "/usr/lib/python3.9/typing.py", line 1881, in _namedtuple_mro_entries
    raise TypeError("Multiple inheritance with NamedTuple is not supported")
TypeError: Multiple inheritance with NamedTuple is not supported
>>> 

Expected behavior
import successful, no output

Library Info
pypeln (0.4.7)

Optimize pr._from_iterable

  • Make this Stage optionally create a Thread instead of a Process
  • Add the Stage.worker_constructor field
  • Make this method public

This avoids having to pickle the iterable, since threads share memory.

Install requires should include 'six'

I just wanted to give it a quick test in a fresh new 3.6 virtualenv:

▶ pip install pypeln
Collecting pypeln
  Downloading https://files.pythonhosted.org/packages/e4/e7/49cffe147b72ebcf4dd7964f409c5b225f2173265b0c12fa9e3fe295d956/pypeln-0.1.6.tar.gz
Building wheels for collected packages: pypeln
  Running setup.py bdist_wheel for pypeln ... done
  Stored in directory: /Users/peterbe/Library/Caches/pip/wheels/08/f7/35/53e47573a9e6893ceca469c0a8115cee5b33672cefe12c50bd
Successfully built pypeln
Installing collected packages: pypeln
Successfully installed pypeln-0.1.6

▶ python dummy.py
Traceback (most recent call last):
  File "dummy.py", line 1, in <module>
    from pypeln import thread as th
  File "/private/tmp/venv/lib/python3.6/site-packages/pypeln/__init__.py", line 10, in <module>
    from . import thread
  File "/private/tmp/venv/lib/python3.6/site-packages/pypeln/thread.py", line 143, in <module>
    from six.moves.queue import Queue, Empty, Full
ModuleNotFoundError: No module named 'six'

ordered in pypeln.task is not always ordered

Hi,
First of all, I would like to thank you for writing such a versatile, powerful and yet easy to use library for working with concurrent data pipelines.
One of my office projects had a use case where I needed to make multiple independent POST requests to a REST API with certain payloads. We chose the pypeln module for making multiple concurrent requests. As we required the API responses in the same order as the POST requests, we tried using pypeln.task.ordered, but the received responses were not always in the expected order.

Therefore I experimented with the following piece of code:

import pypeln as pl
import asyncio
from random import random

async def slow_add1(x):
    await asyncio.sleep(random())
    return x+1

async def main():
    data = range(20)
    stage = pl.task.map(slow_add1, data, workers=1, maxsize=4)
    stage = pl.task.ordered(stage)
    out = await stage

    print("Output: ", out)

for i in range(15):
    print("At Iteration:",i)
    asyncio.run(main())

I observed the results over multiple runs and found that the responses are not always in the proper order.
One such sample output is:

[screenshot of the outputs]
Please notice that the output for iteration 3 as well as 11 is out of order (the others are OK).
Since I am a new user, I might be misunderstanding something here.
My doubt is: doesn't pypeln.task.ordered ensure that the responses are received in the same order as the requests, irrespective of uneven/unequal processing time? Am I missing something here?

TypeError: 'coroutine' object is not iterable in pypeln 0.4.2

I have a scraper that uses pypeln to open different pages concurrently. However, with the latest version the production code broke, and I had to stick with version 0.3.0. A minimal version of the code looks like the following:

import asyncio
import typing as tp
from copy import copy

import pypeln as pl
import pyppeteer

async def basic(result: dict) -> tp.AsyncIterable[dict]:
    result = copy(result)
    yield result


async def process_url(
    result: dict, search_contact: bool = True
) -> tp.AsyncIterable[dict]:

    results = basic(result)

    async for result in results:
        yield result


async def scrape() -> tp.AsyncIterable[dict]:
    async def get_urls(page_offset):
        browser = await pyppeteer.launch(headless=True)
        page = await browser.newPage()
        await page.goto(
            f"https://google.com/search?q=some python example&num=10&start={page_offset}",
        )
        search_results = await page.evaluate(
            """() => {
                var search_rows = Array.from(document.querySelectorAll("div.r"));
                var data = search_rows.map((row, index) => {
                    //Get the url for this search result
                    return { "url": row.querySelector("a").href, "rank": index }
                });
                return (data)
            }
            """
        )
        return [
            dict(url=element["url"], rank=int(element["rank"]) + int(page_offset))
            for element in search_results
        ]

    offsets = [i * 10 for i in range(1)]
    search_urls = pl.task.flat_map(get_urls, offsets, workers=10)
    stage = pl.task.flat_map(
        lambda url_obj: process_url(url_obj), search_urls, workers=10,
    )

    async for result in stage:
        yield result


async def main():
    results = scrape()
    data = []
    async for result in results:
        data.append(result)

    print(data)

asyncio.get_event_loop().run_until_complete(main())

In version 0.4.2 I get the following error:

/home/charlie/miniconda3/lib/python3.7/asyncio/base_events.py:1776: RuntimeWarning: coroutine 'scrape.<locals>.get_urls' was never awaited
  handle = None  # Needed to break cycles when an exception occurs.
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
  File "test_bug.py", line 68, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "/home/charlie/miniconda3/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "test_bug.py", line 62, in main
    async for result in results:
  File "test_bug.py", line 55, in scrape
    async for result in stage:
  File "/home/charlie/data/snappr/pg-scraper/.venv/lib/python3.7/site-packages/pypeln/task/stage.py", line 98, in to_async_iterable
    async for elem in main_queue:
  File "/home/charlie/data/snappr/pg-scraper/.venv/lib/python3.7/site-packages/pypeln/task/queue.py", line 79, in __aiter__
    raise exception
TypeError: 

("'coroutine' object is not iterable",)

Traceback (most recent call last):
  File "/home/charlie/data/snappr/pg-scraper/.venv/lib/python3.7/site-packages/pypeln/task/worker.py", line 100, in __call__
    for key, value in kwargs.items()
  File "/home/charlie/data/snappr/pg-scraper/.venv/lib/python3.7/site-packages/pypeln/task/worker.py", line 248, in __aexit__
    await self.join()
  File "/home/charlie/data/snappr/pg-scraper/.venv/lib/python3.7/site-packages/pypeln/task/worker.py", line 242, in join
    await asyncio.gather(*self.tasks)
  File "/home/charlie/data/snappr/pg-scraper/.venv/lib/python3.7/site-packages/pypeln/task/worker.py", line 222, in get_task
    await coro
  File "/home/charlie/data/snappr/pg-scraper/.venv/lib/python3.7/site-packages/pypeln/task/api/flat_map.py", line 38, in apply
    for i, y in enumerate(ys):
TypeError: 'coroutine' object is not iterable

multiprocessing.Manager issue

Windows 10 x64, Python 3.6.6 (Anaconda), VS Code.
I've tried to run several examples, but I get an error on the pypeln import line (from pypeln import process as pr):

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "f:\python\projects\multiproc\pypeline.py", line 26, in <module>
    from pypeln import thread as th
  File "C:\Anaconda3\lib\site-packages\pypeln\__init__.py", line 9, in <module>
    from . import process
  File "C:\Anaconda3\lib\site-packages\pypeln\process.py", line 132, in <module>
    _MANAGER = Manager()
  File "C:\Anaconda3\lib\multiprocessing\context.py", line 56, in Manager
    m.start()
  File "C:\Anaconda3\lib\multiprocessing\managers.py", line 513, in start
    self._process.start()
  File "C:\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I also tried to put the pypeln import inside an if __name__ == '__main__': block, but then the child processes give the same error.

freeze_support() line to be omitted..

I'm getting this error

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Python37\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Program Files\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Program Files\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Program Files\Python37\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Program Files\Python37\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Program Files\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\client-pypeln-pl.task.py", line 6, in <module>        
    import pypeln as pl
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\__init__.py", line 4, in <module>
    from . import thread
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\thread\__init__.py", line 34, in <module>
    from .api.concat import concat
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\thread\api\concat.py", line 8, in <module>
    from .to_stage import to_stage
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\thread\api\to_stage.py", line 5, in <module>
    from ..stage import Stage
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\thread\stage.py", line 7, in 
<module>
    from . import utils
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\thread\utils.py", line 11, in <module>
    MANAGER = multiprocessing.Manager()
  File "C:\Program Files\Python37\lib\multiprocessing\context.py", line 56, in Manager
    m.start()
  File "C:\Program Files\Python37\lib\multiprocessing\managers.py", line 563, in start
    self._process.start()
  File "C:\Program Files\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Python37\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__   
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Program Files\Python37\lib\multiprocessing\spawn.py", line 143, in get_preparation_data  
    _check_not_importing_main()
  File "C:\Program Files\Python37\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "client-pypeln-pl.task.py", line 6, in <module>
    import pypeln as pl
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\__init__.py", line 4, in <module>
    from . import thread
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\thread\__init__.py", line 34, in <module>
    from .api.concat import concat
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\thread\api\concat.py", line 8, in <module>
    from .to_stage import to_stage
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\thread\api\to_stage.py", line 5, in <module>
    from ..stage import Stage
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\thread\stage.py", line 7, in 
<module>
    from . import utils
  File "C:\Users\HARDIPINDER\w00rkspace\tufin\lib\site-packages\pypeln\thread\utils.py", line 11, in <module>
    MANAGER = multiprocessing.Manager()
  File "C:\Program Files\Python37\lib\multiprocessing\context.py", line 56, in Manager
    m.start()
  File "C:\Program Files\Python37\lib\multiprocessing\managers.py", line 567, in start
    self._address = reader.recv()
  File "C:\Program Files\Python37\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "C:\Program Files\Python37\lib\multiprocessing\connection.py", line 306, in _recv_bytes      
    [ov.event], False, INFINITE)

I was trying the code from https://medium.com/@cgarciae/making-an-infinite-number-of-requests-with-python-aiohttp-pypeln-3a552b97dc95 and was hoping to get an error about the server not being up or something, but it threw me this instead.

I tried wrapping the call to pl.task.each in an if clause, but I still get this error.

if __name__ == '__main__':
    pl.task.each(
        fetch,
        urls,
        workers=limit,
        on_start=lambda: dict(session=ClientSession(connector=TCPConnector(limit=None))),
        on_done=lambda session: session.close(),
        run=True,
    )

can you please help out where I might be going wrong?

Create a buffering stage

Love the package! Thanks for writing it.

I have a question that I've spent about a day poking at without any good ideas. I'd like to make a stage that buffers and batches records from previous stages. For example, let's say I have an iterable that emits records and a map stage that does some transformation to each record. What I'm looking for is a stage that would combine records into groups of, say, 10 for batch processing. In other words:

>>> (
    range(100)
    | aio.map(lambda x: x)
    | aio.buffer(10)  # <--- This is the functionality I'm looking for
    | aio.map(lambda x: sum(x))
    | list
)
[45, 145, 245, ...]

Is this at all possible?

Thanks!
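
Since stages are plain iterables, one answer is to write the buffering step as an ordinary generator and splice it into the pipeline. A sketch of that idea (buffer here is a hypothetical helper, not a pypeln function):

import pypeln as pl

def buffer(iterable, size):
    # Group consecutive elements into lists of `size`;
    # the final batch may be shorter.
    batch = []
    for x in iterable:
        batch.append(x)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

stage = pl.thread.map(lambda x: x, range(100))
stage = buffer(stage, 10)            # a plain generator mid-pipeline
stage = pl.thread.map(sum, stage)
print(list(stage))  # e.g. [45, 145, 245, ...]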

[documentation] Shouldn't it be Pypeln instead of Pypeline?

In the docs we can read parts like «Pypeline is a simple yet powerful python library» or «A Pypeline pipeline»...

Wouldn't it be better to use the name of the package everywhere? «Pypeln is a simple yet powerful python library...» or «A Pypeln pipeline»...

Thanks for the awesome API!

Feel free to close if I'm completely muddled :)

Make ordered as the default behavior

map() preserving the order is much more intuitive behavior. Python's builtin Pool executor, ray, joblib, etc. all work this way.

I realized that one can still pipe to pl.process.ordered, but the documentation is limited and this is quite difficult to use.

def slow_identity(x):
    time.sleep(random.random())
    return x

s = list(range(100)) | pl.process.map(slow_identity, workers=N)
list(s)     # should be ordered by default
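
For reference, the order-restoring pattern the issue alludes to looks roughly like this (note the buffering caveat from the ordered docs quoted in an earlier issue):

import time
import random

import pypeln as pl

def slow_identity(x):
    time.sleep(random.random())
    return x

if __name__ == "__main__":
    stage = pl.process.map(slow_identity, range(100), workers=4)
    stage = pl.process.ordered(stage)  # restores source order, buffering as needed
    print(list(stage))  # [0, 1, 2, ..., 99]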
