niedakh / pqdm

Comfortable parallel TQDM using concurrent.futures

Home Page: https://pqdm.readthedocs.io/en/latest/

License: MIT License

Languages: Makefile 16.71%, Python 83.29%
Topics: parallel-computing, tqdm, concurrent-futures, python, progress-bar

pqdm's Introduction

Parallel TQDM


PQDM is a TQDM and concurrent.futures wrapper that allows enjoyable parallelization of iterating through an Iterable, with a progress bar.

Install & Use

To install

pip install pqdm

and use

from pqdm.processes import pqdm
# If you want threads instead:
# from pqdm.threads import pqdm

args = [1, 2, 3, 4, 5]
# args = range(1,6) would also work

def square(a):
    return a*a

result = pqdm(args, square, n_jobs=2)

For more example variants, check the Usage section of the docs.

Features

  • parallelize your tqdm runs using processes or threads thanks to concurrent.futures,
  • just import pqdm from pqdm.threads or pqdm.processes to start,
  • automatic use of tqdm.notebook when an IPython/Jupyter notebook environment is detected; a custom tqdm class is also accepted,
  • automatic parsing of pqdm kwargs, separating concurrent.futures executor args from tqdm args,
  • support for any iterable and for passing items as kwargs, args, or directly to the function being applied (see the sketch below),
  • support for bounded executors via https://github.com/mowshon/bounded_pool_executor
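
A minimal sketch of the argument-passing modes, assuming the function and data below; argument_type='kwargs' matches the example reproduced in the issues further down, while argument_type='args' is inferred from the docs' description and may differ in detail:

from pqdm.processes import pqdm

def multiply(a, b):
    return a * b

# Each dict is unpacked as keyword arguments of multiply.
kwarg_items = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
result = pqdm(kwarg_items, multiply, n_jobs=2, argument_type='kwargs')

# Assumed from the docs: tuples are unpacked as positional arguments.
arg_items = [(1, 2), (3, 4)]
result = pqdm(arg_items, multiply, n_jobs=2, argument_type='args')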

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.


pqdm's Issues

pqdm.processes 'AttributeError' object is not iterable while it works with pqdm.threads

  • Parallel TQDM version: 0.2.0
  • Python version: 3.11.4
  • Operating System: Windows 11

Description

I have working code for parallel execution using pqdm. It works if I use from pqdm.threads import pqdm and fails with 'AttributeError' object is not iterable if I switch to from pqdm.processes import pqdm instead.

What am I missing here? What could the problem be?

What I Did

from pqdm.threads import pqdm # Works
#from pqdm.processes import pqdm # 'AttributeError' object is not iterable
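
A diagnostic sketch, not a fix: with pqdm.processes, an exception raised in (or on the way to) a worker process can come back as an exception object inside the result list instead of being raised, which hides the real error. load_item and paths below are hypothetical stand-ins for the reporter's own code:

from pqdm.processes import pqdm

def load_item(path):
    ...  # the real work goes here

paths = ['a.h5', 'b.h5']
results = pqdm(paths, load_item, n_jobs=2)
for i, r in enumerate(results):
    if isinstance(r, Exception):
        # printing the returned exceptions usually reveals the underlying error
        print(f'item {i} failed: {r!r}')

# Assumption: on pqdm >= 0.2.0 the exception-handling keyword discussed in a later
# issue (exception_behaviour='immediate') re-raises the first error instead of
# returning it.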

When will you release a new version on pip or conda?

  • Parallel TQDM version: 4.61.2
  • Python version: 3.9.6
  • Operating System: MacOS

Description

Hi, I'm interested in the exception handling behaviours feature added by this PR in September 2020.
This feature is currently not available from pip or conda because it is not in tag 0.1.0.

Do you plan to release a new version any time soon?

Passing iterable and non-iterable args to function

  • Parallel TQDM version: 0.1.0
  • Python version: 3.6.5
  • Operating System: Mac OS 10.14.6

Description

I have a function that takes multiple arguments but only iterates over one of them. How would I implement this?

What I Did

I've had to rewrite the function to take only the iterable as its argument, but this is not ideal.
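
A possible workaround, assuming the extra arguments are the same for every item: bind them with functools.partial and iterate only over the varying argument. scale, factor and offset are hypothetical names for illustration:

from functools import partial
from pqdm.processes import pqdm

def scale(x, factor, offset):
    return x * factor + offset

items = [1, 2, 3, 4, 5]
worker = partial(scale, factor=10, offset=1)  # fix the non-iterated arguments
result = pqdm(items, worker, n_jobs=2)

Alternatively, build one dict per item and use argument_type='kwargs', repeating the fixed values in each dict.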

RuntimeError with example in multiprocessing.spawn

Hi! I wanted to use your library, but quickly got stuck on this error. The concept is very interesting. I hope there is a solution to this issue :)

  • Parallel TQDM version: 0.1.0
  • Python version: 3.7
  • Operating System: Windows 10

Description

The provided example raises a RuntimeError in multiprocessing.spawn.

What I Did

from pqdm.processes import pqdm
# If you want threads instead:
# from pqdm.threads import pqdm

args = [
    {'a': 1, 'b': 2},
    {'a': 2, 'b': 3},
    {'a': 3, 'b': 4},
    {'a': 4, 'b': 5}
]
# args = range(1,6) would also work

def multiply(a, b):
    return a*b

result = pqdm(args, multiply, n_jobs=2, argument_type='kwargs')

What I got

SUBMITTING | : 100%|██████████| 4/4 [00:00<00:00, 400.22it/s]
SUBMITTING | :   0%|          | 0/4 [00:00<?, ?it/s]Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\eric.brunner\Documents\Personnal\S1PR-E02-RDF-DPS\02_Development\Python\dps\test_parallel_loading.py", line 45, in <module>
    result = pqdm(args, multiply, n_jobs=2, argument_type='kwargs')
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\site-packages\pqdm\processes.py", line 24, in pqdm
    **kwargs
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\site-packages\pqdm\_base.py", line 47, in _parallel_process
    for a in TQDM(iterable, **submitting_opts)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\site-packages\pqdm\_base.py", line 47, in <listcomp>
    for a in TQDM(iterable, **submitting_opts)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\concurrent\futures\process.py", line 641, in submit
    self._start_queue_management_thread()
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\concurrent\futures\process.py", line 583, in _start_queue_management_thread
    self._adjust_process_count()
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\concurrent\futures\process.py", line 607, in _adjust_process_count
SUBMITTING | :   0%|          | 0/4 [00:00<?, ?it/s]    p.start()
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\context.py", line 322, in _Popen
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    return Popen(process_obj)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\eric.brunner\Documents\Personnal\S1PR-E02-RDF-DPS\02_Development\Python\dps\test_parallel_loading.py", line 45, in <module>
    result = pqdm(args, multiply, n_jobs=2, argument_type='kwargs')
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\site-packages\pqdm\processes.py", line 24, in pqdm
    **kwargs
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\site-packages\pqdm\_base.py", line 47, in _parallel_process
    for a in TQDM(iterable, **submitting_opts)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\site-packages\pqdm\_base.py", line 47, in <listcomp>
    for a in TQDM(iterable, **submitting_opts)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\concurrent\futures\process.py", line 641, in submit
    self._start_queue_management_thread()
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\concurrent\futures\process.py", line 583, in _start_queue_management_thread
    self._adjust_process_count()
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\concurrent\futures\process.py", line 607, in _adjust_process_count
    p.start()
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\eric.brunner\Anaconda3\envs\dps\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
SUBMITTING | :   0%|          | 0/4 [00:00<?, ?it/s]
SUBMITTING | :   0%|          | 0/4 [00:00<?, ?it/s]
PROCESSING | : 100%|██████████| 4/4 [00:00<00:00,  6.16it/s]
COLLECTING | : 100%|██████████| 4/4 [00:00<?, ?it/s]
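
A sketch of the standard fix that the traceback itself suggests (not a pqdm-specific API): on Windows the spawn start method re-imports the main module in every child process, so the pqdm call must sit under the __main__ guard while the worker function stays at module level:

from pqdm.processes import pqdm

def multiply(a, b):
    return a * b

args = [
    {'a': 1, 'b': 2},
    {'a': 2, 'b': 3},
    {'a': 3, 'b': 4},
    {'a': 4, 'b': 5}
]

if __name__ == '__main__':
    # only the parent process executes this block; children merely import the module
    result = pqdm(args, multiply, n_jobs=2, argument_type='kwargs')
    print(result)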

Is it possible that pqdm passes over error messages

  • Parallel TQDM version: 0.1.0
  • Python version: 3.6
  • Operating System: linux cluster

Description

I wanted to:
  • read a chunk of a big file,
  • process the chunk and save it to a new file,
  • run this in parallel with pqdm.threads.

What I Did

import pandas as pd
from pqdm.threads import pqdm

def process_chunck(genome):
    D = pd.read_hdf(input_tmp_file, where=f"Genome == {genome}")
    out = process ...  # processing step elided in the original report
    out.to_parquet(output_file)

pqdm(all_genomes, process_chunck, n_jobs=threads)

There was a bug in my function process_chunck, but the exception was never raised to me.

What can I do to get better error handling with pqdm?
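
A hedged sketch of two ways to surface worker errors, reusing the names from the report above. Checking the returned list works on 0.1.0, since failed items come back as exception objects; the exception_behaviour keyword is the feature referenced in the release-request issue above, so its exact name and values are an assumption and likely require a newer pqdm:

from pqdm.threads import pqdm

results = pqdm(all_genomes, process_chunck, n_jobs=threads)
failed = [(g, r) for g, r in zip(all_genomes, results) if isinstance(r, Exception)]
for genome, err in failed:
    print(f'{genome} failed: {err!r}')

# With a newer pqdm, raise the first failure immediately instead (assumed keyword):
# results = pqdm(all_genomes, process_chunck, n_jobs=threads,
#                exception_behaviour='immediate')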

BrokenProcessPool returned

  • Parallel TQDM version: 4.50.2
  • Python version: 3.8.5
  • Operating System: win10

Description

I ran the example in a Jupyter Notebook:

from pqdm.processes import pqdm
# If you want threads instead:
# from pqdm.threads import pqdm

args = [1, 2, 3, 4, 5]
# args = range(1,6) would also work

def square(a):
    return a*a

result = pqdm(args, square, n_jobs=2)

However, when I check the value of result, it shows:

[concurrent.futures.process.BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.'),
 concurrent.futures.process.BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.'),
 concurrent.futures.process.BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.'),
 concurrent.futures.process.BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.'),
 concurrent.futures.process.BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.')]
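
A sketch of one common workaround, assuming the root cause is that the spawned worker processes on Windows cannot re-import a function defined inside the notebook itself: move the worker into a small helper module (workers.py below is hypothetical) and import it, or simply switch to pqdm.threads inside notebooks:

# workers.py  (hypothetical helper module saved next to the notebook)
# def square(a):
#     return a * a

# notebook cell
from pqdm.processes import pqdm
from workers import square  # imports the hypothetical module above

args = [1, 2, 3, 4, 5]
result = pqdm(args, square, n_jobs=2)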

concurrent.futures.process.BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.')

  • Parallel TQDM version: latest
  • Python version: latest
  • Operating System: Windows

Description


What I Did

from textblob import TextBlob
from pqdm.processes import pqdm
import multiprocessing
import numpy as np  # needed for np.nan below (missing in the original snippet)

# df is the user's pandas DataFrame with a "texts" column, defined elsewhere.

def translate_text(text):
    try:
        return str(TextBlob(text).translate(from_lang="pt", to="en"))
    except Exception:
        return np.nan

result = pqdm(df["texts"].tolist(), translate_text, n_jobs=multiprocessing.cpu_count())
df["texts_en"] = result
df.head()

I'm trying to run the code above; it runs properly in Google Colab, but when I try to run it on my local machine, all I get as output is the error "concurrent.futures.process.BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.')". Do you have any idea how to solve it?

Arguments not parsing for class method

  • Parallel TQDM version: pqdm==0.2.0
  • Python version: Python 3.9.16
  • Operating System: Ubuntu 18.04.5 LTS

Description

Thanks for this cool package!

I was able to get parallel processing working for a custom function. However, when I added this function to a class as a method, it appears that it no longer receives the arguments (neither args nor kwargs works) as expected. I was able to recreate this with another simple example, using the math package (below).

Am I missing something here? Is there a way to pass these arguments to a class method?

What I Did

from pqdm.processes import pqdm
import math
args = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
result = pqdm(args, math.sqrt(), n_jobs=2)

Results in the following error:

TypeError: math.sqrt() takes exactly one argument (0 given)

Note, for future reference, that the first error I encountered (trying to use a class method) was:

TypeError: 'module' object is not callable
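
A sketch of the likely fix: pqdm expects a callable, so pass math.sqrt itself (no parentheses), and for class methods pass a bound method of an instance. The Squarer class below is a hypothetical illustration; with pqdm.processes the instance must be picklable and the class importable:

from pqdm.processes import pqdm
import math

args = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
result = pqdm(args, math.sqrt, n_jobs=2)  # math.sqrt, not math.sqrt()

class Squarer:
    def __init__(self, power=2):
        self.power = power

    def apply(self, x):
        return x ** self.power

squarer = Squarer()
result = pqdm(args, squarer.apply, n_jobs=2)  # bound method as the callable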

Error displaying widget: model not found

  • Parallel TQDM version: 0.2.0
  • Python version: 3.9.14
  • Jupyter lab version: 3.4.2
  • Jupyter 1.0.0
  • Operating System: Ubuntu 22.04

Description

When using pqdm in a JupyterLab notebook I receive the following message instead of the progress bar: Error displaying widget: model not found. Otherwise, the code seems to execute normally.

What I Did

I am also able to replicate this exact problem when running the example:

from pqdm.processes import pqdm
# If you want threads instead:
# from pqdm.threads import pqdm

args = [1, 2, 3, 4, 5]
# args = range(1,6) would also work

def square(a):
    return a*a

result = pqdm(args, square, n_jobs=2)

Allow control over which progress bars to show.

  • Parallel TQDM version: 0.1.0
  • Python version: 3.6.10
  • Operating System: macOS

Description

I'm using this in a place where some users are confused by the additional "submitted" and "collected" progress bars. It would be great to be able to optionally hide them.

Happy to submit a PR if you would consider this feature!

Feature Request: return iterable instead of list

  • Parallel TQDM version: 0.2.0

Description

I need to post-process the results returned by pqdm(). The number of tasks is very large, so it is not acceptable to store all the results in memory. Is it possible to return an iterable instead of a list, so I can consume the results on the fly? In this case, pqdm could pre-fetch a bounded number of results, rather than producing them as quickly as possible, to keep memory usage constant. (A possible workaround is sketched below.)
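
A hedged workaround until such a feature exists: feed pqdm fixed-size chunks and consume each chunk's results before submitting the next, which caps peak memory at roughly one chunk of results. tasks, work and consume are placeholders for the user's own objects:

from itertools import islice
from pqdm.processes import pqdm

def chunked(iterable, size):
    # yield successive lists of at most `size` items from any iterable
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

for chunk in chunked(tasks, 10_000):
    for item_result in pqdm(chunk, work, n_jobs=8):
        consume(item_result)  # post-process and discard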
