
Python Parallelization ⏩

CPU

Threading

Due to Python's GIL (Global Interpreter Lock), only a single thread can hold the lock at a time, which means the interpreter ultimately runs Python bytecode serially ☹️. This bottleneck, however, becomes irrelevant if your program has a more severe bottleneck elsewhere, for example in network, IO, or user interaction.
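
A quick way to see the GIL in action is to time a CPU-bound function serially and then with two threads; on CPython the threaded run is no faster (a minimal sketch, timings will vary):

import threading
import time

def count(n):
    # Pure-Python CPU-bound loop; holds the GIL while running
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
count(N); count(N)
print("serial: ", time.perf_counter() - start)

start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join();  t2.join()
print("threads:", time.perf_counter() - start)
# Roughly the same (or worse): only one thread executes bytecode at a time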

Threading is useful in:

  • GUI programs: For example, in a text editing program, one thread can take care of recording the user inputs, another can be responsible for displaying the text, a third can do spell-checking, and so on.
  • Network programs: For example, web scrapers. Multiple threads can scrape multiple webpages in parallel; downloading the pages from the Internet is the biggest bottleneck, so threading is a perfect fit here (see the ThreadPoolExecutor sketch after the example below). Web servers work similarly.
import threading

def func(x):
    return x*x

thread1 = threading.Thread(target=func, args=(4,)) # args must be a tuple
thread2 = threading.Thread(target=func, args=(5,))

thread1.start() # Starts the thread asynchronously
thread2.start() # Starts the thread asynchronously

thread1.join()  # Wait to terminate
thread2.join()  # Wait to terminate
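
For the IO-bound web-scraping case above, a minimal sketch using the standard library's ThreadPoolExecutor (the URLs are placeholders):

from concurrent.futures import ThreadPoolExecutor
import urllib.request

URLS = ["https://example.com", "https://example.org"]  # placeholder URLs

def fetch(url):
    # The GIL is released while waiting on the network,
    # so the downloads overlap even though this is threading
    with urllib.request.urlopen(url) as response:
        return response.read()

with ThreadPoolExecutor(max_workers=8) as executor:
    pages = list(executor.map(fetch, URLS))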

Multiprocessing

Multiprocessing outshines threading in cases where the program is CPU-intensive and doesn't have to do any IO or user interaction, for example a program that just crunches numbers.

import multiprocessing

def func(x):
    return x*x

if __name__ == '__main__':  # Guard required when processes are spawned (e.g. on Windows/macOS)
    process1 = multiprocessing.Process(target=func, args=(4,))
    process2 = multiprocessing.Process(target=func, args=(5,))

    process1.start() # Start the process
    process2.start() # Start the process

    process1.join()  # Wait to terminate
    process2.join()  # Wait to terminate
    # Note: the return values are discarded; use a Pool or a Queue to collect results

Multiprocessing pool

import multiprocessing

def f(x):
    return x*x

if __name__ == '__main__':
    cores = 4
    pool = multiprocessing.Pool(cores)
    print(pool.map(f, [1, 2, 3]))  # [1, 4, 9]
    pool.close()
    pool.join()
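
The pool can also be used as a context manager, which closes it automatically:

import multiprocessing

def f(x):
    return x*x

if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        results = pool.map(f, [1, 2, 3])  # [1, 4, 9]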

PyTorch (Multiprocessing)

PyTorch multiprocessing is a wrapper around the native multiprocessing module. It supports the exact same operations, but extends it so that all tensors sent through a multiprocessing.Queue have their data moved into shared memory, and only a handle is sent to the other process.

import torch.multiprocessing as mp

def func(rank):
    # Numerically intensive work for this worker
    pass

if __name__ == '__main__':
    num_processes = 4
    processes     = []
    for rank in range(num_processes):
        p = mp.Process(target=func, args=(rank,))  # args must be a tuple
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
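
A minimal sketch of that shared-memory behaviour, with a hypothetical worker that updates a tensor in place:

import torch
import torch.multiprocessing as mp

def worker(t):
    t += 1  # in-place update, visible to all processes

if __name__ == '__main__':
    t = torch.zeros(3)
    t.share_memory_()  # move the tensor's storage into shared memory
    procs = [mp.Process(target=worker, args=(t,)) for _ in range(2)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(t)           # updated in place by the child processes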

Numba

Numba is a just-in-time (JIT) compiler for Python. It works well with loops and NumPy, but not with pandas.

Numba also caches the compiled machine code after the first call, so subsequent calls are faster because the function doesn't need to be compiled again.
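
A minimal sketch of that effect: the first call pays the compilation cost, later calls reuse the machine code (cache=True also persists it to disk across runs):

import time
import numpy as np
from numba import njit

@njit(cache=True)   # cache=True writes the compiled code to disk
def total(arr):
    s = 0.0
    for x in arr:
        s += x
    return s

arr = np.random.rand(1_000_000)

start = time.perf_counter()
total(arr)          # first call: compile + run
print("first: ", time.perf_counter() - start)

start = time.perf_counter()
total(arr)          # second call: runs the cached machine code
print("second:", time.perf_counter() - start)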

Scenarios

  • Object mode @jit: Falls back to Python objects; only useful for checking that the code runs, with little speedup
  • Compile mode @jit(nopython=True), or equivalently @njit: Good machine-code performance
  • Multithreading @jit(nopython=True, parallel=True): Good if your code is parallelizable
    • Automatic multithreading of array expressions and reductions
    • Explicit multithreading of loops with prange(): for i in prange(10):
    • External multithreading with tools like concurrent.futures or Dask.
  • Vectorization SIMD @vectorize (see the sketch after the code below)
    • @vectorize(target='cpu'): Single-threaded CPU
    • @vectorize(target='parallel'): Multi-core CPU
    • @vectorize(target='cuda'): CUDA GPU
from numba import jit, njit, prange

@jit  # object mode
def f1(x):
    # your loop or numerically intensive computations
    return x * x

@njit  # same as @jit(nopython=True)
def f2(a, b):
    # your loop or numerically intensive computations
    return a + b

@njit(parallel=True)
def f3(a):
    s = 0.0
    for i in prange(len(a)):  # prange runs the iterations across threads
        s += a[i]
    return s
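
And a minimal @vectorize sketch for the SIMD case above: a scalar function compiled into a NumPy ufunc (swap target for 'parallel' or 'cuda' as listed):

import numpy as np
from numba import vectorize

@vectorize(['float64(float64, float64)'], target='cpu')
def scaled_add(a, b):
    return 2.0 * a + b

x = np.arange(5, dtype=np.float64)
y = np.ones(5)
print(scaled_add(x, y))   # applied elementwise over the whole arrays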

GPU

PyTorch (CUDA)

import torch

print("GPU available:", torch.cuda.is_available())
print("GPU name:     ", torch.cuda.get_device_name(0))

Usage

tensor = torch.FloatTensor([1., 2.]).cuda()  # Move the tensor to the GPU
tensor = tensor * 2                          # Operations run on the GPU
result = tensor.cpu()                        # Copy the result back to the CPU
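
A more portable idiom is to pick a torch.device once and move tensors with .to(), falling back to the CPU when no GPU is present:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tensor = torch.tensor([1., 2.], device=device)  # created directly on the device
tensor = tensor * 2                             # runs on the GPU if available
result = tensor.cpu()                           # copy back to host memory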

Memory management

torch.cuda.memory_allocated() # Memory occupied by tensors
torch.cuda.memory_reserved()  # Memory held by the caching allocator (what nvidia-smi shows); formerly memory_cached()
torch.cuda.empty_cache()      # Release unused cached memory back to the driver

CuPy

import cupy as cp
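
CuPy exposes a NumPy-compatible API backed by CUDA. A minimal sketch (requires a CUDA GPU and a matching cupy build):

import cupy as cp

x = cp.arange(6).reshape(2, 3).astype(cp.float32)  # array lives on the GPU
y = x.sum(axis=1)                                  # computed on the GPU
print(cp.asnumpy(y))                               # copy back to a NumPy array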
