GithubHelp home page GithubHelp logo

lidatong / genstream Goto Github PK

View Code? Open in Web Editor NEW
2.0 2.0 0.0 26 KB

Method chaining comes to python

License: The Unlicense

Python 100.00%
python generators chaining method-chaining stream stream-processing stream-api

genstream's Introduction

genstream

Quickstart

pip install genstream

from genstream import Stream


def main():
    a_list_containing_four = (
        Stream.of(1, 2, 3)
            .map(lambda x: x * 2)
            .take(2)
            .tail()
            .to(list) # prints [4]
    )
    print(a_list_containing_four)


if __name__ == '__main__':
    main()

Soapbox

While generators are one of Python's best and most distinctive language features, I personally find it tiresome to read generator code that undergoes successive transformations. The (x for x in xs) pattern has a low signal-to-noise ratio, especially when it spans across many lines. Can the repetition be abstracted away?

The itertools module is another pain point: the module is very useful, but I don't like the two argument nature of many of the provided functions. I'm always trying to remember which goes first, the parameterization or the iterable, as the ordering is inconsistent across functions (e.g. compare take with map)`.

genstream provides a Stream structure that aims to address these two nits. It provides the infix method chaining syntax (.map, .filter, etc.) found in many other programming languages. While I agree with the python community consensus that map(f, xs) is less readable than (f(x) for x in xs), how about xs.map(f)? I prefer method syntax when sequencing many operations on an iterable.

Example of reading lines from many large files without running out of memory

# Located under examples/concat_files.py
import os
from genstream import Stream


def read_lines_in_file(filename):
    with open(filename) as fp:
        yield from fp


# Using `Stream`
def concat_files(dirname):
    return (
        Stream(os.listdir(dirname))
            .sort()
            .map(lambda fname: f"{dirname}/{fname}")
            .bind(read_lines_in_file)
    )


# Using generators
def concat_files_gen(dirname):
    fnames = os.listdir(dirname)
    sorted_fnames = sorted(fnames)
    fnames_with_dir = (f"{dirname}/{fname}" for fname in sorted_fnames)
    for fname in fnames_with_dir:
        yield from read_lines_in_file(fname)


# A more concise way, but perhaps less readable
# In particular, `sorted(os.listdir(dirname))` is very dense
def concat_files_concise(dirname):
    for fname in sorted(os.listdir(dirname)):
        yield from read_lines_in_file(f"{dirname}/{fname}")


def main():
    for line in concat_files("very_large_files"):
        print(line, end="")


if __name__ == '__main__':
    main()

Example using Stream's symbolic operators

from functools import partial
from itertools import count
from operator import add

from genstream import Stream


def main():
    add_one = partial(add, 1)
    xs = Stream(count(0))  # infinite stream counting from 0
    one_thru_five = xs[:5] | add_one > list
    print(one_thru_five)


if __name__ == '__main__':
    main()

Note on implementation

Given a particular set of primitive operations (e.g. __init__ and reduce), it is possible to derive almost all stream ops in terms of one another.

However, the methods on Stream instead make calls to a corresponding itertools function whenever possible. This is primarily for performance reasons: itertools is a highly-optimized module implemented in C.

genstream's People

Contributors

lidatong avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.