

License: MIT License


eotransform

Defines the basic transform protocol used in the streamed source-to-sink concept. Also provides some generic transformer implementations, such as Compose or Result.

What can I use eotransform for?

The eotransform package defines Source, Transform, and Sink protocols to facilitate the creation of modularised processing pipelines. Adhering to a common contract makes it easier to mix and match processing blocks, allowing for better code reuse and more flexible pipelines. We also provide a streamed_process function, which you can use for I/O hiding when implementing these protocols. The package also provides some common transformations and sinks, such as Compose or Result.
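To make the idea of a common contract concrete, here is a minimal sketch in plain Python that does not use eotransform at all. The names `compose` and `run_serial` are hypothetical helpers invented for this illustration; the package's own Compose and protocol classes may be shaped differently.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def compose(*transforms: Callable[[T], T]) -> Callable[[T], T]:
    """Chain transforms left to right into one callable (hypothetical helper)."""
    def composed(x: T) -> T:
        for transform in transforms:
            x = transform(x)
        return x
    return composed


def run_serial(source: Iterable[T],
               transform: Callable[[T], T],
               sink: Callable[[T], None]) -> None:
    """Pull each item from the source, transform it, and hand it to the sink."""
    for item in source:
        sink(transform(item))


collected = []
pipeline = compose(lambda x: x + 1, lambda x: x * 2)
run_serial(range(3), pipeline, collected.append)
print(collected)  # [2, 4, 6]
```

Because every block is just a callable with an agreed input and output type, any source, transform, or sink can be swapped without touching the others.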

Getting Started

Installation

pip install eotransform

Examples

Transformer protocol

This example shows how to implement the Transformer protocol for a simple multiplication:

class Multiply(Transformer[int, int]):
    def __init__(self, factor: int):
        self.factor = factor

    def __call__(self, x: int) -> int:
        return x * self.factor


Sink protocol

This code snippet illustrates how to implement the Sink protocol, using a simple accumulation example:

class AccumulatingSink(Sink[int]):
    def __init__(self):
        self.result = 0

    def __call__(self, x: int) -> None:
        self.result += x


Streamed pipeline using the "Result" pattern

In the following example we show how to combine ApplyToOkResult and SinkUnwrapped to process data in a streamed fashion with proper error handling across thread boundaries.

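from concurrent.futures import ThreadPoolExecutor

# Result, SinkUnwrapped, ApplyToOkResult and streamed_process come from eotransform;
# Multiply and AccumulatingSink are the classes defined in the examples above.

def a_data_source():
    # Yield ok results for 0, 2 and 3, and an error result for 1.
    for i in range(4):
        if i == 1:
            yield Result.error(RuntimeError("A runtime error occurred!"))
        else:
            yield Result.ok(i)

accumulated = AccumulatingSink()
sink = SinkUnwrapped(accumulated, ignore_exceptions={RuntimeError})
with ThreadPoolExecutor(max_workers=3) as ex:
    streamed_process(a_data_source(), ApplyToOkResult(Multiply(2)), sink, ex)

# 0 * 2 + 2 * 2 + 3 * 2 == 10; the RuntimeError for i == 1 is ignored by the sink.
assert accumulated.result == 10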

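The "Result" pattern used above can be illustrated with a small stand-alone sketch. It assumes nothing about eotransform's internals: `MiniResult`, `apply_to_ok`, and `sink_unwrapped` are hypothetical stand-ins written for this example, and the real Result, ApplyToOkResult, and SinkUnwrapped may behave differently in detail. The point is that an exception travels through the stream as an ordinary value, so it can cross a thread boundary and be handled (or ignored) at the sink.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, Optional, Type, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class MiniResult(Generic[T]):
    """Hypothetical stand-in for a Result: holds either a value or an exception."""
    value: Optional[T] = None
    error: Optional[BaseException] = None

    @property
    def is_ok(self) -> bool:
        return self.error is None


def apply_to_ok(fn: Callable[[T], U], result: MiniResult) -> MiniResult:
    """Apply fn only to ok results; pass errors through and capture new ones."""
    if not result.is_ok:
        return result
    try:
        return MiniResult(value=fn(result.value))
    except Exception as exc:
        return MiniResult(error=exc)


def sink_unwrapped(sink: Callable[[T], None], result: MiniResult,
                   ignore: Iterable[Type[BaseException]] = ()) -> None:
    """Forward ok values to the sink; re-raise errors unless their type is ignored."""
    if result.is_ok:
        sink(result.value)
    elif not isinstance(result.error, tuple(ignore)):
        raise result.error


stream = [MiniResult(value=0), MiniResult(error=RuntimeError("boom")),
          MiniResult(value=2), MiniResult(value=3)]
received = []
for item in stream:
    sink_unwrapped(received.append, apply_to_ok(lambda x: x * 2, item),
                   ignore={RuntimeError})
print(sum(received))  # 10
```

As in the example above, the error item is skipped and the remaining values 0, 2 and 3 are doubled and accumulated.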

Streaming

The following briefly describes the concept of streaming, and how it can be used to hide I/O processes.

The most straightforward way to process data is to first load it and then process it:

[Figure: serial process]

This has the advantage of being simple to implement and maintain, as you don't need to be concerned with issues of parallelism.

In many cases this works well enough. However, it can stall your processing pipeline, because processing has to wait for data to be fetched. An easy way to increase throughput is often to interleave the I/O or data fetching with the processing of chunks:

[Figure: streamed process]

With this streaming process you can utilise resources more effectively.
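The interleaving described here can be sketched with the standard library alone. The generator below is a hypothetical illustration, not the implementation of streamed_process: `prefetched` and `slow_load` are names invented for this example. It submits the load of the next chunk to a thread pool before yielding the current one, so fetching overlaps with whatever the caller does in the loop body.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def prefetched(load: Callable[[K], V], keys: Iterable[K],
               executor: ThreadPoolExecutor) -> Iterator[V]:
    """Yield load(key) in order, loading the next item while the caller works."""
    key_iter = iter(keys)
    try:
        pending = executor.submit(load, next(key_iter))
    except StopIteration:
        return  # nothing to load
    for key in key_iter:
        upcoming = executor.submit(load, key)  # start fetching the next chunk
        yield pending.result()                 # hand over the current chunk
        pending = upcoming
    yield pending.result()


def slow_load(i: int) -> int:
    time.sleep(0.01)  # stands in for disk or network I/O
    return i


total = 0
with ThreadPoolExecutor(max_workers=1) as ex:
    for chunk in prefetched(slow_load, range(4), ex):
        total += chunk * 2  # processing overlaps with the next load
print(total)  # 12
```

Only one chunk is fetched ahead at a time, which keeps memory use bounded; a deeper prefetch queue would trade memory for more overlap.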

Support & Documentation

Dependencies

eotransform requires Python 3.8 and has these dependencies:

more-itertools


Citation

If you find this repository useful, please consider giving it a star or a citation:

@software{raml_bernhard_2023_8002789,
  author       = {Raml, Bernhard},
  title        = {eotransform},
  month        = jun,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {1.8.2},
  doi          = {10.5281/zenodo.8002789},
  url          = {https://doi.org/10.5281/zenodo.8002789}
}
