Simple Smart Pipe: python productivity-tool for rapid data manipulation

Home Page: https://sspipe.github.io/

License: MIT License

Python 99.36% Shell 0.64%
python pipe data-science readability productivity magrittr dplyr syntax-sugar pandas numpy

sspipe's Introduction

Simple Smart Pipe

SSPipe is a Python productivity tool for rapid data manipulation.

It helps you break up any complicated expression into a sequence of simple transformations, increasing human-readability and decreasing the need for matching parentheses!

As an example, here is a single line of code that reads students' data from 'data.csv' and writes those in class 'A19' whose score is above the class average to 'report.csv':

from sspipe import p, px
import pandas as pd

pd.read_csv('data.csv') | px[px['class'] == 'A19'] | px[px.score > px.score.mean()].to_csv('report.csv')

As another example, here is a single line of code that plots sin(x) in red for the points between 0 and 2*pi where cos(x) is less than 0:

from sspipe import p, px
import numpy as np
import matplotlib.pyplot as plt

np.linspace(0, 2*np.pi, 100) | px[np.cos(px) < 0] | p(plt.plot, px, np.sin(px), 'r')

# The single-line code above is equivalent to the following code without SSPipe:
# X = np.linspace(0, 2*np.pi, 100)
# X = X[np.cos(X) < 0]
# plt.plot(X, np.sin(X), 'r')

If you're familiar with the | operator of Unix or the %>% operator of R's magrittr, sspipe provides the same functionality in Python.

Installation and Usage

Install sspipe using pip:

pip install --upgrade sspipe

Then import it in your scripts.

from sspipe import p, px

The whole functionality of this library is exposed by two objects: p (a wrapper for functions to be called on the piped object) and px (a placeholder for the piped object).

Examples

Each entry below gives a description, the Python expression using p and px, and the equivalent Python code.

Simple function call

    Using p and px:
        "hello world!" | p(print)

    Equivalent Python code:
        X = "hello world!"
        print(X)

Function call with extra args

    Using p and px:
        "hello" | p(print, "world", end='!')

    Equivalent Python code:
        X = "hello"
        print(X, "world", end='!')

Explicitly positioning piped argument with px placeholder

    Using p and px:
        "world" | p(print, "hello", px, "!")

    Equivalent Python code:
        X = "world"
        print("hello", X, "!")

Chaining pipes

    Using p and px:
        5 | px + 2 | px ** 5 + px | p(print)

    Equivalent Python code:
        X = 5
        X = X + 2
        X = X ** 5 + X
        print(X)

Tailored behavior for builtin map and filter

    Using p and px:
        (
          range(5)
          | p(filter, px % 2 == 0)
          | p(map, px + 10)
          | p(list) | p(print)
        )

    Equivalent Python code:
        X = range(5)
        X = filter((lambda x: x % 2 == 0), X)
        X = map((lambda x: x + 10), X)
        X = list(X)
        print(X)

NumPy expressions

    Using p and px:
        range(10) | np.sin(px)+1 | p(plt.plot)

    Equivalent Python code:
        X = range(10)
        X = np.sin(X) + 1
        plt.plot(X)

Pandas support

    Using p and px:
        people_df | px.loc[px.age > 10, 'name']

    Equivalent Python code:
        X = people_df
        X.loc[X.age > 10, 'name']

Assignment

    Using p and px:
        people_df['name'] |= px.str.upper()

    Equivalent Python code:
        X = people_df['name']
        X = X.str.upper()
        people_df['name'] = X

Pipe as variable

    Using p and px:
        to_upper = px.strip().upper()
        to_underscore = px.replace(' ', '_')
        normalize = to_upper | to_underscore
        " ab cde " | normalize | p(print)

    Equivalent Python code:
        _f1 = lambda x: x.strip().upper()
        _f2 = lambda x: x.replace(' ', '_')
        _f3 = lambda x: _f2(_f1(x))
        X = " ab cde "
        X = _f3(X)
        print(X)

Builtin Data Structures

    Using p and px:
        2 | p({px-1: p([px, p((px+1, 4))])})

    Equivalent Python code:
        X = 2
        X = {X-1: [X, (X+1, 4)]}

How it works

The expression p(func, *args, **kwargs) returns a Pipe object that overloads the __or__ and __ror__ operators. This object stores func, args and kwargs until x | <Pipe> is evaluated, at which point Python calls Pipe.__ror__. It then evaluates func(x, *args, **kwargs) and returns the result.

The px object is simply p(lambda x: x).

Please note that SSPipe does not wrap piped objects; it only wraps the transforming functions. Therefore, when a variable x is not an instance of the Pipe class, the result of y = x | p(func) carries no trace of Pipe. It is exactly the same object you would get from evaluating y = func(x) directly.
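The mechanism described above can be sketched with a toy class. This is a minimal illustration under the assumption that only the deferred call and __ror__ matter; MiniPipe, mini_p and mini_px are hypothetical names for this sketch and are not sspipe's actual implementation.

```python
class MiniPipe:
    """Toy stand-in for the Pipe object described above (not sspipe's real class)."""

    def __init__(self, func, *args, **kwargs):
        # Store the call until a value is piped in.
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, left):
        # Python falls back to this for `left | pipe` when `left` does not handle `|`.
        return self.func(left, *self.args, **self.kwargs)


def mini_p(func, *args, **kwargs):
    # Hypothetical counterpart of p(func, *args, **kwargs).
    return MiniPipe(func, *args, **kwargs)


# Hypothetical counterpart of px: the identity transformation.
mini_px = mini_p(lambda x: x)

words = "a,b,c" | mini_p(str.split, ",")
ordered = [3, 1, 2] | mini_p(sorted, reverse=True)
same = words | mini_px

print(words)          # ['a', 'b', 'c']
print(ordered)        # [3, 2, 1]
print(type(words))    # <class 'list'>
print(same is words)  # True
```

The results are plain list objects, which matches the point that only the transforming function is wrapped, never the piped value.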

Common Gotchas

  • Incompatibility with dict.items, dict.keys and dict.values:

    The objects returned by dict.keys(), dict.values() and dict.items() are called view objects. Python does not allow classes to override the | operator on these types. As a workaround, the / operator has been implemented for view objects. Example:

    # ERRONEOUS CODE:
    {1: 2, 3: 4}.items() | p(list) | p(print)
    
    # CORRECT CODE (With / operator):
    {1: 2, 3: 4}.items() / p(list) | p(print)
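The difference between | and / on a view object can be reproduced without sspipe. The sketch below uses a hypothetical MiniPipe stand-in (not sspipe's Pipe class) that implements both __ror__ and __rtruediv__, and assumes CPython's behavior, where dict.items() handles | itself as a set operation.

```python
class MiniPipe:
    """Hypothetical stand-in that reacts to both `x | pipe` and `x / pipe`."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        return self.func(left)

    def __rtruediv__(self, left):
        return self.func(left)


items = {1: 2, 3: 4}.items()

try:
    # The view's own `|` (set union) runs first, so MiniPipe.__ror__ is never consulted.
    # On CPython this raises TypeError because MiniPipe is not iterable.
    via_or = items | MiniPipe(list)
except TypeError:
    via_or = None

# The view defines no `/`, so Python falls back to MiniPipe.__rtruediv__.
via_div = items / MiniPipe(list)

print(via_or)   # None
print(via_div)  # [(1, 2), (3, 4)]
```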

Compatibility with JulienPalard/Pipe

This library is inspired by, and depends on, the intelligent and concise work of JulienPalard/Pipe. If you want a single pipe.py script or a lightweight library that implements core functionality and logic of SSPipe, Pipe is perfect.

SSPipe focuses on making pipes easier to use by integrating with popular libraries, introducing the px concept, and overriding Python operators to make pipes a first-class citizen.

Every existing pipe implemented by the JulienPalard/Pipe library is accessible through p.<original_name> and is compatible with SSPipe. SSPipe does not implement any specific pipe function itself; it delegates the implementation and naming of pipe functions to JulienPalard/Pipe.

For example, JulienPalard/Pipe's example for solving "Find the sum of all the even-valued terms in Fibonacci which do not exceed four million." can be re-written using sspipe:

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

from sspipe import p, px

euler2 = (fib() | p.where(lambda x: x % 2 == 0)
                | p.take_while(lambda x: x < 4000000)
                | p.add())

You can also pass px shorthands to JulienPalard/Pipe API:

euler2 = (fib() | p.where(px % 2 == 0)
                | p.take_while(px < 4000000)
                | p.add())
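For comparison, the same computation can be sketched in plain Python with the standard library's itertools.takewhile. This only illustrates what the pipe chain computes, not how sspipe or Pipe implement it; the names evens and euler2_plain are made up for this sketch.

```python
from itertools import takewhile


def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Keep the even terms, stop at the first one that reaches four million, then sum.
evens = (x for x in fib() if x % 2 == 0)
euler2_plain = sum(takewhile(lambda x: x < 4000000, evens))

print(euler2_plain)  # 4613732
```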

sspipe's People

Contributors

mohammad7t, msekhavat-xnor

sspipe's Issues

A composition framework that involves sspipe

Hi,
I have built a Python framework that allows for composition of functions. sspipe really enables me to take it to the next level.
I also made a fork that adds name and contextvars features (which the framework needs). Let me know if you would include it (it is backward-compatible, but won't add value without the entire framework). I could open a PR.

I also wanted to hear what you think, and if you would recommend it, that would be appreciated.

https://github.com/eyalk11/composition

Thanks.

Stuck when zipping

I'm trying to zip/unzip an array, but the console always hangs up. Is this a bug, or am I doing something wrong?

from sspipe import p, px

[[1, 2, 3], [4, 5, 6]] | p(zip, px)       # zip, console/notebook gets stuck
[[1, 2], [3, 4], [5, 6]] | p(zip, *px)    # unzip, console/notebook gets stuck

Question: Applying multiple operations in one call to p

Generally - a great library! The simplifications for usability on Julien's Pipe are important.

The question here is: I generally define a simple function

def lmap(f, xs): return list(map(f, xs))

This is useful because in Python 3 map, filter, etc. return lazy iterators, and it is chatty to have list(blah) everywhere to compensate. But with this sspipe library it does not work; it needs to be

| p(map) | p(list)

i.e. the list and map can not be combined into a single invocation of p. We can extrapolate this to be a hassle. You're a bright guy - can you come up with a solution/workaround to be able to combine multiple operations in one call to p ? Thx!
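One possible workaround, sketched under the README's description that p(func) simply calls func on the piped object: let lmap take only the mapping function and return a new function, so a single p(...) call covers both map and list. The curried lmap and the name add_ten below are hypothetical helpers, not part of sspipe, and the sspipe line is left as a comment because it has not been run here.

```python
def lmap(func):
    """Return a function that maps func over an iterable and collects a list."""
    def apply(iterable):
        return list(map(func, iterable))
    return apply


add_ten = lmap(lambda x: x + 10)
result = add_ten(range(3))
print(result)  # [10, 11, 12]

# With sspipe installed, the same helper should fit in one p(...) call (untested here):
# range(3) | p(lmap(lambda x: x + 10)) | p(print)
```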

`TypeError` on conditional

Is there a way to make the following working Pandas code more concise?

value = data | px[px.SOME_COLUMN == 10] # data: DataFrame
if not value.empty:
    print(value.ANOTHER_COLUMN.astype(str).values[0])

I tried this:

data | px[px.SOME_COLUMN == 40] | (px if not px.empty else exit()) | px.ANOTHER_COLUMN.astype(str).values[0] | p(print)

but I got a TypeError:

Traceback (most recent call last):
  File "/test_pandas.py", line 25, in <module>
    data | px[px.SOME_COLUMN == 40] | (px if not px.empty else exit()) | px.ANOTHER_COLUMN.astype(str).values[0] | p(print)
TypeError: 'Pipe' object cannot be interpreted as an integer

Essentially, I'm trying not to print if the DataFrame is empty.

Questions: unpipe() and general compatibility

Hey man, first let me say that sspipe is my favorite Python package; it is a game changer in my opinion.
I believe the combination of functional programming and a pipe operator is the best paradigm, and Python was really deficient in that regard.
Thank you very much for providing this package, and I hope it gets all the attention it deserves!

I have 2 questions:

(1) What are the uses of unpipe? Is there any documentation or are there examples using it?

(2) Can sspipe be used robustly throughout the Python ecosystem, or are there some cases/environments where it should be avoided to prevent critical incompatibilities?

Best regards

pipe iterator into next function

sspipe hangs up when the output is an iterator and the next function iterates over the prior output.

reduce(lambda x, y: x + y, [1, 2, 3, 4]) yields 10 but [1, 2, 3, 4] | reduce(lambda x, y: x + y, px) hangs up

[x + 1 for x in [1, 2, 3, 4]] yields [2, 3, 4, 5] but [1, 2, 3, 4] | [x + 1 for x in px] hangs up

Would love a way to pass iterators!
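One plausible cause, offered as an assumption rather than a confirmed diagnosis: px supports indexing (px[...] appears throughout the README), and when an object without __iter__ is iterated, Python falls back to calling __getitem__ with 0, 1, 2, ... and stops only on IndexError. A placeholder that answers every index with another placeholder would therefore never stop. The Placeholder class below is a hypothetical stand-in, not sspipe's code; islice caps the loop so the sketch terminates.

```python
from itertools import islice


class Placeholder:
    """Hypothetical stand-in: every index lookup returns a new placeholder."""

    def __init__(self, path="px"):
        self.path = path

    def __getitem__(self, key):
        # Never raises IndexError, so plain iteration would go on forever.
        return Placeholder(f"{self.path}[{key!r}]")


px_like = Placeholder()

# Iteration uses the legacy sequence protocol; take only the first five items.
first_five = [item.path for item in islice(px_like, 5)]
print(first_five)  # ['px[0]', 'px[1]', 'px[2]', 'px[3]', 'px[4]']
```

Going by the README's description of p(func), wrapping the loop in a function, for example p(lambda xs: [x + 1 for x in xs]), should defer iteration until real data is piped in; this has not been verified here.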

TypeError: 'Pipe' object cannot be interpreted as an integer

Hi,

My pipelines built with sspipe often fail due to TypeError on one of the transformations. Even though each step returns a DataFrame object I still get this failure.

Sample Code:


import pandas as pd
from sspipe import px
from urllib.request import urlretrieve

# Get Data to work with
url = 'http://files.zillowstatic.com/research/public/Neighborhood/Neighborhood_MedianRentalPrice_AllHomes.csv'
local_path = './Neighborhood_MedianRentalPrice_AllHomes.csv'
urlretrieve(url, local_path)

zillow = pd.read_csv(local_path)

# This Works
zillow_regions =  zillow \
    .loc[zillow.State.isin(['CA', 'NY', 'WA', 'DC']), ['RegionName', 'City', 'State']] \
    .sort_values(['City', 'RegionName']) \
    .drop_duplicates()

# This Does Not Work
zillow_regions = (
    zillow
    | px.loc[px.State.isin(['CA', 'NY', 'WA', 'DC']), ['RegionName', 'City', 'State']]
    | px.sort_values(['City', 'RegionName'])
    | px.drop_duplicates()
)
zillow_regions.head()

Error Message when using sspipe:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: 'Pipe' object cannot be interpreted as an integer
Exception ignored in: 'pandas._libs.tslibs.util.is_period_object'
TypeError: 'Pipe' object cannot be interpreted as an integer
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: 'Pipe' object cannot be interpreted as an integer
Exception ignored in: 'pandas._libs.lib.is_interval'
TypeError: 'Pipe' object cannot be interpreted as an integer
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: 'Pipe' object cannot be interpreted as an integer
Exception ignored in: 'pandas._libs.tslibs.util.is_offset_object'
TypeError: 'Pipe' object cannot be interpreted as an integer
