pipepy's People

Contributors

kbairak

Forkers

nissmogt

pipepy's Issues

PosixPath doesn't work with redirections

Redirection works with strings and file-like objects.
It should work with Path objects too.

>>> from pipepy import ls
>>> ls > "deleteme"
>>> from pathlib import Path
>>> ls > Path("deleteme")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'PipePy' and 'PosixPath'
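One possible shape for the fix, as a minimal sketch: treat anything that implements os.PathLike the same way as a string by passing it through os.fspath(). The Command class below is a hypothetical stand-in written for illustration; it is not pipepy's actual implementation, and its use of subprocess is an assumption.

```python
import os
import subprocess
import sys


class Command:
    """Hypothetical stand-in for a pipepy-style command object."""

    def __init__(self, *args):
        self.args = list(args)

    def __gt__(self, target):
        # Filenames: accept str and any os.PathLike (e.g. pathlib.Path).
        if isinstance(target, (str, os.PathLike)):
            with open(os.fspath(target), "w") as f:
                subprocess.run(self.args, stdout=f, check=True)
            return None
        # File-like objects: anything that exposes write().
        if hasattr(target, "write"):
            result = subprocess.run(
                self.args, capture_output=True, text=True, check=True
            )
            target.write(result.stdout)
            return None
        # Let Python raise the usual TypeError for unsupported targets.
        return NotImplemented


hello = Command(sys.executable, "-c", "print('hello')")
```

With this in place, both hello > "out.txt" and hello > Path("out.txt") write the command's output to the file, while hello > 42 still raises TypeError.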

Document patterns for caching results of commands

This is a really awesome library; I can't believe it isn't more widely known (I found it via Bing+GPT).

I am interested in using this library where I might otherwise use Makefiles/snakemake/dagster/etc. I really like the bundled pymake, but I'm interested in writing the logic in a more Pythonic way, e.g. using nested function calls, where each function wraps a shell command.

Out of the box, pipepy won't give me dependency management: I want to avoid recomputing expensive commands when their inputs haven't changed.

However, it's pretty easy to compose existing Python solutions like functools.lru_cache.

To do this, I created a simple wrapper class for filesystem objects:

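import hashlib
from dataclasses import dataclass
from pathlib import Path

@dataclass
class FileObject:
    path: str
    use_md5: bool = False

    @property
    def modified(self):
        # Last-modification time of the file, as Makefiles use
        return Path(self.path).stat().st_mtime

    @property
    def md5(self):
        # Digest of the file contents (assumes hashlib; the original helper was not shown)
        return hashlib.md5(Path(self.path).read_bytes()).hexdigest()

    # Hashable, so instances can serve as lru_cache keys
    def __hash__(self):
        if self.use_md5:
            return hash((self.path, self.md5))
        else:
            return hash((self.path, self.modified))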

This allows for two strategies for triggering a re-run of a command: upstream file modification (as in Makefiles) or a change in the file's MD5 hash.

I can then write my own wrapper commands:

from functools import lru_cache
import pipepy

@lru_cache
def cached_grep(pattern: str, filename: FileObject) -> str:
    return str(pipepy.grep(pattern, filename.path))

I confirmed this works:

from pathlib import Path

import pytest

# Hypothetical paths; the original post did not show these definitions
OUTPUT_DIR = Path("output")
TEST_DATA = OUTPUT_DIR / "test_data.txt"

@pytest.fixture()
def setup():
    # Setup: make sure the output directory exists and holds fresh test data
    OUTPUT_DIR.mkdir(exist_ok=True)
    test_text = "\n".join(f"test {i}" for i in range(100))
    TEST_DATA.write_text(test_text)

@pytest.mark.parametrize("use_md5", [True, False])
def test_cache(setup, use_md5):
    fo = FileObject(str(TEST_DATA), use_md5=use_md5)
    out = str(cached_grep("1", fo))
    assert "test 1" in out
    # Second call. TODO: check that this does not actually run grep
    out = str(cached_grep("1", fo))
    assert "test 1" in out
    # After the file changes, the cached result must not be reused
    TEST_DATA.write_text("")
    out = str(cached_grep("1", fo))
    assert "test 1" not in out
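For the TODO in the test, one option is the cache_info() method that functools.lru_cache attaches to every decorated function: its hits and misses counters show whether a call was served from the cache. A minimal sketch, where expensive and calls are hypothetical names standing in for the wrapped shell command:

```python
from functools import lru_cache

calls = []  # records every real execution of the function body


@lru_cache(maxsize=None)
def expensive(key):
    calls.append(key)
    return key.upper()


expensive("a")  # first call: a miss, the body runs
expensive("a")  # second call: a hit, the body is skipped
info = expensive.cache_info()
```

In the test above, the same idea would be to read cached_grep.cache_info().hits before and after the second call and assert that it increased by one.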

I don't mind the overhead of the FileObject wrapper class; in fact, it lets me use Python's type checking in my workflows. I also like how I could extend this to other ways of triggering changes.

I don't mind the cached_grep wrapper function so much, as I typically don't have many of these. However, this feels a trifle inelegant, and I suspect there is a more dynamic way of doing this with less boilerplate.

Is the pattern above something that others might use, or does it go against the grain? Would something like the FileObject class be a useful addition, or is this better as a separate package?

(I'm also interested in patterns for weaving in asyncio for running processes in parallel, but that would be a new question.)

Debug print statements do not work

Hi,
The f-string debug specifier (=) does not print the command's output; it prints the PipePy object's repr instead:

>>> from pipepy import date
>>> mydate=date()
>>> print(f"mydate={mydate}")
mydate=Wed Mar 17 22:22:36 CET 2021

>>> print(f"{mydate=}")
mydate=PipePy('date', _returncode=0, _stdout='Wed Mar 17 2...36 CET 2021\n')
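This matches how the = specifier is defined in Python 3.8+: it formats the value with repr() unless a conversion or format spec is given, so adding !s switches it to str(). A small sketch with a hypothetical Output class standing in for any object whose str() and repr() differ:

```python
class Output:
    """Hypothetical stand-in for an object whose str() and repr() differ."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __repr__(self):
        return f"Output({self.text!r})"


mydate = Output("Wed Mar 17 22:22:36 CET 2021")

plain = f"mydate={mydate}"    # uses str()
debug = f"{mydate=}"          # uses repr() by default
debug_str = f"{mydate=!s}"    # !s forces str()
```

Here plain and debug_str are both "mydate=Wed Mar 17 22:22:36 CET 2021", while debug is "mydate=Output('Wed Mar 17 22:22:36 CET 2021')".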

binary mode incompatible with encoding

Hi,
I tried the following code:

from pipepy import gzip
gzip=gzip(_text=False)
gzip < 'deleteme' > 'deleteme.gz'

and I get the following traceback:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    gzip < 'deleteme' > 'deleteme.gz'
  File "/Users/elkarouh/opt/anaconda3/lib/python3.8/site-packages/pipepy/pipepy.py", line 484, in __lt__
    with open(filename, mode, encoding=self._encoding) as f:
ValueError: binary mode doesn't take an encoding argument
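The traceback points at open() being called with both a binary mode and an encoding. A possible fix, sketched here under assumptions, is to pass the encoding only when the file is opened in text mode. The helper name open_for_redirect and its default encoding are hypothetical; this is not pipepy's code.

```python
def open_for_redirect(filename, mode, encoding="utf-8"):
    """Hypothetical helper: open a redirection source or target.

    Binary modes must not receive an encoding argument, so it is
    only forwarded to open() for text modes.
    """
    if "b" in mode:
        return open(filename, mode)
    return open(filename, mode, encoding=encoding)
```

Calling open_for_redirect("deleteme", "rb") then returns a binary file object without raising ValueError, and text-mode calls behave as before.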
