
chrispcharlton / trufflepig

A python package for rooting around your filesystem

License: GNU General Public License v3.0

Python 100.00%
filesystem hacktoberfest os python

trufflepig's People

Contributors

chrispcharlton, erupomare, herector8

trufflepig's Issues

class-scoped test directory

It would be great to have a fixture that creates a test directory at the class scope. The current testing_directory functionality is great, but some tests don't really need a new directory for every test. For example, TestSearch uses the exact same set-up for each function. It would make a lot of sense if set-up and tear-down of the test directory were only performed once per class.

Add more test cases to test_dates.py

Additional test cases for the date extraction functionality would be useful. The current cases test different delimiters and date formats. We should also test:

  • date at different positions in a filename (current tests all have date at the start of the string)
  • multiple extensions (e.g. .tar.gz)
  • non-date numbers in filename

Function to search for files that match a certain pattern and/or file type

This function is one of the key reasons we are creating this package. It will be very useful for automating some common filesystem tasks in reporting, like finding the most recent copy of a report, especially when they're buried in subfolders with stupid naming conventions.

This function should take the arguments dir, pattern and ext, where dir is the root directory to search (subdirectories should also be searched), pattern is a string (likely a regex) and ext is a file extension string of the form ".*" (for example ".xlsx"). Both pattern and ext should be optional; if neither is passed, the function should essentially behave like os.walk.

It would be good to be able to order the results based on file metrics such as those returned by os.stat, for example the modification time (st_mtime) or st_ctime (creation time on Windows, but the time of the last metadata change on Unix). It would also be good to have a similar function that doesn't search subdirectories, similar to os.listdir.
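
As a starting point for discussion, here is a rough sketch of such a function built on os.walk. The name search and the recursive and sort_by parameters are assumptions made for illustration; only dir, pattern and ext come from the description above:

```python
import os
import re


def search(dir, pattern=None, ext=None, recursive=True, sort_by=None):
    """Return paths of files under dir that match pattern and/or ext.

    sort_by is the name of an os.stat attribute such as "st_mtime";
    when given, results are ordered newest (largest value) first.
    """
    regex = re.compile(pattern) if pattern else None
    matches = []
    for root, _dirnames, filenames in os.walk(dir):
        for name in filenames:
            suffix = os.path.splitext(name)[1]
            if ext is not None and suffix.lower() != ext.lower():
                continue
            if regex is not None and not regex.search(name):
                continue
            matches.append(os.path.join(root, name))
        if not recursive:
            break  # os.walk yields the top directory first, so stop after it
    if sort_by is not None:
        matches.sort(key=lambda p: getattr(os.stat(p), sort_by), reverse=True)
    return matches
```

With this shape, the most recent copy of a report would be the first element of search(root, pattern=r"^report", ext=".xlsx", sort_by="st_mtime"), and recursive=False gives the os.listdir-like variant. Without sort_by the order is whatever os.walk produces.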

Extracting dates from filenames?

We may need one or more functions that can extract dates from filenames, for example

  • report_20201012.xlsx
  • report_12_10_2020.xlsx
  • 2020_10_12_report.xlsx

These should all return a date of 2020-10-12.

This could let us create a function that would rename a set of files to all have the same specified date format or search for files that contain a date.
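
One possible approach, sketched with regular expressions. The function name guess_date is hypothetical, and the sketch assumes that delimited dates which do not start with the year are day-first (DD_MM_YYYY), which matches the examples above but would misread US-style month-first names:

```python
import re
from datetime import date

# Tried in order: year-first (delimiter optional), then day-first (assumed).
_PATTERNS = [
    re.compile(r"(?<!\d)(?P<y>\d{4})[-_.]?(?P<m>\d{2})[-_.]?(?P<d>\d{2})(?!\d)"),
    re.compile(r"(?<!\d)(?P<d>\d{2})[-_.](?P<m>\d{2})[-_.](?P<y>\d{4})(?!\d)"),
]


def guess_date(filename):
    """Return the first valid date found in filename, or None."""
    for pattern in _PATTERNS:
        for match in pattern.finditer(filename):
            try:
                return date(int(match["y"]), int(match["m"]), int(match["d"]))
            except ValueError:
                continue  # digits that do not form a real calendar date
    return None
```

The lookarounds stop the patterns from matching inside longer runs of digits, and the ValueError check skips numbers such as 99999999 that look like a date but are not one.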

Create contribution guidelines

A contributing.md file is a standard way of adding guidelines for contributions to a GitHub repo. We should add one to trufflepig for any future contributors. Feel free to leave suggestions in this issue for potential guidelines we should include.

Utilise extract_date_from_str

@herector8 recently added extract_date_from_str, a function with the purpose of extracting dates from filenames. This is another way (other than ctime) of finding when files were created (or when the data they reference was created/recorded).

We should create some functionality leveraging this. For example we could have a function that takes standard search parameters (regex + ext) and returns the latest (or oldest) date found in a filename that matches those parameters. This would be useful in automation for example, where this type of search could determine whether or not to run a process.
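
A rough sketch of the latest/oldest idea follows. Since the exact signature of extract_date_from_str is not shown here, the sketch takes the extractor as a parameter and ships a simplified stand-in that only understands YYYYMMDD; latest_date, _stand_in_extract and the oldest flag are all hypothetical names:

```python
import os
import re
from datetime import datetime


def _stand_in_extract(name):
    """Placeholder for extract_date_from_str; handles only YYYYMMDD."""
    match = re.search(r"(?<!\d)\d{8}(?!\d)", name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that are not a real date


def latest_date(dir, pattern=None, ext=None, extract=_stand_in_extract, oldest=False):
    """Return the latest (or oldest) date found in matching filenames, or None."""
    regex = re.compile(pattern) if pattern else None
    dates = []
    for _root, _dirnames, filenames in os.walk(dir):
        for name in filenames:
            if ext is not None and os.path.splitext(name)[1] != ext:
                continue
            if regex is not None and not regex.search(name):
                continue
            found = extract(name)
            if found is not None:
                dates.append(found)
    if not dates:
        return None
    return min(dates) if oldest else max(dates)
```

An automated job could then compare latest_date(...) against the date of its last run to decide whether there is anything new to process.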

Open to other ideas on how to utilise this functionality, comment on this issue if you think of any (or feel free to implement them).

Inspiration - annoying file management tasks

What are some annoying common filesystem tasks? For example, this library is partly motivated by the fact that finding the latest version of a constantly updated file can be annoying. What are some other tasks that would be good to automate?

Test Fixtures - Set-Up and Tear-Down of Directory Structure

In order to test trufflepig we will need to be able to create and populate a directory tree with dummy files to search through. This could possibly be from a structure outlined by a config file or class. Whatever solution is used needs to be easily extendible (e.g. to include a variety of file types) so that future functionality can be tested.

pytest provides a built-in tmpdir fixture which creates a unique directory for the duration of a test. Some wrappers around this to create specific sub-directories and files would probably do the trick.

Directories will need to be created using something like os.mkdir. Files could be created using something like pathlib.Path.touch or just by opening and closing files e.g. open(fname, 'a').close().
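
To make the "structure outlined by a config" idea concrete, here is a small sketch that builds a tree from a nested dict, where a dict value means a directory and None means an empty file. The helper name build_tree, the spec format and the example contents are all assumptions for illustration:

```python
from pathlib import Path

# Hypothetical layout: dicts are directories, None marks an empty file.
EXAMPLE_SPEC = {
    "reports": {
        "2020": {"report_20201012.xlsx": None},
        "archive.tar.gz": None,
    },
    "empty_dir": {},
    "notes.txt": None,
}


def build_tree(root, spec):
    """Recursively create the directories and empty files described by spec."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for name, children in spec.items():
        path = root / name
        if children is None:
            path.touch()  # empty dummy file
        else:
            build_tree(path, children)  # sub-directory
    return root
```

A fixture could call build_tree on the temporary directory pytest provides, and new file types can be covered by adding entries to the spec rather than changing the helper.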

An example of a test run with a fixture test_dir to check that filesearchfunction returns something could look like:

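def test_filesearchfunction(test_dir):
    # test_dir is the fixture providing the populated directory
    result = filesearchfunction(test_dir)
    # passes only if at least one file was found
    assert len(result)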

Define the project structure

Create:

  • a main script file
  • test dir
  • documentation dir
  • requirements.txt
  • requirements-dev.txt (e.g. for pytest, sphinx, etc)

Anything else that would be important to this project?
