chrispcharlton / trufflepig
A Python package for rooting around your filesystem
License: GNU General Public License v3.0
It would be great to have a fixture that creates a test directory at class scope. The current testing_directory functionality is great, but some tests don't really need a new directory for every test; for example, TestSearch uses the exact same set-up for each function. It would make a lot of sense if set-up and tear-down of the test directory were only performed once per class.
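A class-scoped fixture along these lines might look like the sketch below. It assumes pytest's built-in tmp_path_factory fixture; the names populate_test_tree and class_testing_directory and the file layout are hypothetical and are not taken from trufflepig.

```python
from pathlib import Path

import pytest


def populate_test_tree(root: Path) -> Path:
    """Create a small, made-up directory tree under root and return root."""
    (root / "sub").mkdir(parents=True, exist_ok=True)
    (root / "report.txt").touch()
    (root / "sub" / "report.csv").touch()
    return root


@pytest.fixture(scope="class")
def class_testing_directory(tmp_path_factory):
    # tmp_path_factory is session-scoped, so a class-scoped fixture can
    # request it; the tree is then built once per test class, not per test.
    return populate_test_tree(tmp_path_factory.mktemp("trufflepig"))


class TestSearch:
    # Both tests receive the same directory object.
    def test_top_level_file_exists(self, class_testing_directory):
        assert (class_testing_directory / "report.txt").is_file()

    def test_nested_file_exists(self, class_testing_directory):
        assert (class_testing_directory / "sub" / "report.csv").is_file()
```

Because the directory is shared, tests in the class should treat it as read-only; a test that modifies files would still want a function-scoped directory.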
Would detecting duplicate files in a directory tree be a useful feature? This could be done by hashing each file's bytes (or similar) and grouping files whose hashes match.
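As a rough illustration of the hashing idea, the sketch below groups files by the SHA-256 digest of their contents. The function names file_digest and find_duplicates are hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents hash identically."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

For large trees it would probably be cheaper to group files by size first and hash only the files whose sizes collide.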
Additional test cases for date extraction functionality would be useful. The current cases test different delimiters and date formats; we should also test:
.tar.gz
This function is one of the key reasons we are creating this package. It will be very useful for automating some common filesystem tasks in reporting, like finding the most recent copy of a report, especially when reports are buried in subfolders with stupid naming conventions.
This function should take the arguments dir, pattern and ext, where dir is the root directory to search (subdirectories should also be searched), pattern should be a string (likely regex) and ext should be a string of the form ".*". Both pattern and ext should be optional, and if neither is passed the function should essentially behave like os.walk.
It would be good to be able to order the results based on file metrics such as those output by os.stat, like updated or created time (mtime and ctime respectively; note that on Unix ctime is the time of the last metadata change rather than creation). It would also be good if there was a similar function that didn't search subdirectories, similar to os.listdir.
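One possible shape for such a function is sketched below. The name find_files and the parameters recursive and sort_by are assumptions made for illustration, and directory stands in for dir to avoid shadowing the built-in. Unlike os.walk, this sketch returns a flat list of file paths.

```python
import os
import re


def find_files(directory, pattern=None, ext=None, recursive=True, sort_by=None):
    """Return paths of files under directory matching pattern and ext.

    pattern is a regex searched for in each file name; ext is a suffix such
    as ".csv". sort_by names an os.stat_result attribute, e.g. "st_mtime".
    """
    regex = re.compile(pattern) if pattern else None
    if recursive:
        walker = os.walk(directory)
    else:
        # Only the top level, similar to os.listdir.
        walker = [next(os.walk(directory), (directory, [], []))]
    matches = []
    for dirpath, _dirnames, filenames in walker:
        for name in filenames:
            if ext is not None and not name.endswith(ext):
                continue
            if regex is not None and not regex.search(name):
                continue
            matches.append(os.path.join(dirpath, name))
    if sort_by is not None:
        # Oldest first; reverse the list for newest first.
        matches.sort(key=lambda path: getattr(os.stat(path), sort_by))
    return matches
```

With sort_by="st_mtime", the last element of the result would be the most recently modified match, which covers the "most recent copy of a report" case.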
We may need a function(s) that can extract dates from filenames, for example
Should all return a date of 2020-10-12.
This could let us create a function that would rename a set of files to all have the same specified date format or search for files that contain a date.
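A minimal sketch of the idea follows. It assumes only two layouts, year-month-day (with "-", "_", "." or no delimiter) and day-month-year (with a delimiter); the name find_date_in_name is hypothetical and the sketch does not reproduce any existing trufflepig function.

```python
import re
from datetime import date

# Assumed layouts: YYYY?MM?DD (delimiter optional) and DD?MM?YYYY (delimiter required).
_YMD = re.compile(r"(?<!\d)(\d{4})[-_.]?(\d{2})[-_.]?(\d{2})(?!\d)")
_DMY = re.compile(r"(?<!\d)(\d{2})[-_.](\d{2})[-_.](\d{4})(?!\d)")


def find_date_in_name(name):
    """Return the first valid date found in name, or None."""
    for regex, year_first in ((_YMD, True), (_DMY, False)):
        for match in regex.finditer(name):
            first, month, last = (int(group) for group in match.groups())
            year, day = (first, last) if year_first else (last, first)
            try:
                return date(year, month, day)
            except ValueError:
                # Digits matched but do not form a real date; keep looking.
                continue
    return None
```

Ambiguous layouts such as month-day-year are deliberately left out; supporting them would need an explicit format argument.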
A contributing.md file is a standard way of adding guidelines for contributions to a GitHub repo. We should add one to trufflepig for any future contributors. Feel free to leave suggestions in this issue for potential guidelines we should include.
@herector8 recently added extract_date_from_str, a function with the purpose of extracting dates from filenames. This is another way (other than ctime) of finding when files were created (or when the data they reference was created/recorded).
We should create some functionality leveraging this. For example, we could have a function that takes standard search parameters (regex + ext) and returns the latest (or oldest) date found in a filename that matches those parameters. This would be useful in automation, for example, where this type of search could determine whether or not to run a process.
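The sketch below shows one way this could work on a plain list of file names. It uses its own small ISO-style date matcher as a stand-in rather than calling extract_date_from_str, whose signature is not shown here; date_in_name and extreme_date are hypothetical names.

```python
import re
from datetime import date

# Assumption: dates appear as YYYY-MM-DD, YYYY_MM_DD or YYYYMMDD.
_DATE = re.compile(r"(?<!\d)(\d{4})[-_]?(\d{2})[-_]?(\d{2})(?!\d)")


def date_in_name(name):
    """Stand-in date extractor: return a date found in name, or None."""
    match = _DATE.search(name)
    if match is None:
        return None
    try:
        return date(*(int(part) for part in match.groups()))
    except ValueError:
        return None


def extreme_date(filenames, pattern=None, ext=None, latest=True):
    """Return the latest (or oldest) date among matching file names, or None."""
    regex = re.compile(pattern) if pattern else None
    dates = []
    for name in filenames:
        if ext is not None and not name.endswith(ext):
            continue
        if regex is not None and not regex.search(name):
            continue
        found = date_in_name(name)
        if found is not None:
            dates.append(found)
    if not dates:
        return None
    return max(dates) if latest else min(dates)
```

An automation script could then compare the result with date.today() to decide whether a process needs to run.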
Open to other ideas on how to utilise this functionality, comment on this issue if you think of any (or feel free to implement them).
What are some annoying common filesystem tasks? For example, this library is sort of based on the fact that finding the latest version of a constantly updated file can be annoying. What are some other tasks that would be good to automate?
In order to test trufflepig we will need to be able to create and populate a directory tree with dummy files to search through. This could possibly be from a structure outlined by a config file or class. Whatever solution is used needs to be easily extendible (e.g. to include a variety of file types) so that future functionality can be tested.
PyTest provides a built-in tmpdir fixture which creates a unique directory for the duration of a test. Some wrappers around this to create specific sub-directories and files would probably do the trick.
Directories will need to be created using something like os.mkdir. Files could be created using something like pathlib.Path.touch or just by opening and closing files, e.g. open(fname, 'a').close().
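One way to describe the structure declaratively is a nested dict, as in the sketch below. The spec format (a dict for a directory, None for an empty file), the name build_tree and the example file names are all assumptions for illustration.

```python
from pathlib import Path


def build_tree(root, spec):
    """Create the directories and empty files described by spec under root.

    spec maps each name to a nested dict (a sub-directory) or None (a file).
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for name, children in spec.items():
        target = root / name
        if isinstance(children, dict):
            build_tree(target, children)
        else:
            target.touch()
    return root


# Hypothetical layout: one file at the top level, one archive in a
# sub-directory, and one empty directory.
EXAMPLE_SPEC = {
    "report_2020-10-12.csv": None,
    "archive": {"old_report.tar.gz": None, "empty_dir": {}},
}
```

A fixture could call build_tree with the temporary directory pytest provides, and new file types can be covered by adding entries to the spec.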
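An example of a test run with a fixture test_dir to check that filesearchfunction returns something could look like:
    def test_filesearchfunction(test_dir):
        # test_dir is the fixture that provides the populated directory
        result = filesearchfunction(test_dir)
        # an empty result is falsy, so this fails if nothing was found
        assert len(result)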
Create:
Anything else that would be important to this project?