bitranox / igittigitt
A spec-compliant gitignore parser for Python
License: MIT License
Git provides for reading default rules from the user's home directory (e.g. ~/.gitignore). This requires special treatment since ignore rules are interpreted relative to the directory in which the ignore file is located and not just the user's home directory.
Support for emulating this already exists in the optional base_dir argument to _parse_rule_file (although it is currently broken). I recommend that the base_dir parameter be restored to working order, and that _parse_rule_file be promoted to a public method (i.e. just remove the leading underscore '_').
Fixing base_dir can be accomplished with a single line of code:
path_base_dir = base_dir if base_dir is not None else path_rule_file.parent
Users can then effectively append multiple default ignore files as if they were all part of a single ignore file in the target directory.
igittigitt.parse_rule_file('~/.myignore', base_dir=target_dir)
igittigitt.parse_rule_files(target_dir, '.myignore')
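The effect of the proposed base_dir fallback can be sketched with the standard library alone. This is only an illustration, not igittigitt's implementation: load_patterns and is_ignored are hypothetical helpers, and the matching is deliberately simplified to file-name patterns.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import List, Optional, Tuple


def load_patterns(rule_file: Path, base_dir: Optional[Path] = None) -> Tuple[Path, List[str]]:
    """Hypothetical helper: read patterns and decide which directory they apply to."""
    # Same fallback as the one-line fix above: default to the rule file's directory.
    base = base_dir if base_dir is not None else rule_file.parent
    patterns = [
        line.strip()
        for line in rule_file.read_text().splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return base, patterns


def is_ignored(path: Path, base: Path, patterns: List[str]) -> bool:
    """Simplified matching: only paths below `base` are tested, by file name."""
    try:
        rel = path.relative_to(base)
    except ValueError:
        return False  # the path is outside the directory the rules apply to
    return any(fnmatchcase(rel.name, pattern) for pattern in patterns)
```

With base_dir set to the target directory, rules read from a file in the home directory apply to the target directory; without it, they apply only below the directory containing the rule file.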
If behavior is intended, I need support, if not, it might be a bug
Currently files in subdirectories of ignored directories are not matched.
rule in .gitignore: src_old/
file not matched: src_old/foo/bar.py
.gitignore
src_old/
.idea/
code
import igittigitt
import pprint
import pathlib
current_path = pathlib.Path(__file__).parent.absolute()
gitignore_parser = igittigitt.IgnoreParser()
gitignore_parser.parse_rule_files(current_path)
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(gitignore_parser.rules)
print(gitignore_parser.match(current_path / 'src_old'))
print(gitignore_parser.match(current_path / 'src_old' / 'foo.py'))
print(gitignore_parser.match(current_path / 'src_old' / 'foo'))
print(gitignore_parser.match(current_path / 'src_old' / 'foo' / 'bar.py'))
output
[ IgnoreRule(pattern_fnmatch='/home/md/PycharmProjects/test/**/.idea', pattern_original='.idea/', is_negation_rule=False, source_file=PosixPath('/home/md/PycharmProjects/test/.gitignore'), source_line_number=2),
IgnoreRule(pattern_fnmatch='/home/md/PycharmProjects/test/**/.idea/*', pattern_original='.idea/', is_negation_rule=False, source_file=PosixPath('/home/md/PycharmProjects/test/.gitignore'), source_line_number=2),
IgnoreRule(pattern_fnmatch='/home/md/PycharmProjects/test/**/src_old', pattern_original='src_old/', is_negation_rule=False, source_file=PosixPath('/home/md/PycharmProjects/test/.gitignore'), source_line_number=1),
IgnoreRule(pattern_fnmatch='/home/md/PycharmProjects/test/**/src_old/*', pattern_original='src_old/', is_negation_rule=False, source_file=PosixPath('/home/md/PycharmProjects/test/.gitignore'), source_line_number=1),
IgnoreRule(pattern_fnmatch='/home/md/PycharmProjects/test/.idea', pattern_original='.idea/', is_negation_rule=False, source_file=PosixPath('/home/md/PycharmProjects/test/.gitignore'), source_line_number=2),
IgnoreRule(pattern_fnmatch='/home/md/PycharmProjects/test/.idea/*', pattern_original='.idea/', is_negation_rule=False, source_file=PosixPath('/home/md/PycharmProjects/test/.gitignore'), source_line_number=2),
IgnoreRule(pattern_fnmatch='/home/md/PycharmProjects/test/src_old', pattern_original='src_old/', is_negation_rule=False, source_file=PosixPath('/home/md/PycharmProjects/test/.gitignore'), source_line_number=1),
IgnoreRule(pattern_fnmatch='/home/md/PycharmProjects/test/src_old/*', pattern_original='src_old/', is_negation_rule=False, source_file=PosixPath('/home/md/PycharmProjects/test/.gitignore'), source_line_number=1)]
True
True
True
False
Every subdirectory of an ignored directory, and every file inside those subdirectories, should be matched.
In 1.0.6, all descendants of an ignored directory were matched.
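The expected behavior can be expressed as a check on a path's ancestors rather than on the path itself. The following is a minimal sketch under that assumption; has_ignored_ancestor is a hypothetical name, and the rule set is reduced to plain directory names such as src_old.

```python
from pathlib import PurePosixPath
from typing import Set


def has_ignored_ancestor(rel_path: str, ignored_dirs: Set[str]) -> bool:
    """Return True if any directory above the last path component is ignored."""
    parts = PurePosixPath(rel_path).parts
    # parts[:-1] are the ancestor directories; the final component may be a
    # file or directory and would be tested against the rules separately.
    return any(part in ignored_dirs for part in parts[:-1])
```

Under this sketch, src_old/foo/bar.py is ignored at any depth because src_old is one of its ancestors.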
**I'm submitting a ... **
What is the current behavior?
parse_rule_files relies on pathlib's glob method, which is affected by a bug that prevents it from following symlinks. As a result, .gitignore files under symlinked directories are ignored.
Place a .gitignore file in a symlinked directory and use parse_rule_files on the parent folder.
Following the solution in the Stack Overflow question linked below, it would be best to rely on glob.glob, which follows symlinks correctly.
What is the motivation / use case for changing the behavior?
Please tell us about your environment:
https://stackoverflow.com/questions/46529760/getting-glob-to-follow-symlinks-in-python
Proposed solution:
import os
from glob import glob

# glob.glob with recursive=True descends into symlinked directories
rule_files = sorted(
    glob(f"{os.path.abspath(base_dir)}/**/.gitignore", recursive=True)
)
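The proposal can be wrapped in a small function and tried against a directory tree that contains a symlinked directory. This is a sketch, not the library's code: find_ignore_files is a hypothetical name, and glob.escape is added here only so that special characters in the base path are treated literally.

```python
import glob
import os
from typing import List


def find_ignore_files(base_dir: str, name: str = ".gitignore") -> List[str]:
    """Recursively collect ignore files, including those under symlinked directories."""
    # Escape the base path so characters like '[' in it are not read as glob syntax.
    base = glob.escape(os.path.abspath(base_dir))
    pattern = os.path.join(base, "**", name)
    return sorted(glob.glob(pattern, recursive=True))
```

On a POSIX system, if project/linked is a symlink to a directory outside project that contains a .gitignore, find_ignore_files("project") should list project/linked/.gitignore.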
The CLI isn't very useful at the moment.
It would be useful to be able to specify an ignore file and list files like git ls-files, and even better to be able to pass one or more globs as arguments that the program expands and then filters, removing anything that should be ignored.
This is a follow up to issue #16. The glob issue is now resolved, however, there are still problems that I encountered.
pathlib.Path().resolve() follows symlinks
The match() method resolve()s the target before matching it against the ignore rules. That means that you may not be able to match against a file under a symlinked directory. Consider the following example:
prey/
if-you-can.txt
catch_me/ -> prey/
.gitignore
where catch_me is a symlink to prey. The .gitignore contains the following:
catch_me
Now, match('catch_me') will fail because the path is resolved to prey before it is matched against the rule. A solution would be to avoid resolving, for example:
def match(self, file_path) -> bool:
    # abspath normalizes the path without following symlinks
    str_file_path = os.path.abspath(file_path)
    is_file = os.path.isfile(str_file_path)
    match = self._match_rules(str_file_path, is_file)
    if match:
        match = self._match_negation_rules(str_file_path)
    return match
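The difference between the two normalizations is easy to observe in isolation. The helper below is a hypothetical demonstration, not part of igittigitt; it returns the final path component as seen by os.path.abspath and by Path.resolve().

```python
import os
from pathlib import Path
from typing import Tuple


def names_seen(path) -> Tuple[str, str]:
    """Return (name kept by os.path.abspath, name after Path.resolve())."""
    lexical = Path(os.path.abspath(path)).name   # symlink name is preserved
    resolved = Path(path).resolve().name         # symlink is replaced by its target
    return lexical, resolved
```

For the catch_me -> prey layout above, the first element stays catch_me while the second becomes prey, which is why a rule written against catch_me can only match the unresolved path.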
In version 3 and later, globmatch does not follow symlinks unless the FOLLOW flag is set, e.g.
if wcmatch.glob.globmatch(
str_file_path,
[self.last_matching_rule.pattern_glob],
flags=wcmatch.glob.DOTGLOB
| wcmatch.glob.GLOBSTAR
| wcmatch.glob.FOLLOW,
I believe the FOLLOW flag should be included.
Currently, if the base directory provided to an IgnoreParser method is a symlink or a descendant of a symlink, the parser will not match any files provided to it.
MVCE:
#!/usr/bin/env python3
from pathlib import Path
from igittigitt import IgnoreParser
dir01 = Path("/tmp/igittigitt01")
dir01.mkdir()
dir02 = Path("/tmp/igittigitt02")
dir02.symlink_to(dir01)
(dir02 / "file.txt").touch()
parser = IgnoreParser()
parser.add_rule("*.txt", dir02)
print(parser.match(dir02 / "file.txt")) # This prints "False", but it should print "True"
A rule added for a directory path p should be honored for any filepath that starts with p.
I believe this bug is due to the use of the pathlib.Path.resolve() method, which resolves symlinks.
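The "starts with p" expectation can be checked purely lexically, without touching symlinks. A minimal sketch, assuming a hypothetical helper named is_within that is not part of igittigitt:

```python
import os


def is_within(path, base_dir) -> bool:
    """Lexical containment test: does `path` lie at or below `base_dir`?"""
    # abspath normalizes '.' and '..' but does not resolve symlinks.
    path_abs = os.path.abspath(path)
    base_abs = os.path.abspath(base_dir)
    try:
        return os.path.commonpath([path_abs, base_abs]) == base_abs
    except ValueError:
        # Raised when the paths cannot be compared, e.g. different Windows drives.
        return False
```

With the directories from the MVCE, is_within(dir02 / "file.txt", dir02) holds, whereas the resolved form of the same file lives under dir01 and no longer starts with dir02.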
**I'm submitting a ... **
What is the current behavior?
Based on gitignore specification it is not possible to re-include a file if a parent directory of that file is excluded.
https://git-scm.com/docs/gitignore#_pattern_format
I am trying to exclude everything except a specific directory, based on this example from the documentation:
Example to exclude everything except a specific directory foo/bar (note the /* - without the slash, the wildcard would also exclude everything within foo/bar):
$ cat .gitignore
# exclude everything except directory foo/bar
/*
!/foo
/foo/*
!/foo/bar
import igittigitt
"""
Based on https://git-scm.com/docs/gitignore#_examples
Example to exclude everything except a specific directory foo/bar
(note the /* - without the slash, the wildcard would also exclude everything within foo/bar):
$ cat .gitignore
# exclude everything except directory foo/bar
/*
!/foo
/foo/*
!/foo/bar
"""
gitignore = igittigitt.IgnoreParser()
base_path = "/example/"
gitignore.add_rule("/*", base_path)
gitignore.add_rule("!/foo", base_path)
gitignore.add_rule("/foo/*", base_path)
gitignore.add_rule("!/foo/bar", base_path)
assert gitignore.match(base_path + "foo/bar/file.txt") == False
assert gitignore.match(base_path + "foo/other/tile.txt") == True #failed on current version
What is the expected behavior?
Based on the above example, gitignore.match(base_path + "foo/other/tile.txt") should return True.
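The expected result follows from two gitignore rules: the last matching pattern decides, and a path cannot be re-included once one of its parent directories is excluded. The sketch below illustrates that evaluation order for anchored patterns only. It is a simplified model, not igittigitt's algorithm or a full gitignore implementation: it ignores unanchored patterns, `**`, and trailing slashes, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def _matches(pattern: str, rel_path: str) -> bool:
    """Match an anchored pattern segment by segment, so '*' never crosses '/'."""
    pat_parts = pattern.strip("/").split("/")
    path_parts = rel_path.strip("/").split("/")
    return len(pat_parts) == len(path_parts) and all(
        fnmatchcase(part, pat) for pat, part in zip(pat_parts, path_parts)
    )


def is_ignored(rel_path: str, rules: List[str]) -> bool:
    """Walk from the top-level directory down; stop at the first excluded level."""
    parts = rel_path.strip("/").split("/")
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        ignored = False
        for rule in rules:  # later rules override earlier ones
            negated = rule.startswith("!")
            pattern = rule[1:] if negated else rule
            if _matches(pattern, prefix):
                ignored = not negated
        if ignored:
            return True  # an excluded parent cannot be re-included further down
    return False
```

With the rules /*, !/foo, /foo/*, !/foo/bar, this model keeps foo/bar/file.txt and ignores foo/other/tile.txt, because foo/other is excluded by /foo/* and never re-included.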
Please tell us about your environment:
**I'm submitting a ... **
Do you want to request a feature or report a bug?
None
What is the current behavior?
**If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem
Not a bug
What is the expected behavior?
What is the motivation / use case for changing the behavior?
Please tell us about your environment:
I want to process all files and subdirectories in a given directory and match them against an ignore file.
For example:
My project is located in D:\project
The ignore file is located at D:\project\.ignore
and contains *.txt
I am trying to pass an arbitrary system-wide file or directory path, such as c:\myfile.txt, to check whether it matches.
No files or directories are matched if the base path is not the same.
Is it even possible to do something like this?
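One reason such paths do not match is that ignore rules are interpreted relative to a base directory, and c:\myfile.txt has no relative form under D:\project. A possible workaround, shown here only as a sketch with hypothetical helper names, is to test simple file-name patterns such as *.txt against the file name alone. PureWindowsPath is used so the example behaves the same on any operating system.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath
from typing import Iterable


def is_inside(path: str, base_dir: str) -> bool:
    """True if `path` can be expressed relative to `base_dir`."""
    try:
        PureWindowsPath(path).relative_to(PureWindowsPath(base_dir))
        return True
    except ValueError:
        return False


def matches_by_name(path: str, patterns: Iterable[str]) -> bool:
    """Match only the final path component against simple name patterns."""
    name = PureWindowsPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

This only makes sense for patterns without a slash; anchored or directory-specific rules still need a path below the base directory.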
**I'm submitting a ... **
Do you want to request a feature or report a bug?
Report a bug.
What is the current behavior?
The example in the README appears to be incorrect or outdated. The parse_rule_file method does not appear to exist. If I attempt to use it, Python raises an error: AttributeError: 'IgnoreParser' object has no attribute 'parse_rule_file'.
**If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem
Running Python in a directory with a .gitignore file:
>>> import igittigitt
>>> parser = igittigitt.IgnoreParser()
>>> parser.parse_rule_file(".gitignore")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'IgnoreParser' object has no attribute 'parse_rule_file'
I could not find a parse_rule_file function in the source code. Guessing parse_rule_files is the intended function to use:
>>> parser.parse_rule_files(pathlib.Path('/home/bitranox/project/'))
What is the motivation / use case for changing the behavior?
I spent way too long trying to get this nonexistent function working, and I'd rather no one else ever has to go through that pain again.
Please tell us about your environment: