dazinator / dotnet.glob

A fast globbing library for .NET / .NETStandard applications. Outperforms Regex.

License: MIT License

C# 95.54% Batchfile 0.40% PowerShell 3.99% Shell 0.07%
glob glob-pattern globbing-library csharp

dotnet.glob's Issues

Defect: Spaces After Comma Doesn't Match

The following results seem to indicate a defect:

  • DotNet.Globbing.Glob.Parse("Stuff,*").IsMatch("Stuff, x"): true
  • DotNet.Globbing.Glob.Parse("Stuff *").IsMatch("Stuff x"): true
  • DotNet.Globbing.Glob.Parse("Stuff, *").IsMatch("Stuff, x"): false

Am I missing something?

Add back in net4 and net46 targets.

The current stable release supports net4, net4.5, net4.6 and netstandard 1.1.
After the latest dev changes, the current unstable NuGet package only supports net4.5 and netstandard 1.1.

I am going to add back in the other targets.

IndexOutOfRangeException

The following lines generate an IndexOutOfRangeException:

var glob = Glob.Parse("C:\\Bin\\*.pdb");
glob.Match("C:\\Bin\\.vs");

Glob object model support for **

Your globbing library is pretty nice. I was wondering if there’s a way to glob a pattern like this using the GlobBuilder: [a-zA-Z0-9]**

I can do it with string parsing, Glob.Parse(); but I seek high performance and was wondering if it would be possible to use the GlobBuilder to achieve the same.
new GlobBuilder().LetterInRange('a', 'z').Wildcard().Wildcard() -> [a-z]**

.Wildcard().Wildcard() doesn't seem to be equivalent to "**".

Also, another thing that’s problematic is being able to do, [a-zA-Z0-9]**.

Finally, I want to glob files with a certain extension, e.g. *.pdb, but this doesn't seem to work at the moment.

Single asterisk behaviour

I seem to get odd results when using the single asterisk wildcard *. For example, let's take the following string:

HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12

The pattern *ave 12 matches, as expected, but *ave*2 (which uses 2 asterisks) does not, and neither does Shock* 12 (which uses a single asterisk, but not as the first character in the pattern).

Is this the expected behaviour? If so, what's the rationale behind it?

Characters missing from list of allowable path characters

The readme says:

By default, when your glob pattern is parsed, DotNet.Glob will only allow literals which are valid for path / directory names. These are:

Any Letter (A-Z, a-z) or Digit
., (space), !, #, -, ;, =, @, ~, _, :

Maybe I'm misunderstanding this section, but on all of Windows, Linux and macOS, lots of other characters are valid in file system paths, such as:

  • Any printable Unicode character 你好!
  • { } [ ] ( ) + ; % ? *

Also, on Windows : is not valid.

endless loop in version 1.6.6

Try this code:

var localPattern = @"C:\sources\COMPILE*\MSVC120.DLL";
var localInput = @"C:\sources\COMPILE\ANTLR3.RUNTIME.DLL";

var glob = Glob.Parse(localPattern);
return glob.IsMatch(localInput);

This results in an endless loop in "WildcardDirectoryTokenEvaluator.IsMatch", with the following state:

currentPosition=51
maxPos=51
isMatch=false

// Match until maxPos is reached.
while (currentPosition <= maxPos)
{
    // Test at current position.
    isMatch = _subEvaluator.IsMatch(allChars, currentPosition, out newPosition);
    if (isMatch)
    {
        return isMatch;
    }

    // Iterate until we hit a separator or maxPos.
    while (currentPosition < maxPos)
    {
        currentPosition = currentPosition + 1;
        currentChar = allChars[currentPosition];
        if (currentChar == '/' || currentChar == '\\')
        {
            // Advance past the separator.
            currentPosition = currentPosition + 1;
            break;
        }
    }
}
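With the reported state (currentPosition equal to maxPos and no match), the outer condition stays true while the inner loop body never runs, so nothing advances. The following is a minimal sketch of one way to guarantee progress, not the library's actual fix. The class name, the `subIsMatch` delegate (standing in for `_subEvaluator.IsMatch`) and the choice to scan to the end of the string are assumptions made for illustration.

```csharp
using System;

// Hypothetical sketch, not DotNet.Glob source code.
public static class DirectoryWildcardSketch
{
    private static readonly char[] Separators = { '/', '\\' };

    // Tries the sub-evaluator at the start position and then after each
    // path separator. Every iteration either returns or moves forward.
    public static bool IsMatch(string allChars, int currentPosition,
                               Func<string, int, bool> subIsMatch)
    {
        while (currentPosition < allChars.Length)
        {
            if (subIsMatch(allChars, currentPosition))
            {
                return true;
            }

            // Find the next separator at or after the current position.
            int next = allChars.IndexOfAny(Separators, currentPosition);
            if (next < 0)
            {
                // No separator left, so no further segment can be tried.
                return false;
            }

            // Advance past the separator; this always increases the position.
            currentPosition = next + 1;
        }
        return false;
    }

    public static void Main()
    {
        // Stand-in sub-evaluator: the remainder must equal "c.txt".
        Func<string, int, bool> endsWithFile = (s, p) => s.Substring(p) == "c.txt";
        Console.WriteLine(IsMatch("a/b/c.txt", 0, endsWithFile)); // prints True
        Console.WriteLine(IsMatch("a/b/d.txt", 0, endsWithFile)); // prints False
    }
}
```

The sketch does not test the sub-evaluator against an empty remainder after a trailing separator; a real evaluator would need to decide how to treat that case.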

Is the * operator greedy or non-greedy?

In other words, does it make the longest possible or shortest possible match?

If it is greedy, is it possible to specify a non-greedy match (like *? in regex)?

Escaping glob patterns

Is escaping glob patterns supported?

For example, let's say I have a literal path like:

/my*files/more[stuff]/is-there-more?/

Is there a supported mechanism by which *, [, ] and ? can be escaped, such that they will be treated as literals instead of glob characters? For example, can I escape characters with a backslash?

/my\*files/more\[stuff\]/is-there-more\?/
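As an alternative to backslashes, a convention used by many glob dialects is to wrap each special character in a character class, so that `*` becomes `[*]`. The helper below is a hypothetical sketch of that convention; the class and method names are invented here, and the question above does not establish whether DotNet.Glob honors this form.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of DotNet.Glob.
public static class GlobEscapeSketch
{
    // Wraps *, ? and [ in square brackets so that a matcher following the
    // bracket-escaping convention treats them as literals.
    // A lone ']' is left as-is, since it has no special meaning outside a class.
    public static string Escape(string literal)
    {
        var sb = new StringBuilder(literal.Length + 8);
        foreach (char c in literal)
        {
            if (c == '*' || c == '?' || c == '[')
            {
                sb.Append('[').Append(c).Append(']');
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // prints /my[*]files/more[[]stuff]/is-there-more[?]/
        Console.WriteLine(Escape("/my*files/more[stuff]/is-there-more?/"));
    }
}
```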

The NuGet package has no strong name

I'm getting this error:
System.IO.FileLoadException: 'Could not load file or assembly 'DotNet.Glob, Version=2.0.1.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. A strongly-named assembly is required.
How can I use your package now? Or when will you fix it?

IsMatch throws IndexOutOfRangeException

If I run:

var file = "x";
var glob = "**/y";
Glob.Parse(glob).IsMatch(file);

The following exception is thrown by IsMatch:

Unhandled Exception: System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at DotNet.Globbing.Evaluation.WildcardDirectoryTokenEvaluator.IsMatch(String allChars, Int32 currentPosition, Int32& newPosition)
   at DotNet.Globbing.Evaluation.CompositeTokenEvaluator.IsMatch(String allChars, Int32 currentPosition, Int32& newPosition)
   at DotNet.Globbing.Glob.IsMatch(String subject)

It seems that the exception is thrown when the string to match has the same length as the part of the pattern that follows the **/.

Additional information:

λ dotnet --version
2.1.4

Path separator insensitive with wildcards

I'm having some trouble using wildcard globs in situations where I get mixed forward and backward slashes. (This is for a cross-platform vscode extension, which is a nightmare: vscode gives you wonderful things like //c/users.)

"**/gfx/*.gfx" seems to work fine on mixed slashes.

"**/gfx/**/*.gfx" only seems to work on paths with forward slashes.

"**\\gfx\\**\\*.gfx" only seems to work on paths with backwards slashes.

Is this working as designed, or am I missing something?
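One possible workaround, independent of how the library treats separators, is to normalize both the pattern and the path to a single separator before matching. This is a minimal sketch; the class name and the sample path are invented for illustration.

```csharp
using System;

// Hypothetical helper, not part of DotNet.Glob.
public static class PathSeparatorSketch
{
    // Rewrites every backslash as a forward slash.
    public static string Normalize(string pathOrPattern) =>
        pathOrPattern.Replace('\\', '/');

    public static void Main()
    {
        // Hypothetical mixed-separator path, for illustration only.
        Console.WriteLine(Normalize(@"C:\mods/gfx\icons/a.gfx")); // prints C:/mods/gfx/icons/a.gfx
        Console.WriteLine(Normalize(@"**\gfx\**\*.gfx"));         // prints **/gfx/**/*.gfx
    }
}
```

This only works when backslashes in the pattern are used purely as separators; a pattern that relies on a backslash for any other purpose would be altered.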

"C:\THIS_IS_A_DIR\**\somefile.txt" matches wrongly to "C:\THIS_IS_A_DIR\awesomefile.txt"

See the test "DotNet.Glob.Tests.GlobTests.Does_Not_Match".
The path should not match the pattern, but it does.

[Theory]
[InlineData( "C:\\THIS_IS_A_DIR\\**\\somefile.txt", "C:\\THIS_IS_A_DIR\\awesomefile.txt" )]
public void Does_Not_Match(string pattern, params string[] testStrings)
{
    var glob = Globbing.Glob.Parse(pattern);
    foreach (var testString in testStrings)
    {
        Assert.False(glob.IsMatch(testString));
    }
}

Glob Formatter

Write a formatter that can iterate a tokenised glob pattern and output the relevant glob string.

This will be useful when building up a glob programmatically.



Glob Builder

Create a glob builder so that globs can be built up fluently.

For example, to build the glob /foo?\\*[abc][!1-3].txt:

 var glob = new GlobBuilder()
                .PathSeperator()
                .Literal("foo")
                .AnyCharacter()
                .PathSeperator(PathSeperatorKind.BackwardSlash)
                .Wildcard()
                .OneOf('a', 'b', 'c')
                .NumberNotInRange('1', '3')
                .Literal(".txt")
                .ToGlob();

/DIR1/DIR2/file.txt won't match glob /DIR1/*/*

Code example:

Glob glob = Glob.Parse(@"/DIR1/*/*");
MatchInfo matchInfo = glob.Match(@"/DIR1/DIR2/file.txt");
Console.Out.WriteLine("matchInfo.Success = {0}", matchInfo.Success);

gives:
matchInfo.Success = False

More Performance Improvement Ideas and Heuristics

When parsing a glob pattern I can do the following:

  1. Calculate the minimum required char length that a string needs in order to match the pattern overall.

For example, this pattern would require a string at least 4 characters long in order to match:
**/*.txt

This would require 9:
*/[a-z][!1-9]/f.txt

This is because certain tokens require at least a single character in order to match, so you can sum those up when analysing the glob pattern and get the minimum length a string must have in order to have any possibility of matching a given set of tokens.

Some tokens (* and **) will match against 0 or many characters so they won't add any weight to the minimum required length.

With this information available, when matching strings against the Glob using Glob.IsMatch(somestring) I can allow the glob to fail much faster on certain strings using a simple string length check.

For example:

var glob = Glob.Parse("*some/fol?er/p*h/file.*");
glob.IsMatch("aaaaaaaasome/foo");

That can fail pretty much straight away, because the computed minimum length of a matching string is 20 chars, and the test string is only 16 chars long. I can fail this without even attempting to match any of the tokens.

I expect to use this new length information for the IsMatch() evaluation, where you just want a bool result quickly. The Match() method is different because it returns a more in-depth analysis of the match, including which tokens failed to match and why. For example, in the case of a string aaaaaaaasome/foo and a pattern *some/fol?er/p*h/file.* it might be important to know that "*some/" actually matches "aaaaaaaasome/" but that "fol" fails to match "foo", and the closest it came to matching was "fo". This kind of in-depth match analysis can only be returned if the match is actually attempted, which means not failing fast due to length checks. However, failing early is desirable when doing IsMatch() because a boolean result is all you want to know.

Once I have the minimum required length computed, I can also put in some improvements for the Wildcard (*) and WildcardDirectory (**) evaluators.

Those evaluators will then be able to restrict matching to the range of characters that still leaves at least the minimum length required for the remaining tokens to be matched.
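The minimum-length calculation described in this issue can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the class and method names are invented, only literals, ?, [...] classes, * and ** are handled, and a ** followed by a separator is assumed to contribute nothing, which is what the 4-character figure for **/*.txt implies.

```csharp
using System;

// Hypothetical helper, not part of DotNet.Glob.
public static class GlobMinLength
{
    // Returns the smallest length a string could have and still match.
    public static int Compute(string pattern)
    {
        int min = 0;
        int i = 0;
        while (i < pattern.Length)
        {
            char c = pattern[i];
            if (c == '*')
            {
                // * and ** can match zero characters, so they add no weight.
                bool isDirectoryWildcard = i + 1 < pattern.Length && pattern[i + 1] == '*';
                i += isDirectoryWildcard ? 2 : 1;

                // Assumption: the separator after ** is optional as well.
                if (isDirectoryWildcard && i < pattern.Length &&
                    (pattern[i] == '/' || pattern[i] == '\\'))
                {
                    i++;
                }
            }
            else if (c == '[')
            {
                // A character class such as [a-z] or [!1-9] matches exactly one character.
                int close = pattern.IndexOf(']', i + 1);
                if (close < 0)
                {
                    throw new FormatException("Unterminated '[' in pattern.");
                }
                min++;
                i = close + 1;
            }
            else
            {
                // A literal, a path separator or ? needs exactly one character.
                min++;
                i++;
            }
        }
        return min;
    }

    public static void Main()
    {
        Console.WriteLine(Compute("**/*.txt"));                // prints 4
        Console.WriteLine(Compute("*/[a-z][!1-9]/f.txt"));     // prints 9
        Console.WriteLine(Compute("*some/fol?er/p*h/file.*")); // prints 20

        // Fail fast: the candidate is shorter than the minimum length.
        string candidate = "aaaaaaaasome/foo"; // 16 characters
        Console.WriteLine(candidate.Length < Compute("*some/fol?er/p*h/file.*")); // prints True
    }
}
```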

Match vs IsMatch

The Match method differs from IsMatch in that, rather than returning a simple bool, it returns information about how a match progressed, i.e. which tokens matched at which positions of the string, which is useful if you need to analyze the match. However, its implementation is not consistent with the IsMatch implementation and it also has some bugs. My first choice is to make this method obsolete and then eventually remove it. If people want this method, I'll add a message to the obsolete directive asking for feedback on this issue, and if there is demand to keep it, I'll refactor it rather than remove it.

Case sensitivity option

I like the project; it works well for me so far. However, I am missing an option to make the glob case insensitive, to mimic how Windows treats paths. Currently I have to make both the glob pattern and the file names lower case to achieve this. It seems like this could easily be baked in.

Keep up the good work!

Glob.Match InvalidOperationException

Calling Glob.Match() on certain globs/input strings will produce an InvalidOperationException with the message "Index was outside the bounds of the array". Repro for v1.6.9:

Glob.Parse("*://*wikia.com/**").Match("chrome://extensions");

** Directory wildcard not matching correctly

At the moment /**/some.* doesn't match /some.txt due to the fact that the / is matching, and then the directory wildcard ** is matching from position 1 with subtokens that match /some.* against text from position 1 which is some.txt. /some.* doesn't match some.txt.

This all boils down to the fact that ** token needs to also know if it has a trailing / so that I can omit a / token for the trailing slash in its place. That way rather than the glob /**/ being tokenised into

  • / token
  • ** token
  • / token

it can be tokenised into just:

  • / token
  • ** token (with information to say the token has a trailing / character).

This will result in the trailing slash in /**/ being omitted as a token, so then /**/some.* will match /some.txt.

/**/some.* will not, however, match some.txt, as the first / still expects to match because it is a path separator token. I think this is OK, as to match "some.txt" in any directory you could use **/some.txt, which doesn't require a leading slash.
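The proposed tokenisation can be illustrated with a small sketch. This is not the library's tokenizer: the class name and the string-based token representation are invented for illustration, and only separators, *, ** and literal runs are recognised.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not DotNet.Glob source code.
public static class DirectoryWildcardTokenizerSketch
{
    private static bool IsSeparator(char c) => c == '/' || c == '\\';

    public static List<string> Tokenize(string pattern)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < pattern.Length)
        {
            char c = pattern[i];
            if (c == '*' && i + 1 < pattern.Length && pattern[i + 1] == '*')
            {
                i += 2;
                // Fold a trailing separator into the ** token instead of
                // emitting a separate separator token for it.
                bool hasTrailingSeparator = i < pattern.Length && IsSeparator(pattern[i]);
                if (hasTrailingSeparator)
                {
                    i++;
                }
                tokens.Add(hasTrailingSeparator ? "**(trailing separator)" : "**");
            }
            else if (IsSeparator(c))
            {
                tokens.Add("/");
                i++;
            }
            else if (c == '*')
            {
                tokens.Add("*");
                i++;
            }
            else
            {
                // Collect a run of literal characters.
                int start = i;
                while (i < pattern.Length && pattern[i] != '*' && !IsSeparator(pattern[i]))
                {
                    i++;
                }
                tokens.Add("literal:" + pattern.Substring(start, i - start));
            }
        }
        return tokens;
    }

    public static void Main()
    {
        // prints: / | **(trailing separator) | literal:some. | *
        Console.WriteLine(string.Join(" | ", Tokenize("/**/some.*")));
    }
}
```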

Benchmark Non Matches

I have some benchmarks that measure glob.IsMatch() for lots of successful matches.

However, I also need to benchmark glob.IsMatch() for unsuccessful matches, as DotNet.Glob should be highly efficient at evaluating an unsuccessful match and returning a result as fast as possible.

wrong match for "C:\name.ext" to glob pattern "C:\name\**"

given the following test code ...

var glob = DotNet.Globbing.Glob.Parse( @"C:\name\**" );
bool result = glob.IsMatch( @"C:\name.ext" );
result = glob.IsMatch( @"C:\name_longer.ext" );

... the result of IsMatch() is true in both cases. To my mind this is wrong. The result should be false. I am using DotNet.Glob-1.6.1 from nuget.org.

Extending globbing patterns

This is a new feature to add support for a set of extended globbing patterns - documented here:

https://www.linuxjournal.com/content/bash-extended-globbing

I see this as an opt-in feature - so I'll add another property on the options class, so you can opt-in like so:

GlobParseOptions.Default.Evaluation.EnableExtendedPatterns = true;

Once enabled, you can use the following additional patterns as supported by bash:

?(pattern-list) Matches zero or one occurrence of the given patterns
*(pattern-list) Matches zero or more occurrences of the given patterns
+(pattern-list) Matches one or more occurrences of the given patterns
@(pattern-list) Matches one of the given patterns
!(pattern-list) Matches anything except one of the given patterns

Here a pattern-list is a list of items separated by a vertical bar "|" (aka the pipe symbol).

For example, the following pattern would match all the JPEG and GIF files that start with either "ab" or "def":

+(ab|def)*+(.jpg|.gif)
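To make the semantics of that example concrete, here is a hand-translated regular expression that behaves the same way for simple file names. It is only an illustration using System.Text.RegularExpressions, not a proposal for how the feature would be implemented; the class name and the sample file names are invented.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustration only: a regex equivalent of +(ab|def)*+(.jpg|.gif).
public static class ExtGlobRegexDemo
{
    //   +(ab|def)     -> (?:ab|def)+       one or more of "ab" or "def"
    //   *             -> .*                any run of characters
    //   +(.jpg|.gif)  -> (?:\.jpg|\.gif)+  one or more of ".jpg" or ".gif"
    private static readonly Regex Pattern =
        new Regex(@"^(?:ab|def)+.*(?:\.jpg|\.gif)+$", RegexOptions.CultureInvariant);

    public static bool IsMatch(string fileName) => Pattern.IsMatch(fileName);

    public static void Main()
    {
        Console.WriteLine(IsMatch("abxyz.jpg")); // prints True
        Console.WriteLine(IsMatch("def.gif"));   // prints True
        Console.WriteLine(IsMatch("xyz.jpg"));   // prints False (wrong prefix)
        Console.WriteLine(IsMatch("ab.png"));    // prints False (wrong extension)
    }
}
```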

Glob Match Generator

To facilitate testing, create a generator that, given a glob, can generate random strings en masse that will match that glob.

Pattern **/ not working

The pattern "**/app*.js" for example should match when a path looks like dist/app.js or dist/app.a72ka8234.js. The issue is it is not evaluating the "**/" part of the glob pattern as the documentation only mentions "/**/" as a valid pattern. Any plan to implement this soon?

Spanification

With the recent 'spanification' of .NET Core, in particular the addition of the Span-based FileSystemEnumerable and System.IO.Path Span-based methods, I'm wondering if it would be possible to add Span-based methods to Glob? The point would be to avoid allocating strings when performing Glob-based matching.

Token parsing and AllowInvalidPathCharacters = true

Discovered whilst investigating #46

When setting the following:


options.Parsing.AllowInvalidPathCharacters = true;

and tokenising **/foo/

It causes the tokeniser to parse the literal foo as foo/. This causes a literal match on the path separator, which is problematic if mixed slashes are used.

Build failing - cake issue.

The build has started failing. It looks like this is because the version of the Cake NuGet package wasn't constrained, so it automatically rolled forward to a new release, and this has broken the Cake addins, which are incompatible with it. I need to lock it down to the previous version.

Defect: Patterns with Unsupported non-alphanumeric characters fail to match

It appears that patterns with escape sequences fail:

  • DotNet.Globbing.Glob.Parse("\"Stuff*").IsMatch("\"Stuff"): false
  • DotNet.Globbing.Glob.Parse("\0Stuff*").IsMatch("\0Stuff"): false
  • DotNet.Globbing.Glob.Parse("\nStuff*").IsMatch("\nStuff"): false
  • DotNet.Globbing.Glob.Parse("\r\nStuff*").IsMatch("\r\nStuff"): false

Infinite loop?

I'm using the latest Nuget package, 1.7.0-unstable0022, and one of my tests found an issue whereby the call to IsMatch never returns (I assume due to an infinite loop).

Pattern: C:\Test\**\*.txt
Test String: C:\Test\file.dat

This issue was not present in 1.7.0-unstable0018

Unexpected match with **

Hi,

I just found an unexpected match when fiddling around with your library. I am not sure if this is a bug, or if my understanding is lacking:

var glob = DotNet.Globbing.Glob.Parse("Bumpy/**/AssemblyInfo.cs");

// success - expected
Assert.IsTrue(glob.IsMatch("Bumpy/Properties/AssemblyInfo.cs"));

// failure - unexpected
Assert.IsFalse(glob.IsMatch("Bumpy.Test/Properties/AssemblyInfo.cs"));

match assembly version and nuget version

I noticed that all of the DLLs in the NuGet package have the version 1.0.0.0 in their Win32 version resources and the .NET assembly names. It would be nice if this were the same as the NuGet package version.
