
microsoft / sarif-pattern-matcher


Quality domain agnostic regular expression pattern matcher that persists results to SARIF

License: MIT License

Batchfile 0.01% C# 95.29% C 0.18% C++ 1.05% PowerShell 3.47% Python 0.01%

sarif-pattern-matcher's Introduction

sarif-pattern-matcher



NuGet packages

The following packages are published from this repository:

  • Sarif.Pattern.Matcher
  • Sarif.Pattern.Matcher.Cli
  • Sarif.Pattern.Matcher.Sdk
  • Sarif.Pattern.Matcher.Security
  • RE2.Managed
  • Strings.Interop

Getting started

How To Contribute

sarif-pattern-matcher is accepting contributions. If you've submitted a PR for an existing issue, please post a comment in the issue to avoid duplication of effort. See our CONTRIBUTING file for more information - it also contains guidelines for how to submit a PR.

License

sarif-pattern-matcher is licensed under the MIT license.

sarif-pattern-matcher's People

Contributors

bpendragon, cfaucon, dependabot[bot], eddynaka, fbeaty4, hulonjenkins, jameswinkler, lingzhou-gh, marmegh, michaelcfanning, schlaman-ms, shaopeng-gh, suvamm, v-kevinvu, virtualvivek, yongyan-gh


sarif-pattern-matcher's Issues

We shouldn't flatten all files to .spam folder

If we do this, we can't easily support plug-ins with validator files that have different dependencies.

We should probably have a top-level directory with the plugin name:

.spam\Security

Copy json files from Security to solution folder

Idea: when we build the solution, our JSON files inside the Security project should be copied to a folder called .spam where the solution exists.

Suggestion: use MSBuild tasks to accomplish this.

GoogleGCMServiceAccountValidator casing incorrect and fingerprint is wrong

@eddynaka , @jameswinkler , this is a small fix but high priority

The correct casing of this rule name is GoogleGcmServiceAccountValidator. In .NET, acronyms longer than two letters are Pascal-cased, so you never write more than two consecutive upper-case letters.

Also, the fingerprint for this check is broken. Resource is not the right property to use. A 'resource' is a sub-component of a container. So, a database is a resource of a database server.

This rule in general looks weak. What's the insecurity here? If we can't detect an exposure of interest, that can be validated, please just consider deleting this check for now.

[Versioning] Check how to create versioning increasing number

Today, we use Nerdbank.GitVersioning to generate version numbers.

But when we publish with it, the commit ID is appended to the version, making it hard to check whether you are using the latest release.

Idea: since we are doing pre-releases, we don't want to create a new branch (for example, v1.3.2), update the files, and release.

Analyzing Google API keys with referer/APK SHA1 cert restrictions should return pass results

Results such as the following aren't necessarily warnings, they may reflect proper security applied to an API key. Generally, it's best not to expose a key but this can be difficult in some web site scenarios or when embedding a key in a mobile app (even in this latter case, though, Google recommends encrypting or obfuscating).

For these use cases, an API key should be restricted, for example, by applying a referer filter or enforcing a check against the mobile app identity. When our validator detects these conditions today, we fire warnings but arguably, these are 'pass' conditions! :)

e:\repros\test.txt(3,1-40): warning SEC101/003: 'test.txt' contains an apparent Google API key, the validity of which could not be determined by runtime analysis (an unexpected exception was caught attempting to validate api key: RequestDenied: API keys with referer restrictions cannot be used with this API).

e:\repros\test.txt(6,1-40): warning SEC101/003: 'test.txt' contains an apparent Google API key, the validity of which could not be determined by runtime analysis (an unexpected exception was caught attempting to validate api key: RequestDenied: This IP, site or mobile application is not authorized to use this API key. Request received from IP address 2601:600:877f:8c60:3033:d967:d038:6ee3, with empty referer).
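One way to treat these responses as 'pass' conditions is to look for the restriction text in the error message. The following is only a sketch: the `KeyState` enum, the `GoogleApiKeyTriage` class, and the `Classify` method are hypothetical names, not part of the existing validator, and the marker strings are copied from the two messages quoted above.

```csharp
using System;

// Hypothetical result states; the real validator's states may differ.
public enum KeyState { Unknown, RestrictedPass }

public static class GoogleApiKeyTriage
{
    // Substrings taken from the two RequestDenied messages quoted above.
    private static readonly string[] RestrictionMarkers =
    {
        "API keys with referer restrictions cannot be used with this API",
        "is not authorized to use this API key",
    };

    // Returns RestrictedPass when the denial text indicates that a key
    // restriction is in force; otherwise returns Unknown.
    public static KeyState Classify(string errorMessage)
    {
        if (string.IsNullOrEmpty(errorMessage))
        {
            return KeyState.Unknown;
        }

        foreach (string marker in RestrictionMarkers)
        {
            if (errorMessage.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return KeyState.RestrictedPass;
            }
        }

        return KeyState.Unknown;
    }
}
```

Matching on message text is brittle if the service changes its wording, so a real implementation would probably prefer a structured error code where one is available.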

Merge SqlCredential and PSCredential

These two validators are almost identical in test cases and code. The only real difference is the names, which can easily be merged into a single regex.

Merge the two validators.

Complete and verify dynamic validation for CloudantValidator

While dynamic validation exists for CloudantValidator, we've never verified its accuracy. Testing against a free database https://64de90ff-4b11-4141-b2da-71d8013703be-bluemix.cloudant.com/dashboard.html always yields a 200/successful response, regardless of whether credentials are included in the request. It may be that free accounts can't be secured like paid accounts.

If we see that scanning against real data produces successful validation, we can close this work item.

If not, we need to eventually revisit this and complete the implementation. Perhaps the IBM documentation will be improved by then.

Useful links:

Review Sensitive File should return only one result

Today, if we run our test for ReviewSensitiveFile, the expected SARIF output contains many results, because validation runs many times in the method RunMatchExpression.

What I would expect: if we analyze a file that is itself the issue, we report a single result pointing to the file.

What happens today: every character inside the file is analyzed.

cc: @michaelcfanning

New RE2 wrapper todos

  • Integrate with BuildRegex.
  • Use String8 instead of StringUtf8.
  • Consider refactoring names.
  • Implement x86 support.
  • Error handling, especially when compiling a pattern.
  • Test x86 support.
  • Ensure that Matches can be used concurrently.
    • Is Encoding.UTF8.GetBytes concurrency-safe?

Verify checksum for new GitHub PAT

The new GitHub PAT described here:

https://github.blog/2021-04-05-behind-githubs-new-authentication-token-formats/

... comes with a checksum in the last 6 characters. Use this in static validation to reduce false positives.

We may be able to calculate CRC32 using existing code in the internal repo, here:
\src\Plugins\Security.Internal\SEC101_102.AdoPatValidator.cs

This library may be useful:

ICSharpCode.SharpZipLib.Checksum;

Search Nuget for "SharpZipLib"

Other useful code:

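using System.Text;
using ICSharpCode.SharpZipLib.Checksum;

// 'pat' is the candidate token text; 'checksum' is its expected trailing checksum.
var crc32 = new Crc32();
byte[] byteArray = Encoding.ASCII.GetBytes(pat);
crc32.Update(byteArray);

// Left-pad the computed value to six characters before comparing.
// Note: this compares the decimal CRC32 value. The GitHub post describes a
// Base62-encoded checksum, so the encoding step must change for GitHub PATs.
string paddedIntendedChecksum = crc32.Value.ToString().PadLeft(6, '0');

if (paddedIntendedChecksum != checksum)
{
    return ValidationState.NoMatch;
}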
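For the GitHub format specifically, the linked post describes a CRC32 of the token that is Base62-encoded and padded with leading zeros to six characters. The sketch below illustrates that static check without external packages. Several details are assumptions rather than confirmed facts: the Base62 alphabet ordering, the layout `prefix_body` followed by six checksum characters, and the choice to checksum only the body. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class GitHubTokenChecksum
{
    // Assumed Base62 alphabet ordering (digits, upper-case, lower-case).
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Standard CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit.
    public static uint Crc32(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int i = 0; i < 8; i++)
            {
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
            }
        }

        return ~crc;
    }

    // Base62-encodes the value and left-pads with '0' to six characters.
    public static string Encode(uint value)
    {
        var sb = new StringBuilder();
        do
        {
            sb.Insert(0, Alphabet[(int)(value % 62)]);
            value /= 62;
        }
        while (value > 0);

        return sb.ToString().PadLeft(6, '0');
    }

    // Assumes token = prefix + '_' + body + six-character checksum,
    // and that the checksum covers only the body.
    public static bool HasValidChecksum(string token)
    {
        int underscore = token.IndexOf('_');
        if (underscore < 0 || token.Length - underscore - 1 <= 6)
        {
            return false;
        }

        string body = token.Substring(underscore + 1, token.Length - underscore - 7);
        string checksum = token.Substring(token.Length - 6);
        return Encode(Crc32(Encoding.ASCII.GetBytes(body))) == checksum;
    }
}
```

A static validator could return ValidationState.NoMatch when `HasValidChecksum` is false. The assumptions above should be confirmed against a known-good token before relying on this to suppress results.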

Consider enabling JSON files by default

Talking with @cfaucon: when we analyze data from Cosmos, the data is JSON-based. With that in mind, we would need every rule to accept JSON files.

What we propose:

Option 1:

  • create a flag in the match expression properties that would enable JSON by default and, if the rule does not apply for JSON files, we would set it to false, for example.

Option 2:

  • file vs content detection, for example, pfx vs json.

@michaelcfanning , what do you think?
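As a rough illustration of the content-detection half of Option 2, the sketch below decides whether text is JSON by parsing it rather than by looking at the file extension. `ContentSniffer` and `LooksLikeJson` are hypothetical names, and the sketch assumes System.Text.Json is available (it ships with .NET Core 3.0 and later). It is not how the matcher currently selects rules.

```csharp
using System.Text.Json;

public static class ContentSniffer
{
    // Returns true when the text parses as a JSON object or array,
    // regardless of the file's extension.
    public static bool LooksLikeJson(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return false;
        }

        try
        {
            using JsonDocument doc = JsonDocument.Parse(text);
            JsonValueKind kind = doc.RootElement.ValueKind;
            return kind == JsonValueKind.Object || kind == JsonValueKind.Array;
        }
        catch (JsonException)
        {
            // Not well-formed JSON.
            return false;
        }
    }
}
```

Parsing a whole file just to classify it can be expensive for large inputs, so a cheaper first-character check before the full parse may be worth considering.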

Support RE2 memory allocation

Just to let you know, RE2 has an option where we can configure the size of the memory to use.

Right now, it’s a fixed number, but in specific cases, we could change that to something bigger.

😊

Improve NugetCredentials validator/regex

A config file can contain an arbitrary number of hosts, and it can also specify an arbitrary number of user credentials!

So this helper should return host * secrets fingerprint candidates.

https://docs.microsoft.com/en-us/nuget/reference/nuget-config-file

<packageSourceCredentials>
    <Contoso>
        <add key="Username" value="[email protected]" />
        <add key="Password" value="..." />
        <add key="ValidAuthenticationTypes" value="basic" />
    </Contoso>
    <Test_x0020_Source>
        <add key="Username" value="user" />
        <add key="ClearTextPassword" value="hal+9ooo_da!sY" />
        <add key="ValidAuthenticationTypes" value="basic, negotiate" />
    </Test_x0020_Source>
</packageSourceCredentials>
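The host * secrets pairing could be sketched as follows. This is an illustration only: `NuGetConfigCandidates` and `Enumerate` are hypothetical names, and the sketch assumes hosts come from the `value` attribute of `<add>` entries under `<packageSources>`, with credentials taken from each child of `<packageSourceCredentials>` as in the sample above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NuGetConfigCandidates
{
    // Returns every (host, user, password) combination found in the config text.
    public static List<(string Host, string User, string Password)> Enumerate(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;

        // Hosts: the 'value' attribute of each <add> under <packageSources>.
        List<string> hosts = root.Elements("packageSources")
            .Elements("add")
            .Select(e => (string)e.Attribute("value"))
            .Where(v => !string.IsNullOrEmpty(v))
            .ToList();

        // Credentials: one entry per child element of <packageSourceCredentials>.
        var credentials = new List<(string User, string Password)>();
        foreach (XElement source in root.Elements("packageSourceCredentials").Elements())
        {
            string user = Value(source, "Username");

            // Prefer the clear-text form; 'Password' holds an encrypted value.
            string password = Value(source, "ClearTextPassword") ?? Value(source, "Password");
            if (user != null && password != null)
            {
                credentials.Add((user, password));
            }
        }

        // Cross product: every host paired with every credential.
        return hosts
            .SelectMany(h => credentials.Select(c => (Host: h, User: c.User, Password: c.Password)))
            .ToList();
    }

    // Reads the 'value' of the <add> child whose 'key' matches, or null.
    private static string Value(XElement parent, string key) =>
        parent.Elements("add")
            .Where(e => (string)e.Attribute("key") == key)
            .Select(e => (string)e.Attribute("value"))
            .FirstOrDefault();
}
```

With two package sources and the two credential blocks shown above, this would yield four candidates. Pairing only the source whose name matches the credential element would be narrower, but the issue asks for the full product.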

Complete Google Service Account Key Dynamic Validation

The Google Service Account Key validator attempts to extract and validate application oauth credentials. This is tricky as it requires requesting an access token without knowing the scopes, and with many redirects involved in the flow. Some code is started in https://github.com/microsoft/sarif-pattern-matcher/tree/users/v-jwinkler/GoogleServiceAccountKeyValidator_DynamicValidationAndUnitTests. The code using GoogleWebAuthorizationBroker comes closest, producing output messages like "invalid client id" or "client id deleted", but unfortunately these messages are only visible in a browser that unit tests open. The unit tests themselves hang.

Determine if there's any way to programmatically read the results from the browser, simulate the flow, or find some other technology/library to verify the client id and secret.

Improving Documentation

I am facing issues building the solution. Running BuildAndTest.cmd from PowerShell throws a number of errors that seem to point to missing dependencies. Could you kindly enumerate the project's dependencies in the documentation? Thanks!
