A set of Python command line tools for working with SARIF files produced by code analysis tools

License: MIT License

SARIF Tools

A set of command line tools and Python library for working with SARIF files.

Read more about the SARIF format here: sarifweb.azurewebsites.net.

Installation

Prerequisites

You need Python 3.8 or later installed. Get it from python.org. This document assumes that the python command runs that version.

Installing on Windows

Open a user command prompt and type:

pip install sarif-tools

Check for a warning such as the following:

WARNING: The script sarif.exe is installed in 'C:\tools\Python38\Scripts' which is not on PATH.

Go into Windows Settings and search for "env" (Edit environment variables for your account) and add the missing path to your PATH variable. You'll need to open a new terminal or reboot, and then you can type sarif --version at the command prompt.

To install system-wide for all users, use an Administrator command prompt instead, if you are comfortable with the security risks.

Installing on Linux or Mac

pip install sarif-tools

Check for a warning such as the following:

WARNING: The script sarif is installed in '/home/XYZ/.local/bin' which is not on PATH.

Add the missing path to your PATH. How to do that varies by Linux flavour, but editing ~/.profile is often a good approach. Then after opening a new terminal or running source ~/.profile, you should be able to type sarif --version at the command prompt.

To install system-wide, use sudo pip install. Be aware that this is discouraged from a security perspective.

Testing the installation

After installing using pip, you should then be able to run:

sarif --version

Troubleshooting installation

This section has suggestions in case the sarif command is not available after installation.

A launcher called sarif or sarif.exe is created in Python's Scripts directory. The Scripts directory needs to be in the PATH environment variable for you to be able to type sarif at the command prompt; this is most likely the case if pip is run as a super-user when installing (e.g. Administrator Command Prompt on Windows, or using sudo on Linux). If the SARIF tools are installed for the current user only, adding the user's Scripts directory to the current user's PATH variable is the best approach. Search online for how to do that on your system.

If the Scripts directory is not in the PATH, then you can type python -m sarif instead of sarif to run the tool.

Confusion can arise when the python and pip commands on the PATH are from different installations, or the python installation on the super-user's PATH is different from the python command on the normal user's path. On Windows, you can use where python and where pip in normal CMD and Admin CMD to see which installations are in use; on Linux, it's which python and which pip with and without sudo.
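
The checks above can also be made from inside Python itself, which sidesteps the question of which python is on the PATH. The following is a minimal sketch using only the standard library; the function name describe_install is hypothetical, and the scripts directory reported is the one for the interpreter's default install scheme (a per-user install may use a different directory).

```python
import shutil
import sys
import sysconfig


def describe_install():
    """Collect the paths that matter when the `sarif` command is not found."""
    return {
        # The interpreter that is running this script.
        "python": sys.executable,
        # Where launchers are placed for this interpreter's default scheme.
        "scripts_dir": sysconfig.get_path("scripts"),
        # Full path of the launcher if it is on PATH, otherwise None.
        "sarif_on_path": shutil.which("sarif"),
    }


if __name__ == "__main__":
    for name, value in describe_install().items():
        print(f"{name}: {value}")
```

If sarif_on_path is None but the launcher exists in scripts_dir, adding that directory to PATH is likely the fix.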

Command Line Usage

usage: sarif [-h] [--version] [--debug] [--check {error,warning,note}] {blame,codeclimate,copy,csv,diff,emacs,html,info,ls,summary,trend,usage,word} ...

Process sets of SARIF files

positional arguments:
  {blame,codeclimate,copy,csv,diff,emacs,html,info,ls,summary,trend,usage,word}
                        command

optional arguments:
  -h, --help            show this help message and exit
  --version, -v         show program's version number and exit
  --debug               Print information useful for debugging
  --check {error,warning,note}, -x {error,warning,note}
                        Exit with error code if there are any issues of the specified level (or for diff, an increase in issues at that level).

commands:
blame        Enhance SARIF file with information from `git blame`
codeclimate  Write a JSON representation in Code Climate format of SARIF file(s) for viewing as a Code Quality report in GitLab UI
copy         Write a new SARIF file containing optionally-filtered data from other SARIF file(s)
csv          Write a CSV file listing the issues from the SARIF files(s) specified
diff         Find the difference between two [sets of] SARIF files
emacs        Write a representation of SARIF file(s) for viewing in emacs
html         Write an HTML representation of SARIF file(s) for viewing in a web browser
info         Print information about SARIF file(s) structure
ls           List all SARIF files in the directories specified
summary      Write a text summary with the counts of issues from the SARIF files(s) specified
trend        Write a CSV file with time series data from SARIF files with "yyyymmddThhmmssZ" timestamps in their filenames
usage        (Command optional) - print usage and exit
word         Produce MS Word .docx summaries of the SARIF files specified
Run `sarif <COMMAND> --help` for command-specific help.

Commands

The commands are illustrated below assuming input files in the following locations:

  • C:\temp\sarif_files = a directory of SARIF files with arbitrary filenames.
  • C:\temp\sarif_with_date = a directory of SARIF files with filenames including timestamps e.g. C:\temp\sarif_with_date\myapp_devskim_output_20211001T012000Z.sarif.
  • C:\temp\old_sarif_files = a directory of SARIF files with arbitrary filenames from an older build.
  • C:\code\my_source_repo = checkout directory of source code files from which SARIF results were obtained.

blame

usage: sarif blame [-h] [--output PATH] [--code PATH] [file_or_dir [file_or_dir ...]]

Enhance SARIF file with information from `git blame`

positional arguments:
  file_or_dir           A SARIF file or a directory containing SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output PATH, -o PATH
                        Output file or directory
  --code PATH, -c PATH  Path to git repository; if not specified, the current working directory is used

Augment SARIF files with git blame information, and write the augmented files to a specified location.

sarif blame -o "C:\temp\sarif_files_with_blame_info" -c "C:\code\my_source_repo" "C:\temp\sarif_files"

If the current working directory is the git repository, the -c argument can be omitted.

Blame information is added to the property bag of each result object for which it was successfully obtained. The keys and values used are as in the git blame porcelain format. E.g.:

{
  "ruleId": "SM00702",
  ...
  "properties": {
    "blame": {
      "author": "aperson",
      "author-mail": "<[email protected]>",
      "author-time": "1350899798",
      "author-tz": "+0000",
      "committer": "aperson",
      "committer-mail": "<[email protected]>",
      "committer-time": "1350899798",
      "committer-tz": "+0000",
      "summary": "blah blah commit comment blah",
      "boundary": true,
      "filename": "src/net/myproject/mypackage/MyClass.java"
    }
  }
}

Note that the bare boundary key is given the automatic value true.
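
As an illustration of how the blame data might be consumed, here is a small sketch that counts results per author in an already-loaded SARIF log. It relies only on the properties.blame layout shown above; the function name and the sample data are hypothetical.

```python
from collections import Counter


def count_results_by_author(sarif_log):
    """Count results per blame author; results without blame info are skipped."""
    counts = Counter()
    for run in sarif_log.get("runs", []):
        for result in run.get("results", []):
            blame = result.get("properties", {}).get("blame")
            if blame:
                counts[blame.get("author", "<unknown>")] += 1
    return counts


# Hypothetical sample; a real log would come from json.load() on a SARIF file.
sample_log = {
    "runs": [
        {
            "results": [
                {"ruleId": "SM00702", "properties": {"blame": {"author": "aperson"}}},
                {"ruleId": "SM00702", "properties": {"blame": {"author": "aperson"}}},
                {"ruleId": "SM00999"},  # no blame information was obtained
            ]
        }
    ]
}

author_counts = count_results_by_author(sample_log)
print(author_counts)
```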

codeclimate

usage: sarif codeclimate [-h] [--output PATH] [--filter FILE] [--autotrim] [--trim PREFIX] [file_or_dir ...]

Write a JSON representation in Code Climate format of SARIF file(s) for viewing as a Code Quality report in GitLab UI

positional arguments:
  file_or_dir           A SARIF file or a directory containing SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output PATH, -o PATH
                        Output file or directory
  --filter FILE, -b FILE
                        Specify the filter file to apply. See README for format.
  --autotrim, -a        Strip off the common prefix of paths in the CSV output
  --trim PREFIX         Prefix to strip from issue paths, e.g. the checkout directory on the build agent

Write out a JSON file of Code Climate tool format from [a set of] SARIF files. This can then be published as a Code Quality report artefact in a GitLab pipeline and shown in GitLab UI for merge requests.

The JSON output can also be filtered using the blame information; see Filtering below for how to use the --filter option.

copy

usage: sarif copy [-h] [--output FILE] [--filter FILE] [--timestamp] [file_or_dir [file_or_dir ...]]

Write a new SARIF file containing optionally-filtered data from other SARIF file(s)

positional arguments:
  file_or_dir           A SARIF file or a directory containing SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output FILE, -o FILE
                        Output file
  --filter FILE, -b FILE
                        Specify the filter file to apply. See README for format.
  --timestamp, -t       Append current timestamp to output filename in the "yyyymmddThhmmssZ" format used by the `sarif trend` command

Write a new SARIF file containing optionally-filtered data from an existing SARIF file or multiple SARIF files. The resulting file contains each run from the original SARIF files back-to-back. The results can be filtered (see Filtering below), in which case only those results from the original SARIF files that meet the filter are included; the output file contains no information about the excluded records. If a run in the original file was empty, or all its results are filtered out, the empty run is still included.

If no output filename is provided, a file called out.sarif in the current directory is written. If the output file already exists and is also in the input file list, it is not included in the inputs, to avoid duplication of results. The output file is overwritten without warning.

The file_or_dir specifier can include wildcards e.g. c:\temp\**\devskim*.sarif (i.e. a "glob"). This works for all commands, but it is particularly useful for copy.
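
To preview which files a recursive pattern would select, Python's standard glob module can be used. This sketch assumes the tool's wildcard expansion behaves like glob.glob with recursive=True, which is an assumption rather than something this document confirms; the file names are made up.

```python
import glob
import os
import tempfile


def find_sarif_files(pattern):
    """Expand a glob pattern; `**` matches zero or more directory levels."""
    return sorted(glob.glob(pattern, recursive=True))


with tempfile.TemporaryDirectory() as root:
    # Create a small hypothetical tree of empty files to match against.
    for rel in (
        "devskim_app.sarif",
        os.path.join("nested", "devskim_lib.sarif"),
        os.path.join("nested", "other.sarif"),
    ):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    matches = find_sarif_files(os.path.join(root, "**", "devskim*.sarif"))
    names = [os.path.relpath(match, root) for match in matches]
    print(names)
```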

One use for this is to combine a set of SARIF files from multiple static analysis tools run during a build process into a single file that can be more easily stored and processed as a build asset.
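
Conceptually, combining files means concatenating their runs into one log. The sketch below shows that idea on in-memory dicts; it is not the tool's implementation, the function name is hypothetical, and the tool names in the sample are placeholders.

```python
def combine_runs(sarif_logs):
    """Build one SARIF log whose runs are the input runs, back-to-back."""
    combined = {"version": "2.1.0", "runs": []}
    for log in sarif_logs:
        # Empty runs are kept, matching the behaviour described above.
        combined["runs"].extend(log.get("runs", []))
    return combined


# Hypothetical inputs: one log per static analysis tool.
log_a = {"version": "2.1.0", "runs": [{"tool": {"driver": {"name": "tool-a"}}, "results": []}]}
log_b = {"version": "2.1.0", "runs": [{"tool": {"driver": {"name": "tool-b"}}, "results": [{"ruleId": "R1"}]}]}

combined = combine_runs([log_a, log_b])
tool_names = [run["tool"]["driver"]["name"] for run in combined["runs"]]
print(tool_names)
```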

csv

usage: sarif csv [-h] [--output PATH] [--filter FILE] [--autotrim] [--trim PREFIX] [file_or_dir [file_or_dir ...]]

Write a CSV file listing the issues from the SARIF files(s) specified

positional arguments:
  file_or_dir           A SARIF file or a directory containing SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output PATH, -o PATH
                        Output file or directory
  --filter FILE, -b FILE
                        Specify the filter file to apply. See README for format.
  --autotrim, -a        Strip off the common prefix of paths in the CSV output
  --trim PREFIX         Prefix to strip from issue paths, e.g. the checkout directory on the build agent

Write out a simple tabular list of issues from [a set of] SARIF files. This can then be analysed, e.g. via Pivot Tables in Excel.

Use the --trim option to strip specific prefixes from the paths, to make the CSV less verbose. Alternatively, use --autotrim to strip off the longest common prefix.

Generate a CSV summary of a single SARIF file with common file path prefix suppressed:

sarif csv --autotrim "C:\temp\sarif_files\devskim_myapp.sarif"

Generate a CSV summary of a directory of SARIF files with path prefix C:\code\my_source_repo suppressed:

sarif csv --trim c:\code\my_source_repo "C:\temp\sarif_files"

If the SARIF file(s) contain blame information (as added by the blame command), then the CSV includes an "Author" column indicating who last modified the line in question.

The CSV output can also be filtered using the same blame information; see Filtering below for how to use the --filter option.
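
As an alternative to a pivot table, the CSV can be tallied with Python's csv module. In this sketch only the "Author" column name comes from the text above; the other column names are assumed to match the record attributes listed under "Usage as a Python library", and the rows are invented sample data.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content; column names other than "Author" are assumptions.
SAMPLE_CSV = """Tool,Severity,Code,Description,Location,Line,Author
devskim,error,DS101,Example issue,src/a.py,10,aperson
devskim,warning,DS202,Another issue,src/b.py,22,aperson
devskim,error,DS101,Example issue,src/c.py,5,bperson
"""


def count_by_author_and_severity(csv_file):
    """Count issues for each (author, severity) pair in an open CSV file."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row.get("Author", ""), row["Severity"])] += 1
    return counts


counts = count_by_author_and_severity(io.StringIO(SAMPLE_CSV))
for (author, severity), count in sorted(counts.items()):
    print(author, severity, count)
```

For a real file, pass open(path, newline="", encoding="utf-8") instead of the io.StringIO object.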

diff

usage: sarif diff [-h] [--output FILE] [--filter FILE] old_file_or_dir new_file_or_dir

Find the difference between two [sets of] SARIF files

positional arguments:
  old_file_or_dir       An old SARIF file or a directory containing the old SARIF files
  new_file_or_dir       A new SARIF file or a directory containing the new SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output FILE, -o FILE
                        Output file
  --filter FILE, -b FILE
                        Specify the filter file to apply. See README for format.

Print the difference between two [sets of] SARIF files.

Difference between the issues in two SARIF files:

sarif diff "C:\temp\old_sarif_files\devskim_myapp.sarif" "C:\temp\sarif_files\devskim_myapp.sarif"

Difference between the issues in two directories of SARIF files:

sarif diff "C:\temp\old_sarif_files" "C:\temp\sarif_files"

Write output to JSON file instead of printing to stdout:

sarif diff -o mydiff.json "C:\temp\old_sarif_files\devskim_myapp.sarif" "C:\temp\sarif_files\devskim_myapp.sarif"

The JSON format is like this:

{
    "all": {
        "+": 5,
        "-": 11
    },
    "error": {
        "+": 2,
        "-": 0,
        "codes": {
            "XYZ1234 Some Issue": {
                "<": 0,
                ">": 2,
                "+@": [
                    {
                        "Location": "C:\\code\\file1.py",
                        "Line": 119
                    },
                    {
                        "Location": "C:\\code\\file2.py",
                        "Line": 61
                    }
                ]
            }
        }
    },
    "warning": {
        "+": 3,
        "-": 11,
        "codes": {...}
    },
    "note": {
        "+": 3,
        "-": 11,
        "codes": {...}
    }
}

Where:

  • "+" indicates new issue types at this severity, "error", "warning" or "note"
  • "-" indicates resolved issue types at this severity (no occurrences remaining)
  • "codes" lists each issue code where the number of occurrences has changed:
    • occurrences before indicated by "<"
    • occurrences after indicated by ">"
    • new locations indicated by "+@"

If the set of issue codes at a given severity has changed, diff will report this even if the total number of issue types at that severity is unchanged.

When the number of occurrences of an issue code is unchanged, diff will not report this issue code, although it is possible that an equal number of new occurrences of the specific issue have arisen as have been resolved. This is to avoid reporting line number changes.

The diff operation shows the location of new occurrences of each issue. When writing to an output JSON file, all new locations are written, but when writing output to the console, a maximum of three locations are shown. Note that there can be some false positives here, if line numbers have changed.

See Filtering below for how to use the --filter option.
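
The JSON output lends itself to scripted checks. The following sketch lists, for one severity, the issue codes whose occurrence count went up, together with their new locations. It uses only the keys documented above; the function name is hypothetical and the sample mirrors the example JSON.

```python
def new_occurrences(diff, severity):
    """Return (code, increase, locations) for codes whose count increased."""
    report = []
    for code, detail in diff.get(severity, {}).get("codes", {}).items():
        increase = detail.get(">", 0) - detail.get("<", 0)
        if increase > 0:
            locations = [(loc["Location"], loc["Line"]) for loc in detail.get("+@", [])]
            report.append((code, increase, locations))
    return report


# Sample shaped like the documented diff output (normally read with json.load).
sample_diff = {
    "all": {"+": 5, "-": 11},
    "error": {
        "+": 2,
        "-": 0,
        "codes": {
            "XYZ1234 Some Issue": {
                "<": 0,
                ">": 2,
                "+@": [
                    {"Location": "C:\\code\\file1.py", "Line": 119},
                    {"Location": "C:\\code\\file2.py", "Line": 61},
                ],
            }
        },
    },
}

error_report = new_occurrences(sample_diff, "error")
print(error_report)
```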

emacs

usage: sarif emacs [-h] [--output PATH] [--filter FILE] [--no-autotrim] [--image IMAGE] [--trim PREFIX] [file_or_dir [file_or_dir ...]]

Write a representation of SARIF file(s) for viewing in emacs

positional arguments:
  file_or_dir           A SARIF file or a directory containing SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output PATH, -o PATH
                        Output file or directory
  --filter FILE, -b FILE
                        Specify the filter file to apply. See README for format.
  --no-autotrim, -n     Do not strip off the common prefix of paths in the output document
  --image IMAGE         Image to include at top of file - SARIF logo by default
  --trim PREFIX         Prefix to strip from issue paths, e.g. the checkout directory on the build agent

html

usage: sarif html [-h] [--output PATH] [--filter FILE] [--no-autotrim] [--image IMAGE] [--trim PREFIX] [file_or_dir [file_or_dir ...]]

Write an HTML representation of SARIF file(s) for viewing in a web browser

positional arguments:
  file_or_dir           A SARIF file or a directory containing SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output PATH, -o PATH
                        Output file or directory
  --filter FILE, -b FILE
                        Specify the filter file to apply. See README for format.
  --no-autotrim, -n     Do not strip off the common prefix of paths in the output document
  --image IMAGE         Image to include at top of file - SARIF logo by default
  --trim PREFIX         Prefix to strip from issue paths, e.g. the checkout directory on the build agent

Create an HTML file summarising SARIF results.

sarif html -o summary.html "C:\temp\sarif_files"

Use the --trim option to strip specific prefixes from the paths, to make the generated HTML page less verbose. The longest common prefix of the paths will be trimmed unless --no-autotrim is specified.

Use the --image option to provide a header image for the top of the HTML page. The image is embedded into the HTML, so the HTML document remains a portable standalone file.

See Filtering below for how to use the --filter option.

info

usage: sarif info [-h] [--output FILE] [file_or_dir [file_or_dir ...]]

Print information about SARIF file(s) structure

positional arguments:
  file_or_dir           A SARIF file or a directory containing SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output FILE, -o FILE
                        Output file

Print information about the structure of a SARIF file or multiple files. This is about the JSON structure rather than any meaning of the results produced by the tool. The summary includes the full path of the file, its size and modified date, the number of runs, and for each run, the tool that generated the run, the number of results, and the entries in the results' property bags.

c:\temp\sarif_files\ios_devskim_output.sarif
  1256241 bytes (1.2 MiB)
  modified: 2021-10-13 21:50:01.251544, accessed: 2022-01-09 18:23:00.060573, ctime: 2021-10-13 20:49:00
  1 run
    Tool: devskim
    1323 results
    All results have properties: tags, DevSkimSeverity

ls

usage: sarif ls [-h] [--output FILE] [file_or_dir [file_or_dir ...]]

List all SARIF files in the directories specified

positional arguments:
  file_or_dir           A SARIF file or a directory containing SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output FILE, -o FILE
                        Output file

List SARIF files in one or more directories.

sarif ls "C:\temp\sarif_files" "C:\temp\sarif_with_date"

summary

usage: sarif summary [-h] [--output PATH] [--filter FILE] [file_or_dir [file_or_dir ...]]

Write a text summary with the counts of issues from the SARIF files(s) specified

positional arguments:
  file_or_dir           A SARIF file or a directory containing SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output PATH, -o PATH
                        Output file or directory
  --filter FILE, -b FILE
                        Specify the filter file to apply. See README for format.

Print a summary of the issues in one or more SARIF file(s), grouped by severity and then ordered by number of occurrences.

When directories are provided as input and output, a summary is written for each input file, along with another file containing the totals.

sarif summary -o summaries "C:\temp\sarif_files"

When no output directory or file is specified, the overall summary is printed to the standard output.

sarif summary "C:\temp\sarif_files\devskim_myapp.sarif"

See Filtering below for how to use the --filter option.

trend

usage: sarif trend [-h] [--output FILE] [--filter FILE] [--dateformat {dmy,mdy,ymd}] [file_or_dir [file_or_dir ...]]

Write a CSV file with time series data from SARIF files with "yyyymmddThhmmssZ" timestamps in their filenames

positional arguments:
  file_or_dir           A SARIF file or a directory containing SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output FILE, -o FILE
                        Output file
  --filter FILE, -b FILE
                        Specify the filter file to apply. See README for format.
  --dateformat {dmy,mdy,ymd}, -f {dmy,mdy,ymd}
                        Date component order to use in output CSV. Default is `dmy`

Generate a CSV showing a timeline of issues from a set of SARIF files in a directory. The SARIF file names must contain a timestamp in the specific format yyyymmddThhmmssZ e.g. 20211012T110000Z.

The CSV can be loaded in Microsoft Excel for graphing and trend analysis.

sarif trend -o timeline.csv "C:\temp\sarif_with_date" --dateformat dmy

See Filtering below for how to use the --filter option.
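
When naming files for trend, it can help to verify that a filename carries a timestamp in the expected form. This is a minimal sketch of such a check using the standard library; the regular expression and function name are illustrative and not taken from the tool, and the trailing Z is interpreted as UTC.

```python
import re
from datetime import datetime, timezone

# yyyymmddThhmmssZ: eight digits, "T", six digits, "Z".
TIMESTAMP_PATTERN = re.compile(r"\d{8}T\d{6}Z")


def filename_timestamp(filename):
    """Return the timestamp embedded in a filename as a UTC datetime, or None."""
    match = TIMESTAMP_PATTERN.search(filename)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(0), "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


print(filename_timestamp("myapp_devskim_output_20211001T012000Z.sarif"))
print(filename_timestamp("devskim_myapp.sarif"))
```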

upgrade-filter

usage: sarif upgrade-filter [-h] [--output PATH] [file [file ...]]

Upgrade a v1-style blame filter file to a v2-style filter YAML file

positional arguments:
  file                  A v1-style blame-filter file

optional arguments:
  -h, --help            show this help message and exit
  --output PATH, -o PATH
                        Output file or directory

usage

usage: sarif usage [-h] [--output FILE]

(Command optional) - print usage and exit

optional arguments:
  -h, --help            show this help message and exit
  --output FILE, -o FILE
                        Output file

Print usage and exit.

word

usage: sarif word [-h] [--output PATH] [--filter FILE] [--no-autotrim] [--image IMAGE] [--trim PREFIX] [file_or_dir [file_or_dir ...]]

Produce MS Word .docx summaries of the SARIF files specified

positional arguments:
  file_or_dir           A SARIF file or a directory containing SARIF files

optional arguments:
  -h, --help            show this help message and exit
  --output PATH, -o PATH
                        Output file or directory
  --filter FILE, -b FILE
                        Specify the filter file to apply. See README for format.
  --no-autotrim, -n     Do not strip off the common prefix of paths in the output document
  --image IMAGE         Image to include at top of file - SARIF logo by default
  --trim PREFIX         Prefix to strip from issue paths, e.g. the checkout directory on the build agent

Create Word documents representing a SARIF file or multiple SARIF files.

If directories are provided for the -o option and the input, then a Word document is produced for each individual SARIF file and for the full set of SARIF files. Otherwise, a single Word document is created.

Create a Word document for each SARIF file and one for all of them together, in the reports directory (created if non-existent):

sarif word -o reports "C:\temp\sarif_files"

Create a Word document for a single SARIF file:

sarif word -o "reports\devskim_myapp.docx" "C:\temp\sarif_files\devskim_myapp.sarif"

Use the --trim option to strip specific prefixes from the paths, to make the generated documents less verbose. The longest common prefix of the paths will be trimmed unless --no-autotrim is specified.

Use the --image option to provide a header image for the top of the Word document.

See Filtering below for how to use the --filter option.

Filtering

The data in each result object can be used for filtering via the --filter option, which is available for several commands. This option requires a path to a filter-list YAML file containing a list of patterns and substrings to match against data in a SARIF file. The format of a filter-list file is as follows:

# Lines beginning with # are interpreted as comments and ignored.
# Optional description for the filter.  If no description is specified, the filter file name is used.
description: Example filter from README.md

# Optional configuration section to override default values.
configuration:
  # This option controls whether to include results where a property to check is missing, default
  # value is true.
  default-include: false
  # This option only applies filter criteria if the line number is present and not equal to 1.
  # Some static analysis tools set the line number to 1 for whole file issues, but this does not
  # work with blame filtering, because who last changed line 1 is irrelevant.  Default value is
  # true.
  check-line-number: true

# Items in `include` list are interpreted as inclusion filtering rules.
# Items are combined with the OR operator: the filtered results include objects matching any rule.
# Each item can be one rule or a list of rules. In the latter case, the rules in the list are
# combined with the AND operator: all rules must match.
include:
  # The following item includes issues whose author-mail property contains "@microsoft.com" AND
  # that are found in Java files.
  # Values with special characters `\:;_()$%^@,` must be enclosed in quotes (single or double):
  - author-mail: "@microsoft.com"
    locations[*].physicalLocation.artifactLocation.uri: "*.java"
  # Instead of a substring, a regular expression can be used, enclosed in "/" characters.
  # Issues whose committer-mail property includes a string matching the regular expression are included.
  # Use ^ and $ to match the whole committer-mail property.
  - committer-mail:
      value: "/^<myname.*\\.com>$/"
      # Configuration options can be overridden for any rule.
      default-include: true
      check-line-number: true
# Lines under `exclude` are interpreted as exclusion filtering rules.
exclude:
  # The following line excludes issues whose location is in test Java files with names starting with
  #  the "Test" prefix.
  - location: "Test*.java"
  # The value for the property can be empty, in this case only existence of the property is checked.
  - suppression:

Here's an example of a filter file that includes issues on lines changed by an @microsoft.com email address or a myname.SOMETHING.com email address, but not if those email addresses end in [email protected] or contain a GUID. It uses the same format as the example above, with the comments stripped out.

description: Example filter from README.md
configuration:
  default-include: true
  check-line-number: true
include:
  - author-mail: "@microsoft.com"
  - author-mail: "/myname\\..*\\.com/"
exclude:
  - author-mail: [email protected]
  - author-mail: '/[0-9A-F]{8}[-][0-9A-F]{4}[-][0-9A-F]{4}[-][0-9A-F]{4}[-][0-9A-F]{12}\@microsoft.com/'

Field names must be specified in JSONPath notation accessing data in the SARIF result object.

For commonly used properties the following shortcuts are defined:

Shortcut         Full JSONPath
author           properties.blame.author
author-mail      properties.blame.author-mail
committer        properties.blame.committer
committer-mail   properties.blame.committer-mail
location         locations[*].physicalLocation.artifactLocation.uri
rule             ruleId
suppression      suppressions[*].kind

For the property uri (e.g. in locations[*].physicalLocation.artifactLocation.uri) file name wildcard characters can be used as it represents a file location:

  • ? - a single occurrence of any character in a directory or file name
  • * - zero or more occurrences of any character in a directory or file name
  • ** - zero or more occurrences across multiple directory levels

E.g.

  • tests/Test???.js
  • src/js/*.js
  • src/js/**/*.js

All matching is case insensitive, because email addresses are. Whitespace at the start and end of lines is ignored, which also means that line ending characters don't matter. The filter file must be UTF-8 encoded (plain 7-bit ASCII also qualifies).

If there are no inclusion patterns, all issues are included except for those matching the exclusion patterns. If there are inclusion patterns, only issues matching the inclusion patterns are included. If an issue matches one or more inclusion patterns and also at least one exclusion pattern, it is excluded.
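
The inclusion and exclusion rules in the previous paragraph can be modelled in a few lines. The sketch below is a simplified illustration for a single property value (such as author-mail), not the tool's implementation: it handles substrings and "/"-delimited regular expressions only, ignores the default-include and check-line-number options, and uses hypothetical addresses.

```python
import re


def value_matches(pattern, value):
    """Case-insensitive match: /regex/ patterns are searched, others are substrings."""
    if len(pattern) > 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.search(pattern[1:-1], value, re.IGNORECASE) is not None
    return pattern.lower() in value.lower()


def is_included(value, include_patterns, exclude_patterns):
    """Apply the documented rule: exclusion wins over inclusion."""
    if include_patterns and not any(value_matches(p, value) for p in include_patterns):
        return False
    return not any(value_matches(p, value) for p in exclude_patterns)


# Hypothetical patterns, shaped like the author-mail rules in the examples above.
include = ["@example.com", r"/myname\..*\.org/"]
exclude = ["tester@example.com"]

for mail in ("<Dev@Example.com>", "<tester@example.com>", "<me@myname.corp.org>", "<x@other.net>"):
    print(mail, is_included(mail, include, exclude))
```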

Usage as a Python library

Although not its primary purpose, you can use sarif-tools from a Python script or module to load and summarise SARIF results.

Basic usage pattern

After installation, use sarif.loader to load a SARIF file or files, and then use the operations on the returned SarifFile or SarifFileSet objects to explore the data.

from sarif import loader

sarif_data = loader.load_sarif_file(path_to_sarif_file)
issue_count_by_severity = sarif_data.get_result_count_by_severity()
error_histogram = sarif_data.get_issue_code_histogram("error")

Result access API

The three classes defined in the sarif_file module, SarifFileSet, SarifFile and SarifRun, provide similar APIs, which allows SARIF results to be handled similarly at multiple levels of aggregation. This section briefly describes some of the key APIs at the three levels of aggregation.

get_distinct_tool_names()

Returns a list of distinct tool names in a SarifFile or for all files in a SarifFileSet. A SarifRun has a single tool name so the equivalent method is get_tool_name().

get_results()

Return the list of SARIF results. These are objects as defined in the SARIF standard section 3.27.

get_records()

Return the list of SARIF results as simplified, flattened record dicts. Each record has the attributes defined in sarif_file.RECORD_ATTRIBUTES.

  • "Tool" - the tool name for the run containing the result.
  • "Severity" - the SARIF severity for the record. One of error, warning (the default if the record doesn't specify) or note.
  • "Code" - the issue code from the result.
  • "Description" - the issue name from the result - corresponding to the Code.
  • "Location" - the location of the issue, typically the file containing the issue. Format varies by tool.
  • "Line" - the line number in the file where the issue occurs. Value is a string. This defaults to "1" if the tool failed to identify the line.

get_records_grouped_by_severity()

As per get_records(), but the result is a dict from SARIF severity level (error, warning and note) to the list of records of that severity level.

get_result_count(), get_result_count_by_severity()

Get the total number of SARIF results. get_result_count_by_severity() returns a dict from SARIF severity level (error, warning and note) to the integer number of results of that severity.

get_issue_code_histogram(severity)

For the given severity, get a histogram in the form of a list of pairs. The first item in each pair is the issue code, the second item is the number of matching records, and the list is sorted in decreasing order of frequency (the same as the sarif summary command output).
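
To make the shape of that return value concrete, here is a standalone sketch that builds the same kind of histogram from record dicts like those returned by get_records(). It does not call the library; the function name and sample records are hypothetical, and only the "Severity" and "Code" keys are used.

```python
from collections import Counter


def issue_code_histogram(records, severity):
    """List of (issue code, count) pairs for one severity, most frequent first."""
    counts = Counter(r["Code"] for r in records if r["Severity"] == severity)
    return counts.most_common()


# Hypothetical records using the attribute names listed under get_records().
records = [
    {"Tool": "devskim", "Severity": "error", "Code": "DS101", "Line": "10"},
    {"Tool": "devskim", "Severity": "error", "Code": "DS303", "Line": "7"},
    {"Tool": "devskim", "Severity": "error", "Code": "DS101", "Line": "42"},
    {"Tool": "devskim", "Severity": "warning", "Code": "DS202", "Line": "1"},
]

error_histogram = issue_code_histogram(records, "error")
print(error_histogram)
```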

Disaggregation and filename access

These fields and methods allow access to the underlying information about the SARIF files.

  • SarifFileSet.subdirs - a list of SarifFileSet objects corresponding to the subdirectories of the directory from which the SarifFileSet was created.
  • SarifFileSet.files - a list of SarifFile objects corresponding to the SARIF files contained in the directory from which the SarifFileSet was created.
  • SarifFile.get_abs_file_path() - get the absolute path to the SARIF file.
  • SarifFile.get_file_name() - get the name of the SARIF file.
  • SarifFile.get_file_name_without_extension() - get the name of the SARIF file without its extension. Useful for constructing derived filenames.
  • SarifFile.get_filename_timestamp() - extract the timestamp from the filename of a SARIF file, and return it as a string. The timestamp must be in the format specified in the sarif trend command.
  • SarifFile.runs - a list of SarifRun objects contained in the SARIF file. Most SARIF files only contain a single run, but it is possible to aggregate runs from multiple tools into a single SARIF file.

Path shortening API

Call init_path_prefix_stripping(autotrim, path_prefixes) on a SarifFileSet, SarifFile or SarifRun object to set up path prefix stripping, either automatically removing the longest common prefix (autotrim=True) or removing specific prefixes (autotrim=False and a list of strings in path_prefixes).
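
The two modes can be pictured with the following standalone sketch. It is an illustration of the idea only, not the library's code: the function strip_prefixes is hypothetical, paths are assumed to use forward slashes, and the common prefix is computed on whole directory components.

```python
import posixpath


def strip_prefixes(paths, autotrim=False, path_prefixes=None):
    """Shorten paths by the longest common directory or by the given prefixes."""
    if autotrim and paths:
        common = posixpath.commonpath(paths)
        if common in paths:
            # Never strip a whole path; fall back to its parent directory.
            common = posixpath.dirname(common)
        prefixes = [common]
    else:
        prefixes = list(path_prefixes or [])

    shortened = []
    for path in paths:
        for prefix in prefixes:
            if prefix and path.startswith(prefix):
                path = path[len(prefix):].lstrip("/")
                break
        shortened.append(path)
    return shortened


paths = ["/build/agent/src/a.py", "/build/agent/tests/b.py"]
print(strip_prefixes(paths, autotrim=True))
print(strip_prefixes(paths, path_prefixes=["/build/agent/src"]))
```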

Filtering API

Call init_general_filter(filter_description, include_filters, exclude_filters) on a SarifFileSet, SarifFile or SarifRun object to set up filtering. filter_description is a string and the other parameters are lists of inclusion and exclusion rules. They correspond in an obvious way to the filter file contents described in Filtering above.

Call get_filter_stats() to retrieve the filter stats after reading the results or records from SARIF files. It returns None if there is no filter, or otherwise a sarif_file.FilterStats object with integer fields filtered_in_result_count and filtered_out_result_count. Call to_string() on the FilterStats object for a readable representation of these statistics, which also includes the filter file name or description (filter_description field).

Suggested usage in CI pipelines

Using the --check option in combination with the summary command causes sarif-tools to exit with a nonzero exit code if there are any issues of the specified level or higher. This can be useful for failing a continuous integration (CI) pipeline when there are SAST violations.

The SARIF issue levels are error, warning and note. These are all valid options for the --check option.

E.g. to fail if there are any errors or warnings:

sarif --check warning summary c:\temp\sarif_files
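To make "the specified level, or higher" concrete, here is a minimal standalone sketch of that comparison over parsed SARIF data. It is not how sarif-tools implements --check. The function name check_exit_code is hypothetical, and treating a result with no level property as warning is an assumption of this sketch.

```python
from typing import Any, Dict

# Severity order, lowest first.
LEVELS = ["note", "warning", "error"]


def check_exit_code(sarif_data: Dict[str, Any], check_level: str) -> int:
    """Return 1 if any result is at check_level or above, otherwise 0."""
    threshold = LEVELS.index(check_level)
    for run in sarif_data.get("runs") or []:
        for result in run.get("results") or []:
            # Assumption: a result without an explicit level counts as "warning".
            level = result.get("level", "warning")
            if level in LEVELS and LEVELS.index(level) >= threshold:
                return 1
    return 0


# Invented sample data with a single note-level result.
sample = {"runs": [{"results": [{"ruleId": "B6412", "level": "note"}]}]}
print(check_exit_code(sample, "warning"))  # no warnings or errors present
print(check_exit_code(sample, "note"))     # the note-level result counts
```

In a pipeline, a script like this would pass the returned value to sys.exit() so that the CI step fails on a nonzero result.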

The diff command can check for any increase in issues of the specified level or above, relative to a previous or baseline build.

E.g. to fail if there are any new issue codes at error level:

sarif --check error diff c:\temp\old_sarif_files c:\temp\sarif_files

You can also use sarif-tools to filter and consolidate the output from multiple tools. E.g.

# First run your static analysis tools, configured to write SARIF output.  How to do that depends
# on the tool.

# Now run the blame command to augment the output with blame information.
sarif blame -o with_blame/myapp_mytool_with_blame.sarif myapp_mytool.sarif

# Now combine all tools' output into a single file
sarif copy --timestamp -o artifacts/myapp_alltools_with_blame.sarif

Download the file myapp_alltools_with_blame_TIMESTAMP.sarif that is generated. Then later you can filter the results using the --filter argument, or generate a graph of code quality over time using sarif trend.

Credits

sarif-tools was originally developed during the Microsoft Global Hackathon 2021 by Simon Abykov, Nick Brabbs, Anthony Hayward, Sivaji Kondapalli, Matt Parkes and Kathryn Pentland.

Thank you to everyone who has contributed pull requests since the initial release!


sarif-tools's Issues

Code block in sarif files incorrectly rendered in Summary

  1. have sarif file with complex message
    summary (4).zip

  2. run summary command (sarif summary)

  3. pipe to github step output or github command

  4. results in text that is rendered in ugly way

    - name: Install ms sarif tools
      if: ${{ always() }}
      shell: bash
      run: |
        mkdir sarif-tools
        git clone https://github.com/microsoft/sarif-tools.git
        cd ./sarif-tools
        pip install .
        cd ..

    - name: View all issues from sarif files
      if: ${{ always() }}
      shell: bash
      run: |
        sarif summary ./${{inputs.report_location}} -o ./hdf/issues.txt
        cat ./hdf/issues.txt
        echo "Summary of issues found:" >> "${GITHUB_STEP_SUMMARY}"
        cat ./hdf/issues.txt >> "${GITHUB_STEP_SUMMARY}"

The result is broken code blocks in the rendered step summary. I assume there could be a way to format the message as a block of text or as a code block, or to customize it, so that it does not end up looking like this.

Copy failed when filtering without any include filters

Tried to copy using this filter:

exclude:
  - location: "Test*.java"
  - suppression:

and got the error:

Traceback (most recent call last):
  File "/Users/simonabykov/BuildSystem/sarif-tools/venv/bin/sarif", line 6, in <module>
    sys.exit(main())
  File "/Users/simonabykov/BuildSystem/sarif-tools/sarif/cmdline/main.py", line 43, in main
    exitcode = args.func(args)
  File "/Users/simonabykov/BuildSystem/sarif-tools/sarif/cmdline/main.py", line 305, in _copy
    output_sarif_file_set = copy_op.generate_sarif(
  File "/Users/simonabykov/BuildSystem/sarif-tools/sarif/operations/copy_op.py", line 65, in generate_sarif
    results = input_run.get_results()
  File "/Users/simonabykov/BuildSystem/sarif-tools/sarif/sarif_file.py", line 245, in get_results
    return self._filter.filter_results(self.run_data["results"])
  File "/Users/simonabykov/BuildSystem/sarif-tools/sarif/filter/general_filter.py", line 204, in filter_results
    self._filter_append(ret, result)
  File "/Users/simonabykov/BuildSystem/sarif-tools/sarif/filter/general_filter.py", line 125, in _filter_append
    if included_stats["state"] == "included":
TypeError: 'NoneType' object is not subscriptable

No location for <CVE>

Running this command in github action

sarif summary ./*_scan.sarif -o ./hdf/issues.txt

The tool crashes on this jfrog xray sarif file with the following error :

xray_scan.zip

Traceback (most recent call last):
  File "/home/runner/.local/bin/sarif", line 8, in <module>
    sys.exit(main())
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/cmdline/main.py", line 40, in main
    exitcode = args.func(args)
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/cmdline/main.py", line 399, in _summary
    summary_op.generate_summary(input_files, output, multiple_file_output)
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/operations/summary_op.py", line 40, in generate_summary
    summary_lines = _generate_summary(input_files)
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/operations/summary_op.py", line 62, in _generate_summary
    result_count_by_severity = input_files.get_result_count_by_severity()
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/sarif_file.py", line 957, in get_result_count_by_severity
    result_counts_by_severity.append(input_file.get_result_count_by_severity())
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/sarif_file.py", line 741, in get_result_count_by_severity
    get_result_count_by_severity_per_run = [
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/sarif_file.py", line 742, in <listcomp>
    run.get_result_count_by_severity() for run in self.runs
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/sarif_file.py", line 572, in get_result_count_by_severity
    records = self.get_records()
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/sarif_file.py", line 510, in get_records
    self._cached_records = [self.result_to_record(result) for result in results]
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/sarif_file.py", line 510, in <listcomp>
    self._cached_records = [self.result_to_record(result) for result in results]
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/sarif_file.py", line 530, in result_to_record
    raise ValueError(f"No location in {error_id} output from {tool_name}")
ValueError: No location in CVE-2021-43616_npm_8.1.2 output from JFrog Xray SCA
Error: Process completed with exit code 1.

Code and description are combined in the Code column in the records and CSV

The Code column contains both the error code and the description, e.g. DS126186 Disabled certificate validation, in the record dict and in the CSV output.

This is prematurely combining information that we might want to keep separate in some use cases.

Change the tools to keep these fields separate and only combine them at the output stage if desired.
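One possible shape for this, sketched in isolation: the record keeps the code and the description in separate fields, and a formatting helper combines them only when a combined column is wanted. The field names and both helper functions are hypothetical and do not reflect the record layout that sarif-tools actually uses.

```python
from typing import Dict


def make_record(code: str, description: str) -> Dict[str, str]:
    """Build a record that keeps the issue code and description separate."""
    # Hypothetical field names, chosen for this sketch only.
    return {"Code": code, "Description": description}


def format_code_column(record: Dict[str, str], combine: bool = True) -> str:
    """Combine code and description at output time, only if requested."""
    if combine and record["Description"]:
        return f'{record["Code"]} {record["Description"]}'
    return record["Code"]


record = make_record("DS126186", "Disabled certificate validation")
print(format_code_column(record))                 # combined, as in the current CSV
print(format_code_column(record, combine=False))  # code only
```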

Using sudo to install sarif tools

Hi!

In the readme file, you include this line:

Installing on Linux or Mac
sudo pip install sarif-tools

This is a very very dangerous security practice: you should never run pip as sudo. If the user is getting permission errors, they should create a virtual environment. I believe it should not be used, least so in an official Microsoft project, and even less in a project about tools that allow for file portability for security scans.

Errors are displayed/counted as warnings

Using the Python library, loading a SARIF file that contains errors and then calling "get_result_count_by_severity()" reports zero errors and gives the number of warnings as the number of warnings plus the number of errors. "get_records()" will show the errors, but they are classified as warnings.

The Visual Studio Code plugin displays these correctly as errors.

Diff is not showing where the changes were

When running a diff on a large directory, the diff output does not give any hints on where to look for the new issue.

For example the below:

    error level: +0 -0 no changes
    warning level: +1 +0
      New issue "deadcode.DeadStores Value stored to 'error' is never read" (1 occurence)
    note level: +0 -0 no changes
    all levels: +1 +0

To figure out where it happened I did the following:

sarif csv -o sarif_org ~/ovs/tests/clang-analyzer-results/2023-06-27-165114-1160621-1
sarif csv -o sarif_new ~/ovs/tests/clang-analyzer-results/2023-06-27-165604-1179114-1

sort sarif_org/static_analysis_output.csv > sarif_org.txt
sort sarif_new/static_analysis_output.csv > sarif_new.txt
diff sarif_org.txt sarif_new.txt

But this is also showing line changes, etc. etc.

Fails to handle results with zero locations

Looking at the specification, in "3.27.12 locations property" I see

A result object SHOULD contain a property named locations whose value is an array of zero or more location objects (§3.28) each of which specifies a location where the result occurred.

Hence it appears to be valid to have a result with no locations.

Consider e.g.:

{
  "version": "2.1.0",
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "Foo"
        }
      },
      "results": [
        {
          "ruleId": "B6412",
          "message": {
            "text": "The command-line option '--foo' wasn't recognized."
          },
          "level": "note",
          "locations": []
        }
      ]
    }
  ]
}

i.e. a result with zero objects in its locations array.

Most of the sarif subcommands fail on the above with an error such as:

$ sarif summary /tmp/foo.sarif 
Traceback (most recent call last):
  File "/usr/local/bin/sarif", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/sarif/cmdline/main.py", line 40, in main
    exitcode = args.func(args)
  File "/usr/local/lib/python3.8/site-packages/sarif/cmdline/main.py", line 399, in _summary
    summary_op.generate_summary(input_files, output, multiple_file_output)
  File "/usr/local/lib/python3.8/site-packages/sarif/operations/summary_op.py", line 40, in generate_summary
    summary_lines = _generate_summary(input_files)
  File "/usr/local/lib/python3.8/site-packages/sarif/operations/summary_op.py", line 62, in _generate_summary
    result_count_by_severity = input_files.get_result_count_by_severity()
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 957, in get_result_count_by_severity
    result_counts_by_severity.append(input_file.get_result_count_by_severity())
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 741, in get_result_count_by_severity
    get_result_count_by_severity_per_run = [
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 742, in <listcomp>
    run.get_result_count_by_severity() for run in self.runs
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 572, in get_result_count_by_severity
    records = self.get_records()
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 510, in get_records
    self._cached_records = [self.result_to_record(result) for result in results]
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 510, in <listcomp>
    self._cached_records = [self.result_to_record(result) for result in results]
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 530, in result_to_record
    raise ValueError(f"No location in {error_id} output from {tool_name}")
ValueError: No location in B6412 output from Foo

A similar thing happens on the variant where:

"locations": [{}]

i.e. a single location with no properties. My reading of the schema is that this too is valid.

Seen in the wild on SARIF output from GCC 13, which sometimes emits diagnostics with no location (e.g. for cases such as an unrecognized command line argument when invoking the tool, but also sometimes on real diagnostics when we've got a bug in our location-tracking).
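A tolerant reader for both variants could look like the following standalone sketch. It is not the fix applied in sarif-tools. The helper name first_location is hypothetical; the property names (physicalLocation, artifactLocation, uri, region, startLine) come from SARIF 2.1.0.

```python
from typing import Any, Dict, Optional, Tuple


def first_location(result: Dict[str, Any]) -> Tuple[Optional[str], Optional[int]]:
    """Return (uri, start_line) of the first location, using None for missing parts."""
    locations = result.get("locations") or []
    if not locations:
        # Covers a missing "locations" property and an empty array.
        return None, None
    physical = locations[0].get("physicalLocation") or {}
    uri = (physical.get("artifactLocation") or {}).get("uri")
    start_line = (physical.get("region") or {}).get("startLine")
    return uri, start_line


print(first_location({"ruleId": "B6412", "locations": []}))
print(first_location({"ruleId": "B6412", "locations": [{}]}))
print(first_location({
    "ruleId": "B6412",
    "locations": [{
        "physicalLocation": {
            "artifactLocation": {"uri": "src/a.c"},
            "region": {"startLine": 12},
        }
    }],
}))
```

Returning None instead of raising lets callers decide how to display a result that has no location.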

How to remove specific entries

I tried to copy while applying blame-filter, but it didn't work as expected.

I want to ask if it is possible to provide a specific entry with filename and lines to be filtered out.
