cloudsecurityalliance / gsd-tools

Global Security Database Tools

License: Apache License 2.0

Python 48.40% Shell 1.72% Dockerfile 0.96% JavaScript 13.02% CSS 0.04% Handlebars 4.93% Vue 24.47% Sass 0.32% HTML 0.80% Ruby 5.32%

gsd-tools's People

Contributors

joshbressers pombredanne ekmixon joshbuker westonsteimel kurtseifried kogelvis oswalpalash raphaelahrens larswirzenius hartl3y94 tdunlap607 disenable ninjapanzer garyanewsome lmoyle1989 jasnow mrmoshkovitz

gsd-tools's Issues

CPE components -> ecosystem/package name mapping

As much as I'm sure we'd all like to see CPEs go away, that seems unlikely at this point, and I think it'd be useful to help map existing CPE-based data to something more useful.

Should we have some sort of CSA maintained mapping of CPE components -> ecosystem/packagename (here or as a separate repo)?

As an example, I started https://github.com/westonsteimel/package-metadata to help with some of my personal work, where I was trying to map CPEs to Package URLs, but I think it'd be great to have this kind of information somewhere that is more likely to be useful to others.

I'm fairly certain I saw a list of CPE -> Debian package name mappings somewhere in the Debian repos as well, and I'm sure there are many other sources. I think it'd be great to have everything gathered in a central place.
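A minimal sketch of what such a mapping could look like in practice, assuming a hand-maintained lookup table keyed on the CPE vendor and product components. The table name `CPE_TO_PACKAGE`, its single entry, and both function names are hypothetical and only illustrate the idea. Escaped colons inside CPE components are not handled.

```python
from typing import Optional

# Hypothetical mapping table: (CPE vendor, CPE product) -> (purl type, purl name).
# The entry below is illustrative only, not curated data.
CPE_TO_PACKAGE = {
    ("djangoproject", "django"): ("pypi", "django"),
}

CPE_FIELDS = ["part", "vendor", "product", "version", "update", "edition",
              "language", "sw_edition", "target_sw", "target_hw", "other"]


def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into its named components.

    Simplification: escaped colons inside a component are not handled.
    """
    parts = cpe.split(":")
    if parts[:2] != ["cpe", "2.3"] or len(parts) != 13:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, parts[2:]))


def cpe_to_purl(cpe: str) -> Optional[str]:
    """Return a Package URL for a known CPE, or None if there is no mapping."""
    parsed = parse_cpe23(cpe)
    mapping = CPE_TO_PACKAGE.get((parsed["vendor"], parsed["product"]))
    if mapping is None:
        return None
    purl_type, name = mapping
    purl = f"pkg:{purl_type}/{name}"
    # "*" (ANY) and "-" (NA) carry no concrete version.
    if parsed["version"] not in ("*", "-"):
        purl += f"@{parsed['version']}"
    return purl
```

A shared repository would mostly be the data behind `CPE_TO_PACKAGE`; the lookup code itself stays small.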

Can search for GSD from web interface

Expand CodeQL workflows

We need to review / document the current CodeQL workflows.

Provide some space in the documentation to provide scoping for our current goals in regard to CodeQL.

Create a contribution guide for CodeQL.

Work through data format example for GSD-2020-7471

I know @kurtseifried has documented some great stuff about potential data formats here, but I think it would be quite helpful to work through an actual example record, and I suggest having a try with GSD-2020-7471.

I like this one because it presents several common challenges that I would like us as a community to work on addressing. For instance, how do we want to handle package naming and versioning differences across various package managers? Here, the vulnerability is for Django, and for PyPI specifically we have PYSEC-2020-35 in the OSV format; however, what about for the Debian package where the name is python-django and there are fixes backported to earlier versions than PyPI?

If we end up using something like the OSV format for the primary GSD namespace, is this one OSV record with multiple affected entries with various ecosystem, package name, and version entries, or is it something like an entire OSV record for each ecosystem as a separate namespace (so each OS or packaging ecosystem could potentially have its own custom description, etc)?

Or do we want a separate GSD id for each one and some parent record that unifies them?

In this case I'd expect to see something for at least the following:

PyPI
Anaconda (typically same name and affected version ranges as PyPI)
Debian
Ubuntu
Alpine
Gentoo
Probably a bunch more Linux distros?
source control repository with commit-level info

I'll try to throw in some example json of a few possible approaches when I have some more time, but please start sharing any ideas you all have!
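As a starting point for discussion, here is a rough sketch of the first option: one OSV-style record with multiple affected entries. The field layout follows my understanding of the OSV schema, the helper `affected_entry` is hypothetical, and the fixed versions are deliberately left as a placeholder because the real values differ per ecosystem and are not filled in here.

```python
import json

# Placeholder: real fixed versions differ per ecosystem and are not filled in here.
PLACEHOLDER_FIXED = "FIXED-VERSION-PLACEHOLDER"


def affected_entry(ecosystem: str, name: str, fixed: str) -> dict:
    """Build one OSV-style "affected" entry for a single ecosystem/package."""
    return {
        "package": {"ecosystem": ecosystem, "name": name},
        "ranges": [
            {"type": "ECOSYSTEM", "events": [{"introduced": "0"}, {"fixed": fixed}]}
        ],
    }


# One record, several affected entries with different package names.
record = {
    "id": "GSD-2020-7471",
    "aliases": ["PYSEC-2020-35"],
    "affected": [
        affected_entry("PyPI", "django", PLACEHOLDER_FIXED),
        affected_entry("Debian", "python-django", PLACEHOLDER_FIXED),
    ],
}

print(json.dumps(record, indent=2))
```

The alternative approaches (one record per ecosystem, or a parent record that unifies several GSD IDs) would split the `affected` list across records instead.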

Document how to approve GSD requests and how approvers add themselves to the approver list

Anyone who has ever submitted a request or correction can be on the approver list.

We also need to document the flow of the request and approval process.

JSON formatting - camelCase?

Should we officially use camelCase? Google guidelines? https://google.github.io/styleguide/jsoncstyleguide.xml

Whatever we do, consistency is key.
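If camelCase were chosen, existing snake_case keys could be converted mechanically. A small sketch, assuming keys only need underscores removed and the following word capitalized; the function names are hypothetical and keys without underscores (such as namespace names) pass through unchanged.

```python
def to_camel_case(name: str) -> str:
    """Convert a snake_case key such as "vendor_name" to "vendorName"."""
    head, *rest = name.split("_")
    return head + "".join(word.capitalize() for word in rest)


def convert_keys(value):
    """Recursively rename dictionary keys to camelCase, leaving values as-is."""
    if isinstance(value, dict):
        return {to_camel_case(key): convert_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [convert_keys(item) for item in value]
    return value
```

Running something like this once over the data, and then checking new submissions for the same style, would keep the keys consistent.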

Mirror GitHub advisory data (GHSA) into a github.com namespace

Mirror GitHub advisories to a github.com namespace, similar to what we are doing for NVD data. Not every GitHub advisory corresponds to a CVE, but if it does, it should be easy enough to map it to the correct GSD entry using the alias field. If there is no CVE for it, maybe create a GSD candidate for it? Or just leave that part as a later enhancement.
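The alias lookup could be sketched as follows. This assumes that a GSD ID reuses the year and number of the matching CVE ID, and that an advisory's aliases are available as a list of strings; the function name is hypothetical.

```python
import re
from typing import Iterable, Optional

CVE_PATTERN = re.compile(r"^CVE-(\d{4})-(\d{4,})$")


def gsd_id_from_aliases(aliases: Iterable[str]) -> Optional[str]:
    """Return the GSD ID for the first CVE alias, or None if there is no CVE."""
    for alias in aliases:
        match = CVE_PATTERN.match(alias)
        if match:
            year, number = match.groups()
            # Assumption: GSD IDs mirror the CVE year and number.
            return f"GSD-{year}-{number}"
    return None
```

Advisories for which this returns `None` are the ones that would need a GSD candidate or a later enhancement.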

Add list of involved people and organizations to the GSD front page

We need to be very transparent about who is involved with GSD. We should also explain the structure of data and who can use the data. We should try to be as transparent as possible about the project.

Allow anyone to enrich existing vulnerabilities

Right now we have a web form to request one GSD ID
https://requests.globalsecuritydatabase.org/

This only allows one to request a new ID.

It is very common for a CVE ID to be lacking useful details. See this thread for an example
https://twitter.com/joshbressers/status/1492870755167748104

It would be valuable to have a way that isn't a PR to add such details to GSD.

I envision a workflow looking something like

Load page for existing GSD
Enrich data
Submit data as issue
Someone reviews the added data
Add data to DB

We could skip the review step if we want to namespace the edits

GSD data format

We need to do some work on the data format we want to use. The discussion archive shows a preference for using JSON-LD:
https://groups.google.com/a/groups.cloudsecurityalliance.org/g/gsd

JSON-LD gives a number of benefits, such as being able to link data together from disparate locations and adding context about what the various fields mean for parsing.

I have the beginnings of a prototype for this here
https://github.com/joshbressers/uvi-tools/tree/json-ld/json-ld

It's a VERY early attempt and should be considered experimental.

As a User when GSD Web cannot find an exact match in search, show a no results found page

When I search for something that isn't an exact match I am redirected to the gsd-database repo search page.

We should probably have an intermediate page which shows that no results are found and links to the GH search while giving some examples of what a search should look for so we can clarify what fields would be indexed by a search.

This might want to expand to an advanced search that could selectively dig deeper into the fields or focus on some of our data delineation based on time range or the submission year.

Enhance "securitylist tool" with error checking and documentation

The "securitylist tool" is how we mirror the NVD and CVE data. It currently has little to no error checking and was the result of MVP work.

This tool could use some ❤️ and possibly a small refactor. The goal of this work would be to make this script more reliable and better documented.

The expectation here is to incrementally de-risk this code by:

Adding error detection and mitigation
Enhance Documentation of usage and triage
Provide test coverage of behaviors to protect future development.

Github actions and Automation for scripts and manual processing

Introduce automation via GitHub Actions for scripts that are currently being run by hand.

Can update GSD from web interface

build error in jekyll website

@enck I used the default Jekyll handler in Cloudflare Pages, and it spat this out:

2023-02-01T23:26:40.496783Z Cloning repository...
2023-02-01T23:26:43.770209Z From https://github.com/cloudsecurityalliance/gsd-tools
2023-02-01T23:26:43.770881Z * branch bfb30d0 -> FETCH_HEAD
2023-02-01T23:26:43.771099Z
2023-02-01T23:26:44.25026Z HEAD is now at bfb30d0 Merge pull request #93 from oswalpalash/merge-project-tools
2023-02-01T23:26:44.251036Z
2023-02-01T23:26:44.39709Z
2023-02-01T23:26:44.427637Z Success: Finished cloning repository files
2023-02-01T23:26:45.175018Z Installing dependencies
2023-02-01T23:26:45.188524Z Python version set to 2.7
2023-02-01T23:26:48.847221Z v12.18.0 is already installed.
2023-02-01T23:26:50.100308Z Now using node v12.18.0 (npm v6.14.4)
2023-02-01T23:26:50.366967Z Started restoring cached build plugins
2023-02-01T23:26:50.382403Z Finished restoring cached build plugins
2023-02-01T23:26:50.949994Z Attempting ruby version 2.7.1, read from environment
2023-02-01T23:26:54.679316Z Using ruby version 2.7.1
2023-02-01T23:26:55.04601Z Using PHP version 5.6
2023-02-01T23:26:55.046932Z Started restoring cached ruby gems
2023-02-01T23:26:55.064186Z Finished restoring cached ruby gems
2023-02-01T23:26:55.065453Z Installing gem bundle
2023-02-01T23:26:55.344195Z [DEPRECATED] The --path flag is deprecated because it relies on being remembered across bundler invocations, which bundler will no longer do in future versions. Instead please use bundle config set path '/opt/buildhome/cache/bundle', and stop using this flag
2023-02-01T23:26:55.493793Z [DEPRECATED] The --binstubs option will be removed in favor of bundle binstubs
2023-02-01T23:26:55.626898Z The dependency tzinfo (>= 1, < 3) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for x86-mingw32, x64-mingw32, x86-mswin32, java. To add those platforms to the bundle, run bundle lock --add-platform x86-mingw32 x64-mingw32 x86-mswin32 java.
2023-02-01T23:26:55.627211Z The dependency tzinfo-data (>= 0) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for x86-mingw32, x64-mingw32, x86-mswin32, java. To add those platforms to the bundle, run bundle lock --add-platform x86-mingw32 x64-mingw32 x86-mswin32 java.
2023-02-01T23:26:55.627363Z The dependency wdm (> 0.1.1) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for x86-mingw32, x64-mingw32, x86-mswin32. To add those platforms to the bundle, run bundle lock --add-platform x86-mingw32 x64-mingw32 x86-mswin32.
2023-02-01T23:26:55.627494Z The dependency http_parser.rb (> 0.6.0) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for java. To add those platforms to the bundle, run bundle lock --add-platform java.
2023-02-01T23:26:57.898377Z Fetching gem metadata from https://rubygems.org/............
2023-02-01T23:26:58.015672Z Fetching gem metadata from https://rubygems.org/.
2023-02-01T23:26:58.083109Z Resolving dependencies...
2023-02-01T23:26:58.121553Z sass-embedded-1.57.1-x86_64-linux-gnu requires rubygems version >= 3.3.22, which
2023-02-01T23:26:58.121847Z is incompatible with the current version, 3.1.2
2023-02-01T23:26:58.157761Z Error during gem install
2023-02-01T23:26:58.164279Z Failed: build command exited with code: 1
2023-02-01T23:26:59.048184Z Failed: an internal error occurred

Clearly define use cases and examples

It's not entirely clear who the GSD data is for. We need to define personas and use cases with examples of each.

This issue should be broken apart into many smaller issues

Possible issues (this list is not complete or accurate)

Define 3 expected use cases
Define one or more personas for each use case
Create examples for each use case and persona combination

Expand "gsd-bot" to produce OSV format for more than kernel entries

Currently, "gsd-bot" only produces OSV data for kernel entries. It should produce OSV format data for all sources.

This will require some updates to the create gsd webform.

sandbox.gsd creation

Create a sandbox for people to play with and demo GSD, both to improve the product and to allow interactive feedback.
MUST use test data
MUST provide clear instructions for use.

Copy and paste error

The LICENSE file has no year and owner

https://github.com/cloudsecurityalliance/gsd-tools/blob/f8d2a1600c5ce3f14dea0870b7730610d42c3a5b/LICENSE#L189

Update CodeQL to v2 (GitHub Actions)

Currently the JavaScript GitHub Action fails because CodeQL v1 is deprecated. See the latest build: https://github.com/cloudsecurityalliance/gsd-tools/actions

Define what's in/out of scope vs. filtered

The GSD doesn't define "vulnerability", so it may be unclear to others what is in and out of scope.

I believe we want to have a superset, and let the user filter based upon what they find useful. The ability to parse -- thus filter -- is a killer feature IMO.

List of interesting security vendor pages and data sources

List of interesting security vendor pages and data sources that have good quality data and partial CVE coverage:

https://www.debian.org/security/
https://www.twcert.org.tw/en/lp-139-2.html
https://github.com/SummitRoute/csp_security_mistakes
https://docs.r3.com/en/platform/corda/4.8/open-source/release-notes.html (and other versions)
https://huntr.dev/bounties/hacktivity
https://www.malvuln.com/
https://github.com/RhinoSecurityLabs/CVEs
https://blog.sonarsource.com/tag/security

These offer a wide variety of data and data types from generally reliable sources.

Add CISA known exploited vulnerabilities info to cisa.gov namespace

Add the information from the CISA known exploited vulnerabilities catalog to the relevant GSD entries under a cisa.gov namespace. The JSON version is at https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json
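A rough sketch of the merge step, assuming the feed is a JSON object with a `vulnerabilities` list whose items carry a `cveID` field (my understanding of the feed layout), and assuming GSD IDs reuse the CVE year and number. Fetching the file is left out, and the function name is hypothetical.

```python
def kev_by_gsd_id(catalog: dict) -> dict:
    """Map GSD IDs to a {"namespaces": {"cisa.gov": entry}} fragment."""
    result = {}
    for entry in catalog.get("vulnerabilities", []):
        cve_id = entry.get("cveID", "")
        if not cve_id.startswith("CVE-"):
            continue  # skip anything that does not look like a CVE ID
        # Assumption: the GSD ID mirrors the CVE year and number.
        gsd_id = "GSD-" + cve_id[len("CVE-"):]
        result[gsd_id] = {"namespaces": {"cisa.gov": entry}}
    return result
```

Each fragment could then be merged into the existing GSD entry so that only the cisa.gov namespace is touched.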

Update kernel script commit generation

Reminder for @joshbressers

Squash commits into one descriptive commit.

Include helpful info in the commit like:

These are kernel requests from X to Y time

This was automatically generated by the script
ID Range: GSD-2022-1000xxx - GSD-2022-1000yyy
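The message could be assembled by a small helper along these lines. This is only a sketch; the function name and its parameters are hypothetical, and the subject line wording is a suggestion.

```python
def kernel_commit_message(start: str, end: str, first_id: str, last_id: str) -> str:
    """Build one descriptive commit message for a batch of kernel requests."""
    return "\n".join([
        f"Add kernel requests from {start} to {end}",
        "",
        "This was automatically generated by the script",
        f"ID Range: {first_id} - {last_id}",
    ])
```

The script would pass the resulting text to a single squashed commit instead of committing per entry.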

Security monitoring things to investigate

TODO: investigate GitHub actions for finding vulns being published?
Automatic monitor github cve using Github Actions
https://github.com/p1ay8y3ar/cve_monitor

CVEMON - Monitoring exploits & references for CVEs
https://github.com/ARPSyndicate/cvemon

PoC in GitHub
https://github.com/nomi-sec/PoC-in-GitHub

Collecting vulnerabilities for 2022
https://github.com/binganao/vulns-2022

Bug writeups for OpenGitLab
https://github.com/OpenGitLab/Bug-Storage

Packetstorm
https://packetstormsecurity.com/

gsd-analysis

Hi All,

I'm sharing the repository (gsd-analysis) I brought up Monday in the GSD meeting for analysis of the gsd-database. I made a few updates to the documentation for reproducibility.

I think the README from the gsd-analysis repo starts to handle some of the "Data" to-dos for the GSD project, mainly for documentation purposes.

Example, the current overall data structure of the gsd-database:

{
    "GSD": {"type":  "object"},
    "OSV": {"type":  "object"},
    "namespaces": {
        "properties": {
            "cisa.gov": {"type":  "object"},
            "cve.org": {"type":  "object"},
            "gitlab.com": {"type":  "object"},
            "nvd.nist.gov": {"type":  "object"},
            "github.com/kurtseifried:582211": {"type":  "object"}
        }
    },
    "overlay": {
        "properties": {
            "cve.org": {"type":  "object"}
        }
    }
}

GSD Object Schema: ./data/schemas/schema_gsd_object.json
OSV Schema: ./data/schemas/schema_osv.json
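Until the schemas are agreed on, even a very small structural check can flag outliers. The sketch below only looks at the top-level keys from the structure shown above; the function name is hypothetical and this is not how gsd-analysis.py itself works.

```python
EXPECTED_TOP_LEVEL = {"GSD", "OSV", "namespaces", "overlay"}


def check_top_level(entry: dict) -> list:
    """Return a list of problems found in the top-level keys of one entry."""
    problems = []
    for key, value in entry.items():
        if key not in EXPECTED_TOP_LEVEL:
            problems.append(f"unexpected top-level key: {key}")
        elif not isinstance(value, dict):
            problems.append(f"{key} should be an object")
    return problems
```

An empty list means the entry matches the expected top-level shape; anything else is a candidate for data cleansing.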

Schemas would obviously need to be agreed/approved/validated before we could make any helper functions to validate incoming data into the gsd-database, but gsd-analysis.py helps produce the initial basis for what's in the database.

I'm currently unsure of the future home of the gsd-analysis; maybe a new tool within gsd-tools? Could it be the start of a dashboard? Also, while not the initial intention, it looks for inconsistencies/outliers within the gsd-database that could help with data cleansing.

Let me know what you all think and the direction to potentially integrate the analysis into the project.

Thanks!

API endpoint for parsed json data of GSD

Rewrite gsd-bot "special kernel mode"

Related to #120

The gsd-bot has a "special kernel mode" that is hard to understand and if left alone will become an impediment to future development.

The scripts need to be documented, provided some care and test coverage, and eventually rewritten.

We will likely need some help from Josh and Oliver to understand the nuances of these scripts, so let the learning begin.

Support editing GSDs from web interface

Related to #18.

GSD Web allows users signed into GitHub to edit the JSON, description, and references for an existing GSD ID.

There are other details that may need updates or corrections over the life of a GSD:

...
"vendor_name": "Uber",
"product_name": "email for uber.com",
"product_version": "current as of 2022-01-02",
"vulnerability_type": "unspecified",
"affected_component": "unspecified",
"attack_vector": "unspecified",
"impact": "email spoofing for uber.com",
"credit": "",
...

Each field in the GSD should be editable as an individual field. We should be able to validate the core schema against a JSON Schema. namespaces should allow for fields outside the schema.

Timestamps and Dates related to published and modified should not persist.

The resulting change should be submitted to GitHub as a Pull Request.

Below are the current show/edit views available so we can use them as a reference to improve

Show GSD

Edit GSD

Edit GSD JSON

Ensure all repos have a contributor.md file

All the repos should have a CONTRIBUTOR.md

https://github.com/cloudsecurityalliance/gsd-database/blob/draft-docs/CONTRIBUTOR.md

I'm assigning this to @athix who will then mail the group about these files

Mirror GitLab Community Advisories to gitlab.com namespace

Mirror GitLab Community Advisories to a gitlab.com namespace

gsd-bot and requests page - URL display and dangling empty ones

Problem 1: The URL fields for references are fixed length and too short to show most of any URL.

Problem 2: If you leave one or more blank URL fields at the end, the JSON gets blank URLs that are populated into the entry.
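Problem 2 could be handled on the bot side by dropping empty values before the entry is written. A minimal sketch, with a hypothetical function name, assuming the references arrive as a list of strings (or None for untouched fields):

```python
from typing import Iterable, List, Optional


def clean_references(urls: Iterable[Optional[str]]) -> List[str]:
    """Drop empty or whitespace-only URLs and trim the ones that remain."""
    return [url.strip() for url in urls if url and url.strip()]
```

Filtering in the bot rather than only in the form also covers requests that were submitted before the form is fixed.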

Enhance "gsd-bot" with error checking and documentation

The "gsd-bot" lives in the gsd directory. Its purpose is to look at issues in GitHub and create entries in the GSD database. It currently has little to no error checking and was the result of MVP work.

This tool could use some ❤️ and possibly a small refactor. The goal of this work would be to make this script more reliable and better documented.

The expectation here is to incrementally de-risk this code by:

Adding error detection and mitigation
Enhance Documentation of usage and triage
Provide test coverage of behaviors to protect future development.

Only pulling in updates to GSD from CVE

https://github.com/cloudsecurityalliance/gsd-tools/blob/281714814ab1442108cee5c2577722fcf64369cd/securitylist/src/update_repo.py#L31

Since the CVE data is inside a git repo, it should be possible with something like this

pre_pull_commit=$(git rev-parse HEAD)
git pull
git diff --name-only "$pre_pull_commit" HEAD

It is possible to use GitPython for this, but this is quite a large dependency and internally it uses the git executable.

For this, the fetching of the CVEs should be moved from securitylist/src/update.sh into a separate fetch function.
In general, it might be a good idea to have the fetching of the external sources (CVE, NVD, CISA, GitLab) in separate modules and let these manage their own repos and caches.
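The shell snippet above could be wrapped in Python without GitPython by calling the git executable through `subprocess`. This is only a sketch: the function names are hypothetical, and the `git` parameter exists so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List


def run_git(args: List[str], cwd: str) -> str:
    """Run a git command in cwd and return its stdout, raising on failure."""
    completed = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return completed.stdout


def pull_and_list_changes(
    repo_dir: str, git: Callable[[List[str], str], str] = run_git
) -> List[str]:
    """Pull the repo and return the paths that changed since the previous HEAD."""
    pre_pull_commit = git(["rev-parse", "HEAD"], repo_dir).strip()
    git(["pull"], repo_dir)
    diff = git(["diff", "--name-only", pre_pull_commit, "HEAD"], repo_dir)
    return [line for line in diff.splitlines() if line]
```

A per-source module could expose a function like this and hand only the changed paths to the update step.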

Document format of files

Document format of files - waiting on email thread to confirm changes to file format. https://groups.google.com/a/groups.cloudsecurityalliance.org/g/gsd/c/DgSaS-lAsP4

How should we suggest updating data in a readonly namespace?

I would like to start proposing improvements to the NVD data from the nvd.nist.gov namespace on GSD entries in the hope that someday we can find a way to provide those updates back to the source. The nvd.nist.gov namespace is readonly and populated by the GSD bot, so editing it directly is not supported. I know at some point @kurtseifried had provided a correction to a CPE and I think that was done by copying the affected property into a namespace nvd.nist.gov under the GSD namespace, but I'd like to understand if that is really how we want this to work moving forward.

I do have an example of a small proposed GSD entry change at CloudSecurityAlliance/gsd-database#2400

easy way to validate json and print in same format we use

python3 -m json.tool defaults to indent=4 and there isn't any way to set it to indent=2 (there is in newer versions perhaps but I don't have an up to date version apparently).

#!/usr/bin/env python3
"""Validate a JSON file and rewrite it in place with indent=2."""

import json
import os
import sys
from tempfile import mkstemp

filename = sys.argv[1]

with open(filename, 'r') as filehandle:
    try:
        json_data = json.load(filehandle)
    except json.JSONDecodeError as error:
        # Report where parsing failed instead of re-reading the exhausted handle.
        sys.exit(f"Your JSON is broken: {error}")

json_output = json.dumps(json_data, indent=2)

# Write to a temporary file in the same directory, then swap it into place.
file_descriptor, path = mkstemp(dir=os.path.dirname(os.path.abspath(filename)))
with os.fdopen(file_descriptor, 'w') as f:
    f.write(json_output)
os.replace(path, filename)

I don't see any problems here.

How to incorporate exploitation information into GSD

Hi,

we are working on an open source project to make exploitation information easier to collect and access: https://github.com/gmatuz/inthewilddb

It is mostly driven by our own need to help prioritisation in vulnerability and patch management, using what is the simplest (it is clear and binary) and most salient information about a vulnerability in this aspect.

I'm wondering if you think this is useful and we could find a way to collaborate

Remove the old GSD namespace and put the OSV data in there

In our old data, we have a GSD and OSV namespace. The GSD namespace is old and not a standard. The OSV namespace in most instances will be a mirror of the GSD data. We should get rid of the old GSD space.

I would prefer we do not use a namespace called OSV as I think it's not clear that it has nothing to do with osv.dev

queue for urls to be investigated for GSD assignment

So I often run across interesting vulnerabilities, e.g.:

https://twitter.com/balancerlabs/status/1525277944674930689?s=11&t=3gk-4MzTiAsNvZmog9ttOA

Should we perhaps have some sort of queue (text file of URLs? CSV with some basic info fields?) to track these, so that if someone wants to investigate and assign GSDs there's a list of interesting items to start with?
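As one possible shape for the CSV variant, here is a small sketch using only the standard library. The column names in `FIELDS` and the function name are hypothetical, and the handle is assumed to be a seekable text stream such as a file opened for appending.

```python
import csv
import io
from datetime import date

# Hypothetical columns for the queue file.
FIELDS = ["url", "added", "note"]


def append_to_queue(handle, url, note="", added=None):
    """Append one URL to the queue, writing the header if the stream is empty."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    if handle.tell() == 0:
        writer.writeheader()
    writer.writerow({
        "url": url,
        "added": (added or date.today()).isoformat(),
        "note": note,
    })


# Usage with an in-memory stream standing in for the queue file.
queue = io.StringIO()
append_to_queue(queue, "https://example.com/advisory", note="needs triage",
                added=date(2022, 5, 14))
```

A plain text file of URLs would be even simpler, but the extra columns make it easier to see how old an item is and why it was added.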

Add last commit to the web page as version string

Add last commit to the web page as version string so once an update is made and the site rebuilds/reloads we know if it actually took.

Additional files to consider

CONTRIBUTOR_LADDER.md
GOVERNANCE.md
SECURITY_CONTACTS.md
ISSUE_TEMPLATE.md
embargo-policy.md
embargo.md
incident-response.md
REVIEWING.md

Create Page on globalsecuritydatabase.org under a "Tutorial" navigation that provides a HOWTO find and file a GSD Issue

Basic "HOWTO find and file a GSD issue" with links to other resources.

The How To should include an explanation and resources for how to:

Identify a vulnerability: try to answer the question posed here -> https://twitter.com/pry0cc/status/1535708325874106370 using the resources provided by responders.
Follow best practices for contacting the vendor/project/upstream and reporting the vulnerability.
Request a GSD ID
Publish/use the GSD

Create GSD contribution guide

user stories/cases:

researcher/GSD user
get 1 GSD ID
get multiple GSD IDs
get multiple GSD IDs over time
update of 1, many, many over time GSD IDs
how to mark as duplicate of another
how to challenge/delete a GSD

dev
how to get involved in GSD automation/tooling/consumption tools/schema validators/etc.

Improve GSD Project description, who should be interested, and how it adds to the landscape to the GSD Website

This text is from David Brumley in the GSD call. Most of this should be added to the GSD site.

Difference between what our website says and what we say on calls (David Brumley):
The vulnerability landscape is evolving, with many sources of new information, often in their own proprietary format.

The GSD solves three problems in the current vulnerability management landscape:

Ensure parsability for machine automation
Aggregate vulnerability information from authorities
Provide a feed of issues that impact security but may not be in an authority stream, such as results from fuzzing campaigns, malicious software masquerading as good software, and similar.

GSD allows you to make sense of vulnerabilities across vulnerability authorities, security findings, and other threat feeds. Think of it as a machine-parsable modern version of the 1990's bugtraq mailing list.

How to use this: (these are our personas)
If you have sources of new vulnerability information:
If you wish to report a security finding:
If you want to consume the feed for your own projects and products:

Add support for creating GSD entries from data.gsd.id

Guided process / web form for creation of new GSDs

Policy document

We need a document that begins to explain policy and expectations.

We should be able to answer questions like

What should get an ID
How are disagreements handled
What happens when there are duplicate IDs

There will be many more such questions and decisions. This repo may not be the best place to track this, but it's currently the most obvious place to have the discussion.

I get the above error. Guessing this is due to an issue with how I wrote the parsing for GSD IDs that are not conformant to the schema yet.