cloudsecurityalliance / gsd-tools

Global Security Database Tools

Home Page: https://gsd.id

License: Apache License 2.0

Python 48.40% Shell 1.72% Dockerfile 0.96% JavaScript 13.02% CSS 0.04% Handlebars 4.93% Vue 24.47% Sass 0.32% HTML 0.80% Ruby 5.32%

gsd-tools's People

Contributors

dependabot[bot], enck, jasnow, joshbressers, joshbuker, kurtseifried, liwfi, ninjapanzer, oswalpalash, raphaelahrens, tdunlap607, westonsteimel

gsd-tools's Issues

CPE components -> ecosystem/package name mapping

As much as I'm sure we'd all like to see CPEs go away, that seems unlikely at this point, and I think it'd be useful to help with mapping existing CPE-based data to something more useful.

Should we have some sort of CSA-maintained mapping of CPE components -> ecosystem/package name (here or as a separate repo)?

As an example, I started https://github.com/westonsteimel/package-metadata to help with some of my personal work, where I was trying to map CPEs to Package URLs, but I think it'd be great to have this kind of information somewhere it is more likely to be useful to others.

I'm fairly certain I saw a list of CPE -> Debian package name mappings somewhere in the Debian repos as well, and I'm sure there are many other sources. I think it'd be great to have everything gathered in a central place.
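A minimal sketch of what such a mapping could look like, assuming a hand-maintained lookup table keyed on CPE vendor and product. The table entry, the function names, and the choice of Package URL as the output are illustrative assumptions, not an existing CSA data set.

```python
from typing import Optional, Tuple

# Hypothetical mapping table: (cpe_vendor, cpe_product) -> (purl_type, package_name).
# A real table would be curated from many sources; this single entry is illustrative.
CPE_TO_PACKAGE = {
    ("djangoproject", "django"): ("pypi", "django"),
}


def parse_cpe23(cpe: str) -> Tuple[str, str, str]:
    """Return (vendor, product, version) from a CPE 2.3 formatted string.

    Simplification: assumes no escaped colons inside the components.
    """
    parts = cpe.split(":")
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return parts[3], parts[4], parts[5]


def cpe_to_purl(cpe: str) -> Optional[str]:
    """Translate a CPE into a Package URL, or None when no mapping is known."""
    vendor, product, version = parse_cpe23(cpe)
    mapped = CPE_TO_PACKAGE.get((vendor, product))
    if mapped is None:
        return None
    purl_type, name = mapped
    purl = f"pkg:{purl_type}/{name}"
    # "*" (any) and "-" (not applicable) carry no concrete version.
    if version not in ("*", "-"):
        purl += f"@{version}"
    return purl


print(cpe_to_purl("cpe:2.3:a:djangoproject:django:3.0.2:*:*:*:*:*:*:*"))
```

Returning None for unknown pairs keeps gaps in the table visible instead of guessing a package name.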

Expand CodeQL workflows

We need to review / document the current CodeQL workflows.

Set aside space in the documentation to scope our current goals for CodeQL.

Create a contribution guide for CodeQL.

Work through data format example for GSD-2020-7471

I know @kurtseifried has documented some great stuff about potential data formats here, but I think it would be quite helpful to work through an actual example record, and I suggest having a try with GSD-2020-7471.

I like this one because it presents several common challenges that I would like us as a community to work on addressing. For instance, how do we want to handle package naming and versioning differences across various package managers? Here, the vulnerability is for Django, and for PyPI specifically we have PYSEC-2020-35 in the OSV format; however, what about for the Debian package where the name is python-django and there are fixes backported to earlier versions than PyPI?

If we end up using something like the OSV format for the primary GSD namespace, is this one OSV record with multiple affected entries with various ecosystem, package name, and version entries, or is it something like an entire OSV record for each ecosystem as a separate namespace (so each OS or packaging ecosystem could potentially have its own custom description, etc)?

Or do we want a separate GSD id for each one and some parent record that unifies them?

In this case I'd expect to see something for at least the following:

  • PyPI
  • Anaconda (typically same name and affected version ranges as PyPI)
  • Debian
  • Ubuntu
  • Alpine
  • Gentoo
  • Probably a bunch more Linux distros?
  • source control repository with commit-level info

I'll try to throw in some example json of a few possible approaches when I have some more time, but please start sharing any ideas you all have!
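To make the first option concrete, here is a rough sketch of a single OSV-style record with one `affected` entry per ecosystem. It is not the example JSON promised above; the field layout follows the OSV schema, and the fixed versions are deliberately left as placeholders rather than real version numbers.

```python
import json
from typing import Dict, List

# Placeholder, not a real version: this sketch does not assert actual fix versions.
FIXED = "<fixed-version>"


def affected_entry(ecosystem: str, name: str, fixed: str) -> Dict:
    """Build one OSV-style `affected` entry for a single ecosystem/package pair."""
    return {
        "package": {"ecosystem": ecosystem, "name": name},
        "ranges": [
            {"type": "ECOSYSTEM", "events": [{"introduced": "0"}, {"fixed": fixed}]}
        ],
    }


# Option 1: one record, with one `affected` entry per ecosystem.
record = {
    "id": "GSD-2020-7471",
    "aliases": ["PYSEC-2020-35"],
    "affected": [
        affected_entry("PyPI", "django", FIXED),
        affected_entry("Debian", "python-django", FIXED),
    ],
}


def package_names(osv_record: Dict) -> Dict[str, List[str]]:
    """Group package names by ecosystem so naming differences are easy to see."""
    names: Dict[str, List[str]] = {}
    for entry in osv_record["affected"]:
        pkg = entry["package"]
        names.setdefault(pkg["ecosystem"], []).append(pkg["name"])
    return names


print(json.dumps(package_names(record), indent=2))
```

The per-ecosystem alternative would turn each `affected_entry` into its own record, which allows a custom description per ecosystem at the cost of needing a parent record to tie them together.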

Mirror GitHub advisory data (GHSA) into a github.com namespace

Mirror GitHub advisories to a github.com namespace, similar to what we are doing for NVD data. Not every GitHub advisory corresponds to a CVE, but when one does, it should be easy enough to map it to the correct GSD entry using the alias field. If there is no CVE for it, maybe create a GSD candidate for it? Or just leave that part as a later enhancement.
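A small sketch of the alias lookup, assuming that a GSD ID reuses the year and number of the matching CVE ID (as GSD-2020-7471 suggests) and that the advisory's aliases arrive as a list of strings. The function name is hypothetical.

```python
import re
from typing import Iterable, Optional

CVE_ID = re.compile(r"^CVE-(\d{4})-(\d{4,})$")


def gsd_id_for_aliases(aliases: Iterable[str]) -> Optional[str]:
    """Return the GSD ID for the first CVE alias, or None if there is no CVE."""
    for alias in aliases:
        match = CVE_ID.match(alias)
        if match:
            year, number = match.groups()
            # Assumption: GSD entries mirror CVE numbering one-to-one.
            return f"GSD-{year}-{number}"
    return None


print(gsd_id_for_aliases(["PYSEC-2020-35", "CVE-2020-7471"]))
```

A None result marks the advisories that would need a GSD candidate or that can be left for the later enhancement.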

Allow anyone to enrich existing vulnerabilities

Right now we have a web form to request one GSD ID
https://requests.globalsecuritydatabase.org/

This only allows one to request a new ID.

It is very common for a CVE ID to be lacking useful details. See this thread for an example
https://twitter.com/joshbressers/status/1492870755167748104

It would be valuable to have a way that isn't a PR to add such details to GSD.

I envision a workflow looking something like:

  • Load page for existing GSD
  • Enrich data
  • Submit data as issue
  • Someone reviews the added data
  • Add data to DB

We could skip the review step if we namespace the edits.

GSD data format

We need to do some work around the data format we want to use. If you search the discussion archive at
https://groups.google.com/a/groups.cloudsecurityalliance.org/g/gsd
you will find a preference for using JSON-LD. This gives a number of benefits, such as being able to link data together from disparate locations and add some context about what the various fields are for when parsing.

I have the beginnings of a prototype for this here
https://github.com/joshbressers/uvi-tools/tree/json-ld/json-ld

It's a VERY early attempt and should be considered experimental.

As a User when GSD Web cannot find an exact match in search, show a no results found page

When I search for something that isn't an exact match I am redirected to the gsd-database repo search page.

We should probably have an intermediate page that shows no results were found, links to the GH search, and gives some examples of what a search should look for, so we can clarify which fields are indexed by a search.

This could later expand into an advanced search that selectively digs deeper into the fields, or narrows the data by time range or submission year.

Enhance "securitylist tool" with error checking and documentation

The "securitylist tool" is how we mirror the NVD and CVE data. It currently has little to no error checking and was the result of MVP work.

This tool could use some ❤️ and possibly a small refactor. The goal of this work would be to make this script more reliable and better documented.

The expectation here is to incrementally de-risk this code by:

  • Adding error detection and mitigation
  • Enhancing documentation of usage and triage
  • Providing test coverage of behaviors to protect future development

build error in jekyll website

@enck I used the default Jekyll handler in Cloudflare Pages, and it spat this out:

2023-02-01T23:26:40.496783Z Cloning repository...
2023-02-01T23:26:43.770209Z From https://github.com/cloudsecurityalliance/gsd-tools
2023-02-01T23:26:43.770881Z * branch bfb30d0 -> FETCH_HEAD
2023-02-01T23:26:43.771099Z
2023-02-01T23:26:44.25026Z HEAD is now at bfb30d0 Merge pull request #93 from oswalpalash/merge-project-tools
2023-02-01T23:26:44.251036Z
2023-02-01T23:26:44.39709Z
2023-02-01T23:26:44.427637Z Success: Finished cloning repository files
2023-02-01T23:26:45.175018Z Installing dependencies
2023-02-01T23:26:45.188524Z Python version set to 2.7
2023-02-01T23:26:48.847221Z v12.18.0 is already installed.
2023-02-01T23:26:50.100308Z Now using node v12.18.0 (npm v6.14.4)
2023-02-01T23:26:50.366967Z Started restoring cached build plugins
2023-02-01T23:26:50.382403Z Finished restoring cached build plugins
2023-02-01T23:26:50.949994Z Attempting ruby version 2.7.1, read from environment
2023-02-01T23:26:54.679316Z Using ruby version 2.7.1
2023-02-01T23:26:55.04601Z Using PHP version 5.6
2023-02-01T23:26:55.046932Z Started restoring cached ruby gems
2023-02-01T23:26:55.064186Z Finished restoring cached ruby gems
2023-02-01T23:26:55.065453Z Installing gem bundle
2023-02-01T23:26:55.344195Z [DEPRECATED] The --path flag is deprecated because it relies on being remembered across bundler invocations, which bundler will no longer do in future versions. Instead please use bundle config set path '/opt/buildhome/cache/bundle', and stop using this flag
2023-02-01T23:26:55.493793Z [DEPRECATED] The --binstubs option will be removed in favor of bundle binstubs
2023-02-01T23:26:55.626898Z The dependency tzinfo (>= 1, < 3) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for x86-mingw32, x64-mingw32, x86-mswin32, java. To add those platforms to the bundle, run bundle lock --add-platform x86-mingw32 x64-mingw32 x86-mswin32 java.
2023-02-01T23:26:55.627211Z The dependency tzinfo-data (>= 0) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for x86-mingw32, x64-mingw32, x86-mswin32, java. To add those platforms to the bundle, run bundle lock --add-platform x86-mingw32 x64-mingw32 x86-mswin32 java.
2023-02-01T23:26:55.627363Z The dependency wdm (> 0.1.1) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for x86-mingw32, x64-mingw32, x86-mswin32. To add those platforms to the bundle, run bundle lock --add-platform x86-mingw32 x64-mingw32 x86-mswin32.
2023-02-01T23:26:55.627494Z The dependency http_parser.rb (> 0.6.0) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for java. To add those platforms to the bundle, run bundle lock --add-platform java.
2023-02-01T23:26:57.898377Z Fetching gem metadata from https://rubygems.org/............
2023-02-01T23:26:58.015672Z Fetching gem metadata from https://rubygems.org/.
2023-02-01T23:26:58.083109Z Resolving dependencies...
2023-02-01T23:26:58.121553Z sass-embedded-1.57.1-x86_64-linux-gnu requires rubygems version >= 3.3.22, which
2023-02-01T23:26:58.121847Z is incompatible with the current version, 3.1.2
2023-02-01T23:26:58.157761Z Error during gem install
2023-02-01T23:26:58.164279Z Failed: build command exited with code: 1
2023-02-01T23:26:59.048184Z Failed: an internal error occurred

Clearly define use cases and examples

It's not entirely clear who the GSD data is for. We need to define personas and use cases with examples of each.

This issue should be broken apart into many smaller issues

Possible issues (this list is not complete or accurate)

  1. Define three expected use cases
  2. Define one or more personas for each use case
  3. Create examples for each use case and persona combination

sandbox.gsd creation

Create a sandbox where people can experiment with and demo GSD, both to improve the product and to gather interactive feedback.
MUST use test data.
MUST provide clear instructions for use.

Define what's in/out of scope vs. filtered

The GSD doesn't define "vulnerability", so it may be unclear to others what is in and out of scope.

I believe we want to have a superset, and let the user filter based upon what they find useful. The ability to parse -- thus filter -- is a killer feature IMO.

List of interesting security vendor pages and data sources

Update kernel script commit generation

Reminder for @joshbressers

Squash commits into one descriptive commit.

Include helpful info in the commit like:

These are kernel requests from X to Y time

This was automatically generated by the script
ID Range: GSD-2022-1000xxx - GSD-2022-1000yyy
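One way the script could assemble such a message, shown as a sketch. The function name and the exact wording of the subject line are assumptions; the ID placeholders are copied from the note above.

```python
from datetime import date


def kernel_commit_message(start: date, end: date, first_id: str, last_id: str) -> str:
    """Build a single descriptive commit message for a batch of kernel requests."""
    subject = f"Add kernel requests from {start.isoformat()} to {end.isoformat()}"
    body = [
        "This was automatically generated by the script.",
        f"ID Range: {first_id} - {last_id}",
    ]
    # Blank line between subject and body, as git expects.
    return subject + "\n\n" + "\n".join(body) + "\n"


print(kernel_commit_message(
    date(2022, 1, 1), date(2022, 1, 7),
    "GSD-2022-1000xxx", "GSD-2022-1000yyy",
))
```

The result can be passed to `git commit -F -` on standard input once the batch has been squashed.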

Security monitoring things to investigate

TODO: investigate GitHub actions for finding vulns being published?
Automatic monitor github cve using Github Actions
https://github.com/p1ay8y3ar/cve_monitor

CVEMON - Monitoring exploits & references for CVEs
https://github.com/ARPSyndicate/cvemon

PoC in GitHub
https://github.com/nomi-sec/PoC-in-GitHub

Collecting vulnerabilities for 2022
https://github.com/binganao/vulns-2022

Bug writeups for OpenGitLab
https://github.com/OpenGitLab/Bug-Storage

Packetstorm
https://packetstormsecurity.com/

gsd-analysis

Hi All,

I'm sharing the repository (gsd-analysis) I brought up Monday in the GSD meeting for analysis of the gsd-database. I made a few updates to the documentation for reproducibility.

I think the README from the gsd-analysis repo starts to handle some of the "Data" to-dos for the GSD project, mainly for documentation purposes.

Example, the current overall data structure of the gsd-database:

{
    "GSD": {"type":  "object"},
    "OSV": {"type":  "object"},
    "namespaces": {
        "properties": {
            "cisa.gov": {"type":  "object"},
            "cve.org": {"type":  "object"},
            "gitlab.com": {"type":  "object"},
            "nvd.nist.gov": {"type":  "object"},
            "github.com/kurtseifried:582211": {"type":  "object"}
        }
    },
    "overlay": {
        "properties": {
            "cve.org": {"type":  "object"}
        }
    }
}

GSD Object Schema: ./data/schemas/schema_gsd_object.json
OSV Schema: ./data/schemas/schema_osv.json

Schemas would obviously need to be agreed/approved/validated before we could write any helper functions to validate incoming data for the gsd-database, but gsd-analysis.py helps produce an initial basis for what's in the database.
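As a rough idea of what such a helper might check before any schema is agreed, the sketch below only verifies the top-level shape shown in the structure above. It is not part of gsd-analysis, and the function name is hypothetical.

```python
from typing import Dict, List

# Top-level keys taken from the structure shown above.
TOP_LEVEL_KEYS = {"GSD", "OSV", "namespaces", "overlay"}


def structural_problems(entry: Dict) -> List[str]:
    """List deviations from the expected top-level layout of a GSD entry."""
    problems: List[str] = []
    for key, value in entry.items():
        if key not in TOP_LEVEL_KEYS:
            problems.append(f"unexpected top-level key: {key}")
        elif not isinstance(value, dict):
            problems.append(f"{key}: expected an object")
    # Every namespace and overlay member should itself be an object.
    for section in ("namespaces", "overlay"):
        members = entry.get(section, {})
        if isinstance(members, dict):
            for name, value in members.items():
                if not isinstance(value, dict):
                    problems.append(f"{section}.{name}: expected an object")
    return problems


print(structural_problems({"GSD": {}, "namespaces": {"cve.org": []}}))
```

An empty list means the entry has the expected shape; full validation would still need the agreed JSON Schemas.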

I'm currently unsure of the future home of the gsd-analysis; maybe a new tool within gsd-tools? Could it be the start of a dashboard? Also, while not the initial intention, it looks for inconsistencies/outliers within the gsd-database that could help with data cleansing.

Let me know what you all think and the direction to potentially integrate the analysis into the project.

Thanks!

Rewrite gsd-bot "special kernel mode"

Related to #120

The gsd-bot has a "special kernel mode" that is hard to understand and if left alone will become an impediment to future development.

The scripts need to be documented, provided some care and test coverage, and eventually rewritten.

We will likely need some help from Josh and Oliver to understand the nuances of these scripts, so let the learning begin.

Support editing GSDs from web interface

Related to #18.

GSD Web allows users signed into Github to edit the JSON, description, and references for an existing GSD ID.

There are other details that may need updates or corrections over the life of a GSD:

...
"vendor_name": "Uber",
"product_name": "email for uber.com",
"product_version": "current as of 2022-01-02",
"vulnerability_type": "unspecified",
"affected_component": "unspecified",
"attack_vector": "unspecified",
"impact": "email spoofing for uber.com",
"credit": "",
...

Each field in the GSD should be editable as an individual field. We should be able to validate the core schema against a JSON Schema. Namespaces should allow for fields outside the schema.

Timestamps and Dates related to published and modified should not persist.

The resulting change should be submitted to Github as a Pull Request.
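A sketch of the timestamp rule, assuming the fields in question are literally keyed `published` and `modified` wherever they occur in the edited JSON. Both the key names and the function name are assumptions.

```python
# Assumed key names for the timestamps that should not persist.
VOLATILE_KEYS = {"published", "modified"}


def strip_volatile(value):
    """Return a copy of the edited data without published/modified fields."""
    if isinstance(value, dict):
        return {
            key: strip_volatile(item)
            for key, item in value.items()
            if key not in VOLATILE_KEYS
        }
    if isinstance(value, list):
        return [strip_volatile(item) for item in value]
    return value


edited = {"impact": "email spoofing for uber.com", "modified": "2022-01-02"}
print(strip_volatile(edited))
```

Running this on the submitted form data before opening the Pull Request keeps user-supplied timestamps out of the diff.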

Below are the current show/edit views available so we can use them as a reference to improve

Show GSD

gsd_show_page

Edit GSD

gsd_edit_modal

Edit GSD JSON

gsd_edit_modal_json_edit

Enhance "gsd-bot" with error checking and documentation

The "gsd-bot" lives in the gsd directory. Its purpose is to look at issues in GitHub and create entries in the GSD database. It currently has little to no error checking and was the result of MVP work.

This tool could use some ❤️ and possibly a small refactor. The goal of this work would be to make this script more reliable and better documented.

The expectation here is to incrementally de-risk this code by:

  • Adding error detection and mitigation
  • Enhancing documentation of usage and triage
  • Providing test coverage of behaviors to protect future development

Only pulling in updates to GSD from CVE

https://github.com/cloudsecurityalliance/gsd-tools/blob/281714814ab1442108cee5c2577722fcf64369cd/securitylist/src/update_repo.py#L31

Since the CVE data is inside a git repo, it should be possible with something like this

pre_pull_commit=$(git rev-parse HEAD)
git pull
git diff --name-only "$pre_pull_commit" HEAD

It is possible to use GitPython for this, but this is quite a large dependency and internally it uses the git executable.

For this, the fetching of the CVEs should be moved from securitylist/src/update.sh into a separate fetch function.
In general, it might be a good idea to have the fetching of the external sources (CVE, NVD, CISA, GitLab) in separate modules and let these manage their own repos and caches.
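The shell commands above could be wrapped in a fetch function without GitPython by calling the git executable directly. This is a sketch only: the function names are hypothetical, and `--ff-only` is an added safety choice that the original commands do not use.

```python
import subprocess
from typing import List


def git(repo_dir: str, *args: str) -> str:
    """Run a git command inside repo_dir and return its standard output."""
    result = subprocess.run(
        ["git", "-C", repo_dir, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_name_only(output: str) -> List[str]:
    """Split `git diff --name-only` output into a list of paths."""
    return [line for line in output.splitlines() if line.strip()]


def pull_and_list_changes(repo_dir: str) -> List[str]:
    """Pull the repo and return only the files changed by that pull."""
    before = git(repo_dir, "rev-parse", "HEAD").strip()
    git(repo_dir, "pull", "--ff-only")  # assumption: the mirror has no local commits
    return parse_name_only(git(repo_dir, "diff", "--name-only", before, "HEAD"))
```

Each external source could expose its own `pull_and_list_changes`-style function, so the update step processes only the returned paths.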

How should we suggest updating data in a readonly namespace?

I would like to start proposing improvements to the NVD data from the nvd.nist.gov namespace on GSD entries in the hope that someday we can find a way to provide those updates back to the source. The nvd.nist.gov namespace is readonly and populated by the GSD bot, so editing it directly is not supported. I know at some point @kurtseifried had provided a correction to a CPE and I think that was done by copying the affected property into a namespace nvd.nist.gov under the GSD namespace, but I'd like to understand if that is really how we want this to work moving forward.

I do have an example of a small proposed GSD entry change at CloudSecurityAlliance/gsd-database#2400
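One possible reading, sketched below, is to leave the readonly namespace untouched and keep proposed corrections under a separate top-level `overlay` key, merged in only when the entry is read. The `overlay` layout and the field names in the example are assumptions, not an agreed GSD convention.

```python
import copy
from typing import Dict


def deep_merge(base: Dict, patch: Dict) -> Dict:
    """Return a copy of base with patch applied; nested objects merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_namespace(entry: Dict, namespace: str) -> Dict:
    """Combine readonly namespace data with any overlay for that namespace."""
    base = entry.get("namespaces", {}).get(namespace, {})
    patch = entry.get("overlay", {}).get(namespace, {})
    return deep_merge(base, patch)


# Hypothetical entry: the field names under nvd.nist.gov are illustrative only.
entry = {
    "namespaces": {"nvd.nist.gov": {"vendor": "old", "product": "django"}},
    "overlay": {"nvd.nist.gov": {"vendor": "corrected"}},
}
print(effective_namespace(entry, "nvd.nist.gov"))
```

Because the mirrored data is never modified, the bot can keep overwriting the namespace from the source while the proposed corrections remain reviewable on their own.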

easy way to validate json and print in same format we use

python3 -m json.tool defaults to indent=4, and the version I have offers no way to set indent=2 (Python 3.9 and later add an --indent option, but I apparently don't have an up-to-date version).

#!/usr/bin/env python3
"""Validate a JSON file and rewrite it in place with indent=2."""

import json
import os
import sys
from tempfile import mkstemp

filename = sys.argv[1]

# Parse first; on failure, report the parser's error and leave the file untouched.
try:
    with open(filename, 'r') as filehandle:
        json_data = json.load(filehandle)
except json.JSONDecodeError as err:
    sys.exit(f"Your JSON is broken: {err}")

# json.dumps adds no trailing newline, so the output has none either.
json_output = json.dumps(json_data, indent=2)

# Write to a temp file next to the target, then swap it into place.
file_descriptor, path = mkstemp(dir=os.path.dirname(os.path.abspath(filename)))
with os.fdopen(file_descriptor, 'w') as f:
    f.write(json_output)
os.replace(path, filename)

I don't see any problems here,

How to incorporate exploitation information into GSD

Hi,

we are working on an open source project to make exploitation information easier to collect and access: https://github.com/gmatuz/inthewilddb

It is mostly driven by our own need to help prioritisation in vulnerability and patch management, using what is the simplest (it is clear and binary) and most salient information about a vulnerability in this aspect.

I'm wondering if you think this is useful and we could find a way to collaborate

Remove the old GSD namespace and put the OSV data in there

In our old data, we have a GSD and OSV namespace. The GSD namespace is old and not a standard. The OSV namespace in most instances will be a mirror of the GSD data. We should get rid of the old GSD space.

I would prefer we do not use a namespace called OSV as I think it's not clear that it has nothing to do with osv.dev

Additional files to consider

CONTRIBUTOR_LADDER.md
GOVERNANCE.md
SECURITY_CONTACTS.md
ISSUE_TEMPLATE.md
embargo-policy.md
embargo.md
incident-response.md
REVIEWING.md

Create Page on globalsecuritydatabase.org under a "Tutorial" navigation that provides a HOWTO find and file a GSD Issue

Basic "HOWTO find and file a GSD issue" with links to other resources.

The how-to should explain, with supporting resources, how to:

  1. Identify a vulnerability. Try to answer the question posed here -> https://twitter.com/pry0cc/status/1535708325874106370 using the resources provided by responders.
  2. Contact the vendor/project/upstream and report the vulnerability, following best practices.
  3. Request a GSD ID.
  4. Publish and use the GSD.

Create GSD contribution guide

Create GSD contribution guide

user stories/cases:

researcher/GSD user
get 1 GSD ID
get multiple GSD IDs
get multiple GSD IDs over time
update of 1, many, many over time GSD IDs
how to mark as duplicate of another
how to challenge/delete a GSD

dev
how to get involved in GSD automation/tooling/consumption tools/schema validators/etc.

other stories/use cases?

Improve GSD Project description, who should be interested, and how it adds to the landscape to the GSD Website

This text is from David Brumley in the GSD call. Most of this should be added to the GSD site.

Difference between what our website says and what we say on calls (David Brumley):
The vulnerability landscape is evolving, with many sources of new information, often in their own proprietary format.

The GSD solves three problems in the current vulnerability management landscape:

  • Ensure parsability for machine automation
  • Aggregate vulnerability information from authorities
  • Provide a feed of issues that impact security but may not be in an authority stream, such as results from fuzzing campaigns, malicious software masquerading as good software, and similar.

GSD allows you to make sense of vulnerabilities across vulnerability authorities, security findings, and other threat feeds. Think of it as a machine-parsable modern version of the 1990s bugtraq mailing list.

How to use this: (these are our personas)
If you have sources of new vulnerability information:
If you wish to report a security finding:
If you want to consume the feed for your own projects and products:

Policy document

We need a document that begins to explain policy and expectations.

We should be able to answer questions like

  • What should get an ID
  • How are disagreements handled
  • What happens when there are duplicate IDs

There will be many more such questions and decisions. This repo may not be the best place to track this, but it's currently the most obvious place to have the discussion.
