cloudsecurityalliance / gsd-tools
Global Security Database Tools
Home Page: https://gsd.id
License: Apache License 2.0
As much as I'm sure we'd all like to see CPEs go away, it seems unlikely at this point, and I think it'd be useful to help with mapping existing CPE-based data to something more useful.
Should we have some sort of CSA-maintained mapping of CPE components -> ecosystem/packagename (here or as a separate repo)?
As an example, I started https://github.com/westonsteimel/package-metadata to help with some of my personal stuff where I was trying to map CPEs to Package URLs, but I think it'd be great to have this kind of information somewhere that is more likely to actually be useful to others.
I'm fairly certain I saw a list of CPE -> Debian package name mappings in the Debian repos somewhere as well, and I'm sure there are many other sources. I think it'd be great to have everything gathered in a central place.
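To make the mapping idea concrete, here is a minimal sketch of a CPE -> Package URL lookup. The table `CPE_TO_PACKAGE` and the function `cpe_to_purls` are hypothetical names, the single table entry is illustrative rather than maintained data, and the parsing is deliberately naive.

```python
# Hypothetical mapping table: (CPE vendor, CPE product) -> [(purl type, package name)].
# The single entry is illustrative only, not maintained data.
CPE_TO_PACKAGE = {
    ("djangoproject", "django"): [("pypi", "django")],
}


def cpe_to_purls(cpe):
    """Return candidate Package URLs for a CPE 2.3 formatted string.

    This naive split ignores escaped colons, which real CPE values may contain.
    """
    fields = cpe.split(":")
    if len(fields) < 6 or fields[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 string: {cpe!r}")
    vendor, product, version = fields[3], fields[4], fields[5]
    purls = []
    for purl_type, name in CPE_TO_PACKAGE.get((vendor, product), []):
        purl = f"pkg:{purl_type}/{name}"
        # "*" and "-" are the CPE values for "any" and "not applicable".
        if version not in ("*", "-"):
            purl += f"@{version}"
        purls.append(purl)
    return purls
```

Keeping the table as plain data would let it live in its own repo and be reviewed like any other contribution. A real version would also need per-ecosystem version handling, since distribution packages rarely share upstream version strings.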
We need to review / document the current CodeQL workflows.
Provide some space in the documentation to provide scoping for our current goals in regard to CodeQL.
Create a contribution guide for CodeQL.
I know @kurtseifried has documented some great stuff about potential data formats here, but I think it would be quite helpful to work through an actual example record, and I suggest having a try with GSD-2020-7471.
I like this one because it presents several common challenges that I would like us as a community to work on addressing. For instance, how do we want to handle package naming and versioning differences across various package managers? Here, the vulnerability is for Django, and for PyPI specifically we have PYSEC-2020-35 in the OSV format; however, what about for the Debian package where the name is python-django and there are fixes backported to earlier versions than PyPI?
If we end up using something like the OSV format for the primary GSD namespace, is this one OSV record with multiple affected entries with various ecosystem, package name, and version entries, or is it something like an entire OSV record for each ecosystem as a separate namespace (so each OS or packaging ecosystem could potentially have its own custom description, etc)?
Or do we want a separate GSD id for each one and some parent record that unifies them?
In this case I'd expect to see something for at least the following:
I'll try to throw in some example json of a few possible approaches when I have some more time, but please start sharing any ideas you all have!
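As a starting point for the first option (one record with multiple affected entries), here is a rough sketch that builds an OSV-style record in Python. The helper `affected_entry` is a hypothetical name, and the fixed versions are placeholders rather than real data, since the actual values differ per ecosystem.

```python
import json


def affected_entry(ecosystem, name, fixed_version):
    """Build one OSV-style 'affected' entry for a single ecosystem."""
    return {
        "package": {"ecosystem": ecosystem, "name": name},
        "ranges": [
            {
                "type": "ECOSYSTEM",
                "events": [{"introduced": "0"}, {"fixed": fixed_version}],
            }
        ],
    }


# Placeholder versions: the real fixed versions differ per ecosystem
# (Debian backports fixes) and are deliberately not filled in here.
record = {
    "id": "GSD-2020-7471",
    "aliases": ["PYSEC-2020-35"],
    "affected": [
        affected_entry("PyPI", "django", "<pypi-fixed-version>"),
        affected_entry("Debian", "python-django", "<debian-fixed-version>"),
    ],
}

print(json.dumps(record, indent=2))
```

This shape keeps a single description for all ecosystems, which is exactly the trade-off raised above: per-ecosystem descriptions would need either separate records or an extension field.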
Anyone who has ever submitted a request or correction can be on the approver list
We also need to document the flow of the request and approve process
Should we officially use camelCase? Google guidelines? https://google.github.io/styleguide/jsoncstyleguide.xml
Whatever we do, consistency is key.
Mirror GitHub advisories to a github.com namespace, similar to what we are doing for NVD data. Not every GitHub advisory corresponds to a CVE, but if it does, it should be easy enough to map it to the correct GSD entry using the alias field. If there is no CVE for it, maybe create a GSD candidate for it? Or just leave that part as a later enhancement.
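The alias lookup could be as small as the sketch below. It assumes the advisory is available as a dict with an OSV-style `aliases` list, and that a CVE-YYYY-NNNN alias corresponds to GSD-YYYY-NNNN. The function name `gsd_id_for_advisory` is hypothetical.

```python
import re

# Matches CVE IDs such as CVE-2022-47966 and captures the year and sequence.
CVE_PATTERN = re.compile(r"^CVE-(\d{4})-(\d{4,})$")


def gsd_id_for_advisory(advisory):
    """Return the GSD ID implied by a CVE alias, or None if there is none.

    Assumes an OSV-style 'aliases' list and that CVE-YYYY-NNNN maps to
    GSD-YYYY-NNNN.
    """
    for alias in advisory.get("aliases", []):
        match = CVE_PATTERN.match(alias)
        if match:
            return f"GSD-{match.group(1)}-{match.group(2)}"
    return None
```

A `None` result marks the advisories that would need a new GSD candidate, so that part can be deferred without blocking the mirror.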
We need to be very transparent about who is involved with GSD, and as transparent as possible about the project as a whole. We should also explain the structure of the data and who can use it.
Right now we have a web form to request one GSD ID
https://requests.globalsecuritydatabase.org/
This only allows one to request a new ID.
It is very common for a CVE ID to be lacking useful details. See this thread for an example
https://twitter.com/joshbressers/status/1492870755167748104
It would be valuable to have a way that isn't a PR to add such details to GSD.
I envision a workflow looking something like
We could skip the review step if we want to namespace the edits
We need to do some work around the data format we want to use. If you search the discussion archive
https://groups.google.com/a/groups.cloudsecurityalliance.org/g/gsd
There is a preference for using JSON-LD. This gives a number of benefits, such as being able to link data together from disparate locations and add some context to what the various fields are for parsing.
I have the beginnings of a prototype for this here
https://github.com/joshbressers/uvi-tools/tree/json-ld/json-ld
It's a VERY early attempt; this should be considered experimental.
When I search for something that isn't an exact match I am redirected to the gsd-database repo search page.
We should probably have an intermediate page which shows that no results are found and links to the GH search while giving some examples of what a search should look for so we can clarify what fields would be indexed by a search.
This might want to expand to an advanced search that could selectively dig deeper into the fields or focus on some of our data delineation based on time range or the submission year.
The "securitylist tool" is how we mirror the NVD and CVE data. It currently has little to no error checking and was the result of MVP work.
This tool could use some ❤️ and possibly a small refactor. The goal of this work would be to make this script more reliable and better documented.
The expectation here is to incrementally de-risk this code by:
Introduce automation via GitHub Actions for the automation scripts that are currently being run by hand.
@enck I did the default Jekyll handler in Cloudflare Pages, and it spat this out:
2023-02-01T23:26:40.496783Z Cloning repository...
2023-02-01T23:26:43.770209Z From https://github.com/cloudsecurityalliance/gsd-tools
2023-02-01T23:26:43.770881Z * branch bfb30d0 -> FETCH_HEAD
2023-02-01T23:26:43.771099Z
2023-02-01T23:26:44.25026Z HEAD is now at bfb30d0 Merge pull request #93 from oswalpalash/merge-project-tools
2023-02-01T23:26:44.251036Z
2023-02-01T23:26:44.39709Z
2023-02-01T23:26:44.427637Z Success: Finished cloning repository files
2023-02-01T23:26:45.175018Z Installing dependencies
2023-02-01T23:26:45.188524Z Python version set to 2.7
2023-02-01T23:26:48.847221Z v12.18.0 is already installed.
2023-02-01T23:26:50.100308Z Now using node v12.18.0 (npm v6.14.4)
2023-02-01T23:26:50.366967Z Started restoring cached build plugins
2023-02-01T23:26:50.382403Z Finished restoring cached build plugins
2023-02-01T23:26:50.949994Z Attempting ruby version 2.7.1, read from environment
2023-02-01T23:26:54.679316Z Using ruby version 2.7.1
2023-02-01T23:26:55.04601Z Using PHP version 5.6
2023-02-01T23:26:55.046932Z Started restoring cached ruby gems
2023-02-01T23:26:55.064186Z Finished restoring cached ruby gems
2023-02-01T23:26:55.065453Z Installing gem bundle
2023-02-01T23:26:55.344195Z [DEPRECATED] The --path flag is deprecated because it relies on being remembered across bundler invocations, which bundler will no longer do in future versions. Instead please use bundle config set path '/opt/buildhome/cache/bundle', and stop using this flag
2023-02-01T23:26:55.493793Z [DEPRECATED] The --binstubs option will be removed in favor of bundle binstubs
2023-02-01T23:26:55.626898Z The dependency tzinfo (>= 1, < 3) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for x86-mingw32, x64-mingw32, x86-mswin32, java. To add those platforms to the bundle, run bundle lock --add-platform x86-mingw32 x64-mingw32 x86-mswin32 java.
2023-02-01T23:26:55.627211Z The dependency tzinfo-data (>= 0) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for x86-mingw32, x64-mingw32, x86-mswin32, java. To add those platforms to the bundle, run bundle lock --add-platform x86-mingw32 x64-mingw32 x86-mswin32 java.
2023-02-01T23:26:55.627363Z The dependency wdm (~> 0.1.1) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for x86-mingw32, x64-mingw32, x86-mswin32. To add those platforms to the bundle, run bundle lock --add-platform x86-mingw32 x64-mingw32 x86-mswin32.
2023-02-01T23:26:55.627494Z The dependency http_parser.rb (~> 0.6.0) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for java. To add those platforms to the bundle, run bundle lock --add-platform java.
2023-02-01T23:26:57.898377Z Fetching gem metadata from https://rubygems.org/............
2023-02-01T23:26:58.015672Z Fetching gem metadata from https://rubygems.org/.
2023-02-01T23:26:58.083109Z Resolving dependencies...
2023-02-01T23:26:58.121553Z sass-embedded-1.57.1-x86_64-linux-gnu requires rubygems version >= 3.3.22, which
2023-02-01T23:26:58.121847Z is incompatible with the current version, 3.1.2
2023-02-01T23:26:58.157761Z Error during gem install
2023-02-01T23:26:58.164279Z Failed: build command exited with code: 1
2023-02-01T23:26:59.048184Z Failed: an internal error occurred
It's not entirely clear who the GSD data is for. We need to define personas and use cases with examples of each.
This issue should be broken apart into many smaller issues
Possible issues (this list is not complete or accurate)
Currently, "gsd-bot" only produces OSV data for kernel entries. It should produce OSV format data for all sources.
This will require some updates to the create gsd webform.
Create a sandbox for people to play with and demo with GSD both for the purposes of improving the product, but also to allow interactive potential feedback
MUST use test data
MUST provide clear instructions for use.
The LICENSE file has no year and owner
Currently the JavaScript GitHub Action fails due to CodeQL v1 being deprecated. See the latest build: https://github.com/cloudsecurityalliance/gsd-tools/actions
The GSD doesn't define "vulnerability", so it may be unclear to others what is in and out of scope.
I believe we want to have a superset, and let the user filter based upon what they find useful. The ability to parse -- thus filter -- is a killer feature IMO.
List of interesting security vendor pages and data sources that have good quality data and partial CVE coverage:
https://www.debian.org/security/
https://www.twcert.org.tw/en/lp-139-2.html
https://github.com/SummitRoute/csp_security_mistakes
https://docs.r3.com/en/platform/corda/4.8/open-source/release-notes.html (and other versions)
https://huntr.dev/bounties/hacktivity
https://www.malvuln.com/
https://github.com/RhinoSecurityLabs/CVEs
https://blog.sonarsource.com/tag/security
These offer a wide variety of data, data types, and so on from generally reliable sources.
Add the information from the CISA known exploited vulnerabilities catalog to the relevant GSD entries under a cisa.gov namespace. The JSON version is at https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json
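A rough sketch of the merge step, working on already-parsed JSON, might look like this. It assumes the feed has a top-level `vulnerabilities` list whose items carry a `cveID` field, that CVE-YYYY-NNNN corresponds to GSD-YYYY-NNNN, and that namespaces live under a `namespaces` key in each GSD entry. Both function names are hypothetical.

```python
def kev_by_gsd_id(catalog):
    """Index Known Exploited Vulnerabilities entries by GSD ID.

    Assumes a top-level 'vulnerabilities' list with a 'cveID' field per
    entry, and that CVE-YYYY-NNNN corresponds to GSD-YYYY-NNNN.
    """
    index = {}
    for item in catalog.get("vulnerabilities", []):
        cve_id = item.get("cveID", "")
        if cve_id.startswith("CVE-"):
            index["GSD-" + cve_id[len("CVE-"):]] = item
    return index


def add_cisa_namespace(gsd_entry, kev_item):
    """Return a copy of a GSD entry with the KEV data under namespaces['cisa.gov']."""
    updated = dict(gsd_entry)
    namespaces = dict(updated.get("namespaces", {}))
    namespaces["cisa.gov"] = kev_item
    updated["namespaces"] = namespaces
    return updated
```

Returning a copy rather than mutating the entry makes it easy to compare the entry before and after, and to skip the write when nothing changed.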
Reminder for @joshbressers
Squash commits into one descriptive commit.
Include helpful info in the commit like:
These are kernel requests from X to Y time
This was automatically generated by the script
ID Range: GSD-2022-1000xxx - GSD-2022-1000yyy
TODO: investigate GitHub actions for finding vulns being published?
Automatic monitor github cve using Github Actions
https://github.com/p1ay8y3ar/cve_monitor
CVEMON - Monitoring exploits & references for CVEs
https://github.com/ARPSyndicate/cvemon
PoC in GitHub
https://github.com/nomi-sec/PoC-in-GitHub
Collecting vulnerabilities for 2022
https://github.com/binganao/vulns-2022
Bug writeups for OpenGitLab
https://github.com/OpenGitLab/Bug-Storage
Packetstorm
https://packetstormsecurity.com/
Hi All,
I'm sharing the repository (gsd-analysis) I brought up Monday in the GSD meeting for analysis of the gsd-database. I made a few updates to the documentation for reproducibility.
I think the README from the gsd-analysis repo starts to handle some of the "Data" to-dos for the GSD project, mainly for documentation purposes.
Example, the current overall data structure of the gsd-database:
{
  "GSD": {"type": "object"},
  "OSV": {"type": "object"},
  "namespaces": {
    "properties": {
      "cisa.gov": {"type": "object"},
      "cve.org": {"type": "object"},
      "gitlab.com": {"type": "object"},
      "nvd.nist.gov": {"type": "object"},
      "github.com/kurtseifried:582211": {"type": "object"}
    }
  },
  "overlay": {
    "properties": {
      "cve.org": {"type": "object"}
    }
  }
}
GSD Object Schema: ./data/schemas/schema_gsd_object.json
OSV Schema: ./data/schemas/schema_osv.json
Schemas would obviously need to be agreed/approved/validated before we could make any helper functions to validate incoming data into the gsd-database, but gsd-analysis.py helps produce the initial basis for what's in the database.
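Until the schemas are agreed, a helper could at least check the overall shape shown above. The following is a minimal sketch using only the standard library. `KNOWN_TOP_LEVEL` and `check_entry_structure` are hypothetical names, and the set of allowed keys is taken from the structure listed here, not from an approved schema.

```python
# Top-level sections taken from the structure listed above; not an approved schema.
KNOWN_TOP_LEVEL = {"GSD", "OSV", "namespaces", "overlay"}


def check_entry_structure(entry):
    """Return a list of human-readable problems found in one GSD entry."""
    problems = []
    # Flag sections that the current structure does not know about.
    for key in sorted(set(entry) - KNOWN_TOP_LEVEL):
        problems.append(f"unexpected top-level key: {key}")
    # Every known section that is present should be a JSON object.
    for key in sorted(KNOWN_TOP_LEVEL & set(entry)):
        if not isinstance(entry[key], dict):
            problems.append(f"{key} should be an object")
    # Each namespace should itself be a JSON object.
    namespaces = entry.get("namespaces")
    if isinstance(namespaces, dict):
        for name, value in sorted(namespaces.items()):
            if not isinstance(value, dict):
                problems.append(f"namespaces.{name} should be an object")
    return problems
```

Returning a list of problems instead of raising on the first one suits the data-cleansing use case, where the goal is to survey all inconsistencies in one pass.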
I'm currently unsure of the future home of the gsd-analysis; maybe a new tool within gsd-tools? Could it be the start of a dashboard? Also, while not the initial intention, it looks for inconsistencies/outliers within the gsd-database that could help with data cleansing.
Let me know what you all think and the direction to potentially integrate the analysis into the project.
Thanks!
Related to #120
The gsd-bot has a "special kernel mode" that is hard to understand and if left alone will become an impediment to future development.
The scripts need to be documented, provided some care and test coverage, and eventually rewritten.
We will likely need some help from Josh and Oliver to understand the nuances of these scripts, so let the learning begin.
Related to #18.
GSD Web allows users signed into GitHub to edit the JSON, description, and references for an existing GSD ID.
There are other details that may need updates or corrections over the life of a GSD:
...
"vendor_name": "Uber",
"product_name": "email for uber.com",
"product_version": "current as of 2022-01-02",
"vulnerability_type": "unspecified",
"affected_component": "unspecified",
"attack_vector": "unspecified",
"impact": "email spoofing for uber.com",
"credit": "",
...
Each field in the GSD should be editable as an individual field. We should be able to validate the core schema against a JSON Schema. Namespaces should allow for fields outside the schema.
Timestamps and dates related to published and modified should not persist.
The resulting change should be submitted to GitHub as a pull request.
Below are the current show/edit views available so we can use them as a reference to improve
All the repos should have a CONTRIBUTOR.md
https://github.com/cloudsecurityalliance/gsd-database/blob/draft-docs/CONTRIBUTOR.md
I'm assigning this to @athix who will then mail the group about these files
Mirror GitLab Community Advisories to a gitlab.com namespace
Problem 1: The URL fields for references are fixed length and too short to show most of any URL.
Problem 2: If you leave blank URL field(s) at the end, the JSON contains blank URLs that get populated into the entry.
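For Problem 2, one option is to filter the submitted list before it is written to the entry. This is a minimal sketch, assuming the form hands over the references as a list of strings. `clean_references` is a hypothetical helper name.

```python
def clean_references(urls):
    """Drop blank reference URLs and surrounding whitespace, keeping order."""
    return [url.strip() for url in urls if url and url.strip()]
```

Applying the same filter on the server side as well as in the form would also protect entries submitted by other tools.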
The "gsd-bot" lives in the gsd directory. Its purpose is to look at issues in GitHub and create entries in the GSD database. It currently has little to no error checking and was the result of MVP work.
This tool could use some ❤️ and possibly a small refactor. The goal of this work would be to make this script more reliable and better documented.
The expectation here is to incrementally de-risk this code by:
Since the CVE data is inside a git repo, it should be possible with something like this
pre_pull_commit=$(git rev-parse HEAD)
git pull
git diff --name-only "$pre_pull_commit" HEAD
It is possible to use GitPython for this, but this is quite a large dependency and internally it uses the git executable.
For this, the fetching of the CVEs should be moved from securitylist/src/update.sh into a separate fetch function.
It might in general be a good idea to have the fetching of the external sources (CVE, NVD, CISA, GitLab) in separate modules and let these manage their own repos and caches.
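Without GitPython, the shell snippet above translates fairly directly to the standard library's subprocess module. This is a sketch only: the function names are hypothetical, and it assumes the git executable is on the PATH and that `repo_dir` is an existing clone with an upstream configured.

```python
import subprocess


def git(repo_dir, *args):
    """Run a git command inside repo_dir and return its stdout as text."""
    result = subprocess.run(
        ["git", "-C", repo_dir, *args],
        check=True,           # raise CalledProcessError if git fails
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_name_only(output):
    """Split 'git diff --name-only' output into a list of paths."""
    return [line for line in output.splitlines() if line]


def pull_and_list_changes(repo_dir):
    """Pull the repository and return the files changed by the pull."""
    pre_pull_commit = git(repo_dir, "rev-parse", "HEAD").strip()
    git(repo_dir, "pull")
    diff_output = git(repo_dir, "diff", "--name-only", pre_pull_commit, "HEAD")
    return parse_name_only(diff_output)
```

Each source module could wrap a call like `pull_and_list_changes` for its own clone, so only the changed files are reprocessed on each run.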
Document format of files - waiting on email thread to confirm changes to file format. https://groups.google.com/a/groups.cloudsecurityalliance.org/g/gsd/c/DgSaS-lAsP4
I would like to start proposing improvements to the NVD data from the nvd.nist.gov namespace on GSD entries in the hope that someday we can find a way to provide those updates back to the source. The nvd.nist.gov namespace is read-only and populated by the GSD bot, so editing it directly is not supported. I know at some point @kurtseifried had provided a correction to a CPE, and I think that was done by copying the affected property into a namespace nvd.nist.gov under the GSD namespace, but I'd like to understand if that is really how we want this to work moving forward.
I do have an example of a small proposed GSD entry change at CloudSecurityAlliance/gsd-database#2400
python3 -m json.tool defaults to indent=4 and there isn't any way to set it to indent=2 in my version (Python 3.9 and later add an --indent option, but I apparently don't have an up-to-date version).
#!/usr/bin/env python3
"""Rewrite a JSON file in place with 2-space indentation."""
import json
import os
import shutil
import sys
from tempfile import mkstemp

filename = sys.argv[1]

with open(filename, 'r') as filehandle:
    try:
        json_data = json.load(filehandle)
    except json.JSONDecodeError as error:
        # The handle is already consumed here, so report the captured
        # error instead of calling json.load() a second time.
        print(f"Your JSON is broken: {error}", file=sys.stderr)
        sys.exit(1)

json_output = json.dumps(json_data, indent=2)
json_output = json_output.rstrip("\n")

# Write to a temporary file first so a failed write cannot truncate the original.
file_descriptor, path = mkstemp()
with os.fdopen(file_descriptor, 'w') as f:
    f.write(json_output)

shutil.move(path, filename)
I don't see any problems here,
Hi,
we are working on an open source project to make exploitation information easier to collect and access: https://github.com/gmatuz/inthewilddb
It is mostly driven by our own need to help prioritisation in vulnerability and patch management, using what is the simplest (it is clear and binary) and most salient information about a vulnerability in this aspect.
I'm wondering if you think this is useful and whether we could find a way to collaborate.
In our old data, we have a GSD and OSV namespace. The GSD namespace is old and not a standard. The OSV namespace in most instances will be a mirror of the GSD data. We should get rid of the old GSD space.
I would prefer we do not use a namespace called OSV as I think it's not clear that it has nothing to do with osv.dev
So I often run across interesting vulnerabilities, e.g.:
https://twitter.com/balancerlabs/status/1525277944674930689?s=11&t=3gk-4MzTiAsNvZmog9ttOA
Should we perhaps have some sort of queue (text file of URLs? CSV with some basic info fields?) to track these, so if someone wants to investigate and assign GSDs there's a list of interesting items to start with?
Add last commit to the web page as version string so once an update is made and the site rebuilds/reloads we know if it actually took.
CONTRIBUTOR_LADDER.md
GOVERNANCE.md
SECURITY_CONTACTS.md
ISSUE_TEMPLATE.md
embargo-policy.md
embargo.md
incident-response.md
REVIEWING.md
Basic "HOWTO find and file a GSD issue" with links to other resources.
The How To should include an explanation and resources for how to:
Create GSD contribution guide
user stories/cases:
researcher/GSD user
get 1 GSD ID
get multiple GSD IDs
get multiple GSD IDs over time
update of 1, many, many over time GSD IDs
how to mark as duplicate of another
how to challenge/delete a GSD
dev
how to get involved in GSD automation/tooling/consumption tools/schema validators/etc.
other stories/use cases?
This text is from David Brumley in the GSD call. Most of this should be added to the GSD site.
Difference between what our website says and what we say on calls (David Brumley):
The vulnerability landscape is evolving, with many sources of new information, often in their own proprietary format.
The GSD solves three problems in the current vulnerability management landscape:
GSD allows you to make sense of vulnerabilities across vulnerability authorities, security findings, and other threat feeds. Think of it as a machine-parsable modern version of the 1990's bugtraq mailing list.
How to use this: (these are our personas)
If you have sources of new vulnerability information:
If you wish to report a security finding:
If you want to consume the feed for your own projects and products:
Guided process / web form for creation of new GSDs
We need a document that begins to explain policy and expectations.
We should be able to answer questions like
There will be many more such questions and decisions. This repo may not be the best place to track this, but it's currently the most obvious place to have a discussion.
Per MVP Requirements
When attempting to edit: https://gsd.id/GSD-2022-47966
And add a reference of Article (https://vulncheck.com/blog/cve-2022-47966-payload)
I get the above error. Guessing this is due to an issue with how I wrote the parsing for GSD IDs that are not conformant to the schema yet.