
Comments (11)

joshbressers commented on July 20, 2024

I've been putting some thought into this, and I think I have some ideas.

The GSD namespace is really what we want to be THE source of information. If we want to see updates and/or corrections, that's really where we want them. I think we should figure out how to turn existing NVD entries into GSD (OSV) format.

If we want to modify the data, that's the place to do it. Keeping NVD changes current will raise some issues, but that's a different discussion, so let's just ignore that point for the moment.

If someone wants to add their own data, that's where a namespace makes sense I think. Some namespaces will be merged into GSD, some won't. It'll just depend, but the point is your namespace is for whatever you want.

from gsd-tools.

kurtseifried commented on July 20, 2024

So several questions/comments:

  1. we want to import the data automatically from various sources as much as possible (scaling/efficiency, etc)
  2. what do we do when that data is broken or incorrect? do we fix it? where?
  3. what happens when (if?) the original source changes it, was that a fix? (or is it more broken?) do we want someone to review it? do we flag it?
  4. as for where we fix the data, if we write the NVD data in, then fix it, we have that trail in git, e.g. we still have the original data (if anyone cares to go look for it)
  5. if we are fixing it by overwriting it, how do we indicate we changed it from the original? "hey GSD, you have a typo, NVD says CVSS score of X, you say Y"
  6. how do we indicate WHY we changed it, e.g. what evidence caused us to change this data?
  7. if we fix the data outside the namespace where it exists, and then for example have the API overlay our improved data when serving the file, do we put the overlay data in the root namespace, or what?

kurtseifried commented on July 20, 2024

Also your commit seems to assume the NVD namespace is empty, which it isn't, so the commit needs to be reworked to overwrite (merge? overlay? some word like that) the data into the existing "nvd.nist.gov" space (otherwise the JSON breaks, since you can't have multiple identically named keys).

This also all assumes we want to overwrite the "broken" entry and not just insert a more correct one.
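In practice many JSON parsers don't reject duplicate keys at all, which can be worse than an error: Python's standard `json` module, for example, silently keeps the last occurrence, so a second "nvd.nist.gov" block would quietly replace the first. A minimal sketch of a stricter check using the standard `object_pairs_hook` parameter; the entry layout is a hypothetical simplification, not the real GSD schema.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises instead of silently keeping the last duplicate key."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key: {key!r}")
        obj[key] = value
    return obj


# Hypothetical entry where the "nvd.nist.gov" namespace was written twice.
raw = '{"namespaces": {"nvd.nist.gov": {"cvss": 7.5}, "nvd.nist.gov": {"cvss": 9.8}}}'

# Default parsing: the later key wins and the first block is silently lost.
print(json.loads(raw))  # {'namespaces': {'nvd.nist.gov': {'cvss': 9.8}}}

# Strict parsing surfaces the problem instead.
try:
    json.loads(raw, object_pairs_hook=reject_duplicate_keys)
except ValueError as err:
    print(err)  # duplicate key: 'nvd.nist.gov'
```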

kurtseifried commented on July 20, 2024

More thinking out loud:

This is why we picked git initially. We can do things like have a random person overwrite a "Read only" namespace, and 1) we can revert it easily and 2) we can know exactly who did what when, and, with a good commit message, why.

So for now I'm inclined to allow overwrites of these "Read only" spaces and see where this goes and how best to deal with it. Worst case, we clean it up by overwriting it with the NVD data and moving the altered data somewhere else.

westonsteimel commented on July 20, 2024

I was basing it off of what you'd done at CloudSecurityAlliance/gsd-database@1efab96, though once it was fixed in NVD someone else reverted that change. The bot that populates the nvd.nist.gov namespace is definitely going to clobber anything we put there, so I guess it doesn't really make sense to suggest changes there at all.

joshbressers commented on July 20, 2024

The whole point to a namespace is keeping it off limits to others. I like that we can't monkey with the NVD or CVE data. What we have is what they have. If they update something, we pick up that update quickly.

We know we want to be able to add enrichment data; that's basically the whole point right now. There's no good way to enrich the existing NVD upstream data without a lot of pain.

I think there are two types of enrichment (let's just think about this in the context of NVD for the moment, it will help keep the scope sane)

  • Correcting an existing read only namespace
  • Adding new data

For adding new data, I think either adding something to the GSD namespace or to your own namespace would be fine.

For corrections, there isn't a great way I'm aware of to correct portions of the NVD data.

kurtseifried commented on July 20, 2024

OK, so adding data is easy if you use your own namespace; the trick is knowing when and how to overlay it. For example, let's assume I use seifried.org, and people trust and want to use the data in my namespace. If I have something like this (again, for the sake of argument):

overlay: { namespaces: { cve.org: [some CVE data like an affects set of data] } }

Are we adding my data to the cve.org data, or replacing it entirely? This matters because two very common cases are "they got it wrong, here's the correct one" and "they are incomplete, here's more data". When there can only be one item, like a description, it's easy: it overwrites the existing one. But when you have lists (e.g. affects or references), then what?

So we may also need a way to indicate that this data is "in addition to" or "replaces" whatever keys are in the same space. We also need a way to specify what we're adding or overwriting (originally I used the term "overlay", I still can't think of a better one).

One option would be:

overlay: { replace: { namespaces: { cve.org: [some CVE data like an affects set of data] } } }

overlay: { addto: { namespaces: { cve.org: [some CVE data like CVSS environmental data] } } }
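To make the "replace" versus "addto" distinction concrete, here is a minimal Python sketch. The overlay layout mirrors the pseudo-structure above, but the function name, the de-duplication of list items, and the rule that "addto" never overwrites a single value are assumptions for illustration, not an agreed GSD format.

```python
import copy


def apply_overlay(entry, overlay):
    """Return a copy of `entry` with a hypothetical overlay applied.

    overlay["replace"]["namespaces"][ns][key]: overwrites that key wholesale.
    overlay["addto"]["namespaces"][ns][key]: for lists, appends items not already
    present; for single values, only fills the key in if it is missing.
    The input entry is never modified.
    """
    result = copy.deepcopy(entry)
    namespaces = result.setdefault("namespaces", {})
    for ns, fields in overlay.get("replace", {}).get("namespaces", {}).items():
        target = namespaces.setdefault(ns, {})
        for key, value in fields.items():
            target[key] = copy.deepcopy(value)
    for ns, fields in overlay.get("addto", {}).get("namespaces", {}).items():
        target = namespaces.setdefault(ns, {})
        for key, value in fields.items():
            if isinstance(value, list):
                existing = target.setdefault(key, [])
                for item in value:
                    if item not in existing:
                        existing.append(copy.deepcopy(item))
            else:
                target.setdefault(key, copy.deepcopy(value))
    return result


entry = {"namespaces": {"cve.org": {"description": "old text",
                                    "references": ["https://a.example"]}}}
overlay = {
    "replace": {"namespaces": {"cve.org": {"description": "corrected text"}}},
    "addto": {"namespaces": {"cve.org": {"references": ["https://b.example"]}}},
}
print(apply_overlay(entry, overlay)["namespaces"]["cve.org"])
# {'description': 'corrected text', 'references': ['https://a.example', 'https://b.example']}
```

Keeping the two intents in separate keys means a consumer never has to guess whether a list in an overlay is meant as a correction or as an addition.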

Now, having said all this, there may be a better solution:

We populate the root osv:{} based on data in the root and in namespaces (e.g. CVE, NVD, etc.). We can basically just sort it out and write "the best truth" in osv:{}. If people don't like it, they can choose to have their own parsing rules (e.g. "do we trust the seifried.org namespace to overlay for cve.org?") and so on.

My vote, for now:

People write stuff to their namespaces. We parse it and write it to the root osv:{}. This will be a natural extension of the GSD to osv:{} conversion I'm working on (it already has this mindset).
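The "parse the namespaces and write the best truth into osv:{}" idea can be sketched as a per-field precedence rule, where each consumer may supply its own trust order. This is a minimal Python sketch assuming the namespace data has already been converted into OSV-style fields; the function name, the sample values, and whole-field (rather than element-wise) precedence are hypothetical choices.

```python
import copy


def synthesize_osv(namespaces, trust_order):
    """Fill a root osv-style dict field by field from the most trusted namespace
    that provides each field. Namespaces missing from trust_order are ignored."""
    osv = {}
    for ns in trust_order:  # most trusted first
        for field, value in namespaces.get(ns, {}).items():
            if field not in osv:
                osv[field] = copy.deepcopy(value)
    return osv


# Hypothetical namespace data, assumed to be already converted to OSV-style fields.
namespaces = {
    "cve.org": {"summary": "Buffer overflow in foo", "details": "CVE description text"},
    "nvd.nist.gov": {"summary": "Buffer overflow in foo 1.x", "details": "NVD description text"},
    "seifried.org": {"summary": "Heap overflow in foo before 1.2.3"},
}

# Default policy: trust cve.org, then NVD.
default = synthesize_osv(namespaces, ["cve.org", "nvd.nist.gov"])
# A consumer who trusts seifried.org to overlay cve.org uses their own order.
custom = synthesize_osv(namespaces, ["seifried.org", "cve.org", "nvd.nist.gov"])
print(default["summary"])  # Buffer overflow in foo
print(custom["summary"])   # Heap overflow in foo before 1.2.3
print(custom["details"])   # CVE description text
```

Fields a trusted namespace doesn't provide fall through to the next one, so a small namespace can correct one field without having to restate the whole entry.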

raphaelahrens commented on July 20, 2024

To describe changes to JSON there is JSON Patch, defined in RFC 6902, and there is a definition for merging JSON objects in RFC 7396 (JSON Merge Patch).
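For reference, the RFC 7396 merge algorithm is short enough to sketch directly. A minimal Python version following the pseudocode in the RFC: objects merge recursively, a null value deletes a key, and any other value (including an array) replaces the target. The CVE-style field names in the example are illustrative only.

```python
import copy


def merge_patch(target, patch):
    """JSON Merge Patch (RFC 7396). Returns a new value; `target` is not modified."""
    if not isinstance(patch, dict):
        # Non-object patches (strings, numbers, arrays, null) replace the target outright.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this member"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


doc = {"description": "typo", "cvss": {"score": 5.0, "version": "3.1"}, "references": ["a"]}
patch = {"description": "fixed", "cvss": {"score": 7.5}, "references": ["b"]}
print(merge_patch(doc, patch))
# {'description': 'fixed', 'cvss': {'score': 7.5, 'version': '3.1'}, 'references': ['b']}
```

Note that the references array is replaced wholesale, so merge patch on its own cannot express the "in addition to" case raised earlier for lists; JSON Patch (RFC 6902) can, for example with an "add" operation whose path ends in "-" to append to an array.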

kurtseifried commented on July 20, 2024

The challenge here is that we then... store the original JSON files, a series of patches, and the final file (e.g. so we don't make everyone apply the series to get up to date), or... something else? For now, the solution is git: it gives us the history, the ability to roll back, easy hosting (GitHub), and distribution (git clone/git pull), all in one tidy bundle.

raphaelahrens commented on July 20, 2024

First let me clarify: I am not proposing that JSON Patch or other forms of diffs should be used. But the question of how this could be encoded was raised, and before a new format is defined I wanted to mention that others have already worked on this problem.

It is also possible to include a JSON patch inside the object that should be changed.

{
  "foo":  1,
  "patch": [
    { "op": "replace", "path": "/foo", "value": 2 }
  ]
}

So the patch could be put into a namespace and there is no need to manage multiple files for one GSD entry.
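A minimal sketch of how a consumer might apply such an embedded patch, assuming the operations live under a top-level "patch" key as in the example above. It implements only the add, remove, replace and test operations of RFC 6902 (no move or copy) on non-root paths, so it is an illustration rather than a complete JSON Patch implementation; the function names are hypothetical.

```python
import copy


def _resolve(doc, pointer):
    """Walk an RFC 6901 JSON Pointer and return (parent container, final token)."""
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    if not tokens:
        raise ValueError("root pointer not supported in this sketch")
    parent = doc
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    return parent, tokens[-1]


def apply_embedded_patch(entry, patch_key="patch"):
    """Apply the operations stored under entry[patch_key] and return a new entry
    without that key. As in RFC 6902, a failing operation aborts the whole patch;
    because we work on a deep copy, the input entry is never left half-patched."""
    doc = copy.deepcopy(entry)
    for op in doc.pop(patch_key, []):
        parent, key = _resolve(doc, op["path"])
        if isinstance(parent, list):
            key = len(parent) if key == "-" else int(key)  # "-" means end of array
        kind = op["op"]
        if kind == "test":
            if parent[key] != op["value"]:
                raise ValueError(f"test failed at {op['path']}")
        elif kind == "replace":
            _ = parent[key]  # target must already exist (KeyError/IndexError otherwise)
            parent[key] = copy.deepcopy(op["value"])
        elif kind == "add":
            if isinstance(parent, list):
                parent.insert(key, copy.deepcopy(op["value"]))
            else:
                parent[key] = copy.deepcopy(op["value"])
        elif kind == "remove":
            del parent[key]
        else:
            raise ValueError(f"unsupported op in this sketch: {kind}")
    return doc


print(apply_embedded_patch({"foo": 1, "patch": [{"op": "replace", "path": "/foo", "value": 2}]}))
# {'foo': 2}
```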

But this is only a viable solution if you want to store changes to the read-only data, for example to highlight false data in CVE and co., or to clarify data that contradicts between GSD and CVE.

I agree with @joshbressers that the approach of improving the GSD namespace is the most sensible.

kurtseifried commented on July 20, 2024

So there are two issues here:

  1. what technical method do we use to update the JSON data (e.g. direct overwrite? patch?)
  2. depending on the technical method, if we overwrite the data, how do we handle updates from NIST, for example? If we use patch(es), how do we apply them and in what order? What happens when a patch gets out of sync with the data (e.g. NIST updates it to delete an entry or something)?

My thinking here is we don't touch cve.org/nist, we synthesize that data, patch it, whatever, and put the result into the GSD namespace. Then for example when the API is serving the data the requestor gets the best up to date complete data from GSD.
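One way to picture "don't touch cve.org/nist, synthesize into the GSD namespace" together with the out-of-sync concern in point 2 is a minimal Python sketch in which each correction records the upstream value it expects, plus the evidence for the change. The correction record format, the `gsd` output layout, and the function name are hypothetical; a JSON Patch "test" operation (RFC 6902) plays a similar guard role.

```python
import copy


def synthesize_gsd(entry, corrections, source="nvd.nist.gov"):
    """Build a hypothetical entry["gsd"] block from one upstream namespace plus
    ordered corrections. Upstream data in entry["namespaces"] is left untouched.

    A correction whose expected upstream value no longer matches (e.g. NIST changed
    or deleted the field) is set aside for review instead of being applied blindly.
    """
    data = copy.deepcopy(entry["namespaces"][source])
    applied, stale = [], []
    for fix in corrections:  # applied in list order
        *parents, last = fix["path"]
        try:
            parent = data
            for key in parents:
                parent = parent[key]
            current = parent[last]
        except (KeyError, IndexError, TypeError):
            stale.append(fix)  # upstream structure changed: flag for review
            continue
        if current != fix["expected"]:
            stale.append(fix)  # upstream value changed: flag for review
            continue
        parent[last] = copy.deepcopy(fix["value"])
        applied.append(fix)  # keeps the record of what changed and why
    result = copy.deepcopy(entry)
    result["gsd"] = {"data": data, "corrections": applied}
    return result, stale


entry = {"namespaces": {"nvd.nist.gov": {"description": "typo here", "cvss": {"score": 5.0}}}}
corrections = [
    {"path": ["cvss", "score"], "expected": 5.0, "value": 7.5, "evidence": "hypothetical advisory"},
    {"path": ["description"], "expected": "an older text", "value": "fixed", "evidence": "hypothetical"},
]
out, stale = synthesize_gsd(entry, corrections)
print(out["gsd"]["data"])  # {'description': 'typo here', 'cvss': {'score': 7.5}}
print(len(stale))          # 1
```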