cinnqi / vulkg

Vulnerability knowledge graph construction

Home Page: https://cinnqi.github.io/VulKG/Neo4j-D3-VKG/

Python 91.06% CSS 0.18% HTML 8.68% JavaScript 0.07%
cve d3js knowledge-graph named-entity-recognition neo4j vulnerability

vulkg's Introduction

Fine-grained Named Entity Recognition and Knowledge Graph Construction

Paper published at https://dl.acm.org/doi/abs/10.1145/3540250.3558920

cve-ner: Fine-grained Named Entity Recognition

Neo4j-D3-VKG: Vulnerability knowledge graph visualization

1. Introduction

1.1 Project Introduction

This is my machine learning project. The system is a platform that extracts knowledge from the vulnerability descriptions in CVE, the current mainstream vulnerability database, and visualizes the extraction results. The results are displayed as a knowledge graph so that the value of the vulnerability information can be explored in depth, for example along the time dimension, the space dimension, and the vulnerability field dimension.

For example, the text below is the description of CVE-2009-1194, taken from http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-1194:

Integer overflow in the pango_glyph_string_set_size function in pango/glyphstring.c in Pango before 1.24 allows context-dependent attackers to cause a denial of service (application crash) or possibly execute arbitrary code via a long glyph string that triggers a heap-based buffer overflow, as demonstrated by a long document.location value in Firefox.

The task is to extract information from the description above: factors such as cause, location, consequence, and version need to be recognized. For this specific instance, the extracted information should look like this:

cause: Integer overflow

location: in the pango_glyph_string_set_size function in pango/glyphstring.c

version: in Pango before 1.24

attacker: context-dependent attackers

consequence: denial of service (application crash) or possibly execute arbitrary code

triggering operation: a long glyph string
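One common way to encode annotations like these for a token-classification model is BIO tagging. The sketch below is an assumption about the label format, not necessarily the scheme used in cve-ner; the helper names `find_sublist` and `bio_tags` and the whitespace tokenization are hypothetical and chosen only for illustration.

```python
def find_sublist(tokens, phrase_tokens):
    """Return the index of the first occurrence of phrase_tokens in tokens, or -1."""
    n = len(phrase_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase_tokens:
            return i
    return -1


def bio_tags(description, entities):
    """Assign a BIO tag to each whitespace-separated token of description.

    entities maps an entity type (e.g. "cause") to the exact phrase in the text.
    Tokens outside every phrase are tagged "O".
    """
    tokens = description.split()
    tags = ["O"] * len(tokens)
    for label, phrase in entities.items():
        phrase_tokens = phrase.split()
        if not phrase_tokens:
            continue  # nothing to tag for an empty phrase
        start = find_sublist(tokens, phrase_tokens)
        if start < 0:
            continue  # phrase not found verbatim; leave tokens as "O"
        tags[start] = "B-" + label
        for j in range(start + 1, start + len(phrase_tokens)):
            tags[j] = "I-" + label
    return list(zip(tokens, tags))


# Shortened version of the CVE-2009-1194 description used above.
description = (
    "Integer overflow in the pango_glyph_string_set_size function in "
    "pango/glyphstring.c in Pango before 1.24 allows context-dependent "
    "attackers to cause a denial of service"
)
entities = {
    "cause": "Integer overflow",
    "version": "in Pango before 1.24",
    "attacker": "context-dependent attackers",
}
tagged = bio_tags(description, entities)
for token, tag in tagged:
    print(token, tag)
```

A real pipeline would align the tags with BERT's WordPiece tokens rather than whitespace tokens, and would need a rule for overlapping phrases; this sketch ignores both.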

After extracting this information and adding some key attributes of each vulnerability from the CVE website, we can construct a knowledge graph and visualize it.

1.2 Previous Steps
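As a rough illustration of how the extracted fields could become graph data, the sketch below turns one record into (subject, relation, object) triples centered on the CVE ID. The relation names (`HAS_CAUSE` and so on) and the function `to_triples` are hypothetical; the actual schema used in Neo4j-D3-VKG may differ.

```python
def to_triples(cve_id, fields):
    """Turn extracted fields into (subject, relation, object) triples.

    Each field becomes one edge from the CVE node to a value node.
    Relation names are derived from the field name (hypothetical schema).
    """
    triples = []
    for field, value in fields.items():
        relation = "HAS_" + field.upper().replace(" ", "_")
        triples.append((cve_id, relation, value))
    return triples


# Fields extracted from the CVE-2009-1194 description above.
fields = {
    "cause": "Integer overflow",
    "location": "in the pango_glyph_string_set_size function in pango/glyphstring.c",
    "version": "in Pango before 1.24",
    "attacker": "context-dependent attackers",
    "consequence": "denial of service (application crash) or possibly execute arbitrary code",
    "triggering operation": "a long glyph string",
}
triples = to_triples("CVE-2009-1194", fields)
for triple in triples:
    print(triple)
```

Keeping the CVE ID as the subject of every triple means that two vulnerabilities sharing the same value (for example the same cause) end up connected through a common node once the triples are loaded into a graph.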

There is a lot of work to be done before visualizing the knowledge graph.

  1. Create the dataset

    For this project, the dataset comes from the article "A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries".

  2. Data labeling

    Of the 3000 records, about 1000 were labeled.

  3. NER (model training and prediction)

    I use Google's pretrained BERT-base model for the NER task.

    Bert_pretrain_model: https://huggingface.co/bert-base-uncased/tree/main

    The distribution of the labeled dataset is as follows:

    entity type             training set (915)   dev set (102)
    version                 901                  100
    consequence             871                  94
    attacker                823                  88
    triggering operation    819                  86
    location                755                  84
    cause                   730                  75
    happened scenario       64                   9

    After tuning the parameters over many training runs, the model's performance after 20 epochs was as follows:

    image-20210506125621583

  4. Import data into Neo4j

    The data in Neo4j:

    image-20210506123243107

    Once all of these previous steps are done, the final step is to visualize the graph in Neo4j.
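The import step is not described in detail here. One possible approach, sketched under assumptions, is to generate parameterized Cypher `MERGE` statements from the extracted fields and run them with a Neo4j client. The node labels (`Vulnerability`, `Entity`), the `HAS` relationship, and the function `build_statements` are hypothetical and not taken from the project.

```python
# Hypothetical schema: one Vulnerability node per CVE, one Entity node per
# extracted value, and a HAS relationship typed by the field name.
CYPHER = (
    "MERGE (v:Vulnerability {id: $cve_id}) "
    "MERGE (e:Entity {type: $type, text: $text}) "
    "MERGE (v)-[:HAS {type: $type}]->(e)"
)


def build_statements(cve_id, fields):
    """Return a list of (query, parameters) pairs, one per extracted field."""
    return [
        (CYPHER, {"cve_id": cve_id, "type": field, "text": text})
        for field, text in fields.items()
    ]


statements = build_statements(
    "CVE-2009-1194",
    {"cause": "Integer overflow", "version": "in Pango before 1.24"},
)
for query, params in statements:
    print(query)
    print(params)
```

Using `MERGE` rather than `CREATE` keeps the import idempotent: re-running it does not duplicate nodes or relationships. With the official `neo4j` Python driver, each pair could be passed to `session.run(query, params)`; the connection setup is omitted here because it depends on the local installation.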

2. User Guide

  1. If the graph does not appear at first, refresh the page a few times until it does.

  2. Use the mouse wheel to zoom in or out of the graph.

  3. Hover the mouse over any node to show all nodes related to it and the relationships between them; the related information is displayed automatically on the right side.

    image-20210506131056041

  4. Use the mode switch button to switch between the two visual representations of nodes: circle or text.

  5. The colored bars on the left represent the different node types; the On/Off switch shows or hides all nodes of the same type.

    image-20210506131716394

vulkg's People

Contributors

cinnqi, cinqisap

vulkg's Issues

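How do I run this project?

Hello, I would like to ask how to run this project after cloning it. I did not see any instructions for this in the README.

Question about the dataset

Hello, this is a great project. I downloaded it, but the training set contains only 450 records, whereas the README says the training set has 915. Could you provide the full dataset?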
