shiftleftsecurity / codepropertygraph
Code Property Graph: specification, query language, and utilities
License: Apache License 2.0
I was trying to get data-flow to a specific argument to a function call.
For example, considering the following snippet of code:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
int main() {
    uint32_t a = 28;
    uint32_t b = 42;
    uint32_t a_n = ntohl(a);
    uint32_t b_n = ntohl(b);
    char *buf;
    uint32_t offset = a_n + 5;
    memcpy(buf + offset, buf, b_n);
}
I want to get the data flow from calls to ntohl to the size argument of memcpy. So in the example, I would expect the flow b_n = ntohl(b) -> ... -> memcpy(buf + offset, buf, b_n).
My query is:
def networkToMemcpy() = {
  val source = cpg.call.name("ntoh(s|l|ll)")
  val sink = cpg.call.name("memcpy").argument(3)
  val paths = sink.reachableByFlows(source)
  paths.l.map(
    l => l.elements.map(
      call => (
        call.asInstanceOf[Call].name,
        call.asInstanceOf[Call].code,
        call.location.filename,
        call.location.lineNumber match {
          case Some(n) => n.toString
          case None => "n/a"
        }
      )
    )
  )
}
The problem is that, apart from the expected flow, I am also getting the flow a_n -> memcpy(buf + offset), where buf + offset is the first argument of memcpy.
joern> networkToMemcpy
res100: List[List[(String, String, String, String)]] = List(
List(
("ntohl", "ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
("<operator>.assignment", "b_n = ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
),
List(
("ntohl", "ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
("<operator>.assignment", "a_n = ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
("<operator>.addition", "a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
("<operator>.assignment", "offset = a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
("<operator>.addition", "buf + offset", "/mnt/c/wd/tmp/t/a.c", "15"),
("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
)
)
It seems that argument in val sink = cpg.call.name("memcpy").argument(3) doesn't change the result.
Is there currently a way of getting data-flow for just one argument of a call?
An assignment like
auto [x, y] = std::tuple<int, int>{23, 27};
merely results in an empty block in the AST:
summary: io.shiftleft.codepropertygraph.generated.nodes.Block[label=BLOCK; id=1000106]
id: 1000106
label: BLOCK
propertyKeys: [DYNAMIC_TYPE_HINT_FULL_NAME, INTERNAL_FLAGS, TYPE_FULL_NAME, COLUMN_NUMBER, ARGUMENT_INDEX, ORDER, DEPTH_FIRST_ORDER, CODE, LINE_NUMBER]
propertyMap: {ORDER=1, ARGUMENT_INDEX=1, CODE=, COLUMN_NUMBER=36, TYPE_FULL_NAME=void, LINE_NUMBER=7, DYNAMIC_TYPE_HINT_FULL_NAME=List()}
No variable names, no types, no tuple, no assignments, no 23, and no 27.
Hello,
I created a binary CPG from a JAR file and loaded it in Ocular:
ocular> loadCpg("log4j.bin.zip")
Then I exported it in JSON format:
ocular> cpg.method.toJson |> "/tmp/log4j.json"
This JSON is not what I need because it is incomplete: it contains only methods. I need a full graph representation with all nodes and edges. The format should be like the file base.json.
How can I do this? Is there a command to execute in Ocular?
Thanks in advance for your help.
We currently have two public classes for CPG loading: CpgLoader and ProtoCpgLoader. This is due to the fact that we supported multiple CPG formats in the past. As of today, only the proto format has survived. While documenting CPG loading, I was wondering whether we should make ProtoCpgLoader a private class that provides the default implementation for CpgLoader. Let's locate all the places where ProtoCpgLoader is used directly and see if we can use CpgLoader instead. If this is not possible, we may need to make modifications to CpgLoader.
If this all works well, I think we should round off the CPG loading topic by creating unit tests against CpgLoader. What do you guys think?
The main readme of this repository says:
Additional build-time dependencies are automatically downloaded as part of the build process. To build and install into your local Maven cache, issue the command ./publishLocal.sh.
However, there is no file ./publishLocal.sh.
The corresponding classes extend from NodeRef.
This means that they create a second NodeRef referencing the same underlying Node.
However, overflowdb cannot deal with that: The entire logic (e.g. https://github.com/ShiftLeftSecurity/overflowdb/blob/5bf234034dc7b58edf0983753adb253ce340578a/core/src/main/java/overflowdb/NodeRef.java#L91) synchronizes the storage on the Ref, not the node.
Afaiu this part of the API is not used in prod. So let's get rid of it, and afterwards add checks in overflowdb that guarantee the right invariant (every node can only have a unique reference to it).
cc @fabsx00 because you know best which parts of the API are important for whom, and @mpollmeier because you know overflowdb best.
We need a validator pass which checks that all CALL nodes with dispatchType == DispatchType.DYNAMIC_DISPATCH have an outgoing RECEIVER edge and all that have dispatchType == DispatchType.STATIC_DISPATCH do not have an outgoing RECEIVER edge.
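The rule described above can be illustrated on a toy graph. This is only a minimal sketch in Python, not the project's validator API: the `Call` record, its `receiver_targets` field, and `validate_dispatch` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List

DYNAMIC_DISPATCH = "DYNAMIC_DISPATCH"
STATIC_DISPATCH = "STATIC_DISPATCH"


@dataclass
class Call:
    """Toy stand-in for a CALL node (hypothetical, not the generated node class)."""
    code: str
    dispatch_type: str
    # Targets of outgoing RECEIVER edges, modelled as plain strings.
    receiver_targets: List[str] = field(default_factory=list)


def validate_dispatch(calls: List[Call]) -> List[str]:
    """Return one error message per CALL node that violates the rule."""
    errors = []
    for call in calls:
        has_receiver = len(call.receiver_targets) > 0
        if call.dispatch_type == DYNAMIC_DISPATCH and not has_receiver:
            errors.append(f"{call.code}: dynamic dispatch without RECEIVER edge")
        elif call.dispatch_type == STATIC_DISPATCH and has_receiver:
            errors.append(f"{call.code}: static dispatch with RECEIVER edge")
    return errors


calls = [
    Call("foo->bar()", DYNAMIC_DISPATCH, ["foo"]),  # valid
    Call("obj.run()", DYNAMIC_DISPATCH),            # invalid: no receiver
    Call("memcpy(d, s, n)", STATIC_DISPATCH),       # valid
    Call("exit(0)", STATIC_DISPATCH, ["exit"]),     # invalid: has receiver
]
errors = validate_dispatch(calls)
```

Collecting all violations instead of failing on the first one would let a frontend author see every offending CALL node in a single run.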
Parsing this code:
1 class MyClass
2 {
3 public:
4 int bar()
5 {
6 return 1;
7 }
8 };
9
10 void myfunc()
11 {
12 MyClass *foo = new MyClass();
13 foo->bar();
14 }
I expect the CALL node name on line 13 to be MyClass::bar(), but the name is foo->bar(). Further, an internal METHOD node is correctly created for MyClass::bar(), but there is also one created for an external function called foo->bar().
Is this expected/correct? If yes, is there a way to determine that the CALL foo->bar() node is referring to type MyClass, or would one need to implement some type propagation atop the graph to arrive at that?
Thanks!
Currently ProtoCpgLoader.loadOverlays returns a list of CpgOverlays, which means that we need to hold all overlays in memory at once. We should rather return an iterator and ensure that the users of ProtoCpgLoader.loadOverlays do not collect all elements of this iterator. This is probably something we should do after porting ProtoCpgLoader to Scala.
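The difference between returning a list and returning an iterator can be sketched as follows. This is a Python illustration of the idea only, not the loader's actual code: `load_overlays_eager`, `load_overlays_lazy`, and the entry names are hypothetical, and a plain zip archive stands in for the overlay container.

```python
import io
import zipfile
from typing import Iterator, List


def load_overlays_eager(archive: zipfile.ZipFile) -> List[bytes]:
    """Reads every entry up front, so all overlays are in memory at once."""
    return [archive.read(name) for name in archive.namelist()]


def load_overlays_lazy(archive: zipfile.ZipFile) -> Iterator[bytes]:
    """Yields one overlay at a time; the caller decides how long to keep it."""
    for name in archive.namelist():
        yield archive.read(name)


# Build a small in-memory archive with three fake overlay entries.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for i in range(3):
        zf.writestr(f"overlay_{i}.proto", f"overlay-{i}".encode())

applied = []
with zipfile.ZipFile(buffer) as zf:
    for overlay in load_overlays_lazy(zf):
        # Apply and drop each overlay instead of collecting them in a list.
        applied.append(len(overlay))
```

With the lazy variant, only one overlay is live at any point unless a caller explicitly materializes the iterator, which is the usage the issue asks callers to avoid.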
A call to CpgLoader.load(fileName,...) does not throw a FileNotFoundException if provided with a nonexistent file name. Please investigate why this is the case and restore the expected behavior.
We need a tutorial that explains how to program CPG passes and when that makes sense.
Is there a way to load an existing CPG without creating a new one?
I only see a create_cpg function in cpgclientlib.
cpg.call.codeExact("...").head match {
case (call : nodes.Call) :: Nil =>
println(call.argument.l.length)
}
leads to compiler error:
missing argument list for method argument in class CallMethods
[error] Unapplied methods are only converted to functions when a function type is expected.
[error] You can make this conversion explicit by writing `argument _` or `argument(_)` instead of `argument`.
When applying the suggested fix:
cpg.call.codeExact("...").head match {
case (call : nodes.Call) :: Nil =>
println(call.argument(_).l.length)
}
I get the compiler error:
missing parameter type for expanded function ((<x$1: error>) => call.argument(x$1).l.length.shouldBe(1))
[error] println(call.argument(_).l.length)
[error]
I was expecting to simply print the number of arguments connected to that call node.
Current work-around:
cpg.call.codeExact("...").head match {
case (call : nodes.Call) :: Nil =>
println(call.out(EdgeTypes.ARGUMENT).asScala.toList.length)
}
I am using cpg version: 1.2.25
I just noticed that when running sbt test:scalafmt on master, formatting is performed, indicating that we don't enforce correctly formatted test code for builds.
The generated NodeKeys.java contains
public static final Key<String> INHERITS_FROM_TYPE_FULL_NAME = new Key<>("INHERITS_FROM_TYPE_FULL_NAME");
This is incorrect: it should be a list of strings.
This trips up vertex.value2(NodeKeys.INHERITS_FROM_TYPE_FULL_NAME) in fuzzyc2cpg.
WithinMethod is quite a widely used trait. Given its name, I would have expected it to have, e.g., the subclasses MethodReturn, MethodParameter[In|Out], etc.
However, because TrackingPoint extends WithinMethod and Expression extends TrackingPoint, almost everything extends WithinMethod.
Is that design intentional or did we end up here accidentally?
Just want to gather some more context while I'm refactoring the DSL.
The readme contains an image link to img/method-header.jpg. However, this image does not exist.
We recently had some confusion caused by a mixup between the argument and parameter steps.
One possible usage of the parameter step is to traverse from a Method node to its formal parameter nodes.
However, parameter is defined for ExpressionBase. So we should document whatever it is supposed to do on Expression nodes that are not Method, potentially move the definition away from ExpressionBase, and potentially throw a runtime exception when it is called on something that is not of type Method.
We have recently introduced the concept of node tuples: a node can contain other nodes, and we represent this in the graph via edges. There is currently no documentation on how to use this. We currently do not make use of node tuples in the base specification, but the feature is available for developers of graph extensions. It would therefore be nice to include it in the documentation.
Enums are mostly unsupported. In particular, I noticed the following issues:
ANY. I'm not talking about the identifier nodes of variables of the enum's type. These are set correctly. But the identifiers of enum fields (i.e. their symbolic names) are not.
I am not sure whether this is already possible.
I'm specifying ASTs and am creating CPGs from them. I had some trouble finding out how return values need to be specified. I think that there needs to be a gap of 1 between the last order of an input parameter (METHOD_PARAMETER_IN) and the order of the (first) return value (METHOD_RETURN). This seems to work reliably as long as there is only 1 return value.
Are multiple return values already possible?
Both if I increment order further for a second return value and if I leave it the same as the first return value's order, I get this error message:
[error] (Writer) java.lang.RuntimeException: Edge of type CFG with direction OUT not supported by class MethodReturnDb
[error] java.lang.RuntimeException: Edge of type CFG with direction OUT not supported by class MethodReturnDb
[error] at overflowdb.NodeDb.storeAdjacentNode(NodeDb.java:621)
[error] at overflowdb.NodeDb.storeAdjacentNode(NodeDb.java:602)
[error] at overflowdb.NodeDb.addEdge(NodeDb.java:298)
[error] at overflowdb.NodeRef.addEdge(NodeRef.java:151)
[error] at overflowdb.SemiEdge.$minus$minus$greater(SyntacticSugar.scala:59)
[error] at io.shiftleft.passes.DiffGraph$Applier.odbAddEdge(DiffGraph.scala:388)
[error] at io.shiftleft.passes.DiffGraph$Applier.addEdge(DiffGraph.scala:380)
[error] at io.shiftleft.passes.DiffGraph$Applier.$anonfun$run$1(DiffGraph.scala:332)
[error] at io.shiftleft.passes.DiffGraph$Applier.$anonfun$run$1$adapted(DiffGraph.scala:328)
[error] at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
[error] at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
[error] at scala.collection.AbstractIterator.foreach(Iterator.scala:1196)
[error] at io.shiftleft.passes.DiffGraph$Applier.run(DiffGraph.scala:328)
[error] at io.shiftleft.passes.DiffGraph$Applier$.applyDiff(DiffGraph.scala:417)
[error] at io.shiftleft.passes.ParallelCpgPass$Writer.run(ParallelCpgPass.scala:105)
[error] at java.lang.Thread.run(Thread.java:748)
It is thrown only when building the CPG, not when just specifying the AST.
Hi, I really admire your work on this tool and am interested in using it to find vulnerabilities. I read your paper Modeling and Discovering Vulnerabilities with Code Property Graphs https://www.sec.cs.tu-bs.de/pubs/2014-ieeesp.pdf. I found that we can traverse a code property graph to find syntax-only, taint-style, and control-flow vulnerabilities, as categorized in the paper. But it is difficult for me to write such patterns. Can you provide some example patterns for syntax-only, taint-style, and control-flow vulnerabilities? Thank you so much. I have tried to run some examples in joern.
The documentation says:
Variable declaration nodes (type: DeclStmt). Finally, declarations of global variables are saved in declaration statement nodes and connected to the source file they are contained in using IS_FILE_OF edges.
This does not seem to be the case.
When creating a CPG for
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int foo = 0;
int main(int argc, char *argv[]) {
foo = argc;
exit(0);
}
, joern finds these nodes: https://gist.github.com/m1cm1c/da34d0cb559cf8fba7360ce51b3de0ed
If you search for "foo", you will only find:
Call(
id -> 1000106L,
code -> "foo = argc",
name -> "<operator>.assignment",
order -> 1,
methodInstFullName -> None,
methodFullName -> "<operator>.assignment",
argumentIndex -> 1,
dispatchType -> "STATIC_DISPATCH",
signature -> "TODO assignment signature",
typeFullName -> "ANY",
dynamicTypeHintFullName -> List(),
lineNumber -> Some(8),
columnNumber -> Some(2),
resolved -> None,
depthFirstOrder -> None,
internalFlags -> None
),
Identifier(
id -> 1000107L,
code -> "foo",
name -> "foo",
order -> 1,
argumentIndex -> 1,
typeFullName -> "ANY",
dynamicTypeHintFullName -> List(),
lineNumber -> Some(8),
columnNumber -> Some(2),
depthFirstOrder -> None,
internalFlags -> None
)
Both of these are in line 8, meaning that they are about the assignment foo = argc;, not about the declaration and definition int foo = 0;.
The problem seems to be in this repo. The AST created by the code in this repo for the above-mentioned code is: https://gist.github.com/m1cm1c/4392d54c19e927b998bdf1462fa41573
foo only occurs in two AST nodes:
summary: io.shiftleft.codepropertygraph.generated.nodes.Call[label=CALL; id=1000106]
id: 1000106
label: CALL
propertyKeys: [RESOLVED, DISPATCH_TYPE, DYNAMIC_TYPE_HINT_FULL_NAME, INTERNAL_FLAGS, METHOD_FULL_NAME, SIGNATURE, TYPE_FULL_NAME, COLUMN_NUMBER, ARGUMENT_INDEX, ORDER, DEPTH_FIRST_ORDER, METHOD_INST_FULL_NAME, NAME, CODE, LINE_NUMBER]
propertyMap: {ORDER=1, ARGUMENT_INDEX=1, CODE=foo = argc, COLUMN_NUMBER=2, METHOD_FULL_NAME=<operator>.assignment, TYPE_FULL_NAME=ANY, LINE_NUMBER=8, DISPATCH_TYPE=STATIC_DISPATCH, SIGNATURE=TODO assignment signature, DYNAMIC_TYPE_HINT_FULL_NAME=List(), NAME=<operator>.assignment}
summary: io.shiftleft.codepropertygraph.generated.nodes.Identifier[label=IDENTIFIER; id=1000107]
id: 1000107
label: IDENTIFIER
propertyKeys: [DYNAMIC_TYPE_HINT_FULL_NAME, NAME, INTERNAL_FLAGS, TYPE_FULL_NAME, COLUMN_NUMBER, ARGUMENT_INDEX, ORDER, DEPTH_FIRST_ORDER, CODE, LINE_NUMBER]
propertyMap: {ORDER=1, ARGUMENT_INDEX=1, CODE=foo, COLUMN_NUMBER=2, TYPE_FULL_NAME=ANY, LINE_NUMBER=8, DYNAMIC_TYPE_HINT_FULL_NAME=List(), NAME=foo}
Again, both of these reference line 8, meaning that they are about foo's use, not about foo's declaration or definition.
Implement FreeTrackingPoint via json definition as I proposed in this PR comment: #492 (comment)
.dump now supports syntax highlighting via source-highlight. Unfortunately, the escape sequences generated by source-highlight do not seem to work together with Ammonite:
joern> cpg.method.name("malloc").callIn.dump
java.lang.IllegalArgumentException: Unknown ansi-escape [00;38;05;70m at index 3 inside string cannot be parsed into an fansi.Str
fansi.ErrorMode$Throw$.handle(Fansi.scala:419)
fansi.ErrorMode$Throw$.handle(Fansi.scala:407)
fansi.Str$.apply(Fansi.scala:272)
fansi.Str$.implicitApply(Fansi.scala:227)
pprint.Renderer.$anonfun$rec$27(Renderer.scala:136)
pprint.Result$.fromString(Result.scala:53)
pprint.Renderer.rec(Renderer.scala:136)
pprint.PPrinter.tokenize(PPrinter.scala:110)
ammonite.repl.FullReplAPI$Internal.print(FullReplAPI.scala:106)
ammonite.repl.FullReplAPI$Internal.print$(FullReplAPI.scala:61)
ammonite.repl.FullReplAPI$$anon$1.print(FullReplAPI.scala:34)
Note that the following works:
println(cpg.method.name("malloc").callIn.dump)
because here, println interprets the escape sequences.
Related post here: https://gitter.im/lihaoyi/Ammonite?at=5d1a93e19cbde24b2f59b509
This isn't a viable workaround for us though as we also want to make use of `browse`.
Regarding our discussion about ensuring that .foo always works when .start.foo works, I am wondering whether this is a corner case or whether I'm just missing an import:
expression.start.inCall follows incoming argument edges to reach the call for an argument; however, expression.inCall does not seem to work.
I am encountering an issue where deleting nodes and edges in a pass leads to additional edges, never created, to nodes created in a later pass.
I have written and published a proof of concept that replicates the issue in a minimal fashion.
As a quick explanation:
I have three passes:
CreateInitialSetupPass - creates the initial AST/CFG of the minimal example
DeleteExtStmtCalls - this is a pass that goes over the current graph, deletes all call nodes with name EXT_STMT, and rewires their CFG edges to not leave a gap in the CFG
TriggerBugPass - this pass triggers the bug
You can run the (single) unit test failing due to the bug by
sbt test
The unit test expects there to be a single CFG edge from DO_FCALL to a node with code after call.
Initially a CFG edge to EXT_STMT is created here. The second pass removes that edge and replaces it with a CFG edge skipping this call. This works as expected and can be verified by commenting out the pass triggering the bug and seeing the unit test pass.
However, after the third pass, there is suddenly an additional CFG edge leading to the newly created node with the code test, which inevitably leads to the failure of the unit test. This edge is never created and must not be there.
For a quick reference
current output:
List(test, after call)
[info] TestForBug:
[info] the cpg
[info] - should have a single CFG edge after DO_FCALL *** FAILED ***
[info] 2 was not equal to 1 (TestForBug.scala:19)
[info] Run completed in 389 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
[info] *** 1 TEST FAILED ***
expected output (currently achievable by commenting out the third pass):
List(after call)
[info] TestForBug:
[info] the cpg
[info] - should have a single CFG edge after DO_FCALL
[info] Run completed in 352 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 1 s, completed Nov 29, 2020, 6:56:40 PM
[info] 14. Monitoring source files for cpg-bug-poc/test...
[info] Press <enter> to interrupt or '?' for more options.
I want to use joern-plot-proggraph to get the CFG.
At first it ran well, but then I got a java.lang.OutOfMemoryError as below:
`2020-04-17 07:30:30.219+0000 INFO [API] Remote interface ready and available at [http://localhost:7474/]
Exception in thread "qtp443496729-55"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp443496729-55"
Exception in thread "qtp443496729-57" Exception in thread "qtp443496729-56"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp443496729-57"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp443496729-56"
15:35:02.808 [qtp443496729-58] WARN o.e.j.util.thread.QueuedThreadPool -
java.lang.OutOfMemoryError: PermGen space
`
It seems that I can't run joern-plot-proggraph too many times.
Does anyone know what the problem is?
I reinstalled my system, did not have git-lfs installed, and had not run git-lfs pull. As a consequence, sbt test failed with a rather mysterious message about broken ZIP files. We should catch this error and suggest installing git-lfs.
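One way to turn the mysterious ZIP error into a targeted hint is to check whether the file is a Git LFS pointer before opening it. The sketch below is in Python and only illustrates the idea; `open_cpg_archive` and the file name are hypothetical, while the prefix is the first line of a Git LFS pointer file.

```python
import tempfile
import zipfile
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def open_cpg_archive(path: Path) -> zipfile.ZipFile:
    """Open a zip archive, with a targeted hint if it is an LFS pointer."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    if head == LFS_POINTER_PREFIX:
        raise RuntimeError(
            f"{path} is a Git LFS pointer, not a zip file. "
            "Install git-lfs and run `git lfs pull`."
        )
    return zipfile.ZipFile(path)


# Demonstrate the check on a fake pointer file.
with tempfile.TemporaryDirectory() as tmp:
    pointer = Path(tmp) / "cpg.bin.zip"
    pointer.write_bytes(LFS_POINTER_PREFIX + b"\noid sha256:0000\nsize 1\n")
    try:
        open_cpg_archive(pointer)
        message = ""
    except RuntimeError as err:
        message = str(err)
```

The same check could equally run once in the test setup, so that every test depending on the checked-in archives fails with the same actionable message.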
In the same way we have parallelized TypeDeclStubCreator, MethodStubCreator, and MethodDecorator (see c75a0a8), we should be able to parallelize the Linker and MemberAccessLinker using ParallelIteratorExecutor. This should give us a performance boost on large code bases when processed on machines with many cores.
Travis currently only builds and deploys the Scala code in this repository. cpgclientlib needs to be built and deployed to PyPI. Alternatively, it might be a better choice to simply host that library in another repository.
When running sbt stage with my system's protobuf version at 3.9.1, I get errors like:
[error] projects/codepropertygraph/proto-bindings/target/scala-2.12/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:4534:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.StringList
If I downgrade to version 3.7, the build works.
This issue seems similar to this one.
When input code is this:
1 void myfunc()
2 {
3 int a = 42;
4 foo(a);
5 bar(a);
6 }
I get REACHING_DEF edges (written by line number here):
3 -> 4
4 -> 5
I was expecting edges:
3 -> 4
3 -> 5
That is, I expected line 3 to be a def and lines {4,5} to be uses, but it appears line 4 is counting as a def of a.
Is the behavior I am observing correct/expected (and if so, why)? Or is this a bug?
Here is a joern query you can run to get the same result:
joern> cpg.graph.edges("REACHING_DEF").foreach((e: Edge) => { println(s"${e.outNode.propertyMap.get("LINE_NUMBER")}: ${e.outNode.propertyMap.get("CODE")} -> ${e.inNode.propertyMap.get("LINE_NUMBER")}: ${e.inNode.propertyMap.get("CODE")}") })
3: a -> 4: a
3: a -> 3: a = 42
3: 42 -> 3: a = 42
3: 42 -> 3: a
4: a -> 5: a
4: a -> 4: foo(a)
5: a -> 5: bar(a)
Thanks!
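The two edge sets can be reproduced with a tiny reaching-definitions computation over straight-line code. This is a Python sketch, not the data-flow engine's implementation: `reaching_def_edges` is a hypothetical helper, and treating a call argument as redefined is only an assumed model that happens to yield the observed edges; whether the engine actually does this is not confirmed here.

```python
from typing import Dict, List, Set, Tuple

Stmt = Tuple[int, Set[str], Set[str]]  # (line, defined_vars, used_vars)


def reaching_def_edges(stmts: List[Stmt]) -> List[Tuple[int, int]]:
    """Compute def -> use edges (by line number) for straight-line code."""
    last_def: Dict[str, int] = {}
    edges = []
    for line, defs, uses in stmts:
        for var in sorted(uses):
            if var in last_def:
                edges.append((last_def[var], line))
        for var in defs:
            last_def[var] = line  # kills the previous definition
    return edges


# Calls only read `a`: line 3 is the single definition.
use_only = [(3, {"a"}, set()), (4, set(), {"a"}), (5, set(), {"a"})]

# Assumed conservative model: an argument of a call is also redefined.
arg_as_def = [(3, {"a"}, set()), (4, {"a"}, {"a"}), (5, {"a"}, {"a"})]

edges_use_only = reaching_def_edges(use_only)      # the expected edges
edges_arg_as_def = reaching_def_edges(arg_as_def)  # the observed edges
```

Under the first model line 3 reaches both calls; under the second, the definition at line 4 kills the one from line 3 before line 5 is reached.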
Hi, may I ask how I can get the AST and CFG graphs with this tool? Can we just use the simple interfaces supplied by queryprimitives?
Currently, the test cases in cpgclientlib only work with a running JoernServer. To be able to run them as part of the build, we need a mock server (NullServer) that does nothing, but fills the gap.
Hi,
I am curious about the implementation of the AST parser. Is it based on antlr4, and is there any optimization over the native antlr4 parser?
Thanks!
The examples for querying the CPG that the README provides do not work:
scala> val cpg = io.shiftleft.codepropertygraph.cpgloading.CpgLoader.load("./resources/testcode/cpgs/hello-shiftleft-0.0.5/cpg.bin.zip")
cpg: io.shiftleft.codepropertygraph.Cpg = io.shiftleft.codepropertygraph.Cpg@85a5f4d
scala> cpg.literal.toList
^
error: value literal is not a member of io.shiftleft.codepropertygraph.Cpg
scala> cpg.file.toList
^
error: value file is not a member of io.shiftleft.codepropertygraph.Cpg
scala> cpg.namespace.toList
^
error: value namespace is not a member of io.shiftleft.codepropertygraph.Cpg
scala> cpg.types.toList
^
error: value types is not a member of io.shiftleft.codepropertygraph.Cpg
scala> cpg.methodReturn.toList
^
error: value methodReturn is not a member of io.shiftleft.codepropertygraph.Cpg
scala> cpg.parameter.toList
^
error: value parameter is not a member of io.shiftleft.codepropertygraph.Cpg
scala> cpg.member.toList
^
error: value member is not a member of io.shiftleft.codepropertygraph.Cpg
scala> cpg.call.toList
^
error: value call is not a member of io.shiftleft.codepropertygraph.Cpg
scala> cpg.local.toList
^
error: value local is not a member of io.shiftleft.codepropertygraph.Cpg
scala> cpg.identifier.toList
^
error: value identifier is not a member of io.shiftleft.codepropertygraph.Cpg
scala> cpg.argument.toList
^
error: value argument is not a member of io.shiftleft.codepropertygraph.Cpg
scala> cpg.typeDecl.toList
^
error: value typeDecl is not a member of io.shiftleft.codepropertygraph.Cpg
I'm not sure what "Once you've loaded a cpg you can run queries, which are provided by the query-primitives subproject." means, but even when I start the sbt console not via sbt semanticcpg/console but via sbt queries/console, the output is the same.
The package io.shiftleft.codepropertygraph.cpgloading is all cleaned up, apart from the fact that there is a Scala class named NodeFilter that is used by Java code of the CpgLoader. Overall, the loader is written in a mixture of Scala and Java only for historical reasons. We should port the Java parts to Scala to fully clean up the package io.shiftleft.codepropertygraph.cpgloading.
After I run:
cpg.runScript("pdg-for-funcs-dump.sc")
I get a JSON file named "pdg-for-funcs.json". It looks like this:
{"functions": [{
"function" : "printSizeTLine",
"id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@1e0",
"PDG" : [
]
},{
"function" : "printShortLine",
"id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@1b4",
"PDG" : [
]
},{
"function" : "globalReturnsTrueOrFalse",
"id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@2d4",
"PDG" : [
]
},{
"function" : "<operator>.indirectFieldAccess",
"id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@376",
"PDG" : [
]
},{
"function" : "bad9",
"id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@322",
"PDG" : [
]
},{
"function" : "bad8",
"id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@31e",
"PDG" : [
How can I read this? I do not understand it at all. Can I convert it back to code?
There are currently outgoing REF edges from CALL to members in case of member accesses, but information about this is not in the documentation. This also seems a bit non-standard, as it only exists for member accesses. We should either document this, or take this as an opportunity to simplify: are these REF edges really required, and if so, would it possibly make more sense to create them from the call arguments to members?
I noticed that some of the enums used only in the backend are referred to in generateJava.py. For example,
refers to "Frameworks". We need to, instead, look for arbitrary enums (that is, members of the root level with names other than nodeType, edgeType, ...).
While we have put some work into reducing the memory footprint of the Cpg, DiffGraphs (io.shiftleft.passes.DiffGraph) are still rather wasteful. Let's explore whether we can further reduce their memory footprint.
sbt publishM2 spits errors... can you help?
the generation was done with protoc version 3.7.1
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:3616:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.StringList
[error] UnusedPrivateParameter unused) {
[warn] There may be incompatibilities among your library dependencies.
[warn] Run 'evicted' to see detailed eviction warnings
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:4220:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.BoolList
[error] UnusedPrivateParameter unused) {
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:4811:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.IntList
[error] UnusedPrivateParameter unused) {
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:5405:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.LongList
[error] UnusedPrivateParameter unused) {
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:5999:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.FloatList
[error] UnusedPrivateParameter unused) {
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:6590:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.DoubleList
[error] UnusedPrivateParameter unused) {
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:1368:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.PropertyValue
[error] UnusedPrivateParameter unused) {
...
[info] Test Scala API documentation successful.
[info] Packaging /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/codepropertygraph/target/codepropertygraph-HEAD+20190425-2148-tests-javadoc.jar ...
[info] Done packaging.
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148-javadoc.jar
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148.pom
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148.jar
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148-tests-javadoc.jar
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148-sources.jar
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148-tests-sources.jar
[error] (protoBindings / Compile / compileIncremental) javac returned non-zero exit code
[error] Total time: 15 s, completed Apr 25, 2019 9:48:20 PM
Not sure if a github issue is the right spot for this, but I couldn't find a place to ask about this.
I am trying to get cpg installed for the sole purpose of being able to extract, in CSV format, all of the vertices and edges (with their labels/properties) of the cpg for C/C++ code (for the purpose of getting the graph, not to analyze the code itself). I could not determine a way to do this with just Joern, so I am trying to get the codepropertygraph installed and set up, because according to the understanding-cpg link on the documentation website, it is possible to serialize the graph to CSV.
While that is a problem in itself (I have not found anywhere in the docs that mention how to serialize to csv), I am having problems getting started with this.
I have scala-build-tools and open-jdk8. (output of java -version is: openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)"
I also installed protoc-3.9.1-linux-x86_64 from the releases page linked in this repo's readme, and added both the includes and the binary to /usr/local/include and /usr/local/bin respectively.
I then ran sbt publishM2, which had quite a few errors along the way. I saved the output to a logfile here:
publishM2.log
Any idea why I am running into troubles here?
Alternatively, if there is a different way of serializing an entire cpg using just joern (which I was able to successfully download and get running), any pointers on how to do that?
Thanks
In PR #461 I introduced the new ARGUMENT edge in the usual backwards-compatible manner, with a compatibility pass which handles old-format CPGs.
ARGUMENT edges need to be present between CALL nodes and their arguments, and between RETURN nodes and their returnExpression. The ARGUMENT edges do not replace the current AST edges, which stay as they are.
Why we need the ARGUMENT edges:
So far the arguments of a CALL node were defined via its AST children. This is no longer possible for constructs like function pointer calls, where the receiver is not an argument to the called function. Thus the arguments needed an explicit representation in the graph. E.g. for the C call funcPtr(a): funcPtr is the receiver but not an argument to the called function. a is the argument, and both funcPtr and a are AST children of the CALL node.
For the RETURN node this problem did not occur, so we could have stayed with the AST edge, but to keep things consistent I also added the additional ARGUMENT edge requirement there, so that one always finds the instruction using an argument via the ARGUMENT edge.
Let me know if this is a problem for one of the languages we support or if you see other problems with this or have questions regarding the format change.
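The funcPtr(a) example can be written down as a toy edge list to show why AST children and arguments differ. This is a Python sketch for illustration only: `add_edge` and `out` are hypothetical helpers, and modelling the receiver with an explicit RECEIVER edge is an assumption that goes beyond what the description above states.

```python
from collections import defaultdict

# Edges of a toy graph for the C call `funcPtr(a)`, keyed by edge label.
edges = defaultdict(list)


def add_edge(label: str, src: str, dst: str) -> None:
    edges[label].append((src, dst))


call = "CALL funcPtr(a)"
# Both sub-expressions are AST children of the call.
add_edge("AST", call, "funcPtr")
add_edge("AST", call, "a")
# Only `a` is passed to the callee, so only it gets an ARGUMENT edge.
add_edge("ARGUMENT", call, "a")
# Assumption: the receiver is recorded with its own edge label.
add_edge("RECEIVER", call, "funcPtr")


def out(label: str, node: str) -> list:
    """Targets of outgoing edges with the given label."""
    return [dst for (src, dst) in edges[label] if src == node]


ast_children = out("AST", call)
arguments = out("ARGUMENT", call)
```

A query that derived arguments from AST children would report two arguments here, while following ARGUMENT edges yields only `a`.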
The following list shows who adjusts which frontend:
The maven central badge shows that the latest version is 0.10.25, while really, the latest version is 0.10.96. Did this possibly break when we switched to publishing to _2.12 directories?
https://maven-badges.herokuapp.com/maven-central/io.shiftleft/codepropertygraph/badge.svg
.toJson seems to be available only for pipes derived from NodeSteps, but not from NewNodeSteps. In particular, cpg.method.location.toJson is not defined at the moment. It would be nice if .toJson worked on all pipes, regardless of whether they inherit from NewNodeSteps or from NodeSteps.