bfergerson / arthur

Semantic language-agnostic source code schema

License: Apache License 2.0

Languages: Groovy 88.12%, Java 5.84%, Go 0.71%, JavaScript 0.77%, Python 0.50%, PHP 0.55%, Ruby 0.46%, C++ 1.13%, C# 1.42%, Shell 0.50%
Topics: uast, schema, ontology

arthur's Introduction

Arthur

Arthur is a semantic, language-agnostic UAST (universal abstract syntax tree) schema generator. It takes source code as input and outputs unilingual and omnilingual ontologies derived from the language(s) of that code. Arthur parses source code using Babelfish and constructs the observed schema for use in a Grakn knowledge graph.

Schemas

Omnilingual Schema

Languages                                                 Segments
Bash, C++, C#, Go, Java, JavaScript, PHP, Python, Ruby    Arthur_Omnilingual_Base_Structure.gql
                                                          Arthur_Omnilingual_Semantic_Roles.gql

Unilingual Schemas

Language      Segments
Bash          Arthur_Bash_Base_Structure.gql
              Arthur_Bash_Semantic_Roles.gql
C++           Arthur_Cplusplus_Base_Structure.gql
              Arthur_Cplusplus_Semantic_Roles.gql
C#            Arthur_Csharp_Base_Structure.gql
              Arthur_Csharp_Semantic_Roles.gql
Go            Arthur_Go_Base_Structure.gql
              Arthur_Go_Semantic_Roles.gql
Java          Arthur_Java_Base_Structure.gql
              Arthur_Java_Semantic_Roles.gql
JavaScript    Arthur_Javascript_Base_Structure.gql
              Arthur_Javascript_Semantic_Roles.gql
PHP           Arthur_Php_Base_Structure.gql
              Arthur_Php_Semantic_Roles.gql
Python        Arthur_Python_Base_Structure.gql
              Arthur_Python_Semantic_Roles.gql
Ruby          Arthur_Ruby_Base_Structure.gql
              Arthur_Ruby_Semantic_Roles.gql

Supported Concepts

Structural

Conditional

  • If/ElseIf/Else
  • Switch/SwitchCase

Exception

  • Try/Catch/Finally

Loop

  • ForLoop
  • ForEachLoop
  • WhileLoop
  • DoWhileLoop

Operator

Logical

  • AndOperator
  • OrOperator

Misc

  • TernaryOperator

Relational

  • RelationalOperator
    • Compare
      • IsEqualOperator/IsNotEqualOperator
      • IsEqualTypeOperator/IsNotEqualTypeOperator
    • Define
      • DeclareVariableOperator
      • InitializeVariableOperator

Misc

  • Child
  • Function
  • InternalRole
  • Language
  • Literal
  • Multi
  • Name
  • Role
  • Token
  • Type
  • Wildcard

arthur's People

Contributors

bfergerson, chess-equality

arthur's Issues

Update StructureFilters to use semantic roles in determining predicate

Currently, the StructureFilter subclasses all use the internalType/token to determine whether the predicate passes or fails. An example of this would be:

class FunctionFilter extends StructureFilter<FunctionFilter, Void> {

    private static final Set<String> functionTypes = new HashSet<>()
    static {
        functionTypes.add("def") //ruby
        functionTypes.add("FuncDecl") //go
        functionTypes.add("MethodDeclaration") //java
        functionTypes.add("FunctionDeclaration") //js
        functionTypes.add("Stmt_Function") //php
        functionTypes.add("FunctionDef") //python
    }

    @Override
    boolean evaluate(SourceNode node) {
        return node != null && node.internalType in functionTypes
    }
}

This means every filter checks for a specific internal type per language, so each new language requires touching every filter. The filters need to be rewritten so that the semantic roles are used instead.
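
A rough sketch of what a role-based check could look like, written in plain Java for brevity. `RoleNode` is a hypothetical stand-in for Arthur's `SourceNode`, and the idea that function declarations carry both a `FUNCTION` and a `DECLARATION` role is an assumption made for illustration; it has not been checked against each Babelfish driver.

```java
import java.util.Set;

// Hypothetical stand-in for SourceNode: only the fields this sketch needs.
final class RoleNode {
    final String internalType;
    final Set<String> roles;

    RoleNode(String internalType, Set<String> roles) {
        this.internalType = internalType;
        this.roles = roles;
    }
}

final class RoleBasedFunctionFilter {
    // Assumed roles marking a function declaration in any language.
    private static final Set<String> REQUIRED_ROLES = Set.of("FUNCTION", "DECLARATION");

    // Passes when the node carries every required role, whatever its internal type.
    static boolean evaluate(RoleNode node) {
        return node != null && node.roles.containsAll(REQUIRED_ROLES);
    }
}
```

With this shape, supporting another language would need no change to the filter, provided its driver annotates function declarations with the same roles.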

Provide segmented schemas

Split the schema into segments so that different parts of the ontology can be installed separately.
For example:

  • Entities and attributes
  • Entities, attributes, base roles
  • Entities, attributes, full roles

Roles to indicate multiple roles

Need to look into these roles:

NullLiteralArtifact sub JavaSourceArtifact
	# Semantic
	plays NULL
	plays LITERAL
	plays EXPRESSION
	plays BINARY
	plays RIGHT
	plays CALL
	plays POSITIONAL
	plays ARGUMENT
	plays ASSIGNMENT
	plays IF #null = if?
	plays ELSE
	plays THEN
	plays FUNCTION
	plays BODY
	plays LEFT;

How can a null literal play the role of an IF? Pretty sure it was simply playing a role inside an IF, alongside a couple of other roles which signify more concretely what the null literal was doing. Long story short, roles of roles may need to be introduced to bring semantic roles back to the realm of being semantic. Something like 'NULL_IF_LITERAL_POSITIONAL_ARGUMENT_EXPRESSION' would mean this artifact can be used as an argument to an if expression.

Each filter should be able to accept/reject

Rewrite the filters so that each one can either accept or reject nodes matching its predicate. This should remove the need for separate WhitelistRoleFilter/BlacklistRoleFilter classes and give TypeFilter, etc. the same ability.
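
One way this could be structured, as a minimal sketch in plain Java: a single wrapper that carries the predicate plus an accept/reject flag. `MatchFilter` and its factory methods are hypothetical names, not existing Arthur classes.

```java
import java.util.function.Predicate;

// Hypothetical: one filter class that can either accept or reject matches,
// instead of separate whitelist/blacklist classes.
final class MatchFilter<T> implements Predicate<T> {
    private final Predicate<T> matcher;
    private final boolean acceptMatches;

    private MatchFilter(Predicate<T> matcher, boolean acceptMatches) {
        this.matcher = matcher;
        this.acceptMatches = acceptMatches;
    }

    // Keep only values the matcher recognises.
    static <T> MatchFilter<T> accept(Predicate<T> matcher) {
        return new MatchFilter<>(matcher, true);
    }

    // Drop values the matcher recognises, keep everything else.
    static <T> MatchFilter<T> reject(Predicate<T> matcher) {
        return new MatchFilter<>(matcher, false);
    }

    @Override
    public boolean test(T value) {
        return matcher.test(value) == acceptMatches;
    }
}
```

The same wrapper would then serve role, type, and name predicates alike, since the accept/reject decision lives outside the predicate itself.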

Add C# language

  • Add to SourceLanguage
  • Implement Naming
  • Implement Literal
  • Write tests

Tests for literals

The purpose of this ticket is to ensure StructureLiteral and its descendants properly handle the more complex literal types.

Below I have some for Java. I doubt these are handled correctly. Tests will need to be made to show that Arthur handles them correctly. Ideally, some complex examples from other supported languages should be found and tests added for them too.
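
As an illustrative sketch, a fixture along these lines would cover several of the trickier Java literal forms. These are standard Java literals picked for this sketch, not necessarily the examples the ticket had in mind, and the class and constant names are hypothetical.

```java
// Hypothetical fixture: each constant uses a literal form that a naive
// literal parser could misread.
final class ComplexLiterals {
    static final int HEX = 0x1F;                 // hexadecimal, value 31
    static final int BINARY = 0b1010;            // binary, value 10
    static final int OCTAL = 017;                // leading zero means octal, value 15
    static final int GROUPED = 1_000_000;        // underscores as digit separators
    static final long LONG_SUFFIX = 2147483648L; // exceeds int range, needs the L suffix
    static final double EXPONENT = 1.5e3;        // scientific notation, value 1500.0
    static final double HEX_FLOAT = 0x1p4;       // hexadecimal floating point, value 16.0
    static final float FLOAT_SUFFIX = 2.5f;      // f suffix marks a float
    static final char UNICODE_ESCAPE = '\u0041'; // Unicode escape for 'A'
    static final String ESCAPED = "tab\there";   // escape sequence inside a string
}
```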

Configurable schema

Ability to specify the entities, attributes, roles, etc. that you wish to capture, so that the generated schema holds only that data.

Common language-agnostic filters

Be able to filter source code in any language by common concepts such as:

  • operators
    • and operator
    • or operator
    • ternary / Elvis operator
    • null check operator
  • conditionals
    • if
    • else
    • else if
  • loops
    • for
    • foreach
    • while
    • do while
  • switch/case
  • try/catch

Add Bash language

  • Add to SourceLanguage
  • Implement Naming
  • Implement Literal
  • Write tests

Include function arguments in function name

There are currently no tests for StructureNaming-based classes. This means that a lot of function names are likely to be returned incorrectly.

The goal here would be to start by testing JavaNaming with a simple function, functionName(int var), and confirming that works, then move on to more complex function names (varargs, generics, etc.).

The JavaNaming class also contains a lot of redundancy, so if you see something that looks like it could be refactored then it's likely a good idea to go for it.
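
A small sketch of the kind of name such tests might pin down. The helper below is hypothetical and is not Arthur's JavaNaming; the output format `qualifiedName(type1,type2)`, the treatment of varargs as arrays, and the dropping of generic type arguments are all assumptions chosen for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not Arthur's JavaNaming: builds a name that includes
// the parameter types so overloads stay distinguishable.
final class FunctionNameSketch {
    // Assumed output format: qualifiedName(type1,type2).
    static String withArguments(String qualifiedName, List<String> parameterTypes) {
        String args = parameterTypes.stream()
                .map(FunctionNameSketch::normalize)
                .collect(Collectors.joining(","));
        return qualifiedName + "(" + args + ")";
    }

    // Treats varargs as arrays and drops generic type arguments.
    private static String normalize(String type) {
        String t = type.trim().replace("...", "[]");
        int open = t.indexOf('<');
        int close = t.lastIndexOf('>');
        if (open >= 0 && close > open) {
            t = t.substring(0, open) + t.substring(close + 1);
        }
        return t;
    }
}
```

For the simple case in the ticket, `withArguments("Operators.functionName", List.of("int"))` yields `Operators.functionName(int)`.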

Add C++ language

  • Add to SourceLanguage
  • Implement Naming
  • Implement Literal
  • Write tests

Duplicate role in omnilingual schema

NullLiteralArtifact sub OmnilingualSourceArtifact
	# Semantic
	plays CALL
	plays NULL
	plays LEFT
	plays LITERAL
	plays BINARY
	plays FUNCTION
	plays ARGUMENT
	plays ASSIGNMENT
	plays ELSE
	plays RIGHT
	plays THEN
	plays IF
	plays EXPRESSION
	plays BODY;

JavascriptNullLiteralArtifact sub NullLiteralArtifact
	# Structural
	plays has_leading_comments_relation
	# Semantic
	plays VALUE
	plays MAP
	plays INITIALIZATION
	plays BOOLEAN;

JavaNullLiteralArtifact sub NullLiteralArtifact
	# Semantic
	plays POSITIONAL; #dupe
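
If the duplicate comes from a role being declared both on a sub-artifact and on an artifact it inherits from (an assumption; the ticket does not say where the repeat occurs), the generator could strip inherited roles before writing each sub-artifact. A minimal sketch, with a hypothetical class name:

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class RoleDeduplicator {
    // Returns the child's roles minus any the parent already declares,
    // keeping the child's original order.
    static Set<String> withoutInherited(Set<String> parentRoles, Set<String> childRoles) {
        Set<String> remaining = new LinkedHashSet<>(childRoles);
        remaining.removeAll(parentRoles);
        return remaining;
    }
}
```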

Necessary schema

Ability to point Arthur at a local or remote repo and create only the schema necessary to store that repo.

Implement StructureType

This class would be used to get a more strongly typed version of SourceNode. For example, instead of a SourceNode with an internal type of For there would be a ForType which allows for more context around the node.

These would also be the results of the various filters instead of the SourceNode(s) they currently return.
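
A minimal sketch of the idea in plain Java. `UntypedNode` is a hypothetical stand-in for SourceNode, and the `body()` accessor assumes, purely for illustration, that the loop body is the last child of the node.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for SourceNode.
final class UntypedNode {
    final String internalType;
    final List<UntypedNode> children;

    UntypedNode(String internalType, List<UntypedNode> children) {
        this.internalType = internalType;
        this.children = children;
    }
}

// Typed view over a for-loop node; the accessor is illustrative only.
final class ForType {
    private final UntypedNode node;

    private ForType(UntypedNode node) {
        this.node = node;
    }

    // Wraps the node only when it really is a for loop.
    static Optional<ForType> from(UntypedNode node) {
        return node != null && "For".equals(node.internalType)
                ? Optional.of(new ForType(node))
                : Optional.empty();
    }

    // Assumes, for this sketch, that the loop body is the last child.
    Optional<UntypedNode> body() {
        return node.children.isEmpty()
                ? Optional.empty()
                : Optional.of(node.children.get(node.children.size() - 1));
    }
}
```

A filter returning `ForType` rather than a raw node would let callers ask for the body directly instead of re-inspecting internal types.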

Make the tests more language-agnostic

When I first started making the tests I just copied one test to the next. This resulted in many tests which look almost the same but scan a different file. I started merging tests down so they look like this:

@Test
void equalOperator_Java() {
    assertEqualOperatorPresent(new File("src/test/resources/same/operators/Operators.java"),
            "Operators.")
}

@Test
void equalOperator_Javascript() {
    assertEqualOperatorPresent(new File("src/test/resources/same/operators/Operators.js"))
}

@Test
void equalOperator_Php() {
    assertEqualOperatorPresent(new File("src/test/resources/same/operators/Operators.php"))
}

This process needs to be repeated for the remaining classes and used going forward. The only reason it isn't just something like:

SourceLanguages.values().each {
    assertEqualOperatorPresent(new File("src/test/resources/same/operators/Operators." + it.getExtension()))
}

would be because sometimes the tests assert slightly different naming. For example, Java expects names to be fully qualified. If it can be made better, go for it.
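
One possible way to keep a single loop while still allowing per-language naming, sketched in plain Java: hold the expected name prefix next to each file extension. The class name is hypothetical, and treating an empty prefix as equivalent to the one-argument assertion is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OperatorTestCases {
    // File extension -> qualified-name prefix the assertions should expect.
    // Only Java needs a prefix in the tests shown above.
    static final Map<String, String> NAME_PREFIX_BY_EXTENSION = new LinkedHashMap<>();
    static {
        NAME_PREFIX_BY_EXTENSION.put("java", "Operators.");
        NAME_PREFIX_BY_EXTENSION.put("js", "");
        NAME_PREFIX_BY_EXTENSION.put("php", "");
    }

    // One {path, prefix} pair per language, ready to feed a shared assertion.
    static List<String[]> cases() {
        List<String[]> cases = new ArrayList<>();
        NAME_PREFIX_BY_EXTENSION.forEach((extension, prefix) -> cases.add(new String[] {
                "src/test/resources/same/operators/Operators." + extension, prefix}));
        return cases;
    }
}
```

Each pair could then be passed to a shared assertion such as assertEqualOperatorPresent, so adding a language becomes a one-line change to the map.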
