
srcml / srcml


srcML Toolkit

Home Page: srcml.org

License: GNU General Public License v3.0

CMake 3.26% XSLT 1.35% C++ 53.89% Shell 20.69% C 2.87% GAP 17.17% HTML 0.11% Inno Setup 0.66%

srcml's Introduction

srcML

srcML is an XML format for source code. The XML markup identifies elements of the abstract syntax of the source-code language. The toolkit's parser supports conversion of C, C++, C#, and Java both to and from the srcML format. The format allows XML tools to be leveraged for the various tasks of source code exploration, analysis, and manipulation.

srcML toolkit includes:

  • srcML client srcml

    Conversion to the srcML format, querying, and transformation on srcML, and conversion of srcML back to source code

  • libsrcml

    A C interface for translation of source code to and from srcML, as well as efficient manipulation and fact extraction (XPath, XSLT, and RelaxNG). The srcML client srcml is built using libsrcml.


Webpage

srcML.org

Contact & Discussion

For questions or suggestions, please contact us via email [email protected].

To keep up with development, ask questions, or get involved with the conversation, join our Discord server srcML.org. To be invited, please contact us via email [email protected] and provide your Discord id.

srcml's People

Contributors

bbartman, dtg3, hmm34, jmaletic, mikeweyandt, mjdecker, mlcollard

srcml's Issues

C++11 Parameter packs

C++11 adds support for the following parameter packs

type ... Args
typename|class ... Args
template < parameter-list >

Args ... args

pattern ...

The first three are template parameter packs (variadic templates), the fourth is a function parameter pack, and the last is a parameter pack expansion.

Each of these needs to be verified, and test cases may need to be added.

At the very least, the markup for

template<class ... Types> struct Tuple;

does not appear to be correct:

<template>template<parameter_list>&lt;<param><type/>class ... 
<name>Types</name></param>&gt;</parameter_list> <struct_decl>struct
<name>Tuple</name>;</struct_decl></template>
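A compact test input covering each form above might look like the following. It is a hypothetical sketch: apart from Tuple, the names (IntList, Vec, Holder, count) are illustrative and not taken from the srcML test suite.

```cpp
#include <cstddef>
#include <vector>

// Type template parameter pack (typename|class ... Args): the issue's example.
template<class ... Types> struct Tuple {};

// Non-type template parameter pack (type ... Args); {Values ...} is an expansion.
template<int ... Values> struct IntList {
    static constexpr int values[] = {Values ...};
};

// Template template parameter pack (template < parameter-list > class ... Args).
template<class T> using Vec = std::vector<T>;
template<template<class> class ... Containers> struct Holder {};

// Base case that ends the recursion below.
constexpr std::size_t count() { return 0; }

// Function parameter pack (Args ... args); count(rest ...) is a pack expansion.
template<class First, class ... Rest>
constexpr std::size_t count(First, Rest ... rest) {
    return 1 + count(rest ...);
}
```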

C++11 Uniform Initialization in member initialization list

Most forms of uniform initialization seem to work, but the member initializer list is marked up incorrectly:

 AltStruct(int x, double y) : x_{x}, y_{y} {}

This causes the constructor markup to terminate early:

 <member_list>: <call><name>x_</name><argument_list>{<argument><expr><name>x</name></expr></argument></argument_list></call></member_list></constructor></public>}</block><decl>, <name>y_</name><block>{<expr_stmt><expr><name>y</name></expr></expr_stmt>}</block></decl>
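A minimal compilable test input for this case is sketched below. Only AltStruct and its constructor come from the issue; the member declarations and constexpr are assumptions added so the result can be checked at compile time.

```cpp
// Hypothetical test input: brace initialization of members inside a
// constructor's member initializer list (the failing case above).
struct AltStruct {
    constexpr AltStruct(int x, double y) : x_{x}, y_{y} {}
    int x_;
    double y_;
};
```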

C++11 enum class

C++11 adds support for enum classes (scoped enumerations). Of the various forms, the only thing that probably needs attention is that an underlying type can be specified. That type is currently marked up as names, without a surrounding type element:

enum class Enum2 : unsigned int {Val1, Val2};
<enum>enum class <name>Enum2</name> : <name>unsigned</name> <name>int</name> <block>{<expr><name>Val1</name></expr>, <expr><name>Val2</name></expr>}</block>;</enum>

I think this should be the change:

: <name>unsigned</name> <name>int</name>
: <type><name>unsigned</name> <name>int</name></type>
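The text after the colon is an ordinary type (the enumeration's underlying type), which supports wrapping it in a type element. A small test input can confirm this at compile time; the checks are an illustration, not part of srcML.

```cpp
#include <type_traits>

// Test input from the issue: a scoped enumeration whose underlying type
// ("unsigned int") is written after the colon.
enum class Enum2 : unsigned int {Val1, Val2};
```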

Identifiers beginning with UR seem to break the parser.

URSG;

This causes a segfault. The reason is that the raw-string rule consumes U and R and then expects a ". When the " is not there, the rule fails, but because lookahead is only one character, the UR seems to be lost.

This may also occur for other combinations.
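The underlying decision is whether a run of characters is a string-literal prefix or the start of an identifier, and it can only be made after looking past the whole prefix to the quote. The following is a hypothetical plain C++ sketch of that check; it is not srcML's ANTLR grammar, and stringPrefixLength is an illustrative name.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: if `s` starts with an encoding prefix (u8, u, U, or L),
// optionally followed by R, or with R alone, and that prefix is immediately
// followed by '"', return the prefix length. Otherwise return 0, meaning the
// characters start an ordinary identifier (or a string with no prefix).
constexpr std::size_t stringPrefixLength(std::string_view s) {
    std::size_t i = 0;
    if (s.substr(0, 2) == "u8") {
        i = 2;
    } else if (!s.empty() && (s[0] == 'u' || s[0] == 'U' || s[0] == 'L')) {
        i = 1;
    }
    if (i < s.size() && s[i] == 'R') {
        ++i;
    }
    // Commit to a string literal only after seeing the quote that follows the
    // whole prefix. For UR the quote is the third character (the fourth for
    // u8R), so one character of lookahead cannot make this decision.
    return (i > 0 && i < s.size() && s[i] == '"') ? i : 0;
}
```

With this check, URSG is left intact as an identifier instead of being partially consumed by the raw-string rule.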

C++11 initializer lists

The following is valid:

A a{1, 2};

Currently, it is marked up as a block:

<decl_stmt><decl><type><name>A</name></type> <name>a</name>
<block>{<expr_stmt><expr>1</expr>, <expr>2</expr></expr_stmt>}</block>
</decl></decl_stmt><empty_stmt>;</empty_stmt>
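The issue does not show how A is defined, so the test input below assumes A is a simple aggregate and adds constexpr for a compile-time check. Either way, the braces form an initializer for the declared variable, not a block.

```cpp
// Hypothetical test input: A is assumed to be an aggregate. "A a{1, 2};"
// declares a and list-initializes it; there is no statement block here.
struct A {
    int first;
    int second;
};

constexpr A a{1, 2};
```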

C digraphs and trigraphs

The C language allows the following:

void f() {
}

is equivalent to

void f() <%
%>

Digraphs were probably intended for systems whose keyboards or character sets lack characters such as { and }.

This is most likely an extremely low priority item that may not need to be implemented.
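Digraphs are also part of C++ (trigraphs, by contrast, were removed in C++17), so a single hypothetical test input can exercise the bracket and directive digraphs together. The names values and sum are illustrative.

```cpp
%:include <cstddef>   // %: is the digraph for #

// <: and :> stand for [ and ]; <% and %> stand for { and }.
constexpr int values<:3:> = <%1, 2, 3%>;

constexpr int sum() <%
    return values<:0:> + values<:1:> + values<:2:>;
%>
```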

C++11 specifier thread_local

C++11 added the specifier thread_local to indicate thread storage duration.

thread_local unsigned int rage = 1; 

C++11 noexcept

C++11 deprecates dynamic exception specifications (throw(...)) and replaces them with noexcept.

template <class T>
void foo() noexcept(noexcept(T())) {}

void bar() noexcept(true) {}
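In noexcept(noexcept(T())), the outer noexcept is a specifier and the inner one is an operator that yields a compile-time bool, so the markup likely needs to tell them apart. The hypothetical test input below (ThrowingCtor is an illustrative name) shows that both forms really take effect.

```cpp
// Test input from the issue.
template <class T>
void foo() noexcept(noexcept(T())) {}

void bar() noexcept(true) {}

// A type whose default constructor may throw, so foo<ThrowingCtor> is not noexcept.
struct ThrowingCtor {
    ThrowingCtor() noexcept(false) {}
};
```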

Preprocessor missing additional markup

The line directive has already been corrected.

The pragma directive looks like it needs some additional markup.

#pragma GCC dependency "parse.y"

becomes

<cpp:pragma>#<cpp:directive>pragma</cpp:directive> GCC dependency "parse.y"</cpp:pragma>

I think I will add a cpp:name (although cpp_symbol seems to do this, so I will hold off on that one for now) and maybe a cpp:arg or cpp:argument.

The warning directive does not seem to be handled at all.

#warning "Do not use ABC, which is deprecated. Use XYZ instead."

The region directive may also need additional markup.

#region a b

becomes

<cpp:region>#<cpp:directive>region</cpp:directive> a b</cpp:region>

The error directive may also need additional markup. The warning directive is essentially the same and may require the same markup.

#error "message"

becomes

<cpp:error>#<cpp:directive>error</cpp:directive> "message"</cpp:error>

I will update this comment if I find more.

C++11 default and delete

C++11 added support for explicitly defaulted and deleted member functions.

SomeType() = default;
NonCopyable & operator=(const NonCopyable&) = delete;
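Placed in the class definitions they belong to, the two declarations form a small test input. The compile-time checks below are illustrative and confirm that = default and = delete change the class's properties.

```cpp
#include <type_traits>

// Test input: an explicitly defaulted default constructor.
struct SomeType {
    SomeType() = default;
};

// Test input: an explicitly deleted copy assignment operator.
struct NonCopyable {
    NonCopyable & operator=(const NonCopyable&) = delete;
};
```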

C++11 nullptr pointer literal

C++11 added the null pointer literal nullptr, which is a keyword.

I recommend treating it like true or false, but with the correct type (std::nullptr_t).
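The "correct type" mentioned here is std::nullptr_t, a distinct type that converts to any pointer type but not to an integer. A short illustrative test input (the variable name pointer is arbitrary) makes this checkable.

```cpp
#include <cstddef>
#include <type_traits>

// Test input: nullptr as a literal initializer, like true or false would be.
int* pointer = nullptr;
```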

C sharp seg fault on

The following code seems to be causing a segfault:

if(select){
PostConfig = PostConfig | ep;
}

using #line for position

A thought occurred to me while using srcML to work on srcML. I wanted to know where different calls were made to a function 'call'.

So I used XPath with --position turned on. The problem is that I want to know where each call was in the original grammar file. ANTLR usually emits #line directives to indicate the original line number, so it might be a good option to process these directives and use them for the positions as well.

For instance,

#line 171 "srcMLParser.g"

When this is parsed, we would update the current line count to 171, or something similar.
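One way to picture the proposal is a pass that maps each physical line to the file and line it should report. The sketch below is hypothetical (LogicalPosition, logicalPositions, and lineDirectiveExample are illustrative names, not srcML code) and follows the C preprocessor rule that the line after #line N is reported as line N.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: the file and line reported for one physical line of input.
struct LogicalPosition {
    std::string file;
    int line;
};

// Returns the logical position of every physical line of `source`, honoring
// `#line N` and `#line N "file"`. Only this simple spelling is handled;
// "# line" and file names containing spaces are not.
std::vector<LogicalPosition> logicalPositions(const std::string& source,
                                              const std::string& fileName) {
    std::vector<LogicalPosition> positions;
    std::istringstream in(source);
    std::string text;
    std::string file = fileName;
    int line = 1;
    while (std::getline(in, text)) {
        positions.push_back({file, line});
        std::istringstream words(text);
        std::string directive;
        int newLine = 0;
        if (words >> directive >> newLine && directive == "#line") {
            std::string quoted;
            if (words >> quoted && quoted.size() >= 2 &&
                quoted.front() == '"' && quoted.back() == '"') {
                file = quoted.substr(1, quoted.size() - 2);
            }
            line = newLine;  // applies to the next physical line
        } else {
            ++line;
        }
    }
    return positions;
}

// Usage check: the line after `#line 171 "srcMLParser.g"` reports line 171.
void lineDirectiveExample() {
    const auto p = logicalPositions(
        "int a;\n#line 171 \"srcMLParser.g\"\nint b;\nint c;\n", "out.cpp");
    assert(p.size() == 4);
    assert(p[2].file == "srcMLParser.g" && p[2].line == 171);
    assert(p[3].line == 172);
}
```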

C++11 final and override

C++11 added support for final and override.

In the following examples they are marked up as a name. Would specifier be better? Note that these are not keywords; they are identifiers with special meaning only in these contexts.

virtual void f() final;
virtual void some_func(int) override;

What is definitely wrong is the markup when final is used on a struct:

struct Base1 final { };
<struct>struct <macro><name>Base1</name></macro> <name>final</name> <block>{<public type="default"> </public>}</block>;</struct>
</unit>
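Because final and override are not keywords, a useful test input combines their special uses with ordinary uses as names. The sketch below is hypothetical; only Base1, f, and some_func come from the issue.

```cpp
#include <type_traits>

struct Base {
    virtual ~Base() = default;
    virtual void f() {}
    virtual void some_func(int) {}
};

// final and override with their special meaning on member functions.
struct Derived : Base {
    void f() final {}
    void some_func(int) override {}
};

// The failing case from the issue: final on a class definition.
struct Base1 final { };

// Outside those positions they are ordinary identifiers.
int final = 1;
int override = 2;
```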

Mismark of MACRO union/class/struct

Occurrences of the following:

MACRO union {} a;
MACRO struct {} a;
MACRO class {} a;

produce the following markup:

<decl_stmt><decl><type><name>MACRO</name></type> class 
<block>{}</block></decl></decl_stmt> <expr_stmt><expr>
<name>a</name></expr>;</expr_stmt>

That is, it is marking it up as a decl_stmt instead of a macro followed by a class/union/struct definition.

C++11 alignof() operator

C++11 adds the keyword alignof, which, according to cppreference.com,

Returns alignment in bytes (an integer power of two) required for any instance of the given type, which is either complete type, an array type, or a reference type.

For example:

alignof(char)

Should this just be treated as a call/macro, or should it receive special handling?
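One argument for special handling is that the operand of alignof is a type-id (including array and reference types), not an expression as in a call. The checks below illustrate the quoted rules; isPowerOfTwo is a hypothetical helper.

```cpp
#include <cstddef>

// Hypothetical helper: true when n is a positive power of two.
constexpr bool isPowerOfTwo(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

static_assert(alignof(char) == 1, "char always has alignment 1");
static_assert(isPowerOfTwo(alignof(double)), "alignments are powers of two");
static_assert(alignof(int[4]) == alignof(int), "arrays use the element alignment");
static_assert(alignof(int&) == alignof(int), "references use the referenced type");
```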

Do...while loops


The closing </do> tag ends up in the wrong place. It is placed inside the if(!out) statement, along with </block>, </decl>, and </decl_stmt> tags that correspond to a macro (DBVT_PREFIX) that I think is improperly marked up as a decl_stmt.

DBVT_PREFIX
inline void btDbvt::collideKDOP(const btDbvtNode* root,
                                const btVector3* normals,
                                const btScalar* offsets,
                                int count,
                                DBVT_IPOLICY)
{
    DBVT_CHECKTYPE
    if(root)
    {
        const int inside=(1<<count)-1;
        btAlignedObjectArray stack;
        int signs[sizeof(unsigned)*8];
        btAssert(count<int (sizeof(signs)/sizeof(signs[0])));
        for(int i=0;i<count;++i)
        {
            signs[i]= ((normals[i].x()>=0)?1:0)+
                ((normals[i].y()>=0)?2:0)+
                ((normals[i].z()>=0)?4:0);
        }
        stack.reserve(SIMPLE_STACKSIZE);
        stack.push_back(sStkNP(root,0));
        do {
            sStkNP se=stack[stack.size()-1];
            bool out=false;
            stack.pop_back();
            for(int i=0,j=1;(!out)&&(i<count);++i,j<<=1)
            {
                if(0==(se.mask&j))
                {
                    const int side=se.node->volume.Classify(normals[i],offsets[i],signs[i]);
                    switch(side)
                    {
                    case -1: out=true;break;
                    case +1: se.mask|=j;break;
                    }
                }
            }
            if(!out)
            {
                if((se.mask!=inside)&&(se.node->isinternal()))
                {
                    stack.push_back(sStkNP(se.node->childs[0],se.mask));
                    stack.push_back(sStkNP(se.node->childs[1],se.mask));
                }
                else
                {
                    if(policy.AllLeaves(se.node)) enumLeaves(se.node,policy);
                }
            }
        } while(stack.size());
    }
}


The </block> for the do statement's block is closed too early, inside some </decl> and </decl_stmt> tags that correspond to the same macro as in the example above:

DBVT_PREFIX
inline void btDbvt::collideOCL( const btDbvtNode* root,
                                const btVector3* normals,
                                const btScalar* offsets,
                                const btVector3& sortaxis,
                                int count,
                                DBVT_IPOLICY,
                                bool fsort)
{
    DBVT_CHECKTYPE
    if(root)
    {
        const unsigned srtsgns=(sortaxis[0]>=0?1:0)+
                               (sortaxis[1]>=0?2:0)+
                               (sortaxis[2]>=0?4:0);
        const int inside=(1<<count)-1;
        btAlignedObjectArray stock;
        btAlignedObjectArray ifree;
        btAlignedObjectArray stack;
        int signs[sizeof(unsigned)*8];
        btAssert(count<int (sizeof(signs)/sizeof(signs[0])));
        for(int i=0;i<count;++i)
        {
            signs[i]= ((normals[i].x()>=0)?1:0)+
                ((normals[i].y()>=0)?2:0)+
                ((normals[i].z()>=0)?4:0);
        }
        stock.reserve(SIMPLE_STACKSIZE);
        stack.reserve(SIMPLE_STACKSIZE);
        ifree.reserve(SIMPLE_STACKSIZE);
        stack.push_back(allocate(ifree,stock,sStkNPS(root,0,root->volume.ProjectMinimum(sortaxis,srtsgns))));
        do {
            const int id=stack[stack.size()-1];
            sStkNPS se=stock[id];
            stack.pop_back();ifree.push_back(id);
            if(se.mask!=inside)
            {
                bool out=false;
                for(int i=0,j=1;(!out)&&(i<count);++i,j<<=1)
                {
                    if(0==(se.mask&j))
                    {
                        const int side=se.node->volume.Classify(normals[i],offsets[i],signs[i]);
                        switch(side)
                        {
                        case -1: out=true;break;
                        case +1: se.mask|=j;break;
                        }
                    }
                }
                if(out) continue;
            }
            if(policy.Descent(se.node))
            {
                if(se.node->isinternal())
                {
                    const btDbvtNode* pns[]={ se.node->childs[0],se.node->childs[1]};
                    sStkNPS nes[]={ sStkNPS(pns[0],se.mask,pns[0]->volume.ProjectMinimum(sortaxis,srtsgns)),
                                    sStkNPS(pns[1],se.mask,pns[1]->volume.ProjectMinimum(sortaxis,srtsgns))};
                    const int q=nes[0].value<nes[1].value?1:0;
                    int j=stack.size();
                    if(fsort&&(j>0))
                    {
                        /* Insert 0 */
                        j=nearest(&stack[0],&stock[0],nes[q].value,0,stack.size());
                        stack.push_back(0);
#if DBVT_USE_MEMMOVE
                        memmove(&stack[j+1],&stack[j],sizeof(int)*(stack.size()-j-1));
#else
                        for(int k=stack.size()-1;k>j;--k) stack[k]=stack[k-1];
#endif
                        stack[j]=allocate(ifree,stock,nes[q]);
                        /* Insert 1 */
                        j=nearest(&stack[0],&stock[0],nes[1-q].value,j,stack.size());
                        stack.push_back(0);
#if DBVT_USE_MEMMOVE
                        memmove(&stack[j+1],&stack[j],sizeof(int)*(stack.size()-j-1));
#else
                        for(int k=stack.size()-1;k>j;--k) stack[k]=stack[k-1];
#endif
                        stack[j]=allocate(ifree,stock,nes[1-q]);
                    }
                    else
                    {
                        stack.push_back(allocate(ifree,stock,nes[q]));
                        stack.push_back(allocate(ifree,stock,nes[1-q]));
                    }
                }
                else
                {
                    policy.Process(se.node,se.value);
                }
            }
        } while(stack.size());
    }
}

C++11 alignas()

alignas() is specified in C++11 as a specifier.

alignas(float) unsigned char c[sizeof(float)];

It is currently marked up as a macro.

This is the suggested markup for the declaration:

<decl_stmt><decl><type><specifier>alignas<argument_list>(<argument>
float</argument>)</argument_list></specifier> <name>unsigned</name> 
<name>char</name></type> <name><name>c</name><index>[<expr>
<call><name>sizeof</name><argument_list>(<argument><expr><name>
float</name></expr></argument>)</argument_list></call></expr>]</index>
</name></decl>;</decl_stmt>

Where alignas(float) is marked as

<specifier>alignas<argument_list>(<argument>float</argument>)</argument_list></specifier>
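Because alignas appears among the declaration's specifiers, it changes how the declared object is laid out rather than computing a value, which supports marking it as a specifier. In the hypothetical test input below, FloatStorage is an illustrative wrapper that lets the effect be checked at compile time.

```cpp
// Hypothetical test input: the issue's declaration as a member, so the
// alignment it requests shows up in the enclosing struct.
struct FloatStorage {
    alignas(float) unsigned char c[sizeof(float)];
};
```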

try-with-resources statement

Java 1.7 introduced the try-with-resources statement.

This does not appear to be handled yet. For instance, the following is an example of the new statement:

try (BufferedReader br = new BufferedReader(new FileReader(path))) {
    return br.readLine();
}

C++11 Lambda functions

[](int x, int y) { return x + y; }

This is not recognized as a lambda function, and the block ends early:

<expr_stmt><expr><index>[]</index>(<name>int</name> <name>x</name>, <name>int</name> <name>y</name>) <block>{ <return>return <expr><name>x</name> + <name>y</name></expr>;</return></block></expr></expr_stmt> }
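The capture list, parameter list, and body all belong to one expression, and the closing } ends the lambda rather than an enclosing block. A hypothetical test input (add and seven are illustrative names) covers both a stored lambda and an immediately invoked one; it relies on C++17, where such lambdas are implicitly constexpr.

```cpp
// Test input: the issue's lambda stored in a variable and invoked directly.
constexpr auto add = [](int x, int y) { return x + y; };
constexpr int seven = [](int x, int y) { return x + y; }(3, 4);
```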

C# contextual LINQ keywords are being treated as LINQ in other contexts

LINQ keywords such as select and where are being marked up as LINQ queries in situations where they are identifiers.

For instance:

select = a;
select +=;

The problems above have already been addressed. However, more variations may exist, and there may be a better way to solve this.

svn_io in wrong directory

The files svn_io.{hpp,cpp} are currently in the parser directory.

They belong in the oldclient directory.

Note that input file handling is changing with the new client and library.

C++11 trailing-return

Trailing return types are not supported.

auto func_name(int x, int y) -> int;

Currently this is parsed as a decl_stmt.
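A test input might pair the issue's declaration with the case that motivates the syntax: a return type that depends on the parameters. The sketch below is hypothetical; multiply and the definition of func_name are illustrative.

```cpp
#include <type_traits>

// Declaration from the issue, followed by a definition.
auto func_name(int x, int y) -> int;
auto func_name(int x, int y) -> int { return x + y; }

// The return type refers to the parameters, so it must come after them.
template <class T, class U>
auto multiply(T t, U u) -> decltype(t * u) { return t * u; }
```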

C++ expressions in templates

It is valid in C++ (or at least this compiles without C++11 extensions) to have expressions within a template argument list, such as:

f<1<2>();
// or
f<(1>2)>(); // parentheses are necessary because of >

C++11 removes the need for a space between the two closing > characters, so this is valid:

std::vector<SomeType<(1>2)>> a;

This is not treated as a declaration statement or a template. It seems to treat the angle brackets as operators:

<expr_stmt><expr><name><name>std</name>::<name>vector</name>
</name>&lt;<name>SomeType</name>&lt;(1&gt;2)&gt;&gt; <name>a</name></expr>;</expr_stmt>
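The examples only compile if f and SomeType are templates, so a complete test input needs declarations for them. The ones below are hypothetical: f takes an int and SomeType a bool, which makes both 1<2 and (1>2) valid arguments.

```cpp
#include <vector>

// Hypothetical declarations so the issue's expressions compile.
template <int N> constexpr int f() { return N; }
template <bool B> struct SomeType { static constexpr bool value = B; };

// Inside a template argument list an unparenthesized > ends the list,
// so (1>2) needs parentheses; since C++11 the closing >> needs no space.
std::vector<SomeType<(1>2)>> a;
```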

Java Annotations

Java annotations are not completely supported. At least this case (a custom annotation declaration) is not supported:

  // Declares the annotation Twizzle.
  public @interface Twizzle {
  }

Underscores in Numeric Literals

Java 1.7 added support for underscores between digits in numeric literals.

For instance:

int a = 1_0;

is valid.

and

int a = _10;
int a = 10_;

are invalid.

srcML seems to parse valid literals correctly; however, the two invalid forms are not both marked as non-literals. The first is marked as a non-literal and the second as a literal.

Testing needs to be added, and these may need to be made consistent.

comments have no line number with --position

Comments are calling processText directly instead of num2process[2].

This means that when --position is on, comments still do not get line number information. I will look for other cases as well and include any I find in this issue.

Bit fields

To my knowledge there is no testing for bit fields.

unsigned int b : 3, c : 3;

Right now this is marked up as a range. Is this correct?

<decl_stmt><decl><type><name>unsigned</name> <name>int</name>
</type> <name>b</name> :<range> <expr>3</expr></range>, 
<name>c</name> :<range> <expr>3</expr></range></decl>;</decl_stmt>
</unit>
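For reference when deciding on the markup: the constant after the colon is the width of each field in bits, so b and c can each hold 0 through 7. The test input below is a hypothetical sketch; Flags and roundTrip are illustrative names.

```cpp
// Hypothetical test input: comma-separated bit-fields as in the markup above.
struct Flags {
    unsigned int b : 3, c : 3;
};

// Stores a value in the 3-bit field b and reads it back.
constexpr unsigned int roundTrip(unsigned int value) {
    Flags flags{};
    flags.b = value;
    return flags.b;
}
```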

C++11 using statement as type aliasing

C++11 provides a new use of using for type aliasing.

using FunctionType = void (*)(double);

I am not sure whether this should be considered an argument list or an init. It is currently marked up as:

<using>using <name>FunctionType</name> =<init> <expr><call><name>void</name> <argument_list>(<argument><expr>*</expr></argument>)</argument_list></call>(<name>double</name>)</expr></init>;</using>
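Everything after = in an alias declaration is a type-id (here, a pointer to a function taking double and returning void), not an expression, so neither a call nor an argument list describes it exactly. The illustrative checks below confirm this; takesDouble is a hypothetical function.

```cpp
#include <type_traits>

// Test input from the issue.
using FunctionType = void (*)(double);

// A function whose address has exactly that type.
void takesDouble(double) {}
```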

SAX2 Framework instead of text reader

I propose using the SAX2 framework for the reading portion of libsrcml instead of the text reader interface.

Use on large srcML files suggests that SAX2 is about 5 times faster.

The question is how to integrate the framework code from the other repository. Should we copy it, or should it be referenced as an external repository somehow?

C++ Alternative operator representations

In C++, and, or, xor, etc. are keywords and are alternative spellings of &&, ||, ^, etc. In C they are macros defined in <iso646.h>.

Currently, the following markup is produced:

<expr_stmt><expr>1 <name>and</name> 2</expr>;</expr_stmt>

whereas and should most likely be treated the same as &&:

<expr_stmt><expr>1 &amp;&amp; 2</expr>;</expr_stmt>
<expr_stmt><expr>1 and 2</expr>;</expr_stmt>
<expr_stmt><expr>1 <op:operator>and</op:operator> 2</expr>;</expr_stmt>
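In C++ the alternative tokens mean exactly the same as their symbolic forms, which supports marking them up as operators. The hypothetical test input below (both, either, and flip are illustrative names) demonstrates the equivalence.

```cpp
// Test input: alternative tokens used as operators.
constexpr bool both(bool x, bool y) { return x and y; }
constexpr bool either(bool x, bool y) { return x or y; }
constexpr int flip(int x, int y) { return x xor y; }
```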

Issue with large complex for loop

The markup for the following for loop gets jumbled towards the end. The biggest issue is that the for tag is closed before all of its children.

for (endIslandIndex = startIslandIndex+1;(endIslandIndex<numElem) && (getUnionFind().getElement(endIslandIndex).m_id == islandId);endIslandIndex++)
{
}

C++11 attributes

C++11 added support for attributes.

int [[attr1]] i [[attr2, attr3]];
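In the example, [[attr1]] (after the type) appertains to the type, while [[attr2, attr3]] (after the name) appertains to the variable i. Since attr1 through attr3 are placeholders, the hypothetical test input below uses standard C++17 attributes in comparable positions; compute and useAttributes are illustrative names.

```cpp
// An attribute at the start of a declaration applies to each entity declared.
[[nodiscard]] constexpr int compute() { return 42; }

constexpr int useAttributes() {
    [[maybe_unused]] int unused = 0;
    // An attribute right after the declarator-id applies to that entity only.
    int kept [[maybe_unused]] = compute();
    return kept;
}
```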

C++11 new string literals

The following are the new string literal forms:

u8"I'm a UTF-8 string."
u"This is a UTF-16 string."
U"This is a UTF-32 string."
R"(The String Data \ Stuff " )"

R can be combined with the previous prefixes as well (for example, uR or u8R).

These should be parsed similarly to L"".
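Each prefix selects a different element type, and raw strings keep backslashes and quotes literally. The compile-time checks below form an illustrative test input covering the forms listed above.

```cpp
// Every literal holds two characters; the element type sets the array size.
static_assert(sizeof(u8"ab") == 3, "UTF-8: 2 code units plus NUL");
static_assert(sizeof(u"ab") == 3 * sizeof(char16_t), "UTF-16 code units");
static_assert(sizeof(U"ab") == 3 * sizeof(char32_t), "UTF-32 code units");
static_assert(sizeof(L"ab") == 3 * sizeof(wchar_t), "wide characters");

// Raw strings: the backslash and the quotes are ordinary characters.
static_assert(sizeof(R"(a\b)") == 4, "a, backslash, b, NUL");
static_assert(sizeof(R"(say "hi")") == 9, "8 characters plus NUL");

// R combined with another prefix.
static_assert(sizeof(uR"(a\b)") == 4 * sizeof(char16_t), "UTF-16 raw string");
```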

Catching multiple exception types

Java 1.7 added the catching of multiple exception types in the same catch block.

For instance:

catch (IOException|SQLException ex) {
    logger.log(ex);
    throw ex;
}

Currently, it is marked up as one item:

<param><decl>IOException|SQLException ex</decl></param>

Seg fault on #endif //!

A segfault seems to occur with the following code:

#endif //! 

This only happens when both are on the same line.

C++11 sizeof...(Args)

This is related to parameter packs. In C++11, sizeof...(Args) will query the number of elements in a parameter pack.

For example,

int a = sizeof...(Args);

srcML thinks this is a function_decl. This might be easier if sizeof were to be treated like alignof, that is, as a special operator with its own special markup instead of just being a call. However, then we might also want to consider the different casts (such as dynamic_cast and reinterpret_cast).
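sizeof... yields the number of elements in a parameter pack as a compile-time constant; it is not a call to a function named sizeof. A hypothetical test input (countArgs and Count are illustrative names) shows it in a function template and a class template.

```cpp
#include <cstddef>

// sizeof... applied to a function parameter pack's types.
template <class ... Args>
constexpr std::size_t countArgs(Args ...) {
    return sizeof...(Args);
}

// sizeof... applied to a class template's parameter pack.
template <class ... Args>
struct Count {
    static constexpr std::size_t value = sizeof...(Args);
};
```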

C++ comma-separated declarations

Function declarations can be comma-separated, and they can also be mixed with other declarations:

struct S {
    virtual int f(char) const, g(int); // declares two non-static member functions
    int f(), b; // declares a function that returns an int and an integer b
};
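A compilable version of this test input needs out-of-class definitions; the bodies below are hypothetical. The leading virtual is a decl-specifier and therefore applies to both f and g, while const belongs to f's declarator only, which is part of why the markup has to split the declarators correctly.

```cpp
#include <type_traits>

// Test input from the issue.
struct S {
    virtual int f(char) const, g(int); // declares two non-static member functions
    int f(), b; // declares a function that returns an int and an integer b
};

// Hypothetical definitions so the input also links.
int S::f(char) const { return 1; }
int S::g(int x) { return x; }
int S::f() { return 0; }
```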
