GithubHelp home page GithubHelp logo

tunnelvisionlabs / antlr4ts Goto Github PK

View Code? Open in Web Editor NEW
625.0 23.0 104.0 2.6 MB

Optimized TypeScript target for ANTLR 4

License: Other

TypeScript 92.70% Java 1.65% ANTLR 5.57% JavaScript 0.08%
typescript antlr4 antlr

antlr4ts's People

Contributors

amilner42 avatar brucificus avatar burtharris avatar dberlin avatar dependabot[bot] avatar houfu avatar mike-lischke avatar que3216 avatar sharwell avatar ydarias avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

antlr4ts's Issues

Consider using non-null arguments for VocabularyImpl constructor

The VocabularyImpl constructor in #34 was ported rather literally from the original Java code. Since we don't have to worry about legacy generated code in TypeScript (prior to the time when Vocabulary was introduced), it might not be necessary to support null values for these constructor arguments. After we have completed more of the port, we can examine all uses of VocabularyImpl in generated code and the runtime library to determine if one of the following approaches would be better.

  1. Using non-null arguments, and require all values be passed by the caller to the constructor
  2. Allowing one or more arguments to be omitted, but providing default values for the parameters so the resulting call is made with non-null arguments

๐Ÿ“ First mentioned by @BurtHarris in a comment in #34.

Document ATNConfigSet basics

The intent/behavior of the ATNConfigSet class is not obvious to new readers. Some suggestions for improvements were given in the code review for #108.

Optimize ATNConfig types for JavaScript memory behavior

The implementation of ATNConfig uses two primary strategies for optimizing memory for this heavily-allocated type:

  1. Derived types are used to avoid storing unnecessary data where possible.
    • Most ATNConfig instances use SemanticContext.NONE as the semantic context, so storage is not provided for anything else in the base type.
    • Only ATNConfig instances in the lexer need to store a LexerActionExecutor, so storage is not provided for it in the base type (many more instances are allocated by the parser than the lexer).
  2. Bit fields are used to store three base properties in a single number.

The design decisions for optimizing memory are currently interleaved with design decisions for algorithmic correctness without explanation. See comments in the code review for #108 for several specific requests related to improving clarity.

๐Ÿ“ This issue specifically relates to clarity and efficiency with respect to the performance characteristics of ATNConfig. A separate issue will be filed for questions related to the correctness characteristics of ATNConfig.

ParseTreePatternMatcher matchImpl

Seems to be an odd condition in the Java reference implementation of ParseTreePatternMatcher I noticed while porting. In that file, there are four if statements like this:

                if (mismatchedNode == null) {
                    mismatchedNode = ...;
                }

What's odd is that prior to testing it for null, no code ever assigns it a non-null value. Is this a bug?

Prepare for/export to npmjs.com

There are some change we should probably make in the project.json file so that npm will know about our package. Once that is done, I think we'll be able to use npm link to simplify referring to the package for debugging, samples, etc. And the import directives can avoid some or all of the ..\ prefixes.

I've never done this before, so learning curve is probably about the same for either of us.

Consider adding a default hashCode() and equals() to Object.prototype

As I understand it, in JavaScript it's possible to actually add methods to the root Object type's prototype!

We might consider writing a module that adds default implementations of hashCode() and equals(other: any) to Object, and optimized ones to String. At a minimum this might make it easier to write test-cases for the collection classes, possibly eliminate the need for the HashedValueType interface outside those collections.

Because reflection is built-in, the central version might do for many low-frequency cases, with per-class implementations of those methods considered an optimization.

Your thoughts?

Add leading underscore to private/protected members

After the sources are ported, it would be a good idea for us to go through all the non-public properties and methods, adding a leading underscore to the names. This is a convention in JavaScript indicating a member is a implementation detail. Might be good to do for some public variables that are only public due to lack of friend or nested class escapes to private/protected.

Document ATNConfig basics

The intent/behavior of the ATNConfig class is not obvious to new readers. Many suggestions for improvements were given in the code review for #108.

Document ATNConfig.toDotString

Not all users are familiar with GraphViz, or that "dot" is a term associated with that application. This method should be properly explained in a documentation comment.

๐Ÿ“ This was requested in #108.

Create constants for Array2DHashSet<T>

The implementation of Array2DHashSet<T> contains the following values which are intended to be constants. We should investigate if there is a better way to express this intent, since the properties in their current form are technically mutable.

private static INITAL_CAPACITY: number =  16; // must be power of 2
private static INITAL_BUCKET_CAPACITY: number =  8;
private static LOAD_FACTOR: number =  0.75;

Line ending question

I noticed something early in about line ending normalization. I understand the different conventions, but not how you (and git) want to deal with it.

I have seen messages like this, which I don't understand:

C:\code\antlr4ts>git add package.json
warning: LF will be replaced by CRLF in package.json.
The file will have its original line endings in your working directory.

I think the npm tool created the file. Is there anything I should do differently?

Don't block the event loop...

In JavaScript asynchronous I/O is the rule, rather than the exception. This is important in many contexts because JavaScript generally runs in a single-threaded environment with an event loop. To remain responsive, that event loop should not be blocked while waiting for input.

IMHO its too much work to recast ANTLR's API to allow it to do asynchronous I/O in JavaScript environments. To remain compliant with JavaScript environment norms, it probably make sense to avoid doing any I/O in the antlr4ts runtime , and expect callers deal with I/O.

Rather than follow the Java this pattern, I suggest that the only implementation of IntStream/CharStream we should provide by default would be a simple wrapper over a string containing the entire input stream!. That way we can move any blocking I/O operations to the tests, and/or testrig.

Background: The Java Model (with threads) isn't the JavaScript way.

The JavaScript asynchronous I/O model is different from the assumptions of design of ANTLR's Java runtime, in particular UnbufferedCharStream which uses blocking read operations so that it can maintain only a small portion of a file's stream in memory at a time. Directly porting this model to JavaScript would lead to code that is wrong by JavaScript and Node.js standards. See for example http://www.nodewiz.biz/your-doing-node-js-wrong-avoid-synchronous-code/.

The current JavaScript runtime blocks in a constructor...

The current JavaScript runtime for ANTLR4 deals with this in a file named FileStream.js, who's constructor function accepts a file name to read. In the constructor, it uses one of the Node.JS rare synchronous I/O calls, fs.readFileSync(), operation to read the entire file into memory as a string, then passing this data to its superclass (InputStream) constructor which proceeds to copy the string into an array of character codes. It should be noted that `fs.readFileSystem()' is generally only available in NodeJS, and not in browser environments.

Type derived from Array2DHashSet generates runtime error

On branch ex, the purpose was to convert the exception types, and I added a few stub classes to misc/Stubs.ts. One of those stubs, ATNConfigSet, is generating a runtime error when imported, but I haven't figured out why... Now that I think about it, perhaps this is one of those circular-dependency problems.

Here's the message I get:

PS C:\try\antlr4ts\src\atn> npm test

> [email protected] test C:\try\antlr4ts
> tsc & mocha


C:\try\antlr4ts\src\misc\Stubs.ts:197
export class ATNConfigSet extends Array2DHashSet<ATNConfig> {
                                  ^
TypeError: Class extends value undefined is not a function or null
    at Object.<anonymous> (C:\try\antlr4ts\src\misc\Stubs.ts:197:35)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.require (module.js:483:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (C:\try\antlr4ts\src\misc\DefaultEqualityComparator.ts:32:1)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.require (module.js:483:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (C:\try\antlr4ts\src\misc\Array2DHashSet.ts:35:1)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.require (module.js:483:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (C:\try\antlr4ts\test\TestArray2DHashSet.ts:3:1)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.require (module.js:483:17)
    at require (internal/module.js:20:19)
    at C:\try\antlr4ts\node_modules\mocha\lib\mocha.js:220:27
    at Array.forEach (native)
    at Mocha.loadFiles (C:\try\antlr4ts\node_modules\mocha\lib\mocha.js:217:14)
    at Mocha.run (C:\try\antlr4ts\node_modules\mocha\lib\mocha.js:485:10)
    at Object.<anonymous> (C:\try\antlr4ts\node_modules\mocha\bin\_mocha:457:18)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.runMain (module.js:590:10)
    at run (bootstrap_node.js:394:7)
    at startup (bootstrap_node.js:149:9)
    at bootstrap_node.js:509:3
npm ERR! Test failed.  See above for more details.

Consider removing AbstractEqualityComparator<T>

This class may not be useful in the TypeScript target. See comments in #6.

๐Ÿ“ This, along with other items labeled ts-flavor, will be addressed after the port is initially complete and working.

Fix doc comments re null.

There number of places where where changes in a return type were made allowing undefined in the return type, but the comments still talk about returning null.

For example: Vocabulary.ts line 82-85:

     * @return The string literal associated with the specified token type, or
     * {@code null} if no string literal is associated with the type.
     */
    getLiteralName(tokenType: number): string | undefined;

After the port work is more stable, let's review and decide if we should update the comments or declarations.

Add tryGet* methods

Several methods which return null in the Java code were updated to validate their arguments and throw exceptions if information is missing. Methods should be added for these which do not throw, but instead return undefined. These methods include:

  • RuleNode.tryGetChild (non-throwing form of RuleNode.getChild)
  • ParserRuleContext.tryGetToken (non-throwing form of ParserRuleContext.getToken)
  • ParserRuleContext.tryGetRuleContext (non-throwing form of ParserRuleContext.getRuleContext)

Document the purpose of the DFASerializer class

It appears that DFASerializer and LexerDFASerializer may not be used for a critical purpose in the core of the algorithm, but instead used for emitting debug/diagnostic information. This should be reviewed and noted in the documentation for these types, since it would be a notable contrast especially from the ATNDeserializer type.

๐Ÿ“ See comments in the review of #107.

Add test code coverage tool

There is one called Istanbul that seems to be used a lot for javascript projects, but I have no particular experience with it. Sam are you interested in tackling this?

Consider changing DFA.states to Array2DHashSet

It seems that Array2DHashSet.putIfAbsent provides the necessary functionality for the DFA.states property. Since it's a lighter-weight type than Array2DHashMap, we should using it instead. See comment from @BurtHarris in #107.

๐Ÿ“ The comment for this property needs to be updated to reflect the actual final behavior as implemented in TypeScript.

Verify Lexer.MAX_CHAR_VALUE after Lexer is ported

The Java version relies on Character.MAX_VALUE instead of Lexer.MAX_CHAR_VALUE to establish the bounds on transitions in LexerATNSimulator. The TypeScript target uses Lexer.MAX_CHAR_VALUE instead. After the port is complete, verify that Lexer.MAX_CHAR_VALUE in the TypeScript target is set to 0xFFFF, which is higher than the value seen in the Java target.

Evaluate performance of MurmurHash.update

Currently this method operates on both primitives and objects. Since it is a performance-critical method, we should evaluate if it would improve performance to create separate methods for these two cases and remove the dynamic type checking from the one operating on number.

Found during review of #6.

In ATNConfigSet, is mergedConfigs being a JavaScript Map OK?

I noticed that in the member mergedConfigs is declared a Map with { state: number, alt: number } as a key. JavaScript maps are by identity for object as keys, so this seems suspect. Should this be a Array2DHashMap with a special EqualityComparer?

Consider using user-defined type guards to remove casts in DFA

Several expressions in the implementation of DFA require explicit casts to DFAState. Many of these casts are conditioned on the result of isPrecedenceDfa, which means it may be possible to use user-defined type guards to eliminate the need to cast.

๐Ÿ“ See comments in #107.

๐Ÿ“ A similar situation occurs with DFAState.isContextSensitive(). If they are not reviewed at the same time, a separate issue should be opened to examine the use of user-defined type guards for the DFAState case.

Selecting a test framework...

I don't have a lot of experience with JavaScript test frameworks, so after reviewing the existing Java tests so I asked for help on stack overflow.

The only answer received so far (from Basarat who has very strong reputation) said that Mocha is the current King of JavaScript Test Frameworks. He cited this as evidence, which looks pretty significant.

Mocha has a lot of features we probably won't need (e.g. async testing), but it looks to me like they won't get in the way. But Mocha seems very adaptable. The default API style quite different from JUnit, but I have seen some projects that are trying to put a decorator-based model on it, like mocha-typescript.

P.S. Basarat also has an interesting-looking IDE project at: http://alm.tools, and TypeScript deep dive e-book at http://basarat.gitbooks.io/typescript/, seems like both are review-worthy.

Document minDfaEdge and maxDfaEdge

These constants control the selection of EdgeMap implementations to minimize the overall amount of memory required to store the DFA. In addition to documenting them, the specific heuristic used to choose between SparseEdgeMap and ArrayEdgeMap likely needs to be updated for TypeScript, since JavaScript implementations use a different underlying data structure than the JVM for the type T[].

In addition to minDfaEdge and maxDfaEdge, there are also two constants (0 and 200) used when creating initial states for a precedence DFA. The meaning of these constants should be described in code.

๐Ÿ“ See also comment(s) by @BurtHarris in #107.

Check the intent of this assert is implemented

AbstractEdgeMap.ts, line 44 the assert and comment seem mismatched, given the semantics of JavaScript's number type...

        // the allowed range (with minIndex and maxIndex inclusive) should be less than 2^32
        assert(maxIndex - minIndex + 1 >= 0);

TsTool could generate `.js` + `.d.ts` files rather than `.ts`

For later consideration...

We could have the tool's output already in JavaScript, while simultaneously emitting a corresponding .d.ts file that declares type information. This would avoid making a consuming package dependent on TypeScript at build time. Seems like a nice-to-have.

Implementation-wise in the tool, .d.ts files may be considered similar to the "C/C++" concept of a .h file. It seems like that target is getting close to integration, so we should probably defer this.

Broken test discovery VS Test Explorer and mocha-typescript test

Visual studio gets confused when more than one file with these tests is present with the "Mocha" test framework property. Furthermore, it gets confused and associated the tests with the generated .js file, rather than the .ts source.

I can take another look at this once we get things better merged. For now I suggest we use the command-line tests, via npm test.

Document ATNConfig.contains

This method is currently undocumented.

โš ๏ธ This is a priority documentation item since this method is part of the ATNConfigSet reduction and tighter SLL termination condition implemented by the optimized fork of ANTLR 4.

๐Ÿ“ This was originally requested in #108.

ANTLR tool code generation for TypeScript

To wrap this up, we'll need to provide a new code generator for the tool. I was thinking that it should probably be usable for JavaScript tools without a TypeScript dependency. To do this the tool should emit .js files, with added .d.ts files to enable TypeScript users to benefit from the typings.

This probably belongs in a later milestone.

Fix copyright headers and notice prior to public repository

It would be preferable from a maintenance standpoint to continue with a copyright assignment plus a list of contributors. In other words, the copyright notice in each file should be the same. Due to @BurtHarris' contributions to the derived work (the TypeScript target), it makes sense to me to have three names listed for each file.

  • Terence Parr
  • Sam Harwell
  • Burt Harris

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.