Comments (7)

dennwc commented on August 16, 2024

It depends on the parsing mode you use.

For Native and Annotated, the AST is not normalized, so it preserves whatever shape the JS parser emits. Judging by your comment, the JS parser does not preserve this invariant.

For Semantic mode this invariant is usually not respected either, but for a different reason: each language has a different AST shape, and the normalized UAST structure may break the invariant by moving nodes around.

But for your specific case, this issue is caused by bblfsh/javascript-driver#74 - the UAST pipeline does not process comments correctly, and they are preserved in the same places as in Native mode.

from sdk.

dennwc commented on August 16, 2024

Reopening and moving to SDK.

m09 commented on August 16, 2024

Thanks for the clarifications.

vmarkovtsev commented on August 16, 2024

@dennwc @creachadair I wouldn't really close this one. Although I do understand the implemented logic,

  1. Somebody has to calculate the "correct" node spans and currently, it is us, the ML team.
  2. This behavior confuses the typical user. We tested it on all 7 of us, and additionally on Martin's research group at KTH Stockholm. Everybody wants the "correct" positions in Semantic mode.
  3. Nothing prevents us from fixing the positions at the end of the normalization, e.g. by doing an extra pass over all the nodes. It scales linearly.
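
To illustrate the extra pass in point 3, here is a minimal sketch, assuming each node carries optional start and end byte offsets. `Node` and `fix_spans` are hypothetical names made up for this example and are not part of the SDK. The pass visits every node once, so it is linear in the number of nodes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a UAST node; not the SDK's actual type."""
    kind: str
    start: Optional[int] = None  # byte offset, inclusive
    end: Optional[int] = None    # byte offset, exclusive
    children: List["Node"] = field(default_factory=list)


def fix_spans(node: Node) -> Tuple[Optional[int], Optional[int]]:
    """Post-order pass: widen each node's span so it covers all descendants."""
    start, end = node.start, node.end
    for child in node.children:
        c_start, c_end = fix_spans(child)
        if c_start is not None:
            start = c_start if start is None else min(start, c_start)
        if c_end is not None:
            end = c_end if end is None else max(end, c_end)
    node.start, node.end = start, end
    return start, end


# A comment that normalization left attached to a function,
# although it sits before the function's own span.
comment = Node("Comment", 0, 4)
body = Node("Body", 8, 15)
func = Node("Function", 5, 15, [comment, body])
root = Node("File", 0, 20, [func])

fix_spans(root)
print(func.start, func.end)  # 0 15
```

This only restores the "parent span contains child spans" property. It does not change the order of sibling nodes, so it would not by itself recover the source order.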

dennwc commented on August 16, 2024

@vmarkovtsev You have to realize that nodes have a totally different order in Semantic. You just can't have a single universal representation and still have the "correct" node hierarchy with regard to positions for all languages, at least not with tree structures.

It is possible, though, if we switch to a graph representation (#339), because you will be able to jump from Semantic nodes to Native ones and get positions and the "correct" hierarchy.

@creachadair @bzz As I have mentioned over the last couple of months, the issues with the current representation (tree structure) are starting to actually matter. I think we should bump the priority of the transition to the new representation.

vmarkovtsev commented on August 16, 2024

Actually, we are likely to use the Annotated mode in our current analyzers, because we need to reconstruct the original token stream byte-to-byte.

dennwc commented on August 16, 2024

> Everybody wants the "correct" positions in Semantic mode.

You mentioned Semantic, so I focused the answer on it.

> Actually, we are likely to use the Annotated mode in our current analyzers, because we need to reconstruct the original token stream byte-to-byte.

Right, Annotated will work better for this use case, but again, it has a similar issue: some ASTs do not provide a "correct" hierarchy. This time we cannot fix it, because by definition we cannot modify the structure in this mode.

We really need a way to link those trees (plus the token stream) in an arbitrary way. I will dedicate some time this week to outlining a proposal for the new representation (graphs). It will solve all those issues.
