With the concept of leading comments in the UAST, the simple invariant of parent cover


Parent covers children invariant is not respected

m09 commented on August 16, 2024

Parent covers children invariant is not respected

from sdk.

Comments (7)

dennwc commented on August 16, 2024

It depends on the parsing mode you use.

For Native and Annotated, the AST is not normalized, so it preserves whatever shape the JS parser emits. According to your comment, it looks like the JS parser does not preserve this invariant.

In Semantic mode this invariant is usually not respected either, but for a different reason: languages have different AST shapes, and the (normalized) UAST structure may break the invariant by moving nodes around.

But for your specific case, this issue is caused by bblfsh/javascript-driver#74 - the UAST pipeline does not process comments correctly, and they are preserved in the same places as in Native mode.
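To make the invariant concrete, here is a minimal Python sketch of a checker that reports every parent whose span fails to cover one of its children. The `Node` class and its `start`/`end` byte offsets are hypothetical stand-ins for illustration, not the actual SDK or client API; real UAST nodes carry richer position objects.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; start/end are half-open byte offsets, or None."""
    kind: str
    start: Optional[int] = None
    end: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def violations(root: Node) -> Iterator[Tuple[Node, Node]]:
    """Yield (parent, child) pairs where the child's span escapes the parent's."""
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            stack.append(child)
            if None in (parent.start, parent.end, child.start, child.end):
                continue  # nodes without positions cannot be checked
            if child.start < parent.start or child.end > parent.end:
                yield parent, child


# Assumed example: a leading comment is attached to a function node
# whose own span only starts after the comment ends.
comment = Node("Comment", 0, 12)
body = Node("Block", 30, 40)
func = Node("Function", 13, 40, [comment, body])
root = Node("File", 0, 40, [func])

bad = [(p.kind, c.kind) for p, c in violations(root)]
print(bad)
```

In this made-up tree the only pair reported is the function and its leading comment, which is the kind of situation described for the comment-handling issue above.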


dennwc commented on August 16, 2024

Reopening and moving to SDK.


m09 commented on August 16, 2024

Thanks for the clarifications.


vmarkovtsev commented on August 16, 2024

@dennwc @creachadair I wouldn't really close this one. Although I do understand the implemented logic:

Somebody has to calculate the "correct" node spans, and currently it is us, the ML team.
This behavior confuses ordinary users. Tested on all 7 of us, and additionally on Martin's research group at KTH Stockholm. Everybody wants the "correct" positions in Semantic mode.
Nothing prevents us from fixing the positions at the end of the normalization, e.g. by doing an extra pass over all the nodes. It scales linearly.
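The extra pass suggested here could look roughly like the following Python sketch: a single post-order traversal that widens each node's span to the hull of its own span and its descendants' spans. The `Node` class with `start`/`end` byte offsets is a hypothetical stand-in, not the SDK's data model, and this is one possible reading of "fixing the positions", not an implementation taken from the project.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; start/end are half-open byte offsets, or None."""
    kind: str
    start: Optional[int] = None
    end: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def widen_spans(node: Node) -> Tuple[Optional[int], Optional[int]]:
    """Post-order pass: grow each node's span to cover all of its descendants.

    Every node is visited exactly once, so the cost is linear in tree size.
    Recursion is used for brevity; a very deep tree would need an explicit stack.
    """
    starts = [node.start] if node.start is not None else []
    ends = [node.end] if node.end is not None else []
    for child in node.children:
        c_start, c_end = widen_spans(child)
        if c_start is not None:
            starts.append(c_start)
        if c_end is not None:
            ends.append(c_end)
    node.start = min(starts) if starts else None
    node.end = max(ends) if ends else None
    return node.start, node.end


# Assumed example: a function node that does not cover its leading comment,
# plus a synthetic grouping node that has no position of its own.
func = Node("Function", 13, 40, [Node("Comment", 0, 12), Node("Block", 30, 40)])
group = Node("Group", None, None, [Node("A", 5, 8), Node("B", 10, 20)])
root = Node("File", 0, 40, [func, group])

widen_spans(root)
print((func.start, func.end), (group.start, group.end))
```

Note that a pass like this only restores the parent-covers-children property. It overwrites the spans reported by the parser, and it does nothing about siblings whose order no longer matches source order after normalization.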


dennwc commented on August 16, 2024

@vmarkovtsev You have to realize that nodes have a totally different order in Semantic. You just can't have a single universal representation and still have the "correct" node hierarchy with regard to positions for all languages. At least not with tree structures.

It is possible though if we switch to graph representation (#339) because you will be able to jump from Semantic nodes to Native and get positions and the "correct" hierarchy.

@creachadair @bzz As I have mentioned over the last couple of months, the issues with the current representation (tree structure) are starting to actually matter. I think we should bump the priority for the transition to the new representation.


vmarkovtsev commented on August 16, 2024

Actually, we are likely to use the Annotated mode in our current analyzers, because we need to reconstruct the original token stream byte-to-byte.


dennwc commented on August 16, 2024

> Everybody wants the "correct" positions in Semantic mode.

You mentioned Semantic, so I focused the answer on it.

> Actually, we are likely to use the Annotated mode in our current analyzers, because we need to reconstruct the original token stream byte-to-byte.

Right, Annotated will work better for this use case, but again, it has a similar issue: some ASTs do not provide a "correct" hierarchy. This time we cannot fix it, because by definition we cannot modify the structure in this mode.

We really need a way to link those trees (+ token stream) in an arbitrary way. I will dedicate some time this week to outlining the proposal for the new representation (graphs). It will solve all those issues.
