Comments (7)
It depends on the parsing mode you use.
For Native and Annotated modes, the AST is not normalized, so it preserves whatever form the JS parser emits. Judging by your comment, the JS parser does not preserve this invariant.
For Semantic mode, this invariant is usually not respected either, but for a different reason. Every language has a different AST shape, and the (normalized) UAST structure may break the invariant by moving nodes around.
In your specific case, though, the issue is caused by bblfsh/javascript-driver#74: the UAST pipeline does not process comments correctly, so they stay in the same places as in Native mode.
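The invariant under discussion, that a parent node's position span covers the spans of all its children, can be checked mechanically. Below is a minimal sketch, assuming a hypothetical `Node` type with half-open byte offsets `[start, end)`. It is not the bblfsh SDK node API, and nodes without positions are simply skipped. The example tree is illustrative only: a comment attached inside a function node whose span starts after the comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical minimal node (NOT the bblfsh SDK type):
    # a kind, half-open byte offsets [start, end), and child nodes.
    kind: str
    start: Optional[int] = None
    end: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def violations(root: Node) -> List[Tuple[str, str]]:
    """Return (parent kind, child kind) pairs where the child's span
    is not contained in the parent's span."""
    found = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            positioned = None not in (parent.start, parent.end, child.start, child.end)
            if positioned and not (parent.start <= child.start and child.end <= parent.end):
                found.append((parent.kind, child.kind))
            stack.append(child)
    return found


# A leading comment (bytes 0..9) attached under a function that starts at byte 10.
tree = Node("Function", 10, 40, [Node("Comment", 0, 9), Node("Block", 20, 40)])
print(violations(tree))  # [('Function', 'Comment')]
```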
Reopening and moving to SDK.
Thanks for the clarifications.
@dennwc @creachadair I wouldn't close this one just yet. Although I understand the implemented logic:
- Somebody has to calculate the "correct" node spans, and currently that is us, the ML team.
- This behavior confuses ordinary users. We tested it on all 7 of us, and also on Martin's research group at KTH Stockholm. Everybody wants the "correct" positions in Semantic mode.
- Nothing prevents us from fixing the positions at the end of normalization, e.g. by doing an extra pass over all the nodes. It scales linearly.
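The "extra pass" suggested in the last point could look roughly like the sketch below. It assumes a hypothetical `Node` type (not the SDK's) and uses one possible policy: grow each parent's span to the union of its own span and its children's spans. Visiting every child before its parent means each node is processed once, which is where the linear cost comes from.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node type (NOT the bblfsh SDK): half-open byte offsets and children.
    kind: str
    start: Optional[int] = None
    end: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def fix_spans(root: Node) -> None:
    """Grow every parent's [start, end) so it covers all positioned children.

    Each node is visited once, so the pass is O(n) in the number of nodes.
    Nodes without positions inherit the union of their children's spans.
    """
    # Build a pre-order list iteratively (parents appear before their children),
    # then walk it backwards so every child is finalized before its parent.
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(node.children)
    for node in reversed(order):
        for child in node.children:
            if child.start is None or child.end is None:
                continue  # unpositioned children contribute nothing
            if node.start is None or child.start < node.start:
                node.start = child.start
            if node.end is None or child.end > node.end:
                node.end = child.end


tree = Node("Function", 10, 40, [Node("Comment", 0, 9), Node("Block", 20, 40)])
fix_spans(tree)
print(tree.start, tree.end)  # 0 40
```

Growing spans is only one policy. Because normalization can move nodes around, a grown Semantic span may end up covering source text that does not really belong to the node.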
@vmarkovtsev You have to realize that nodes have a completely different order in Semantic mode. You simply cannot have a single universal representation and still have the "correct" node hierarchy with regard to positions for all languages. At least, not with tree structures.
It is possible, though, if we switch to a graph representation (#339), because you will be able to jump from Semantic nodes to Native ones and get both the positions and the "correct" hierarchy.
@creachadair @bzz As I have mentioned over the last couple of months, the issues with the current representation (a tree structure) are starting to actually matter. I think we should raise the priority of the transition to the new representation.
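One way to picture the graph idea: instead of deriving positions from the Semantic tree's shape, each Semantic node keeps links to the Native nodes it was built from, and its span is computed from those. The sketch below is a hypothetical illustration of that lookup only. The `NativeNode` and `SemanticNode` types and the `origins` field are invented for this example; this is not the design proposed in #339.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NativeNode:
    # Hypothetical Native AST node with its original half-open byte offsets.
    id: int
    start: int
    end: int


@dataclass
class SemanticNode:
    # Hypothetical normalized node; `origins` holds ids of the Native nodes
    # it was derived from (the "link" between the two trees).
    kind: str
    origins: List[int] = field(default_factory=list)


def span_of(node: SemanticNode, native: Dict[int, NativeNode]) -> Optional[Tuple[int, int]]:
    """Derive a Semantic node's span from its linked Native nodes,
    independently of where the node sits in the Semantic tree."""
    linked = [native[i] for i in node.origins if i in native]
    if not linked:
        return None  # synthetic node with no source counterpart
    return min(n.start for n in linked), max(n.end for n in linked)


native = {1: NativeNode(1, 0, 12), 2: NativeNode(2, 13, 20)}
print(span_of(SemanticNode("Group", [1, 2]), native))  # (0, 20)
```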
Actually, we are likely to use the Annotated mode in our current analyzers, because we need to reconstruct the original token stream byte-to-byte.
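Byte-to-byte reconstruction only works if token positions are exact, sorted, and non-overlapping; everything between tokens is then recovered from the original source. A minimal sketch, assuming tokens are already available as hypothetical half-open byte ranges (extracting them from an Annotated tree is not shown, and this is not a bblfsh API):

```python
from typing import List, Tuple

Token = Tuple[int, int]  # hypothetical: half-open byte range [start, end) of one token


def tokens_with_gaps(source: bytes, tokens: List[Token]) -> List[bytes]:
    """Split `source` into token pieces and the gaps between them.

    Joining the result reproduces `source` exactly; overlapping tokens
    are rejected because they cannot be laid out as a single stream.
    """
    pieces = []
    pos = 0
    for start, end in sorted(tokens):
        if start < pos:
            raise ValueError(f"token {start}:{end} overlaps the previous token")
        if start > pos:
            pieces.append(source[pos:start])  # whitespace or other text between tokens
        pieces.append(source[start:end])
        pos = end
    if pos < len(source):
        pieces.append(source[pos:])  # trailing text after the last token
    return pieces


src = b"let x = 1;"
pieces = tokens_with_gaps(src, [(0, 3), (4, 5), (6, 7), (8, 9), (9, 10)])
print(b"".join(pieces) == src)  # True
```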
> Everybody wants the "correct" positions in Semantic mode.

You mentioned Semantic, so I focused my answer on it.

> Actually, we are likely to use the Annotated mode in our current analyzers, because we need to reconstruct the original token stream byte-to-byte.

Right, Annotated will work better for this use case, but again, it has a similar issue: some ASTs do not provide a "correct" hierarchy. This time we cannot fix it, because by definition we cannot modify the structure in this mode.
We really need a way to link those trees (plus the token stream) arbitrarily. I will dedicate some time this week to outlining a proposal for the new representation (graphs). It will solve all of those issues.