Comments (6)

bzz commented on August 23, 2024

@bzz Babelfish defines positions as byte offsets, and it's consistent with how JS does this, according to the bug report:

In JS, the position is returned based on the offset in the bytes content

I'm playing devil's advocate here, so bear with me for a while, please. My question was more like: how is the user supposed to figure this out without asking? :)

I know the answer (https://doc.bblf.sh/uast/semantic-uast.html#position), but I also think it may be a signal for us that this is unclear or under-communicated to our users (even given bblfsh/libuast#102).

I also presume this issue could be an opportunity to verify this behavior across all the drivers and post the results. Who knows, maybe, as with string and number literals before, it is handled differently in several drivers. WDYT?

from java-driver.

dennwc commented on August 23, 2024

@m09 I'm working on the conversion function that returns UTF-8 offsets for the Python client, but it may take some time.

@bzz You are right, the docs should give a bit more insight into how positions work. Also, it's definitely worth adding a test for UTF-8 offsets. I remember doing a pass over some drivers, but maybe I missed Java that time.

dennwc commented on August 23, 2024

@m09 Thanks for reporting it! It is indeed a bug in the Java driver. Moving to that repository.

bzz commented on August 23, 2024

no variation between an ascii string and a more complex utf8 string

It is indeed a bug in the Java driver.

@dennwc Would you be so kind as to elaborate on why you think it's not a bug in the JS driver instead?

dennwc commented on August 23, 2024

@bzz Babelfish defines positions as byte offsets, and it's consistent with how JS does this, according to the bug report:

In JS, the position is returned based on the offset in the bytes content

m09 commented on August 23, 2024

I should add that we currently maintain custom logic for this conversion in several projects, since the byte offset is rarely what we want. I think an option to specify whether we want utf8 or byte positions would be appreciated by the ML team (and a utf8 default would be the handiest for us).
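For illustration only, here is a minimal sketch of what such client-side conversion logic might look like in Python. It assumes the positions are byte offsets into the UTF-8 encoded source and that the desired result is an index into the decoded Python string (a code point offset). The name `make_offset_converter` is hypothetical and not part of any Babelfish client.

```python
import bisect


def make_offset_converter(source: str):
    """Build a function mapping UTF-8 byte offsets to code point offsets.

    Hypothetical helper, not part of any Babelfish API. Assumes byte
    offsets refer to the UTF-8 encoding of `source`.
    """
    # starts[i] is the byte offset at which the i-th code point begins.
    starts = []
    pos = 0
    for ch in source:
        starts.append(pos)
        pos += len(ch.encode("utf-8"))
    total_bytes = pos

    def convert(byte_offset: int) -> int:
        if not 0 <= byte_offset <= total_bytes:
            raise ValueError("byte offset out of range")
        # An end-of-file offset maps to the length of the string.
        if byte_offset == total_bytes:
            return len(starts)
        # Find the code point that starts at or before this byte offset.
        i = bisect.bisect_right(starts, byte_offset) - 1
        if starts[i] != byte_offset:
            raise ValueError("byte offset falls inside a multi-byte character")
        return i

    return convert


# Usage: build the table once per file, then convert each position.
to_codepoint = make_offset_converter('s = "héllo";')
```

Building the table once per file keeps each lookup at O(log n), which matters when every node of a UAST carries a start and an end position.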
