
Comments (7)

RXminuS commented on August 17, 2024

It's not actively maintained, and you need to do some hacky things, such as replacing script/style/noscript content; otherwise the ranges will be off, since it still matches on tokens inside those elements (i.e. there is no state switching).

from html5ever.

jdm commented on August 17, 2024

Changes in line numbers are available to client code in the tree builder:

/// Called whenever the line number changes.
fn set_current_line(&mut self, _line_number: u64) {}

We didn't have a reason to expose column number data in Servo so far, so we didn't bother looking into it.


jdm commented on August 17, 2024

Similarly, the tokenizer receives a line number with each token:

/// Process a token.
fn process_token(&mut self, token: Token, line_number: u64) -> TokenSinkResult<Self::Handle>;
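For a link checker, the per-token line number may already be enough to report roughly where a link sits. The following is a minimal sketch of that idea, not html5ever's actual API: `Token`, `LineAwareSink`, `TagLineRecorder` and `feed` are hypothetical, simplified stand-ins so the example compiles on its own. Only the shape of `process_token(token, line_number)` is taken from the signature quoted above.

```rust
// Hypothetical stand-in for html5ever's token type, cut down to three variants.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

/// Hypothetical, simplified counterpart of the token-sink method quoted above.
trait LineAwareSink {
    fn process_token(&mut self, token: Token, line_number: u64);
}

/// Records the line on which every start tag with a given name was seen.
struct TagLineRecorder {
    tag_name: String,
    lines: Vec<u64>,
}

impl LineAwareSink for TagLineRecorder {
    fn process_token(&mut self, token: Token, line_number: u64) {
        // Only start tags with the requested name are of interest here.
        if let Token::StartTag(name) = token {
            if name == self.tag_name {
                self.lines.push(line_number);
            }
        }
    }
}

/// Toy driver: hands (token, line) pairs to a sink, the way a tokenizer would.
fn feed<S: LineAwareSink>(sink: &mut S, tokens: Vec<(Token, u64)>) {
    for (token, line) in tokens {
        sink.process_token(token, line);
    }
}
```

With a real tokenizer the same bookkeeping would live in the sink implementation; column information would still be missing.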


hoijui commented on August 17, 2024

Thank you, @jdm! :-)
I am working on some code that checks links in documents and tells the user which ones are valid and which are not (anymore). For this, I have to be able to tell the user exactly where these links are in the document, so they can fix them.
I am currently using a very shady, super-simple, self-made HTML parser, because none of the HTML parsing libraries seem to supply line and column info. I understand that tracking these for every little detail makes no sense in 99% of the use cases for these libraries, so I am not suggesting adding this. I would be glad for some hints about how to go about it.
Will I need to maintain a fork of one of these libraries (e.g. html5ever)?


RXminuS commented on August 17, 2024

Yeah, the line number on its own is kind of useless for certain applications. For my own project I'm having to resort to https://github.com/y21 just to get the exact byte positions of each DOM node.
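Once byte positions are available, turning them into the line and column a user expects is straightforward. A minimal sketch, assuming the whole document is held in memory as a `&str`; `LineIndex` and `line_col` are hypothetical names, and the column is counted in bytes rather than characters:

```rust
/// Byte offsets at which each line of a document starts.
struct LineIndex {
    line_starts: Vec<usize>,
}

impl LineIndex {
    /// Scans the text once and remembers where every line begins.
    fn new(text: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in text.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// Maps a byte offset to a 1-based (line, column) pair.
    /// The column counts bytes, so multi-byte characters widen it.
    fn line_col(&self, offset: usize) -> (usize, usize) {
        // Number of line starts at or before `offset`; at least 1,
        // because the first line always starts at offset 0.
        let line = self.line_starts.partition_point(|&start| start <= offset);
        let col = offset - self.line_starts[line - 1] + 1;
        (line, col)
    }
}
```

The index is built once per document and each lookup is a binary search, so the cost only applies to code that asks for positions.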

Positions for DOM nodes were also recently added to JSoup and also seem to be available in HTML parsers in other major languages, so I think it would make sense if we could figure out a way for html5ever to provide the same. Also, there have been several issues over the years asking for similar features.

One thing I was trying to make work, but couldn't quite yet, is to provide a byte stream that I can read the offset from as tokens are emitted from html5ever. However, since tokens are actually consumed ahead of time, it doesn't quite give the right positions. This could maybe be fixed by providing something that's Peekable, but tbh I didn't really like the direction anyway.

Are there any better ideas for how this could be added in such a way that the performance penalty is opt-in?
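The read-ahead problem described above can be reproduced with the standard library alone. This is only an analogy, not html5ever's actual input path: `CountingReader` and `read_one_buffered` are hypothetical names, and `BufReader` stands in for any consumer that pulls input in larger chunks than it has processed so far.

```rust
use std::io::{self, BufReader, Read};

/// Wraps a reader and counts how many bytes have been pulled through it.
struct CountingReader<R> {
    inner: R,
    bytes_read: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.bytes_read += n;
        Ok(n)
    }
}

/// Consumes a single byte through a buffered reader and returns that byte
/// together with the offset the counting wrapper reports at that moment.
fn read_one_buffered(input: &[u8]) -> io::Result<(u8, usize)> {
    let counting = CountingReader { inner: input, bytes_read: 0 };
    let mut buffered = BufReader::new(counting);
    let mut one = [0u8; 1];
    buffered.read_exact(&mut one)?;
    // The buffer was filled in one go, so the counter is already ahead
    // of the single byte the caller has actually seen.
    Ok((one[0], buffered.get_ref().bytes_read))
}
```

For a short input the counter already reports the full input length after one byte has been consumed, which is why an offset read at token-emission time overshoots.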


hoijui commented on August 17, 2024

Hey @RXminuS :-)
... you resorted to https://github.com/y21/tl?
Why is it not optimal?


domenic commented on August 17, 2024

For anyone else running into this problem: in whatwg/html-build#291 I'm creating an RcDomWithLineNumbers which overrides the two methods necessary to at least track line numbers in the recorded errors. I'm very much a Rust beginner, so it's been a process of flailing around until I got something working. The fact that Rust makes you delegate every method of TreeSink just to override set_current_line (to record the current line) and parse_error (to augment the recorded error with the current line) seems bonkers. But it seems to work so far.

Column numbers, of course, are not so easy.
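The delegation pattern described above looks roughly like the sketch below. It is not the code from whatwg/html-build#291 and does not use html5ever's real TreeSink: `Sink`, `BasicSink` and `WithLineNumbers` are hypothetical, cut-down stand-ins with three methods instead of the full trait, so the shape of the wrapper is visible without the boilerplate.

```rust
/// Hypothetical, cut-down stand-in for a tree-sink trait.
trait Sink {
    /// Called whenever the line number changes; ignored by default.
    fn set_current_line(&mut self, _line_number: u64) {}
    fn parse_error(&mut self, msg: String);
    fn append_text(&mut self, text: &str);
}

/// A plain sink that collects errors and text without position info.
#[derive(Default)]
struct BasicSink {
    errors: Vec<String>,
    text: String,
}

impl Sink for BasicSink {
    fn parse_error(&mut self, msg: String) {
        self.errors.push(msg);
    }
    fn append_text(&mut self, text: &str) {
        self.text.push_str(text);
    }
}

/// Wrapper that remembers the current line and prefixes it to every error.
struct WithLineNumbers<S> {
    inner: S,
    current_line: u64,
}

impl<S: Sink> Sink for WithLineNumbers<S> {
    fn set_current_line(&mut self, line_number: u64) {
        self.current_line = line_number;
        self.inner.set_current_line(line_number);
    }
    fn parse_error(&mut self, msg: String) {
        let located = format!("line {}: {}", self.current_line, msg);
        self.inner.parse_error(located);
    }
    // Every remaining method has to be forwarded by hand.
    fn append_text(&mut self, text: &str) {
        self.inner.append_text(text);
    }
}
```

With the real trait the forwarding section grows to cover every required method, which is the verbosity complained about above.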

