rusterlium / html5ever_elixir
NIF wrapper of html5ever using Rustler
Home Page: https://hexdocs.pm/html5ever
License: Apache License 2.0
html5ever_elixir currently requires rustler 0.10.*, which is too old for OTP 20.2.
Installing html5ever {0.7.0} causes mix to not compile
Generated rustler app
==> html5ever
Compiling NIF crate :html5ever_nif (native/html5ever_nif)...
Compiling erlang_nif-sys v0.6.4
Compiling rustler_codegen v0.18.0
Compiling syn v0.15.22
Compiling tendril v0.4.1
error: failed to run custom build command for `erlang_nif-sys v0.6.4`
process didn't exit successfully: `/home/user/app/_build/dev/rustler_crates/html5ever_nif/release/build/erlang_nif-sys-d37b2e3dcb9ae709/build-script-build` (exit code: 101)
--- stdout
Unsupported Erlang version.
Is the erlang_nif-sys version up to date in the Cargo.toml?
Does 'cargo update' fix it?
If not please report at https://github.com/goertzenator/erlang_nif-sys.
--- stderr
thread 'main' panicked at 'gen_api.erl encountered an error.', /home/parsingpeppers/.cargo/registry/src/github.com-1ecc6299db9ec823/erlang_nif-sys-0.6.4/build.rs:28:13
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace.
warning: build failed, waiting for other jobs to finish...
error: build failed
could not compile dependency :html5ever, "mix compile" failed. You can recompile this dependency with "mix deps.compile html5ever", update it with "mix deps.update html5ever" or clean it with "mix deps.clean html5ever"
** (RuntimeError) Rust NIF compile error (rustc exit code 101)
lib/mix/tasks/compile.rustler.ex:60: Mix.Tasks.Compile.Rustler.compile_crate/1
(elixir) lib/enum.ex:1336: Enum."-map/2-lists^map/1-0-"/2
lib/mix/tasks/compile.rustler.ex:14: Mix.Tasks.Compile.Rustler.run/1
(mix) lib/mix/task.ex:331: Mix.Task.run_task/3
(mix) lib/mix/tasks/compile.all.ex:73: Mix.Tasks.Compile.All.run_compiler/2
(mix) lib/mix/tasks/compile.all.ex:53: Mix.Tasks.Compile.All.do_compile/4
(mix) lib/mix/tasks/compile.all.ex:24: anonymous fn/1 in Mix.Tasks.Compile.All.run/1
(mix) lib/mix/tasks/compile.all.ex:40: Mix.Tasks.Compile.All.with_logger_app/1
Thanks!
Example:
iex(4)> Enum.reduce(1..4000, "", fn _,acc -> "<div>" <> acc end) |> Html5ever.parse()
Segmentation fault
While I doubt any HTML document really needs 4k nested tags, this could be abused by attackers if the library is used to parse user-generated content.
I am not too familiar with Rust, but I am pretty sure you are hitting a recursion depth limit when transforming the parsed tree to Erlang terms.
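If recursion depth is indeed the cause, one common remedy is to walk the tree with an explicit heap-allocated stack instead of recursive calls. The following is only a sketch of that idea: the `Node` type and the `max_depth` and `nested_divs` helpers are hypothetical and are not taken from the NIF's source. Measuring depth this way would also let a caller reject documents nested beyond some limit before converting them.

```rust
/// Hypothetical tree type for illustration; not the NIF's real data structure.
pub enum Node {
    Text(String),
    Element { name: String, children: Vec<Node> },
}

/// Returns the depth of the deepest node (the root has depth 1).
/// The pending work lives in a Vec on the heap, so deep nesting in the
/// input does not translate into deep nesting of call-stack frames.
pub fn max_depth(root: &Node) -> usize {
    let mut stack: Vec<(&Node, usize)> = vec![(root, 1)];
    let mut deepest = 0;
    while let Some((node, depth)) = stack.pop() {
        deepest = deepest.max(depth);
        if let Node::Element { children, .. } = node {
            for child in children {
                stack.push((child, depth + 1));
            }
        }
    }
    deepest
}

/// Wraps a single text node in `depth` nested <div> elements, built in a loop.
pub fn nested_divs(depth: usize) -> Node {
    let mut node = Node::Text("leaf".to_string());
    for _ in 0..depth {
        node = Node::Element {
            name: "div".to_string(),
            children: vec![node],
        };
    }
    node
}
```

Note that the destructor Rust generates automatically for a nested type like this is itself recursive, so a complete fix would have to consider dropping the tree as well as traversing it.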
Tried forcing Rustler to 0.18 to no avail, same error.
html5ever v0.16.0 changes a bunch of things and will probably break the nif, so you may want to lock down the html5ever version in your Cargo.toml until you fix things.
Hi, I am not able to use html5ever, apparently because of a dependency issue. The latest release, 1.14.0 can only work with rustler_precompiled ~> 0.5.2, but I'm using another package, mjml_eex, that already depends on rustler_precompiled ~> 0.6.0. In the latest master branch of html5ever, it now works with rustler_precompiled ~> 0.6.0. I configured the html5ever dependency in my mix.exs to point to master branch in the git repo, but when I deployed my app, it wouldn't use a precompiled NIF. Instead, it tried to compile the project with cargo. I then installed cargo, but it couldn't find the command.
Is there a step missing that normally runs when a release of html5ever is created? Is that why no precompiled NIF was found when using the master branch?
I encounter a panic on some html:
thread '<unnamed>' panicked at src/flat_dom.rs:218:9:
Templates not supported
I understand that proper support is hard; is it possible to degrade gracefully by ignoring template tags or emitting them raw?
In general, I think panicking should be avoided in a parser.
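As a rough illustration of what non-panicking behaviour could look like, the sketch below returns a `Result` and lets the caller choose how templates are handled. Everything here is an assumption made for the example: `Node`, `TemplatePolicy`, `ConvertError`, and `convert` are hypothetical names and do not reflect the code in src/flat_dom.rs.

```rust
/// Hypothetical node kinds, for illustration only; these are not the
/// types used in src/flat_dom.rs.
#[derive(Debug, PartialEq)]
pub enum Node {
    Text(String),
    Element(String),
    Template(String),
}

/// How the converter should treat template nodes.
#[derive(Clone, Copy)]
pub enum TemplatePolicy {
    /// Report an error value to the caller instead of panicking.
    Error,
    /// Emit the template as an ordinary element.
    AsElement,
    /// Drop the template from the output.
    Skip,
}

#[derive(Debug, PartialEq)]
pub enum ConvertError {
    TemplatesNotSupported,
}

/// Converts a flat list of nodes, applying `policy` to every template node.
pub fn convert(nodes: Vec<Node>, policy: TemplatePolicy) -> Result<Vec<Node>, ConvertError> {
    let mut out = Vec::with_capacity(nodes.len());
    for node in nodes {
        match (node, policy) {
            (Node::Template(_), TemplatePolicy::Error) => {
                return Err(ConvertError::TemplatesNotSupported)
            }
            (Node::Template(name), TemplatePolicy::AsElement) => out.push(Node::Element(name)),
            (Node::Template(_), TemplatePolicy::Skip) => {}
            (other, _) => out.push(other),
        }
    }
    Ok(out)
}
```

With an error value rather than a panic, the Elixir side could receive an ordinary `{:error, reason}` tuple and decide for itself whether to retry with a more lenient policy.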
html5ever_elixir is impossible to build in a sandbox because of an unpinned build-time dependency fetch. Could you lock Cargo.toml?
I wonder if there is an easy way to extract comments embedded inside an HTML document.
I tried using html5ever with Floki. With the default parser, comments are present in the parsed document as
{:comment, "My Comment"}
but when I switch the parser to html5ever they are just stripped. This can also be verified by running:
html = """
<html><title>Some Title</title><body><!-- some comment --></body></html>
"""
Floki.parse_document(html)
|> IO.inspect()
Floki.parse_document(html, html_parser: Floki.HTMLParser.Html5ever)
|> IO.inspect()
that results in this output:
{:ok,
[
{"html", [],
[{"title", [], ["Some Title"]}, {"body", [], [comment: " some comment "]}]}
]}
{:ok,
[
{"html", [],
[{"head", [], [{"title", [], ["Some Title"]}]}, {"body", [], ["\n"]}]}
]}
Parsing pages not written in UTF-8 currently produces errors:
> %HTTPoison.Response{body: body} = HTTPoison.get!("http://manybooks.net/index.xml")
> Html5ever.parse(body)
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 4070 }', src/libcore/result.rs:859
note: Run with `RUST_BACKTRACE=1` for a backtrace.
{:error, "called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 4070 }"}
In this case, the XML feed declares its encoding in the XML preamble:
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
...
Can I get around this problem or can the library be fixed to handle this situation?
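One possible direction, sketched below, is to validate the input as UTF-8 without unwrapping and to transcode it when validation fails. The `decode_utf8_or_latin1` function is a hypothetical helper, not part of the library, and the fallback to ISO-8859-1 is an assumption chosen to match this feed; a fuller solution would read the declared encoding rather than guess.

```rust
/// Decodes `bytes` as UTF-8 when they are valid UTF-8; otherwise falls back
/// to interpreting the input as ISO-8859-1 (Latin-1).
/// Assumption: Latin-1 is the only fallback encoding considered here.
pub fn decode_utf8_or_latin1(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(text) => text.to_owned(),
        // ISO-8859-1 maps each byte 0x00..=0xFF to the Unicode code point
        // with the same value, so a byte-to-char cast decodes it fully.
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}
```

For example, the Latin-1 bytes `b"caf\xE9"` are not valid UTF-8 and would be decoded through the fallback branch, while input that is already valid UTF-8 is returned unchanged.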
When parsing large (314kB) HTML to create a Meeseeks.Document, Html5ever.Native.parse_sync runs 2x faster than Html5ever.Native.parse_async.
Example project: https://github.com/mischov/meeseeks_html5ever_parse
Example output:
$ MIX_ENV=prod mix run -e MeeseeksHtml5everParse.run
Running tests...
Parsed with Html5ever async in 17250.7 us
Parsed with Html5ever sync in 18877.3 us
Created Meeseeks Document from tuples in 6883.2 us
Parsed with Meeseeks async in 66956.9 us
Parsed with Meeseeks sync in 33076.9 us
Edit: I'm running Erlang/OTP 19.
I'm on Mac OS X El Capitan (10.11.6). Elixir version:
Erlang/OTP 19 [erts-8.2] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Elixir 1.4.1
==> html5ever
Compiling NIF crate :html5ever_nif (native/html5ever_nif)...
could not compile dependency :html5ever, "mix compile" failed. You can recompile this dependency with "mix deps.compile html5ever", update it with "mix deps.update html5ever" or clean it with "mix deps.clean html5ever"
** (ErlangError) erlang error: :enoent
(elixir) lib/system.ex:564: System.cmd("cargo", ["rustc", "--no-default-features", "--release", "--", "--codegen", "link-args=-flat_namespace -undefined suppress"], [cd: "/Users/jonathanlin/Documents/blitz/blitz-cms/deps/html5ever/native/html5ever_nif", stderr_to_stdout: true, env: [{"CARGO_TARGET_DIR", "/Users/jonathanlin/Documents/blitz/blitz-cms/_build/dev/rustler_crates/html5ever_nif"}], into: %IO.Stream{device: :standard_io, line_or_bytes: :line, raw: false}])
lib/mix/tasks/compile.rustler.ex:49: Mix.Tasks.Compile.Rustler.compile_crate/1
(elixir) lib/enum.ex:1229: Enum."-map/2-lists^map/1-0-"/2
lib/mix/tasks/compile.rustler.ex:12: Mix.Tasks.Compile.Rustler.run/1
(mix) lib/mix/task.ex:294: Mix.Task.run_task/3
(elixir) lib/enum.ex:1229: Enum."-map/2-lists^map/1-0-"/2
(mix) lib/mix/tasks/compile.all.ex:19: anonymous fn/1 in Mix.Tasks.Compile.All.run/1
(mix) lib/mix/tasks/compile.all.ex:37: Mix.Tasks.Compile.All.with_logger_app/1
Running the latest version of html5ever (0.4.0). Upgraded to Erlang 20 and got a compilation error on Ubuntu 16.04. Stack trace from running mix deps.compile html5ever:
Compiling NIF crate :html5ever_nif (native/html5ever_nif)...
Compiling erlang_nif-sys v0.6.1
Compiling rustler_codegen v0.14.0
Compiling html5ever v0.16.0
Compiling string_cache v0.5.0
error: failed to run custom build command for `erlang_nif-sys v0.6.1`
process didn't exit successfully: `/root/m_c_a/_build/dev/rustler_crates/html5ever_nif/release/build/erlang_nif-sys-ae2db8d6f62d8a63/build-script-build` (exit code: 101)
--- stdout
Unsupported Erlang version.
Is the erlang_nif-sys version up to date in the Cargo.toml?
Does 'cargo update' fix it?
If not please report at https://github.com/goertzenator/erlang_nif-sys.
--- stderr
thread 'main' panicked at 'gen_api.erl encountered an error.', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/erlang_nif-sys-0.6.1/build.rs:28
note: Run with `RUST_BACKTRACE=1` for a backtrace.
Build failed, waiting for other jobs to finish...
error: build failed
could not compile dependency :html5ever, "mix compile" failed. You can recompile this dependency with "mix deps.compile html5ever", update it with "mix deps.update html5ever" or clean it with "mix deps.clean html5ever"
** (RuntimeError) Rust NIF compile error (rustc exit code 101)
lib/mix/tasks/compile.rustler.ex:58: Mix.Tasks.Compile.Rustler.compile_crate/1
(elixir) lib/enum.ex:1229: Enum."-map/2-lists^map/1-0-"/2
lib/mix/tasks/compile.rustler.ex:12: Mix.Tasks.Compile.Rustler.run/1
(mix) lib/mix/task.ex:300: Mix.Task.run_task/3
(elixir) lib/enum.ex:1229: Enum."-map/2-lists^map/1-0-"/2
(mix) lib/mix/tasks/compile.all.ex:19: anonymous fn/1 in Mix.Tasks.Compile.All.run/1
(mix) lib/mix/tasks/compile.all.ex:37: Mix.Tasks.Compile.All.with_logger_app/1
(mix) lib/mix/task.ex:300: Mix.Task.run_task/3