
exml's Introduction

exml


exml is an Erlang library for parsing XML streams and manipulating complex XML structures.

Building

exml is a rebar3-compatible OTP application. Run make or ./rebar3 compile to build it. A C++11 compiler is required.

Using

exml can parse XML streams as well as complete XML documents in a single call.

To parse a whole XML document:

{ok, Doc} = exml:parse(<<"<my_xml_doc/>">>).

To generate an XML document from Erlang terms:

El = #xmlel{name = <<"foo">>,
            attrs = [{<<"attr1">>, <<"bar">>}],
            children = [{xmlcdata, <<"Some Value">>}]},
exml:to_list(El).

or (pastable into erl shell):

El = {xmlel, <<"foo">>,
      [{<<"attr1">>, <<"bar">>}],
      [{xmlcdata, <<"Some Value">>}]}.
exml:to_list(El).

Which results in:

<foo attr1='bar'>Some Value</foo>

exml:to_binary/1 works similarly.

There's also exml:to_pretty_iolist/1,3 for a quick'n'dirty document preview (pastable into erl):

rr("include/exml.hrl").
El = #xmlel{name = <<"outer">>,
            attrs = [{<<"attr1">>, <<"val1">>},
                     {<<"attr2">>, <<"val-two">>}],
            children = [#xmlel{name = <<"inner-childless">>},
                        #xmlel{name = <<"inner-w-children">>,
                               children = [#xmlel{name = <<"a">>}]}]}.
io:format("~s", [exml:to_pretty_iolist(El)]).

which prints:

<outer attr2='val-two' attr1='val1'>
  <inner-childless/>
  <inner-w-children>
    <a/>
  </inner-w-children>
</outer>

For an example of using the streaming API see test/exml_stream_tests.erl.

XML Tree navigation

The exml_query module exposes helper functions for navigating the tree. Please refer to the module documentation for details.

Notes

The implementation uses C++ thread-local memory pools, 10MB in size by default, to improve cache locality and memory allocation patterns. Override RAPIDXML_STATIC_POOL_SIZE and/or RAPIDXML_DYNAMIC_POOL_SIZE at compile time to change the size. Also for performance reasons, the NIF calls do not check input size, do not timeslice themselves, and do not run on dirty schedulers. This means that the NIFs can starve the VM if they are called with inputs that are too large. It is up to the developer to throttle input sizes and fine-tune the memory pool sizes.
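
One way to throttle input is to cut a large binary into bounded pieces before handing each piece to the streaming parser. The following is only a minimal sketch: the module name chunk_sketch and the function chunks/2 are hypothetical and not part of exml, and it assumes the streaming parser accepts input split at arbitrary byte boundaries (see test/exml_stream_tests.erl for how the streaming API is actually driven).

```erlang
-module(chunk_sketch).
-export([chunks/2]).

%% Hypothetical helper, not part of exml.
%% Split Bin into pieces of at most Max bytes, preserving order.
-spec chunks(binary(), pos_integer()) -> [binary()].
chunks(Bin, Max) when is_binary(Bin), is_integer(Max), Max > 0 ->
    chunks(Bin, Max, []).

chunks(<<>>, _Max, Acc) ->
    lists:reverse(Acc);
chunks(Bin, Max, Acc) when byte_size(Bin) =< Max ->
    %% The remainder fits in one piece.
    lists:reverse([Bin | Acc]);
chunks(Bin, Max, Acc) ->
    <<Head:Max/binary, Rest/binary>> = Bin,
    chunks(Rest, Max, [Head | Acc]).
```

A suitable value for Max depends on the workload and on the configured pool sizes, so it is best found by measurement.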

exml's People

Contributors

arcusfelis, chrisyunker, chrzaszcz, erszcz, fenek, goj, igors, jkingsbery, kianmeng, kzemek, lucafavatella, ludwikbukowski, michalwski, nelsonvides, paulgray, ppikula, pzel, rgafiyatullin, tomlegg, zofpolkowska


exml's Issues

Extend exml_query:path/2

We have {element, <<"name">>} selector, but it's often not enough.
Let's do more selectors:

  • {element_with_ns, <<"query">>, <<"urn:xmpp:mam:0">>}
  • {element_with_attr, <<"field">>, <<"var">>, <<"title">>}
  • {element_fun, fun(Elem) -> Matched :: boolean() end}
  • {element_match, #xmlel{name = <<"user">>, attrs = [{<<"jid">>, <<"alice@localhost">>}]}}

element_with_ns matches <query xmlns="urn:xmpp:mam:0"/> but not <query xmlns="urn:xmpp:mam:2"/>.
element_with_attr matches <field var="title"/> but not <field var="description"/>.
element_match matches <user jid="alice@localhost" nick="Alice"><active/></user>.
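
The matching rules proposed above could be sketched roughly as follows. This is not how exml_query implements anything: the module selector_sketch and the function matches/2 are hypothetical, elements are assumed to be plain {xmlel, Name, Attrs, Children} tuples with attributes stored as a list of pairs (as in the README examples), and the namespace is assumed to be declared directly on the element through an xmlns attribute.

```erlang
-module(selector_sketch).
-export([matches/2]).

%% Hypothetical sketch of the proposed selectors, not part of exml.
%% matches(Selector, Node) -> boolean()

%% Name must match and the xmlns attribute must equal NS.
matches({element_with_ns, Name, NS}, {xmlel, Name, Attrs, _}) ->
    lists:keyfind(<<"xmlns">>, 1, Attrs) =:= {<<"xmlns">>, NS};
%% Name must match and attribute Key must have exactly Value.
matches({element_with_attr, Name, Key, Value}, {xmlel, Name, Attrs, _}) ->
    lists:keyfind(Key, 1, Attrs) =:= {Key, Value};
%% Arbitrary predicate; anything other than true counts as no match.
matches({element_fun, Fun}, {xmlel, _, _, _} = El) when is_function(Fun, 1) ->
    Fun(El) =:= true;
%% Every attribute of the pattern must be present on the element;
%% extra attributes and all children of the element are ignored.
matches({element_match, {xmlel, Name, Wanted, _}}, {xmlel, Name, Attrs, _}) ->
    lists:all(fun(Attr) -> lists:member(Attr, Attrs) end, Wanted);
matches(_Selector, _Node) ->
    false.
```

Treating the element_match pattern as a subset of the element's attributes is one possible reading of the last example in the proposal; a stricter reading would also compare children.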

exml_query:subelement_with_name_and_ns

Is it expected behaviour that this function ignores the second element with the same name even though it has a different namespace? Isn't this function's role to distinguish elements based on both their names and their namespaces?

If we have a list of elements: [#xmlel{name = <<"a">>, attrs = [{<<"xmlns">>, <<"ns">>}]}, #xmlel{name = <<"a">>, attrs = [{<<"xmlns">>, <<"ns2">>}]}]
and call exml_query:subelement_with_name_and_ns(<<"a">>, <<"ns2">>) we will get:

  • currently - undefined
  • expected - #xmlel{name = <<"a">>, attrs = [{<<"xmlns">>, <<"ns2">>}]}
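
The expected behaviour can be sketched as a search that keeps going past children whose name matches but whose namespace does not. This is only an illustration of the expectation, not the library's code: the module subelement_sketch is hypothetical, the parent element is passed explicitly as the first argument, and elements are assumed to be {xmlel, Name, Attrs, Children} tuples with list-based attributes.

```erlang
-module(subelement_sketch).
-export([subelement_with_name_and_ns/3]).

%% Hypothetical sketch, not part of exml.
%% Return the first child with the given name AND xmlns, or undefined.
subelement_with_name_and_ns({xmlel, _, _, Children}, Name, NS) ->
    find(Children, Name, NS).

find([], _Name, _NS) ->
    undefined;
find([{xmlel, Name, Attrs, _} = Child | Rest], Name, NS) ->
    %% The name matches; accept the child only if the namespace matches too,
    %% otherwise continue with the remaining siblings.
    case lists:keyfind(<<"xmlns">>, 1, Attrs) of
        {_, NS} -> Child;
        _ -> find(Rest, Name, NS)
    end;
find([_Other | Rest], Name, NS) ->
    %% Differently named elements and character data are skipped.
    find(Rest, Name, NS).
```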

Build with GNU Make fails if LDFLAGS is defined in command-line

In our project the root Makefile calls rebar (rebar2 in our case), which then builds exml as a dependency.

The tricky part is that when gmake is invoked as gmake LDFLAGS="our_flags", that command-line argument propagates through rebar to the gmake -C c_src call in exml/rebar.config, and from there directly to exml/c_src/Makefile.

At this moment gmake uses the overriding rules described in https://www.gnu.org/software/make/manual/html_node/Overriding.html#Overriding.

Particularly:

An argument that contains ‘=’ specifies the value of a variable: ‘v=x’ sets the value of the variable v to x. If you specify a value in this way, all ordinary assignments of the same variable in the makefile are ignored; we say they have been overridden by the command line argument.

As a consequence the line LDFLAGS += -shared has no effect, and the build fails with this error:

gmake[1]: Entering directory '/data/buildbot/slave/freebsd11/d41d8cd9-pr-bugtrace-new-exml-fedc8fba226eeb8031754354e0140e65/deps/exml/c_src'
c++ -I/usr/local/include -std=c++11 -Wall -fPIC -O3 -I /data/erlang/otp_18.3_hacked/lib/erlang/erts-7.3/include/ -I /data/erlang/otp_18.3_hacked/lib/erlang/lib/erl_interface-3.8.2/include  -c -o exml.o exml.cpp
c++ exml.o -L/usr/local/lib -L /data/erlang/otp_18.3_hacked/lib/erlang/lib/erl_interface-3.8.2/lib -lerl_interface -lei -o /data/buildbot/slave/freebsd11/d41d8cd9-pr-bugtrace-new-exml-fedc8fba226eeb8031754354e0140e65/deps/exml/c_src/../priv/exml_nif.so
/usr/lib/crt1.o: In function `_start':
/usr/src/lib/csu/amd64/crt1.c:(.text+0x17b): undefined reference to `main'
exml.o: In function `load(enif_environment_t*, void**, unsigned long)':
exml.cpp:(.text+0x2d): undefined reference to `enif_system_info'
exml.cpp:(.text+0x187): undefined reference to `enif_open_resource_type'
exml.cpp:(.text+0x193): undefined reference to `enif_alloc_env'
exml.cpp:(.text+0x1a9): undefined reference to `enif_make_atom'
exml.cpp:(.text+0x1c3): undefined reference to `enif_make_atom'
exml.cpp:(.text+0x1dd): undefined reference to `enif_make_atom'
exml.cpp:(.text+0x1f7): undefined reference to `enif_make_atom'
And so on.

The fix is easy.
Remove:
LDFLAGS += -shared

And add the -shared flag to the link rule instead:

- $(link_verbose) $(CXX) $^ $(LDFLAGS) $(LDLIBS) -o $@
+ $(link_verbose) $(CXX) $^ $(LDFLAGS) -shared $(LDLIBS) -o $@

since you definitely know at that moment that you are going to build a shared library regardless of LDFLAGS.

Not sure this is worth a separate PR :)

BTW: platform - FreeBSD.

Exml to binary produces malformed XML

1> M1Txt = <<"<frob>hello</frob>">>.
<<"<frob>hello</frob>">>

2> exml:to_binary(element(2, {ok, _} = exml:parse(M1Txt))).
<<"<frob>hello</frob>">>

3> M2Txt = <<"<frob>&amp;</frob>">>.
<<"<frob>&amp;</frob>">>

4> exml:to_binary(element(2, {ok, _} = exml:parse(M2Txt))).
<<"<frob>&</frob>">>

5> exml:parse(exml:to_binary(element(2, {ok, _} = exml:parse(M2Txt)))).
{error,{"not well-formed (invalid token)",
        <<"<stream><frob>&</frob></stream>">>}}
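
For the output in step 4 to be parseable again, character data would have to be escaped on serialization. A minimal sketch of such escaping is shown here; the module cdata_escape_sketch and the function escape_text/1 are hypothetical and are not exml's serializer.

```erlang
-module(cdata_escape_sketch).
-export([escape_text/1]).

%% Hypothetical sketch, not part of exml.
%% Escape the characters that may not appear literally in XML text content.
%% Working byte by byte is safe for UTF-8 input because "&", "<" and ">"
%% are ASCII and never occur inside a multi-byte sequence.
-spec escape_text(binary()) -> binary().
escape_text(Text) when is_binary(Text) ->
    << <<(escape_byte(B))/binary>> || <<B>> <= Text >>.

escape_byte($&) -> <<"&amp;">>;
escape_byte($<) -> <<"&lt;">>;
escape_byte($>) -> <<"&gt;">>;
escape_byte(B)  -> <<B>>.
```

With this in place, the text of the second example would be written out as &amp; rather than as a bare &.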

Error loading nif

I am getting the following error when the NIF is loaded:

exml_nif.so: undefined symbol: _ZNKSt13runtime_error4whatEv'

OTP: 22.2.1

uname -a

Linux ip-10-0-2-155 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64 GNU/Linux

Whitespace Parsing

Currently, whitespace-only content is not preserved by exml:

# exml:parse(<<"<body> </body>">>).
{ok,{xmlel,<<"body">>,[],[]}}

The whitespace is not parsed as content of the XML tag. My first guess is that parse_non_destructive has to be used in the rapidxml parser, but that has not worked out so far :(.

`exml:to_(pretty_)iolist/1` don't properly escape attribute values

Parsing is OK:

> {ok, T1} = exml:parse(<<"<el attr=\"''''\"/>">>).
{ok,{xmlel,<<"el">>,[{<<"attr">>,<<"''''">>}],[]}}

But printing the term and parsing it again fails:

> R1 = io_lib:format("~ts", [exml:to_iolist(T1)]).
> exml:parse(erlang:iolist_to_binary(R1)).
{error,{"not well-formed (invalid token)",
        <<"<stream><el attr=''''''/></stream>">>}}

What we get by printing is:

> erlang:iolist_to_binary(R1).
<<"<el attr=''''''/>">>

While what we should get is <<"<el attr='&apos;&apos;&apos;&apos;'/>">>. Just to make sure:

> exml:parse(<<"<el attr='&apos;&apos;&apos;&apos;'/>">>) =:= exml:parse(<<"<el attr=\"''''\"/>">>).
true

The same error applies to &quot; / ".
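
The escaping that the serializer would need for attribute values can be sketched as below. The module attr_escape_sketch and both functions are hypothetical illustrations, not exml code; only binary:replace/4 and lists:foldl/3 from the standard library are used.

```erlang
-module(attr_escape_sketch).
-export([escape_attr/1, render_attr/2]).

%% Hypothetical sketch, not part of exml.
%% Escape an attribute value so it is safe inside either quote style.
-spec escape_attr(binary()) -> binary().
escape_attr(Value) when is_binary(Value) ->
    %% "&" must be replaced first; otherwise the ampersands introduced by
    %% the later replacements would be escaped a second time.
    lists:foldl(fun({From, To}, Acc) ->
                        binary:replace(Acc, From, To, [global])
                end,
                Value,
                [{<<"&">>, <<"&amp;">>},
                 {<<"<">>, <<"&lt;">>},
                 {<<"'">>, <<"&apos;">>},
                 {<<"\"">>, <<"&quot;">>}]).

%% Render a single attribute in the single-quoted style shown in the README.
-spec render_attr(binary(), binary()) -> binary().
render_attr(Name, Value) ->
    <<Name/binary, "='", (escape_attr(Value))/binary, "'">>.
```

Escaping both quote characters regardless of the delimiter is slightly more than XML requires, but it keeps the output valid whichever delimiter the serializer uses.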

attribute values with single quotes?

Single quotes are causing issues when creating XML.
The XML is being sent to a third party that expects double quotes around attribute values. 🤦

An option to choose between double and single quotes for attributes would be nice.

Example:

Erlang/OTP 23 [erts-11.0] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe] [dtrace]

Eshell V11.0  (abort with ^G)
1> application:ensure_all_started(exml).
{ok,[exml]}
2> exml
exml           exml_nif       exml_query     exml_stream    
2> rr(exml).     
[xmlcdata,xmlel,xmlstreamend,xmlstreamstart]
3> exml:to_binary(#xmlel{ name = <<"test">>, attrs = [{<<"test">>, <<"a">>}], children = []}).
<<"<test test='a'/>">>
