
Exml to binary produces malformed XML (exml, CLOSED)

pzel commented on August 28, 2024
Exml to binary produces malformed XML

from exml.

Comments (3)

erszcz commented on August 28, 2024

Parsing works fine:

(exml@x4)8> exml:to_binary(element(2, {ok, _} = exml:parse(M2Txt))).
(<0.56.0>) call exml:parse(<<"<frob>&amp;</frob>">>)
(<0.56.0>) returned from exml:parse/1 -> {ok,
                                          {xmlel,<<"frob">>,[],
                                           [{xmlcdata,<<"&">>}]}}
(<0.56.0>) call exml:to_binary({xmlel,<<"frob">>,[],[{xmlcdata,<<"&">>}]})
(<0.56.0>) call exml:to_iolist({xmlel,<<"frob">>,[],[{xmlcdata,<<"&">>}]})
(<0.56.0>) call exml:attrs_to_iolist([],[])
(<0.56.0>) returned from exml:attrs_to_iolist/2 -> []
(<0.56.0>) call exml:to_iolist([{xmlcdata,<<"&">>}])
(<0.56.0>) call exml:'-to_iolist/1-fun-0-'({xmlcdata,<<"&">>})
(<0.56.0>) call exml:to_iolist({xmlcdata,<<"&">>})
(<0.56.0>) returned from exml:to_iolist/1 -> [<<"&">>]
(<0.56.0>) returned from exml:'-to_iolist/1-fun-0-'/1 -> [<<"&">>]
(<0.56.0>) returned from exml:to_iolist/1 -> [[<<"&">>]]
(<0.56.0>) returned from exml:to_iolist/1 -> ["<",<<"frob">>,[],">",
                                              [[<<"&">>]],
                                              "</",<<"frob">>,">"]
(<0.56.0>) returned from exml:to_binary/1 -> <<"<frob>&</frob>">>
<<"<frob>&</frob>">>

It's printing that causes the problem:

(exml@x4)9> exml:to_iolist([{xmlcdata,<<"&">>}]).
(<0.56.0>) call exml:to_iolist([{xmlcdata,<<"&">>}])
(<0.56.0>) call exml:'-to_iolist/1-fun-0-'({xmlcdata,<<"&">>})
(<0.56.0>) call exml:to_iolist({xmlcdata,<<"&">>})
(<0.56.0>) returned from exml:to_iolist/1 -> [<<"&">>]
(<0.56.0>) returned from exml:'-to_iolist/1-fun-0-'/1 -> [<<"&">>]
(<0.56.0>) returned from exml:to_iolist/1 -> [[<<"&">>]]
[[<<"&">>]]

BUT (exml.erl:56):

to_iolist(#xmlcdata{content = Content}) ->
    %% it's caller's responsibility to make sure that
    %% #xmlcdata's content is escaped properly!
    [Content]. %% ensure we return io*list*

To cut a long story short, I agree that the sound plan is to make these operations inverses of each other, but during the transitional period this might break every call site where the caller already makes sure the CDATAs are escaped.
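For illustration, a minimal sketch of what escaping on output could look like. The module name and both function names are hypothetical and not part of exml; the `{xmlcdata, Content}` tuple shape is taken from the trace above. It conservatively escapes `&`, `<` and `>`, and it assumes the content passed in is raw (unescaped) text, which is exactly the assumption that would break callers who already escape their CDATA.

```erlang
-module(cdata_escape_sketch).
-export([escape_cdata/1, to_iolist_escaped/1]).

%% Hypothetical helper (not part of exml): escape the characters that
%% must not appear literally in XML character data.
escape_cdata(Bin) when is_binary(Bin) ->
    %% "&" has to be handled first; otherwise the "&" introduced by
    %% "&lt;" and "&gt;" would be escaped a second time.
    B1 = binary:replace(Bin, <<"&">>, <<"&amp;">>, [global]),
    B2 = binary:replace(B1, <<"<">>, <<"&lt;">>, [global]),
    binary:replace(B2, <<">">>, <<"&gt;">>, [global]).

%% Hypothetical variant of the to_iolist/1 clause quoted above that
%% escapes the content instead of trusting the caller.
to_iolist_escaped({xmlcdata, Content}) ->
    [escape_cdata(Content)]. %% still returns an iolist
```

With this sketch, `cdata_escape_sketch:to_iolist_escaped({xmlcdata, <<"&">>})` yields `[<<"&amp;">>]`, whereas content that a caller had already escaped would come out double-escaped.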


pzel commented on August 28, 2024

I understand. This just shows that the function names are wrong. A function named exml:show/1 or exml:print/1 can be reasonably expected to produce a legible string/binary, but not necessarily one fit for computer consumption.

The name exml:to_binary/1 suggests to the user that this function can be depended upon to simply serialize the data to a binary format (like the term_to_binary / binary_to_term dyad).

Expecting the user to search an entire xml tree, find #xmlcdata{}s, and make sure they are serializable breaks the rule of least surprise.

If exml has read some data and deemed it well-formed, it should be able to produce that well-formed data again. Channeling Jef Raskin: The system should treat all user input as sacred.
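The expectation described here can be stated as a round-trip property. The sketch below is only an illustration: the module and function names are hypothetical, and the demo uses the term_to_binary / binary_to_term pair so that it runs without exml.

```erlang
-module(roundtrip_sketch).
-export([roundtrips/3, demo/0]).

%% Hypothetical check: serializing a value and parsing the result
%% should give back {ok, Value}.
roundtrips(Serialize, Parse, Value) ->
    Parse(Serialize(Value)) =:= {ok, Value}.

%% Demo with the built-in dyad; binary_to_term/1 is wrapped so that it
%% returns {ok, Term}, like exml:parse/1 does in the trace above.
demo() ->
    Serialize = fun erlang:term_to_binary/1,
    Parse = fun(Bin) -> {ok, binary_to_term(Bin)} end,
    roundtrips(Serialize, Parse, {frob, <<"&">>}).
```

Assuming exml is on the code path, the same check could be called as `roundtrips(fun exml:to_binary/1, fun exml:parse/1, Element)`. For the element in the trace above it would be expected to fail, because `<<"<frob>&</frob>">>` is not well-formed XML.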

Let's consider making a new, major version that escapes CDATA automatically. What are your thoughts?

In other news: PropEr tests show that many valid UTF-8 sequences cannot be binarized and unbinarized safely.


erszcz commented on August 28, 2024

I've started fixing this, in order to release a non-backwards compatible fix. As usual, the rabbit hole is deeper than I could've imagined (i.e. tests in test/eqc don't pass even without any changes).

