Comments (3)
Parsing works fine:
(exml@x4)8> exml:to_binary(element(2, {ok, _} = exml:parse(M2Txt))).
(<0.56.0>) call exml:parse(<<"<frob>&</frob>">>)
(<0.56.0>) returned from exml:parse/1 -> {ok,
{xmlel,<<"frob">>,[],
[{xmlcdata,<<"&">>}]}}
(<0.56.0>) call exml:to_binary({xmlel,<<"frob">>,[],[{xmlcdata,<<"&">>}]})
(<0.56.0>) call exml:to_iolist({xmlel,<<"frob">>,[],[{xmlcdata,<<"&">>}]})
(<0.56.0>) call exml:attrs_to_iolist([],[])
(<0.56.0>) returned from exml:attrs_to_iolist/2 -> []
(<0.56.0>) call exml:to_iolist([{xmlcdata,<<"&">>}])
(<0.56.0>) call exml:'-to_iolist/1-fun-0-'({xmlcdata,<<"&">>})
(<0.56.0>) call exml:to_iolist({xmlcdata,<<"&">>})
(<0.56.0>) returned from exml:to_iolist/1 -> [<<"&">>]
(<0.56.0>) returned from exml:'-to_iolist/1-fun-0-'/1 -> [<<"&">>]
(<0.56.0>) returned from exml:to_iolist/1 -> [[<<"&">>]]
(<0.56.0>) returned from exml:to_iolist/1 -> ["<",<<"frob">>,[],">",
[[<<"&">>]],
"</",<<"frob">>,">"]
(<0.56.0>) returned from exml:to_binary/1 -> <<"<frob>&</frob>">>
<<"<frob>&</frob>">>
It's printing that causes the problem:
(exml@x4)9> exml:to_iolist([{xmlcdata,<<"&">>}]).
(<0.56.0>) call exml:to_iolist([{xmlcdata,<<"&">>}])
(<0.56.0>) call exml:'-to_iolist/1-fun-0-'({xmlcdata,<<"&">>})
(<0.56.0>) call exml:to_iolist({xmlcdata,<<"&">>})
(<0.56.0>) returned from exml:to_iolist/1 -> [<<"&">>]
(<0.56.0>) returned from exml:'-to_iolist/1-fun-0-'/1 -> [<<"&">>]
(<0.56.0>) returned from exml:to_iolist/1 -> [[<<"&">>]]
[[<<"&">>]]
BUT (exml.erl:56):
to_iolist(#xmlcdata{content = Content}) ->
    %% it's caller's responsibility to make sure that
    %% #xmlcdata's content is escaped properly!
    [Content]. %% ensure we return io*list*
To cut a long story short, I agree that the sound plan is to make these operations inverses of each other, but during the transition it might break every call site where the caller already makes sure the CDATA content is escaped, because that content would then be escaped twice.
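For illustration, caller-side escaping as the code comment describes it could look roughly like the following. This is a minimal sketch, not exml code: the module name cdata_escape_sketch and the function escape/1 are hypothetical, and only the characters &, < and > are handled.

```erlang
-module(cdata_escape_sketch).
-export([escape/1]).

%% Hypothetical helper, NOT part of exml's API.
%% Escapes the XML-significant characters in a binary before the
%% caller wraps it in an #xmlcdata{} record.
%% Working byte by byte is safe for UTF-8 input, because the three
%% escaped characters are single-byte ASCII and never occur inside a
%% multi-byte sequence.
escape(Bin) when is_binary(Bin) ->
    << <<(escape_char(C))/binary>> || <<C>> <= Bin >>.

escape_char($&) -> <<"&amp;">>;
escape_char($<) -> <<"&lt;">>;
escape_char($>) -> <<"&gt;">>;
escape_char(C)  -> <<C>>.
```

Applying escape/1 to content that is already escaped turns &amp; into &amp;amp;, which is exactly what would happen at such call sites if the serializer started escaping on its own.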
from exml.
I understand. This just shows that the function names are wrong. A function named exml:show/1 or exml:print/1 can be reasonably expected to produce a legible string/binary, but not necessarily one fit for computer consumption.
The name exml:to_binary/1 suggests to the user that this function can be depended upon to just serialize the data to a binary format (like the term_to_binary / binary_to_term dyad).
Expecting the user to search an entire XML tree, find #xmlcdata{}s, and make sure they are serializable breaks the rule of least surprise.
If exml has read some data and deemed it well-formed, it should be able to produce that well-formed data again. Channeling Jef Raskin: The system should treat all user input as sacred.
Let's consider making a new, major version that escapes CDATA automatically. What are your thoughts?
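To make the proposal concrete, here is a rough sketch of a serializer that escapes CDATA itself. Everything in it is an assumption for illustration: the module auto_escape_sketch is hypothetical, the record definitions are local and only mirror the tuple shapes visible in the trace above, and attributes are ignored to keep it short.

```erlang
-module(auto_escape_sketch).
-export([to_binary/1]).

%% Local record definitions mirroring the tuple shapes in the trace
%% ({xmlel, Name, Attrs, Children} and {xmlcdata, Content}).
%% Assumptions for this sketch, not copied from exml's headers.
-record(xmlel, {name, attrs = [], children = []}).
-record(xmlcdata, {content = <<>>}).

to_binary(Term) ->
    iolist_to_binary(to_iolist(Term)).

to_iolist(#xmlel{name = Name, children = Children}) ->
    %% Attributes are deliberately left out of this sketch.
    ["<", Name, ">", [to_iolist(C) || C <- Children], "</", Name, ">"];
to_iolist(#xmlcdata{content = Content}) ->
    %% The serializer, not the caller, escapes the character data.
    [escape(iolist_to_binary(Content))].

escape(Bin) ->
    << <<(escape_char(C))/binary>> || <<C>> <= Bin >>.

escape_char($&) -> <<"&amp;">>;
escape_char($<) -> <<"&lt;">>;
escape_char($>) -> <<"&gt;">>;
escape_char(C)  -> <<C>>.
```

With this shape, the element from the trace serializes to a binary in which the ampersand is written as &amp;, so the output is well-formed again regardless of what the caller put into the record.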
In other news: PropEr tests show that many valid UTF-8 sequences do not survive a round trip through serialization to a binary and parsing back.
from exml.
I've started fixing this, in order to release a non-backwards-compatible fix. As usual, the rabbit hole is deeper than I could've imagined (i.e. tests in test/eqc don't pass even without any changes).
from exml.