
hvr / xmlhtml


This project forked from snapframework/xmlhtml


XML parser and renderer with HTML 5 quirks mode

License: Other

Haskell 95.79% CSS 3.75% Shell 0.46%

xmlhtml's Introduction

xmlhtml - XML and HTML 5 parsing and rendering

This library implements both parsers and renderers for XML and HTML 5 document
fragments.  The two share data structures to represent the document tree, so
that you can write code to easily work with either XML or HTML 5.  Convenience
functions are also available to work with the internal data structure in
several natural ways.
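Because both parsers produce the same Node type, one traversal works on either result. The sketch below assumes the xmlhtml package and the OverloadedStrings extension; `countElements`, `total`, and `input` are hypothetical names used only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: one function over Node, applied to the output of both parsers.
-- Assumes the xmlhtml package (module Text.XmlHtml).
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import Text.XmlHtml

-- Count element nodes in a tree; the same code serves XML and HTML.
countElements :: Node -> Int
countElements n
  | isElement n = 1 + sum (map countElements (childNodes n))
  | otherwise   = 0

-- Hypothetical sample input, valid as both XML and HTML 5.
input :: BC.ByteString
input = "<div><p>One</p><p>Two</p></div>"

-- Total element count across all top-level nodes of a parsed document.
total :: Either String Document -> Either String Int
total = fmap (sum . map countElements . docContent)

main :: IO ()
main = do
  print (total (parseXML  "input.xml"  input))  -- Right 3
  print (total (parseHTML "input.html" input))  -- Right 3
```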

Caveats:

- Both parsers are written to parse document fragments, not complete
  documents.  This means that they do not enforce rules about overall
  document structure.  There does not need to be only a single root node,
  and the HTML 5 implementation never inserts any missing start tags.

- The XML parser cannot handle processing instructions or user-defined
  entities.  It will silently drop processing instructions, and it will fail if
  it encounters an entity reference for anything but the predefined entities
  (apos, quot, amp, lt, and gt).

- The HTML parser is really an XML parser with HTML 5 quirks mode.  It should
  be just fine for parsing documents that conform to the HTML 5 specification.
  However, it is *not* a compliant HTML 5 parser, as compliant parsers are
  required to be compatible with non-compliant documents in many ways that we
  aren't interested in.  So this is a great basis for a template system, for
  example, but a very poor basis for a web browser or web spider.
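The caveats above can be checked directly. This is a sketch assuming the xmlhtml package and OverloadedStrings; the sample inputs are made up, and the exact error messages are deliberately not inspected because they are not specified here.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch illustrating the parser caveats: fragments with several roots,
-- dropped processing instructions, and entity handling.
-- Assumes the xmlhtml package (module Text.XmlHtml).
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import Data.Either (isLeft, isRight)
import Text.XmlHtml

-- Two root nodes: accepted, since both parsers work on fragments.
twoRoots :: BC.ByteString
twoRoots = "<a>1</a><b>2</b>"

-- A processing instruction followed by one element.
withPI :: BC.ByteString
withPI = "<?foo bar?><a/>"

-- A predefined XML entity versus an HTML-only named entity.
amp, nbsp :: BC.ByteString
amp  = "<p>Fish &amp; chips</p>"
nbsp = "<p>Fish&nbsp;chips</p>"

-- Number of top-level nodes, if parsing succeeds.
topCount :: Either String Document -> Either String Int
topCount = fmap (length . docContent)

main :: IO ()
main = do
  print (topCount (parseXML "twoRoots" twoRoots))  -- two top-level nodes
  print (topCount (parseXML "withPI" withPI))      -- the PI is dropped
  print (isRight (parseXML "amp" amp))             -- predefined: accepted
  print (isLeft  (parseXML "nbsp" nbsp))           -- not predefined in XML
  print (isRight (parseHTML "nbsp" nbsp))          -- known to the HTML parser
```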

To get started, just use the parseHTML or parseXML functions from Text.XmlHtml
to parse a ByteString into a document tree.  On the other side, use render to
serialize the document tree back to bytes; it produces a Builder, which you can
turn into a ByteString.
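A minimal parse-and-render round trip might look like the following. It assumes the xmlhtml package and a release whose render returns a bytestring Builder (older releases used blaze-builder's Builder, which blaze-builder 0.4 and later make the same type). `roundTrip` and `sample` are hypothetical names.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: parse an HTML fragment, then render it back to bytes.
-- Assumes the xmlhtml package and a bytestring-compatible Builder.
module Main (main) where

import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as L
import Text.XmlHtml

sample :: BC.ByteString
sample = "<p class=\"greeting\">Hello, <b>world</b>!</p>"

-- The first argument to parseHTML names the source in error messages.
roundTrip :: BC.ByteString -> Either String BC.ByteString
roundTrip bytes = do
  doc <- parseHTML "sample.html" bytes
  return (L.toStrict (B.toLazyByteString (render doc)))

main :: IO ()
main = case roundTrip sample of
  Left err  -> putStrLn ("parse error: " ++ err)
  Right out -> BC.putStrLn out
```

Rendering then reparsing should give back the same node list, which is a handy property to test against.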

Working with document trees is easily done in two ways.

1. Text.XmlHtml exports the document tree types (notably, Document and Node)
   and functions like getAttribute, setAttribute, tagName, childNodes, etc. for
   working with them.

2. Text.XmlHtml.Cursor exports a zipper for node forests, which you can use to
   navigate and modify the document tree positionally.
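Both styles are sketched below under the assumption that the xmlhtml package (Text.XmlHtml and Text.XmlHtml.Cursor) is available with OverloadedStrings. `markItems`, `renameSecond`, `replaceText`, and `items` are hypothetical helpers, not part of the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch of the two ways to work with document trees.
-- Assumes the xmlhtml package.
module Main (main) where

import Data.Text (Text)
import Text.XmlHtml
import Text.XmlHtml.Cursor

-- <ul><li>one</li><li>two</li></ul>
items :: Node
items = Element "ul" []
  [ Element "li" [] [TextNode "one"]
  , Element "li" [] [TextNode "two"]
  ]

-- Style 1: plain functions on Node values.
-- Add class="item" to every <li> child of an element.
markItems :: Node -> Node
markItems (Element t attrs kids) = Element t attrs (map mark kids)
  where
    mark c | tagName c == Just "li" = setAttribute "class" "item" c
           | otherwise              = c
markItems other = other

-- Replace an element's children with a single text node.
replaceText :: Text -> Node -> Node
replaceText s (Element t attrs _) = Element t attrs [TextNode s]
replaceText _ other               = other

-- Style 2: a zipper. Move to the second child and edit it in place;
-- if navigation fails, return the tree unchanged.
renameSecond :: Node -> Node
renameSecond n = maybe n topNode $ do
  first  <- firstChild (fromNode n)  -- first <li>
  second <- right first              -- second <li>
  return (modifyNode (replaceText "TWO") second)

main :: IO ()
main = do
  print (map (getAttribute "class") (childNodes (markItems items)))
  print (map nodeText (childNodes (renameSecond items)))
```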

That's it, basically.  This is hopefully a pretty simple package to use.

TO DO Items:

1. Do something better with character encodings.  For now, they are basically
   ignored, and we just use the byte order mark to distinguish between the
   three required encodings (UTF-8, UTF-16 big-endian, and UTF-16
   little-endian).  We should implement the encoding sniffing rules for both
   XML (the <?xml ... ?> declaration) and HTML 5.

2. Benchmark and improve performance of the parsers and renderers.

3. Ensure that rendering always gives an error rather than writing an invalid
   document. (Is this a good idea?  It does limit rendering speed.)
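To make item 1 concrete, byte-order-mark detection for the three encodings can be sketched as below. This is a hypothetical illustration, not xmlhtml's internal code: the `Encoding` type and `sniffBOM` are made up here, and falling back to UTF-8 when no BOM is present is an assumption of the sketch. It uses only the bytestring package that ships with GHC.

```haskell
-- Hypothetical sketch of BOM sniffing; not xmlhtml's actual code.
module Main (main) where

import qualified Data.ByteString as BS

data Encoding = UTF8 | UTF16BE | UTF16LE
  deriving (Eq, Show)

-- Detect a leading byte order mark and strip it from the input.
-- Without a BOM, assume UTF-8 (an assumption for this sketch).
sniffBOM :: BS.ByteString -> (Encoding, BS.ByteString)
sniffBOM bs
  | BS.pack [0xEF, 0xBB, 0xBF] `BS.isPrefixOf` bs = (UTF8,    BS.drop 3 bs)
  | BS.pack [0xFE, 0xFF]       `BS.isPrefixOf` bs = (UTF16BE, BS.drop 2 bs)
  | BS.pack [0xFF, 0xFE]       `BS.isPrefixOf` bs = (UTF16LE, BS.drop 2 bs)
  | otherwise                                     = (UTF8,    bs)

main :: IO ()
main = do
  print (sniffBOM (BS.pack [0xEF, 0xBB, 0xBF, 0x3C]))  -- (UTF8,"<")
  print (sniffBOM (BS.pack [0xFF, 0xFE, 0x3C, 0x00]))  -- (UTF16LE,"<\NUL")
```

Real sniffing would go further, for example reading the encoding pseudo-attribute of the XML declaration when no BOM is present, which is exactly what item 1 proposes.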

xmlhtml's People

Contributors

mightybyte, cdsmith, gregorycollins, jaspervdj, alexanderkjeldaas, aslatter, norm2782, 23skidoo, zenzike, sebastiaanvisser, sol, tych0
