saxe's Introduction

Saxe

Light-weight and efficient SAX parser for JavaScript (~6.6KB minified and gzipped).

Goals

  • Complete XML standard conformance
  • Simple and terse API
  • Reduced code footprint
  • Set a base for other standards built on XML, e.g. XHTML

Example

import {SaxParser} from "saxe";

let textContent = "";
const parser = new SaxParser({
  start(name, attributes) {
    // element start tag
  },
  empty(name, attributes) {
    // empty element
  },
  end(name, attributes) {
    // element end tag
  },
  text(text) {
    textContent += text;
  },
});
for (const chunk of INPUT_STREAM) {
  parser.write(chunk);
}
parser.end();

Runtime Support

Basic XML parsing is supported on any ES2017 runtime. Older runtimes can still run saxe after transpilation and polyfilling any missing functionality.

Encoding support requires TextDecoder; most runtimes support it natively, but it may be polyfilled.

Document Type Declaration

Most[1] JavaScript XML parsers either skip Document Type Declarations (DTDs) without even checking them for well-formedness, or ignore most of the declarations they contain.

Internal DTD subset parsing is required even for non-validating[2] processors, so most JavaScript implementations are not compliant. Even if one were to manually parse the internal DTD subset and provide the entity values to isaacs/sax-js or lddubeau/saxes, proper entity expansion could not be replicated. This is fine where DTDs are prohibited or explicitly ignored, but it is incorrect for any other protocol or format.

This parser checks the whole internal DTD subset for well-formedness and recognizes ATTLIST and ENTITY declarations. Attributes declared in the internal subset are normalized appropriately and entities are expanded correctly. This process has security implications; if the default behavior is undesirable it may be configured.
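As an illustration of what "normalized appropriately" involves, the following is a minimal sketch of attribute-value normalization as described in XML 1.0 § 3.3.3. The function name and parameters are hypothetical and are not part of saxe's API. The sketch covers only literal attribute text: character references such as `&#10;` are appended verbatim by a real processor and are not turned into spaces.

```javascript
// Hypothetical helper, not saxe's implementation.
// `declaredType` is the attribute type from an ATTLIST declaration,
// or "CDATA" when the attribute is undeclared.
function normalizeAttribute(value, declaredType = "CDATA") {
  // Line ends are normalized to a single LF before any other processing.
  let normalized = value.replace(/\r\n?/g, "\n");
  // Every literal white space character (tab, LF, CR) becomes a space.
  normalized = normalized.replace(/[\t\n\r]/g, " ");
  if (declaredType !== "CDATA") {
    // Non-CDATA types (ID, NMTOKENS, ...) also collapse runs of spaces
    // and drop leading and trailing spaces.
    normalized = normalized.replace(/ +/g, " ").replace(/^ | $/g, "");
  }
  return normalized;
}
```

With this sketch, an attribute declared as NMTOKENS in the internal subset would compare equal regardless of how its tokens were spaced in the source, while an undeclared attribute keeps its spacing.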

External markup declarations and external entities are not required for non-validating[2] processors and are explicitly not supported.

Encoding Support

XML allows documents to specify their encoding through the XML or Text Declarations.

<?xml version="1.0" encoding="UTF-8" ?>

Parsing XML from raw binary data in unknown encoding is supported by the SaxDecoder class, which parses XML from Uint8Array chunks.

Do not use SaxDecoder when the encoding is provided externally, for example by a Content-Type header or by another specification (EPUB, for instance, specifies that all XML files MUST be UTF-8).
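To give an idea of what detecting an unknown encoding involves, here is a simplified sketch loosely based on the non-normative Appendix F of XML 1.0. It is not SaxDecoder's implementation: `sniffXmlEncoding` is a hypothetical helper, it only looks at UTF-8 and UTF-16 byte patterns plus the encoding declaration, and it ignores the UTF-32 and EBCDIC cases the appendix also describes.

```javascript
// Hypothetical helper: guesses the encoding label of an XML byte stream.
function sniffXmlEncoding(bytes) {
  // A byte order mark, when present, decides the encoding.
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "utf-8";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";
  // Without a BOM, a "<" stored as two bytes implies UTF-16.
  if (bytes[0] === 0x00 && bytes[1] === 0x3c) return "utf-16be";
  if (bytes[0] === 0x3c && bytes[1] === 0x00) return "utf-16le";
  // Otherwise assume an ASCII-compatible encoding and read the declaration.
  const head = String.fromCharCode(...bytes.subarray(0, 256));
  const match = /^<\?xml\s[^>]*?encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/.exec(head);
  // UTF-8 is the default when nothing else is declared.
  return match ? match[2].toLowerCase() : "utf-8";
}
```

The returned label could then be handed to `new TextDecoder(label)`, which is where unsupported labels would be rejected.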

Supported Encodings

SaxDecoder uses TextDecoder so it supports all encodings defined by the Encoding Standard.

A polyfill may implement only a subset of the encodings listed in the Encoding Standard (§ 4. Encodings). For full compliance, ensure that at least UTF-8 and UTF-16 are supported, as they are required by the XML standard.

Notes:

  • If a document specifies an unknown or unsupported encoding, a SaxError with code ENCODING_NOT_SUPPORTED is thrown.
  • If a document contains data that is invalid for the declared encoding, a SaxError with code ENCODING_INVALID_DATA is thrown.
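Both conditions correspond to distinct failures of the underlying TextDecoder: constructing a decoder with an unknown label throws a RangeError, and decoding invalid bytes with `fatal: true` throws a TypeError. The sketch below shows one way those two failures can be told apart. `decodeStrict` and its result shape are hypothetical, and how saxe maps them to SaxError codes internally is an assumption.

```javascript
// Hypothetical helper: separates "unsupported encoding" from "invalid data".
function decodeStrict(bytes, label) {
  let decoder;
  try {
    // Throws a RangeError when the label is unknown to the runtime.
    decoder = new TextDecoder(label, { fatal: true });
  } catch (error) {
    return { ok: false, reason: "unsupported-encoding" };
  }
  try {
    // With fatal: true, malformed input throws a TypeError instead of
    // being replaced with U+FFFD.
    return { ok: true, text: decoder.decode(bytes) };
  } catch (error) {
    return { ok: false, reason: "invalid-data" };
  }
}
```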

Security

XML parsers may be subject to a number of vulnerabilities; the most common attacks exploit external entity resolution and entity expansion.

This parser is strictly non-validating, so by design it should not be vulnerable to any XXE[3] based attack. Additionally, the length of strings collected during parsing is capped to limit the efficacy of other denial-of-service attacks[4].
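The following sketch illustrates why a length cap blunts entity-expansion attacks. It is not saxe's code: `expandEntities`, the `maxLength` default, and the error messages are all hypothetical, entity names are limited to ASCII, predefined entities are not handled, and replacement text is treated as plain character data with no markup.

```javascript
// Hypothetical sketch of recursive entity expansion with a cycle check
// and a cap on the total expanded length.
function expandEntities(text, entities, maxLength = 1000, active = new Set()) {
  let out = "";
  let last = 0;
  for (const match of text.matchAll(/&([A-Za-z_:][\w:.-]*);/g)) {
    out += text.slice(last, match.index);
    const name = match[1];
    if (!entities.has(name)) throw new Error(`undeclared entity: ${name}`);
    // An entity must not reference itself, directly or indirectly.
    if (active.has(name)) throw new Error(`recursive entity: ${name}`);
    active.add(name);
    // The nested expansion only gets the budget that is still left.
    out += expandEntities(entities.get(name), entities, maxLength - out.length, active);
    active.delete(name);
    last = match.index + match[0].length;
    if (out.length > maxLength) throw new Error("expansion limit exceeded");
  }
  out += text.slice(last);
  if (out.length > maxLength) throw new Error("expansion limit exceeded");
  return out;
}
```

Because the remaining budget is passed down on every nested call, a "billion laughs" style document fails as soon as the limit is crossed rather than after the full expansion has been built in memory.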

Document Type Declaration processing may (at user option) be disabled altogether to prevent any attack based on them.

// Doctype declarations will be rejected
// Alternatively, set to "ignore" to allow them but prevent
// them from affecting further parsing
new SaxParser(reader, {dtd: "prohibit"})

Known XML bombs are covered by the regular integration tests, and the parser is fuzz tested regularly. Even so, for very sensitive or security-oriented applications you may want to conduct your own security audit.

License

Copyright 2024 Federico Carboni

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Footnotes

  1. Other JavaScript XML parsers inspected include isaacs/sax-js, NaturalIntelligence/fast-xml-parser and lddubeau/saxes.

  2. Non-validating XML processors (parsers) do not validate documents, but must still recognize and report well-formedness (syntax) errors. Non-validating processors are not required to fetch and parse external markup declarations and external entities. XML Standard § 5.1 Validating and Non-Validating Processors

  3. XML External Entity (XXE) Processing | OWASP Foundation

  4. XML Denial of Service Attacks and Defenses | Microsoft Learn

saxe's Issues

Expose NotationDecl to the application

The XML standard requires processors to pass notation declarations to the application.

[...] XML processors MUST provide applications with the name and external identifier(s) of any notation declared and referred to in an attribute value, attribute definition, or entity declaration

Entity expansion in attribute values is not implemented

Entity expansion is recursive in both element content and attribute values, but the JavaScript-only implementations I checked didn't implement it correctly.

sax and its fork saxes just blindly append the entity's replacement text (do they even consider it replacement text?) which is incorrect in both element content and attribute values (less noticeable in attribute values):

isaacs/sax-js/lib/sax.js#L919

lddubeau/saxes/src/saxes.ts#L2675

They assume entity content is only character data, but it can actually contain markup following the content production of the XML 1.0 (Fifth edition) specification. And entities can reference other entities.

At the moment, entities in element content call the entityRef handler, while attribute values should call resolveEntityRef to get the entity's replacement text and follow the proper procedure to append it to the normalized attribute value.

Verify this setup covers all use cases before implementing it.

Internal DTDs are not parsed or checked for well-formedness

DTDs have features that are difficult to implement correctly, like parameter entities and conditional sections.

Providing APIs to handle those, or at least checking for well-formedness, would require too much code and effort and is unlikely to be useful to most users.

So, at the cost of some conformance to the XML specification, internal DTDs are not supported and are just ignored.

DOCTYPE is not properly checked for well-formedness

Even excluding internal DTDs, the DOCTYPE's external identifier is not parsed or checked for well-formedness.

doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'

The parser should treat the intSubset as a black box, but still parse the ExternalID and check it for well-formedness.
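As a rough sketch of what checking the ExternalID could look like, the helper below follows the ExternalID, SystemLiteral and PubidLiteral productions of XML 1.0. `parseExternalId` and its return shape are hypothetical and only illustrate the grammar. It expects the input to start right after the white space that follows the document type name.

```javascript
// Building blocks taken from the XML 1.0 grammar.
const S = `[\\x20\\t\\r\\n]+`;
const SYSTEM_LITERAL = `("[^"]*"|'[^']*')`;
// PubidChar allows a restricted ASCII set; the apostrophe is only legal
// inside a double-quoted literal.
const PUBID_LITERAL =
  `("[-\\x20\\r\\na-zA-Z0-9'()+,./:=?;!*#@$_%]*"|'[-\\x20\\r\\na-zA-Z0-9()+,./:=?;!*#@$_%]*')`;
const EXTERNAL_ID = new RegExp(
  `^(?:SYSTEM${S}${SYSTEM_LITERAL}|PUBLIC${S}${PUBID_LITERAL}${S}${SYSTEM_LITERAL})`,
);

// Hypothetical helper: returns the identifiers and the number of
// characters consumed, or null when the ExternalID is not well-formed.
function parseExternalId(input) {
  const match = EXTERNAL_ID.exec(input);
  if (match === null) return null;
  const unquote = (literal) => literal.slice(1, -1);
  return match[1] !== undefined
    ? { publicId: null, systemId: unquote(match[1]), length: match[0].length }
    : { publicId: unquote(match[2]), systemId: unquote(match[3]), length: match[0].length };
}
```

The `length` field lets a caller continue scanning from the end of the identifier, for example to look for the optional `[` that opens the internal subset.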

Drop parameter entity support in EntityValue

Current behavior in the dtd branch is not in spec.

[WFC: PEs in Internal Subset]

In the internal DTD subset, parameter-entity references MUST NOT occur within markup declarations; they may occur where markup declarations can occur. (This does not apply to references that occur in external parameter entities or to the external subset.)
