Lightweight and efficient SAX parser for JavaScript (~6.6KB minified and gzipped).
- Complete XML standard conformance
- Simple and terse API
- Reduced code footprint
- Set a base for other standards built on XML, e.g. XHTML
import {SaxParser} from "saxe";
let textContent = "";
const parser = new SaxParser({
  start(name, attributes) {
    // element start tag
  },
  empty(name, attributes) {
    // empty element
  },
  end(name, attributes) {
    // element end tag
  },
  text(text) {
    textContent += text;
  },
});
for (const chunk of INPUT_STREAM) {
  parser.write(chunk);
}
parser.end();
Basic XML parsing is supported on any ES2017 runtime. Older runtimes can still run saxe after transpilation and polyfilling any missing functionality.
Encoding support requires TextDecoder; most runtimes support it natively, but it may be polyfilled.
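Whether a runtime provides TextDecoder can be checked with a plain feature test. The sketch below uses only the standard TextDecoder API and also shows incremental decoding of binary chunks; `hasTextDecoder` and `decodeChunks` are illustrative names and are not part of saxe.

```javascript
// Illustrative helper (not part of saxe): feature-test the TextDecoder global.
function hasTextDecoder() {
  return typeof TextDecoder === "function";
}

// Illustrative helper: decode Uint8Array chunks incrementally, so that a
// multi-byte sequence split across two chunks is still decoded correctly.
function decodeChunks(chunks, label = "utf-8") {
  const decoder = new TextDecoder(label);
  let text = "";
  for (const chunk of chunks) {
    text += decoder.decode(chunk, {stream: true});
  }
  // A final call without `stream` flushes any bytes still buffered.
  return text + decoder.decode();
}

// "é" is 0xC3 0xA9 in UTF-8; here it is split across two chunks.
const chunks = [
  new Uint8Array([0x3c, 0x61, 0x3e, 0xc3]), // "<a>" and the first byte of "é"
  new Uint8Array([0xa9, 0x3c, 0x2f, 0x61, 0x3e]), // second byte of "é" and "</a>"
];
const decodedText = hasTextDecoder() ? decodeChunks(chunks) : null;
```

If the feature test fails, a polyfill would need to be installed before any encoding-related functionality is used.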
Most[1] JavaScript XML parsers either skip Document Type Declarations (DTD) without even checking them for well-formedness, or ignore most of the declarations they contain.
Internal DTD subset parsing is required even for non-validating[2] processors, so most JavaScript implementations are not compliant. Even if one were to parse the internal DTD manually and provide the entity values to isaacs/sax-js or lddubeau/saxes, proper entity expansion could not be replicated. This is fine where DTDs are prohibited or explicitly ignored, but it is incorrect for any other protocol or format.
This parser checks the whole internal DTD subset for well-formedness and recognizes ATTLIST and ENTITY declarations. Attributes declared in the internal subset are normalized appropriately and entities are expanded correctly. This process has security implications; if the default behavior is undesirable, it may be configured.
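To illustrate why ATTLIST declarations matter even without validation, here is a simplified sketch of the whitespace part of attribute-value normalization from XML 1.0 § 3.3.3. It assumes the value contains no character or entity references, and `normalizeAttributeValue` is a hypothetical helper, not saxe's implementation.

```javascript
// Simplified sketch of XML 1.0 § 3.3.3 attribute-value normalization.
// Assumption: `value` contains no character or entity references, so only
// the whitespace rules are shown. This is not saxe's implementation.
function normalizeAttributeValue(value, declaredType = "CDATA") {
  // Line ends are normalized to a single line feed before anything else.
  let normalized = value.replace(/\r\n?/g, "\n");
  // Every literal tab or line feed becomes a space.
  normalized = normalized.replace(/[\t\n]/g, " ");
  if (declaredType !== "CDATA") {
    // Declared non-CDATA types (ID, NMTOKENS, ...) additionally collapse
    // runs of spaces and drop leading and trailing spaces.
    normalized = normalized.replace(/ +/g, " ").replace(/^ | $/g, "");
  }
  return normalized;
}

const raw = " a\t b\r\n";
const asCdata = normalizeAttributeValue(raw); // undeclared attributes are treated as CDATA
const asTokens = normalizeAttributeValue(raw, "NMTOKENS");
```

The same raw value yields different results depending on the declared type, which is why a parser that skips the internal subset can report different attribute values than a conforming one.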
External markup declarations and external entities are not required for non-validating[2] processors and are explicitly not supported.
XML allows documents to specify their encoding through the XML or Text Declarations.
<?xml version="1.0" encoding="UTF-8" ?>
Parsing XML from raw binary data in an unknown encoding is supported by the SaxDecoder class, which parses XML from Uint8Array chunks.
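The general idea of detecting an encoding from the document itself can be sketched as follows. This is not SaxDecoder's actual algorithm; `sniffXmlEncoding` is a hypothetical helper that only looks at a byte order mark and at an ASCII-compatible XML declaration, and it does not handle UTF-16 input without a byte order mark.

```javascript
// Illustrative sketch, not SaxDecoder's actual algorithm: guess the encoding
// of an XML byte stream from its byte order mark or its XML declaration.
function sniffXmlEncoding(bytes) {
  // Check for a byte order mark first.
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "utf-8";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";
  // Otherwise read the start of the document as single-byte characters and
  // look for an encoding pseudo-attribute in the XML declaration.
  const head = String.fromCharCode(...bytes.subarray(0, 256));
  const match =
    /^<\?xml\s[^>]*?encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/.exec(head);
  // Without a byte order mark or an encoding declaration, assume UTF-8.
  return match ? match[2].toLowerCase() : "utf-8";
}

const encoder = new TextEncoder();
const declared = sniffXmlEncoding(
  encoder.encode('<?xml version="1.0" encoding="ISO-8859-1"?><a/>'),
);
const withBom = sniffXmlEncoding(new Uint8Array([0xff, 0xfe, 0x3c, 0x00]));
const undeclared = sniffXmlEncoding(encoder.encode("<a/>"));
```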
Do not use SaxDecoder when encoding information is provided externally, for example by a Content-Type MIME type or by another specification (EPUB, for instance, specifies that all XML files MUST be UTF-8).
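When the encoding is known from a Content-Type header, the bytes can be decoded up front with TextDecoder and the resulting string handed to the parser. The sketch below shows only the decoding step; `charsetFromContentType` and `decodeWithExternalEncoding` are hypothetical helpers, and the UTF-8 fallback is an assumption for illustration.

```javascript
// Illustrative helper (not part of saxe): extract the charset parameter
// from a Content-Type header value, or return null if there is none.
function charsetFromContentType(contentType) {
  const match = /;\s*charset\s*=\s*("?)([^";\s]+)\1/i.exec(contentType);
  return match ? match[2].toLowerCase() : null;
}

// Illustrative helper: decode a complete body with the externally declared
// encoding. The UTF-8 fallback is an assumption made for this sketch.
function decodeWithExternalEncoding(bytes, contentType, fallback = "utf-8") {
  const label = charsetFromContentType(contentType) || fallback;
  return new TextDecoder(label).decode(bytes);
}

// "<a>é</a>" encoded as UTF-16LE (two bytes per character, low byte first).
const body = new Uint8Array([
  0x3c, 0x00, 0x61, 0x00, 0x3e, 0x00, 0xe9, 0x00,
  0x3c, 0x00, 0x2f, 0x00, 0x61, 0x00, 0x3e, 0x00,
]);
const xmlText = decodeWithExternalEncoding(body, "application/xml; charset=UTF-16LE");
```

The decoded string can then be passed to the write method of a SaxParser, as in the usage example at the top.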
SaxDecoder uses TextDecoder, so it supports all encodings defined by the Encoding Standard.
A polyfill may implement only a subset of the encodings listed in the Encoding Standard § 4. Encodings. For full compliance, ensure that at least UTF-8 and UTF-16 are supported, as they are required by the XML standard.
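One way to check what a native or polyfilled TextDecoder supports is to probe it with encoding labels, since the constructor throws a RangeError for labels it does not recognize. `supportsEncoding` is an illustrative helper, not part of saxe.

```javascript
// Illustrative helper (not part of saxe): probe whether the runtime's
// TextDecoder recognizes an encoding label.
function supportsEncoding(label) {
  try {
    new TextDecoder(label);
    return true;
  } catch (error) {
    // An unrecognized label is reported as a RangeError.
    if (error instanceof RangeError) return false;
    throw error;
  }
}

// The two encodings the XML standard requires every processor to support.
const required = ["utf-8", "utf-16"];
const missing = required.filter((label) => !supportsEncoding(label));
```

An empty `missing` array suggests the required encodings are available; a polyfill that deviates from the Encoding Standard may need a different check.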
Notes:
- If a document specifies an unknown or unsupported encoding, a SaxError with code ENCODING_NOT_SUPPORTED is thrown.
- If a document contains data which is invalid for the declared encoding, a SaxError with code ENCODING_INVALID_DATA is thrown.
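The two failure modes above correspond to two distinct failures of the underlying TextDecoder. The sketch below reproduces them with a stand-in error class so that it runs without saxe; `StandInSaxError`, `decodeStrict` and `codeOf` are hypothetical names, and how saxe itself maps these failures internally is not shown in this document.

```javascript
// Stand-in for illustration only; this is not saxe's SaxError class.
class StandInSaxError extends Error {
  constructor(code, message) {
    super(message);
    this.code = code;
  }
}

// Hypothetical helper: decode strictly and report failures through a code.
function decodeStrict(bytes, label) {
  let decoder;
  try {
    // `fatal: true` makes malformed input throw instead of yielding U+FFFD.
    decoder = new TextDecoder(label, {fatal: true});
  } catch (error) {
    throw new StandInSaxError("ENCODING_NOT_SUPPORTED", `unsupported encoding: ${label}`);
  }
  try {
    return decoder.decode(bytes);
  } catch (error) {
    throw new StandInSaxError("ENCODING_INVALID_DATA", `invalid data for ${label}`);
  }
}

// Callers can branch on the error code rather than on the message text.
function codeOf(action) {
  try {
    action();
    return null;
  } catch (error) {
    return error.code;
  }
}

const unsupportedCode = codeOf(() => decodeStrict(new Uint8Array(), "no-such-encoding"));
const invalidCode = codeOf(() => decodeStrict(new Uint8Array([0x3c, 0xff, 0x3e]), "utf-8"));
```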
XML parsers may be subject to a number of vulnerabilities; the most common attacks exploit external entity resolution and entity expansion.
This parser is strictly non-validating, so by design it should not be vulnerable to any XXE[3]-based attack. Additionally, the length of strings collected during parsing is capped to limit the efficacy of other denial-of-service attacks[4].
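To show why a length cap blunts entity expansion attacks, here is a minimal sketch of capped entity expansion. It is not saxe's implementation: `expandEntities` and its `maxLength` parameter are hypothetical, and predefined entities and character references are left out for brevity.

```javascript
// Illustrative sketch, not saxe's implementation: expand general entity
// references while capping the total length of the expanded text.
function expandEntities(text, entities, maxLength) {
  const active = new Set(); // entities currently being expanded
  const pieces = [];
  let length = 0;

  function emit(piece) {
    length += piece.length;
    if (length > maxLength) {
      throw new RangeError(`expanded text exceeds ${maxLength} characters`);
    }
    pieces.push(piece);
  }

  function expand(input) {
    let last = 0;
    for (const match of input.matchAll(/&([A-Za-z_][\w.-]*);/g)) {
      emit(input.slice(last, match.index));
      const name = match[1];
      if (!entities.has(name)) throw new Error(`undeclared entity: ${name}`);
      // An entity must not reference itself, directly or indirectly.
      if (active.has(name)) throw new Error(`recursive entity: ${name}`);
      active.add(name);
      expand(entities.get(name));
      active.delete(name);
      last = match.index + match[0].length;
    }
    emit(input.slice(last));
  }

  expand(text);
  return pieces.join("");
}

// "Billion laughs" style declarations: each level references the previous
// level ten times, so the expanded size grows tenfold per level.
const entities = new Map([["lol0", "lol"]]);
for (let level = 1; level <= 5; level++) {
  entities.set(`lol${level}`, `&lol${level - 1};`.repeat(10));
}

const small = expandEntities("&lol1;", entities, 1000);
let rejected = false;
try {
  expandEntities("&lol5;", entities, 1000);
} catch (error) {
  rejected = error instanceof RangeError;
}
```

Because the cap is checked as text is emitted, the oversized expansion is abandoned early instead of being fully materialized in memory.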
Document Type Declaration processing may optionally be disabled altogether to prevent any attack based on DTDs.
// Doctype declarations will be rejected
// Alternatively, set to "ignore" to allow them but prevent
// them from affecting further parsing
new SaxParser(reader, {dtd: "prohibit"})
Known XML bombs are covered by the regular integration tests, and the parser is fuzz tested regularly. Even so, for very sensitive or security-oriented applications you may want to conduct your own security audit.
Copyright 2024 Federico Carboni
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Footnotes
- [1] Other JavaScript XML parsers inspected include isaacs/sax-js, NaturalIntelligence/fast-xml-parser and lddubeau/saxes.
- [2] Non-validating XML processors (parsers) do not validate documents, but must still recognize and report well-formedness (syntax) errors. Non-validating processors are not required to fetch and parse external markup declarations and external entities. XML Standard § 5.1 Validating and Non-Validating Processors.
- [4] XML Denial of Service Attacks and Defenses | Microsoft Learn.