anglesharp / anglesharp.xml

:angel: Library to add XML and DTD parsing capabilities to AngleSharp.

Home Page: https://anglesharp.github.io

License: MIT License

C# 99.40% PowerShell 0.31% Shell 0.24% TypeScript 0.02% Batchfile 0.02%
anglesharp dom xml parser c-sharp library

anglesharp.xml's Introduction

AngleSharp

AngleSharp is a .NET library that gives you the ability to parse angle bracket based hyper-texts like HTML, SVG, and MathML. XML without validation is also supported by the library. An important aspect of AngleSharp is that CSS can also be parsed. The included parser is built upon the official W3C specification. This produces a perfectly portable HTML5 DOM representation of the given source code and ensures compatibility with results in evergreen browsers. Also standard DOM features such as querySelector or querySelectorAll work for tree traversal.

⚡⚡ Migrating from AngleSharp 0.9 to AngleSharp 0.10 or later (incl. 1.0)? Look at our migration documentation. ⚡⚡

Key Features

  • Portable (using .NET Standard 2.0)
  • Standards-conformant (works exactly like evergreen browsers)
  • Great performance (outperforms similar parsers in most scenarios)
  • Extensible (extend with your own services)
  • Useful abstractions (type helpers, jQuery like construction)
  • Fully functional DOM (all the lists, iterators, and events you know)
  • Form submission (easily log in everywhere)
  • Navigation (a BrowsingContext is like a browser tab - control it from .NET!).
  • LINQ enhanced (use LINQ with DOM elements, naturally without wrappers)

The advantage over similar libraries like HtmlAgilityPack is that the exposed DOM uses the official W3C specified API, so even things like querySelectorAll are available in AngleSharp. The parser also follows the HTML 5.1 specification, which defines error handling and element correction. The AngleSharp library focuses on standards compliance, interactivity, and extensibility. It therefore gives web developers working with C# all the possibilities they know from using the DOM in any modern browser.

The performance of AngleSharp is quite close to the performance of browsers. Even very large pages can be processed within milliseconds. AngleSharp tries to minimize memory allocations and reuses elements internally to avoid unnecessary object creation.

Simple Demo

The simple example will use the website of Wikipedia for data retrieval.

using System.Linq;
using AngleSharp;

var config = Configuration.Default.WithDefaultLoader();
var address = "https://en.wikipedia.org/wiki/List_of_The_Big_Bang_Theory_episodes";
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(address);
var cellSelector = "tr.vevent td:nth-child(3)";
var cells = document.QuerySelectorAll(cellSelector);
var titles = cells.Select(m => m.TextContent);

Or the same with explicit types:

using System.Collections.Generic;
using System.Linq;
using AngleSharp;
using AngleSharp.Dom;

IConfiguration config = Configuration.Default.WithDefaultLoader();
string address = "https://en.wikipedia.org/wiki/List_of_The_Big_Bang_Theory_episodes";
IBrowsingContext context = BrowsingContext.New(config);
IDocument document = await context.OpenAsync(address);
string cellSelector = "tr.vevent td:nth-child(3)";
IHtmlCollection<IElement> cells = document.QuerySelectorAll(cellSelector);
IEnumerable<string> titles = cells.Select(m => m.TextContent);

In the example we see:

  • How to set up the configuration to support document loading
  • How to asynchronously load the document in a new context using that configuration
  • How to run a query to get all cells with the content of interest
  • That the whole DOM supports LINQ queries

Every collection in AngleSharp supports LINQ statements. AngleSharp also provides many useful extension methods for element collections that cannot be found in the official DOM.
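As a hedged illustration of the LINQ support described above, the following sketch parses a small in-memory table instead of loading Wikipedia, so it needs no network access. The markup and the extra Where filter are made up for this example; HtmlParser, QuerySelectorAll, and TextContent are the same AngleSharp APIs used in the demo.

```csharp
using System;
using System.Linq;
using AngleSharp.Html.Parser;

// Parse an in-memory fragment (illustrative markup, not taken from the README).
var parser = new HtmlParser();
var document = parser.ParseDocument(
    "<table><tr class='vevent'><td>1</td><td>2</td><td>Pilot</td></tr>" +
    "<tr><td>x</td><td>y</td><td>ignored</td></tr></table>");

// QuerySelectorAll returns an enumerable collection of elements,
// so the standard LINQ operators apply directly, without wrappers.
var titles = document
    .QuerySelectorAll("tr.vevent td:nth-child(3)")
    .Select(cell => cell.TextContent.Trim())
    .Where(text => text.Length > 0)
    .ToList();

Console.WriteLine(string.Join(", ", titles));
```

Only the row carrying the vevent class matches the selector, so the second row is skipped before LINQ ever sees it.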

Supported Platforms

AngleSharp has been created as a .NET Standard 2.0 compatible library. This includes, but is not limited to:

  • .NET Core (2.0 and later)
  • .NET Framework (4.6.2 and later)
  • Xamarin.Android (7.0 and 8.0)
  • Xamarin.iOS (10.0 and 10.14)
  • Xamarin.Mac (3.0 and 3.8)
  • Mono (4.6 and 5.4)
  • UWP (10.0 and 10.0.16299)
  • Unity (2018.1)

Documentation

The documentation of AngleSharp is located in the docs folder. More examples, best-practices, and general information can be found there. The documentation also contains a list of frequently asked questions.

More information is also available by following some of the links mentioned in the Wiki. In-depth articles will be published on CodeProject, with links placed in the Wiki on GitHub.

Use-Cases

  • Parsing HTML (incl. fragments)
  • Parsing CSS (incl. selectors, declarations, ...)
  • Constructing HTML (e.g., view-engine)
  • Minifying CSS, HTML, ...
  • Querying document elements
  • Crawling information
  • Gathering statistics
  • Web automation
  • Tools with HTML / CSS / ... support
  • Connection to page analytics
  • HTML / DOM unit tests
  • Automated JavaScript interaction
  • Testing other concepts, e.g., script engines
  • ...

Vision

The project aims to bring a solid implementation of the W3C DOM for HTML, SVG, MathML, and CSS to the CLR - all written in C#. The idea is that you can basically do everything with the DOM in C# that you can do in JavaScript (plus, of course, more).

Most parts of the DOM are included, even though some may still miss their (fully specified / correct) implementation. The goal for v1.0 is to have all practically relevant parts implemented according to the official W3C specification (with useful extensions by the WHATWG).

The API is close to the DOM4 specification; however, the naming has been adjusted to comply with .NET conventions. Nevertheless, to make AngleSharp really useful for, e.g., a JavaScript engine, attributes have been placed on the corresponding interfaces (and methods, properties, ...) to indicate the status of the field in the official specification. This allows automatic generation of DOM objects with the official API.

This is a long-term project which will eventually result in a state of the art parser for the most important angle bracket based hyper-texts.

Our hope is to build a community around web parsing and libraries from this project. So far we have had great contributions, but that goal has not been fully achieved. Want to help? Get in touch with us!

Participating in the Project

If you know some feature that AngleSharp is currently missing, and you are willing to implement the feature, then your contribution is more than welcome! Also if you have a really cool idea - do not be shy, we'd like to hear it.

If you have an idea how to improve the API (or what is missing) then posts / messages are also welcome. For instance there have been ongoing discussions about some styles that have been used by AngleSharp (e.g., HTMLDocument or HtmlDocument) in the past. In the end AngleSharp stopped using HTMLDocument (at least visible outside of the library). Now AngleSharp uses names like IDocument, IHtmlElement and so on. This change would not have been possible without such fruitful discussions.

The project is always searching for additional contributors. Even if you do not have any code to contribute, an idea for improvement, a bug report, or a note about a mistake in the documentation is welcome. These are the contributions that keep this project active.

Live discussions can take place in our Gitter chat, which supports using GitHub accounts.

More information is found in the contribution guidelines. All contributors can be found in the CONTRIBUTORS file.

This project has also adopted the code of conduct defined by the Contributor Covenant to clarify expected behavior in our community.

For more information see the .NET Foundation Code of Conduct.

Funding / Support

If you use AngleSharp frequently, but do not have the time to support the project by active participation, you may still be interested in ensuring that the AngleSharp project keeps the lights on.

Therefore we created a backing model via Bountysource. Any donation is welcome and much appreciated. We will mostly spend the money on dedicated development time to improve AngleSharp where it needs to be improved, plus invest in the web utility eco-system in .NET (e.g., in JavaScript engines, other parsers, or a renderer for AngleSharp to mention some outstanding projects).

Visit Bountysource for more details.

Development

AngleSharp is written in the most recent version of C# and thus requires Roslyn as a compiler. Using an IDE like Visual Studio 2019+ is recommended on Windows. Alternatively, VSCode (with OmniSharp or another suitable Language Server Protocol implementation) should be the tool of choice on other platforms.

The code tries to be as clean as possible. Notably the following rules are used:

  • Use braces for any conditional / loop body
  • Use the -Async suffixed methods when available
  • Use VIP ("Var If Possible") style (in C++ called AAA: Almost Always Auto) to place types on the right

More important, however, is the proper usage of tests. Any new feature should come with a set of tests to cover the functionality and prevent regression.

Changelog

A very detailed changelog exists. If you are just interested in major releases then have a look at the GitHub releases.

.NET Foundation

This project is supported by the .NET Foundation.

License

AngleSharp is released using the MIT license. For more information see the license file.

anglesharp.xml's People

Contributors

florianrappl, jbrayfaithlife, kasthack

anglesharp.xml's Issues

Cloning self-closing XML element doesn't clone SelfClosing flag

Bug Report

Prerequisites

  • Can you reproduce the problem in a MWE?
  • Are you running the latest version of AngleSharp?
  • Did you check the FAQs to see if that helps you?
  • Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
  • Did you perform a search in the issues?

For more information, see the CONTRIBUTING guide.

Description

When using the .Clone method on a self-closing XML element I expect the clone to also carry the self-closing flag.

Steps to Reproduce

Here's a test case showing the behavior for a cloned <br /> element:

[TestMethod]
public void TestCloneSelfClosing()
{
    var config = Configuration.Default.With(HtmlEntityProvider.Resolver).WithDefaultLoader(new LoaderOptions { IsResourceLoadingEnabled = true }).WithCss().WithXml();
    var context = BrowsingContext.New(config);

    var xml = @"<xml><div><br /></div></xml>";

    var xmlParser = new XmlParser(new XmlParserOptions(), context);
    var xmlDoc = xmlParser.ParseDocument(xml);

    var div = xmlDoc.All.Where(e => e.NodeName == "div").Single();
    Assert.AreEqual(NodeFlags.SelfClosing, div.FirstElementChild!.Flags);
    var clonedDiv = div.Clone(true) as IElement;
    Assert.AreEqual(NodeFlags.SelfClosing, clonedDiv!.FirstElementChild!.Flags); // <---- fails here
}

Expected behavior: I expect the test to succeed in the last line.

Actual behavior: The test fails since the Flag is None rather than SelfClosing.

Environment details: Windows 11, .NET 6.0

Request for Support / Sponsorship

Over the years this project has had great contributors and sponsors. The last year has shown that dedicated support (e.g., as provided by AWS and JetBrains) is crucial for allocating time to maintain the project and move it forward. I'd like to continue in this mode: not just occasionally carving out some of my spare time, but actually having dedicated time slots.

So with this sticky note I am calling for support of this project. It would be really wonderful, and there are still some plans for a potential v2 that would benefit a lot from additional support (as would its ecosystem, esp. CSS / JS / ...).

Background

In the past we already had some great sponsors who brought this project forward.

By far the largest contribution came from AWS.

Closely behind (and still active) is the sponsorship from JetBrains.

Other companies have sponsored the project as well, and so have individuals (much appreciated!) who have been very gracious.

🙏 thanks to everyone!

Also to be clear; AngleSharp and all associated libraries will always be free and MIT licensed.

There is no consequence from no sponsors coming in - except that my time available for the project will definitely be less as compared to having some sponsorship.

One final remark: While GitHub sponsorships are potentially the best way of supporting the project there are also other ways, e.g., getting in touch regarding consulting on AngleSharp best practices if you use it or directly getting in touch regarding development of specific features.

(Original post / issue available at AngleSharp/AngleSharp#1163)

Parser ignoring IsKeepingSourceReferences

Bug Report

Prerequisites

  • Can you reproduce the problem in a MWE?
  • Are you running the latest version of AngleSharp?
  • Did you check the FAQs to see if that helps you?
  • Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
  • Did you perform a search in the issues?

Description

Running the parser with XmlParserOptions() { IsKeepingSourceReferences = true, } doesn't actually keep the source references. I looked into the source code: the HtmlParser checks IsKeepingSourceReferences and assigns the source reference correctly, but the XML parser never does, so IElement.SourceReference is always null.

Steps to Reproduce

  1. Create an XmlParser with XmlParserOptions() { IsKeepingSourceReferences = true, }
  2. Run Parse

Expected behavior: IElements should have SourceReference set

Actual behavior: SourceReference is always null

Environment details: Linux, .NET 6
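The reproduction steps can be sketched roughly as follows. This is a minimal sketch, not the reporter's code: the sample markup is made up, and ParseDocument is assumed as the parse call because the other reports on this page use it.

```csharp
using System;
using AngleSharp.Xml.Parser;

// Ask the XML parser to keep source references, as in step 1 of the report.
var options = new XmlParserOptions { IsKeepingSourceReferences = true };
var parser = new XmlParser(options);

// Illustrative input; any well-formed document should do.
var document = parser.ParseDocument("<root><child /></root>");

// According to the report, this stays null for XML documents.
var reference = document.DocumentElement.SourceReference;
Console.WriteLine(reference is null
    ? "SourceReference is null"
    : "SourceReference is set");
```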

Possible Solution

Look into how the HtmlDomBuilder handles it and possibly just copy that approach. (I don't know much about the repo, so feel free to ignore this suggestion.)

Order of declaring namespace for attributes and using said namespace should not matter.

Bug Report

According to the XML namespaces spec (https://www.w3.org/TR/2006/REC-xml-names11-20060816/#sec-namespaces):

The namespace prefix, unless it is xml or xmlns, must have been declared in a namespace declaration attribute in either the start-tag of the element where the prefix is used or in an ancestor element (i.e. an element in whose content the prefixed markup occurs). Furthermore, the attribute value in the innermost such declaration must not be an empty string.

Though the second is admittedly harder to read, both of these declarations should be valid uses of the prefix:
<div xmlns:epub="http://www.idpf.org/2007/ops" epub:type="footnote">Test</div>
<div epub:type="footnote" xmlns:epub="http://www.idpf.org/2007/ops">Test</div>

Unfortunately, the parser processes attributes in the order they are declared, so the first example resolves to the expected namespace URI, but the second one does not.

Prerequisites

  • [/] Can you reproduce the problem in a MWE?
  • [/] Are you running the latest version of AngleSharp?
  • [/] Did you check the FAQs to see if that helps you?
  • [/] Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
  • [/] Did you perform a search in the issues?

For more information, see the CONTRIBUTING guide.

Description

Namespace declarations need to be parsed before other attributes on an element.

Steps to Reproduce

var document = new XmlParser().ParseDocument(@"<xml xmlns:epub=""http://www.idpf.org/2007/ops"" epub:type=""noteref"">1</xml>");
var root = document.DocumentElement;
root.Attributes.First(att => att.LocalName == "type").NamespaceUri.Dump();

document = new XmlParser().ParseDocument(@"<xml epub:type=""noteref"" xmlns:epub=""http://www.idpf.org/2007/ops"" >1</xml>");
root = document.DocumentElement;
root.Attributes.First(att => att.LocalName == "type").NamespaceUri.Dump();

Expected behavior: both Dump() calls should print out http://www.idpf.org/2007/ops.

Actual behavior: the first call to Dump() outputs the correct uri, the second outputs null.

Environment details: Win 10 .NET 6.0.15

Possible Solution

There are two approaches that could be taken, both around

for (var i = 0; i < tagToken.Attributes.Count; i++)
{
    var attr = tagToken.Attributes[i];
    var item = CreateAttribute(attr.Key, attr.Value.Trim());
    element.AddAttribute(item);
}

First, we could make sure to process any namespace declarations before any other attributes, which seems like the simplest approach. I have a PR to this effect that I will put up for your review.

Second, we could do a second run through the created attributes, double checking the namespaces after all the attributes have been processed.
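To make the first approach more concrete, here is a minimal standalone sketch of "namespace declarations first" ordering. It does not use AngleSharp's internal token types: the attribute list is a hypothetical stand-in of plain name/value pairs, and IsNamespaceDeclaration is a helper invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the tokenizer's attribute list (name/value pairs),
// in the problematic source order: prefixed attribute before its declaration.
var attributes = new List<KeyValuePair<string, string>>
{
    new("epub:type", "noteref"),
    new("xmlns:epub", "http://www.idpf.org/2007/ops"),
};

// True for "xmlns" (default namespace) and "xmlns:prefix" declarations.
static bool IsNamespaceDeclaration(string name) =>
    name == "xmlns" || name.StartsWith("xmlns:", StringComparison.Ordinal);

// OrderBy is a stable sort, so attributes keep their relative source order
// within each group; the declarations simply move to the front.
var ordered = attributes
    .OrderBy(attr => IsNamespaceDeclaration(attr.Key) ? 0 : 1)
    .ToList();

foreach (var attr in ordered)
{
    Console.WriteLine($"{attr.Key} = {attr.Value}");
}
```

One trade-off of this sketch is that it changes the order in which attributes are added. An actual fix may prefer to register the declarations first while still adding the attributes in their original order.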

Poor invalid entity reference handling in attributes

Bug Report

Prerequisites

  • Can you reproduce the problem in a MWE?
  • Are you running the latest version of AngleSharp?
  • Did you check the FAQs to see if that helps you?
  • Are you reporting to the correct repository? (if it's an issue with the core library, please report to AngleSharp directly)
  • Did you perform a search in the issues?

For more information, see the CONTRIBUTING guide.

Description

See the title: invalid entity references in attribute values are handled poorly.

Steps to Reproduce

  1. Try to parse <foo bar="&#34" baz="123"/> with IsSuppressingErrors = true

Expected behavior:

  • Tag gets parsed
  • bar attribute contains " or at least &#34

Actual behavior:

XmlParserException gets thrown

Possible Solution

PR #8
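The reproduction step for this report can be sketched roughly as below. It is a sketch under assumptions: it catches the general Exception type rather than naming a specific parser exception class, and it uses ParseDocument as in the other reports on this page.

```csharp
using System;
using AngleSharp.Xml.Parser;

// Enable error suppression, as described in the reproduction step.
var parser = new XmlParser(new XmlParserOptions { IsSuppressingErrors = true });

try
{
    // "&#34" lacks the terminating semicolon, so it is an invalid reference.
    var document = parser.ParseDocument("<foo bar=\"&#34\" baz=\"123\"/>");
    Console.WriteLine(document.DocumentElement.GetAttribute("bar"));
}
catch (Exception ex)
{
    // The report observes an exception here despite error suppression.
    Console.WriteLine(ex.GetType().Name + ": " + ex.Message);
}
```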

XmlMarkupFormatter and self-closing tags

I can't see a way to choose self-closing tags.

The default format of many partially-human readable xml files e.g. csproj files, is to use self-closing tags. If you round trip a file through XmlMarkupFormatter then they will become uglified.

I would appreciate an option to choose self-closing but ideally I would support changing the default to be self-closing because it produces shorter and more readable output and I have no use-case for the verbose form.

Here is my current work-around to wrap the XmlMarkupFormatter:

public class SelfClosingXmlMarkupFormatter : IMarkupFormatter
{
    public string Text(ICharacterData text) =>
        XmlMarkupFormatter.Instance.Text(text);

    public string Comment(IComment comment) =>
        XmlMarkupFormatter.Instance.Comment(comment);

    public string Processing(IProcessingInstruction processing) =>
        XmlMarkupFormatter.Instance.Processing(processing);

    public string Doctype(IDocumentType doctype) =>
        XmlMarkupFormatter.Instance.Doctype(doctype);

    public string OpenTag(IElement element, bool selfClosing) =>
        XmlMarkupFormatter.Instance.OpenTag(element, !(element.HasChildNodes || element.HasTextNodes()));

    public string CloseTag(IElement element, bool selfClosing) =>
        XmlMarkupFormatter.Instance.CloseTag(element, !(element.HasChildNodes || element.HasTextNodes()));

    public string Attribute(IAttr attribute) =>
        XmlMarkupFormatter.Instance.Attribute(attribute);
}

XmlParser InnerHtml property not working

When I set the InnerHtml property on an XML element, it has no effect.

Example code:

var xmlParser = new XmlParser();
var xmlDocument = xmlParser.Parse("<div></div>");
var querySelector = xmlDocument.QuerySelector("div");

querySelector.InnerHtml = "<a></a>";

TestContext.WriteLine(xmlDocument.ToHtml());

Result: <div></div>

`WithXml` Should Be In AngleSharp Namespace

The WithXml extension method is currently in AngleSharp.Xml, however, for visibility it should be in AngleSharp (like, e.g., WithCss or WithJs).

Note: this is a breaking change.
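For context, this is roughly what the current arrangement means for callers: the extra using directive is needed before WithXml becomes visible. A minimal sketch, assuming the extension is still declared in AngleSharp.Xml as described in this issue:

```csharp
using System;
using AngleSharp;
using AngleSharp.Xml; // WithXml is declared here at the time of this issue

// Without the AngleSharp.Xml using directive above, WithXml would not resolve.
var config = Configuration.Default.WithXml();
var context = BrowsingContext.New(config);

Console.WriteLine(context is not null);
```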

Allow to suppress '<' validation for attribute values when error suppression is enabled

New Feature Proposal

Description

Allow opening bracket validation to be bypassed when XmlParserOptions.IsSuppressingErrors is set to true. This is just a single-line change.

Background

Parsing technically malformed XML elements that contain opening angle brackets in attribute values.

  • This is a somewhat known issue
  • '<' handling is buried deep in .NET's built-in System.Xml.XmlReader, so invalid documents can't be parsed without writing a substantial amount of code.
  • Correcting the input on the fly basically requires writing a custom tokenizer, which seems quite excessive for something as simple as ignoring an opening bracket that should have been escaped.
  • The documentation for XmlParserOptions.IsSuppressingErrors explicitly states that enabling this option may break the document, and it's disabled by default anyway, so changing this behavior won't be an unpleasant surprise for users.

Currently, trying to parse <condition value="a < b" /> raises an exception even if error suppression is enabled. The proposed change would allow it to be parsed successfully.
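The point about .NET's built-in XML stack can be illustrated with a short standalone sketch. It uses System.Xml.Linq only (no AngleSharp) and shows that the unescaped form is rejected while the escaped form parses. The variable names are made up for this example.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

var malformed = "<condition value=\"a < b\" />";
string outcome;

try
{
    XDocument.Parse(malformed);
    outcome = "parsed";
}
catch (XmlException ex)
{
    // '<' is not allowed unescaped inside an attribute value in well-formed XML.
    outcome = "rejected: " + ex.Message;
}

Console.WriteLine(outcome);

// The escaped form is well-formed and yields the intended value.
var escaped = XDocument.Parse("<condition value=\"a &lt; b\" />");
var value = escaped.Root!.Attribute("value")!.Value;
Console.WriteLine(value);
```

There is no switch on this code path to tolerate the stray bracket, which is why a lenient mode in the AngleSharp XML parser would be useful for such input.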

Xml prefixed attributes do not appropriately find namespace

Bug Report

During Xml parsing, attributes with an xml prefix ought to be associated with the xml namespace, even if such a namespace is not explicitly declared. According to https://www.w3.org/TR/REC-xml-names/#ns-decl :

The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.

Prerequisites

  • [/] Can you reproduce the problem in a MWE?
  • [/] Are you running the latest version of AngleSharp?
  • [/] Did you check the FAQs to see if that helps you?
  • [/] Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
  • [/] Did you perform a search in the issues?

For more information, see the CONTRIBUTING guide.

Description

Xml prefixed attributes ought to be associated with the Xml namespace even if it has not been explicitly declared.

Steps to Reproduce

var xmlParser = new XmlParser();
var doc = xmlParser.ParseDocument("<xml xml:lang=\"en\">Test</xml>");
using (var stringWriter = new StringWriter()){
	doc.ToHtml(stringWriter, new XhtmlMarkupFormatter());
	stringWriter.ToString().Dump();
}

Expected behavior: Output should be <xml xml:lang="en">Test</xml>

Actual behavior: Output is <xml lang="en">Test</xml>

Compare with the output to the following linqpad script

var xmlParser = new XmlParser();
var doc = xmlParser.ParseDocument("<xml xmlns:xml=\"http://www.w3.org/XML/1998/namespace\" xml:lang=\"en\">Test</xml>");
using (var stringWriter = new StringWriter()){
	doc.ToHtml(stringWriter, new XhtmlMarkupFormatter());
	stringWriter.ToString().Dump();
}
	

Output: <xml xmlns:="http://www.w3.org/XML/1998/namespace" xml:lang="en">Test</xml>

Environment details: Win 10; .NET 6.0.15

Possible Solution

In the XmlDomBuilder, the existing namespace lookup for prefixed attributes needs to be replaced with this:

if (prefix.Is(NamespaceNames.XmlPrefix))
{
    ns = NamespaceNames.XmlUri;
}
else if (!prefix.Is(NamespaceNames.XmlNsPrefix))
{
    ns = CurrentNode.LookupNamespaceUri(prefix);
}

A PR can be made to this effect with a test by which I confirmed the bug and solution.

Latest devel branch: Time needed for deep cloning seems to scale exponentially

Bug Report

Prerequisites

  • Can you reproduce the problem in a MWE?
  • Are you running the latest version of AngleSharp? - WITH 8f3d6f5 applied
  • Did you check the FAQs to see if that helps you?
  • Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
  • Did you perform a search in the issues?

Description

It takes ages for deeply nested elements to be cloned.

This might be introduced by 8f3d6f5

Steps to Reproduce

Clone this (and don't ask...):

<p>
    <em>
        <em>
            <em>
                <em>
                    <span>
                        <em>
                            <em>
                                <span>
                                    <em>
                                        <em>
                                            <em>
                                                <em>
                                                    <em>
                                                        <em>
                                                            <em>
                                                                <span>
                                                                    <em>
                                                                        <em>
                                                                            <span>
                                                                                <em>
                                                                                    <em>
                                                                                        <em>
                                                                                            <em>
                                                                                                <em>
                                                                                                    <span>
                                                                                                        <em>
                                                                                                            <em>
                                                                                                                <span>Hi there</span>
                                                                                                            </em>
                                                                                                        </em>
                                                                                                    </span>
                                                                                                </em>
                                                                                            </em>
                                                                                        </em>
                                                                                    </em>
                                                                                </em>
                                                                            </span>
                                                                        </em>
                                                                    </em>
                                                                </span>
                                                            </em>
                                                        </em>
                                                    </em>
                                                </em>
                                            </em>
                                        </em>
                                    </em>
                                </span>
                            </em>
                        </em>
                    </span>
                </em>
            </em>
        </em>
    </em>
</p>

It takes 35 seconds. Take something deeper and it takes much longer.

Expected behavior: I expect it to be fast :)

Actual behavior: Sloooooooooooooooooooooooooooooooooow.

Possible Solution

It seems namespace resolution has something to do with it. Does it walk the parent hierarchy for each level in an inefficient way?

Random call stack when pressing pause in VS:

>	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 88	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 74	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 74	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 74	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 74	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 74	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 74	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 74	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 74	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.LocateNamespaceFor(AngleSharp.Dom.IElement element, string prefix) Line 56	C#
 	AngleSharp.dll!AngleSharp.Dom.ElementExtensions.GetNamespaceUri(AngleSharp.Dom.IElement element) Line 103	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 51	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Element.CloneElement(AngleSharp.Dom.Element element, AngleSharp.Dom.Document owner, bool deep) Line 637	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlElement.Clone(AngleSharp.Dom.Document owner, bool deep) Line 52	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.CloneNode(AngleSharp.Dom.Node target, AngleSharp.Dom.Document owner, bool deep) Line 855	C#
 	AngleSharp.dll!AngleSharp.Dom.Document.CloneDocument(AngleSharp.Dom.Document document, bool deep) Line 1535	C#
 	AngleSharp.Xml.dll!AngleSharp.Xml.Dom.XmlDocument.Clone(AngleSharp.Dom.Document owner, bool deep) Line 45	C#
 	AngleSharp.dll!AngleSharp.Dom.Node.Clone(bool deep) Line 593	C#
 	Contoso.Tests.dll!WikiTraccs.Tests.ToBeSortedTests.ToBeSortedTests.TestNestedStructure() Line 188	C#

Missing DTD in parsed document model

Bug Report

Description

I cannot see anything in the document model that matches the values defined in the DTD, nor do I see the DTD when round-tripping the XML. I was initially investigating self-closing tags and found Issue #11. From there, I took the example code to test with and confirmed it met my need. However, I noticed that my DTD wasn't being written out. As far as I can tell, the DTD isn't brought into the parsed document.

Steps to Reproduce

(Tested in LINQPad)

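// Requires the AngleSharp and AngleSharp.Xml NuGet packages.
using System;
using System.IO;
using AngleSharp;
using AngleSharp.Xml;
using AngleSharp.Xml.Parser;

var xmlData = @"<?xml version=""1.0"" encoding=""UTF-8""?>
<Project Sdk=""Microsoft.NET.Sdk"">
    <ItemGroup>
        <PackageReference Include=""AngleSharp"" Version=""0.12.1""></PackageReference>
        <PackageReference Include=""AngleSharp.Xml"" Version=""0.12.1"" />
        <PackageReference Include=""AngleSharp.XPath"" Version=""1.1.4"" />
    </ItemGroup>   
</Project>";
var xmlDoc = new XmlParser().ParseDocument(xmlData);

// Assumption: the report does not show how xmlFormatter was created.
// It is taken here to be the self-closing formatter discussed in Issue #11.
var xmlFormatter = new XmlMarkupFormatter { IsAlwaysSelfClosing = true };

using (var sw = new StringWriter())
{
    // Serialize the parsed document back to text and print it.
    xmlDoc.ToHtml(sw, xmlFormatter);
    Console.WriteLine(sw.ToString());
}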

Expected behavior:

Output to look similar to

<?xml version="1.0" encoding="UTF-8"?>
<Project Sdk="Microsoft.NET.Sdk">
    <ItemGroup>
        <PackageReference Include="AngleSharp" Version="0.12.1" />
        <PackageReference Include="AngleSharp.Xml" Version="0.12.1" />
        <PackageReference Include="AngleSharp.XPath" Version="1.1.4" />
    </ItemGroup>  
</Project>

Actual behavior:

Output is

<Project Sdk="Microsoft.NET.Sdk">
    <ItemGroup>
        <PackageReference Include="AngleSharp" Version="0.12.1" />
        <PackageReference Include="AngleSharp.Xml" Version="0.12.1" />
        <PackageReference Include="AngleSharp.XPath" Version="1.1.4" />
    </ItemGroup>  
</Project>

Environment details:

Windows 10
LINQPad 7
AngleSharp 1.0.5 via NuGet
AngleSharp.Xml 1.0.0 via NuGet

Possible Solution

Am I missing some options/techniques to force the correct parse?
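
Until the first line of the input is round-tripped, one possible workaround is to restore it on the serialized string. The sketch below is not part of AngleSharp.Xml: `XmlOutput` and `EnsureDeclaration` are hypothetical names, and it assumes the declaration text is known up front rather than recovered from the parsed document.

```csharp
using System;

public static class XmlOutput
{
    // Hypothetical helper: prepends an XML declaration unless one is already present.
    public static string EnsureDeclaration(
        string serialized,
        string declaration = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>")
    {
        // Matching "<?xml " with the trailing space avoids mistaking processing
        // instructions such as <?xml-stylesheet ...?> for a declaration.
        if (serialized.TrimStart().StartsWith("<?xml ", StringComparison.Ordinal))
        {
            return serialized;
        }

        return declaration + Environment.NewLine + serialized;
    }
}
```

In the snippet from the report, this would wrap the writer's content, for example `Console.WriteLine(XmlOutput.EnsureDeclaration(sw.ToString()));`.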

Invalid XML should not break parsing when IsSuppressingErrors = true

Bug Report

Prerequisites

  • [✓] Can you reproduce the problem in a MWE?
  • [✓] Are you running the latest version of AngleSharp?
  • [✓] Did you check the FAQs to see if that helps you?
  • [✓] Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
  • [✓] Did you perform a search in the issues?

For more information, see the CONTRIBUTING guide.

Description

When IsSuppressingErrors = true is set in XmlParserOptions, an exception is still thrown when parsing invalid XML.

The stack trace:

AngleSharp.Xml.Parser.XmlParseException: Error while parsing the provided XML document.
   at AngleSharp.Xml.Parser.XmlTokenizer.TagSelfClosing(XmlTagToken tag)
   at AngleSharp.Xml.Parser.XmlDomBuilder.ParseAsync(XmlParserOptions options, CancellationToken cancelToken)
   at AngleSharp.Xml.Parser.XmlParser.ParseAsync(XmlDocument document, CancellationToken cancel)

Steps to Reproduce

Given the following XML:

<P>
    <P>
        <FONT FACE="calibri" SIZE="14.666666666666666" COLOR="#000000"></FONT>
    </P>
    <P>
        <FONT FACE="calibri" SIZE="14.666666666666666" COLOR="#000000"></FONT>
    </P>
    <P>
        <FONT FACE="calibri" SIZE="14.666666666666666" COLOR="#0000ff">
            <U>
                <https://some.url.example.com></U>
            </FONT>
            <FONT FACE="calibri" SIZE="14.666666666666666" COLOR="#000000">
                <B></B>
            </FONT>
            <FONT FACE="calibri" SIZE="14.666666666666666" COLOR="#000000"></FONT>
        </P>
        <P>
            <FONT FACE="calibri" SIZE="14.666666666666666" COLOR="#000000"></FONT>
        </P>
        <P>
            <FONT FACE="calibri" SIZE="14.666666666666666" COLOR="#000000"></FONT>
        </P>
    </P>

The problem is the missing closing tag of the first <P>.
When the XML is parsed as follows, the exception from the description above is thrown:

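using System.Threading;
using AngleSharp;
using AngleSharp.Xml;
using AngleSharp.Xml.Parser;

var xml = "xml from above";
var cancellationToken = CancellationToken.None;
var config = Configuration.Default.WithXml();
var context = BrowsingContext.New(config);

// IsSuppressingErrors = true is expected to let the parser tolerate invalid input.
var parser = new XmlParser(new XmlParserOptions { IsSuppressingErrors = true }, context);
// This call throws the XmlParseException shown in the description.
var document = await parser.ParseDocumentAsync(xml, cancellationToken);
var html = document.ToHtml();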

I know this sounds quite stupid, but I actually need to parse invalid XML data and convert it to HTML afterwards.
Is there some way to parse and/or fix invalid XML with AngleSharp.Xml?
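
Before looking for a repair strategy, it may help to pinpoint which construct a strict parser rejects first. In the sample above, the unescaped `<https://some.url.example.com>` text is a candidate in addition to any missing closing tag. The sketch below uses only the .NET base class library (`System.Xml.XmlReader`), not AngleSharp.Xml, so its error positions need not match those of AngleSharp's tokenizer. `WellFormedness` and `FindFirstError` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedness
{
    // Returns null when the input is well-formed XML; otherwise returns a
    // message with the line and column of the first error XmlReader reports.
    public static string FindFirstError(string xml)
    {
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                // Reading to the end forces the whole input to be checked.
                while (reader.Read())
                {
                }
            }

            return null;
        }
        catch (XmlException ex)
        {
            return $"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
        }
    }
}
```

Calling `WellFormedness.FindFirstError(xml)` on the input before handing it to `XmlParser` gives a location to inspect or escape. This is only a diagnostic aid and does not repair the input.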
