Comments (11)
The `dom` is a document. If you use it on `dom.Body` or `dom.DocumentElement` instead, it should actually work.
from anglesharp.
@fpavlic this may not interest you anymore, but you could remove the elements you do not want from the DOM before calling `.Text()`. So find and remove all `<script>` and `<style>` elements first.
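Here is a minimal sketch of that suggestion. It assumes a current AngleSharp release (0.10 or later), where `HtmlParser` lives in `AngleSharp.Html.Parser` and the entry point is `ParseDocument`. The older versions used elsewhere in this thread call it `Parse`. The class and method names are hypothetical. The sketch detaches every `<script>` and `<style>` element and then reads `TextContent` from the body:

```csharp
using System.Linq;
using AngleSharp.Html.Parser;

// Hypothetical helper illustrating "remove <script>/<style>, then read the text".
public static class VisibleTextSketch
{
    public static string ExtractVisibleText(string html)
    {
        var parser = new HtmlParser();
        var document = parser.ParseDocument(html);

        // Materialize the matches first, then detach each element from the tree.
        foreach (var element in document.QuerySelectorAll("script, style").ToList())
        {
            element.Remove();
        }

        // Read the text from <body> rather than from the document node itself.
        return document.Body?.TextContent ?? string.Empty;
    }
}
```

Source whitespace between elements is kept as-is. Anything else you consider invisible (for example `<noscript>`) would have to be added to the selector yourself.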
The `Text<T>` method does. You get it by adding `using AngleSharp.Extensions;`, as with all other extension methods. Hope this helps you @ricardobrandao!
The HTML DOM defines a property called `textContent` on node objects (`TextContent` in AngleSharp). If you read it on, e.g., the document root (`html`) element, it should give you the whole textual content.
But beware: there may be an unusual amount of spaces and newlines in there, since the parser does not strip them out. That you do not see most of them in rendered content is a feature of the HTML renderer.
Tell me whether it fits your needs; otherwise I will implement another method to deal with this (outside of the official specification).
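If the extra spaces and newlines get in the way, one rough post-processing step is to collapse every whitespace run into a single space. This is only an approximation under that assumption, not what a renderer does. It ignores `<pre>` blocks, CSS `white-space`, and the line breaks a browser would show between block elements. The names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical post-processing step: the parser keeps source whitespace,
// so collapse every run of whitespace into a single space and trim the ends.
public static class WhitespaceSketch
{
    private static readonly Regex WhitespaceRun = new Regex(@"\s+");

    public static string Collapse(string text)
    {
        return WhitespaceRun.Replace(text, " ").Trim();
    }
}
```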
Thanks for the quick reply. Excellent, it worked! The problem is that it also gives you the CSS and scripts. We should strip them, as we are only interested in the text inside the body.
Hm, this is also how real browsers behave. I will write a method `ToText()` which will give you the already stripped content.
Thanks a lot :)
@FlorianRappl this has not been implemented yet, right? I didn't find anything that extracts the text from an `INodeList` or even from the `IHtmlDocument`.
I saw that one, but I wasn't able to make it work back then. For instance, this test fails (it returns `null`):
```csharp
[Fact]
public void StripHtmlTagsTest1()
{
    var input = "<div><span>Hello, world! Some <a href=\"#\">link.</a></span></div>";
    var dom = new HtmlParser().Parse(input);
    Assert.Equal("Hello, world! Some link.", dom.Text());
}
```
However, I've now found that this one does the trick:
```csharp
[Fact]
public void StripHtmlTagsTest2()
{
    var input = "<div><span>Hello, world! Some <a href=\"#\">link.</a></span></div>";
    var nodeList = new HtmlParser().ParseFragment(input, null);
    Assert.Equal("Hello, world! Some link.", string.Concat(nodeList.Select(x => x.Text())));
}
```
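Following the earlier advice to read the text from the body rather than from the document node, the first test can be sketched against a current AngleSharp release (0.10 or later). That release is an assumption here, as is its `AngleSharp.Html.Parser.HtmlParser.ParseDocument` method. Unlike the tests above, it calls `TextContent` directly instead of the older `Text()` extension. xUnit is used as in the tests above. The class and test names are hypothetical.

```csharp
using AngleSharp.Html.Parser;
using Xunit;

public class StripHtmlTagsSketch
{
    [Fact]
    public void StripHtmlTagsViaBody()
    {
        var input = "<div><span>Hello, world! Some <a href=\"#\">link.</a></span></div>";
        var document = new HtmlParser().ParseDocument(input);

        // The markup ends up inside the implied <body>, so the body's text is what we want.
        Assert.Equal("Hello, world! Some link.", document.Body.TextContent);
    }
}
```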
Thanks for the quick reply :)
Is there a way to get just the 'visible' content, e.g. `DocumentElement.TextContent` without scripts and styles?
This is something @devmondo mentioned back in 2013.
Thanks
AngleSharp can't decide which text is relevant for you and which isn't. It will just return what the spec says it should return.