
Link in the text about rss-parser (CLOSED)

dhvcc commented on August 23, 2024
Link in the text

from rss-parser.

Comments (9)

dhvcc commented on August 23, 2024

Quoting @BBArikL: "Also @dhvcc, you haven't done a release since January; I would appreciate one on PyPI once this issue is resolved! 👍"

Yeah, I'll do one for sure. Thanks for using the package!


BBArikL commented on August 23, 2024

So, while going back over the code to get it ready for the next release, I remembered that even though bs4 strips out HTML tags (<a> with href, and </a>), the parser does search for them. The link won't appear in the description itself, but it should appear as one of the values of the description_links attribute. If you really want the text that was enclosed by the tags, then @dhvcc's example could work.
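BBArikL describes two separate outcomes: the tags are stripped from the description text, while the href values are collected elsewhere (description_links). The sketch below illustrates that split using the standard library's html.parser in place of bs4. DescriptionSplitter and split_description are hypothetical names, and this is not rss-parser's actual implementation.

```python
from html.parser import HTMLParser


class DescriptionSplitter(HTMLParser):
    """Hypothetical helper (not rss-parser's own code): separate the visible
    text of an HTML snippet from the href values of its <a> tags."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # character data found between tags
        self.links = []       # href values of every <a> tag

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep only non-empty hrefs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def split_description(html):
    """Return (plain_text, links) for an HTML description string."""
    splitter = DescriptionSplitter()
    splitter.feed(html)
    splitter.close()  # flush any buffered trailing text
    return "".join(splitter.text_parts), splitter.links


text, links = split_description(
    'Read <a href="https://example.com/post">the full post</a> here.'
)
print(text)   # Read the full post here.
print(links)  # ['https://example.com/post']
```

Because HTMLParser has convert_charrefs enabled by default, entities such as &amp; in the text are decoded before handle_data receives them.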

dhvcc commented on August 23, 2024

Could you please send an example of this RSS feed?


Phantomik commented on August 23, 2024

https://philthyboys.ru/data/rss

dhvcc commented on August 23, 2024

I'll check this one today


BBArikL commented on August 23, 2024

I think it is being discarded by bs4 because <a> is a tag, and bs4 only gives back the text between the tags. According to the documentation, it should give back the raw text with the tags, and I don't see anywhere in the code where rss-parser deletes them. Also @dhvcc, you haven't done a release since January; I would appreciate one on PyPI once this issue is resolved! 👍 I didn't even notice it hadn't been updated because I kept using my hacky workaround in my code 😅


dhvcc commented on August 23, 2024

Hey @Phantomik. As far as I can see, these are not valid "tags" in RSS, so they are treated as text and escaped. You can try to unescape them if you really want to, but then you'd have to work with bs4 manually. Something like this could do the trick (note that this is a mock-up and I have not tested it myself):

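from xml.sax.saxutils import unescape
from bs4 import BeautifulSoup
...
feed = parser.parse()
# Iterate over the feed items and print every <a> tag found in each description
for item in feed.feed:
    # Turn escaped markup such as &lt;a&gt; back into real tags before parsing;
    # `or ""` guards against items that have no description
    soup = BeautifulSoup(unescape(item.description or ""), "html.parser")
    print(soup.find_all("a"))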
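dhvcc's explanation hinges on HTML inside an RSS element being text rather than markup. The stdlib-only sketch below shows this with a made-up feed item (RSS_ITEM is illustrative, not taken from the feed linked above). It also shows a caveat for the snippet above: xml.sax.saxutils.unescape only decodes &amp;, &lt; and &gt; by default, so escaped quotes in attribute values survive, whereas html.unescape decodes all HTML character references.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import unescape

# Hypothetical RSS item whose description carries HTML as escaped text.
RSS_ITEM = """<item>
  <title>Example</title>
  <description>See &lt;a href=&quot;https://example.com&quot;&gt;this link&lt;/a&gt;</description>
</item>"""

# The XML parser decodes one level of escaping: the result is a plain string
# that merely looks like markup; no <a> element exists in the XML tree.
description = ET.fromstring(RSS_ITEM).find("description").text
print(description)  # See <a href="https://example.com">this link</a>

# If a feed escapes the markup twice, the parsed text still contains entities.
double_escaped = "See &lt;a href=&quot;https://example.com&quot;&gt;this link&lt;/a&gt;"

# saxutils.unescape handles only &amp;, &lt; and &gt; by default,
# so &quot; survives and the href value stays mangled.
print(unescape(double_escaped))
# See <a href=&quot;https://example.com&quot;>this link</a>

# html.unescape decodes named and numeric character references, quotes included.
print(html.unescape(double_escaped))
# See <a href="https://example.com">this link</a>
```

If attribute values matter (for example, when collecting href links), html.unescape is likely the safer choice before handing the string to bs4.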


dhvcc commented on August 23, 2024

Please make sure to close the issue if you feel it has been resolved. I'll push a release after that, @BBArikL.


Phantomik commented on August 23, 2024

Thank you, @dhvcc. The guys and I decided to leave it as it is for now and write our own parser so that links and pictures are kept. I'll check your suggestion later and write back.
