
impredicative / irc-rss-feed-bot


Dockerized IRC bot to post RSS/Atom and scraped HTML/JSON/CSV feeds to channels

Home Page: https://hub.docker.com/r/ascensive/irc-rss-feed-bot

License: GNU Affero General Public License v3.0

Dockerfile 0.69% Python 97.67% Makefile 1.29% Shell 0.34%
irc-bot irc-rss-bot rss

irc-rss-feed-bot's People

Contributors

impredicative, jpgninja, luk3yx, worker701


irc-rss-feed-bot's Issues

Migrate to using GitHub Actions for docker builds

  • Use GitHub Actions to build and push docker builds to Docker Hub.
  • Permanently remove the obsolete configuration in Docker Hub to build the image.
  • Permanently remove the permissions granted to Docker Hub to access the repo.

Upgrade to using Python 3.9

I upgraded Python to the latest version, so I'm on 3.9.2 now, and found that the irc-rss-feed-bot install breaks because hext cannot be installed during pip install -r requirements.txt. hext has no Python 3.9 support at this time.

irc-rss-feed-bot$ pip install hext
ERROR: Could not find a version that satisfies the requirement hext
ERROR: No matching distribution found for hext

hext is aware - html-extract/hext#18

Just noting it here for others; feel free to close.

Implement spidering by following specific links

SSRN follower:

<div class="results-header">
<div class="pagination">
<ul>
<li @text="..."/>
<li><a class="jeljour_pagination_number" @text:prepend("https://papers.ssrn.com/sol3/Jeljour_results.cfm?form_name=journalBrowse&journal_id=3526423&Network=no&lim=false&npage="):link/></li>
<li class="next"/>
</ul>
</div>
</div>
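The link-following idea can be sketched as a breadth-first crawl with a "seen" set and a page cap, so pagination links that point back to earlier pages cannot loop forever. This is a minimal sketch, not the bot's implementation: extract_links is a hypothetical callable standing in for "fetch the page and apply a hext template like the SSRN one", and max_pages is an assumed safety limit.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Set


def spider(
    start_url: str,
    extract_links: Callable[[str], Iterable[str]],
    max_pages: int = 10,
) -> List[str]:
    """Breadth-first crawl from start_url, returning pages in visit order.

    extract_links is a hypothetical stand-in for fetching a page and applying
    a hext template that yields follow-up links (e.g. pagination links).
    """
    visited: List[str] = []
    seen: Set[str] = {start_url}  # URLs already queued, so each page is fetched once
    queue: Deque[str] = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in extract_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Each visited page would then be parsed for entries as usual. The cap matters because result pages often link to every other page number.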

missing features / questions

I have been using rss-synd (an eggdrop Tcl script) for ages but was looking to move away from Tcl.
I stumbled upon this bot, and while it looks like it has lots of features, I am not seeing a few things and figured I'd post here to check.

  • Is there SASL support for authenticating the bot to the IRC server?

  • Ability to post the same feed to multiple channels.

  • Ability to call a command in a channel to re-post the last x items.

  • Ability to set exactly how many items are shared on an update.
    I see the new config option to set how many items of the feed are shared. Are there any plans to make this more flexible, for example if I wanted to post 5 items (without having to modify the source code)?
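For reference on the SASL question: in the IRCv3 SASL PLAIN exchange, after the server replies "AUTHENTICATE +", the client sends the base64 encoding of "authzid NUL authcid NUL password", split into 400-byte AUTHENTICATE chunks. The sketch below only builds those lines. Whether the bot's IRC library (miniirc) exposes SASL is not covered, and sasl_plain_lines is a hypothetical helper name.

```python
import base64
from typing import List

CHUNK_SIZE = 400  # IRCv3 SASL: base64 responses are sent in 400-byte chunks


def sasl_plain_lines(username: str, password: str, authzid: str = "") -> List[str]:
    """Build the AUTHENTICATE lines a client sends after the server's 'AUTHENTICATE +'.

    PLAIN payload (RFC 4616): authzid NUL authcid NUL password. An empty authzid
    means "act as the authenticated user".
    """
    raw = f"{authzid}\0{username}\0{password}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    lines = [
        f"AUTHENTICATE {encoded[i:i + CHUNK_SIZE]}"
        for i in range(0, len(encoded), CHUNK_SIZE)
    ]
    # A final chunk of exactly 400 bytes must be followed by 'AUTHENTICATE +'.
    if len(encoded) % CHUNK_SIZE == 0:
        lines.append("AUTHENTICATE +")
    return lines
```

Before this step the client would have negotiated "CAP REQ :sasl" and sent "AUTHENTICATE PLAIN". Those lines are omitted here.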

Support user configured global defaults

Instead of using YAML anchors, which are not very easy to understand (or popular) and which add their own boilerplate, it would be very useful to make the defaults configurable.

Example:

defaults:
      period: 0.5
      shorten: false
      new: 0

Adding something like this would let users avoid either repeating themselves or using YAML anchors.
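A global defaults section could be layered under each feed's own settings with a plain dictionary merge, where the feed's keys win. This sketch assumes the parsed YAML maps channels to feeds to settings, as in the feed config examples in these issues. apply_defaults is a hypothetical helper, not part of the bot.

```python
from typing import Any, Dict

FeedConfig = Dict[str, Any]


def apply_defaults(
    feeds: Dict[str, Dict[str, FeedConfig]], defaults: FeedConfig
) -> Dict[str, Dict[str, FeedConfig]]:
    """Return a copy of feeds where each feed's own keys override the global defaults.

    This is a shallow merge: nested mappings such as 'style' or 'sub' are
    replaced wholesale, not merged key by key.
    """
    return {
        channel: {name: {**defaults, **feed} for name, feed in channel_feeds.items()}
        for channel, channel_feeds in feeds.items()
    }
```

A shallow merge keeps the rule easy to explain. A deep merge would be needed if defaults were meant to supply part of a nested block such as style.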

Allow customizing message config

Intended config:

message:
    summary: true
MESSAGE_FORMAT: Final = "[{feed}] {title} → {url}"
MESSAGE_FORMAT: Final = "[{feed}] {title} ➤ {summary} → {url}"

If the channel has a style and summary is enabled, then the title is to be underlined.

Replace this in all places in the config where {summary} is explicitly specified.
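A minimal sketch of the intended behaviour: pick between the two MESSAGE_FORMAT strings from the issue depending on whether a summary is shown, and underline the title with the IRC underline control code (0x1F) only when the channel is styled. MESSAGE_FORMAT_WITH_SUMMARY and format_message are hypothetical names.

```python
from typing import Final, Optional

MESSAGE_FORMAT: Final = "[{feed}] {title} → {url}"
MESSAGE_FORMAT_WITH_SUMMARY: Final = "[{feed}] {title} ➤ {summary} → {url}"  # hypothetical name
UNDERLINE: Final = "\x1f"  # IRC formatting control code for underline


def format_message(
    feed: str, title: str, url: str, summary: Optional[str] = None, styled: bool = False
) -> str:
    """Pick the format based on whether a summary is present.

    When the channel is styled and a summary is shown, the title is underlined
    so it stands apart from the summary text.
    """
    if summary:
        if styled:
            title = f"{UNDERLINE}{title}{UNDERLINE}"
        return MESSAGE_FORMAT_WITH_SUMMARY.format(feed=feed, title=title, summary=summary, url=url)
    return MESSAGE_FORMAT.format(feed=feed, title=title, url=url)
```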

Provide sample docker-compose.yml

I can't for the life of me get this bot working. Some documentation on building and running it would be very helpful.

  • The README mentions a docker-compose.yml, which is nowhere to be found.
  • I managed to build the container with sudo docker build -t irc-rss-feed-bot:latest . and (unsuccessfully) run it with sudo docker run -e BITLY_TOKENS={TOKEN} -e IRC_PASSWORD={PASSWORD} -v $(pwd)/config:/config/ irc-rss-feed-bot, and it would be helpful if these commands (or others) were listed somewhere in the README.
  • When running the above command, I got peewee.OperationalError: unable to open database file. There is no posts.v2.db in either my root folder or my config folder, and I have granted access with chmod a+w ./irc-rss-feed-bot/posts.v2.db.
  • It would be useful if a folder with blank config.yml and secrets.env files were already in place, or if a tree structure illustrated where they should be located. I messed around with this too much:
irc-rss-feed-bot repository
├── config
│   ├── config.yaml
│   └── secrets.env
├── Dockerfile
...
├── ircrssfeedbot
...

I'm sure this is just PEBKAC, but this looks like a nice bot and I'm sure I can't be alone in not figuring this out :-)
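A quick pre-flight check for the layout the reporter suggests (config/config.yaml and config/secrets.env) could catch a missing or misnamed file before the container starts. The filenames come from the tree in this issue. Whether the bot actually requires secrets.env to exist is an assumption, and missing_config_files is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Files the issue's suggested layout expects inside the mounted config directory.
REQUIRED_CONFIG_FILES = ("config.yaml", "secrets.env")


def missing_config_files(config_dir: Path) -> List[str]:
    """Return the expected config files that are absent from config_dir."""
    if not config_dir.is_dir():
        return list(REQUIRED_CONFIG_FILES)
    return [name for name in REQUIRED_CONFIG_FILES if not (config_dir / name).is_file()]
```

Note that the issue text mentions both config.yml and config.yaml. A check like this makes that kind of mismatch visible immediately.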

Support custom foreground colour for a feed name

It would be cool if you could format the message the bot sends per feed.

There would be an optional "message-format" variable you can set, which lets you customize the message it sends. This could also be used to add colour support etc., though I'm unsure how colours are done in miniirc, so that part needs investigating.

The config would look something like this:

host: chat.freenode.net
ssl_port: 6697
nick: MyFeed[bot]
alerts_channel: '##mybot-alerts'
mode:
feeds:
  "#some_chan1":
    j:AJCN:
      url: https://academic.oup.com/rss/site_6122/3981.xml
      period: 24
      message-format: "[{feedname}] {title} -> {url}"
      blacklist:
        title:
          - ^Calendar\ of\ Events$
    MedicalXpress:nutrition:
      url: https://medicalxpress.com/rss-feed/search/?search=nutrition
    r/FoodNerds:
      url: https://www.reddit.com/r/FoodNerds/new/.rss
      shorten: false
      message-format: "{title} -> {url}"
      sub:
        url:
          pattern: ^https://www\.reddit\.com/r/.+?/comments/(?P<id>.+?)/.+$
          repl: https://redd.it/\g<id>
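A sketch of how a per-feed message-format could be rendered, with an optional foreground colour on the feed name. It uses the mIRC colour convention, where the control code 0x03 followed by a two-digit number starts a colour (3 is green in the common palette) and a bare 0x03 ends it. DEFAULT_MESSAGE_FORMAT and render_message are hypothetical, and how miniirc handles colours is not assumed here.

```python
from typing import Mapping, Optional

DEFAULT_MESSAGE_FORMAT = "[{feedname}] {title} -> {url}"  # hypothetical default
COLOR = "\x03"  # mIRC colour code: "\x03" + two digits starts a colour, bare "\x03" ends it


def render_message(
    fields: Mapping[str, str],
    message_format: Optional[str] = None,
    feedname_fg: Optional[int] = None,
) -> str:
    """Render one entry with the feed's optional message-format and feed-name colour."""
    values = dict(fields)
    if feedname_fg is not None:
        # Two-digit colour numbers avoid ambiguity if the name starts with a digit.
        values["feedname"] = f"{COLOR}{feedname_fg:02d}{values['feedname']}{COLOR}"
    return (message_format or DEFAULT_MESSAGE_FORMAT).format_map(values)
```

An unknown placeholder in a user-supplied format raises KeyError. A real implementation would probably validate formats when the config is loaded.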

Support command to read specific feed immediately

Support a command to read a specific named feed in a specific channel immediately. For example:

AdminNick> FeedBot: read Cointelegraph

This should read the Cointelegraph feed immediately. This assumes that such a feed is configured for the current channel. Ideally, any disk cache for the feed should also be bypassed.
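The command handling could be split into parsing "<nick>: read <feed>" and checking that the feed is configured for the channel the command came from. This is a sketch with hypothetical function names. Channel names are compared with str.casefold, which only approximates IRC casemapping.

```python
import re
from typing import Mapping, Optional, Sequence


def parse_read_command(message: str, bot_nick: str) -> Optional[str]:
    """Return the feed name from '<nick>: read <feed>', or None if it is not that command."""
    pattern = rf"{re.escape(bot_nick)}[:,]\s+read\s+(?P<feed>\S.*?)\s*"
    match = re.fullmatch(pattern, message, flags=re.IGNORECASE)
    return match.group("feed") if match else None


def resolve_feed(channel_feeds: Mapping[str, Sequence[str]], channel: str, feed: str) -> Optional[str]:
    """Return feed only if it is configured for this channel."""
    for configured_channel, feeds in channel_feeds.items():
        if configured_channel.casefold() == channel.casefold() and feed in feeds:
            return feed
    return None
```

Bypassing the disk cache would then be a flag passed to whatever reads the resolved feed, which is outside this sketch.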

Option to change the order?

If two new items are added to an input feed, and the feed is then pulled and those items are messaged through, the bot (it appears) outputs them in the order they appear in the feed. That is, the first/latest entry at the top is output first, and an entry dated a few days ago (though only added to the input feed now) is output second.

I think it would make more sense if the items were relayed to the channel in reverse order, so that the newest/latest entry comes out last. Does that make sense?
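Since feeds usually list the newest entry first, the requested behaviour amounts to reversing the new entries before posting. A minimal sketch, where the newest_last option name is hypothetical:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def posting_order(new_entries: Sequence[T], newest_last: bool = True) -> List[T]:
    """Return new entries (given in feed order, newest first) in the order to post them.

    Reversing feed order posts the newest entry last, leaving it at the bottom
    of the channel scrollback.
    """
    return list(reversed(new_entries)) if newest_last else list(new_entries)
```

Reversing by feed position, rather than sorting by date, matches the example in this issue. An entry dated days ago, but placed below the newest one, is still posted before it.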

Getting 503 trying to get LinuxSecurity RSS feed, failed 1431 consecutive times

I have a feed set up to fetch the following RSS feed every 30 minutes, but it has failed 1431 consecutive times. If I exec curl inside the container (after installing curl), the URL always returns fine. I wonder if I need a user agent override or something similar to get this working.

RSS: https://linuxsecurity.com/linuxsecurity_advisories.xml
Tag: 0.10.1

Current config:

LinuxSecurity:
      url: https://linuxsecurity.com/linuxsecurity_advisories.xml
      period: 0.5
      style:
        name:
          fg: green
      sub:
        title:
          pattern: '>$'
          repl: ''

My test to show that the Docker container can fetch the feed with curl:

docker exec -it -u root ircbotsecurityfeed bash
apt update && apt install curl -y
curl https://linuxsecurity.com/linuxsecurity_advisories.xml

The curl command works fine every time.
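One way to test the user-agent theory outside the bot is to fetch the feed with the standard library while setting an explicit User-Agent header, then compare the results for different values. This is a diagnostic sketch. It assumes, without confirmation in the report, that the 503 depends on the client's User-Agent. The UA string is made up, and nothing here reflects how the bot itself sends requests.

```python
import urllib.request

# Hypothetical UA string for testing; curl identifies itself as "curl/<version>" by default.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; irc-rss-feed-bot)"


def build_request(url: str, user_agent: str = DEFAULT_USER_AGENT) -> urllib.request.Request:
    """Build a GET request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str = DEFAULT_USER_AGENT, timeout: float = 30.0) -> bytes:
    """Fetch url; an HTTP 503 surfaces as urllib.error.HTTPError with .code == 503."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
        return response.read()
```

If fetch succeeds with a curl-like UA string but fails with a Python-style one, a per-feed user-agent override would be the fix to request.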

Lock released twice in bot.py

The lock is released here but not acquired again:

if sleep_time == 0:
    break  # Lock will be released later after posting messages.
self._outgoing_msg_lock.release()  # Releasing lock before sleeping.
log.info(
    "Will wait %s for channel inactivity to post %s.", timedelta_desc(sleep_time), feed
)
time.sleep(sleep_time)
log.debug("Checking IRC client connection state.")
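Assume the snippet runs inside a loop that is entered with the lock held; the comment on break implies the lock is still held when the loop exits. Then the lock has to be re-acquired after each sleep. Otherwise a second pass releases a lock that is no longer held, which threading.Lock reports as RuntimeError. A minimal sketch of that invariant, where get_sleep_time is a hypothetical stand-in for the channel inactivity check:

```python
import threading
import time
from typing import Callable


def wait_for_inactivity(lock: threading.Lock, get_sleep_time: Callable[[], float]) -> None:
    """Sleep until get_sleep_time() returns 0, yielding the lock while sleeping.

    Precondition: the caller holds lock. Postcondition: the caller still holds
    lock, so it can post messages and then release it exactly once.
    """
    while True:
        sleep_time = get_sleep_time()
        if sleep_time <= 0:
            break  # Lock is still held; the caller releases it after posting.
        lock.release()  # Let other threads post while this one waits.
        try:
            time.sleep(sleep_time)
        finally:
            lock.acquire()  # Re-acquire so the next iteration (or the caller) holds it.
```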

AttributeError: 'types.SimpleNamespace' object has no attribute 'identity'

I got the bot to join the channels, but nothing was coming through. The log has a lot of these:

2020-11-09 22:45:07,971 INFO FeedReader-#libreav-libreav-7f9202d97700:ircrssfeedbot.bot:272:_read_feed: Retrieved in 0.6s the feed libreav of #libreav with 10 approved entries via 1 URLs read bypassing cache.
2020-11-09 22:45:07,975 INFO ChannelMessenger-#libreav-7f9203598700:ircrssfeedbot.db:85:select_unposted_for_channel: Returning 10 unposted URLs from the database for channel #libreav having ignored feed libreav out of 10 URLs.
2020-11-09 22:45:07,976 INFO ChannelMessenger-#libreav-7f9203598700:ircrssfeedbot.feed:413:post: Posting 10 entries for feed libreav of #libreav.
2020-11-09 22:45:09,979 ERROR ChannelMessenger-#libreav-7f9203598700:ircrssfeedbot.bot:324:alerter: Error processing feed libreav of #libreav: 'types.SimpleNamespace' object has no attribute 'identity'
Traceback (most recent call last):
  File "/app/ircrssfeedbot/bot.py", line 220, in _msg_channel
    feed.post()
  File "/app/ircrssfeedbot/feed.py", line 419, in post
    msg = entry.message
  File "/app/ircrssfeedbot/entry.py", line 104, in message
    format_map = dict(identity=config.runtime.identity, channel=self.feed_reader.channel, feed=_style_name(self.feed_reader.name), url=self.short_url or self.long_url)
AttributeError: 'types.SimpleNamespace' object has no attribute 'identity'

FWIW, the feed.

AttributeError: 'types.SimpleNamespace' object has no attribute 'nick_casefold'

Hi there, thanks for all your work. I'm attempting to run this via a docker-compose file as suggested in the README. All permissions on config/secrets seem fine, and the bot logs in with the configured nick and admin, but nothing seems to work and it prints this exception after a successful login. Admin commands also do not work (when I type 'exit', it prints this exception again).

irc-rss-feed-bot    | Exception in thread Thread-13:
irc-rss-feed-bot    | Traceback (most recent call last):
irc-rss-feed-bot    |   File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
irc-rss-feed-bot    |     self.run()
irc-rss-feed-bot    |   File "/usr/local/lib/python3.8/threading.py", line 870, in run
irc-rss-feed-bot    |     self._target(*self._args, **self._kwargs)
irc-rss-feed-bot    |   File "/app/ircrssfeedbot/bot.py", line 345, in _handle_join
irc-rss-feed-bot    |     if (user.casefold() != config.runtime.nick_casefold) or (channel.casefold() not in config.INSTANCE["channels:casefold"]):
irc-rss-feed-bot    | AttributeError: 'types.SimpleNamespace' object has no attribute 'nick_casefold'

Is this a bug or a misconfiguration somehow on my part? Thanks!
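Both AttributeError reports (identity and nick_casefold) look like handlers reading attributes of config.runtime before they have been assigned. That diagnosis is an assumption, not confirmed in these issues. The sketch reproduces the error with a plain SimpleNamespace and shows one hypothetical guard: a threading.Event that handlers wait on until the runtime fields are populated. runtime_ready, set_runtime and get_runtime are illustrative names, not the bot's API.

```python
import threading
from types import SimpleNamespace

runtime = SimpleNamespace()  # stand-in for config.runtime
runtime_ready = threading.Event()  # hypothetical: set once runtime fields are known


def set_runtime(identity: str, nick: str) -> None:
    """Populate runtime fields, then signal any waiting handlers."""
    runtime.identity = identity
    runtime.nick_casefold = nick.casefold()
    runtime_ready.set()


def get_runtime(timeout: float = 30.0) -> SimpleNamespace:
    """Block until runtime fields exist instead of failing with AttributeError."""
    if not runtime_ready.wait(timeout):
        raise TimeoutError("runtime fields were not populated in time")
    return runtime
```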

_multiprocessing.SemLock._rebuild(*state) FileNotFoundError: [Errno 2] No such file or directory

Here's an example:

Apr 09 08:20:14 bot python3[16914]: 2024-04-09 08:20:14,951 INFO FeedReader-#somechannel-Debian - Security-7ff0137fe6c0:ircrssfeedbot.bot:276:_read_feed: Retrieved in 0.2s the feed Debian - Security of #somechannel with 30 approved entries via 1 URLs read from cache having matching etag.
Apr 09 08:20:15 bot python3[17537]: Traceback (most recent call last):
Apr 09 08:20:15 bot python3[17537]:   File "<string>", line 1, in <module>
Apr 09 08:20:15 bot python3[17537]:   File "/usr/lib/python3.11/multiprocessing/spawn.py", line 120, in spawn_main
Apr 09 08:20:15 bot python3[17537]:     exitcode = _main(fd, parent_sentinel)
Apr 09 08:20:15 bot python3[17537]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^
Apr 09 08:20:15 bot python3[17537]:   File "/usr/lib/python3.11/multiprocessing/spawn.py", line 130, in _main
Apr 09 08:20:15 bot python3[17537]:     self = reduction.pickle.load(from_parent)
Apr 09 08:20:15 bot python3[17537]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Apr 09 08:20:15 bot python3[17537]:   File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 110, in __setstate__
Apr 09 08:20:15 bot python3[17537]:     self._semlock = _multiprocessing.SemLock._rebuild(*state)
Apr 09 08:20:15 bot python3[17537]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Apr 09 08:20:15 bot python3[17537]: FileNotFoundError: [Errno 2] No such file or directory

That occurred after a successful feed poll.

Cache directory "/app/.ircrssfeedbot_cache/URLReader"

Apparently something changed since I last used the feed bot, and it no longer starts inside containers. It is a bit weird because I did not see /app configured anywhere.

The deploy guidelines say to use `image: ascensive/irc-rss-feed-bot`, which makes me think that maybe that image is not the result of building https://github.com/impredicative/irc-rss-feed-bot/blob/master/Dockerfile?

Does this mean we have no published image that we can just run without building? In the end, a tool like this should not need more than mounting its config.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/app/ircrssfeedbot/__main__.py", line 6, in <module>
    from ircrssfeedbot.main import main
  File "/app/ircrssfeedbot/main.py", line 11, in <module>
    from ircrssfeedbot.bot import Bot
  File "/app/ircrssfeedbot/bot.py", line 17, in <module>
    from . import config, publishers, searchers
  File "/app/ircrssfeedbot/publishers/__init__.py", line 2, in <module>
    from . import github
  File "/app/ircrssfeedbot/publishers/github.py", line 12, in <module>
    from ._base import BasePublisher
  File "/app/ircrssfeedbot/publishers/_base.py", line 12, in <module>
    from ..feed import FeedEntry
  File "/app/ircrssfeedbot/feed.py", line 21, in <module>
    from .url import URLReader
  File "/app/ircrssfeedbot/url.py", line 106, in <module>
    class URLReader:
  File "/app/ircrssfeedbot/url.py", line 109, in URLReader
    _CACHE = diskcache.Cache(directory=config.DISKCACHE_PATH / "URLReader", timeout=2, size_limit=config.DISKCACHE_SIZE_LIMIT)
  File "/usr/local/lib/python3.8/site-packages/diskcache/core.py", line 481, in __init__
    raise EnvironmentError(
PermissionError: [Errno 13] Cache directory "/app/.ircrssfeedbot_cache/URLReader" does not exist and could not be created
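The traceback shows diskcache pointed at /app/.ircrssfeedbot_cache/URLReader, which the container user cannot create. A defensive option is to try several candidate directories and use the first one that can be created and is writable. This is a sketch under assumptions: ensure_cache_dir is hypothetical, and the chosen path would still have to be passed to diskcache.Cache(directory=...) as in the traceback.

```python
import os
from pathlib import Path
from typing import Iterable


def ensure_cache_dir(candidates: Iterable[Path]) -> Path:
    """Return the first candidate directory that exists (or can be created) and is writable."""
    for candidate in candidates:
        try:
            candidate.mkdir(parents=True, exist_ok=True)
        except OSError:
            continue  # e.g. permission denied, or a parent path is a regular file
        if os.access(candidate, os.W_OK):
            return candidate
    raise PermissionError("no writable cache directory among the candidates")
```

Possible candidates are a path taken from a hypothetical IRCRSSFEEDBOT_CACHE_DIR environment variable, then the original /app location, then a subdirectory of tempfile.gettempdir().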

Avoid old posts when new feed is added

I discovered that when a new feed is added, its messages are automatically announced to the IRC channels, even if the entries are many months old.

It would be better to avoid announcing old entries on the first run.

I would go so far as to suggest that "old" means older than the poll period, maybe?
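The proposal can be sketched as a filter that applies only on a feed's first run and keeps entries published within one polling period. The feed configs in these issues treat period as hours (period: 0.5 corresponds to "every 30 minutes"). The Entry shape and entries_to_announce are hypothetical. Skipping undated entries on the first run is a design assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class Entry:  # hypothetical minimal entry shape
    title: str
    published: Optional[datetime]  # timezone-aware; None if the feed gives no date


def entries_to_announce(
    entries: Iterable[Entry], period_hours: float, first_run: bool, now: datetime
) -> List[Entry]:
    """On a feed's first run, keep only entries published within one polling period."""
    if not first_run:
        return list(entries)
    cutoff = now - timedelta(hours=period_hours)
    # Undated entries are skipped on the first run, since their age is unknown.
    return [e for e in entries if e.published is not None and e.published >= cutoff]
```

Skipped entries would presumably still need to be recorded as seen, so they are not announced on the second run.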

Allow highlighting words or regex matches

Support a feed-specific highlight config parameter that contains a list of words or regexes to match against an entry's title. Underline the matches in the title. Support all stylers.
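A sketch of the underline step using the IRC underline control code (0x1F). Plain words are valid regexes, so one list can hold both. Combining all patterns into a single alternation means overlapping patterns cannot produce nested or doubled underline codes. Other stylers are not covered, and highlight is a hypothetical name.

```python
import re
from typing import Sequence

UNDERLINE = "\x1f"  # IRC underline control code


def highlight(title: str, patterns: Sequence[str], flags: int = re.IGNORECASE) -> str:
    """Underline every match of any pattern in title, preserving the title's original case."""
    if not patterns:
        return title
    combined = re.compile("|".join(f"(?:{p})" for p in patterns), flags)

    def underline(match: re.Match) -> str:
        text = match.group(0)
        return f"{UNDERLINE}{text}{UNDERLINE}" if text else text  # ignore empty matches

    return combined.sub(underline, title)
```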

Allow self sign and invalid certificates

I have an IRC server I would like to connect to, but it uses a self-signed certificate. The bot's configuration doesn't seem to have an option to accept invalid certificates. As this is often outside the user's control, it would be helpful to allow it. I received the following error when attempting to connect to such a server:

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1129)

Thank you.
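With Python's ssl module, there are two ways to connect to a server with a self-signed certificate. The safer one is to trust that specific certificate via cafile, so it is still verified. The other is to turn verification off entirely. A sketch of both follows. How such a context would be handed to the bot's IRC client (miniirc) is not shown and would need checking against that library. make_irc_ssl_context is a hypothetical name.

```python
import ssl
from typing import Optional


def make_irc_ssl_context(cafile: Optional[str] = None, verify: bool = True) -> ssl.SSLContext:
    """Build a client TLS context for an IRC server with a self-signed certificate.

    Preferred: pass cafile pointing at the server's self-signed certificate so
    verification still happens. Fallback: verify=False disables certificate and
    hostname checks entirely, which leaves the connection open to MITM attacks.
    """
    context = ssl.create_default_context(cafile=cafile)
    if not verify:
        context.check_hostname = False  # must be disabled before CERT_NONE is allowed
        context.verify_mode = ssl.CERT_NONE
    return context
```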

User confusion on setup steps

I'm trying to get set up for the first time and dusting off my Docker skills; I'm not sure if I missed a step?

$ docker images
REPOSITORY                   TAG                 IMAGE ID            CREATED             SIZE
<none>                       <none>              fe5a53c5e6c8        11 hours ago        507MB
ascensive/irc-rss-feed-bot   latest              04ec735e21e1        11 days ago         395MB
python                       3.8-slim-buster     41dcfe21e8fd        2 weeks ago         113MB

$ docker run -v $(pwd)/docker/config.yaml:/config/config.yaml 04ec735e21e1
2020-11-04 04:20:47,983 INFO MainThread-7f3639f8f740:ircrssfeedbot.main:29:load_instance_config: Read user configuration file /config/config.yaml
2020-11-04 04:20:47,983 INFO MainThread-7f3639f8f740:ircrssfeedbot.main:48:load_instance_config: The excerpted configuration for 1 channels with 1 feeds having 1 unique URLs is:
{'host': 'chat.freenode.net', 'ssl_port': 6697, 'nick': 'lavb0t', 'admin': '[email protected]', 'alerts_channel': '#libreavargh', 'mode': '+igR', 'defaults': {'new': 'all'}}
2020-11-04 04:20:47,983 INFO MainThread-7f3639f8f740:ircrssfeedbot.main:59:load_instance_config: #libreav has 1 feeds: libreav
2020-11-04 04:20:47,983 INFO MainThread-7f3639f8f740:ircrssfeedbot.main:68:load_instance_config: #libreav has no foreground colors in use.
2020-11-04 04:20:47,991 INFO MainThread-7f3639f8f740:ircrssfeedbot.bot:41:__init__: Initializing bot as: uid=999(app) gid=999(app) groups=999(app)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3035, in connect
    self._state.set_connection(self._connect())
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3370, in _connect
    conn = sqlite3.connect(self.database, timeout=self._timeout,
sqlite3.OperationalError: unable to open database file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3097, in execute_sql
    cursor = self.cursor(commit)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3081, in cursor
    self.connect()
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3038, in connect
    self._initialize_connection(self._state.conn)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 2873, in __exit__
    reraise(new_type, new_type(exc_value, *exc_args), traceback)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 183, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3035, in connect
    self._state.set_connection(self._connect())
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3370, in _connect
    conn = sqlite3.connect(self.database, timeout=self._timeout,
peewee.OperationalError: unable to open database file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/app/ircrssfeedbot/__main__.py", line 8, in <module>
    main()
  File "/app/ircrssfeedbot/main.py", line 95, in main
    Bot()
  File "/app/ircrssfeedbot/bot.py", line 45, in __init__
    self._db = Database()
  File "/app/ircrssfeedbot/db.py", line 43, in __init__
    self._db.create_tables([Post])
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3286, in create_tables
    model.create_table(**options)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 6595, in create_table
    cls._schema.create_all(safe, **options)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 5731, in create_all
    self.create_table(safe, **table_options)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 5586, in create_table
    self.database.execute(self._create_table(safe=safe, **options))
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3112, in execute
    return self.execute_sql(sql, params, commit=commit)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3106, in execute_sql
    self.commit()
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 2873, in __exit__
    reraise(new_type, new_type(exc_value, *exc_args), traceback)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 183, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3097, in execute_sql
    cursor = self.cursor(commit)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3081, in cursor
    self.connect()
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3038, in connect
    self._initialize_connection(self._state.conn)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 2873, in __exit__
    reraise(new_type, new_type(exc_value, *exc_args), traceback)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 183, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3035, in connect
    self._state.set_connection(self._connect())
  File "/usr/local/lib/python3.8/site-packages/peewee.py", line 3370, in _connect
    conn = sqlite3.connect(self.database, timeout=self._timeout,
peewee.OperationalError: unable to open database file
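Both database failures reported here are SQLite's generic "unable to open database file". A small probe, run inside the container as the same user (uid 999 in the log), can show whether a given database path is usable. SQLite needs the parent directory to exist and the file location to be writable. The sketch assumes you know the database path the bot uses. posts.v2.db is the filename mentioned in an earlier issue, but its location inside the container is not stated. probe_sqlite_path is a hypothetical helper.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def probe_sqlite_path(db_path: Path) -> Optional[str]:
    """Return None if SQLite can open db_path and take a write lock, else the error message."""
    try:
        conn = sqlite3.connect(str(db_path))
        try:
            conn.execute("BEGIN IMMEDIATE")  # take a write lock without changing any data
            conn.rollback()
        finally:
            conn.close()
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None
```

If the probe fails for the mounted directory, the usual remedies are creating the directory on the host or adjusting its ownership so that the container's user can write to it.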
