sulguk's Introduction

Sulguk - HTML to telegram entities converter

Need to deliver formatted content to your bot clients? Having a hangover after trying to fit HTML into telegram? Is BeautifulSoup too complicated and no help with messages?

Try sulguk (술국, a hangover soup) - delivered since the 1800s.

Problem

Telegram supports parse_mode="html", but:

  • Telegram does not collapse spaces and new lines the way a browser does, so we cannot indent or wrap the HTML source for readability.
  • The number of supported tags is very small.
  • It does not ignore additional attributes in supported tags.

Let's imagine we have HTML like this:

<b>This is a demo of <a href="https://github.com/tishka17/sulguk">Sulguk</a></b>

  <u>Underlined</u>
  <i>Italic</i>
  <b>Bold</b>

This is how it is rendered in browser (expected behavior):

But this is how it is rendered in Telegram with parse_mode="html":

To solve this we can convert HTML to telegram entities with sulguk. So that's how it looks now:

Example

  1. Create your nice HTML:
<ol start="10">
    <li>some item</li>
    <li>other item</li>
</ol>
<p>Some <b>text</b> in a paragraph</p>
  2. Convert it into text and entities
from sulguk import transform_html
result = transform_html(raw_html)
  3. Send it to telegram.

Depending on your library, you may need to convert the entities from dicts into the proper type.

await bot.send_message(
    chat_id=CHAT_ID,
    text=result.text,
    entities=result.entities,
)
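
To make the text and entities pair sent above more concrete, here is a minimal sketch built only on Python's standard html.parser. It is not sulguk's implementation: the names TinyEntityParser and html_to_entities, and the three-style tag subset, are assumptions for illustration, and it expects well-nested tags. Offsets are counted in UTF-16 code units, which is how Telegram measures entities.

```python
import re
from html.parser import HTMLParser

# Tag name -> Telegram entity type (a tiny illustrative subset).
ENTITY_TYPES = {
    "b": "bold", "strong": "bold",
    "i": "italic", "em": "italic",
    "u": "underline",
}


def utf16_len(text: str) -> int:
    """Length in UTF-16 code units, the unit Telegram uses for offsets."""
    return len(text.encode("utf-16-le")) // 2


class TinyEntityParser(HTMLParser):
    """Hypothetical, simplified converter: plain text plus entity dicts."""

    def __init__(self) -> None:
        super().__init__()
        self.text = ""
        self.entities = []
        self._open = []  # stack of (entity type, start offset)

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored entirely in this sketch.
        if tag in ENTITY_TYPES:
            self._open.append((ENTITY_TYPES[tag], utf16_len(self.text)))

    def handle_endtag(self, tag):
        # Assumes well-nested input: the last opened tag is the one closing.
        if tag in ENTITY_TYPES and self._open:
            entity_type, start = self._open.pop()
            length = utf16_len(self.text) - start
            self.entities.append(
                {"type": entity_type, "offset": start, "length": length}
            )

    def handle_data(self, data):
        # Collapse whitespace runs roughly the way a browser does inline.
        collapsed = re.sub(r"\s+", " ", data)
        if not self.text or self.text.endswith(" "):
            collapsed = collapsed.lstrip(" ")
        self.text += collapsed


def html_to_entities(raw_html: str):
    parser = TinyEntityParser()
    parser.feed(raw_html)
    parser.close()  # flush any buffered trailing text
    return parser.text, parser.entities


text, entities = html_to_entities("Some <b>text</b>\n   in a   paragraph")
print(text)      # Some text in a paragraph
print(entities)  # [{'type': 'bold', 'offset': 5, 'length': 4}]
```

Because the formatting travels as offsets rather than markup, the source HTML can be indented and wrapped freely without changing what the client displays.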

Example for aiogram users

  1. Add SulgukMiddleware to your bot
from sulguk import AiogramSulgukMiddleware

bot.session.middleware(AiogramSulgukMiddleware())
  2. Create your nice HTML:
<ol start="10">
    <li>some item</li>
    <li>other item</li>
</ol>
<p>Some <b>text</b> in a paragraph</p>
  3. Send it using sulguk as a parse_mode:
from sulguk import SULGUK_PARSE_MODE

await bot.send_message(
    chat_id=CHAT_ID,
    text=raw_html,
    parse_mode=SULGUK_PARSE_MODE,
)

Supported tags:

For all supported tags, unknown attributes and unknown classes are ignored. Unsupported tags raise an error.

Standard telegram tags (with some changes):

  • <a> - a hyperlink with href attribute
  • <b>, <strong> - a bold text
  • <i>, <em> - an italic text
  • <s>, <strike>, <del> - a strikethrough text
  • <u>, <ins> - an underlined text
  • <span> - an inline element with optional attribute class="tg-spoiler" to make a spoiler
  • <tg-spoiler> - a telegram spoiler
  • <pre> with optional class="language-<name>" - a preformatted block with code. <name> will be sent as a language attribute in telegram.
  • <code> - an inline preformatted element.

Note: In standard Telegram HTML you can set the language of a preformatted text by nesting <code class="language-<name>"> in a <pre> tag. This works only when <code> is the only child: any additional character outside of <code> breaks it. The same behavior is supported in sulguk. Otherwise, you can set the language on the <pre> tag itself.
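
As a small illustration of the class="language-<name>" convention described above, the sketch below extracts <name> from a class attribute. The helper name language_from_class is hypothetical and this is not sulguk's own code. It only shows the naming rule.

```python
from typing import Optional

PREFIX = "language-"


def language_from_class(class_attr: Optional[str]) -> Optional[str]:
    """Return <name> from a class list containing "language-<name>"."""
    for css_class in (class_attr or "").split():
        # Require at least one character after the prefix.
        if css_class.startswith(PREFIX) and len(css_class) > len(PREFIX):
            return css_class[len(PREFIX):]
    return None  # no language class present


print(language_from_class("language-python"))  # python
print(language_from_class("tg-spoiler"))       # None
```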

Additional tags:

  • <br/> - new line
  • <hr/> - horizontal line
  • <wbr/> - word break opportunity
  • <ul> - unordered list
  • <ol> - ordered list with optional attributes
    • reversed - to reverse numbers order
    • type (1/a/A/i/I) - to set numbering style
    • start - to set starting number
  • <li> - list item, with optional value attribute to change number. Nested lists have indentation
  • <div> - a block (not inline) element
  • <p> - a paragraph, emphasized with empty lines
  • <q> - a quoted text
  • <blockquote> - a block quote. Like a paragraph with indentation
  • <h1>-<h6> - text headers, styled using available telegram options
  • <noscript> - its contents are shown, because scripting is not supported
  • <cite>, <var> - italic
  • <progress>, <meter> are rendered using emoji (🟩🟩🟩🟨⬜️⬜️)
  • <kbd>, <samp> - preformatted text
  • <img> - as a link with picture emoji before. alt text is used if provided.
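
The <ol> attributes listed above (reversed, type, start) follow the usual HTML numbering rules. The sketch below shows how such markers could be computed. It is an illustration under assumptions, not sulguk's code: the function names and the trailing "." in each marker are made up for the example, and Roman numerals are only handled up to 3999.

```python
from typing import List, Optional

_ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number: int) -> str:
    """Upper-case Roman numeral for 1..3999."""
    result = ""
    for value, symbol in _ROMAN:
        while number >= value:
            result += symbol
            number -= value
    return result


def to_letters(number: int) -> str:
    """1 -> a, 26 -> z, 27 -> aa (bijective base 26)."""
    result = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        result = chr(ord("a") + remainder) + result
    return result


def list_markers(
    count: int,
    start: Optional[int] = None,
    reversed_: bool = False,
    type_: str = "1",
) -> List[str]:
    """Markers for `count` items of an <ol> with the given attributes."""
    if start is None:
        # HTML default: count down from the item count when reversed.
        start = count if reversed_ else 1
    step = -1 if reversed_ else 1
    markers = []
    for index in range(count):
        number = start + step * index
        if type_ == "1" or number <= 0:
            label = str(number)  # letters and Roman need positive numbers
        elif type_ in ("a", "A"):
            label = to_letters(number)
            label = label.upper() if type_ == "A" else label
        elif type_ in ("i", "I"):
            label = to_roman(number)
            label = label.lower() if type_ == "i" else label
        else:
            raise ValueError(f"Unknown list type: {type_!r}")
        markers.append(label + ".")
    return markers


print(list_markers(2, start=10))              # ['10.', '11.']
print(list_markers(3, reversed_=True))        # ['3.', '2.', '1.']
print(list_markers(2, start=3, type_="I"))    # ['III.', 'IV.']
```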

Tags which are treated as block elements (like <div>):

<footer>, <header>, <main>, <nav>, <section>

Tags which are treated as inline elements (like <span>):

<html>, <body>, <output>, <data>, <time>

Tags whose contents are ignored:

<head>, <link>, <meta>, <script>, <style>, <template>, <title>
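
The three groups above can be read as a simple lookup. The sketch below restates them as sets and classifies a tag name. The set and function names are hypothetical, and the function covers only these generic tags, not the styled tags listed earlier.

```python
# The three groups from the lists above, keyed by lower-case tag name.
BLOCK_LIKE = {"footer", "header", "main", "nav", "section"}
INLINE_LIKE = {"html", "body", "output", "data", "time"}
IGNORED = {"head", "link", "meta", "script", "style", "template", "title"}


def classify_generic_tag(tag: str) -> str:
    """Return "block", "inline" or "ignored" for a generic tag name."""
    name = tag.lower()
    if name in BLOCK_LIKE:
        return "block"    # treated like <div>
    if name in INLINE_LIKE:
        return "inline"   # treated like <span>
    if name in IGNORED:
        return "ignored"  # contents are dropped
    raise ValueError(f"<{name}> is not one of the generic tags listed here")


print(classify_generic_tag("section"))  # block
print(classify_generic_tag("time"))     # inline
print(classify_generic_tag("script"))   # ignored
```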

Command line utility for channel management

  1. Install with addons
pip install 'sulguk[cli]'
  2. Set environment variable BOT_TOKEN
export BOT_TOKEN="your telegram token"
  3. Send an HTML file as a message to your channel. Additional files will be sent as comments to the first one. You can provide a channel name or a public link
sulguk send @chat_id file.html
  4. If you want to, edit a message using its link, taken from the shell or from your tg client. Editing comments is supported as well.
sulguk edit 'https://t.me/channel/1?comment=42' file.html
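
The link passed to the edit command carries a channel name, a message number and an optional comment number. The sketch below shows one way to take such a link apart with the standard library. It is not the CLI's actual parsing code, and the names MessageLink and parse_message_link are assumptions.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlparse


class MessageLink(NamedTuple):
    channel: str
    message_id: int
    comment_id: Optional[int]


def parse_message_link(link: str) -> MessageLink:
    """Split https://t.me/<channel>/<message>[?comment=<id>] into parts."""
    parsed = urlparse(link)
    if parsed.netloc != "t.me":
        raise ValueError(f"Not a t.me link: {link}")
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 2:
        raise ValueError(f"Expected /<channel>/<message> in: {link}")
    channel, message_id = parts
    # parse_qs returns a list per key; the key is absent if no comment given.
    comment = parse_qs(parsed.query).get("comment")
    comment_id = int(comment[0]) if comment else None
    return MessageLink(channel, int(message_id), comment_id)


print(parse_message_link("https://t.me/channel/1?comment=42"))
# MessageLink(channel='channel', message_id=1, comment_id=42)
```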

sulguk's People

Contributors

birdi7, bralbral, ilya-nikolaev, timchesko, tishka17, vlkorsakov

sulguk's Issues

Unexpected behavior in aiogram_middleware.py -> _transform_text_caption

The code below follows the structure from the README section.


import asyncio
import logging
import sys

from aiogram import Bot, Dispatcher, types
from sulguk import AiogramSulgukMiddleware, SULGUK_PARSE_MODE

CHAT_ID = 123345678
BOT_TOKEN = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

dp = Dispatcher()


@dp.message()
async def echo_handler(message: types.Message, bot: Bot) -> None:
    await bot.copy_message(
        from_chat_id=message.chat.id, chat_id=CHAT_ID, message_id=message.message_id
    )


async def main() -> None:
    bot = Bot(BOT_TOKEN, parse_mode=SULGUK_PARSE_MODE)
    
    # If the line below is commented out, it works.
    bot.session.middleware(AiogramSulgukMiddleware())

    await dp.start_polling(bot)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, stream=sys.stdout)
    asyncio.run(main())

Error log:

...
  File "/home/bral/PycharmProjects/sulguk_issue/venv/lib/python3.11/site-packages/sulguk/wrapper.py", line 17, in transform_html
    transformer.feed(raw_html)
  File "/usr/lib/python3.11/html/parser.py", line 109, in feed
    self.rawdata = self.rawdata + data
                   ~~~~~~~~~~~~~^~~~~~
TypeError: can only concatenate str (not "NoneType") to str
...

The relevant code is in the sulguk repo.

This happens for every message.
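
The traceback shows the HTML parser being fed None, which is what a copied message without a caption would produce when sulguk is the default parse mode. The sketch below shows one defensive check that would avoid this. It is not the project's actual fix: FakeMethod is a hypothetical stand-in for an outgoing API call, and the constant's value "sulguk" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for sulguk's constant; the real value is assumed here.
SULGUK_PARSE_MODE = "sulguk"


@dataclass
class FakeMethod:
    """Hypothetical stand-in for an outgoing call such as copy_message."""
    parse_mode: Optional[str] = None
    caption: Optional[str] = None


def needs_transform(method: FakeMethod) -> bool:
    """Transform only when sulguk mode is set AND there is text to parse."""
    return (
        method.parse_mode == SULGUK_PARSE_MODE
        and method.caption is not None
    )


# A copied message with no caption: nothing to parse, so skip it.
print(needs_transform(FakeMethod(parse_mode="sulguk", caption=None)))   # False
print(needs_transform(FakeMethod(parse_mode="sulguk", caption="<b>x</b>")))  # True
```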

CI

  • ruff
  • black
  • mypy

many \n\n\n in some html code

<p><br></p><p><strong>Proposal:</strong></p><p><br></p><p>to conduct ....</p><p><br></p><p><strong>Rationale:</strong></p><p><br></p>

This code was generated by Quill.js.

It shows 1 empty line in the browser but 3 in Telegram.

Chrome: (image)
Telegram: (image)

Middleware does not work on aiogram 3.4.1

Creating a bot:

def bot(self) -> Bot:
    session: AiohttpSession = AiohttpSession(
        json_loads=mjson.decode,
        json_dumps=mjson.encode
    )
    session.middleware(AiogramSulgukMiddleware())
    session.middleware(RetryRequestMiddleware())

    bot: Bot = Bot(
        token=self.settings.bot_token.get_secret_value(),
        session=session,
        default=DefaultBotProperties(
            parse_mode=SULGUK_PARSE_MODE
        )
    )

    return bot

Sending a message:

async def cmd_start(
    event: types.Message
) -> None:
    await event.answer("Hello!")

Error:

[25.02.2024 13:28:54] ERROR | asyncio: Task exception was never retrieved
future: <Task finished name='Task-8' coro=<BaseRequestHandler._background_feed_update() done, defined at /home/FMR/application/webhook.py:75> exception=TelegramBadRequest('Telegram server says - Bad Request: unsupported parse_mode')>
Traceback (most recent call last):
  File "/home/FMR/application/webhook.py", line 76, in _background_feed_update
    result = await self.dp.feed_raw_update(bot=bot, update=update, **self.data)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/dispatcher.py", line 189, in feed_raw_update
    return await self.feed_update(bot=bot, update=parsed_update, **kwargs)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/dispatcher.py", line 158, in feed_update
    response = await self.update.wrap_outer_middleware(
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/middlewares/error.py", line 25, in __call__
    return await handler(event, data)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/middlewares/user_context.py", line 27, in __call__
    return await handler(event, data)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/fsm/middleware.py", line 41, in __call__
    return await handler(event, data)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/event/telegram.py", line 121, in trigger
    return await wrapped_inner(event, kwargs)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/event/handler.py", line 43, in call
    return await wrapped()
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/dispatcher.py", line 276, in _listen_update
    return await self.propagate_event(update_type=update_type, event=event, **kwargs)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/router.py", line 128, in propagate_event
    return await observer.wrap_outer_middleware(_wrapped, event=event, data=kwargs)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/router.py", line 123, in _wrapped
    return await self._propagate_event(
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/router.py", line 156, in _propagate_event
    response = await router.propagate_event(update_type=update_type, event=event, **kwargs)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/router.py", line 128, in propagate_event
    return await observer.wrap_outer_middleware(_wrapped, event=event, data=kwargs)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/router.py", line 123, in _wrapped
    return await self._propagate_event(
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/router.py", line 148, in _propagate_event
    response = await observer.trigger(event, **kwargs)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/event/telegram.py", line 121, in trigger
    return await wrapped_inner(event, kwargs)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/dispatcher/event/handler.py", line 43, in call
    return await wrapped()
  File "/home/FMR/bot/handlers/start.py", line 9, in cmd_start
    await event.answer("Hello!")
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/methods/base.py", line 84, in emit
    return await bot(self)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/client/bot.py", line 492, in __call__
    return await self.session(self, method, timeout=request_timeout)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/client/session/base.py", line 254, in __call__
    return cast(TelegramType, await middleware(bot, method))
  File "/home/FMR/venv/lib/python3.10/site-packages/sulguk/aiogram_middleware.py", line 50, in __call__
    return await make_request(bot, method)
  File "/home/FMR/bot/middlewares/request/retry.py", line 51, in __call__
    return await make_request(bot, method)
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/client/session/aiohttp.py", line 178, in make_request
    response = self.check_response(
  File "/home/FMR/venv/lib/python3.10/site-packages/aiogram/client/session/base.py", line 120, in check_response
    raise TelegramBadRequest(method=method, message=description)
aiogram.exceptions.TelegramBadRequest: Telegram server says - Bad Request: unsupported parse_mode

If you additionally specify sulguk as the parse_mode of the message sending method, then there are no errors:

async def cmd_start(
    event: types.Message
) -> None:
    await event.answer("Hello!", parse_mode=SULGUK_PARSE_MODE)
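
One possible reading of this report, offered as an assumption rather than a confirmed diagnosis: if the bot-wide default is carried on each outgoing call as a placeholder object and only resolved later, a middleware that compares the call's parse_mode directly with SULGUK_PARSE_MODE would not match, and the unresolved default would reach Telegram unchanged. The sketch below shows the idea of resolving the default before comparing. The Default class and the function name are hypothetical stand-ins, not aiogram's or sulguk's API.

```python
from typing import Any, Dict

# Stand-in for sulguk's constant; the real value is assumed here.
SULGUK_PARSE_MODE = "sulguk"


class Default:
    """Hypothetical placeholder meaning "use the bot-wide default"."""

    def __init__(self, name: str) -> None:
        self.name = name


def effective_parse_mode(method_value: Any, bot_defaults: Dict[str, Any]) -> Any:
    """Resolve a placeholder against the bot defaults before comparing."""
    if isinstance(method_value, Default):
        return bot_defaults.get(method_value.name)
    return method_value  # an explicit value always wins


defaults = {"parse_mode": SULGUK_PARSE_MODE}
placeholder = Default("parse_mode")

# A direct comparison misses the placeholder...
print(placeholder == SULGUK_PARSE_MODE)                                  # False
# ...while resolving it first gives the expected match.
print(effective_parse_mode(placeholder, defaults) == SULGUK_PARSE_MODE)  # True
```

This would also be consistent with the observation above that passing parse_mode explicitly to the sending method avoids the error.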
