Premailer version 2.5.0. I have an HTML template with <a href="http:


Jinja variables in hrefs are HTML encoded. Outside of hrefs they're not encoded.

peterbe commented on June 27, 2024

Comments (15)

darklow commented on June 27, 2024

@skyler It looks like this is an lxml issue rather than a premailer one.
See here: http://stackoverflow.com/questions/4684614/is-there-a-way-to-disable-urlencoding-of-anchor-attributes-in-lxml

Until a better solution is available, I used a simple replace to fix the problem with href="{{ }}".
After the string has been transformed, fix it back:

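# s is the HTML string returned by .transform()
# Undo lxml's percent-encoding of "{{ " and " }}" inside href attributes
mapping = (('%7B%7B%20', '{{ '), ('%20%7D%7D', ' }}'))
for k, v in mapping:
    s = s.replace(k, v)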
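The same idea packaged as a reusable helper. This is a sketch that assumes lxml emits the uppercase escapes seen in this thread (%7B, %7D, %20); the name restore_jinja_delimiters is hypothetical, and the two extra bare-brace pairs are an addition so that {{var}} is restored as well as {{ var }}. As with the original mapping, spaces inside the expression stay encoded as %20.

```python
# Pairs are applied in order: the spaced forms first, then the bare braces,
# so both "{{ var }}" and "{{var}}" are restored.
JINJA_ESCAPES = (
    ("%7B%7B%20", "{{ "),
    ("%20%7D%7D", " }}"),
    ("%7B%7B", "{{"),
    ("%7D%7D", "}}"),
)


def restore_jinja_delimiters(s: str) -> str:
    """Undo percent-encoding of Jinja delimiters in serialized HTML."""
    for encoded, literal in JINJA_ESCAPES:
        s = s.replace(encoded, literal)
    return s


print(restore_jinja_delimiters('<a href="%7B%7B%20url%20%7D%7D">x</a>'))
# prints <a href="{{ url }}">x</a>
```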

Tomazz commented on June 27, 2024

jdq22 commented on June 27, 2024

I ran into this issue as well with SendGrid marketing campaign emails, specifically for [Weblink] and [unsubscribe].

I ended up chaining .replace().replace() onto .transform() to find the percent-encoded Weblink and unsubscribe strings and restore them.
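A minimal sketch of that chained replacement, assuming the square brackets come out of .transform() percent-encoded as %5B and %5D (check your own output, since the exact escaping can vary between lxml/libxml2 versions). The function name fix_sendgrid_tags is hypothetical.

```python
def fix_sendgrid_tags(html: str) -> str:
    """Restore SendGrid substitution tags that were percent-encoded in hrefs."""
    # Assumed encodings: "[" -> "%5B", "]" -> "%5D"
    return (
        html.replace("%5BWeblink%5D", "[Weblink]")
        .replace("%5Bunsubscribe%5D", "[unsubscribe]")
    )


print(fix_sendgrid_tags('<a href="%5Bunsubscribe%5D">Unsubscribe</a>'))
# prints <a href="[unsubscribe]">Unsubscribe</a>
```

In practice this would be applied directly to the transformed string, e.g. fix_sendgrid_tags(Premailer(html).transform()).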


darklow commented on June 27, 2024

+1. Same problem for me too


peterbe commented on June 27, 2024

@skyler Looking at http://stackoverflow.com/a/4684874/205832, it seems you might be able to avoid the escaping by changing the parser, i.e.:

from premailer import Premailer

parser = Premailer("your HTML string", method="xml")
print(parser.transform())


clj commented on June 27, 2024

I have not had much luck with the xml output method. While it avoids escaping the attributes, I just get various XML-related errors instead, mostly about undeclared entities. So I decided to write my own version of etree's tostring, which renders the HTML and keeps the attribute values unescaped.

The following gist has this code: https://gist.github.com/clj/b8ba315b9a138db73be0

This works well for my test case, but I have not tested it extensively because I am having some other problems: the output as rendered by browsers is not equivalent to the input. I don't think the renderer linked above is the cause, but I am still investigating. Basically, YMMV.

Getting the output rendered the way I want requires overriding and copying some of the functionality of transform in the Premailer class. It might be nice to refactor the code so that the rendering of the output can be overridden without doing that.


peterbe commented on June 27, 2024

Perhaps the best way is to allow you to insert some sort of override for how lxml puts the values into src and href attributes. Premailer fiddles with the value every time because it tries to correct URLs based on the base_url.
See line 421 of premailer/premailer/premailer.py (commit d2a2a4a):

parent.attrib[attr] = urljoin(self.base_url, url)

In your case, you don't want an actual working browser URL in there, you want some other code.
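For illustration, this is what Python 3's urllib.parse.urljoin, the function called on that line, does to a Jinja placeholder once a base_url is set (the example URL is arbitrary). The placeholder is treated as a relative path and joined onto the base; with an empty base it is returned unchanged.

```python
from urllib.parse import urljoin

# With a base URL, the placeholder is treated as a relative path and joined on.
print(urljoin("https://example.com", "{{ url }}"))
# prints https://example.com/{{ url }}

# With an empty base, urljoin returns the second argument unchanged.
print(urljoin("", "{{ url }}"))
# prints {{ url }}
```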

Perhaps something like this:

def my_url_fixer(old, new):
    if '{{' in old and '}}' in old:
        # I know what I'm doing :)
        return old
    return new

p = Premailer(jinja_html, base_url='https://example.com', url_fixer=my_url_fixer)

And inside Premailer we have something like this:

    if self.url_fixer is None:
        self.url_fixer = lambda _, n: n
    parent.attrib[attr] = self.url_fixer(url, urljoin(self.base_url, url))
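A self-contained sketch of how such a hook could behave. Note that url_fixer is only a proposal in this comment, not an existing Premailer option, and fix_attribute below is a hypothetical stand-in for premailer's base_url handling of href/src values.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

UrlFixer = Callable[[str, str], str]


def my_url_fixer(old: str, new: str) -> str:
    # Keep template expressions as written; accept the joined URL otherwise
    if "{{" in old and "}}" in old:
        return old
    return new


def fix_attribute(url: str, base_url: str, url_fixer: Optional[UrlFixer] = None) -> str:
    """Hypothetical stand-in for premailer's base_url handling of href/src."""
    if url_fixer is None:
        # Default behaviour: always take the joined URL
        url_fixer = lambda _old, new: new
    return url_fixer(url, urljoin(base_url, url))


base = "https://example.com"
print(fix_attribute("{{ url }}", base))                # https://example.com/{{ url }}
print(fix_attribute("{{ url }}", base, my_url_fixer))  # {{ url }}
print(fix_attribute("/about", base, my_url_fixer))     # https://example.com/about
```

This only controls premailer's own URL rewriting; as the next comment points out, lxml's HTML serializer may still percent-encode the value when the tree is written out.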


clj commented on June 27, 2024

But isn't it etree.tostring(..., method="html") that percent-encodes the href and src attributes, rather than premailer itself? The fix you propose happens before etree.tostring(...) is called, so I don't think it can stop the characters in href (and src) attributes from being percent-encoded, since that happens inside etree.tostring when the method is html.

I have not personally had any issues with the urljoin since I don't specify a base_url:

from urllib.parse import urljoin  # Python 2: from urlparse import urljoin
urljoin(None, '{{ blah }}')
# outputs: '{{ blah }}'

So I think the only way to make premailer work with template engines that expect href and src attributes not to be mangled is to render the output with a custom function. As an example:

html = """
<html>
<body>
<p class="{{ something }}">
<a href="{{ something else }}">test</a>
<a href="http://example.com/%C3%A6%C3%B8%C3%A5">ing</a>
</p>
</body>
</html>
"""
from lxml import etree
tree = etree.fromstring(html)
etree.tostring(tree, method="html")

Produces the following output:

'''
<html>
<body>
<p class="{{ something }}">
<a href="%7B%7B%20something%20else%20%7D%7D">test</a>
<a href="http://example.com/%C3%A6%C3%B8%C3%A5">ing</a>
</p>
</body>
</html>
'''

Which will fail to render through my templating engine.

Percent-decoding the output would not work either. I want the first href to become {{ something else }} again (which percent-decoding the attribute would achieve), but I want the second href to stay as http://example.com/%C3%A6%C3%B8%C3%A5, so I can't just naively percent-decode everything in href or src attributes.

If on the other hand I run this through the MyPremailer from the gist in my previous comment (MyPremailer(html, remove_classes=False).transform()), I get the following output:

'''
<html>
<head></head><body>
<p class="{{ something }}">
<a href="{{ something else }}">test</a>
<a href="http://example.com/%C3%A6%C3%B8%C3%A5">ing</a>
</p>
</body>
</html>
'''

Which is what my templating engine needs to work correctly.
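On clj's objection to naive percent-decoding (the second href must keep %C3%A6%C3%B8%C3%A5), one way to decode selectively is to decode only the percent-encoded {{ ... }} runs and leave every other escape alone. This is a sketch assuming lxml's uppercase %7B%7B / %7D%7D encoding; decode_jinja_runs is a hypothetical helper, and a regex over serialized HTML is no substitute for a parser if the markup is unusual.

```python
import re
from urllib.parse import unquote

# A percent-encoded "{{" ... "}}" run, matched non-greedily so that two
# placeholders in one attribute are decoded separately.
ENCODED_JINJA = re.compile(r"%7B%7B.*?%7D%7D")


def decode_jinja_runs(html: str) -> str:
    """Percent-decode only Jinja expressions; keep other escapes such as %C3%A6."""
    return ENCODED_JINJA.sub(lambda m: unquote(m.group(0)), html)


output = (
    '<a href="%7B%7B%20something%20else%20%7D%7D">test</a>\n'
    '<a href="http://example.com/%C3%A6%C3%B8%C3%A5">ing</a>'
)
print(decode_jinja_runs(output))
```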


OrangeDog commented on June 27, 2024

@peterbe: Switching to method='xml' seems to be a no-go.
Here's the first error I get, even though my input validates as XHTML 1.0 Transitional.

  File "C:\utils\Python27\lib\site-packages\premailer\premailer.py", line 308, in transform
    head = get_or_create_head(tree)
  File "C:\utils\Python27\lib\site-packages\premailer\premailer.py", line 58, in get_or_create_head
    body = CSSSelector('body')(root)[0]
IndexError: list index out of range

In fact, it looks like the chosen method can be ignored anyway; see the if hasattr(self.html, "getroottree") check in transform.

I also tried setting method='text', but that appears to override my setting of encoding='ascii', since I got the familiar UnicodeEncodeError.


peterbe commented on June 27, 2024

@clj Annoyingly I didn't get a notification about your reply.
I fear that overriding how tostring works is dangerous and fragile. It might just happen to work today for this use case, but forcing a library to do something against its will usually leads to maintenance problems in the future.

My bet would be to rely on a string-replace solution instead, as shown by @darklow.


kespindler commented on June 27, 2024

This can be resolved either by using method='xml', if possible, or, if you can't have well-formed XML, in the following way:

from premailer import Premailer
from premailer.premailer import _importants
from lxml import etree

with open('path/to/template.jinja2') as f:
    template_str = f.read()
parser = etree.HTMLParser()
html = etree.fromstring(template_str, parser)
# I used these options; feel free to change them.
styled = Premailer(html,
                   external_styles=['path/to/css/file.css'],
                   disable_leftover_css=True,
                   strip_important=True,
                   disable_validation=True,
                   keep_style_tags=False,
                   ).transform()
# The HTML parser forcibly added <html><body> to my template when it wasn't there before;
# getroot()[0][0] skips to the element contained in <body>.
# Just use etree.tostring(styled) if you want the full document.
# tostring() defaults to XML serialization, which leaves href/src values unescaped.
out = etree.tostring(styled.getroot()[0][0]).decode('utf8')
out = _importants.sub('', out)   # if you used the strip_important flag

with open('path/to/output/template.jinja2', 'w') as f:
    f.write(out)

A refactor of the Premailer.transform method would make "jinja-var preservation" an easy option to add to the class.


jlev commented on June 27, 2024

This also bit me. Thanks for the solution, @jdq22


asandeep commented on June 27, 2024

I faced the same issue and used urllib.unquote to convert the percent-encoded (URL-encoded) strings back to their decoded form.

Working just fine for me. HTH.
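In Python 3 the same function lives at urllib.parse.unquote. A small sketch of the trade-off: it decodes every escape in the string, so a URL that was already percent-encoded in the source template changes as well (the concern raised earlier in the thread). That makes it safest to apply only to values known to be template expressions.

```python
from urllib.parse import unquote

# The template placeholder comes back as intended...
print(unquote("%7B%7B%20url%20%7D%7D"))
# prints {{ url }}

# ...but a URL that was legitimately percent-encoded is decoded too.
print(unquote("http://example.com/%C3%A6%C3%B8%C3%A5"))
# prints http://example.com/æøå
```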


firstcloudconsulting commented on June 27, 2024

Just offering another solution. It's certainly not efficient, but it does the job and I was already using bs4 for some other aspects:

    from bs4 import BeautifulSoup
    from premailer import transform

    soup = BeautifulSoup(html, features="html.parser")

    # Preserve links (template tags get URL encoded): swap each href for a placeholder
    orig_links = []

    for a in soup.find_all('a', href=True):
        orig_links.append(a['href'])
        a['href'] = '#link-%d' % (len(orig_links) - 1)

    # Inline CSS
    premailer_html = transform(
        str(soup),
        strip_important=False
    )

    # Restore links: the number after the last "-" indexes into orig_links
    soup = BeautifulSoup(premailer_html, features="html.parser")

    for a in soup.find_all('a', href=True):
        lidx = int(a['href'].rsplit('-', 1)[1])
        a['href'] = orig_links[lidx]

    # Output: str(soup) or soup.prettify() ... I then minify it as well with htmlmin
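The same placeholder idea without the BeautifulSoup dependency, as a hedged sketch: it assumes double-quoted href attributes and uses a regex, which is much less robust than a real HTML parser. The names protect_hrefs, restore_hrefs and inline_preserving_hrefs are hypothetical; in practice inline would be a call into premailer such as lambda s: Premailer(s).transform().

```python
import re
from typing import Callable, List, Tuple

# Assumes double-quoted href attributes.
HREF = re.compile(r'href="([^"]*)"')
# Tolerates a base_url being prepended to the placeholder during inlining.
PLACEHOLDER = re.compile(r'href="[^"]*#link-(\d+)"')


def protect_hrefs(html: str) -> Tuple[str, List[str]]:
    """Swap every href for a '#link-N' placeholder and remember the original."""
    originals: List[str] = []

    def swap(match: "re.Match[str]") -> str:
        originals.append(match.group(1))
        return 'href="#link-%d"' % (len(originals) - 1)

    return HREF.sub(swap, html), originals


def restore_hrefs(html: str, originals: List[str]) -> str:
    """Put the remembered hrefs back in place of their placeholders."""
    return PLACEHOLDER.sub(lambda m: 'href="%s"' % originals[int(m.group(1))], html)


def inline_preserving_hrefs(html: str, inline: Callable[[str], str]) -> str:
    """Run an inliner on html without letting it touch the href values."""
    protected, originals = protect_hrefs(html)
    return restore_hrefs(inline(protected), originals)


# Stand-in for a CSS inliner: adds a style attribute to every link.
fake_inline = lambda s: s.replace("<a ", '<a style="color: red" ')
print(inline_preserving_hrefs('<a href="{{ url }}">x</a>', fake_inline))
# prints <a style="color: red" href="{{ url }}">x</a>
```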


pirsquare commented on June 27, 2024

What works for me currently is to set preserve_handlebar_syntax to True.
