
Comments (15)

darklow avatar darklow commented on June 27, 2024 3

@skyler This looks like an lxml issue rather than a premailer one.
See here: http://stackoverflow.com/questions/4684614/is-there-a-way-to-disable-urlencoding-of-anchor-attributes-in-lxml

Until a better solution is available, I use a simple string replace to fix the problem with href="{{ }}".
After the string has been transformed, restore the placeholders:

mapping = (('%7B%7B%20', '{{ '), ('%20%7D%7D', ' }}'))
for k, v in mapping:
    s = s.replace(k, v)
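
The same replace loop can be wrapped in a small helper and applied to the output of transform(). This is only a sketch: the name restore_placeholders and the sample string are made up for illustration, and it assumes the placeholders are written exactly as "{{ name }}" with single spaces.

```python
# Percent-encoded delimiters from the snippet above, mapped back to template syntax.
MAPPING = (
    ("%7B%7B%20", "{{ "),
    ("%20%7D%7D", " }}"),
)


def restore_placeholders(html, mapping=MAPPING):
    """Undo the percent-encoding of template delimiters in transformed HTML."""
    for encoded, original in mapping:
        html = html.replace(encoded, original)
    return html


# Hypothetical transformed output, standing in for Premailer(...).transform().
transformed = '<a href="%7B%7B%20unsubscribe_url%20%7D%7D">Unsubscribe</a>'
print(restore_placeholders(transformed))
```

Note that only the delimiters are restored. Any other encoded character inside the placeholder (for example a space between two words) stays percent-encoded.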

from premailer.

Tomazz avatar Tomazz commented on June 27, 2024 1

+1


jdq22 avatar jdq22 commented on June 27, 2024 1

I ran into this issue as well with SendGrid marketing campaign emails, specifically for [Weblink] and [unsubscribe].

I ended up chaining two .replace() calls onto .transform() to find the encoded weblink and unsubscribe strings and put the original tags back.


darklow avatar darklow commented on June 27, 2024

+1. Same problem for me too


peterbe avatar peterbe commented on June 27, 2024

@skyler looking at http://stackoverflow.com/a/4684874/205832 it makes me think that maybe you can avoid the escaping if you change the parser. I.e.

from premailer import Premailer

parser = Premailer("your HTML string", method="xml")
print(parser.transform())


clj avatar clj commented on June 27, 2024

I have not had much luck using the xml output method. While it avoids escaping the attributes, I just get various XML-related errors instead, primarily related to undeclared entities. Instead, I decided to write my own version of etree's tostring which renders the HTML and keeps the values in the attributes unescaped.

The following gist has this code: https://gist.github.com/clj/b8ba315b9a138db73be0

This works well for my test case, but I have not tested it extensively, as I am having some other problems: the generated output, as rendered by browsers, is not equivalent to the input. I don't think this is due to the renderer linked above, but I am still investigating. Basically, YMMV.

Getting the output rendered as I want requires overriding and copying some of the functionality of transform in the Premailer class. It might be nice to refactor the code so that the rendering of the output can be overridden without doing that.


peterbe avatar peterbe commented on June 27, 2024

Perhaps the best way is to allow you to insert some sort of override for how lxml puts the values into src and href attributes. Premailer fiddles with the value every time because it tries to correct URLs based on the base_url.
See

parent.attrib[attr] = urljoin(self.base_url, url)

In your case, you don't want an actual working browser URL in there, you want some other code.

Perhaps something like this:

def my_url_fixer(old, new):
    if '{{' in old and '}}' in old:
        # I know what I'm doing :)
        return old
    return new

p = Premailer(jinja_html, base_url='https://example.com', url_fixer=my_url_fixer)

And inside Premailer we have something like this:

    if self.url_fixer is None:
        self.url_fixer = lambda _, n: n
    # pass both the original value and the joined URL to the hook
    parent.attrib[attr] = self.url_fixer(url, urljoin(self.base_url, url))
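
To make the proposed hook concrete, here is a standalone sketch of the same idea outside of Premailer. The url_fixer parameter is only a proposal in this thread, and resolve_url is a hypothetical helper name, so none of this should be read as existing premailer API. It only covers the urljoin step, not how the attribute is serialized afterwards.

```python
from urllib.parse import urljoin


def my_url_fixer(old, new):
    """Keep template placeholders untouched, accept the joined URL otherwise."""
    if "{{" in old and "}}" in old:
        return old
    return new


def resolve_url(url, base_url, url_fixer=None):
    """Join url onto base_url, then let the optional hook pick the final value."""
    if url_fixer is None:
        # Default hook: always take the joined URL.
        url_fixer = lambda old, new: new
    return url_fixer(url, urljoin(base_url, url))


print(resolve_url("/about", "https://example.com"))
print(resolve_url("{{ profile_url }}", "https://example.com", my_url_fixer))
```

The hook receives both the original attribute value and the joined URL, so the caller decides which one survives.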


clj avatar clj commented on June 27, 2024

But isn't it etree.tostring(..., method="html") that percent-encodes the href and src attributes, rather than premailer itself? The fix you propose above happens before etree.tostring(...) is called, so I don't think it can prevent the characters in href (and src) attributes from being percent-encoded, since that happens inside etree.tostring when the method is html.

I have not personally had any issues with the urljoin since I don't specify a base_url:

from urlparse import urljoin
urljoin(None, '{{ blah }}')
# outputs: u'{{ blah }}'

So, I think that the only way of making premailer work with template engines which expect the hrefs and src attributes to not be mangled is to render the output using a custom function. As an example:

html = """
<html>
<body>
<p class="{{ something }}">
<a href="{{ something else }}">test</a>
<a href="http://example.com/%C3%A6%C3%B8%C3%A5">ing</a>
</p>
</body>
</html>
"""
from lxml import etree
tree = etree.fromstring(html)
etree.tostring(tree, method="html")

Produces the following output:

'''
<html>
<body>
<p class="{{ something }}">
<a href="%7B%7B%20something%20else%20%7D%7D">test</a>
<a href="http://example.com/%C3%A6%C3%B8%C3%A5">ing</a>
</p>
</body>
</html>
'''

Which will fail to render through my templating engine.

Simply percent-decoding the output would not work. I want the first href to stay {{ something else }} (which could be achieved by percent-decoding the contents of the attribute), but I want the second href to stay as http://example.com/%C3%A6%C3%B8%C3%A5, so I can't just naively percent-decode everything in href or src attributes.
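
One way around this, short of a custom renderer, is to decode only the encoded {{ ... }} spans inside href and src values and leave everything else alone. The following is a rough sketch under that assumption; restore_template_tags and both regular expressions are illustrative and are not part of lxml or premailer.

```python
import re
from urllib.parse import unquote

# A percent-encoded "{{ ... }}" span, as produced for the first link above.
_ENCODED_TAG = re.compile(r"%7B%7B.*?%7D%7D", re.IGNORECASE)
# Double-quoted href/src attribute values.
_URL_ATTR = re.compile(r'((?:href|src)=")([^"]*)(")')


def restore_template_tags(html):
    """Percent-decode only the template placeholders found in href/src values."""
    def fix_attr(match):
        value = _ENCODED_TAG.sub(lambda m: unquote(m.group(0)), match.group(2))
        return match.group(1) + value + match.group(3)

    return _URL_ATTR.sub(fix_attr, html)


serialized = (
    '<a href="%7B%7B%20something%20else%20%7D%7D">test</a>\n'
    '<a href="http://example.com/%C3%A6%C3%B8%C3%A5">ing</a>'
)
print(restore_template_tags(serialized))
```

The second link keeps its percent escapes because it contains no encoded placeholder. Anything that was deliberately percent-encoded inside a placeholder would be decoded too, so this is still a heuristic.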

If on the other hand I run this through the MyPremailer from the gist in my previous comment (MyPremailer(html, remove_classes=False).transform()), I get the following output:

'''
<html>
<head></head><body>
<p class="{{ something }}">
<a href="{{ something else }}">test</a>
<a href="http://example.com/%C3%A6%C3%B8%C3%A5">ing</a>
</p>
</body>
</html>
'''

Which is what my templating engine needs to work correctly.


OrangeDog avatar OrangeDog commented on June 27, 2024

@peterbe: Switching to method='xml' seems to be a no-go.
Here's the first error I get, even though my input validates as XHTML 1.0 Transitional.

  File "C:\utils\Python27\lib\site-packages\premailer\premailer.py", line 308, in transform
    head = get_or_create_head(tree)
  File "C:\utils\Python27\lib\site-packages\premailer\premailer.py", line 58, in get_or_create_head
    body = CSSSelector('body')(root)[0]
IndexError: list index out of range

In fact, it looks like the chosen method can get ignored anyway: if hasattr(self.html, "getroottree").

I also tried setting method='text', but that appears to have overridden my setting of encoding='ascii', as you get the familiar UnicodeEncodeError.


peterbe avatar peterbe commented on June 27, 2024

@clj Annoyingly I didn't get a notification about your reply.
Overriding how tostring works feels dangerous and fragile to me. It might just happen to work today for this use case, but forcing a library to do something against its will usually leads to maintenance problems in the future.

My bet would be to rely on a string replace solution instead, as shown by @darklow.


kespindler avatar kespindler commented on June 27, 2024

This can be resolved by either using method='xml', if possible, or if you can't have correctly-formed xml, then in the following way:

from premailer import Premailer
from premailer.premailer import _importants
from lxml import etree

with open('path/to/template.jinja2') as f:
    template_str = f.read()
parser = etree.HTMLParser()
html = etree.fromstring(template_str, parser)
# I used these options. Feel free to change them.
styled = Premailer(
    html,
    external_styles=['path/to/css/file.css'],
    disable_leftover_css=True,
    strip_important=True,
    disable_validation=True,
    keep_style_tags=False,
).transform()
# the html parser forcibly added <html><body> to my template when it wasn't before
# the getroot()[0][0] skips to the element contained in body
# just use etree.tostring(styled) if you want the full document
out = etree.tostring(styled.getroot()[0][0]).decode('utf8')
out = _importants.sub('', out)   # if you used the strip_important flag

with open('path/to/output/template.jinja2', 'w') as f:
    f.write(out)

A refactor of the Premailer.transform method would make "jinja-var preservation" an easy option to add to the class.


jlev avatar jlev commented on June 27, 2024

This also bit me. Thanks for the solution, @jdq22


asandeep avatar asandeep commented on June 27, 2024

I faced the same issue and used urllib.unquote to convert the percent-encoded strings back to their decoded form.
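
For reference, a minimal sketch of that approach on Python 3, where the function lives in urllib.parse (urllib.unquote is the Python 2 location). The sample string is made up and stands in for the output of transform().

```python
from urllib.parse import unquote

# Hypothetical transformed output with a percent-encoded placeholder.
transformed = '<a href="%7B%7B%20profile_url%20%7D%7D">Profile</a>'

# unquote decodes every percent escape in the string, not just the placeholder.
restored = unquote(transformed)
print(restored)
```

Because every escape is decoded, URLs that are meant to stay percent-encoded will be changed as well, so this fits best when the template contains no such URLs.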

Working just fine for me. HTH.


firstcloudconsulting avatar firstcloudconsulting commented on June 27, 2024

Just offering another solution. It's certainly not efficient, but it does the job and I was already using bs4 for some other aspects:

    from bs4 import BeautifulSoup
    from premailer import transform

    soup = BeautifulSoup(html, features="html.parser")

    # Preserve links (template tags get URL encoded)
    orig_links = []

    for a in soup.find_all('a', href=True):
        orig_links.append(a['href'])
        a['href'] = '#link-%d' % (len(orig_links) - 1)

    # Inline CSS
    premailer_html = transform(
        str(soup),
        strip_important=False
    )

    # Restore links
    soup = BeautifulSoup(premailer_html, features="html.parser")

    for a in soup.find_all('a', href=True):
        lidx = int(a['href'].rsplit('-', 1)[1])
        a['href'] = orig_links[lidx]

    # Output: str(soup) or soup.prettify() ... I then minify it as well with htmlmin


pirsquare avatar pirsquare commented on June 27, 2024

What works for me currently is to set preserve_handlebar_syntax to True.

