
Comments (12)

awdeorio commented on August 12, 2024

Could you upload a small version of the template and database? I can take a look.

from mailmerge.

gbiscuolo commented on August 12, 2024

Wow! Thank you for your practically instant reply :-)
First: I'm using mailmerge version 1.7.2, installed via "pip3" on a Debian Stretch distro

This is a test template: test_template.txt

This is a test database: test_database.txt

If I dry-run I get no issues (all messages are displayed):

mailmerge --dry-run --no-limit --template mailmerge/test_template.txt --config mailmerge/gbiscuolo_server.conf --database mailmerge/test_database.txt

When I turn dry-run off, I get the same error message as in my first message (after entering my SMTP password):

Traceback (most recent call last):
  File "/usr/local/bin/mailmerge", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python3/dist-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/mailmerge/main.py", line 231, in main
    sendmail(message, config_filename)
  File "/usr/local/lib/python3.5/dist-packages/mailmerge/main.py", line 72, in sendmail
    smtp.send_message(message)
  File "/usr/lib/python3.5/smtplib.py", line 958, in send_message
    g.flatten(msg_copy, linesep='\r\n')
  File "/usr/lib/python3.5/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.5/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.5/email/generator.py", line 432, in _handle_text
    super(BytesGenerator,self)._handle_text(msg)
  File "/usr/lib/python3.5/email/generator.py", line 249, in _handle_text
    self._write_lines(payload)
  File "/usr/lib/python3.5/email/generator.py", line 155, in _write_lines
    self.write(line)
  File "/usr/lib/python3.5/email/generator.py", line 406, in write
    self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe8' in position 7: ordinal not in range(128)

Could you please tell me which library "email/generator.py" comes from?

ciao
Giovanni

awdeorio commented on August 12, 2024

The error is coming from Python's standard library: smtplib calls into the email package, and email/generator.py is part of that package. I can reproduce the problem with my own email server. I'll let you know what I find.
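
The last frame of the traceback can be reproduced on its own. This is a minimal sketch assuming Python 3; the string `body_line` is made up for illustration, since the real template text is not shown in this thread.

```python
# Hypothetical body line containing 'è' (U+00E8), the character named in the error.
body_line = "perch\xe8"

# This is the same call as the last frame of the traceback
# (email/generator.py, in write): the line is encoded as ASCII.
try:
    body_line.encode("ascii", "surrogateescape")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# Encoding the same text as UTF-8 works fine.
as_utf8 = body_line.encode("utf-8")
print(as_utf8)
```

So the generator is being handed a body that still contains raw non-ASCII text, and it only knows how to write ASCII at that point.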

gbiscuolo commented on August 12, 2024

Thank you for your feedback. I'm struggling with this but can't find a solution (I'm not a regular smtplib or Python programmer).
I was hoping to resolve the issue by defining the Content-Type header, but no luck. I added:

Content-Type: text/plain; charset=UTF-8

to the template, but I get the very same error as above.
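
A possible explanation, sketched under the assumption of Python 3 and a made-up template (`text` below is hypothetical): the parser records the Content-Type header, but it does not re-encode the body, so the payload is still a plain string with the non-ASCII character in it.

```python
import email.parser

# Hypothetical template with the Content-Type header added.
text = (
    "TO: to@example.com\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "\n"
    "perch\xe8\n"
)
message = email.parser.Parser().parsestr(text)

# The charset named in the header is available...
declared_charset = message.get_content_charset()

# ...but the body is still an unencoded str, and no transfer encoding was set.
payload = message.get_payload()
transfer_encoding = message["Content-Transfer-Encoding"]

print(declared_charset, repr(payload), transfer_encoding)
```

In other words, the header only labels the body. Something still has to encode the body according to that label before the message is sent.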

gbiscuolo commented on August 12, 2024

OK: found a partial solution, please see the
mailmerge main.py diff (git)

This is a temporary hack and not a real solution, since the encoding is hardcoded and not extracted from the message's Content-Type header (which should be in the template).

I'm working on a more complete patch. I'm not used to forking and opening pull requests on GitHub, but I'll try.

Please keep this open until we find a proper solution to message encoding

Ciao
Giovanni

awdeorio commented on August 12, 2024

Good hack! I spent a few hours on this yesterday and haven't been able to find an elegant solution yet. I did learn that it matters whether the UTF-8 text is in the body or in the headers. I agree with your idea that putting the encoding in the template's Content-Type header is a far better solution.

gbiscuolo commented on August 12, 2024

I'm still trying to find a way to take the encoding from the Content-Type header in the template.
I've forked your upstream repo and I'm working in this feature branch:

https://github.com/gbiscuolo/mailmerge/tree/feature/content-type

My patch works in Python 3 but not in Python 2, where email.parser.Parser().parsestr(text) still complains when parsing a UTF-8 encoded message.

I've searched the email.parser documentation, but I still can't figure out how to correctly parse email messages that contain non-ASCII text.

Any pointers, please?

ciao
Giovanni

awdeorio commented on August 12, 2024

I can replicate the problem you describe with the email.parser.Parser().parsestr(text) method. While I haven't solved it, I have gotten a bit closer to understanding the problem.

In Python 2.7, parsestr() calls parse(), which assumes RFC 2822 formatted headers. I don't see any indication that it is designed to interpret UTF-8 content, even though the non-ASCII characters only appear later, in the body of the text.
https://docs.python.org/2/library/email.parser.html#email.parser.Parser.parse

In Python 3.6, parsestr() again calls parse(), which assumes RFC 5322 or RFC 6532 formatting. The latter supports UTF-8.
https://docs.python.org/3.6/library/email.parser.html#email.parser.BytesParser.parse

So, this explains why parsestr() has different behavior in Python 2.7 compared to Python 3.6. This isn't a solution, but it's a little closer!

awdeorio commented on August 12, 2024

Update: I've successfully parsed a UTF-8 encoded body using Python2:

message = email.mime.text.MIMEText(text, _subtype='plain', _charset='utf-8')

https://docs.python.org/2/library/email.mime.html#email.mime.text.MIMEText

This is a bit of progress, but still suffers from a few problems:

  1. Doesn't read headers (like TO and FROM) from text
  2. Assumes text/plain, which breaks text/html support
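
Both limitations are easy to see in a small sketch. This one assumes Python 3 (the call above was tried on Python 2) and uses a made-up template string `text`:

```python
from email.mime.text import MIMEText

# Hypothetical template text: header lines, a blank line, then a non-ASCII body.
text = "TO: to@example.com\nFROM: from@example.com\n\nCaff\xe8 per tutti\n"

message = MIMEText(text, _subtype="plain", _charset="utf-8")

# The body is base64-encoded, so the serialized message is pure ASCII
# and the UnicodeEncodeError from the traceback does not occur.
wire_bytes = message.as_bytes()
wire_bytes.decode("ascii")
print(message["Content-Transfer-Encoding"])  # base64

# Problem 1: the "TO:" and "FROM:" lines ended up inside the body,
# not in the message headers.
print(message["To"])  # None

# Problem 2: the subtype is whatever was hardcoded in the call.
print(message.get_content_type())  # text/plain
```

The whole input string becomes the body, which is why the headers would need to be parsed separately.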

awdeorio commented on August 12, 2024

The problem gets worse. Mark Lutz talks about email encoding in GREAT DEPTH in Programming Python 4th edition. https://books.google.com/books?id=q8W3WQbNWmkC

He writes that email encodings are a huge mess in Python 2.x. He suggests that the solution is to write your own email parser to elegantly handle non-ascii email. Email parsing with UTF-8 and other encodings is fixed in later Python 3.x versions. I tried copying the outdated email parsing code from the book into mailmerge, but it didn't immediately work.

One possible fix for this is to say "if you want to use UTF-8, pip3 install mailmerge", possibly with a catch-and-warn for the exception.

gbiscuolo commented on August 12, 2024

Thank you for the updates!

First, a very short preamble: modern editors fully support UTF-8 encoded text, allowing anyone to write an email template using strings like çóñỉöæßðđŋħĸł (perfectly meaningless, just an example), and modern email clients support UTF-8 too. IMHO UTF-8 support is a must for a useful tool like mailmerge.

re comment 309751646:

  1. Requiring Python3 for full UTF-8 (and other encodings) email parsing support is an option, and it works for all users of modern distros, including me. It's not the best solution but a legitimate one; for example, mailman3 dropped Python2 support. You are the author, your choice! :-)
  2. Keeping Python2 support is possible (there are plenty of applications sending emails using Python2: mailman2, django...), but we would have to rewrite your email parsing code, dropping the very elegant email.parser.Parser().parsestr(text) in favor of email.mime.text.MIMEText and email.header.decode_header.
  3. I don't understand Mark Lutz's suggestion to "write your own email parser", since the standard Python[2|3] library already has encoding support for email bodies and headers, though unfortunately not in email.parser.Parser(). Can you please publish the code you tested (is it worth a feature branch)?

re your comment 309739275:

  1. we should read the _charset and _subtype from the Content-Type header instead of hardcoding plain and utf-8
  2. we should parse the message header before the message body using email.header standard library functions
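
The two points above could be combined roughly as follows. This is only a sketch assuming Python 3 and a single-part text template; `build_message` and its `default_charset` parameter are hypothetical names, not part of mailmerge, and it does not solve the Python2 parsing problem discussed earlier.

```python
import email.parser
from email.mime.text import MIMEText


def build_message(text, default_charset="utf-8"):
    """Hypothetical helper: parse a template and re-encode its body."""
    # Parse the headers first.
    parsed = email.parser.Parser().parsestr(text)

    # Take the subtype and charset from the Content-Type header.
    # Without that header the subtype defaults to "plain".
    subtype = parsed.get_content_subtype()
    charset = parsed.get_content_charset() or default_charset

    # Build a new message whose body is encoded with that charset.
    message = MIMEText(parsed.get_payload(), _subtype=subtype, _charset=charset)

    # Copy the remaining headers (To, From, Subject, ...), skipping the
    # MIME headers that MIMEText has already generated.
    generated = {"content-type", "mime-version", "content-transfer-encoding"}
    for name, value in parsed.items():
        if name.lower() not in generated:
            message[name] = value
    return message


# Made-up template for illustration.
template = (
    "To: to@example.com\n"
    "Subject: Test\n"
    "Content-Type: text/html; charset=UTF-8\n"
    "\n"
    "<p>perch\xe8</p>\n"
)
msg = build_message(template)
print(msg.get_content_type(), msg.get_content_charset(), msg["To"])
```

With this approach a template that declares text/html keeps its subtype, and the serialized message contains only ASCII bytes because the body is transfer-encoded.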

I find this Stack Overflow answer an interesting approach to "charset aware" header parsing.

If you decide to keep Python2 support, I can help (with limited resources).

awdeorio commented on August 12, 2024

I found a new solution! Check out my bugfix/utf8 branch. In particular, https://github.com/awdeorio/mailmerge/blob/bugfix/utf8/mailmerge/api.py#L19

The documentation is deep in the source code comments of http://python-future.org/_modules/future/standard_library.html

Note that I restructured the source code a lot so that I could better add unit tests for issues like this. A testcase with your template and database is in /tests/test_utf8.*.

You can run all the tests with ./bin/test-functional.

You can test both python2 and python3 automatically with ./bin/test-python2-python3
