Comments (5)
It's a Python 3 compatibility issue.
I'll look into it later; meanwhile, feel free to patch.
I've marked this as a bug.
from html2text.
This problem also occurs when running html2text as the mu4e-html2text command in mu4e in Emacs. It is caused by the fact that Python 3 str objects do not have a decode method.
Replacing
    data = data.decode(encoding)
on line 1083 of __init__.py with
    if hasattr(data, "decode"):
        data = data.decode(encoding)
seems to fix it.
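As a standalone illustration of that guard (a minimal sketch, not the actual html2text code; the helper name to_text and the default encoding are assumptions made up for this example):

```python
def to_text(data, encoding="utf-8"):
    """Return data as str, decoding only when it is bytes-like.

    Hypothetical helper for illustration; html2text does not define it.
    """
    # In Python 3, str has no .decode(); bytes and bytearray do.
    if hasattr(data, "decode"):
        data = data.decode(encoding)
    return data


from_bytes = to_text(b"caf\xc3\xa9")  # bytes input is decoded
from_str = to_text("café")            # str input passes through unchanged
```

The guard leaves text that is already str alone, which is the case that raised the error when input arrived as str on Python 3.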
File objects in Python 3 are not binary by default. See https://docs.python.org/3/library/sys.html#sys.stdin. The fix, at least for Python 3, in the file html2text/__init__.py is:
    else:
        #data = sys.stdin.read()
        data = sys.stdin.buffer.read()
Note: the other input/file objects do return bytes (urllib.urlopen() and open(file_, 'rb')).
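To show the difference outside of html2text, here is a rough sketch of reading raw bytes from a text stream through its buffer attribute. The function name read_input_bytes and the in-memory sample stream are assumptions for this example only:

```python
import io
import sys


def read_input_bytes(stream=None):
    """Return the raw bytes of a stream (default: sys.stdin).

    Hypothetical helper for illustration; not part of html2text.
    """
    if stream is None:
        stream = sys.stdin
    # Text streams such as sys.stdin expose their underlying binary
    # stream as .buffer; streams without that attribute are read as-is.
    binary = getattr(stream, "buffer", stream)
    return binary.read()


# In-memory stand-in for stdin: a text wrapper around a bytes buffer.
sample = io.TextIOWrapper(io.BytesIO(b"<p>hi</p>"), encoding="utf-8")
raw = read_input_bytes(sample)  # bytes, ready for .decode(encoding)
```

Note that if sys.stdin has been replaced by an object without a buffer attribute (io.StringIO, for example), this sketch falls back to the stream's own read() and returns str rather than bytes.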
Adding additional shell script tests would be useful, similar to:
- "echo 'hi' | html2text"
- html2text file
- html2text url
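The stdin case from that list could be driven from Python roughly as follows. This is only a sketch: run_with_stdin is a hypothetical helper, and a small Python child process stands in for the html2text command so the example runs without html2text installed.

```python
import subprocess
import sys


def run_with_stdin(cmd, html):
    """Run cmd, feed html to its stdin as UTF-8 bytes, return stdout as str."""
    result = subprocess.run(
        cmd,
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("utf-8")


# Stand-in command: a child Python process that upper-cases its stdin.
# For a real test, replace it with ["html2text"] (assumed to be on PATH).
echo_upper = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
output = run_with_stdin(echo_upper, "<p>hi</p>")
```

With check=True, a traceback in the child process (such as the missing decode method) surfaces as a CalledProcessError instead of passing silently.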
Please have a look at pull request #46; it passes the test suite on Travis as well: https://travis-ci.org/Alir3z4/html2text/builds/42863577
Let me know if you have any suggestions for improvement. I'll merge it by the end of the week and issue a new release.
The fix for this bug (#46) has been merged and included in version 2014.12.5.
Changes:
https://github.com/Alir3z4/html2text/releases/tag/2014.12.5
Available on PyPI now:
https://pypi.python.org/pypi/html2text/2014.12.5
Thanks for your awesome contribution.