Non-Ascii characters (rtv issue, closed)

michael-lazar avatar michael-lazar commented on May 28, 2024
Non-Ascii characters

from rtv.

Comments (15)

michael-lazar avatar michael-lazar commented on May 28, 2024

I think this is a fair point, and I don't have anything against adding unicode. I went with ASCII because I was worried about compatibility with terminal emulators and system configurations that lack unicode support.

from rtv.

michael-lazar avatar michael-lazar commented on May 28, 2024

#33

from rtv.

michael-lazar avatar michael-lazar commented on May 28, 2024

Fixed pull request in 47ad49a.

However, there still appear to be some issues with python3. For example,

$ python2 -m rtv -l http://www.reddit.com/r/LearnJapanese/comments/2ylsz9/request_japanese_audio_textbook_audiobook_or/

loads correctly, but

$ python3 -m rtv -l http://www.reddit.com/r/LearnJapanese/comments/2ylsz9/request_japanese_audio_textbook_audiobook_or/

crashes with the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/michael/Projects/rtv_project/rtv/__main__.py", line 7, in <module>
    main()
  File "/home/michael/Projects/rtv_project/rtv/main.py", line 84, in main
    page.loop()
  File "/home/michael/Projects/rtv_project/rtv/submission.py", line 28, in loop
    self.draw()
  File "/home/michael/Projects/rtv_project/rtv/page.py", line 143, in draw
    self._draw_content()
  File "/home/michael/Projects/rtv_project/rtv/page.py", line 194, in _draw_content
    attr = self.draw_item(subwindow, data, inverted)
  File "/home/michael/Projects/rtv_project/rtv/submission.py", line 95, in draw_item
    return self.draw_comment(win, data, inverted=inverted)
  File "/home/michael/Projects/rtv_project/rtv/submission.py", line 122, in draw_comment
    win.addnstr(row, 1, text, n_cols-1)
_curses.error: addnwstr() returned ERR

Passing --force-ascii fixes the crash, but I would like to figure out why some unicode characters appear to be messing with curses in python3.4.

from rtv.

0xr0bert avatar 0xr0bert commented on May 28, 2024

That doesn't make a lot of sense, because this works: http://www.reddit.com/r/LearnJapanese/comments/2yonx8/is_there_any_difference_between_%E9%87%8D%E3%81%9F%E3%81%84_and_%E9%87%8D%E3%81%84/ and it has Japanese characters in it. However, I get the same issue on my computer.

from rtv.

michael-lazar avatar michael-lazar commented on May 28, 2024

I did some investigating into this! It looks like the problem is with how python3 curses.addnstr() interprets n. The expected behavior is that curses.addnstr(y, x, text, n) will print at most the first n characters of the text string onto the screen. Here's what's happening.

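import curses
import locale

# Use the user's locale so curses can render non-ASCII characters.
locale.setlocale(locale.LC_ALL, '')

def func(stdscr):
    stdscr.clear()

    # Case 1: ASCII text, one column per character.
    text = 'a' * 5
    stdscr.addnstr(0, 0, text, 6)
    stdscr.addnstr(1, 0, text, 5)
    stdscr.addnstr(2, 0, text, 4)

    # Case 2: wide unicode characters passed as a str.
    text = 'あ' * 5
    stdscr.addnstr(4, 0, text, 6)
    stdscr.addnstr(5, 0, text, 5)
    stdscr.addnstr(6, 0, text, 4)

    # Case 3: the same characters passed as utf-8 encoded bytes.
    text = ('あ' * 5).encode('utf-8')
    stdscr.addnstr(8, 0, text, 6)
    stdscr.addnstr(9, 0, text, 5)
    stdscr.addnstr(10, 0, text, 4)

    stdscr.refresh()
    # Wait for keypresses so the output stays visible.
    stdscr.getch()
    stdscr.getch()

if __name__ == '__main__':
    curses.wrapper(func)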

[Screenshot, 2015-03-12: terminal output of the script above]

  1. Passing unicode characters within ordinal range(128) works as expected: each character takes up one column of the terminal, and up to n characters are printed.
  2. Passing unicode characters outside of ordinal range(128) prints up to n characters, but each 'あ' takes up two columns in the terminal. Unless we know exactly how many columns each character occupies, n cannot be used to limit the output to the available width.
  3. Passing utf-8 encoded bytes is the odd one: n appears to count the number of bytes passed in. The 'あ' character is 3 bytes long in utf-8, so setting n to 6 allows 6/3=2 full characters to be printed. Setting n below six cuts the final character partway through. It is unclear what happens to the partial character bytes. This mode is also of little practical use because it doesn't account for the width of each character on the screen.
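
The three cases above correspond to three different ways of measuring the same string. Here is a minimal Python 3 sketch, assuming that East Asian Wide ('W') and Fullwidth ('F') characters occupy two terminal columns and everything else occupies one. That is an approximation that ignores combining and control characters. `column_width` is a hypothetical helper, not part of rtv or curses.

```python
import unicodedata

def column_width(text):
    """Approximate the number of terminal columns needed to display text.

    Assumption: East Asian Wide/Fullwidth characters take 2 columns,
    all other characters take 1.
    """
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)

for text in ('a' * 5, 'あ' * 5):
    code_points = len(text)                # what case 2 counts
    n_bytes = len(text.encode('utf-8'))    # what case 3 counts
    columns = column_width(text)           # what the screen actually needs
    print(code_points, n_bytes, columns)
# prints "5 5 5" for the ASCII string and "5 15 10" for the Japanese one
```

Only the last number is useful for deciding whether a string fits in a window of a given width.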

from rtv.

michael-lazar avatar michael-lazar commented on May 28, 2024

I don't know if there is a good way to handle this.

I didn't realize that some unicode characters take up more space than others. This will break how I am currently using textwrap to format paragraphs to fit on the page.
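
To show why textwrap is affected: it measures lines with len(), so a line of wide characters can take twice the requested width on screen. Below is a rough Python 3 sketch of column-aware wrapping, under the same assumption that Wide/Fullwidth characters take two columns. `char_width` and `wrap_columns` are hypothetical names, and the wrapping is deliberately naive: it breaks between any two characters and ignores word boundaries.

```python
import textwrap
import unicodedata

def char_width(ch):
    """Assumed column width: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1

def wrap_columns(text, width):
    """Greedily split text into lines that fit within `width` columns."""
    lines, current, used = [], [], 0
    for ch in text:
        w = char_width(ch)
        # Start a new line when the next character would overflow.
        if used + w > width and current:
            lines.append(''.join(current))
            current, used = [], 0
        current.append(ch)
        used += w
    if current:
        lines.append(''.join(current))
    return lines

text = 'あ' * 5
print(textwrap.wrap(text, 4))   # ['ああああ', 'あ'], first line is 8 columns wide
print(wrap_columns(text, 4))    # ['ああ', 'ああ', 'あ'], every line fits in 4 columns
```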

from rtv.

yskmt avatar yskmt commented on May 28, 2024

Yes. I have also observed that a unicode character sometimes breaks at a line wrap.

from rtv.

michael-lazar avatar michael-lazar commented on May 28, 2024

Heads up, I changed the default encoding to ascii until this is resolved (7db8c2f).

from rtv.

michael-lazar avatar michael-lazar commented on May 28, 2024

Interesting reading on the subject http://stackoverflow.com/questions/3634627
and https://pypi.python.org/pypi/wcwidth/0.1.4

from rtv.

michael-lazar avatar michael-lazar commented on May 28, 2024

I've refactored the code to follow the "unicode sandwich" design method described here. This should make the problem easier to address in the future while considering both py2 and py3.
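
For readers unfamiliar with the term, the "unicode sandwich" idea is to decode bytes to text as early as possible, work only with text internally, and encode back to bytes as late as possible. The Python 3 sketch below is illustrative only. The function names and the `ascii_only` switch are assumptions and do not reflect rtv's actual code.

```python
def decode_input(raw, encoding='utf-8'):
    """Bottom slice: convert incoming bytes to text at the input boundary."""
    return raw.decode(encoding)

def encode_output(text, ascii_only=False):
    """Top slice: convert text to bytes only at the output boundary."""
    if ascii_only:
        # Replace anything outside ASCII with '?' instead of raising.
        return text.encode('ascii', errors='replace')
    return text.encode('utf-8')

raw = 'あ is a'.encode('utf-8')   # stands in for bytes read from the network
text = decode_input(raw)          # everything in the middle works on str
print(len(text))                  # 6 characters, regardless of byte length
print(encode_output(text, ascii_only=True))   # b'? is a'
```

Keeping all length and width calculations in the middle layer means they never see partially decoded bytes, which is the situation case 3 of the earlier experiment ran into.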

from rtv.

firecat53 avatar firecat53 commented on May 28, 2024

We used unicodedata.east_asian_width to help calculate column widths for my curses CSV viewer.
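
A sketch of how unicodedata.east_asian_width could be used to cut a string to a given number of columns before handing it to curses. This is not the CSV viewer's actual code. `truncate_to_columns` is a hypothetical helper, and it assumes that only the 'W' and 'F' width classes take two columns.

```python
import unicodedata

def truncate_to_columns(text, max_cols):
    """Return the longest prefix of text that fits in max_cols columns."""
    used = 0
    kept = []
    for ch in text:
        # Assumption: Wide and Fullwidth characters occupy two columns.
        width = 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
        if used + width > max_cols:
            break
        kept.append(ch)
        used += width
    return ''.join(kept)

print(truncate_to_columns('あ' * 5, 6))   # あああ
print(truncate_to_columns('あ' * 5, 5))   # ああ (a third character would need 6 columns)
print(truncate_to_columns('a' * 5, 4))    # aaaa
```

The result can then be passed to addstr() without relying on the n argument of addnstr().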

from rtv.

michael-lazar avatar michael-lazar commented on May 28, 2024

Hey guys! I just merged a large update to the master branch that will hopefully smooth out all of the unicode issues in the codebase. We are now using the kitchen python package, which provides a bunch of unicode-aware text formatting functions. I also set the default program mode to enable unicode, with the option to disable it with the --ascii flag.

@yskmt do you think you could help me test this out, or point me to some non-english subreddits? I would really appreciate it.

from rtv.

yskmt avatar yskmt commented on May 28, 2024

@michael-lazar Sure. Have you implemented it? Is it on master branch?

from rtv.

michael-lazar avatar michael-lazar commented on May 28, 2024

Yes, it is currently on master. Unicode mode should be turned on by default, so all you have to do is check out master and run it.

from rtv.

michael-lazar avatar michael-lazar commented on May 28, 2024

I haven't heard any objections so I'm closing this for now. If anybody discovers a problem, please open a new issue.

from rtv.
