UTF-8 filenames about iceshelf HOT 16 CLOSED

mrworf commented on June 14, 2024

UTF-8 filenames

from iceshelf.

Comments (16)

mrworf commented on June 14, 2024

No worries, this is great, I'm just happy someone is willing to put up with all of this :) I'm using this for email, so I don't get the more interesting combos like unicode in my filenames :D ... I know why these errors happen, though; I just need to find all the places where I'm missing unicode support (python2 deals poorly with it, and some functions don't return unicode unless fed unicode :-P )
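The "functions don't return unicode unless fed unicode" behaviour still exists in Python 3, where `os.listdir` mirrors the type of its argument. A minimal sketch (Python 3, not iceshelf code; the temporary directory and the file name `song.mp3` are made up for illustration):

```python
import os
import tempfile

# Create a throwaway directory containing one file (hypothetical name).
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "song.mp3"), "w").close()

    text_names = os.listdir(tmp)               # str argument   -> list of str
    byte_names = os.listdir(os.fsencode(tmp))  # bytes argument -> list of bytes

    print(text_names)  # ['song.mp3']
    print(byte_names)  # [b'song.mp3']
```

So the type you pass in decides whether you get text or raw bytes back, which is why every call site has to be checked.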

mrworf commented on June 14, 2024

Should work now if you pull the latest version. I also added åäö and other fine unicode characters to the test script to weed this out. Please close if this solves your issue.

Jonny007-MKD commented on June 14, 2024

Now I'm getting almost the same error in a different place. Sorry for taking up so much of your time!

Traceback (most recent call last):
  File "./build/iceshelf/iceshelf", line 412, in <module>
    gotall = collectSources(config['sources'])
  File "./build/iceshelf/iceshelf", line 153, in collectSources
    for root, dirs, files in os.walk(path):
  File "/usr/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 286, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 16: ordinal not in range(128)
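In Python 2, `posixpath.join` concatenates with `path += '/' + b`; when one side is `unicode` and the other is a byte string containing non-ASCII data, Python 2 silently decodes the bytes as ASCII, and that implicit step is what fails above. The Python 3 sketch below makes that step explicit; the filename is a hypothetical Latin-1 example in which byte 0xfc stands for 'ü':

```python
import os

# Hypothetical filename as raw bytes, written in Latin-1 (0xfc is 'ü').
raw_name = b"M\xfcller.mp3"

# What Python 2 did implicitly when mixing unicode and byte strings:
try:
    raw_name.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
    print(ascii_error)
    # 'ascii' codec can't decode byte 0xfc in position 1: ordinal not in range(128)

# Python 3 refuses to mix the two types in a path instead of guessing.
try:
    os.path.join("/raid/Multimedia/Audio", raw_name)
except TypeError as exc:
    mix_error = str(exc)
    print(mix_error)
```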

mrworf commented on June 14, 2024

This is frustrating, I can't reproduce the issue you're seeing. Would you mind emailing me ([email protected]) the output of "find /raid/Multimedia/Audio/" so I can recreate the structure here? Also, just in case, are you running the latest version? The last commit headline is "Improved test script", just to make sure nothing got messed up :)

Jonny007-MKD commented on June 14, 2024

I pulled again, just in case; still the same issue. Mail is on the way. Sorry that I didn't include a message body ;)

mrworf commented on June 14, 2024

Hehe :) No worries, just wrote a script to recreate the structure locally here, hopefully this will allow me to see what the F is going on.
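The actual recreation script isn't shown in this thread. A minimal sketch of the idea, assuming plain `find` output as input, could look like this; `recreate_from_find` is a hypothetical name. The key point is to keep the listing as bytes, so that non-UTF-8 names are reproduced byte for byte:

```python
import os


def recreate_from_find(listing, dest):
    """Rebuild the paths from raw `find` output as empty dirs/files under dest.

    `listing` is bytes on purpose: decoding it would lose the original,
    possibly non-UTF-8, filename bytes that trigger the bug.
    """
    dest = os.fsencode(dest)
    paths = [os.path.normpath(line) for line in listing.splitlines() if line]
    # Any path that is the parent of another listed path must be a directory.
    parents = {os.path.dirname(p) for p in paths}
    for p in paths:
        if p in (b".", b"/"):
            continue  # the search root itself
        target = os.path.join(dest, p.lstrip(b"/"))
        if p in parents:
            os.makedirs(target, exist_ok=True)
        else:
            # Leaves become empty files; empty directories look the same in
            # plain `find` output, so they are recreated as files too.
            os.makedirs(os.path.dirname(target), exist_ok=True)
            open(target, "wb").close()
```

For example, the bytes returned by `subprocess.check_output(["find", "/raid/Multimedia/Audio/"])` could be passed in directly. Names containing newlines would break this line-based parsing; `find -print0` and splitting on `b"\0"` would avoid that.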

mrworf commented on June 14, 2024

AHHHH! Now it crashes here too :D Nice! Let's see what we can see

mrworf commented on June 14, 2024

ha@development:~/projects/iceshelf/tmp$ ../iceshelf config
First run, no previous checksums
Setting up the prep directory
Checking sources for changes
Processing "test" (raid/)
Creating archive
Creating tar archive
Removing temporary copies of files
Traceback (most recent call last):
  File "../iceshelf", line 469, in <module>
    files = gatherData()
  File "../iceshelf", line 213, in gatherData
    fileutils.deleteTree(config["archivedir"], True)
  File "/home/ha/projects/iceshelf/fileutils.py", line 9, in deleteTree
    for root, dirs, files in os.walk(tree, topdown=False):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 284, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 61: ordinal not in range(128)

mrworf commented on June 14, 2024

Urrgrgh... this is hopeless, python 2.7 does not like unicode, period... All file functions only work if you use bytes for paths and filenames, while json barfs on bytes, sigh... Converting them to unicode breaks file naming; ignoring unicode fixes saving data but breaks loading.
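The same tension shows up in Python 3, where it is at least explicit. A minimal sketch, assuming a hypothetical Latin-1 filename read in bytes mode: JSON has no representation for bytes, and a plain UTF-8 decode of the name fails:

```python
import json

# Hypothetical filename read in bytes mode: Latin-1, so 0xe9 means 'é'.
raw_name = b"Pr\xe9lude.mp3"

# JSON is a text format, so Python 3's json module rejects bytes outright.
try:
    json.dumps({"files": [raw_name]})
except TypeError as exc:
    json_error = str(exc)
    print(json_error)

# Decoding first does not help either: the name is not valid UTF-8.
try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = str(exc)
    print(utf8_error)
```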

Jonny007-MKD commented on June 14, 2024

That's bad :( Might the pathlib2 backport help?

Is the unicode support in Python 3 that much better? How much work would it be to upgrade to Python 3? Can I help somehow? :)

mrworf commented on June 14, 2024

Right now all the backup stuff works, I just can't serialize it to/from JSON. So I'm looking into encoding filenames with something that JSON likes. Just ran out of time yesterday :) Hoping to have it ready later this week, work is for some reason taking up my time now ;)
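One way to encode raw filenames as something JSON accepts, in Python 3, is the `surrogateescape` error handler (PEP 383): undecodable bytes become lone surrogate code points and can be turned back into the exact original bytes. This is a sketch with hypothetical helper names, not what iceshelf does; base64-encoding the raw bytes would be a version-neutral alternative:

```python
import json


def name_to_json(raw):
    """Turn a raw bytes filename into a str that JSON can store losslessly.

    Bytes that are not valid UTF-8 become lone surrogates (U+DC80..U+DCFF).
    """
    return raw.decode("utf-8", "surrogateescape")


def name_from_json(text):
    """Reverse name_to_json(), restoring the exact original bytes."""
    return text.encode("utf-8", "surrogateescape")


raw = b"Pr\xe9lude.mp3"  # hypothetical Latin-1 name, not valid UTF-8
# ensure_ascii=True (the default) writes the surrogate as a \udce9 escape,
# so the JSON text itself stays plain ASCII and can be saved anywhere.
blob = json.dumps({"files": [name_to_json(raw)]})
print(blob)  # {"files": ["Pr\udce9lude.mp3"]}
restored = name_from_json(json.loads(blob)["files"][0])
print(restored == raw)  # True
```

Valid UTF-8 names pass through unchanged, so only the odd files end up with escapes in the stored JSON.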

mrworf commented on June 14, 2024

This is very interesting. Linux uses bytes to represent filenames, which means that a single filesystem can contain both UTF-8 encoded filenames and Latin1 ones. The two encodings are not compatible with each other, and this is why your backup is failing: you have a file whose name was created using Latin1 or some other encoding.
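To make the incompatibility concrete, here is a small Python 3 sketch: the same character is stored as different bytes by the two encodings, a lone Latin-1 byte is rejected by a UTF-8 decoder, and decoding UTF-8 bytes as Latin-1 "succeeds" but produces mojibake. The helper `is_valid_utf8` is hypothetical:

```python
# The same character, é, stored by two different encodings.
utf8_bytes = "é".encode("utf-8")      # b'\xc3\xa9'
latin1_bytes = "é".encode("latin-1")  # b'\xe9'
print(utf8_bytes, latin1_bytes)


def is_valid_utf8(raw):
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False
# Latin-1 accepts any byte, so UTF-8 bytes silently turn into mojibake.
print(utf8_bytes.decode("latin-1"))  # Ã©
```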

I know, it sounds odd, but it's true :) ... Check out the files under "/raid/Multimedia/Audio/Musik/Yo-Yo Ma/Play Classical/" ... Some of the files there (and in other places) will show question marks instead of the characters you'd expect.

I've pushed a new version which is resilient to this situation and skips it with a warning. Renaming these files will correct the problem.
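The actual iceshelf change isn't reproduced here. As a rough Python 3 sketch of the "skip with a warning" approach, a tree can be walked in bytes mode so that `os.walk` never has to decode anything, and only the final decode is guarded; `collect_files` is a hypothetical name:

```python
import logging
import os


def collect_files(top):
    """Walk `top` in bytes mode; return (usable, skipped) file paths.

    Usable paths are decoded as UTF-8; anything that is not valid UTF-8
    (e.g. a Latin-1 name) is logged and skipped instead of crashing.
    """
    usable, skipped = [], []
    for root, dirs, files in os.walk(os.fsencode(top)):
        for name in files:
            path = os.path.join(root, name)
            try:
                usable.append(path.decode("utf-8"))
            except UnicodeDecodeError:
                logging.warning("Skipping non-UTF-8 filename: %r", path)
                skipped.append(path)
    return usable, skipped
```

Because the whole path is decoded, files below a directory with a non-UTF-8 name are skipped as well, each with its own warning.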

Jonny007-MKD commented on June 14, 2024

That's really interesting! Those are files that I downloaded from Google Music and then uploaded via FTP (vsftpd). Here's a link to a similar problem.

[...] server's charset is UTF-8 [...] windows' [...] encoding is GB2312

The command mentioned there also detects which filenames are already UTF-8 and doesn't destroy them :)

convmv -f gb2312 -t utf8 -r --notest *

Thank you!

EDIT: great, after I ran the command, my filename changed from this
07 - Unaccompanied Cello Suite No. 1 in G Major, BWV 1007 Pr?lude.mp3
to this
07 - Unaccompanied Cello Suite No. 1 in G Major, BWV 1007 Pr�lude.mp3
Perhaps GB2312 wasn't the right encoding for those, sigh.
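The convmv behaviour described above (convert from a given source encoding, but leave names that are already UTF-8 alone) can be sketched in Python 3 as follows. This mimics the idea, not convmv's code, and `to_utf8_name` is a hypothetical helper. It also shows why the source encoding matters: picking the wrong one either fails or produces the wrong characters.

```python
def to_utf8_name(raw, source_encoding):
    """Re-encode a raw filename to UTF-8, leaving valid UTF-8 names untouched.

    `source_encoding` is a guess (e.g. "latin-1"); if it is wrong, the
    result contains the wrong characters or decoding raises an error.
    """
    try:
        raw.decode("utf-8")
        return raw  # already UTF-8: converting again would corrupt it
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")


print(to_utf8_name(b"Pr\xe9lude.mp3", "latin-1"))      # b'Pr\xc3\xa9lude.mp3'
print(to_utf8_name(b"Pr\xc3\xa9lude.mp3", "latin-1"))  # b'Pr\xc3\xa9lude.mp3'
```

The "already UTF-8" check is a heuristic: some legacy byte sequences happen to be valid UTF-8 and would be left alone.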

mrworf commented on June 14, 2024

I'd guess it's Latin1 (ISO-8859-1), since it's German classical music. And the character you're missing is é :)
Please close this issue if you're happy with the resolution on my end.
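The bytes reported in the two tracebacks earlier in the thread fit that guess. A quick Python 3 check, decoding them as Latin-1, gives ü and é. This is consistent with Latin-1 but doesn't prove it: Windows-1252, for example, maps these two bytes the same way.

```python
# Bytes from the two UnicodeDecodeError messages, decoded as Latin-1.
decoded = [byte.decode("latin-1") for byte in (b"\xfc", b"\xe9")]
print(decoded)  # ['ü', 'é']
```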

Jonny007-MKD commented on June 14, 2024

Well, as I've now looked up, NTFS uses UTF-16 (wchar) for filenames, but that would have looked different on Unix (I suppose). So FileZilla must have converted the names to some other encoding, probably Latin1. I changed the FileZilla settings a bit after installing it.
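The assumption that some conversion must have happened can be checked with a short Python 3 sketch: UTF-16 encodes every ASCII character with a zero byte, and a POSIX filename cannot contain NUL, so raw UTF-16 names can't be written to Linux as-is. The client or server has to transcode them into some byte encoding. The filename below is only an example:

```python
name = "Prélude.mp3"
utf16 = name.encode("utf-16-le")  # NTFS stores names as 16-bit code units
print(utf16[:6])          # b'P\x00r\x00\xe9\x00'
print(b"\x00" in utf16)   # True: not representable as a POSIX filename
```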

Anyway, your solution is excellent. I haven't seen any such warning since my conversion, and now it's uploading :) I miss a progress indicator, but eventually iceshelf will run in the background, where an indicator isn't needed. Thanks!

mrworf commented on June 14, 2024

Yeah, NTFS is all unicode; only Linux lets you store arbitrary bytes and call it whatever you want :)
