UTF-8 filenames about iceshelf HOT 16 CLOSED

mrworf avatar mrworf commented on June 14, 2024
UTF-8 filenames

from iceshelf.

Comments (16)

mrworf avatar mrworf commented on June 14, 2024 1

No worries, this is great, I'm just happy someone is willing to put up with all of this :) I'm using this for email, so I don't get the more interesting combos like unicode in my filenames :D ... I know why these errors happen though, I just need to find all the places where I'm missing unicode support (python2 deals poorly with it, and some functions don't return unicode unless fed unicode :-P )
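
For readers following along, here is a minimal sketch of the "type in, type out" behaviour mentioned above. It assumes Python 3 (the thread is about Python 2.7, where the split was `str` versus `unicode` rather than `bytes` versus `str`), and the helper name `listing_types` and the scratch file are hypothetical.

```python
import os
import tempfile

def listing_types(directory):
    """Return the name types os.listdir yields for a str path and a bytes path."""
    text_names = os.listdir(directory)               # str in, str out
    byte_names = os.listdir(os.fsencode(directory))  # bytes in, bytes out
    return {type(n) for n in text_names}, {type(n) for n in byte_names}

# Hypothetical scratch directory holding one plain-ASCII file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    print(listing_types(tmp))
```

The same rule applies to `os.walk`, which is why one call site that passes the "wrong" type is enough to change what every nested name looks like.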

mrworf avatar mrworf commented on June 14, 2024

Should work now if you pull the latest version. I also added åäö and other fine unicode characters to the test script to weed this out. Please close if this solves your issue.

Jonny007-MKD avatar Jonny007-MKD commented on June 14, 2024

Now I've got pretty much the same error in a different place. Sorry for consuming so much of your time!

Traceback (most recent call last):
  File "./build/iceshelf/iceshelf", line 412, in <module>
    gotall = collectSources(config['sources'])
  File "./build/iceshelf/iceshelf", line 153, in collectSources
    for root, dirs, files in os.walk(path):
  File "/usr/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 286, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 16: ordinal not in range(128)
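
The traceback above comes down to Python 2 implicitly decoding a byte-string name as ASCII when it is joined with a unicode path. Python 3 no longer does that implicitly, so the sketch below performs the same decode explicitly to show the failure. The function name and the sample name `b"M\xfcller"` are hypothetical; only the byte `0xfc` is taken from the error message.

```python
def implicit_ascii_decode(raw_name):
    """Mimic the ASCII decode Python 2 applied when unicode met a byte string."""
    return raw_name.decode("ascii")

try:
    # 0xfc is not valid ASCII, so this raises just like posixpath.join did.
    implicit_ascii_decode(b"M\xfcller")
except UnicodeDecodeError as exc:
    print("failed at byte offset", exc.start, "->", exc.reason)
```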

mrworf avatar mrworf commented on June 14, 2024

This is frustrating, I can't reproduce the issue you're seeing. Would you mind emailing me ([email protected]) the output of "find /raid/Multimedia/Audio/" so I can recreate the structure here? Also, just in case, are you running the latest? The last commit headline is "Improved test script", just so nothing got messed up :)

Jonny007-MKD avatar Jonny007-MKD commented on June 14, 2024

I pulled again, just in case, still the same issue. Mail is on the way. Sorry that I didn't include a body ;)

mrworf avatar mrworf commented on June 14, 2024

Hehe :) No worries, just wrote a script to recreate the structure locally here, hopefully this will allow me to see what the F is going on.

mrworf avatar mrworf commented on June 14, 2024

AHHHH! Now it crashes here too :D Nice! Let's see what we can see

mrworf avatar mrworf commented on June 14, 2024
ha@development:~/projects/iceshelf/tmp$ ../iceshelf config
First run, no previous checksums
Setting up the prep directory
Checking sources for changes
Processing "test" (raid/)
Creating archive
Creating tar archive
Removing temporary copies of files
Traceback (most recent call last):
  File "../iceshelf", line 469, in <module>
    files = gatherData()
  File "../iceshelf", line 213, in gatherData
    fileutils.deleteTree(config["archivedir"], True)
  File "/home/ha/projects/iceshelf/fileutils.py", line 9, in deleteTree
    for root, dirs, files in os.walk(tree, topdown=False):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 284, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 61: ordinal not in range(128)

mrworf avatar mrworf commented on June 14, 2024

Urrgrgh... this is hopeless, python 2.7 does not like unicode, period... All the file functions only work if you use bytes for paths and files, while json barfs on bytes (sigh) ... Converting the names to unicode breaks filenaming; ignoring the undecodable characters fixes saving the data, but breaks loading.
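
A rough sketch of that trade-off, assuming Python 3 (the exact exception under Python 2.7 may differ) and a hypothetical filename whose `0xe9` byte matches the one in the traceback above:

```python
import json

raw = b"Pr\xe9lude.mp3"  # hypothetical name containing a non-UTF-8 byte

# 1. Raw bytes cannot be serialized to JSON at all.
try:
    json.dumps(raw)
    bytes_ok = True
except TypeError:
    bytes_ok = False

# 2. Dropping the undecodable byte makes saving work...
lossy = raw.decode("utf-8", "ignore")
saved = json.dumps(lossy)

# 3. ...but the name read back no longer matches the file on disk.
restored = json.loads(saved).encode("utf-8")
print(bytes_ok, lossy, restored == raw)
```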

Jonny007-MKD avatar Jonny007-MKD commented on June 14, 2024

That's bad :( Might the pathlib2 backport help?

Is the unicode support in python 3 that much better? How much work would it be to upgrade to python 3? Can I help somehow? :)

mrworf avatar mrworf commented on June 14, 2024

Right now all the backup stuff works, I just can't serialize it to/from JSON. So I'm looking into encoding filenames with something that JSON likes. Just ran out of time yesterday :) Hoping to have it ready later this week, work is for some reason taking up my time now ;)
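
One way to encode filenames "with something that JSON likes" is base64. This is only a sketch of the general idea, not necessarily what iceshelf ended up doing; the function names and the sample filenames are hypothetical, and Python 3 is assumed.

```python
import base64
import json

def names_to_json(raw_names):
    """Serialize byte filenames as a JSON list of base64 (pure ASCII) strings."""
    return json.dumps([base64.b64encode(n).decode("ascii") for n in raw_names])

def names_from_json(text):
    """Recover the exact original byte filenames from names_to_json output."""
    return [base64.b64decode(s) for s in json.loads(text)]

names = [b"Pr\xc3\xa9lude.mp3", b"Pr\xe9lude.mp3"]  # UTF-8 and Latin1 variants
assert names_from_json(names_to_json(names)) == names
```

The cost is that the stored names are no longer human-readable, but every byte sequence round-trips regardless of its encoding.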

mrworf avatar mrworf commented on June 14, 2024

This is very interesting. Linux uses bytes to represent filenames, which means a single filesystem can hold both UTF-8 encoded filenames and Latin1 ones. The two encodings are not compatible with each other, and that is why your backup is failing: you have a file that was created using Latin1 or some other encoding.

I know, it sounds odd, but it's true :) ... Check out the files under "/raid/Multimedia/Audio/Musik/Yo-Yo Ma/Play Classical/" ... Some of the files there (and in other places) will have question marks instead of the character you'd expect.

I've pushed a new version which is resilient to this situation and skips it with a warning. Renaming these files will correct the problem.
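
A minimal sketch of the "skip with a warning" idea, assuming Python 3 and assuming the names arrive as raw bytes (for example from `os.walk` called with a bytes path). The function name `split_by_utf8` and the sample names are hypothetical, and this is not iceshelf's actual code.

```python
import sys

def split_by_utf8(raw_names):
    """Separate names that decode as UTF-8 from names that do not."""
    valid, skipped = [], []
    for raw in raw_names:
        try:
            valid.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            skipped.append(raw)
    return valid, skipped

valid, skipped = split_by_utf8([b"Pr\xc3\xa9lude.mp3", b"Pr\xe9lude.mp3"])
for raw in skipped:
    # repr() keeps the offending bytes visible instead of printing '?'.
    print("WARNING: skipping non-UTF-8 filename %r" % raw, file=sys.stderr)
```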

Jonny007-MKD avatar Jonny007-MKD commented on June 14, 2024

That's really interesting! Those are files that I downloaded from Google Music and then uploaded via FTP (vsftpd). Here's a link to a similar problem.

[...] server's charset is UTF-8 [...] windows' [...] encoding is GB2312

The mentioned command also detects which filenames are already UTF-8 and doesn't destroy them :)

convmv -f gb2312 -t utf8 -r --notest * -r

Thank you!

EDIT: great, after I ran the command, my filename changed from this
07 - Unaccompanied Cello Suite No. 1 in G Major, BWV 1007 Pr?lude.mp3
to this
07 - Unaccompanied Cello Suite No. 1 in G Major, BWV 1007 Pr�lude.mp3
Perhaps GB2312 wasn't the right encoding for those (sigh)

mrworf avatar mrworf commented on June 14, 2024

I'd guess it's Latin1 (i.e. ISO-8859-1) since it's German classical music. And the character you're missing is é :)
Please close this issue if you're happy with the resolution on my end.
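
The Latin1 guess can be checked by hand: the byte `0xe9` from the second traceback is é in Latin1, and `0xfc` from the first one is ü. A small Python 3 sketch, with the hypothetical helper `latin1_to_utf8` and a shortened sample name:

```python
def latin1_to_utf8(raw_name):
    """Re-encode a name assumed to be Latin1 (ISO-8859-1) as UTF-8 bytes."""
    return raw_name.decode("latin-1").encode("utf-8")

raw = b"Pr\xe9lude.mp3"           # shortened, hypothetical sample
print(raw.decode("latin-1"))      # readable name under the Latin1 assumption
print(latin1_to_utf8(raw))        # bytes a UTF-8 filesystem listing expects
print(b"\xfc".decode("latin-1"))  # the byte from the first traceback
```

Decoding as Latin1 never fails, because every byte maps to some character, so this confirms a guess rather than detecting the encoding.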

Jonny007-MKD avatar Jonny007-MKD commented on June 14, 2024

Well, as I just looked up, NTFS uses UTF-16 (wchar) for filenames, but that would have looked different on Unix (I suppose). So FileZilla must have done some conversion to something else, probably Latin1. I changed the FileZilla settings somewhat after the installation.

Anyway, your solution is excellent. I haven't seen any such warning since my conversion, and now it's uploading :) I'm missing a progress indicator, but in the end iceshelf will run in the background and the indicator won't be needed then. Thanks!

mrworf avatar mrworf commented on June 14, 2024

Yeah, NTFS is all unicode, only Linux allows you to put bytes and call it whatever you want :)
