Comments (16)
No worries, this is great, I'm just happy someone is willing to put up with all of this :) I'm using this for email, so I don't run into the more interesting combos like Unicode in my filenames :D ... I know why these errors happen though; I just need to find all the places where I'm missing Unicode support (Python 2 deals poorly with it, and some functions don't return Unicode unless fed Unicode :-P )
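The "fed with Unicode" behaviour still exists in Python 3, where it is easy to see: `os.listdir()` returns names of the same type as the path it is given. A minimal sketch, assuming Linux (which accepts arbitrary bytes in file names); the file name `Pr\xe9lude.mp3` is made up for illustration:

```python
import os
import tempfile

def list_both_ways():
    """List one directory twice: once with a str path, once with a bytes path."""
    with tempfile.TemporaryDirectory() as tmp:
        # Create a file whose name is Latin-1 bytes, i.e. not valid UTF-8
        open(os.path.join(os.fsencode(tmp), b"Pr\xe9lude.mp3"), "wb").close()
        as_text = os.listdir(tmp)                # str path in -> str names out
        as_bytes = os.listdir(os.fsencode(tmp))  # bytes path in -> bytes names out
    return as_text, as_bytes

as_text, as_bytes = list_both_ways()
print(as_bytes)                    # [b'Pr\xe9lude.mp3']
print([type(n) for n in as_text])  # [<class 'str'>]
```

Python 2 is looser: given a `unicode` path, `os.listdir()` quietly falls back to byte strings for names it cannot decode, which is how bytes and Unicode end up mixed inside a single walk.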
from iceshelf.
Should work now if you pull the latest version. I also added åäö and other fine Unicode characters to the test script to weed this out. Please close if this solves your issue.
Now I'm getting much the same error in a different place. Sorry for taking up so much of your time!
Traceback (most recent call last):
  File "./build/iceshelf/iceshelf", line 412, in <module>
    gotall = collectSources(config['sources'])
  File "./build/iceshelf/iceshelf", line 153, in collectSources
    for root, dirs, files in os.walk(path):
  File "/usr/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 286, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 16: ordinal not in range(128)
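The last frame is the clue: Python 2's `posixpath.join` concatenates with `+`, and combining a byte string with a `unicode` string makes Python 2 silently decode the bytes as ASCII, so any byte above 0x7F (here 0xfc) aborts the walk. A minimal Python 3 sketch that performs that implicit decode explicitly, next to what Python 3 does when the two types are mixed (both helper names and the sample file name are made up):

```python
import os.path

# A Latin-1 encoded name as it might sit on disk (illustrative)
RAW_NAME = b"Pr\xe9lude.mp3"

def python2_style_join(top: str, name: bytes) -> str:
    """Mimic Python 2: mixing unicode with a byte str triggers an ASCII decode."""
    return top + "/" + name.decode("ascii")

def python3_join(top: str, name: bytes):
    """Python 3 does not guess an encoding; mixing the types raises TypeError."""
    return os.path.join(top, name)

try:
    python2_style_join("/raid/Multimedia", RAW_NAME)
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xe9 in position 2: ordinal not in range(128)

try:
    python3_join("/raid/Multimedia", RAW_NAME)
except TypeError as exc:
    print("TypeError:", exc)
```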
This is frustrating, I can't reproduce the issue you're seeing. Would you mind emailing me ([email protected]) the output of "find /raid/Multimedia/Audio/" so I can recreate the structure here? Also, just in case: are you running the latest version? The last commit headline is "Improved test script", just to make sure nothing got messed up :)
I pulled again, just in case; still the same issue. Mail is on the way. Sorry for not including a body ;)
Hehe :) No worries, I just wrote a script to recreate the structure locally; hopefully this will let me see what the F is going on.
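A script like that stays byte-exact as long as it never decodes the names. A minimal sketch (the function `recreate_tree` is hypothetical, not the script used here), assuming the `find` output arrives as raw bytes with one path per line:

```python
import os

def recreate_tree(find_output: bytes, dest: bytes) -> None:
    """Rebuild the layout listed by `find ROOT` below `dest`, using empty files.

    Everything stays bytes, so names in any encoding are reproduced exactly.
    A listed path counts as a directory if some other listed path lives inside
    it; everything else becomes an empty file (so empty directories come out
    as files, a limitation of plain `find` output).
    """
    # normpath drops trailing slashes, e.g. the root as printed by `find dir/`
    paths = [os.path.normpath(p) for p in find_output.split(b"\n") if p]
    parents = {os.path.dirname(p) for p in paths}
    for p in paths:
        target = os.path.join(dest, p.lstrip(b"/"))
        if p in parents:
            os.makedirs(target, exist_ok=True)
        else:
            os.makedirs(os.path.dirname(target), exist_ok=True)
            open(target, "wb").close()
```

Usage would be along the lines of `recreate_tree(open("find.txt", "rb").read(), b"./tmp")`, where `find.txt` is a hypothetical file holding the mailed listing.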
AHHHH! Now it crashes here too :D Nice! Let's see what we can see
ha@development:~/projects/iceshelf/tmp$ ../iceshelf config
First run, no previous checksums
Setting up the prep directory
Checking sources for changes
Processing "test" (raid/)
Creating archive
Creating tar archive
Removing temporary copies of files
Traceback (most recent call last):
  File "../iceshelf", line 469, in <module>
    files = gatherData()
  File "../iceshelf", line 213, in gatherData
    fileutils.deleteTree(config["archivedir"], True)
  File "/home/ha/projects/iceshelf/fileutils.py", line 9, in deleteTree
    for root, dirs, files in os.walk(tree, topdown=False):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 284, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 61: ordinal not in range(128)
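One way out of this class of crash is to keep the whole walk in a single type. A minimal Python 3 sketch with a hypothetical `delete_tree`; its `include_root` flag is an assumption and is not claimed to be what the `True` passed to `deleteTree` above means. (In Python 3 a `str` walk would also survive, because undecodable bytes become surrogates; the bytes-only approach mirrors what Python 2 code has to do.)

```python
import os

def delete_tree(tree, include_root=False):
    """Delete the contents of `tree`, and `tree` itself if include_root is True.

    Hypothetical stand-in, not iceshelf's fileutils.deleteTree. Passing a bytes
    path to os.walk() makes every yielded name bytes as well, so names that
    are not valid UTF-8 are never decoded.
    """
    tree = os.fsencode(tree)
    # Bottom-up, so each directory is already empty when we remove it
    for root, dirs, files in os.walk(tree, topdown=False):
        for name in files:
            os.remove(os.path.join(root, name))
        for name in dirs:
            path = os.path.join(root, name)
            if os.path.islink(path):
                os.remove(path)  # symlink to a directory: remove the link only
            else:
                os.rmdir(path)
    if include_root:
        os.rmdir(tree)
```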
Urrgrgh... this is hopeless. Python 2.7 does not like Unicode, period... All the file functions only work if you use bytes for paths and filenames, while json barfs on them, sigh... Converting to Unicode breaks the filenames; ignoring Unicode fixes saving the data but breaks loading it.
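Python 3's `json` module runs into the same dead end: it only serializes text, and a raw name that is not valid UTF-8 cannot be strictly decoded to text first. A minimal sketch (the sample name and the helper `try_dump` are made up):

```python
import json

raw = b"Pr\xe9lude.mp3"  # illustrative on-disk name, not valid UTF-8

def try_dump(value):
    """Return the JSON text, or the name of the exception json.dumps raised."""
    try:
        return json.dumps({"file": value})
    except (TypeError, ValueError) as exc:
        return type(exc).__name__

print(try_dump(raw))  # prints TypeError: json serializes text, not bytes

try:
    raw.decode("utf-8")  # the obvious "make it text first" step
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError at byte", exc.start)  # UnicodeDecodeError at byte 2
```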
That's bad :( Might the pathlib2 backport help?
Is Unicode support in Python 3 that much better? How much work would it be to port to Python 3? Can I help somehow? :)
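For what it's worth, Python 3 (since 3.1, PEP 383) has a standard answer for exactly this case: undecodable bytes in file names are mapped to lone surrogate code points and restored on the way back out. A minimal sketch, assuming Linux with a UTF-8 filesystem encoding:

```python
import os

raw = b"Pr\xe9lude.mp3"  # Latin-1 bytes on a UTF-8 system (illustrative)

# PEP 383: each undecodable byte becomes a lone surrogate in U+DC80..U+DCFF
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))  # 'Pr\udce9lude.mp3'

# Encoding with the same error handler restores the original bytes exactly
assert text.encode("utf-8", "surrogateescape") == raw

# os.fsdecode/os.fsencode apply the same scheme using the filesystem encoding
assert os.fsencode(os.fsdecode(raw)) == raw
```

The catch is that such strings still cannot be encoded as strict UTF-8, so printing them or handing them to other tools needs care.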
Right now all the backup stuff works; I just can't serialize it to/from JSON. So I'm looking into encoding filenames in a form that JSON accepts. I ran out of time yesterday :) Hoping to have it ready later this week; for some reason work is taking up my time right now ;)
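One way to do that is to keep readable names readable and fall back to base64 only for names that are not valid UTF-8. A minimal sketch; the `b64:` prefix and both helper names are hypothetical, not the format iceshelf actually uses:

```python
import base64
import json

PREFIX = "b64:"  # hypothetical marker for base64-encoded names

def encode_name(raw: bytes) -> str:
    """Return a JSON-safe string for a raw file name."""
    try:
        text = raw.decode("utf-8")
        if not text.startswith(PREFIX):  # avoid ambiguity with encoded names
            return text
    except UnicodeDecodeError:
        pass
    return PREFIX + base64.b64encode(raw).decode("ascii")

def decode_name(stored: str) -> bytes:
    """Invert encode_name, giving back the exact original bytes."""
    if stored.startswith(PREFIX):
        return base64.b64decode(stored[len(PREFIX):])
    return stored.encode("utf-8")

names = [b"Pr\xc3\xa9lude.mp3", b"Pr\xe9lude.mp3"]  # UTF-8 and Latin-1 spellings
doc = json.dumps([encode_name(n) for n in names])
restored = [decode_name(s) for s in json.loads(doc)]
print(restored == names)  # True
```

The UTF-8 name stays human-readable in the JSON file; only the Latin-1 one turns into base64.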
This is very interesting. Linux represents filenames as bytes, which means a single filesystem can contain both UTF-8 encoded filenames and Latin-1 encoded ones. The two encodings are not compatible with each other, and that is why your backup is failing: you have files whose names were created using Latin-1 or some other non-UTF-8 encoding.
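The same title really does produce two different, equally legal Linux file names depending on the encoding used to write it. A minimal sketch, using the title from the files discussed in this thread:

```python
title = "Prélude"

utf8_name = title.encode("utf-8")      # b'Pr\xc3\xa9lude'
latin1_name = title.encode("latin-1")  # b'Pr\xe9lude'
print(utf8_name, latin1_name)

# Two distinct byte strings, so both files could coexist in one directory
assert utf8_name != latin1_name

# Read with the wrong codec, each one breaks differently:
print(utf8_name.decode("latin-1"))  # PrÃ©lude (mojibake, but no error)
try:
    latin1_name.decode("utf-8")     # 0xe9 is not followed by a continuation byte
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)  # not UTF-8: invalid continuation byte
```

Note the asymmetry: Latin-1 maps every byte to some character, so a successful Latin-1 decode proves nothing, while a failed strict UTF-8 decode is a reliable signal.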
I know, it sounds odd, but it's true :) ... Check out the files under "/raid/Multimedia/Audio/Musik/Yo-Yo Ma/Play Classical/": some of the files there (and elsewhere) will show question marks instead of the character you'd expect.
I've pushed a new version that is resilient to this situation and skips such files with a warning. Renaming these files will correct the problem.
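The skip-with-a-warning approach can be sketched as a walk that stays in bytes and only lets through paths that decode cleanly as UTF-8. A minimal sketch; `walk_utf8` is a hypothetical name and this is not the code that was pushed:

```python
import logging
import os

log = logging.getLogger(__name__)

def walk_utf8(top):
    """Yield the paths (as str) of files below `top` whose full path is valid UTF-8.

    Anything that fails a strict UTF-8 decode is logged and skipped; undecodable
    directories are pruned so the walk never descends into them.
    """
    for root, dirs, files in os.walk(os.fsencode(top)):
        kept = []
        for name in dirs:
            try:
                name.decode("utf-8")
                kept.append(name)
            except UnicodeDecodeError:
                log.warning("Skipping directory %r: name is not valid UTF-8",
                            os.path.join(root, name))
        dirs[:] = kept  # in-place edit tells os.walk which subdirectories to visit
        for name in files:
            path = os.path.join(root, name)
            try:
                yield path.decode("utf-8")
            except UnicodeDecodeError:
                log.warning("Skipping file %r: name is not valid UTF-8", path)
```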
That's really interesting! Those are files that I downloaded from Google Music and then uploaded via FTP (vsftpd). Here's a link to a similar problem.
[...] server's charset is UTF-8 [...] windows' [...] encoding is GB2312
The mentioned command also detects which filenames are already UTF-8 and doesn't destroy them :)
convmv -f gb2312 -t utf8 -r --notest *
Thank you!
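What the convmv command above does can be sketched in a few lines of Python, which also makes the "leave existing UTF-8 names alone" check explicit. A minimal sketch of the idea, not convmv's implementation; `convert_names` and its `dry_run` flag are hypothetical, with `dry_run=True` playing the role of running without `--notest`:

```python
import os

def convert_names(top, source_encoding, dry_run=True):
    """Re-encode non-UTF-8 names below `top` from source_encoding to UTF-8.

    Returns the list of (old_path, new_path) byte pairs; renames only when
    dry_run is False.
    """
    planned = []
    # Bottom-up, so renaming a directory never changes a path still to be visited
    for root, dirs, files in os.walk(os.fsencode(top), topdown=False):
        for name in dirs + files:
            try:
                name.decode("utf-8")
                continue  # already valid UTF-8: leave it alone
            except UnicodeDecodeError:
                pass
            try:
                new_name = name.decode(source_encoding).encode("utf-8")
            except UnicodeDecodeError:
                continue  # not valid in the source encoding either: skip
            old_path = os.path.join(root, name)
            new_path = os.path.join(root, new_name)
            if os.path.lexists(new_path):
                continue  # never overwrite an existing file
            planned.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return planned
```

Choosing the wrong `source_encoding` usually still "succeeds" and produces wrong names rather than errors, which is why reviewing a dry run first is worthwhile.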
EDIT: great, after I ran the command, my filename changed from this
07 - Unaccompanied Cello Suite No. 1 in G Major, BWV 1007 Pr?lude.mp3
to this
07 - Unaccompanied Cello Suite No. 1 in G Major, BWV 1007 Pr�lude.mp3
Perhaps GB2312 wasn't correct for those, sigh.
I'd guess it's Latin-1 (ISO-8859-1), since it's German classical music. And the character you're missing is é :)
Please close this issue if you're happy with the resolution on my end.
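The bytes named in the two tracebacks support the Latin-1 guess. A minimal sketch that decodes each byte with a few candidate encodings (the candidate list is an assumption); note that Latin-1 and Windows-1252 agree on both bytes, so these alone cannot tell those two apart:

```python
def guess_char(byte, encodings=("latin-1", "cp1252", "utf-8")):
    """Map one raw byte to the character each candidate encoding would give."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = bytes([byte]).decode(enc)
        except UnicodeDecodeError:
            result[enc] = None  # not a complete character in this encoding
    return result

for b in (0xFC, 0xE9):  # the bytes from the two tracebacks
    print(hex(b), guess_char(b))
# 0xfc {'latin-1': 'ü', 'cp1252': 'ü', 'utf-8': None}
# 0xe9 {'latin-1': 'é', 'cp1252': 'é', 'utf-8': None}
```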
Well, I've looked it up now: NTFS stores filenames as UTF-16 (wchar), but that would have looked different on Unix (I suppose). So FileZilla must have converted them to something else, probably Latin-1. I did change the FileZilla settings a bit after installing it.
Anyway, your solution is excellent. I haven't seen any such warning since my conversion, and now it's uploading :) I'm missing a progress indicator, but in the end iceshelf will run in the background, where the indicator isn't needed. Thanks!
Yeah, NTFS is all Unicode; only Linux lets you store arbitrary bytes and call them whatever you want :)