Comments (12)
This issue is likely related to the use of the archive_entry_pathname_utf8 variant with ICONV disabled. I can attempt to enable ICONV to check if it resolves the problem. If that doesn't work, I may consider retrieving the raw pathname data and converting it to UTF-8 on the JavaScript side using TextDecoder. Would you be able to provide an example of a GBK-encoded Zip file for testing purposes?
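The TextDecoder idea mentioned in this comment could look roughly like the sketch below. It assumes the raw pathname bytes are already available as a Uint8Array; `decodePathnameBytes` is a hypothetical helper, not part of archive-wasm, and the `gbk` label needs a runtime whose TextDecoder supports legacy encodings (browsers, or Node.js built with full ICU).

```javascript
// Hypothetical helper (not part of archive-wasm): decode raw pathname bytes
// with a caller-supplied encoding label.
function decodePathnameBytes(bytes, encoding = 'utf-8') {
  // fatal: true makes decode() throw a TypeError on invalid input instead of
  // silently inserting U+FFFD replacement characters.
  return new TextDecoder(encoding, { fatal: true }).decode(bytes);
}

// The file name "中文.txt" as GBK bytes
const gbkName = Uint8Array.of(0xd6, 0xd0, 0xce, 0xc4, 0x2e, 0x74, 0x78, 0x74);
console.log(decodePathnameBytes(gbkName, 'gbk'));
```

Decoding the same bytes as UTF-8 in fatal mode throws, which is what makes a "try UTF-8 first" strategy possible.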
from archive-wasm.
upload it
Using only archive_entry_pathname, the name can't be decoded correctly and comes out as garbled characters.
Just posted version 1.5. Added an encoding option to extract that allows defining a specific encoding to use to decode an entry's metadata. Also added some logic that derives a fallback encoding to use when utf8 fails, from the system/browser's language. I think this should handle your use case, and it works with the test zip you provided. Btw, I decided against adding your zip to the test suite as it seems to have some personal information (idk I don't understand Chinese, but these looked like bank documents?), anyway feel free to delete the attached zip if you want.
@Klivitam was your issue resolved with the last updates?
> @Klivitam was your issue resolved with the last updates?
I think you should return the pathname as an ArrayBuffer | Uint8Array and let users decode the pathname themselves, because I don't know which encoding was used when the archive was created.
Can there be a mechanism to automatically identify the compression method?
> I think you should return the pathname as an ArrayBuffer | Uint8Array and let users decode the pathname themselves, because I don't know which encoding was used when the archive was created.
I could allow passing a 'binary' value to the encoding option, which would just signal that you want the raw ArrayBuffer instead of the decoded value, like Node.js does with its fs module functions.
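As a rough illustration of that proposal, the sketch below returns the raw bytes when the caller passes 'binary' and a decoded string otherwise. `readPathname` and its signature are hypothetical and only show the idea; they are not archive-wasm's actual API.

```javascript
// Hypothetical sketch, not archive-wasm's API: return the raw bytes when the
// caller asks for 'binary', otherwise decode with the requested encoding.
function readPathname(bytes, encoding = 'utf-8') {
  if (encoding === 'binary') {
    // Return a copy so the caller cannot mutate the original buffer.
    return bytes.slice();
  }
  return new TextDecoder(encoding).decode(bytes);
}

const rawName = Uint8Array.of(0x61, 0x2e, 0x74, 0x78, 0x74); // "a.txt" in ASCII
console.log(readPathname(rawName));           // decoded string
console.log(readPathname(rawName, 'binary')); // Uint8Array with the same bytes
```

For comparison, Node.js fs functions such as `fs.readdir` accept `encoding: 'buffer'` to return names as Buffer objects instead of strings.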
> Can there be a mechanism to automatically identify the compression method?
Not reliably. There are a ton of legacy encodings that conflict with each other, and discerning which encoding a random stream of bytes uses is quite hard. It requires statistical analysis of common language/encoding patterns, and even then it is not perfect: many encodings use the exact same byte sequences for completely different characters, so you get a lot of false positives. What I implemented to mitigate this is to always try decoding with UTF-8 first, which is the de facto standard nowadays, and if that fails, to fall back to the most common legacy encoding for the user's system/browser language:
archive-wasm/src/wasm/pointer.mjs, lines 27 to 35 in 9f9766d
archive-wasm/src/wasm/pointer.mjs, lines 210 to 223 in 9f9766d
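The strategy described in this comment (strict UTF-8 first, then a language-based legacy encoding) can be sketched as follows. This is not the code from pointer.mjs: the language-to-encoding table, `guessLegacyEncoding`, and `decodeWithFallback` are assumptions made for illustration, and the legacy labels need a TextDecoder with legacy encoding support (browsers, or Node.js with full ICU).

```javascript
// Illustrative mapping only; the table actually used by archive-wasm may differ.
const LEGACY_ENCODINGS = {
  'zh-cn': 'gbk',
  'zh-tw': 'big5',
  ja: 'shift_jis',
  ko: 'euc-kr',
  ru: 'windows-1251',
};

// Pick a legacy encoding from a BCP 47 language tag such as "zh-CN".
function guessLegacyEncoding(language) {
  const tag = language.toLowerCase();
  // Try the full tag first, then the primary subtag, then a Western default.
  return LEGACY_ENCODINGS[tag] || LEGACY_ENCODINGS[tag.split('-')[0]] || 'windows-1252';
}

function decodeWithFallback(bytes, language = 'en-US') {
  try {
    // Strict UTF-8: throws on any invalid byte sequence.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    // Lenient legacy decode: never throws, but may yield garbled text
    // if the guessed encoding is wrong.
    return new TextDecoder(guessLegacyEncoding(language)).decode(bytes);
  }
}

const gbkBytes = Uint8Array.of(0xd6, 0xd0, 0xce, 0xc4); // "中文" in GBK
console.log(decodeWithFallback(gbkBytes, 'zh-CN'));
```

Note that the fallback never fails loudly: under a single-byte encoding such as windows-1252 every byte sequence decodes to something, so a wrong guess shows up as garbled names rather than as an error.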
yeah i mean it, thanks for
I found a bug. You can change open_archive in the C code and add a setlocale call like this
This can resolve a few bugs.
Default: gb2312