Comments (12)

HeavenVolkoff avatar HeavenVolkoff commented on July 24, 2024

This issue is likely related to the use of the archive_entry_pathname_utf8 variant with ICONV disabled. I can attempt to enable ICONV to check if it resolves the problem. If that doesn't work, I may consider retrieving the raw pathname data and converting it to UTF-8 on the JavaScript side using TextDecoder. Would you be able to provide an example of a GBK encoded Zip file for testing purposes?
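A minimal sketch of the JavaScript-side fallback mentioned above, assuming the raw pathname bytes are available as a Uint8Array. The helper name `decodePathname` and the sample bytes are illustrative only and not part of archive-wasm. The `gbk` label relies on the runtime's TextDecoder supporting legacy encodings (browsers do; Node.js needs a build with full ICU).

```javascript
// Hypothetical helper, not part of archive-wasm: decode raw pathname bytes
// with an explicit encoding label understood by TextDecoder.
function decodePathname(rawBytes, encoding = 'utf-8') {
  // fatal: true makes invalid byte sequences throw instead of yielding U+FFFD
  return new TextDecoder(encoding, { fatal: true }).decode(rawBytes)
}

// "你好.txt" encoded as GBK (sample bytes chosen for illustration)
const gbkName = Uint8Array.of(0xc4, 0xe3, 0xba, 0xc3, 0x2e, 0x74, 0x78, 0x74)

// Decoding with the right label recovers the original name
const decoded = decodePathname(gbkName, 'gbk')

// The same bytes are not valid UTF-8, so the strict UTF-8 decode throws
let utf8Failed = false
try {
  decodePathname(gbkName)
} catch {
  utf8Failed = true
}
```

The strict UTF-8 attempt failing is what makes a fallback to another encoding possible in the first place; a non-fatal decoder would silently return replacement characters instead.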

from archive-wasm.

Klivitam avatar Klivitam commented on July 24, 2024

upload it

Klivitam avatar Klivitam commented on July 24, 2024

image

Klivitam avatar Klivitam commented on July 24, 2024

Only archive_entry_pathname can't be decoded correctly; it comes out as garbled characters.

HeavenVolkoff avatar HeavenVolkoff commented on July 24, 2024

Just posted version 1.5. Added an encoding option to extract that allows defining a specific encoding to use to decode an entry's metadata. Also added some logic that derives a fallback encoding to use when utf8 fails, from the system/browser's language. I think this should handle your use case, and it works with the test zip you provided. Btw, I decided against adding your zip to the test suite as it seems to have some personal information (idk I don't understand Chinese, but these looked like bank documents?), anyway feel free to delete the attached zip if you want.

HeavenVolkoff avatar HeavenVolkoff commented on July 24, 2024

@Klivitam was your issue resolved with the last updates?

Klivitam avatar Klivitam commented on July 24, 2024

> @Klivitam was your issue resolved with the last updates?

I think you should return the pathname as an ArrayBuffer | Uint8Array and let users decode the pathname themselves, because I don't know which encoding was used when I decompress the archive.

Klivitam avatar Klivitam commented on July 24, 2024

Can there be a mechanism to automatically identify the compression method?

HeavenVolkoff avatar HeavenVolkoff commented on July 24, 2024

> I think you should return the pathname as an ArrayBuffer | Uint8Array and let users decode the pathname themselves, because I don't know which encoding was used when I decompress the archive.

I could allow passing a 'binary' value to the encoding option, which would just signal that you want the raw ArrayBuffer instead of the decoded value, like Node.js does with its fs module functions.
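A rough sketch of how such an option could behave, under the assumption that the raw bytes are already available as a Uint8Array. The function name `readEntryPath` and its signature are hypothetical and do not reflect the actual archive-wasm API. For comparison, Node.js `fs.readdir` accepts `encoding: 'buffer'` to return Buffer objects instead of strings.

```javascript
// Hypothetical sketch of the proposed option; names are illustrative only.
function readEntryPath(rawBytes, encoding = 'utf-8') {
  // 'binary' signals that the caller wants the undecoded bytes back
  if (encoding === 'binary') return rawBytes
  // Any other value is treated as a TextDecoder encoding label
  return new TextDecoder(encoding, { fatal: true }).decode(rawBytes)
}

const raw = Uint8Array.of(0x61, 0x2e, 0x74, 0x78, 0x74) // "a.txt" in ASCII

const asBytes = readEntryPath(raw, 'binary') // same Uint8Array, untouched
const asText = readEntryPath(raw) // decoded with the default, UTF-8
```

With the raw bytes in hand, the caller is free to run their own detection or ask the user which encoding to apply.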

> Can there be a mechanism to automatically identify the compression method?

Not reliably. There are a ton of legacy encodings that conflict with each other, and working out which encoding a random stream of bytes uses is quite hard. It requires statistical analysis of common language/encoding patterns, and even then it is not perfect: many encodings use the exact same byte sequences for completely different characters, so you get a lot of false positives. What I implemented to mitigate this is to always try to decode with UTF-8 first, which is the de facto standard nowadays, and if that fails, to try the most common legacy encoding for the user's system/browser language:

// Map each language code to the legacy encoding most commonly used with it
const fallbackEncodings = Object.entries({
  cp1251: ['ru', 'uk', 'be', 'bg', 'sr', 'bs', 'mk'],
  gb18030: ['zh'],
  cseuckr: ['ko'],
  csshiftjis: ['ja'],
}).reduce((mapping, [encoding, languages]) => {
  for (const language of languages) mapping[language] = encoding
  return mapping
}, /** @type {Record.<string, string>} */ ({}))

// Excerpt from inside the decoding function; dataView holds the raw bytes.
// Try the legacy encoding that matches the user's system/browser language
try {
  const lang = Intl.DateTimeFormat().resolvedOptions().locale.split('-')[0]
  if (lang) {
    const fallbackEncoding = fallbackEncodings[lang]
    if (fallbackEncoding)
      return new TextDecoder(fallbackEncoding, { fatal: true }).decode(dataView)
  }
} catch {}
// Otherwise fall back to latin1
try {
  return new TextDecoder('latin1', { fatal: true }).decode(dataView)
} catch {}
// Last resort: non-fatal UTF-8, which replaces invalid bytes with U+FFFD
return new TextDecoder('utf8').decode(dataView)
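
The excerpt above only shows the fallback half of the strategy. As a self-contained illustration of the whole chain (strict UTF-8 first, then a language-based legacy encoding), here is a simplified sketch. The names `LEGACY_BY_LANG` and `decodeMetadata` are hypothetical, the language is passed in explicitly rather than read from `Intl` so the behaviour is easy to test, and the latin1 step is omitted; this is not the library's actual code.

```javascript
// Hypothetical, simplified version of the strategy described above.
// Reduced language table for illustration only.
const LEGACY_BY_LANG = { ru: 'cp1251', zh: 'gb18030', ko: 'cseuckr', ja: 'csshiftjis' }

function decodeMetadata(bytes, lang) {
  // Strict UTF-8 first, then the legacy encoding for the language (if any)
  const candidates = ['utf-8', LEGACY_BY_LANG[lang]].filter(Boolean)
  for (const encoding of candidates) {
    try {
      return new TextDecoder(encoding, { fatal: true }).decode(bytes)
    } catch {
      // Bytes are invalid for this encoding or the label is unsupported;
      // move on to the next candidate
    }
  }
  // Last resort: lossy UTF-8, invalid bytes become U+FFFD
  return new TextDecoder('utf-8').decode(bytes)
}

const gbkBytes = Uint8Array.of(0xc4, 0xe3, 0xba, 0xc3) // "你好" in GBK

const asChinese = decodeMetadata(gbkBytes, 'zh') // decoded via gb18030
const asRussian = decodeMetadata(gbkBytes, 'ru') // decoded via cp1251
const plainAscii = decodeMetadata(Uint8Array.of(0x6f, 0x6b), 'zh') // valid UTF-8
```

Note that the `'ru'` call does not throw: the same four bytes are also valid windows-1251, they just decode to unrelated Cyrillic text. That is the false-positive problem described above, and it is why the language hint matters more than the absence of a decoding error.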

Klivitam avatar Klivitam commented on July 24, 2024

yeah i mean it, thanks for

Klivitam avatar Klivitam commented on July 24, 2024

I found a bug. You can change open_archive in the C source and add a setlocale call like this:
[image]
That resolves some of the bugs.
Default: gb2312

Klivitam avatar Klivitam commented on July 24, 2024

ttttest.zip
