
base32768's People

Contributors

dependabot[bot], qntm

base32768's Issues

base.decode(...).buffer returns a buffer that is 1 byte too long

Title says it all: base.decode(...).buffer returns a buffer that is 1 byte too long.

Example:

base.decode("h").buffer.length == 1
base.decode("hi").buffer.length == 3
base.decode("hii").buffer.length == 3
base.decode("hiii").buffer.length == 5
base.decode("hiiii").buffer.length == 5
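One likely explanation, offered as an assumption rather than a reading of base32768's source, is that decode allocates a Uint8Array that may be one byte larger than needed and returns a subarray() view of it. The view's own length is then correct, but .buffer still refers to the whole, larger ArrayBuffer. The sketch below reproduces that with a hypothetical overAllocatedView helper and shows how to get an exactly-sized buffer.

```typescript
// Hypothetical stand-in for a decoder that over-allocates by one byte and
// returns a view of the bytes it actually wrote (not base32768's real code).
function overAllocatedView(bytes: number[]): Uint8Array {
  const backing = new Uint8Array(bytes.length + 1); // one spare byte
  backing.set(bytes);
  return backing.subarray(0, bytes.length); // view shares backing's ArrayBuffer
}

// Copy exactly the bytes covered by the view into a fresh ArrayBuffer.
function exactBuffer(view: Uint8Array): ArrayBuffer {
  // TypedArray.prototype.slice() copies into a new buffer of the view's length.
  return view.slice().buffer as ArrayBuffer;
}

const view = overAllocatedView([1, 2, 3]);
console.log(view.length); // 3
console.log(view.buffer.byteLength); // 4: the whole backing buffer
console.log(exactBuffer(view).byteLength); // 3
```

Note that an ArrayBuffer has byteLength rather than length. Without an extra copy, the view's byteOffset and byteLength identify which part of .buffer holds the data, so view.buffer.slice(view.byteOffset, view.byteOffset + view.byteLength) gives the same result.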

Considerations when the string will be stored in text files

First of all, thanks a lot for creating base32768.

I'm experimenting with converting binary data into text, then storing it as text files on the filesystem.

Background (feel free to skip):

I need to ship an offline .html file to my users, who can simply double-click to open it; it's a fancy SPA that runs on file:///. It can't read anything from the filesystem except CSS, JS, and images referenced from tags (e.g. src=""). So I use const database = "a_long_long_string" inside a .js file to provide data to the app.
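Because the data ends up inside a JavaScript string literal, the encoded text has to be escaped before it is written into the .js file. A minimal sketch, assuming the file is produced by a build script: toDataModule is a hypothetical helper, and it relies on JSON.stringify because, since ES2019, every JSON string is also a valid JavaScript string literal.

```typescript
// Hypothetical build-step helper: wrap an encoded string in a JS statement.
// JSON.stringify escapes quotes, backslashes and control characters, so any
// string (base64, base32768, ...) becomes a valid JavaScript string literal.
function toDataModule(variableName: string, encoded: string): string {
  return `const ${variableName} = ${JSON.stringify(encoded)};\n`;
}

console.log(toDataModule("database", 'say "hi"\n'));
// prints: const database = "say \"hi\"\n";
```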

Since text files are UTF-8, this package seems to lose its edge. (As far as my tests show, when I tell either the browser or Node.js "here is a big string, please write it to a file", both write UTF-8, so I don't think that is up to me to change.)

So I tested different ways to convert binary data to a string: base32768, base64, and TextDecoder.

Base32768 works as intended, of course.

Base64 seems to do really well: it's faster than base32768 and produces smaller text files. I guess that's because each base64 character is 1 byte in UTF-8, correct? Your README says base64 is 75% efficient in UTF-8 while base32768 is 63%, which is consistent with my output file sizes.

TextDecoder is about 20x faster than the two above, but the conversion can't be reversed if the input Uint8Array contains bytes in the range 128-255, and I don't understand why:
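The efficiency figures can be checked by measuring the UTF-8 size of an encoded string directly. A minimal sketch: utf8Efficiency is a hypothetical helper, and the base32768-like sample assumes every character lies in U+0800 to U+FFFF and so takes 3 bytes in UTF-8, which is what a 15-bit / 24-bit = 62.5% figure implies.

```typescript
// Payload bytes divided by the UTF-8 size of the text that carries them.
function utf8Efficiency(encoded: string, payloadBytes: number): number {
  const utf8Bytes = new TextEncoder().encode(encoded).length;
  return payloadBytes / utf8Bytes;
}

// Base64: 3 payload bytes -> 4 ASCII characters -> 4 UTF-8 bytes.
console.log(utf8Efficiency("AAAA", 3)); // 0.75

// Base32768-like: 8 characters x 15 bits = 120 bits = 15 payload bytes.
// The stand-in character U+4E00 takes 3 UTF-8 bytes (24 bytes in total).
console.log(utf8Efficiency("\u4e00".repeat(8), 15)); // 0.625
```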

TextDecoder test code:
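function convertToStringThenBack(input: Uint8Array) {
  // Decode the view itself rather than input.buffer: the underlying buffer
  // can be larger than the view (e.g. after subarray()).
  const string = new TextDecoder().decode(input);
  // Re-encode the string as UTF-8 bytes.
  const back = new TextEncoder().encode(string);
  // The round trip succeeds only if every byte comes back unchanged.
  const isEqual =
    back.length === input.length &&
    back.every((value, index) => value === input[index]);
  console.log(isEqual ? "good" : "bad");
}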

convertToStringThenBack(new Uint8Array([1, 10, 100, 127])); // good
convertToStringThenBack(new Uint8Array([1, 10, 100, 128])); // bad
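The round trip fails because a lone byte in the range 0x80-0xFF is not valid UTF-8: TextDecoder, in its default non-fatal mode, replaces each invalid sequence with U+FFFD, which TextEncoder then writes back as the three bytes EF BF BD. The sketch below shows that, plus one reversible alternative that is neither base32768 nor base64: mapping each byte to the code point with the same value. The helper names are hypothetical, and the trade-off is assumed rather than benchmarked: bytes 0x80-0xFF cost 2 bytes each once the file is saved as UTF-8, and the resulting string still has to be escaped before it goes into a JS string literal.

```typescript
// A lone 0x80 is invalid UTF-8, so the default (non-fatal) decoder emits U+FFFD.
// (The default decoder also drops a leading UTF-8 BOM, another lossy case.)
const replaced = new TextDecoder().decode(new Uint8Array([128]));
console.log(replaced === "\uFFFD"); // true
console.log(Array.from(new TextEncoder().encode(replaced)).join(",")); // 239,191,189

// Reversible alternative (hypothetical helpers): byte b <-> code point U+00bb.
function bytesToCodePoints(bytes: Uint8Array): string {
  let out = "";
  // Convert in chunks to stay well below engine limits on argument counts.
  for (let i = 0; i < bytes.length; i += 0x8000) {
    out += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  return out;
}

// Only valid for strings produced by bytesToCodePoints (all chars <= U+00FF).
function codePointsToBytes(text: string): Uint8Array {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) bytes[i] = text.charCodeAt(i);
  return bytes;
}

const input = new Uint8Array([1, 10, 100, 128, 255]);
const roundTripped = codePointsToBytes(bytesToCodePoints(input));
console.log(roundTripped.every((b, i) => b === input[i])); // true
```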

Performance matters a lot because my data is very large. I'd accept an extra step after TextDecoder to sanitize the string, but I simply don't know how to make that work.

Regardless, do you have any comments on what I wrote above? Thanks very much!

Typing plans?

Hello. I am writing a project where I use base2048, base32768, and base65536 (all three: I give the user the choice of which one they prefer). I am finding them very useful.

base2048 and base65536 have TypeScript typings, but base32768 does not. TypeScript reports an error when I include base32768 unless I exempt it with an any type.

Are there plans to add types to base32768 like the other two have?
If I (no promises) contributed base32768 typings by copying what the other two have, would that PR be welcome?

Thanks.
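For reference, a minimal ambient declaration could look like the following. This is a sketch, not the typings shipped with base2048 or base65536. It assumes base32768 exports encode, taking a Uint8Array and returning a string, and decode, taking a string and returning a Uint8Array. Check these signatures against the package's actual exports before relying on them.

```typescript
// base32768.d.ts: ambient declarations with assumed signatures.
declare module "base32768" {
  /** Encodes bytes as a Base32768 string. */
  export function encode(uint8Array: Uint8Array): string;
  /** Decodes a Base32768 string back into bytes. */
  export function decode(str: string): Uint8Array;
}
```

Placing this file somewhere the project's tsconfig.json includes should let import { encode, decode } from "base32768" type-check without an any exemption.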

How To Use In Browsers Section in FAQ

It would be nice to have an FAQ section. The question on my mind is: how do I use this in a browser, like Base64? In a single file, we can have
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
and a red dot is displayed (example from Wikipedia). How can I do this with base32768?
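data: URLs (RFC 2397) only define base64 and percent-encoded payloads, so a browser cannot read base32768 text directly from a src attribute. One workaround is to keep the base32768 text in a script, decode it in JavaScript, and point the image at a Blob URL. In this sketch the decoder is passed in as a parameter so the example stays self-contained. In a page, you would pass base32768's decode function, assumed here to return a Uint8Array. base32768ToObjectUrl is a hypothetical name.

```typescript
// Decode base32768 text to bytes and expose them as a same-page Blob URL.
// `decode` is injected (hypothetical parameter); in a browser it would be the
// decode function exported by base32768.
function base32768ToObjectUrl(
  text: string,
  mimeType: string,
  decode: (text: string) => Uint8Array,
): string {
  const bytes = decode(text);
  const blob = new Blob([bytes], { type: mimeType });
  // Returns a "blob:..." URL; call URL.revokeObjectURL(url) when done with it.
  return URL.createObjectURL(blob);
}

// Browser usage (assumes an <img id="dot"> element and an imported decode):
// const img = document.getElementById("dot") as HTMLImageElement;
// img.src = base32768ToObjectUrl(encodedPng, "image/png", decode);
```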

Case folding in base32768

We've been using your base32768 encoding in rclone, via @Max-Sum's Go port https://github.com/Max-Sum/base32768, as a way to encode encrypted file names on cloud storage systems. This seems particularly effective on OneDrive, which appears to use UTF-16 internally.

However, we noticed in issue rclone/rclone#6803 that some characters in the set of 32768 characters can be case folded.

I wrote a little Go program to demonstrate this here: https://go.dev/play/p/SK5G4dnHM6T

This prints stuff like

Duplicate case folded rune ƃ into Ƃ
Duplicate case folded rune ƅ into Ƅ

and comes up with the summary

Found 521 case folding and 199 duplicate case folding characters out of 32896

This means that 521 characters have a case-folded variant but, more importantly, 199 characters have both their upper-case and lower-case variants among the 32768 characters.

This is important because rclone generates file names with these characters and OneDrive is case insensitive. So there is a small chance that two different encrypted file names map to two strings that are equal when compared case-insensitively.

Now, I think the probability of this is quite small. The minimum length of a file name is 16 bytes (unencoded), i.e. 128 bits, which makes 9 base32768 characters.

So there is about a 0.006 chance that any given character can be case folded. What we'd like to know is how many of these file names we would have to put in a directory to have a 50% chance of a case-folded collision. I've had a few goes at working this out and keep coming up with an answer of the order of 10²¹. I'm not sure I trust my maths here, but it's a big number; that much I'm sure of.
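The order of magnitude can be checked with a simplified birthday-bound model. The assumptions here are illustrative, not rclone's own analysis:

- Every name is L independent characters drawn uniformly from N = 32768 characters.
- k of those characters each have exactly one case-fold partner in the set.
- Two distinct names collide when they match position by position up to case folding.

Under this model, one position matches with probability (1/N)(1 + k/N). A pair of distinct names therefore collides with probability q = N^-L((1 + k/N)^L - 1), and a directory of n names reaches a 50% collision chance at roughly n = sqrt(2 ln 2 / q). namesFor50PercentCollision is a hypothetical name.

```typescript
// Simplified birthday-bound estimate for case-insensitive filename collisions.
// N: alphabet size, L: characters per name, k: characters whose (single)
// case-fold partner is also in the alphabet.
function namesFor50PercentCollision(N: number, L: number, k: number): number {
  // P(two random characters are equal or case-fold partners) = (1/N)(1 + k/N).
  // Excluding identical names: q = N^-L * ((1 + k/N)^L - 1).
  const q = Math.pow(N, -L) * Math.expm1(L * Math.log1p(k / N));
  // Birthday approximation: P(no collision among n names) ~ exp(-n^2 q / 2).
  return Math.sqrt((2 * Math.LN2) / q);
}

// Reading the issue's 199 as the number of characters with a partner:
console.log(namesFor50PercentCollision(32768, 9, 199).toExponential(2));
// Reading it as 199 pairs (398 characters):
console.log(namesFor50PercentCollision(32768, 9, 398).toExponential(2));
```

With these inputs the two estimates come out at roughly 1.0 × 10²¹ and 7.3 × 10²⁰, consistent with the figure above. The model treats all 9 characters as uniformly random (135 bits rather than 128), so it is only an order-of-magnitude check.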

We've been thinking about making a variant of base32768 which does not include both the upper and lower case versions of any characters which can be case folded.
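A first step toward such a variant is finding which characters of a repertoire have a case partner inside the same repertoire. The sketch below approximates case folding with toLowerCase(). That is an assumption: Unicode simple case folding (as used by, for example, Go's unicode.SimpleFold) differs for some characters, so treat the result as a first pass. findCasePairs and dropUpperCasePartners are hypothetical names, and refilling the alphabet back to 32768 characters is not shown.

```typescript
// Find pairs [upper, lower] where both forms are in the repertoire.
// toLowerCase() only approximates Unicode simple case folding.
function findCasePairs(repertoire: string[]): Array<[string, string]> {
  const members = new Set(repertoire);
  const pairs: Array<[string, string]> = [];
  for (const ch of repertoire) {
    const lower = ch.toLowerCase();
    if (lower !== ch && members.has(lower)) pairs.push([ch, lower]);
  }
  return pairs;
}

// Drop the upper-case member of each pair (one possible policy).
function dropUpperCasePartners(repertoire: string[]): string[] {
  const drop = new Set(findCasePairs(repertoire).map(([upper]) => upper));
  return repertoire.filter((ch) => !drop.has(ch));
}

const sample = ["Ƃ", "ƃ", "Ƅ", "ƅ", "ƈ"];
console.log(JSON.stringify(findCasePairs(sample))); // [["Ƃ","ƃ"],["Ƅ","ƅ"]]
console.log(JSON.stringify(dropUpperCasePartners(sample))); // ["ƃ","ƅ","ƈ"]
```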

Any thoughts?

SyntaxError: Unexpected token ...

When attempting to run the example code from the README, I receive the following error:

$ node test.js 
/home/aceat64/node_modules/base32768/index.js:90
        var result = bits_to_bits([...buf.values()], MAGIC_NUMBER_B, MAGIC_NUMBER_A);
                                   ^^^

SyntaxError: Unexpected token ...
    at exports.runInThisContext (vm.js:53:16)
    at Module._compile (module.js:373:25)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)
    at Module.require (module.js:353:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/home/aceat64/test.js:1:79)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
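The failing line uses array spread syntax ([...buf.values()]), which older JavaScript engines reject at parse time. The module.js frames in the trace suggest an older Node.js release, although the exact version is not shown, so upgrading Node.js is probably the simplest fix. If spread had to be avoided, the same array could be built with an explicit loop. The sketch below illustrates this with a hypothetical byteValues helper; it is not a patch taken from the library.

```typescript
// Equivalent of [...buf.values()] without spread syntax: copy each byte.
// Uses var rather than let so the JavaScript left after stripping the type
// annotations also parses on engines that predate ES2015 block scoping.
function byteValues(buf: Uint8Array): number[] {
  var out: number[] = [];
  for (var i = 0; i < buf.length; i++) {
    out.push(buf[i]);
  }
  return out;
}

console.log(byteValues(new Uint8Array([0, 127, 255])).join(",")); // 0,127,255
```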
