codedread / bitjs

Binary Tools for JavaScript

License: MIT License

JavaScript 98.56% HTML 0.07% Dockerfile 0.15% Makefile 0.21% C 1.01%
javascript zip rar tar unrar unzip binary gif jpeg png

bitjs's Introduction

bitjs: Binary Tools for JavaScript

Introduction

A set of dependency-free JavaScript modules to work with binary data in JS (using Typed Arrays). Includes:

  • bitjs/archive: Decompressing files (unzip, unrar, untar, gunzip) in JavaScript, implemented as Web Workers where supported, and allowing progressive unarchiving while streaming.
  • bitjs/codecs: Get the codec info of media containers as an RFC 6381 MIME type string.
  • bitjs/file: Detect the type of file from its binary signature.
  • bitjs/image: Parsing GIF, JPEG, PNG. Conversion of WebP to PNG or JPEG.
  • bitjs/io: Low-level classes for interpreting binary data (BitStream, ByteStream). For example, reading or peeking at N bits at a time.

Installation

Install it using your favourite package manager. The package is registered under @codedread/bitjs.

npm install @codedread/bitjs

or

yarn add @codedread/bitjs

CommonJS/ESM in Node

This package is an ES module, so using it from a project that emits CommonJS is a little trickier. For example, a TypeScript project that compiles to CommonJS will try to turn imports into require() statements, which will break. The fix (unfortunately) is to update your tsconfig.json:

 "moduleResolution": "Node16",

and use a Dynamic Import:

const { getFullMIMEString } = await import('@codedread/bitjs');

Packages

bitjs.archive

This package includes objects for decompressing and compressing binary data in popular archive formats (zip, rar, tar, gzip). Here is a simple example of unrar:

Decompressing

import { Unrarrer } from './bitjs/archive/decompress.js';
const unrar = new Unrarrer(rarFileArrayBuffer);
unrar.addEventListener('extract', (e) => {
  const {filename, fileData} = e.unarchivedFile;
  console.log(`Extracted ${filename} (${fileData.byteLength} bytes)`);
  // Do something with fileData...
});
unrar.addEventListener('finish', () => console.log('Done'));
unrar.start();

More details and examples are located on the API page.

bitjs.codecs

This package includes code for dealing with media files (audio/video). It is useful for deriving RFC 6381 MIME type strings, including the codec information. It currently supports a limited subset of MP4 and WebM.

How to use:

  • First, install ffprobe (ffmpeg) on your system.
  • Then:
import { exec } from 'node:child_process';
import { getFullMIMEString } from 'bitjs/codecs/codecs.js';
/**
 * @typedef {import('bitjs/codecs/codecs.js').ProbeInfo} ProbeInfo
 */

// Ask ffprobe for the container and stream metadata as JSON.
const cmd = 'ffprobe -show_format -show_streams -print_format json -v quiet foo.mp4';
exec(cmd, (error, stdout) => {
  if (error) throw error;
  /** @type {ProbeInfo} */
  const info = JSON.parse(stdout);
  // 'video/mp4; codecs="avc1.4D4028, mp4a.40.2"'
  const contentType = getFullMIMEString(info);
  // ...
});
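
To make the shape of the resulting string concrete, here is a minimal sketch of how a container type and a list of codec identifiers combine into an RFC 6381 style string. The helper name formatMimeWithCodecs is hypothetical and is not part of bitjs; the library derives the codec identifiers from the ffprobe output itself.

```javascript
// Hypothetical helper for illustration only; not the bitjs implementation.
// Joins a base MIME type with a codecs parameter in the RFC 6381 style.
function formatMimeWithCodecs(baseType, codecs) {
  if (codecs.length === 0) {
    return baseType; // No codec information available.
  }
  return `${baseType}; codecs="${codecs.join(', ')}"`;
}

console.log(formatMimeWithCodecs('video/mp4', ['avc1.4D4028', 'mp4a.40.2']));
```

A string in this form can be passed to browser APIs that accept a full content type, such as MediaSource.isTypeSupported().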

bitjs.file

This package includes code for dealing with files. It includes a sniffer which detects the type of file, given an ArrayBuffer.

import { findMimeType } from './bitjs/file/sniffer.js';
const mimeType = findMimeType(someArrayBuffer);
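
As a rough illustration of what signature-based detection means, the sketch below compares the first bytes of a buffer against a few well-known file signatures. It is an assumption-laden sketch, not the bitjs sniffer: the function name sniffMimeType and the small signature table are hypothetical, and the real module recognizes more types.

```javascript
// Hypothetical sketch of signature-based sniffing; not the bitjs implementation.
const SIGNATURES = [
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
  { mime: 'application/zip', bytes: [0x50, 0x4b, 0x03, 0x04] }, // "PK\x03\x04"
];

// Returns the MIME type whose signature prefixes the buffer, or undefined.
function sniffMimeType(arrayBuffer) {
  const head = new Uint8Array(arrayBuffer);
  for (const { mime, bytes } of SIGNATURES) {
    if (head.length >= bytes.length && bytes.every((b, i) => head[i] === b)) {
      return mime;
    }
  }
  return undefined;
}
```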

bitjs.image

This package includes code for dealing with image files. It includes low-level, event-based parsers for GIF, JPEG, and PNG images.

It also includes a module for converting WebP images into alternative raster graphics formats (PNG/JPG), though this latter module is deprecated, now that WebP images are well-supported in all browsers.

GIF Parser

import { GifParser } from './bitjs/image/parsers/gif.js'

const parser = new GifParser(someArrayBuffer);
parser.onApplicationExtension(evt => {
  const appId = evt.detail.applicationIdentifier;
  const appAuthCode = new TextDecoder().decode(evt.detail.applicationAuthenticationCode);
  if (appId === 'XMP Data' && appAuthCode === 'XMP') {
    /** @type {Uint8Array} */
    const appData = evt.detail.applicationData;
    // Do something with appData (parse the XMP).
  }
});
parser.start();

JPEG Parser

import { JpegParser } from './bitjs/image/parsers/jpeg.js'
import { ExifTagNumber } from './bitjs/image/parsers/exif.js';

const parser = new JpegParser(someArrayBuffer)
    .onApp1Exif(evt => console.log(evt.detail.get(ExifTagNumber.IMAGE_DESCRIPTION).stringValue));
await parser.start();

PNG Parser

import { PngParser } from './bitjs/image/parsers/png.js';
import { ExifTagNumber } from './bitjs/image/parsers/exif.js';

const parser = new PngParser(someArrayBuffer)
    .onExifProfile(evt => console.log(evt.detail.get(ExifTagNumber.IMAGE_DESCRIPTION).stringValue))
    .onTextualData(evt => console.dir(evt.detail));
await parser.start();

WebP Converter

import { convertWebPtoPNG, convertWebPtoJPG } from './bitjs/image/webp-shim/webp-shim.js';
// convertWebPtoPNG() takes in an ArrayBuffer containing the bytes of a WebP
// image and returns a Promise that resolves with an ArrayBuffer containing the
// bytes of an equivalent PNG image.
convertWebPtoPNG(webpBuffer).then(pngBuf => {
  const pngUrl = URL.createObjectURL(new Blob([pngBuf], {type: 'image/png'}));
  someImgElement.setAttribute('src', pngUrl);
});

bitjs.io

This package includes stream objects for reading and writing binary data at the bit and byte level: BitStream, ByteStream.

import { BitStream } from './bitjs/io/bitstream.js';
const bstream = new BitStream(someArrayBuffer, true /** most-significant-bit-to-least */ );
const crc = bstream.readBits(12); // Read in 12 bits as CRC. Advance pointer.
const flagbits = bstream.peekBits(6); // Look ahead at next 6 bits. Do not advance pointer.

More details and examples are located on the API page.
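
For readers unfamiliar with bit-level reading, the following is a minimal sketch of the read/peek distinction in most-significant-bit-first order. MiniBitReader is a hypothetical class written for this illustration; it is not the bitjs BitStream and omits its other features (byte reads, least-significant-bit-first order, bounds handling).

```javascript
// Hypothetical minimal reader for illustration; not the bitjs BitStream.
class MiniBitReader {
  constructor(arrayBuffer) {
    this.bytes = new Uint8Array(arrayBuffer);
    this.bitPos = 0; // Absolute position, counted in bits from the start.
  }

  // Returns the next n bits (n <= 32) as an unsigned integer without advancing.
  peekBits(n) {
    let value = 0;
    for (let i = 0; i < n; i++) {
      const pos = this.bitPos + i;
      const byte = this.bytes[pos >> 3];
      const bit = (byte >> (7 - (pos & 7))) & 1; // MSB of each byte first.
      value = value * 2 + bit; // Multiplication avoids signed 32-bit overflow.
    }
    return value;
  }

  // Returns the next n bits and advances the pointer past them.
  readBits(n) {
    const value = this.peekBits(n);
    this.bitPos += n;
    return value;
  }
}
```

With the two bytes 0xAB 0xCD, reading 12 bits yields 0xABC and a subsequent 4-bit peek yields 0xD without moving the pointer.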

Reference

  • UnRar: A work-in-progress description of the RAR file format.

History

This project grew out of another project of mine, kthoom (a comic book reader implemented in the browser). This repository was automatically exported from my original repository on GoogleCode and has undergone considerable changes and improvements since then.

bitjs's People

Contributors

andrebrait, antimatter15, codedread, dependabot[bot], elesueur, gavindsouza

bitjs's Issues

Typescript error: The '@codedread/bitjs' library may need to update its package.json or typings

Hi.

I get this error:
Typescript error: There are types at 'node_modules/@codedread/bitjs/types/index.d.ts', but this result could not be resolved when respecting package.json "exports". The '@codedread/bitjs' library may need to update its package.json or typings

The error goes away if I find package.json in node_modules/@codedread/bitjs and replace:

"exports": "./index.js"

with

"exports": [
    "./index.js",
    "./types/index.d.ts" 
  ],

Add support for RarVM

The RAR format supports a RarVM (virtual machine). We should support it so that we can open more archive files.

Unarchive from a stream of bytes

Today, the unarchivers require the entire ArrayBuffer to be available before it can start unarchiving. This works fine for local files, but for anything else (network fetches) it requires the entire file to be downloaded before unarchiving can begin.

Should investigate reworking the code and API to deal with a stream of bytes.

Support WebP images

Safari and older browsers do not support WebP images. bitjs should provide a way to convert an ArrayBuffer of a WebP image into an ArrayBuffer of a PNG image.

Preferably this should work very fast, maybe use WebAssembly?

Migrate bitjs to es6

At some point, we should cut over bitjs to use es6 features that are now widely supported natively in modern browsers:

  • const, let
  • classes
  • arrow functions
  • destructuring

We can keep an es5- branch around for anybody that wants to use that.

Do MIME type sniffing on images

bitjs.image should have a function to take in an ArrayBuffer and return the MIME type, if it can be determined from the signature bytes.

unarchive fails to terminate workers after they complete, chrome crashes with large number of unarchives

What steps will reproduce the problem?
1. unzip a large number of files in Chrome


What is the expected output? What do you see instead?

Aw Snap!


What version of the product are you using? On what operating system?

Chrome 24.0.1312.52, Ubuntu 10.10

Please provide any additional information below.

Attached patch resolves the issue by terminating the workers after they complete.

Original issue reported on code.google.com on 18 Mar 2014 at 4:37

Support zipping

Add a library that lets a client send files to a Worker and the Worker sends zipped bytes back to the client.

  • Start with just storage (no compression).
  • Then implement DEFLATE.
  • Let clients use options to tweak zipper behavior (fast zipping vs smaller files)

Unrar file

Hi, how can I unrar a file with your code?

Angular 9 + "no loaders are configured"

Hi there,

Was hoping you could help me out. I'm trying to use your bitjs in an angular 9 application and I am running into this problem:

./node_modules/@codedread/bitjs/image/webp-shim/webp-shim.js 9:18
Module parse failed: Unexpected token (9:18)
You may need an appropriate loader to handle this file type, currently no loaders are configured to process this file. See https://webpack.js.org/concepts#loaders
> const url = import.meta.url;
| if (!url.endsWith('/webp-shim.js')) {
|   throw 'webp-shim must be loaded as webp-shim.js';

I've made sure all of the JS files are imported properly, and followed your simple example, but can't seem to figure out what I'm doing wrong.

Thanks for your help

EDIT: I double checked and I am receiving the file from the api as a type arraybuffer.

Untarrer: File path truncated

I have logged the Progress Events while unarchiving two different files. I'm not sure what the difference between the two files really is, but one of them always shows up with currentFilename truncated.

File 1:

UnarchiveProgressEvent {type: "progress", currentFilename: "in.upload.test/public/files/hack.png", currentFileNumber: 0, currentBytesUnarchivedInFile: 15974, totalFilesInArchive: 1, …}

File 2:

UnarchiveProgressEvent {type: "progress", currentFilename: "./gavin.upload.test/private/files/500m.mt", currentFileNumber: 0, currentBytesUnarchivedInFile: 104857600, totalFilesInArchive: 1, …}

Initially, I thought it was that the locale wasn't set, but it seems that I get that error (tar: Failed to set default locale) while opening both the files in vim.

expected: "./gavin.upload.test/public/files/hack.png"
current: "in.upload.test/public/files/hack.png"

archive: Support rar 5.0 format

Hi Jeff, first of all, thanks so much for such an excellent javascript library. I'm using it for unrar in a web application and wondering if it is possible to add support for RAR 5.0?

Always emit a Progress Event before an Extract event

The number of files in the archive is conveyed in the Progress Event. In some scenarios, Extract events are emitted before the first Progress Event, which could leave client code not knowing how many files to expect.
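
One way to guarantee this ordering is sketched below, under the assumption that the emitter knows the file count before the first extraction. OrderedUnarchiveEvents and its method names are hypothetical and do not reflect the actual bitjs internals; only the event names and the totalFilesInArchive and unarchivedFile fields come from the examples on this page.

```javascript
// Hypothetical sketch: OrderedUnarchiveEvents is not part of bitjs.
class OrderedUnarchiveEvents extends EventTarget {
  constructor(totalFilesInArchive) {
    super();
    this.totalFilesInArchive = totalFilesInArchive;
    this.progressEmitted = false;
  }

  emitProgress() {
    const evt = new Event('progress');
    evt.totalFilesInArchive = this.totalFilesInArchive;
    this.progressEmitted = true;
    this.dispatchEvent(evt);
  }

  emitExtract(unarchivedFile) {
    // Guarantee that listeners have seen the file count at least once.
    if (!this.progressEmitted) {
      this.emitProgress();
    }
    const evt = new Event('extract');
    evt.unarchivedFile = unarchivedFile;
    this.dispatchEvent(evt);
  }
}
```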

Move to ES Modules

Multiple browsers now support ES6 modules natively. This issue is to track rewriting bitjs using modules instead of "polluting" the window namespace.

(Should keep the old code as an unmaintained branch though)

Get archive content metadata

Description

Trying your RAR decoder as a replacement for libarchive, it works well but I'm running into a performance issue.

Some books I'm dealing with (in CBR) are quite big, over 500MB and 300 pages.
But I don't see how to get file descriptions of the RAR content with this library without extracting everything. I end up loading the whole book in memory (which takes >20s) when I just want to list files and load the first 2 pages.

LibArchive, for example, exposes a method .getFilesObject() to access metadata and list files. The reading/decoding operation is a separate async operation.

I searched but I couldn't see a way to have this kind of feature with bitjs. Am I missing something?

Support data descriptors in zip files

In unzip.js we have the following TODO:

// TODO: deal with data descriptor if present (we currently assume no data descriptor!)
// "This descriptor exists only if bit 3 of the general purpose bit flag is set"
// But how do you figure out how big the file data is if you don't know the compressedSize
// from the header?!?

What I think this means is that we may have to rely on the Central Directory structures to know when each local file starts and ends. That should tell us where the data descriptor fields are so we can properly get the compressed size of each file.

This will require a bit of refactoring, unfortunately. One possibility is upon first encounter of a data descriptor, stop unzipping local files, scan through the rest of the file, get central directory information and then pick up scanning again.

(This also means that if a zip file uses data descriptors, it's not really set up for streaming, since the entire zip file has to be read into the browser before we can get to the Central Directory structures).
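
Detecting the condition itself is cheap. The sketch below tests bit 3 of the general purpose bit flag, which is stored as a little-endian 16-bit value at offset 6 of a local file header. The function name hasDataDescriptor is hypothetical and is not taken from unzip.js; it only illustrates the check, not the refactoring discussed above.

```javascript
// Hypothetical helper for illustration; not taken from unzip.js.
const LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50; // "PK\x03\x04", little-endian.

// Returns true if the local file header at `offset` announces a data descriptor.
function hasDataDescriptor(arrayBuffer, offset = 0) {
  const view = new DataView(arrayBuffer);
  if (view.getUint32(offset, true) !== LOCAL_FILE_HEADER_SIGNATURE) {
    throw new Error('Not a ZIP local file header');
  }
  // Layout: signature (4 bytes), version needed (2 bytes), flags (2 bytes).
  const flags = view.getUint16(offset + 6, true);
  return (flags & 0x0008) !== 0; // Bit 3 set means a descriptor follows the data.
}
```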

Decompressing from node js and not the browser

Having looked over the examples, it's quite obvious that bitjs is intended to be called from the browser context, but would it be possible to call it from a Node.js process instead? I plan on using this in an Electron application.

untar `ustar` format for long filenames missing slash

👋 I think I found a bug in your untar implementation by pretty roundabout means.

I received an issue report on my own project at arethetypeswrong/arethetypeswrong.github.io#101. I’m not using bitjs; I’m using my own fork of https://github.com/antimatter15/untar.js, which is itself a port of bitjs. The user there provided a repro, which I’ve built and packed into a tarball here, if you’d like to have a repro for a failing test:

fails-2.0.1-alpha.0.tgz

Unpacking that tarball with bitjs’s untar will show a file name missing a slash compared to what you’d see if you unpacked the tarball onto your filesystem:

+ dist/types/primitives/ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKABCDEFGHIJKLMNOPQRSTUVWXYZA.d.ts
- dist/types/primitives/ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJK/ABCDEFGHIJKLMNOPQRSTUVWXYZA.d.ts
                                                             ^

This name is too long to fit in the archive header, so its name is split between the name and prefix fields and has to be concatenated, as you have here:

bitjs/archive/untar.js

Lines 89 to 93 in 97cfb3b

this.prefix = readCleanString(bstream, 155);
if (this.prefix.length) {
  this.name = this.prefix + this.name;
}

I found a source to support the empirical evidence that these should be joined with a slash:

The name field (100 chars) an inserted slash ('/') and the prefix field (155 chars) produce the pathname of the file. When recreating the original filename, name and prefix are concatenated, using a slash character in the middle. If a pathname does not fit in the space provided or may not be split at a slash character so that the parts will fit into 100 + 155 chars, the file may not be archived. Linknames longer than 100 chars may not be archived too.

https://linux.die.net/man/1/ustar

I made this change on my fork: andrewbranch/untar.js@095c173
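
A minimal sketch of the join described in the quoted man page follows, assuming the standard ustar header layout (name in bytes 0 to 99, prefix in bytes 345 to 499 of the 512-byte header). The helpers readField and readUstarPath are hypothetical names for this illustration; they are not the code in untar.js or in the linked fork.

```javascript
// Hypothetical helpers for illustration; not the untar.js implementation.

// Decodes a NUL-terminated (or full-width) text field from a tar header.
function readField(header, offset, length) {
  const raw = header.subarray(offset, offset + length);
  const end = raw.indexOf(0);
  return new TextDecoder().decode(end === -1 ? raw : raw.subarray(0, end));
}

// Rebuilds the full path: prefix and name are joined with a slash
// only when the prefix field is non-empty.
function readUstarPath(header) {
  const name = readField(header, 0, 100);
  const prefix = readField(header, 345, 155);
  return prefix.length > 0 ? `${prefix}/${name}` : name;
}
```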

MIME type for Matroska Video/Audio

WebM videos support VP8, VP9 and Vorbis, Opus. WebM is based on the Matroska container.

I have seen some Matroska video/audio files out there (.mkv) that have different audio and video codecs in their streams (for example: h264, dts).

  1. What should the MIME type look like for Matroska (not WebM) files? I have seen video/x-matroska and audio/x-matroska
  2. Should the codecs information be included in the full MIME type (RFC 6381)? I have seen examples of this: "video/x-matroska;codecs=avc1"

Publish package

Would you consider publishing this repository? As of now there is no way of fetching it as a dependency.
Thanks.

More semantic event handling for Unarchiver / Archivers

Like was done for commit c3a7b35, we could remove all the Unarchive Event sub-classes in archive/events.js and replace them with @typedef data structures and attach to CustomEvents.

I want to do this for two reasons:

  • It provides a better DX for subscribing to events with methods like onExtract() that the IDE can give hints for. We can keep addEventListener() around for backwards-compatibility.
  • It aligns the archive package with the image package.
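
The proposal above could look roughly like the sketch below: chainable on-methods layered over addEventListener(), with the payload carried on a detail property. SketchUnarchiver, DetailEvent, and emitExtract are hypothetical names for this illustration and do not describe the final bitjs API.

```javascript
// Hypothetical sketch of the proposed pattern; not the bitjs API.

// Small stand-in for CustomEvent so the sketch runs in any runtime with Event.
class DetailEvent extends Event {
  constructor(type, detail) {
    super(type);
    this.detail = detail;
  }
}

class SketchUnarchiver extends EventTarget {
  /** @param {(evt: DetailEvent) => void} listener */
  onExtract(listener) {
    this.addEventListener('extract', listener);
    return this; // Returning this allows chaining, as in the image parsers.
  }

  // Stand-in for the point where a real unarchiver finishes a file.
  emitExtract(unarchivedFile) {
    this.dispatchEvent(new DetailEvent('extract', unarchivedFile));
  }
}
```

Because the on-methods delegate to addEventListener(), existing listeners registered the old way keep working alongside the new ones.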

Expand the test suite to cover un-archivers

We need automated tests not just for the io packages (streams/buffer) but also for the unarchivers (unzip, unrar, untar).

One idea is to have a short script/command that can turn a binary file into a simple JS file that can be loaded in via importScripts() in a worker.
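
A minimal sketch of that idea is shown below, assuming the fixture is small enough to embed as a numeric array literal. The function bytesToJsSource and the variable name passed to it are hypothetical; reading the binary file and writing the generated source to disk are left out.

```javascript
// Hypothetical sketch: turns raw bytes into JavaScript source text that
// declares a Uint8Array with the same contents.
function bytesToJsSource(bytes, varName) {
  const literals = Array.from(bytes, (b) => `0x${b.toString(16).padStart(2, '0')}`);
  return `const ${varName} = new Uint8Array([${literals.join(', ')}]);\n`;
}

console.log(bytesToJsSource(new Uint8Array([0x50, 0x4b, 0x03, 0x04]), 'zipFixture'));
```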
