sheetjs / js-cfb Goto Github PK

66.0 12.0 15.0 573 KB

💾 OLE File Container Format

Home Page: https://sheetjs.com/cfb-editor

License: Apache License 2.0

JavaScript 95.68% Shell 0.57% Makefile 1.24% HTML 1.91% TypeScript 0.61%

cfb xls biff file-format storage mhtml

js-cfb's Introduction

Important

Thank you Clippy!

But our Sheet is in another Workbook!

The new source repository URL is https://git.sheetjs.com/sheetjs/sheetjs. SheetJS CE remains truly open source under the Apache 2.0 License.

Issues should be raised at https://git.sheetjs.com/sheetjs/sheetjs/issues. Users can register directly or sign in with a valid GitHub account. Issues can also be raised at https://sheetjs.com/chat.

Documentation is available at https://docs.sheetjs.com.

Scripts and NodeJS modules are available at https://cdn.sheetjs.com.

The master branch of the SheetJS/sheetjs repository on GitHub includes all commits through 515d1c6f2e1d3ca422ee9198b177cfd926434936.

The SheetJS Community Edition offers battle-tested open-source solutions for extracting useful data from almost any complex spreadsheet and generating new spreadsheets that will work with legacy and modern software alike.

SheetJS Pro offers solutions beyond data processing: Edit complex templates with ease; let out your inner Picasso with styling; make custom sheets with images/graphs/PivotTables; evaluate formula expressions and port calculations to web apps; automate common spreadsheet tasks, and much more!

Note

💼 We're Hiring!

SheetJS is looking for US-based software developers to expand this project and related software libraries and tools. See https://sheetjs.com/careers for more info.

Resources

License

Please consult the attached LICENSE file for details. All rights not explicitly granted by the Apache 2.0 License are reserved by the Original Author.

js-cfb's People

steveyen jokerslab watmough rossj kangkang721 yang123vc aeppic albanm yuryalkevich garrettluu stof isabella232 shearer12345 jimyzzp rinzler17

js-cfb's Issues

stream names should be limited to 32 UTF-16 code points, including the terminating null character

Thanks for js-cfb, and for the changes to truncate stream names introduced in 0e33eb6.

The MS-CFB spec says

storage and stream names are limited to 32 UTF-16 code points, including the terminating null character.

Currently, cfb.js truncates stream name to 32 characters, but as the name has to be null terminated, it should be truncated to 31, allowing WriteShift to pad the rest with 0.

So, in cfb.js#894

		if(_nm.length > 32) {
			console.error("Name " + _nm + " will be truncated to " + _nm.slice(0,32));
			_nm = _nm.slice(0, 32);
		}

all the 32s should be 31.
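A minimal sketch of the proposed limit, assuming the name is an ordinary JavaScript string (whose length counts UTF-16 code units). The helper name `truncateEntryName` and the surrogate check are illustrative additions, not part of cfb.js:

```javascript
// Hypothetical helper, not part of cfb.js: clamp a storage/stream name so
// that the name plus its terminating null fits in 32 UTF-16 code units.
var MAX_NAME_UNITS = 31; // 32 minus one slot for the null terminator

function truncateEntryName(name) {
	if (name.length <= MAX_NAME_UNITS) return name;
	var cut = name.slice(0, MAX_NAME_UNITS);
	// Extra safety (an assumption, not in the issue): do not leave a lone
	// high surrogate at the end of the truncated name.
	var last = cut.charCodeAt(cut.length - 1);
	if (last >= 0xD800 && last <= 0xDBFF) cut = cut.slice(0, -1);
	return cut;
}

console.log(truncateEntryName("x".repeat(40)).length); // 31
```

With 31 units kept, the writer still has one unit left to fill with the null terminator.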

In my testing this doesn't break any of my tools, but some throw warnings:

Python compoundfiles - warns missing NULL terminator in name
P7Zip - no warnings, displays the character that should be null
olebrowse - no warnings, doesn't display the character that should be null
olefile - detects a non-fatal parsing issue: "incorrect DirEntry name length >64 bytes"

I can submit a PR for this if you'd like

TypeError: file.slice is not a function

.../node_modules/cfb/cfb.js:370
var blob = file.slice(0,512);
                ^

TypeError: file.slice is not a function
    at parse (.../node_modules/cfb/cfb.js:370:17)
    at Object.read (.../node_modules/cfb/cfb.js:688:9)

My code is this:

const cfb = CFB.read(file, { type: 'file' })
// const cfb = CFB.parse(data)
const vbaDirEntry = CFB.find(cfb, 'VBA')
if (!vbaDirEntry) {
	throw new Error('VBA root not found')
}

const vbaDir = CFB.read(cfb, vbaDirEntry)
const modules = {}
for (const entry of vbaDir.FullPaths) {
	console.log(entry)
}
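One plausible reading of the trace (an inference, not confirmed in this thread) is that the second CFB.read call receives the already-parsed container instead of file bytes, so there is nothing to slice. If the goal is only to list what sits under a storage, the parsed container's FullPaths array can be filtered directly. The sketch below assumes the container exposes parallel FullPaths / FileIndex arrays; `listPathsUnder` and the mock object are hypothetical:

```javascript
// Hypothetical helper: list the paths stored beneath a named storage.
// `container` is assumed to expose a FullPaths array of "/"-joined paths.
function listPathsUnder(container, storageName) {
	var needle = "/" + storageName + "/";
	return container.FullPaths.filter(function(p) {
		var at = p.indexOf(needle);
		// Keep paths that continue past the storage's own trailing slash
		return at !== -1 && p.length > at + needle.length;
	});
}

// Hand-built stand-in for a parsed container (not real file data)
var mock = {
	FullPaths: [
		"Root Entry/",
		"Root Entry/VBA/",
		"Root Entry/VBA/dir",
		"Root Entry/VBA/Module1",
		"Root Entry/WordDocument"
	],
	FileIndex: []
};

console.log(listPathsUnder(mock, "VBA"));
```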

build fail when including cfb 1.0.3

I'm trying to import and use xlsx, which depends on cfb. When I build my Angular 5 project I get the following error:

ERROR in node_modules/cfb/types/index.d.ts(38,24): error TS2304: Cannot find name 'Buffer'.
src/app/components/purchase-order/register/register.component.ts(98,12): error TS2304: Cannot find name 'Buffer'.

I have no problem resolving this base type in my main project but it fails inside of cfb. Am I using an incompatible module or is this possibly a bug in cfb? My package.json file contents:

{
  "name": "venus-app-client",
  "version": "0.0.0",
  "license": "MIT",
  "scripts": {
    "ng": "ng",
    "start": "ng serve --proxy-config proxy-config.json",
    "build": "ng build",
    "test": "ng test",
    "lint": "ng lint",
    "e2e": "ng e2e"
  },
  "private": true,
  "dependencies": {
    "@angular/animations": "^5.2.3",
    "@angular/cdk": "^5.1.1",
    "@angular/common": "^5.2.3",
    "@angular/compiler": "^5.2.3",
    "@angular/core": "^5.2.3",
    "@angular/forms": "^5.2.3",
    "@angular/http": "^5.2.3",
    "@angular/platform-browser": "^5.2.3",
    "@angular/platform-browser-dynamic": "^5.2.3",
    "@angular/router": "^5.2.3",
    "@ngx-translate/core": "^9.1.1",
    "@types/lodash": "^4.14.100",
    "cfb": "^1.0.3",
    "codelyzer": "^4.1.0",
    "commander": "^2.14.1",
    "core-js": "^2.5.3",
    "font-awesome": "^4.7.0",
    "hammerjs": "^2.0.8",
    "lodash": "^4.17.4",
    "primeng": "^5.2.0",
    "printj": "^1.1.1",
    "rxjs": "^5.5.6",
    "typescript": "~2.4.2",
    "xlsx": "^0.12.1",
    "zone.js": "^0.8.20"
  },
  "devDependencies": {
    "@angular/cli": "^1.6.7",
    "@angular/compiler-cli": "^5.2.3",
    "@angular/language-service": "^5.2.3",
    "@types/jasmine": "^2.8.6",
    "@types/jasminewd2": "^2.0.2",
    "@types/node": "^6.0.96",
    "jasmine-core": "^2.9.1",
    "jasmine-spec-reporter": "^4.2.1",
    "karma": "^1.7.1",
    "karma-chrome-launcher": "^2.2.0",
    "karma-cli": "^1.0.1",
    "karma-coverage-istanbul-reporter": "^1.4.1",
    "karma-jasmine": "^1.1.1",
    "karma-jasmine-html-reporter": "^0.2.2",
    "protractor": "^5.3.0",
    "ts-node": "^3.3.0",
    "tslint": "^5.9.1"
  }
}

cfb_add and write performance issues

Hi there,

I'm working on a program which converts .pst files to .msg files, primarily in Node but also in the browser, and it uses this library in a very write-heavy way for saving the .msg files. Through testing and profiling, I've noticed a couple write-related performance issues that I wanted to share.

With some modifications, I've been able to reduce the output generation time of my primary "large" test case (4300 .msg files from 1 .pst) by a factor of 8 from about 16 minutes to 2 minutes (running on Node).

The 1st issue, which may just be a matter of documentation, is that using cfb_add repeatedly to add all streams to a new doc is very slow, as it calls cfb_gc and cfb_rebuild every time. We switched from using cfb_add to directly pushing to cfb.FileIndex and cfb.FullPaths (and then calling cfb_rebuild once at the end) which reduced the output time from 16 minutes to 3.5 minutes.
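The batching idea can be sketched as follows. This is only an illustration under stated assumptions: the entry fields (name, type, content, size), the helper `addEntriesInBulk`, and the `rebuild` callback are placeholders for whatever the library actually expects, and type 2 is assumed to mark a stream:

```javascript
// Sketch: append many entries to the parallel arrays, then rebuild once.
// `rebuild` stands in for the library's rebuild routine (an assumption).
function addEntriesInBulk(container, entries, rebuild) {
	entries.forEach(function(e) {
		container.FullPaths.push(e.path);
		container.FileIndex.push({
			name: e.name,
			type: 2, // assumed: 2 marks a stream entry
			content: e.content,
			size: e.content.length
		});
	});
	rebuild(container); // a single rebuild for the whole batch
}

// Usage with a counting stub in place of a real rebuild
var rebuildCalls = 0;
var doc = { FullPaths: ["Root Entry/"], FileIndex: [{ name: "Root Entry", type: 5 }] };
addEntriesInBulk(doc, [
	{ path: "Root Entry/a", name: "a", content: [1, 2] },
	{ path: "Root Entry/b", name: "b", content: [3] }
], function() { rebuildCalls++; });
console.log(doc.FullPaths.length, rebuildCalls);
```

The cost of the rebuild is then paid once per document rather than once per stream.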

The 2nd issue is that the _write and WriteShift functions do not use Buffer capabilities when they are available. By using Buffer.alloc() for the initial creation, which guarantees a 0-filled initialization, along with Buffer.copy for content streams, Buffer.write for hex / utf16le strings, and Buffer's various write int / uint methods, we were able to further reduce the output time from 3.5 minutes to 2 minutes.
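As a rough illustration of those Buffer calls (not the reporter's actual changes), the sketch below serializes one directory-entry name field in Node. The 64-byte UTF-16LE name followed by a 2-byte length follows the MS-CFB directory entry layout; the helper name `encodeNameField` is hypothetical:

```javascript
// Sketch: serialize a directory-entry name field with Buffer primitives.
// Layout assumed from MS-CFB: 64 bytes of UTF-16LE name, then a 2-byte
// length in bytes that counts the terminating null.
function encodeNameField(name) {
	var out = Buffer.alloc(66); // zero-filled, so padding needs no extra writes
	// Write at most 62 bytes so the last two name bytes stay null
	var written = out.write(name, 0, 62, "utf16le");
	out.writeUInt16LE(written + 2, 64); // byte length including the terminator
	return out;
}

var field = encodeNameField("Root Entry");
console.log(field.readUInt16LE(64)); // 22
```

Content streams could be placed the same way with `source.copy(target, offset)`, avoiding a byte-by-byte loop.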

If you wish, I would be happy to share my changes, or to work on a pull request which uses Buffer functions when available. My current changes don't do any feature detection, and rather just rely on Buffer being available, as even in the browser we use feross/buffer, so it would need some more work to maintain functionality in non-Buffer environments.

Thanks

process out of memory Exception

When xlsjs converts a big Excel file (.xls) via readFile(), Node fails with the error below:

FATAL ERROR: JS Allocation failed - process out of memory

Can you help me? How can I solve it?

I tried the setting below:

$ node --max-old-space-size=8192 my-node-script.js

If anyone has a solution, please respond. :)

Problematic file giving infinite loop / OOM in make_sector_list

Hi there,

I have a .doc file that is not exiting the inner loop of make_sector_list, leading to an OOM crash. There is some issue with the file itself, as Word gives an error opening it; however, DFVIEW.EXE opens the file and will display all the contained streams without complaining.

Even if the file is bad and can't be read, hopefully there is some condition that can throw an exception rather than OOM.

Write performance issue & fix (rebuild_cfb)

Hi there,

I discovered today some unusually slow operations when trying to create a rather large .msg file with roughly 63k data nodes. I tracked the slowness down to a nested loop in rebuild_cfb() that is ensuring that each stream node has a parent storage node.

I made some modifications to track the names in a JS object instead, and the rebuild_cfb() time dropped from about 1 minute to only 30 ms. I decided to use a plain object instead of a Set just to maintain maximum compatibility with old browsers.

Below are my changes with the added fullPaths object and the removed nested loop.

	// Used to track which names exist
	var fullPaths = Object.create(null);
	var data/*:Array<[string, CFBEntry]>*/ = [];
	for(i = 0; i < cfb.FullPaths.length; ++i) {
		fullPaths[cfb.FullPaths[i]] = true;
		if(cfb.FileIndex[i].type === 0) continue;
		data.push([cfb.FullPaths[i], cfb.FileIndex[i]]);
	}
	for(i = 0; i < data.length; ++i) {
		var dad = dirname(data[i][0]);
		s = fullPaths[dad];
		if(!s) {
			data.push([dad, ({
				name: filename(dad).replace("/",""),
				type: 1,
				clsid: HEADER_CLSID,
				ct: now, mt: now,
				content: null
			}/*:any*/)]);
			// Add name to set
			fullPaths[dad] = true;
		}
	}

How to update xml file in CFB file?

I need to update an XML file inside a CFB file, just like the CFB Editor does. How can I implement it?

parsing files in container.content

Hi, I'm hoping you can help me with this, for this project:
https://github.com/SuddenDevelopment/ScanWordDoc

I'm trying to extract the macro as a string. I can detect whether it's there, but the cfb file.content is a buffer. I can call toString('utf8') on that buffer, but the result is still a long way from workable: I get a bunch of unreadable characters with the macro somewhere in there. In this format I can't treat it like a string; I can't get an indexOf or regex match on anything except for an attribute in .content that is in quotes.

How can I parse the .content buffer in a Word doc file to work with it further?

I have also tried passing the resulting .content buffers to cfb.parse and cfb.read with every option I could find :)

thanks

Bad parenting / hierarchy construction when parent is after R / L sibling tree

I've noticed some files that show an incorrect folder structure with this library. I believe the dad array is not filled correctly when a node comes before its parent, and has R or L sibling nodes. I've attached a sample file that demonstrates the issue.

In the attached .cfb file, both some file 1 and some file 2 are sibling files under some folder, but js-cfb shows some file 2 as a root level file.

file.zip
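To make the expected hierarchy concrete, here is a small sketch that derives each entry's parent from child (C) and sibling (L / R) links, independent of the order in which entries appear. It is not the library's build_full_paths; the function name, the use of -1 for "no entry", and the sample entries (modeled loosely on the description above) are assumptions:

```javascript
// Sketch: derive each entry's parent storage from child (C) and sibling
// (L / R) links. -1 is used here as the "no entry" marker.
function computeParents(entries) {
	var parent = entries.map(function() { return -1; });
	var visited = entries.map(function() { return false; });
	visited[0] = true; // entry 0 is taken to be the root storage
	var pending = [{ node: entries[0].C, dad: 0 }];
	while (pending.length > 0) {
		var item = pending.pop();
		var i = item.node;
		// Skip missing links, out-of-range indices, and already-seen nodes
		if (i < 0 || i >= entries.length || visited[i]) continue;
		visited[i] = true;
		parent[i] = item.dad;
		pending.push({ node: entries[i].L, dad: item.dad }); // siblings share the parent
		pending.push({ node: entries[i].R, dad: item.dad });
		pending.push({ node: entries[i].C, dad: i });        // children hang off this entry
	}
	return parent;
}

// "some file 2" (index 1) precedes its parent "some folder" (index 2)
var sample = [
	{ name: "Root Entry",  L: -1, R: -1, C: 2 },
	{ name: "some file 2", L: -1, R: -1, C: -1 },
	{ name: "some folder", L: -1, R: -1, C: 3 },
	{ name: "some file 1", L: -1, R: 1,  C: -1 }
];
console.log(computeParents(sample)); // [ -1, 2, 0, 2 ]
```

Both files resolve to index 2 ("some folder") even though one of them is only reachable through an R link.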

Infinite loop in build_full_paths with some files

I've run into a file that enters an infinite loop with my changes in PR #6, even with your additional loop fix in commit 8d85fb6. The infinite loop doesn't happen in previous version 1.1.0.

I believe the dad tree for this file is constructed a bit incorrectly, thus it has a loop in it. I haven't looked into fixing the dad tree, but I did manage to avoid the infinite loop by slightly changing the final naming loop to:

	for(i=1; i < pl; ++i) {
		if(FI[i].type === 0 /* unknown */) continue;
		if (i !== dad[i]) {
			j = i;
			do {
				j = dad[j];
				FP[i] = FP[j] + "/" + FP[i];
			} while (j !== 0 && -1 !== dad[j] && j != dad[j]);
		}
		dad[i] = -1;
	}

I'm unable to share the file publicly, but I can email it if you would like.

XLSX broken file

/** Chase down the rest of the DIFAT chain to build a comprehensive list
    DIFAT chains by storing the next sector number as the last 32 bits */
function sleuth_fat(idx, cnt, sectors, ssz, fat_addrs) {
        var q = ENDOFCHAIN;
        if(idx === ENDOFCHAIN) {
                // if(cnt !== 0) throw new Error("DIFAT chain shorter than expected");
                if(cnt !== 0) console.log("DIFAT chain shorter than expected");
        } else if(idx !== -1 /*FREESECT*/) {
                var sector = sectors[idx], m = (ssz>>>2)-1;
                if(!sector) return;
                for(var i = 0; i < m; ++i) {
                        if((q = __readInt32LE(sector,i*4)) === ENDOFCHAIN) break;
                        fat_addrs.push(q);
                }
                sleuth_fat(__readInt32LE(sector,ssz-4),cnt - 1, sectors, ssz, fat_addrs);
        }
}

Infinite loop in get_sector_list with damaged .doc file

Hi there. I've come across a problematic .doc file that is causing an infinite loop in get_sector_list.

It looks like the 2nd half of this .doc file is all null, so it is definitely damaged & invalid, but it would be nice to avoid the infinite loop.

In this specific case, the loop starts off with j = 0, which results in the next j value being read from sectors[312], which is all null bytes due to the file corruption. This results in an infinite loop with j = 0.

I noticed that the chkd array is not being checked. Adding if (chkd[j]) break; at the top of the loop avoids the infinite loop and results in a later exception. Perhaps it's better to throw immediately inside the loop?
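A minimal sketch of the throw-immediately variant, written as a standalone chain walker rather than a patch to get_sector_list. It assumes `fat[i]` holds the index of the sector that follows sector i and that -2 marks the end of a chain; `followChain` is a hypothetical name:

```javascript
// Sketch: collect the sectors in a chain, throwing as soon as a sector
// index repeats instead of looping forever.
var ENDOFCHAIN = -2; // assumed end-of-chain marker

function followChain(start, fat) {
	var seen = Object.create(null);
	var chain = [];
	for (var j = start; j !== ENDOFCHAIN; j = fat[j]) {
		if (j < 0 || j >= fat.length) throw new Error("Sector " + j + " is out of range");
		if (seen[j]) throw new Error("Sector chain loops back to sector " + j);
		seen[j] = true;
		chain.push(j);
	}
	return chain;
}

console.log(followChain(0, [1, 2, ENDOFCHAIN])); // [ 0, 1, 2 ]

// A table whose first entry points back at itself, as in the j = 0 case above
try {
	followChain(0, [0]);
} catch (e) {
	console.log(e.message); // Sector chain loops back to sector 0
}
```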

build_full_paths does not prepend "root_entry/" if child is before parent

I came across an issue where the FullPaths array returned from a cfb.parse call does not properly prepend the root name (root_entry/) to child nodes (with depth greater than 1) that appear before their parent node.

It seems that in such a situation, build_full_paths properly constructs the dad tree, but terminates the final path construction loop too early, before it has prepended root_entry/.

I have created and attached a sample file that demonstrates the issue.

The returned FullPaths array is:

[
    "Root Entry/",
    "some folder/some child",
    "Root Entry/some folder/"
]

This is despite "some child" being a child of "some folder".
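For comparison, a sketch of path construction that walks up a parent array for every entry, so the result does not depend on whether a child is listed before its parent. This is not the library's build_full_paths; the function name, argument shapes, and the hop-count guard are assumptions:

```javascript
// Sketch: build each entry's full path by walking up a parent array.
// parent[i] === -1 marks the root; storages get a trailing "/".
function buildFullPaths(names, parent, isStorage) {
	return names.map(function(name, i) {
		var path = name, hops = 0;
		for (var j = parent[i]; j !== -1; j = parent[j]) {
			// A valid tree never needs more hops than there are entries
			if (++hops > names.length) throw new Error("Cycle in parent links at entry " + i);
			path = names[j] + "/" + path;
		}
		return isStorage[i] ? path + "/" : path;
	});
}

console.log(buildFullPaths(
	["Root Entry", "some child", "some folder"],
	[-1, 2, 0],
	[true, false, true]
));
// [ 'Root Entry/', 'Root Entry/some folder/some child', 'Root Entry/some folder/' ]
```

The hop counter also turns a malformed parent array with a loop into an exception rather than a hang.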

Here is the view from DFVIEW.EXE

path-test.zip

Weird entry names in .msi file

MSI file is also MS-CFB format.

I opened https://cmake.org/files/v3.11/cmake-3.11.1-win64-x64.msi with http://sheetjs.com/cfb-editor

The file can be opened and the content of entries seems to be correct but file names seem wrong:

Able to remove Seed file item

This might be a good item to add to allow for the file to be created without the seed.

function _write(cfb, options) {
	var _opts = options || {};
	/* MAD is order-sensitive, skip rebuild and sort */
	if(_opts.fileType == 'mad') return write_mad(cfb, _opts);
	rebuild_cfb(cfb);
	/* Added: proposed `noseed` option removes the seed entry so it is not written */
	if(_opts.noseed) cfb_del(cfb, "/\u0001Sh33tJ5");
	switch(_opts.fileType) {
		case 'zip': return write_zip(cfb, _opts);
		//case 'mad': return write_mad(cfb, _opts);
	}
	/* ... rest of _write unchanged ... */
}

fs.readFileSync is not a function

How do I use "fs" in my browser
