dissectmalware / pyxlsb2
An Excel 2007+ Binary Workbook (xlsb) parser for Python
License: Apache License 2.0
Excel (.xlsb) Binary File Format (Office 2007 format; the maximum formula ID is now 0x01e4, excluding 0x005c):
https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xlsb/90a52fcb-ce63-497f-a3d3-173c42d82242
Excel Binary File Format (.xls) Structure (Office 97 format; the maximum formula ID was 0x017b):
https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/00b5dd7d-51ca-4938-b7b7-483fe0e5933b
pyxlsb2/ptags.py
xlrd2/formula.py
Hi,
There's a small bug in pyxlsb2/formula.py, in Formula.__str__.
It calls the stringify method with no arguments, although stringify expects a workbook as an argument.
As far as I can tell this code is not used, but if it were, it would crash because the required argument is missing, so I thought I would bring it to your attention.
Thanks!
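A minimal sketch of the problem described above, using hypothetical stand-in classes rather than the library's actual source: `__str__` forwards to `stringify` without the workbook that `stringify` requires, so `str()` raises a `TypeError`. Making the parameter optional is one possible fix, assumed here purely for illustration.

```python
class BrokenFormula:
    """Hypothetical stand-in mirroring the reported call pattern."""

    def __init__(self, tokens):
        self._tokens = tokens

    def __str__(self):
        # Bug: stringify() requires a workbook, but none is passed.
        return self.stringify()

    def stringify(self, workbook):
        return " ".join(self._tokens)


class FixedFormula(BrokenFormula):
    """One possible fix (an assumption): make the workbook optional."""

    def __str__(self):
        return self.stringify(None)

    def stringify(self, workbook=None):
        return " ".join(self._tokens)


def str_fails(formula):
    """Return True if str(formula) raises TypeError."""
    try:
        str(formula)
    except TypeError:
        return True
    return False
```

With these stand-ins, `str_fails(BrokenFormula(["1", "+", "2"]))` is true, while `str(FixedFormula(["1", "+", "2"]))` returns the joined tokens.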
Hello,
I am reading .xlsb files. Some files fail to open while others work.
The files that fail throw this error:
IndexError: list index out of range
from pyxlsb2 import open_workbook
xlsb_file_path = "sample_file.xlsb"
with open_workbook(xlsb_file_path) as wb:
    for sheet in wb.sheets:
        print(sheet.name)
This is what ChatGPT suggests:
The error you're encountering is an IndexError with the message "list index out of range." This typically occurs when you're trying to access an element in a list using an index that is outside the valid range of indices for that list. In this specific case, the error is happening in the pyxlsb2 library code while parsing the formula of a cell.
The relevant part of the traceback is:
  File "/home/ujwala/.venvs/ingest_venv/lib/python3.9/site-packages/pyxlsb2/formula.py", line 16, in stringify
    return '' if not tokens else tokens.pop().stringify(tokens, workbook)
IndexError: list index out of range
Here's what's happening:
The tokens list is being accessed with tokens.pop().
The pop() method removes and returns the last item from the list.
The stringify method is then called on the popped item.
However, it seems that the tokens list is empty when pop() is called, resulting in an IndexError. The IndexError occurs because there is no item to pop from an empty list.
To fix this issue:
Check if tokens is not empty before calling pop().
Update the return statement in the stringify method of the pyxlsb2/formula.py file to include a check for an empty list:
return '' if not tokens else tokens.pop().stringify(tokens, workbook)
becomes:
return '' if not tokens else tokens.pop().stringify(tokens, workbook) if tokens else ''
This modification ensures that pop() is only called when tokens is not empty.
Update pyxlsb2 library to the latest version.
Ensure that you are using the latest version of the pyxlsb2 library, as this issue may have been addressed in a newer release. You can check the library's official repository for updates.
After making these changes, try running your code again and see if the issue persists. If the problem persists or if there's a specific version of pyxlsb2 that you're using, you might want to check the library's GitHub repository for open issues or consider reporting this issue to the library maintainers.
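The proposed one-line change is a no-op: the original expression already returns `''` when `tokens` is empty, and the extra `if tokens else ''` only repeats the same check. The reported message also does not match an empty `pop()`. The standard-library sketch below (no pyxlsb2 code involved) shows that `list.pop()` on an empty list raises `IndexError: pop from empty list`, whereas `list index out of range` comes from indexing. The failure therefore more plausibly originates in an index lookup further down the nested `stringify` calls; that last point is an inference from the message, not something confirmed against the library source.

```python
def index_error_message(action):
    """Run action() and return the IndexError message it raises, or None."""
    try:
        action()
    except IndexError as exc:
        return str(exc)
    return None


empty = []

# Popping from an empty list produces a different message than the one reported.
pop_message = index_error_message(lambda: empty.pop())

# Indexing past the end of a list produces the reported message.
index_message = index_error_message(lambda: empty[0])

# The guard in the traceback line already handles an empty token list.
guarded = '' if not empty else empty.pop()
```

Posting the full traceback, including the innermost frame, would show which lookup actually fails.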
Hello,
There seems to be a version 0.0.9 on PyPI, but here on GitHub the latest version is 0.0.8.
Could you please bring the versions in sync?
Thank you
When I try to install the package with
pip install enum34 pyxlsb2
It fails:
Collecting enum34==1.1.10
Downloading enum34-1.1.10-py2-none-any.whl (11 kB)
Collecting pyxlsb2==0.0.2
Downloading pyxlsb2-0.0.2.tar.gz (31 kB)
ERROR: Command errored out with exit status 1:
command: /tmp/venv/env/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-CQMm6n/pyxlsb2/setup.py'"'"
'; __file__='"'"'/tmp/pip-install-CQMm6n/pyxlsb2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-6J3CgU
cwd: /tmp/pip-install-CQMm6n/pyxlsb2/
Complete output (13 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-CQMm6n/pyxlsb2/setup.py", line 3, in <module>
from pyxlsb2 import __version__
File "pyxlsb2/__init__.py", line 3, in <module>
from .workbook import Workbook
File "pyxlsb2/workbook.py", line 7, in <module>
from .recordreader import RecordReader
File "pyxlsb2/recordreader.py", line 4, in <module>
from . import records as recs
File "pyxlsb2/records.py", line 1, in <module>
from enum import Enum
ImportError: No module named enum
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
But when I install the two packages separately, it works:
pip install enum34
pip install pyxlsb2
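The traceback above suggests that setup.py imports the package to read `__version__`, which in turn imports `enum` before pip has installed enum34 (the log shows a Python 2 wheel). One common way to avoid such a build-time import is to parse the version string out of the file instead. The sketch below assumes the version is defined as a plain string literal; `read_version` is a hypothetical helper, not part of pyxlsb2.

```python
import io
import re


def read_version(init_path):
    """Extract __version__ from a file without importing the package."""
    with io.open(init_path, encoding="utf-8") as handle:
        source = handle.read()
    # Match a line such as: __version__ = '0.0.2'
    match = re.search(
        r"""^__version__\s*=\s*['"]([^'"]+)['"]""", source, re.MULTILINE
    )
    if match is None:
        raise RuntimeError("__version__ not found in " + init_path)
    return match.group(1)
```

In a setup.py, a call such as `read_version("pyxlsb2/__init__.py")` could then stand in for `from pyxlsb2 import __version__`, assuming that is where the version is defined.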
Add support for functions with IDs above 0x17B
Example: RANDBETWEEN in 38e01ea82f15a2dcd6905daf98e2f51886e1611ccc0dfc0e76a933b0b6db719d
Handle shared string (BrtCellIsst)
38e01ea82f15a2dcd6905daf98e2f51886e1611ccc0dfc0e76a933b0b6db719d
Hi!
I was wondering whether this project could be used with pandas instead of the original pyxlsb. However, they are rather strict about licensing.
The original pyxlsb was released under LGPLv3 (which was fine). It seems derived work can't be licensed under Apache 2.0 according to this. You could consider relicensing pyxlsb2 under LGPLv3 so that it would be license-compliant and usable.
Hello,
I want to use pyxlsb2 to read .xlsb files. Some files fail to open while others work.
Opening them with pyxlsb works; with pyxlsb2 it does not.
I have two files:
Source code:
from pyxlsb2 import open_workbook
with open_workbook("y.xlsb") as wb:
    for sheet in wb.sheets:
        print(sheet)
I can't find any difference between the files. Both are .xlsb files. I need to check whether the files contain hidden worksheets. As far as I know, this is only possible with pyxlsb2, not with pyxlsb.
Best regards
Patrick