onsbigdata / parsing_company_accounts
Reading digital XBRL/iXBRL account documents - for sharing
Hi @martinONS,
Great code, works like a charm.
You haven't specified a licence for your code (see https://help.github.com/en/github/creating-cloning-and-archiving-repositories/licensing-a-repository), so I am not sure what you allow and what would be a breach of your copyright. I want to write a blog post explaining how to get financial data from Companies House and do some analysis on it. I will link to your GitHub page if you allow the use of your code.
You can contact me at smellofroses2@gmail.***
Anna
Hi,
I think there is a bug in the distribution: the authors forgot to include the README.md file in the distribution package.
I tried to install xbrl_parser but ran into an issue during installation. Running pip install xbrl_parser or pip3 install xbrl_parser fails with FileNotFoundError: [Errno 2] No such file or directory: 'README.md'.
Full text of the error:
Collecting xbrl_parser
  Downloading https://files.pythonhosted.org/packages/9c/72/f8b6d58dfe085a8e9f2b6bf05795f9deb071372d476eb2100f6c0355d803/xbrl_parser-0.1.tar.gz
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-g9hjy3nu/xbrl-parser/setup.py'"'"'; __file__='"'"'/tmp/pip-install-g9hjy3nu/xbrl-parser/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-g9hjy3nu/xbrl-parser/pip-egg-info
         cwd: /tmp/pip-install-g9hjy3nu/xbrl-parser/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-g9hjy3nu/xbrl-parser/setup.py", line 8, in <module>
        long_description=open('README.md').read(),
    FileNotFoundError: [Errno 2] No such file or directory: 'README.md'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Please update the distribution package.
This bug was discussed and identified on Stack Overflow: https://stackoverflow.com/questions/59802483/how-do-i-install-xbrl-parser-on-the-server-from-https-github-com-onsbigdata-pa/59802591#59802591
I managed to get around this issue by manually copying xbrl_parser.py into my directory.
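For reference, the traceback points at line 8 of setup.py, which calls open('README.md').read() unconditionally. Shipping README.md in the source distribution (for example by listing it in MANIFEST.in) would address the root cause. As a complementary guard, here is a minimal sketch of a helper that falls back to a default when the file is absent. The name read_long_description and its parameters are hypothetical and not part of the project.

```python
from pathlib import Path


def read_long_description(path="README.md", default=""):
    """Return the README text, or `default` if the file is missing.

    Hypothetical helper: it only illustrates guarding the file read
    that fails in the traceback above.
    """
    readme = Path(path)
    if readme.is_file():
        return readme.read_text(encoding="utf-8")
    return default
```

In setup.py this could replace the direct call, for example long_description=read_long_description(), so that installation does not abort when README.md is missing from the archive.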
Thanks,
Anna
I tried running the extractPDF data Jupyter notebook, but it raised a KeyError.
Converting PDF image to multiple png files
./example_data_PDF/00053475.pdf
Performing pre-processing on all png images
Traceback (most recent call last):
  File "<ipython-input-31-0d43203f9a14>", line 1, in <module>
    results = xip.process_PDF("./example_data_PDF/00053475.pdf")
  File "C:\Users\My_Name\Documents\Python_Scripts\Urls_to_comps\DataCity\parsing_company_accounts\xbrl_image_parser.py", line 384, in process_PDF
    data = make_measurements(data)
  File "C:\Users\My_Name\Documents\Python_Scripts\Urls_to_comps\DataCity\parsing_company_accounts\xbrl_image_parser.py", line 141, in make_measurements
    data['centre_x'] = data['left'] + ( data['width'] / 2. )
  File "C:\Users\My_Name\Anaconda3\envs\py37\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\My_Name\Anaconda3\envs\py37\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'left'
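The KeyError means the DataFrame passed to make_measurements has no 'left' column. The traceback does not show why. One way to narrow it down is to check the column names before the arithmetic and report exactly what is missing. Below is a minimal sketch of such a check. The helper names and the REQUIRED_COLUMNS tuple are hypothetical: only 'left' and 'width' are confirmed by the traceback, and neither function exists in xbrl_image_parser.py.

```python
# Columns used on the failing line of make_measurements (from the traceback).
REQUIRED_COLUMNS = ("left", "width")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


def check_columns(columns, required=REQUIRED_COLUMNS):
    """Raise a descriptive ValueError if any required column is absent."""
    missing = missing_columns(columns, required)
    if missing:
        raise ValueError(
            f"missing expected columns {missing}; found {list(columns)}"
        )
```

Calling check_columns(data.columns) just before make_measurements(data) would show which columns the frame actually holds, which helps tell an empty result from the earlier processing steps apart from a renamed column.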