onsbigdata / parsing_company_accounts

Reading digital XBRL/iXBRL account documents - for sharing

Jupyter Notebook 5.60% HTML 94.23% Python 0.18%

parsing_company_accounts's People

Contributors

martinons

parsing_company_accounts's Issues

Licence clarification

Hi @martinONS,

Great code, works like a charm.

You haven't specified a licence for your code (see https://help.github.com/en/github/creating-cloning-and-archiving-repositories/licensing-a-repository), so I am not sure what you allow and what would be a breach of your copyright. I want to write a blog post explaining how to get financial data from Companies House and do some analysis on it. I will link to your GitHub page if you allow the use of your code.

You can contact me at smellofroses2@gmail.***

Anna

pip3 install xbrl_parser fails due to missing 'README.md' file

Hi,

I think there is a bug in the distribution: the file README.md was not included in the distribution package.

I tried to install xbrl_parser but ran into an issue during installation. Running either pip install xbrl_parser or pip3 install xbrl_parser fails with FileNotFoundError: [Errno 2] No such file or directory: 'README.md'.

Full text of the error:

Collecting xbrl_parser
  Downloading https://files.pythonhosted.org/packages/9c/72/f8b6d58dfe085a8e9f2b6bf05795f9deb071372d476eb2100f6c0355d803/xbrl_parser-0.1.tar.gz
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-g9hjy3nu/xbrl-parser/setup.py'"'"'; __file__='"'"'/tmp/pip-install-g9hjy3nu/xbrl-parser/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-g9hjy3nu/xbrl-parser/pip-egg-info
         cwd: /tmp/pip-install-g9hjy3nu/xbrl-parser/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-g9hjy3nu/xbrl-parser/setup.py", line 8, in <module>
        long_description=open('README.md').read(),
    FileNotFoundError: [Errno 2] No such file or directory: 'README.md'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Please update the distribution package.

This bug was discussed and identified on StackOverflow: https://stackoverflow.com/questions/59802483/how-do-i-install-xbrl-parser-on-the-server-from-https-github-com-onsbigdata-pa/59802591#59802591

I managed to get around this issue by manually copying xbrl_parser.py into my directory.

Thanks,
Anna
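
For anyone maintaining the package, here is a minimal sketch of one way to make setup.py tolerate a missing README.md. The traceback shows that line 8 of setup.py calls open('README.md').read() directly, so the install fails whenever the file is absent from the archive. The helper name read_long_description and its parameters are my own, hypothetical, and not part of xbrl_parser. The usual packaging-side fix, as far as I know, is to ship the file in the source distribution, for example with an "include README.md" line in MANIFEST.in; the sketch below only guards the read.

```python
from pathlib import Path


def read_long_description(directory, filename="README.md", fallback=""):
    """Return the text of `filename` inside `directory`.

    Hypothetical helper, not part of xbrl_parser: if the file is missing
    (as in the published xbrl_parser-0.1.tar.gz), return `fallback`
    instead of raising FileNotFoundError.
    """
    path = Path(directory) / filename
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return fallback
```

In setup.py this could replace the failing expression, for example long_description=read_long_description(Path(__file__).parent, fallback="Parser for XBRL/iXBRL accounts"), where the fallback string is only a placeholder.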

Extract PDF.IPYNB returns KeyError: 'left'

I tried running the extractPDF data Jupyter notebook, but it raised a KeyError.

Converting PDF image to multiple png files
./example_data_PDF/00053475.pdf
Performing pre-processing on all png images
Traceback (most recent call last):

  File "<ipython-input-31-0d43203f9a14>", line 1, in <module>
    results = xip.process_PDF("./example_data_PDF/00053475.pdf")

  File "C:\Users\My_Name\Documents\Python_Scripts\Urls_to_comps\DataCity\parsing_company_accounts\xbrl_image_parser.py", line 384, in process_PDF
    data = make_measurements(data)

  File "C:\Users\My_Name\Documents\Python_Scripts\Urls_to_comps\DataCity\parsing_company_accounts\xbrl_image_parser.py", line 141, in make_measurements
    data['centre_x'] = data['left'] + ( data['width'] / 2. )

  File "C:\Users\My_Name\Anaconda3\envs\py37\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)

  File "C:\Users\My_Name\Anaconda3\envs\py37\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc

  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc

  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'left'
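
The traceback shows that the table passed to make_measurements has no 'left' column, but the report does not establish why. One possibility, which is an assumption on my part, is that an earlier step produced an empty or differently shaped table. A small check before the failing line would at least turn the bare KeyError into a message listing which columns are missing and which are present. The function require_columns below is a hypothetical sketch and not part of xbrl_image_parser.py; it accepts any object with a .columns attribute, such as a pandas DataFrame.

```python
# The two columns read on the failing line of make_measurements.
REQUIRED_COLUMNS = ("left", "width")


def require_columns(data, required=REQUIRED_COLUMNS):
    """Raise a descriptive error if `data` lacks any of the `required` columns.

    Hypothetical helper: `data` is anything exposing `.columns`,
    for example a pandas DataFrame. Returns `data` unchanged on success.
    """
    missing = [name for name in required if name not in data.columns]
    if missing:
        raise ValueError(
            "input table is missing columns %s; columns present: %s"
            % (missing, list(data.columns))
        )
    return data
```

Calling require_columns(data) at the top of make_measurements, or just before it in the notebook, would show what the table actually contains at the point of failure.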
