kanedata / ixbrl-parse

A python library for getting useful data out of ixbrl files.

Home Page: https://ixbrl-parse.readthedocs.io/

License: MIT License

Python 15.33% HTML 84.67%
finance python python37 xbrl

ixbrl-parse's People

Contributors

adobrinevski, ajmarks, avyfain, dependabot[bot], drkane, vin0110, wcollinscw

ixbrl-parse's Issues

Additional numeric metadata

Thank you for sharing this excellent library @drkane! I'm currently using it to extract numeric data from Companies House reports, and I'm assessing how feasible it would be to determine which table in a document each numeric object belongs to. Looking at the XBRL format, it seems possible to tell which page a numeric object belongs to, but tables could be trickier because the table HTML element seems to be used for non-numeric content as well. Is this a feature you have considered?

'NoneType' object has no attribute 'get'

05174546.zip

Hi DrKane,

I've zipped an example account because it happened again. What is quite weird is that sometimes when I pull it from Companies House there is no issue. But other times I get this file, which does look different when you inspect it.

Do you think the file is just corrupted on output?

ValueError: could not convert string to float:

Great library! I need help with the following error.
Executing the following code raises a ValueError:

import os
from ixbrlparse import IXBRL

# filelist and this_path are set elsewhere in the script (not shown)
for filename in reversed(filelist):
    print("filename", filename)
    this_file = {}
    thisfilename = os.path.join(this_path, filename)
    with open(thisfilename, encoding="utf8") as a:
        x = IXBRL(a)
        print(x.contexts)
        print(x.nonnumeric)
        print(x.numeric)

The full traceback is below:

Traceback (most recent call last):
  File "C:/..-ixbrl.py", line 19, in <module>
    x = IXBRL(a)
  File "C:\..\ixbrlparse\core.py", line 12, in __init__
    self._get_numeric()
  File "C:\..\ixbrlparse\core.py", line 67, in _get_numeric
    }) for s in self.soup.find_all({'nonFraction'})]
  File "C:\\..\ixbrlparse\core.py", line 67, in <listcomp>
    }) for s in self.soup.find_all({'nonFraction'})]
  File "C:\..\ixbrlparse\core.py", line 200, in __init__
    self._parse_value()
  File "C:\..\ixbrlparse\core.py", line 212, in _parse_value
    self.value = float(self.value)
ValueError: could not convert string to float: 
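
The traceback ends with nothing after the colon, which suggests that float() received an empty string. Below is a minimal sketch of a more tolerant conversion, assuming that empty or dash-only text should count as a missing value. The helper name to_float_or_none is hypothetical and is not part of ixbrlparse.

```python
from typing import Optional


def to_float_or_none(text: str) -> Optional[float]:
    """Convert element text to a float, returning None for blank values."""
    # Strip surrounding whitespace and comma thousands separators first.
    cleaned = text.strip().replace(",", "")
    # Assumption: an empty string or a lone dash means "no value reported".
    if cleaned in ("", "-"):
        return None
    return float(cleaned)


print(to_float_or_none("1,234.5"))
print(to_float_or_none(""))
```

Whether a blank fact should become None, zero, or an error is a design decision for the library; this only shows where the check could sit.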

NotImplementedError: Format "fixedzero" not implemented (namespace "ixt")

Hello
I tried to parse the following file (an official file from Renault Group) and got the following error:

Traceback (most recent call last):
  File "C:\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python39\lib\site-packages\ixbrlparse\__main__.py", line 73, in <module>
    main()
  File "C:\Python39\lib\site-packages\ixbrlparse\__main__.py", line 53, in main
    x = IXBRL(args.infile)
  File "C:\Python39\lib\site-packages\ixbrlparse\core.py", line 194, in __init__
    self.parser._get_numeric()
  File "C:\Python39\lib\site-packages\ixbrlparse\core.py", line 110, in _get_numeric
    self.numeric.append(ixbrlNumeric(element))
  File "C:\Python39\lib\site-packages\ixbrlparse\components\numeric.py", line 36, in __init__
    self.format = get_format(format_["format_"])(**format_)
  File "C:\Python39\lib\site-packages\ixbrlparse\components\transform.py", line 106, in get_format
    raise NotImplementedError(
NotImplementedError: Format "fixedzero" not implemented (namespace "ixt")

I used this command line:

python -m ixbrlparse 969500F7JLTX36OUI695-2021-12-31.xhtml

The document is the Renault Group file available on this website: https://www.renaultgroup.com/finance/information-reglementee/

Change to the Readme

In your README, I think you should change

with open('sample_ixbrl.html') as a:
  x = IXBRL(a)

to

with open(filename, encoding="utf8") as a:
  x = IXBRL(a)

This will make sure the file is opened with a standard encoding if one is not stated in the file.

Unconverted Data remains

Hi Dr Kane,

I've been continuing to test different runs and found this error.
It seems to be related to when the program tries to pull the date out of a specific kind of account: "Unaudited Financial Statements".

It's raised here:
core.py: "enddate": s.find(['xbrli:endDate', 'endDate']).text if s.find(['xbrli:endDate', 'endDate']) else None

Then to:
context.py: startdate, "%Y-%m-%d").date() if startdate else None

Ending in:
_strptime.py: tt, fraction, gmtoff_fraction = _strptime(data_string, format)
    if len(data_string) != found.end():
        raise ValueError("unconverted data remains: %s" %
                         data_string[found.end():])
Unconverted-Data.zip

Hope you've been well.

Add continuation element

As described in the docs and the spec

Continuation happens outside of the original nonNumeric element, so would need to be done before the value is formatted.

Example:

<ix:nonNumeric contextRef="C" continuedAt="C_BP_0" name="accrep:AccountantsReportOnFinancialStatements">
    <span class="fn1">This report is made solely to the </span>
</ix:nonNumeric>
<ix:continuation continuedAt="C_BP_1" id="C_BP_0">
    <span class="fn1">b</span>
</ix:continuation>
<ix:continuation continuedAt="C_BP_2" id="C_BP_1">
    <span class="fn1">oard of </span>
</ix:continuation>
<ix:continuation continuedAt="C_BP_3" id="C_BP_2">
    <span class="fn1">d</span>
</ix:continuation>

The output would look like "This report is made solely to the board of d"
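
The chain-following step can be sketched with the standard library alone. This is an illustration under stated assumptions, not how ixbrlparse does it: the example is wrapped in a div that declares the Inline XBRL 1.1 namespace, and joined_text is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

IX = "http://www.xbrl.org/2013/inlineXBRL"  # Inline XBRL 1.1 namespace

# The example from this issue, wrapped in a root that declares the ix prefix.
SAMPLE = f"""<div xmlns:ix="{IX}">
<ix:nonNumeric contextRef="C" continuedAt="C_BP_0" name="accrep:AccountantsReportOnFinancialStatements"><span class="fn1">This report is made solely to the </span></ix:nonNumeric>
<ix:continuation continuedAt="C_BP_1" id="C_BP_0"><span class="fn1">b</span></ix:continuation>
<ix:continuation continuedAt="C_BP_2" id="C_BP_1"><span class="fn1">oard of </span></ix:continuation>
<ix:continuation continuedAt="C_BP_3" id="C_BP_2"><span class="fn1">d</span></ix:continuation>
</div>"""


def joined_text(root, element):
    """Follow the continuedAt chain from element and join the text fragments."""
    # Index every ix:continuation by id so each hop is a dictionary lookup.
    by_id = {c.get("id"): c for c in root.iter(f"{{{IX}}}continuation")}
    parts, seen, current = [], set(), element
    while current is not None:
        parts.append("".join(current.itertext()))
        next_id = current.get("continuedAt")
        if next_id is None or next_id in seen:
            break  # end of the chain, or a cycle in a malformed document
        seen.add(next_id)
        current = by_id.get(next_id)  # None when the target is missing
    return "".join(parts)


root = ET.fromstring(SAMPLE)
first = root.find(f"{{{IX}}}nonNumeric")
print(joined_text(root, first))
```

The last continuation points at C_BP_3, which is not in the excerpt, so the loop simply stops there. Joining the raw text before any format is applied matches the ordering described above.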

NotImplementedError: Format "numwordsen" not implemented (namespace "ixt-sec")

Windows, Anaconda Python 3.8.5.

Code to reproduce:

import io
import requests
from ixbrlparse import IXBRL

url = 'https://www.sec.gov/Archives/edgar/data/72333/000007233320000195/jwn-20200801.htm'
r = requests.get(url)
xbrl = IXBRL(io.StringIO(r.text))

Stack trace:

Traceback (most recent call last):
  File "<input>", line 7, in <module>
  File "C:\Users\AMarks\Anaconda3\envs\datascience\lib\site-packages\ixbrlparse\core.py", line 16, in __init__
    self._get_numeric()
  File "C:\Users\AMarks\Anaconda3\envs\datascience\lib\site-packages\ixbrlparse\core.py", line 97, in _get_numeric
    ixbrlNumeric(element)
  File "C:\Users\AMarks\Anaconda3\envs\datascience\lib\site-packages\ixbrlparse\components\numeric.py", line 34, in __init__
    self.format = get_format(format_['format_'])(**format_)
  File "C:\Users\AMarks\Anaconda3\envs\datascience\lib\site-packages\ixbrlparse\components\transform.py", line 91, in get_format
    raise NotImplementedError(
NotImplementedError: Format "numwordsen" not implemented (namespace "ixt-sec")

Issue with some dates not parsing properly

Hi,

I had an issue with dates not parsing properly on some accounts. The accounts in question had dates in formats such as '31/03/2023'. The error looks something like "time data '31/12/2022' does not match format '%d.%m.%y'".

All the errors seemed to revolve around the format "ixt2:datedaymonthyear". An example of accounts that throw the error: THOS.S.PENNY,LIMITED, company number 00093876.

As a temporary fix, I amended the parse_value method of the ixtDateFormat class in formats.py to the following:

def parse_value(self, value: Union[str, int, float]) -> Optional[datetime.date]:
    if isinstance(value, str):
        value = value.lower()
        # remove ordinal suffixes with regex
        value = DATE_ORDINAL_SUFFIX_REGEX.sub(r"\1", value)
        date_formats = self._get_date_formats()
        error: Optional[Exception] = None
        for date_format in date_formats:
            try:
                return datetime.datetime.strptime(value, date_format).astimezone().date()
            except ValueError as e:
                error = e
                continue
        # if we get here, we couldn't parse the date. Raise the last error
        if error:  # pragma: no cover
            try:
                return datetime.datetime.strptime(value, "%d/%m/%Y").astimezone().date()
            except ValueError as e:
                error = e
                print(value)
            raise error
    msg = f"Could not parse value {value} as a date"
    warnings.warn(msg, stacklevel=2)
    return None

Basically, I just added one extra step: if all the other formats fail, try parsing the date value as "%d/%m/%Y" before raising an error.
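
The same idea can be written as a single ordered list of candidate formats, so the slash variant is an ordinary entry in the list instead of a special case in the error path. This is a standalone sketch: the format list and the function name parse_day_month_year are assumptions for illustration, not the library's actual definitions.

```python
import datetime
from typing import Sequence

# Assumed candidate formats, tried in order; the slash variants are the addition.
DATE_FORMATS: Sequence[str] = ("%d.%m.%y", "%d.%m.%Y", "%d/%m/%y", "%d/%m/%Y")


def parse_day_month_year(
    value: str, formats: Sequence[str] = DATE_FORMATS
) -> datetime.date:
    """Return the date from the first format that matches value."""
    last_error = None
    for fmt in formats:
        try:
            return datetime.datetime.strptime(value.strip(), fmt).date()
        except ValueError as e:
            last_error = e  # remember the failure and try the next format
    raise ValueError(f"Could not parse {value!r} as a date") from last_error


print(parse_day_month_year("31/12/2022"))
print(parse_day_month_year("31.12.22"))
```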

AttributeError: 'NoneType' object has no attribute 'get' - Solution

Hi David,

I found an error on line 49 of core.py: "AttributeError: 'NoneType' object has no attribute 'get'".
The code below does return the correct schema; I just haven't figured out how to integrate it into core.py.

Hope you're doing well considering all the coronavirus stuff,

from bs4 import BeautifulSoup

with open(filename, "r") as f:
    soup = BeautifulSoup(f, 'html.parser')
# look for the references element with or without the ix: prefix
resources = soup.find(['ix:references', 'references'])
#print(resources)
for s in resources.find_all(['link:schemaRef', 'schemaRef', 'schemaref', 'link:schemaref']):
    x = s.get('xlink:href')
    print(x)
ErrorFolder.zip

Parser fails on incorrect-formatted date objects

Hello,
I have some documents that contain dates that are outside of the range allowed by datetime.datetime.strptime, e.g.

<xbrli:period>
    <xbrli:startDate>0001-01-01</xbrli:startDate>
    <xbrli:endDate>0001-01-01</xbrli:endDate>
</xbrli:period>

This fails when parsing with the error ValueError: year(0) out of range. The rest of my document seems valid, so would it be possible to emit this as a warning when raise_on_error=False?

I have also seen another unrelated date issue when parsing other date strings

[09/26/23 14:23:00] WARNING  ****/site-packages/ixbrlparse/core.py:160: UserWarning: Format ixt:dateslasheu not implemented - value '04/09/2023' not parsed  warnings.py:109
  ixbrlNonNumeric(

Which suggests to me that the format isn't being inferred here. I'm not too familiar with iXBRL, but is there a way to specify the format for date strings, so that these can be correctly parsed?

Handle accounts passed as a string

In my use of this helpful library I have found it useful to pass strings directly to IXBRL instead of a file handle. In my case this was for handling files loaded from a URL using requests. I wonder if this is something which could be useful more widely?

My quick way of achieving this was to modify the __init__ function of the IXBRL class as below; however, this may well not be the best way for general usage.

def __init__(self, f=None, content=None):
    if f:
        self.soup = BeautifulSoup(f.read(), "xml")
    elif content:
        self.soup = BeautifulSoup(content, "xml")
    else:
        raise AttributeError
    self._get_schema()
    self._get_contexts()
    self._get_units()
    self._get_nonnumeric()
    self._get_numeric()
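
An alternative that leaves the class untouched is to wrap the string in a file-like object before passing it in, as the io.StringIO call in the numwordsen report does. A minimal sketch follows; as_file_like is a hypothetical helper name and is not part of ixbrlparse.

```python
import io


def as_file_like(source):
    """Return an object with a read() method for a str, bytes, or open file."""
    if isinstance(source, str):
        return io.StringIO(source)
    if isinstance(source, bytes):
        return io.BytesIO(source)
    if hasattr(source, "read"):
        return source  # already file-like, pass it through unchanged
    raise TypeError(f"Unsupported input type: {type(source).__name__}")


print(as_file_like("<html></html>").read())
```

With this, IXBRL(as_file_like(r.text)) would behave like passing io.StringIO(r.text) directly, assuming IXBRL only needs an object with a read() method.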

NotImplementedError: Format "num-dot-decimal" not implemented (namespace "ixt")

Hi,

This is really a great package. I tried to parse some iXBRL myself, but seeing all your work it becomes clear that this is quite some work. So thanks for the package!

I have an issue. I am trying to parse an iXBRL annual report from GLEIF: https://www.gleif.org/content/1-about/10-governance/11-annual-report/gleif-annual-report-2019.zip

When I try to parse it, I get the error below. I tried to dig into the source files, but I could not locate the actual problem or find out how to solve the error.

Would be glad to hear back from you, Cheers!

  File "...\python\gleif_try.py", line 8, in <module>
    x = IXBRL(a)
  File "C:\Python3\lib\site-packages\ixbrlparse\core.py", line 16, in __init__
    self._get_numeric()
  File "C:\Python3\lib\site-packages\ixbrlparse\core.py", line 97, in _get_numeric
    ixbrlNumeric(element)
  File "C:\Python3\lib\site-packages\ixbrlparse\components\numeric.py", line 34, in __init__
    self.format = get_format(format_['format_'])(**format_)
  File "C:\Python3\lib\site-packages\ixbrlparse\components\transform.py", line 92, in get_format
    namespace,
NotImplementedError: Format "num-dot-decimal" not implemented (namespace "ixt")
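
For anyone who needs to work around this before the format is supported, here is a rough sketch of what a parser for it could look like. It assumes that num-dot-decimal means digits with optional grouping characters and a dot as the decimal separator. The function name parse_num_dot_decimal is hypothetical, and this is not the library's transform code.

```python
import re
from decimal import Decimal


def parse_num_dot_decimal(text: str) -> Decimal:
    """Parse text such as '1,234,567.89', treating '.' as the decimal point."""
    # Drop everything except digits and dots (commas, spaces, and so on).
    cleaned = re.sub(r"[^0-9.]", "", text)
    if cleaned.count(".") > 1 or not re.search(r"[0-9]", cleaned):
        raise ValueError(f"Not a dot-decimal number: {text!r}")
    return Decimal(cleaned)


print(parse_num_dot_decimal("1,234,567.89"))
```

Sign and scale are carried by attributes on the element in iXBRL, so this sketch deliberately leaves them out.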

'NoneType' object has no attribute 'attrs'

Hi there, I've been trying to fix the issue. When I ran some tests I came across this error:
AttributeError: 'NoneType' object has no attribute 'attrs'

in the following section of ixbrl-parse-master\ixbrlparse\core.py (lines 22 to 25):
for k in self.soup.find('html').attrs:
    if k.startswith("xmlns") or ":" in k:
        self.namespaces[k] = self.soup.find('html')[k].split(" ")

I tried using a try-except to avoid the NoneType exception, but that hasn't worked.

I'll let you know if I find a fix.

Create package on pypi

For those not that advanced in Python, a two-line explanation of how to use the setup.py file would save a couple of hours wasted fumbling through the docs to work out how to do that.
