tiesdekok / fast_xbrl_parser

An XBRL parser built in Rust that provides a fast, easy, and lightweight way to convert XBRL XML files into JSON or CSV.

Rust 85.22% Python 14.78%
csv json rust xbrl xml

fast_xbrl_parser's Issues

[BUG] Example cannot be reproduced: "dimensions" output is missing data

Let's take the link given in the official example:

import fast_xbrl_parser as fxp

url = "https://www.sec.gov/Archives/edgar/data/1326380/000132638021000129/gme-20211030_htm.xml"
xbrl_dict = fxp.parse(
    url,
    output=["json", "facts", "dimensions",],
    email="...",  ### Adjust this to reflect your email address. This is required by the SEC Edgar system when passing a URL.
)

print([
    fact
    for fact in xbrl_dict["json"]
    if fact["context_ref"] == "i12efa46789384fd39ac941da8b7b0f1a_D20200202-20201031"
])

Output:

[{'id': 'id3VybDovL2RvY3MudjEvZG9jOjUzYmQ2ZjI2MzlhMDQ4NjFiYWIwZTVhM2Q2OWFhZjUzL3NlYzo1M2JkNmYyNjM5YTA0ODYxYmFiMGU1YTNkNjlhYWY1M18xMTI0L2ZyYWc6NmJiZGRiOTdhMTExNDdlMmI5ZTc3NDk0N2Y3NGRjOWYvdGV4dHJlZ2lvbjo2YmJkZGI5N2ExMTE0N2UyYjllNzc0OTQ3Zjc0ZGM5Zl8yNDAz_bae81877-b251-4a3e-a936-b94169023ae8',
  'prefix': 'us-gaap',
  'name': 'DisposalGroupIncludingDiscontinuedOperationGeneralAndAdministrativeExpense',
  'value': '1200000',
  'decimals': '-5',
  'context_ref': 'i12efa46789384fd39ac941da8b7b0f1a_D20200202-20201031',
  'unit_ref': 'usd',
  'dimensions': [{'key_ns': 'us-gaap',
    'key_value': 'DisposalGroupClassificationAxis',
    'member_ns': 'us-gaap',
    'member_value': 'DiscontinuedOperationsDisposedOfBySaleMember'},
   {'key_ns': 'us-gaap',
    'key_value': 'IncomeStatementBalanceSheetAndAdditionalDisclosuresByDisposalGroupsIncludingDiscontinuedOperationsAxis',
    'member_ns': 'gme',
    'member_value': 'SpringMobileMember'}],
  'units': [{'unit_type': 'unit', 'unit_value': 'iso4217:USD'}],
  'periods': [{'period_type': 'startDate', 'period_value': '2020-02-02'},
   {'period_type': 'endDate', 'period_value': '2020-10-31'}]},
 {'id': 'id3VybDovL2RvY3MudjEvZG9jOjUzYmQ2ZjI2MzlhMDQ4NjFiYWIwZTVhM2Q2OWFhZjUzL3NlYzo1M2JkNmYyNjM5YTA0ODYxYmFiMGU1YTNkNjlhYWY1M18xMTI0L2ZyYWc6NmJiZGRiOTdhMTExNDdlMmI5ZTc3NDk0N2Y3NGRjOWYvdGV4dHJlZ2lvbjo2YmJkZGI5N2ExMTE0N2UyYjllNzc0OTQ3Zjc0ZGM5Zl8yNDEw_29acdbfa-37af-4a68-93de-33d588e14dc2',
  'prefix': 'us-gaap',
  'name': 'DiscontinuedOperationTaxEffectOfDiscontinuedOperation',
  'value': '-300000',
  'decimals': '-5',
  'context_ref': 'i12efa46789384fd39ac941da8b7b0f1a_D20200202-20201031',
  'unit_ref': 'usd',
  'dimensions': [{'key_ns': 'us-gaap',
    'key_value': 'DisposalGroupClassificationAxis',
    'member_ns': 'us-gaap',
    'member_value': 'DiscontinuedOperationsDisposedOfBySaleMember'},
   {'key_ns': 'us-gaap',
    'key_value': 'IncomeStatementBalanceSheetAndAdditionalDisclosuresByDisposalGroupsIncludingDiscontinuedOperationsAxis',
    'member_ns': 'gme',
    'member_value': 'SpringMobileMember'}],
  'units': [{'unit_type': 'unit', 'unit_value': 'iso4217:USD'}],
  'periods': [{'period_type': 'startDate', 'period_value': '2020-02-02'},
   {'period_type': 'endDate', 'period_value': '2020-10-31'}]}]

Actual data

import pandas as pd

dimensions_df = pd.DataFrame(xbrl_dict["dimensions"])
print(dimensions_df[
    dimensions_df.context_ref == "i12efa46789384fd39ac941da8b7b0f1a_D20200202-20201031"
])

Output:

           cik    accession_number          xml_name  \
54  0001326380  000132638021000129  gme-20211030_htm   

                                          context_ref axis_prefix  \
54  i12efa46789384fd39ac941da8b7b0f1a_D20200202-20...     us-gaap   

                           axis_tag member_prefix  \
54  DisposalGroupClassificationAxis       us-gaap   

                                      member_tag  
54  DiscontinuedOperationsDisposedOfBySaleMember  

Issue 1: missing rows

The IncomeStatementBalanceSheetAndAdditionalDisclosuresByDisposalGroupsIncludingDiscontinuedOperationsAxis row is entirely missing.

Issue 2: missing columns

'member_ns': 'us-gaap', 'member_value': 'DiscontinuedOperationsDisposedOfBySaleMember'
are entirely missing.

The documentation example uses different column names (member_prefix, member_tag). The mismatched column names may be the culprit.
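
One way to check both issues at once is to flatten the dimensions embedded in the "json" facts and diff them against the rows of the "dimensions" table. The sketch below is a minimal illustration, not the library's own logic: the helper name missing_dimension_rows is hypothetical, the sample data is abbreviated from the outputs above, and the column mapping (key_ns to axis_prefix, key_value to axis_tag, member_ns to member_prefix, member_value to member_tag) is an assumption inferred from those outputs.

```python
# Hypothetical helper: compares the dimensions embedded in the "json" facts
# with the rows of the "dimensions" table. The column mapping is an assumption.

def missing_dimension_rows(facts, dimension_rows):
    """Return sorted (context_ref, axis_prefix, axis_tag, member_prefix, member_tag)
    tuples that occur in the facts but not in the dimensions table."""
    expected = {
        (fact["context_ref"], d["key_ns"], d["key_value"], d["member_ns"], d["member_value"])
        for fact in facts
        for d in fact.get("dimensions", [])
    }
    actual = {
        (r["context_ref"], r["axis_prefix"], r["axis_tag"], r["member_prefix"], r["member_tag"])
        for r in dimension_rows
    }
    return sorted(expected - actual)


CONTEXT = "i12efa46789384fd39ac941da8b7b0f1a_D20200202-20201031"
LONG_AXIS = (
    "IncomeStatementBalanceSheetAndAdditionalDisclosuresByDisposalGroups"
    "IncludingDiscontinuedOperationsAxis"
)

# Abbreviated from the "json" output above (one fact is enough here).
facts = [{
    "context_ref": CONTEXT,
    "dimensions": [
        {"key_ns": "us-gaap", "key_value": "DisposalGroupClassificationAxis",
         "member_ns": "us-gaap", "member_value": "DiscontinuedOperationsDisposedOfBySaleMember"},
        {"key_ns": "us-gaap", "key_value": LONG_AXIS,
         "member_ns": "gme", "member_value": "SpringMobileMember"},
    ],
}]

# The single row the "dimensions" output returned for this context.
dimension_rows = [{
    "context_ref": CONTEXT,
    "axis_prefix": "us-gaap",
    "axis_tag": "DisposalGroupClassificationAxis",
    "member_prefix": "us-gaap",
    "member_tag": "DiscontinuedOperationsDisposedOfBySaleMember",
}]

missing = missing_dimension_rows(facts, dimension_rows)
for row in missing:
    print(row)
```

With real data, the same function could presumably be fed xbrl_dict["json"] and dimensions_df.to_dict("records"); any tuple it returns is a dimension that the "json" output knows about but the "dimensions" output dropped.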

Cannot install from PyPI

pip install fast-xbrl-parser
ERROR: Could not find a version that satisfies the requirement fast-xbrl-parser (from versions: none)
ERROR: No matching distribution found for fast-xbrl-parser

Same issue if I use fast-xbrl-parser==0.3.0.
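
pip reports "from versions: none" when it finds no distribution it considers installable for the current interpreter and platform, for example because no matching wheel was published or because the package index could not be reached. Which of these applies here is not stated in the report. The sketch below only collects the details worth attaching to such a report; the function name describe_environment is hypothetical and it relies on the standard library only.

```python
import platform
import sys
import sysconfig


def describe_environment():
    """Collect interpreter and platform details that influence which wheels pip accepts."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform_tag": sysconfig.get_platform(),
        "machine": platform.machine(),
        "is_64bit": sys.maxsize > 2**32,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this with the same interpreter that pip uses (python -m pip rather than a bare pip) helps make sure the reported environment is the one where the install failed.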

Standardized Financials

Hi @TiesdeKok, it is great to see you are still building awesome projects! Would you perhaps know the best way to go about creating standardized financials using SEC data? I think I could probably use your parser, but then I assume I need some mapping file to help me organize and standardize across companies to get the data into an ML-ready format. Would love to hear your recommendations.

Cheers,
Derek
