
scraxbrl's People

Contributors

iuvoz


scraxbrl's Issues

xbrl taxonomy

First of all, thanks for sharing the library. It works perfectly.
However, I can't figure out how it actually works.

The settings script defines a few paths to the XBRL taxonomy (lines 41-49), but it is not clear to me that these paths are used anywhere. What are they for?

Problem in the XML data extraction method

Scraping the data works fine, but when XMLExtract is called to extract a file, it fails in self.build_ins(). For some files it gets past that function but then raises an error at self.extract_all_pre() and stops there. Control never reaches the next function, self.extract_all_calc().
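One way to narrow down a report like this is to run each extraction stage separately and record which ones raise. The sketch below is only an illustration: run_stages is a hypothetical helper, not part of scraxbrl, and it assumes the stages can be called without arguments. Later stages may depend on earlier ones, so failures after the first are not necessarily independent bugs.

```python
import traceback


def run_stages(stages):
    """Run (name, callable) pairs in order; return the names of stages that raised."""
    failed = []
    for name, stage in stages:
        try:
            stage()
        except Exception:
            # Print the full traceback so the failing line is visible,
            # then keep going so later stages are exercised too.
            failed.append(name)
            traceback.print_exc()
    return failed
```

With an extractor object at hand (called extractor here as an assumption), it could be used as run_stages([("build_ins", extractor.build_ins), ("extract_all_pre", extractor.extract_all_pre), ("extract_all_calc", extractor.extract_all_calc)]), and the printed tracebacks could be attached to the issue.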

Cannot get requirements

Running pip install -r requirements.txt produces the following in the log:

URLs to search for versions for Twisted-Web==13.2.0 (from -r requirements.txt (line 4)):

IndexError: list index out of range

Affects:
DataView('AC','2015-09-30','10-Q')
DataView('AIW','2011-06-30','10-Q')
DataView('AI','2012-09-30','10-Q')
DataView('APLE','2014-03-31','10-Q')
DataView('ARE','2012-03-31','10-Q')
DataView('ARL','2016-06-30','10-Q')
DataView('AT','2011-06-30','10-Q')
DataView('A','2016-07-31','10-Q')
[...]

Traceback (most recent call last):
    DataView('AC','2015-09-30','10-Q')
  File "DataViewer.py", line 13, in __init__
    self.load_data()
  File "DataViewer.py", line 20, in load_data
    fpath_file = os.listdir(fpath_no_p)[0]
IndexError: list index out of range

Most likely the pickle file was never created, so os.listdir() returns an empty list. The XML file(s) may not have been parsed properly, or they may contain errors.
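A guard around the directory listing would turn the bare IndexError into something easier to diagnose. This is a minimal sketch, assuming fpath_no_p is a directory that should contain the pickle file; first_file_or_none is a hypothetical helper, not a function in DataViewer.py.

```python
import os


def first_file_or_none(directory):
    """Return the path of the first entry in `directory`, or None if there is none."""
    if not os.path.isdir(directory):
        return None
    # Sort so the result does not depend on the filesystem's listing order.
    entries = sorted(os.listdir(directory))
    if not entries:
        return None
    return os.path.join(directory, entries[0])
```

The caller could then check for None and report that no parsed data exists for the requested ticker, date and form, instead of failing on index 0.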

Formatting issue

Notably affects AAN's 10-Q for 2014-06-30:

aan.traverse_tree('ProgressiveAcquisitionIntangibleAssetsAcquiredDetails')
   IndefiniteLivedIntangibleAssetsAcquiredAsPartOfBusinessCombinationTable
      IndefiniteLivedIntangibleAssetsByMajorClassAxis
         IndefiniteLivedIntangibleAssetsMajorClassNameDomain
      AcquiredIndefiniteLivedIntangibleAssetsLineItems
         IndefinitelivedIntangibleAssetsAcquired
                (u'2014-04-13', u'2014-04-14')
                53000000.0
   FiniteLivedIntangibleAssetsAcquiredAsPartOfBusinessCombinationTable
         BusinessAcquisitionAcquireeDomain
            ProgressiveFinanceHoldingsLLCMember
                2014-04-14              (u'2015-01-01', u'2015-01-31')          (u'2014-07-01', u'2014-07-31')          (u'2014-04-15', u'2014-06-30')              (u'2014-04-13', u'2014-04-14')          (u'2014-01-01', u'2014-06-30')          (u'2013-01-01', u'2013-06-30')
                138198000.0             3600000.0               22300000.0              323000.0                333000000.0             2300000.0  1325928000.0
      FiniteLivedIntangibleAssetsByMajorClassAxis
         FiniteLivedIntangibleAssetsMajorClassNameDomain
      AcquiredFiniteLivedIntangibleAssetsLineItems
         FinitelivedIntangibleAssetsAcquired1
                (u'2014-04-13', u'2014-04-14')
                14000000.0
         IntangibleAssetsAcquired
                (u'2014-04-13', u'2014-04-14')
                333000000.0
         AcquiredFiniteLivedIntangibleAssetsWeightedAverageUsefulLife
                (u'2014-04-13', u'2014-04-14')
                P9Y

Notice how BusinessAcquisitionAcquireeDomain is indented 6 spaces relative to its parent, where 3 should be used.
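For comparison, here is a sketch of a printer that derives indentation purely from depth, so each level is always exactly 3 spaces deeper than its parent. It is not the library's traverse_tree. The 'sub' key mirrors the nested structure visible in the pre roles data, but the sample tree and its nesting are assumptions made for illustration.

```python
def format_tree(tree, depth=1, indent="   "):
    """Return lines for a nested {name: {'sub': {...}}} tree, one indent unit per level."""
    lines = []
    for name, node in tree.items():
        lines.append(indent * depth + name)
        # Recurse into the children one level deeper.
        lines.extend(format_tree(node.get("sub", {}), depth + 1, indent))
    return lines


# Hypothetical sample; the real nesting in the filing may differ.
tree = {
    "FiniteLivedIntangibleAssetsAcquiredAsPartOfBusinessCombinationTable": {
        "sub": {
            "BusinessAcquisitionAcquireeDomain": {
                "sub": {"ProgressiveFinanceHoldingsLLCMember": {"sub": {}}}
            }
        }
    }
}
print("\n".join(format_tree(tree)))
```

If the real output skips a level, one possible explanation is that an intermediate node is counted toward the depth but not printed. That is a guess, not something confirmed by the source.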

KeyError

>>> from DataViewer import *
>>> a=DataView('A','2013-04-30','10-Q')
>>> a.traverse_tree('StatementOfFinancialPositionClassified')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "DataViewer.py", line 72, in traverse_tree
    base = self.data[cat]['roles'][name]
KeyError: 'StatementOfFinancialPositionClassified'
>>> a.data.keys()
['ins', 'cal', 'lab', 'pre', 'error', 'no_lineage']
>>> a.data['pre'].keys()
['xbrl_titles', 'roles']
>>> a.data['pre']['roles'].keys()
['AcquisitionOfDako', 'AcquisitionOfDakoDetailedTextualsDetails', 'AcquisitionOfDakoIntangiblesDetails', 'AcquisitionOfDakoProformaConsolidatedOperatingResultsDetails', 'AcquisitionOfDakoPurchasePriceAllocationDetails', 'AcquisitionOfDakoTables', 'CondensedConsolidatedBalanceSheetUnaudited', 'CondensedConsolidatedBalanceSheetUnauditedParenthetical', 'CondensedConsolidatedStatementOfCashFlowsUnaudited', 'CondensedConsolidatedStatementOfComprehensiveIncomeUnaudited', 'CondensedConsolidatedStatementOfComprehensiveIncomeUnauditedParenthetical', 'CondensedConsolidatedStatementOfOperationsUnaudited', 'Derivatives', 'DerivativesDetails', 'DerivativesDisclosuresAndDerivativeInstrumentAggregatedNotionalAmountsByCurrencyAndDesignationsDetails', 'DerivativesEffectOfDerivativeInstrumentsOnConsolidatedStatementOfOperationsDetails', 'DerivativesFairValueOfDerivativeInstrumentsAndConsolidatedBalanceSheetLocationDetails', 'DerivativesTables', 'DocumentAndEntityInformation', 'FairValueMeasurements', 'FairValueMeasurementsFairValueMeasuresAndImpairmentOfLongLivedAssetsDetails', 'FairValueMeasurementsFairValueOfAssetsAndLiabilitiesMeasuredOnRecurringBasisDetails', 'FairValueMeasurementsTables', 'GoodwillAndOtherIntangibleAssets', 'GoodwillAndOtherIntangibleAssetsDisclosuresAndComponentsOfPurchasedOtherIntangiblesDetails', 'GoodwillAndOtherIntangibleAssetsFiniteLivedAssetsFutureAmortizationExpenseDetails', 'GoodwillAndOtherIntangibleAssetsGoodwillAndOtherIntangibleAssetsTextualsDetails', 'GoodwillAndOtherIntangibleAssetsGoodwillRollForwardDetails', 'GoodwillAndOtherIntangibleAssetsTables', 'IncomeTaxes', 'IncomeTaxesDetails', 'Inventory', 'InventoryDetails', 'InventoryTables', 'LongTermDebt', 'LongTermDebtDetails', 'LongTermDebtLongTermDebtOtherDebtDetails', 'LongTermDebtTables', 'NetIncomePerShare', 'NetIncomePerShareDetails', 'NetIncomePerShareTables', 'NewAccountingPronouncements', 'OverviewBasisOfPresentationAndSummaryOfSignificantAccountingPolicies', 
'OverviewBasisOfPresentationAndSummaryOfSignificantAccountingPoliciesDetails', 'OverviewBasisOfPresentationAndSummaryOfSignificantAccountingPoliciesPolicies', 'Restructuring', 'RestructuringDetails', 'RestructuringIncomeStatementLocationDetails', 'RestructuringTables', 'RetirementPlansAndPostRetirementPensionPlans', 'RetirementPlansAndPostRetirementPensionPlansDetails', 'RetirementPlansAndPostRetirementPensionPlansDetailsTextual', 'RetirementPlansAndPostRetirementPensionPlansTables', 'SegmentInformation', 'SegmentInformationProfitabilityAndSegmentAssetsDetails', 'SegmentInformationReconciliationOfReportableResultsDetails', 'SegmentInformationTables', 'ShareBasedCompensation', 'ShareBasedCompensationAllocatedShareBasedCompensationExpenseDetails', 'ShareBasedCompensationFairValueAssumptionsDetails', 'ShareBasedCompensationTables', 'ShortTermDebt', 'ShortTermDebtCreditFacilityDetails', 'ShortTermDebtSeniorNotesDetails', 'StockholdersEquity', 'StockholdersEquityStockRepurchaseProgramDetails', 'StockholdersEquityStockholdersEquityDividendsDetails', 'WarrantiesAndContingencies', 'WarrantiesAndContingenciesDetails', 'WarrantiesAndContingenciesTables']
>>> a.data['pre']['roles']['AcquisitionOfDako']
OrderedDict([('title_name', None), ('tree', OrderedDict([('BusinessCombinationsAbstract', OrderedDict([('pfx', 'us-gaap'), ('sub', OrderedDict([('BusinessCombinationDisclosureTextBlock', OrderedDict([('pfx', 'us-gaap'), ('sub', OrderedDict()), ('order', 1), ('val', OrderedDict()), ('label', 'Acquisition of Dako')]))])), ('label', u'Business Combinations [Abstract]')]))])), ('from_to', [('BusinessCombinationsAbstract', 'BusinessCombinationDisclosureTextBlock', '1', 'terseLabel')]), ('root', [('us-gaap', 'BusinessCombinationsAbstract', '1', 'terseLabel')]), ('unique', [('us-gaap', 'BusinessCombinationDisclosureTextBlock', '1', 'terseLabel'), ('us-gaap', 'BusinessCombinationsAbstract', '1', 'terseLabel')])])
>>> a.traverse_tree['AcquisitionOfDako']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'instancemethod' object has no attribute '__getitem__'
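Two separate things appear to go wrong in the session above. The KeyError occurs because 'StatementOfFinancialPositionClassified' is not among the role names of this filing (the balance sheet is listed as 'CondensedConsolidatedBalanceSheetUnaudited'). The TypeError occurs because traverse_tree is a method and must be called with parentheses, as in a.traverse_tree('AcquisitionOfDako'), not indexed with square brackets. The sketch below shows one way to suggest similar role names before looking one up. suggest_roles is a hypothetical helper, and the roles dict is a small stand-in for a.data['pre']['roles'].

```python
import difflib


def suggest_roles(roles, name, limit=3):
    """Return role names that resemble `name`, best match first."""
    return difflib.get_close_matches(name, list(roles), n=limit, cutoff=0.4)


# Hypothetical subset of the role names printed above.
roles = {
    "CondensedConsolidatedBalanceSheetUnaudited": {},
    "CondensedConsolidatedStatementOfCashFlowsUnaudited": {},
    "AcquisitionOfDako": {},
}

name = "StatementOfFinancialPositionClassified"
if name not in roles:
    print("Unknown role %r; similar roles: %s" % (name, suggest_roles(roles, name)))
```

Role names are defined by each filer, so checking the available keys first is safer than assuming a standard name exists in every filing.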
