tooksoi / scraxbrl
SEC Edgar Scraper and XBRL Parser/Renderer
License: MIT License
First of all thanks for sharing the library. It works perfectly.
However, I can't really figure out how it is actually working!
In the settings script, a few paths to the XBRL taxonomy are defined (lines 41-49). However, it is not clear to me whether these paths are actually used for anything. What are these paths for?
It scrapes the data fine, but when it calls XMLExtract to extract a file, it fails in self.build_ins(). For some files it gets past that function but then errors at self.extract_all_pre(), and after that it fails. Control never reaches the next function, self.extract_all_calc().
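One way to narrow this down is to run the extraction steps one at a time and record which ones raise, instead of letting the first exception stop everything. The sketch below is only an illustration: `run_steps` is a hypothetical helper, not part of scraxbrl, and it assumes the three methods named above can be called without arguments on an XMLExtract instance.

```python
import traceback


def run_steps(steps):
    """Call each (name, func) pair in order and collect failures.

    Hypothetical helper, not part of scraxbrl. A failing step does not
    prevent the later steps from running.
    """
    failures = {}
    for name, func in steps:
        try:
            func()
        except Exception:
            # Keep the full traceback text so the failing step can be reported.
            failures[name] = traceback.format_exc()
    return failures


# Assumed usage, where `x` is an XMLExtract instance (not verified against the library):
# failures = run_steps([
#     ("build_ins", x.build_ins),
#     ("extract_all_pre", x.extract_all_pre),
#     ("extract_all_calc", x.extract_all_calc),
# ])
# for name, tb in failures.items():
#     print(name)
#     print(tb)
```

Posting the recorded tracebacks along with the ticker, date, and form type would make the failure much easier to reproduce. Note that later steps may depend on earlier ones, so a failure in a later step can be a consequence of an earlier one.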
How do I download this without a setup.py?
When running pip install -r requirements.txt, the following shows up in the log:
URLs to search for versions for Twisted-Web==13.2.0 (from -r requirements.txt (line 4)):
Affects:
DataView('AC','2015-09-30','10-Q')
DataView('AIW','2011-06-30','10-Q')
DataView('AI','2012-09-30','10-Q')
DataView('APLE','2014-03-31','10-Q')
DataView('ARE','2012-03-31','10-Q')
DataView('ARL','2016-06-30','10-Q')
DataView('AT','2011-06-30','10-Q')
DataView('A','2016-07-31','10-Q')
[...]
Traceback (most recent call last):
DataView('AC','2015-09-30','10-Q')
File "DataViewer.py", line 13, in __init__
self.load_data()
File "DataViewer.py", line 20, in load_data
fpath_file = os.listdir(fpath_no_p)[0]
IndexError: list index out of range
Most likely the pickle file was not created because the xml file(s) might not have been parsed properly (or they may have errors)?
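The IndexError itself comes from indexing `[0]` into an empty `os.listdir()` result. A minimal sketch of a guard that makes this case explicit is shown below. `first_file_or_none` is a hypothetical helper name, not something in DataViewer.py, and sorting the entries is an added assumption to make the choice deterministic (the original line takes whatever `os.listdir` returns first).

```python
import os


def first_file_or_none(dir_path):
    """Return the path of the first entry in dir_path, or None if there is none.

    Hypothetical helper, not part of DataViewer.py.
    """
    if not os.path.isdir(dir_path):
        return None
    # os.listdir() order is arbitrary, so sort for a deterministic pick.
    entries = sorted(os.listdir(dir_path))
    if not entries:
        return None
    return os.path.join(dir_path, entries[0])
```

With a check like this, load_data could raise a clearer error (for example, naming the empty directory) instead of failing with "list index out of range".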
This specifically affects AAN's 10-Q for 2014-06-30, as shown here:
aan.traverse_tree('ProgressiveAcquisitionIntangibleAssetsAcquiredDetails')
IndefiniteLivedIntangibleAssetsAcquiredAsPartOfBusinessCombinationTable
IndefiniteLivedIntangibleAssetsByMajorClassAxis
IndefiniteLivedIntangibleAssetsMajorClassNameDomain
AcquiredIndefiniteLivedIntangibleAssetsLineItems
IndefinitelivedIntangibleAssetsAcquired
(u'2014-04-13', u'2014-04-14')
53000000.0
FiniteLivedIntangibleAssetsAcquiredAsPartOfBusinessCombinationTable
BusinessAcquisitionAcquireeDomain
ProgressiveFinanceHoldingsLLCMember
2014-04-14 (u'2015-01-01', u'2015-01-31') (u'2014-07-01', u'2014-07-31') (u'2014-04-15', u'2014-06-30') (u'2014-04-13', u'2014-04-14') (u'2014-01-01', u'2014-06-30') (u'2013-01-01', u'2013-06-30')
138198000.0 3600000.0 22300000.0 323000.0 333000000.0 2300000.0 1325928000.0
FiniteLivedIntangibleAssetsByMajorClassAxis
FiniteLivedIntangibleAssetsMajorClassNameDomain
AcquiredFiniteLivedIntangibleAssetsLineItems
FinitelivedIntangibleAssetsAcquired1
(u'2014-04-13', u'2014-04-14')
14000000.0
IntangibleAssetsAcquired
(u'2014-04-13', u'2014-04-14')
333000000.0
AcquiredFiniteLivedIntangibleAssetsWeightedAverageUsefulLife
(u'2014-04-13', u'2014-04-14')
P9Y
Notice how BusinessAcquisitionAcquireeDomain is indented with 6 spaces while 3 should be used.
>>> from DataViewer import *
>>> a=DataView('A','2013-04-30','10-Q')
>>> a.traverse_tree('StatementOfFinancialPositionClassified')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "DataViewer.py", line 72, in traverse_tree
base = self.data[cat]['roles'][name]
KeyError: 'StatementOfFinancialPositionClassified'
>>> a.data.keys()
['ins', 'cal', 'lab', 'pre', 'error', 'no_lineage']
>>> a.data['pre'].keys()
['xbrl_titles', 'roles']
>>> a.data['pre']['roles'].keys()
['AcquisitionOfDako', 'AcquisitionOfDakoDetailedTextualsDetails', 'AcquisitionOfDakoIntangiblesDetails', 'AcquisitionOfDakoProformaConsolidatedOperatingResultsDetails', 'AcquisitionOfDakoPurchasePriceAllocationDetails', 'AcquisitionOfDakoTables', 'CondensedConsolidatedBalanceSheetUnaudited', 'CondensedConsolidatedBalanceSheetUnauditedParenthetical', 'CondensedConsolidatedStatementOfCashFlowsUnaudited', 'CondensedConsolidatedStatementOfComprehensiveIncomeUnaudited', 'CondensedConsolidatedStatementOfComprehensiveIncomeUnauditedParenthetical', 'CondensedConsolidatedStatementOfOperationsUnaudited', 'Derivatives', 'DerivativesDetails', 'DerivativesDisclosuresAndDerivativeInstrumentAggregatedNotionalAmountsByCurrencyAndDesignationsDetails', 'DerivativesEffectOfDerivativeInstrumentsOnConsolidatedStatementOfOperationsDetails', 'DerivativesFairValueOfDerivativeInstrumentsAndConsolidatedBalanceSheetLocationDetails', 'DerivativesTables', 'DocumentAndEntityInformation', 'FairValueMeasurements', 'FairValueMeasurementsFairValueMeasuresAndImpairmentOfLongLivedAssetsDetails', 'FairValueMeasurementsFairValueOfAssetsAndLiabilitiesMeasuredOnRecurringBasisDetails', 'FairValueMeasurementsTables', 'GoodwillAndOtherIntangibleAssets', 'GoodwillAndOtherIntangibleAssetsDisclosuresAndComponentsOfPurchasedOtherIntangiblesDetails', 'GoodwillAndOtherIntangibleAssetsFiniteLivedAssetsFutureAmortizationExpenseDetails', 'GoodwillAndOtherIntangibleAssetsGoodwillAndOtherIntangibleAssetsTextualsDetails', 'GoodwillAndOtherIntangibleAssetsGoodwillRollForwardDetails', 'GoodwillAndOtherIntangibleAssetsTables', 'IncomeTaxes', 'IncomeTaxesDetails', 'Inventory', 'InventoryDetails', 'InventoryTables', 'LongTermDebt', 'LongTermDebtDetails', 'LongTermDebtLongTermDebtOtherDebtDetails', 'LongTermDebtTables', 'NetIncomePerShare', 'NetIncomePerShareDetails', 'NetIncomePerShareTables', 'NewAccountingPronouncements', 'OverviewBasisOfPresentationAndSummaryOfSignificantAccountingPolicies', 
'OverviewBasisOfPresentationAndSummaryOfSignificantAccountingPoliciesDetails', 'OverviewBasisOfPresentationAndSummaryOfSignificantAccountingPoliciesPolicies', 'Restructuring', 'RestructuringDetails', 'RestructuringIncomeStatementLocationDetails', 'RestructuringTables', 'RetirementPlansAndPostRetirementPensionPlans', 'RetirementPlansAndPostRetirementPensionPlansDetails', 'RetirementPlansAndPostRetirementPensionPlansDetailsTextual', 'RetirementPlansAndPostRetirementPensionPlansTables', 'SegmentInformation', 'SegmentInformationProfitabilityAndSegmentAssetsDetails', 'SegmentInformationReconciliationOfReportableResultsDetails', 'SegmentInformationTables', 'ShareBasedCompensation', 'ShareBasedCompensationAllocatedShareBasedCompensationExpenseDetails', 'ShareBasedCompensationFairValueAssumptionsDetails', 'ShareBasedCompensationTables', 'ShortTermDebt', 'ShortTermDebtCreditFacilityDetails', 'ShortTermDebtSeniorNotesDetails', 'StockholdersEquity', 'StockholdersEquityStockRepurchaseProgramDetails', 'StockholdersEquityStockholdersEquityDividendsDetails', 'WarrantiesAndContingencies', 'WarrantiesAndContingenciesDetails', 'WarrantiesAndContingenciesTables']
>>> a.data['pre']['roles']['AcquisitionOfDako']
OrderedDict([('title_name', None), ('tree', OrderedDict([('BusinessCombinationsAbstract', OrderedDict([('pfx', 'us-gaap'), ('sub', OrderedDict([('BusinessCombinationDisclosureTextBlock', OrderedDict([('pfx', 'us-gaap'), ('sub', OrderedDict()), ('order', 1), ('val', OrderedDict()), ('label', 'Acquisition of Dako')]))])), ('label', u'Business Combinations [Abstract]')]))])), ('from_to', [('BusinessCombinationsAbstract', 'BusinessCombinationDisclosureTextBlock', '1', 'terseLabel')]), ('root', [('us-gaap', 'BusinessCombinationsAbstract', '1', 'terseLabel')]), ('unique', [('us-gaap', 'BusinessCombinationDisclosureTextBlock', '1', 'terseLabel'), ('us-gaap', 'BusinessCombinationsAbstract', '1', 'terseLabel')])])
>>> a.traverse_tree['AcquisitionOfDako']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'instancemethod' object has no attribute '__getitem__'
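Two separate things seem to be going on in this session. The TypeError comes from subscripting the method with square brackets. Elsewhere in this thread traverse_tree is called with parentheses, as in a.traverse_tree('AcquisitionOfDako'). The KeyError comes from asking for a role name that this filing does not define. The role names differ per filing, so it can help to search the available keys first. The sketch below assumes only the dictionary layout shown above (`a.data['pre']['roles']`); `find_roles` is a hypothetical helper, not part of the library.

```python
def find_roles(roles, keyword):
    """Return the role names that contain keyword, ignoring case.

    Hypothetical helper. `roles` is any mapping (or iterable) of role names,
    such as a.data['pre']['roles'] in the session above.
    """
    needle = keyword.lower()
    return [name for name in roles if needle in name.lower()]


# Assumed usage with the DataView instance from the session above:
# for name in find_roles(a.data['pre']['roles'], 'BalanceSheet'):
#     print(name)
```

For the filing above, a search for 'BalanceSheet' would match names such as 'CondensedConsolidatedBalanceSheetUnaudited', which could then be passed to traverse_tree in place of 'StatementOfFinancialPositionClassified'.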