datasets / s-and-p-500-companies

List of companies in the S&P 500 together with associated financials

Home Page: https://datahub.io/core/s-and-p-500-companies

Python 36.82% Makefile 63.18%

s-and-p-500-companies's Introduction

S&P 500 Companies Dataset

List of companies in the S&P 500 (Standard and Poor's 500). The S&P 500 is a free-float, capitalization-weighted index of 500 of the largest publicly listed US companies by market capitalization. The dataset includes a list of all the constituent stocks.

Data

Information on the S&P 500 index used to be available on the official webpage on the Standard and Poor's website. Until they publish it there again, Wikipedia's [SP500 list of companies][sp-list] is the best up-to-date and open data source.
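The Wikipedia scrape can be illustrated with a minimal, standard-library-only sketch that pulls the first (Symbol) column out of a constituents table. `FirstColumnParser` is a hypothetical helper; the repository's actual scripts/scrape.py almost certainly differs.

```python
from html.parser import HTMLParser

class FirstColumnParser(HTMLParser):
    """Collect the first <td> of each table row -- e.g. the Symbol column
    of Wikipedia's 'List of S&P 500 companies' table (header <th> cells
    are skipped)."""

    def __init__(self):
        super().__init__()
        self.in_first_cell = False
        self.cells_seen_in_row = 0
        self.symbols = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.cells_seen_in_row = 0
        elif tag == "td":
            self.cells_seen_in_row += 1
            self.in_first_cell = (self.cells_seen_in_row == 1)

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_first_cell = False

    def handle_data(self, data):
        if self.in_first_cell and data.strip():
            self.symbols.append(data.strip())
```

Feeding the page HTML to `FirstColumnParser().feed(...)` leaves the tickers in `.symbols`.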

Sources

Detailed information on the S&P 500 (primarily in XLS format) used to be obtainable from its official webpage on the Standard and Poor's website; access was free but required registration.

Note: for aggregate information on the S&P 500 (dividends, earnings, etc.), see the Standard and Poor's 500 Dataset.

General Financial Notes

Publicly listed US companies are obliged to file various reports with the SEC on a regular basis. Of these, two types are of especial interest to investors and others interested in their finances and business. These are:

  • 10-K = Annual Report
  • 10-Q = Quarterly Report

Development

The pipeline relies on Python, so you'll need to have it installed on your machine. Then:

  1. Create a virtual environment in a directory using Python's venv module: python3 -m venv .env
  2. Activate the virtual environment: source .env/bin/activate
  3. Install the dependencies: pip install -r scripts/requirements.txt
  4. Run the scripts: python scripts/scrape.py

Alternatively, you can use the provided Makefile to run the scraping with a simple make. It'll create a virtual environment, install the dependencies and run the script.

License

All data is licensed under the Open Data Commons Public Domain Dedication and License. All code is licensed under the MIT/BSD license.

Note that while no credit is formally required a link back or credit to Rufus Pollock and the Open Knowledge Foundation is much appreciated.

s-and-p-500-companies's People

Contributors

actions-user, anuveyatsu, datasets-update-bot, davidgasquez, ian-hailey, johndavidback, lexman, markgahagan1, mikanebu, peterdesmet, rufuspollock, sglavoie


s-and-p-500-companies's Issues

505 Companies

This listing for the S&P 500 has 505 companies in it. There should probably be some kind of invariant imposed so that updates are rejected if they include something other than the expected number of companies.
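A guard like the one proposed above can be sketched in a few lines. The 500-510 bounds are illustrative assumptions, not values taken from the repository; the index can legitimately hold more than 500 tickers because some companies (e.g. Alphabet with GOOG and GOOGL) list multiple share classes.

```python
import csv
import io

# Illustrative bounds -- the index exceeds 500 tickers when companies
# list multiple share classes, so an exact ==500 check would be wrong.
EXPECTED_MIN, EXPECTED_MAX = 500, 510

def constituent_count_ok(csv_text):
    """Return True if the CSV holds a plausible number of constituents."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return EXPECTED_MIN <= len(rows) <= EXPECTED_MAX
```

An update pipeline could run this check before committing a regenerated constituents.csv and abort when it fails.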

WLP became ANTM

WellPoint became Anthem, so the list of constituents needs to be updated.

Date of data

Hello!

This is a great tool! Is it somehow possible to acquire the issuance date of the facts that are provided in the data CSV? The data loses some of its value if I don't know when the information was filed.

Thanks!

Issue with Makefile

I am trying to work out why the Makefile fails. It appears that, due to its age, it needs a number of amendments.

Issues seem to be:

  1. The file locations are incorrect, as one doesn't exist, e.g. ../data/constituents-financials.csv. If it is removed, the Makefile will at least create the two output files for constituents (provided you change the tmp directory path in constituents.py at lines 12, 13 and 16 from scripts/tmp to "../scripts/tmp"; this also permits constituents.py to be run from the scripts directory in a terminal).

  2. The test_data.py file relies on goodtables, which has since been deprecated. I have tried to get my head around this file and replace goodtables with the 'frictionless framework' package (which I believe replaced goodtables), but I am afraid neither my Python nor my computer-science ability is yet up to that challenge.

  3. In summary, my suggestion would be to rewrite test_data.py using the frictionless framework package for validation, then rewrite the Makefile referencing the amended data locations, and finally make amendments at lines 12, 13 and 16 in constituents.py to correctly reference the tmp directory. I realise this is a big ask, as it is way above my league, so I understand the workload.
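As a stopgap until a frictionless-based rewrite lands, the core of the validation can be sketched with the standard library alone: check that the generated CSV's header matches the field names declared in datapackage.json. `header_matches_schema` is a hypothetical helper; the real goodtables/frictionless checks also cover types, formats and constraints.

```python
import csv
import io
import json

def header_matches_schema(csv_text, datapackage_json):
    """Check that a CSV header matches the first resource's schema fields.

    A minimal stand-in for the deprecated goodtables pipeline; the
    frictionless framework performs far deeper validation.
    """
    descriptor = json.loads(datapackage_json)
    fields = [f["name"] for f in descriptor["resources"][0]["schema"]["fields"]]
    header = next(csv.reader(io.StringIO(csv_text)))
    return header == fields
```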

My attempt at updating the Makefile is below. It correctly creates the two output files in the correct directory, then errors at the goodtables imports in test_data.py:

MAKEFILE:

```makefile
all: pushed.txt

../data:
	mkdir ../data

../data/List_of_S%26P_500_companies.html: constituents.py
	python constituents.py

../data/constituents.csv: ../data/List_of_S%26P_500_companies.html constituents.py
	python constituents.py

valid.txt: ../data/constituents.csv ../datapackage.json test_data.py
	python test_data.py
	echo "Datapackage is valid" > valid.txt

pushed.txt: valid.txt
	git add ../data/constituents.csv ../data/constituents-financials.csv
	git add ../data/constituents_symbols.txt ../data/constituents-symbols.txt
	git commit -m "[data][skip ci] automatic update" || exit 0
	git push publish
	echo "Update has been pushed if there was a change" > pushed.txt

.PHONY: all
```

The excel file no longer contains a list of constituents

Hey there -

I am not sure how the Excel file at this address was structured before, but as of today it no longer contains a list of S&P 500 constituents. Right now, sheet 4 contains a list of companies that issued additional shares in Q2 of 2015.

Am I missing something?

Thanks,
Chafik

Is this list still being maintained?

I was looking at this list, as well as other datasets from datahub.io, and noticed that a lot of them seem to have stopped updating in 2020. Curious if this is still being maintained?

Add Exchange

Hi,

Might be worth adding the exchange information because it's easier when importing to TradingView.

NASDAQ:CMCSA
NASDAQ:NFLX
NASDAQ:VIAC
NASDAQ:TTWO
NASDAQ:EA
NASDAQ:DISCA
NASDAQ:NWSA
NASDAQ:ATVI
NYSE:IPG
NASDAQ:NWS
NYSE:VZ
NYSE:OMC
NYSE:LYV
NASDAQ:DISH
NASDAQ:GOOG
NASDAQ:DISCK
NYSE:DIS
NASDAQ:CHTR
NASDAQ:FB
NYSE:LUMN
NYSE:TWTR
NASDAQ:FOXA
NASDAQ:FOX
NASDAQ:TMUS
NASDAQ:GOOGL
NYSE:T
NYSE:TJX
NYSE:MCD
NASDAQ:AMZN
NYSE:LOW
NASDAQ:EXPE
NYSE:BBWI
NASDAQ:ULTA
NASDAQ:TSCO
NYSE:UAA
NYSE:UA
NYSE:HBI
NYSE:VFC
NYSE:NVR
NYSE:GM
NYSE:AZO
NYSE:AAP
NYSE:WHR
NASDAQ:WYNN
NYSE:YUM
NYSE:GPC
NYSE:APTV
NASDAQ:HAS
NYSE:GPS
NYSE:TGT
NYSE:RL
NASDAQ:TSLA
NYSE:TPR
NYSE:HLT
NYSE:HD
NASDAQ:EBAY
NYSE:PVH
NYSE:LVS
NASDAQ:DLTR
NASDAQ:NWL
NYSE:PHM
NASDAQ:BKNG
NYSE:BWA
NYSE:LEG
NYSE:F
NYSE:CCL
NASDAQ:SBUX
NASDAQ:GRMN
NASDAQ:LKQ
NYSE:KMX
NYSE:BBY
NYSE:LEN
NYSE:NKE
NASDAQ:MAR
NASDAQ:CZR
NYSE:DPZ
NYSE:DRI
NYSE:DHI
NYSE:DG
NASDAQ:PENN
NYSE:MHK
NYSE:MGM
NASDAQ:ETSY
NASDAQ:POOL
NASDAQ:ORLY
NYSE:CMG
NASDAQ:ROST
NYSE:RCL
NYSE:NCLH
NASDAQ:MNST
NYSE:CAG
NYSE:HSY
NYSE:PG
NYSE:MO
NYSE:KMB
NYSE:TSN
NYSE:CHD
NYSE:GIS
NYSE:BF.B
NASDAQ:KHC
NYSE:ADM
NYSE:MKC
NASDAQ:MDLZ
NYSE:HRL
NYSE:KR
NASDAQ:WBA
NYSE:K
NYSE:SJM
NYSE:CL
NYSE:EL
NYSE:CLX
NYSE:STZ
NYSE:PM
NYSE:WMT
NASDAQ:PEP
NYSE:TAP
NYSE:LW
NYSE:SYY
NYSE:CPB
NYSE:KO
NASDAQ:COST
NYSE:PSX
NYSE:HAL
NYSE:XOM
NYSE:COP
NYSE:CVX
NYSE:COG
NYSE:VLO
NYSE:PXD
NYSE:EOG
NYSE:MRO
NYSE:MPC
NYSE:SLB
NYSE:KMI
NYSE:OKE
NYSE:HES
NYSE:OXY
NYSE:DVN
NASDAQ:APA
NYSE:NOV
NASDAQ:FANG
NYSE:BKR
NYSE:WMB
NYSE:TFC
NYSE:AXP
NYSE:CB
NYSE:RF
NYSE:AMP
NYSE:USB
NYSE:WRB
NYSE:AFL
NYSE:WFC
NASDAQ:WLTW
NYSE:UNM
NASDAQ:PFG
NYSE:TRV
NYSE:ALL
NYSE:SYF
NASDAQ:NDAQ
NASDAQ:PBCT
NYSE:DFS
NYSE:AON
NYSE:SCHW
NASDAQ:CINF
NYSE:PGR
NASDAQ:CME
NYSE:RJF
NYSE:C
NYSE:CFG
NASDAQ:NTRS
NYSE:PNC
NYSE:CMA
NYSE:BRK.B
NYSE:AIG
NYSE:AJG
NYSE:MMC
NYSE:AIZ
NYSE:PRU
NASDAQ:FITB
NYSE:GL
NYSE:FRC
NYSE:LNC
NYSE:JPM
NASDAQ:HBAN
NYSE:KEY
NYSE:RE
NYSE:BEN
NYSE:MET
NYSE:STT
NYSE:L
NYSE:MCO
NYSE:MTB
NYSE:ICE
NASDAQ:ZION
NASDAQ:TROW
NYSE ARCA:CBOE
NYSE:IVZ
NYSE:MSCI
NYSE:SPGI
NYSE:COF
NYSE:HIG
NASDAQ:MKTX
NYSE:MS
NYSE:BK
NYSE:BLK
NYSE:BAC
NYSE:GS
NASDAQ:SIVB
NYSE:BDX
NYSE:DHR
NYSE:DVA
NYSE:BAX
NASDAQ:GILD
NYSE:BSX
NASDAQ:HSIC
NYSE:CVS
NYSE:ABC
NYSE:CI
NYSE:CRL
NYSE:CNC
NYSE:ANTM
NYSE:BMY
NYSE:BIO
NYSE:ABT
NASDAQ:CERN
NASDAQ:BIIB
NYSE:CTLT
NASDAQ:ALGN
NASDAQ:DXCM
NYSE:LLY
NASDAQ:XRAY
NYSE:A
NYSE:EW
NYSE:CAH
NYSE:HUM
NASDAQ:HOLX
NASDAQ:ABMD
NYSE:ABBV
NASDAQ:AMGN
NYSE:HCA
NASDAQ:IDXX
NYSE:RMD
NYSE:PKI
NYSE:UHS
NASDAQ:ISRG
NASDAQ:INCY
NASDAQ:VRTX
NYSE:ZBH
NYSE:WAT
NYSE:IQV
NYSE:WST
NYSE:DGX
NASDAQ:VTRS
NYSE:JNJ
NASDAQ:MRNA
NYSE:STE
NYSE:OGN
NYSE:PRGO
NYSE:UNH
NYSE:SYK
NYSE:TFX
NYSE:ZTS
NYSE:COO
NYSE:MCK
NASDAQ:ILMN
NYSE:LH
NASDAQ:REGN
NYSE:TMO
NYSE:MDT
NYSE:PFE
NYSE:MRK
NYSE:MTD
NYSE:MAS
NASDAQ:CHRW
NASDAQ:JBHT
NASDAQ:HON
NYSE:HWM
NYSE:NSC
NYSE:GD
NYSE:LUV
NYSE:ROK
NASDAQ:EXPD
NYSE:ROL
NYSE:ROP
NYSE:CARR
NYSE:SNA
NYSE:HII
NYSE:NLSN
NYSE:MMM
NYSE:BA
NYSE:EFX
NYSE:GNRC
NYSE:CAT
NYSE:IR
NYSE:ETN
NYSE:GE
NYSE:EMR
NASDAQ:FAST
NYSE:LHX
NYSE:JCI
NYSE:FBHS
NYSE:LDOS
NYSE:FTV
NYSE:LMT
NYSE:FDX
NYSE:KSU
NYSE:ITW
NYSE:TDY
NYSE:J
NYSE:TT
NYSE:PNR
NYSE:RSG
NASDAQ:ODFL
NYSE:DE
NYSE:OTIS
NASDAQ:PCAR
NASDAQ:CSX
NASDAQ:CTAS
NYSE:PH
NYSE:CMI
NYSE:RTX
NYSE:IEX
NASDAQ:CPRT
NYSE:PWR
NYSE:TXT
NYSE:NOC
NYSE:DAL
NYSE:AOS
NYSE:INFO
NASDAQ:VRSK
NYSE:SWK
NYSE:AME
NYSE:UNP
NASDAQ:UAL
NYSE:UPS
NYSE:URI
NASDAQ:AAL
NYSE:RHI
NYSE:ALLE
NYSE:ALK
NYSE:GWW
NYSE:WAB
NYSE:WM
NYSE:XYL
NYSE:DOV
NYSE:TDG
NASDAQ:MSFT
NASDAQ:MU
NASDAQ:MCHP
NYSE:ORCL
NASDAQ:AKAM
NASDAQ:IPGP
NASDAQ:CTSH
NASDAQ:INTU
NASDAQ:PAYX
NYSE:PAYC
NYSE:WU
NYSE:TYL
NASDAQ:ENPH
NASDAQ:QRVO
NYSE:V
NASDAQ:CTXS
NYSE:HPQ
NASDAQ:PTC
NASDAQ:PYPL
NASDAQ:ADBE
NASDAQ:MPWR
NASDAQ:QCOM
NASDAQ:VRSN
NASDAQ:MXIM
NYSE:MA
NASDAQ:CSCO
NASDAQ:NXPI
NYSE:GLW
NYSE:HPE
NASDAQ:INTC
NYSE:JNPR
NYSE:ACN
NASDAQ:SWKS
NASDAQ:NLOK
NASDAQ:WDC
NASDAQ:JKHY
NASDAQ:AMD
NYSE:GPN
NASDAQ:ZBRA
NASDAQ:FFIV
NASDAQ:ADSK
NYSE:APH
NASDAQ:ADI
NASDAQ:ANSS
NASDAQ:NVDA
NASDAQ:CDW
NASDAQ:SNPS
NASDAQ:AAPL
NYSE:FIS
NASDAQ:AMAT
NASDAQ:FISV
NYSE:FLT
NYSE:ANET
NYSE:BR
NASDAQ:XLNX
NYSE:CRM
NASDAQ:TRMB
NYSE:KEYS
NASDAQ:STX
NASDAQ:TXN
NASDAQ:KLAC
NASDAQ:CDNS
NYSE:MSI
NASDAQ:FTNT
NYSE:IBM
NASDAQ:TER
NYSE:NOW
NASDAQ:AVGO
NYSE:TEL
NYSE:IT
NYSE:DXC
NASDAQ:ADP
NASDAQ:NTAP
NASDAQ:LRCX
NYSE:ECL
NYSE:FMC
NYSE:CE
NYSE:NUE
NYSE:CTVA
NYSE:MLM
NYSE:CF
NYSE:SHW
NYSE:LYB
NYSE:PPG
NYSE:LIN
NYSE:SEE
NYSE:DOW
NYSE:PKG
NYSE:DD
NYSE:AVY
NYSE:NEM
NYSE:IP
NYSE:IFF
NYSE:ALB
NYSE:FCX
NYSE:WRK
NYSE:BLL
NYSE:AMCR
NYSE:APD
NYSE:MOS
NYSE:VMC
NYSE:EMN
NYSE:PSA
NYSE:UDR
NYSE:VTR
NYSE:AMT
NASDAQ:EQIX
NYSE:O
NYSE:PLD
NYSE:EXR
NASDAQ:REG
NYSE:CBRE
NYSE:BXP
NYSE:PEAK
NYSE:IRM
NYSE:EQR
NASDAQ:SBAC
NYSE:VNO
NYSE:ESS
NYSE:CCI
NASDAQ:HST
NYSE:FRT
NYSE:DRE
NYSE:KIM
NYSE:ARE
NYSE:WELL
NYSE:DLR
NYSE:WY
NYSE:AVB
NYSE:SPG
NYSE:MAA
NYSE:ETR
NASDAQ:AEP
NYSE:ED
NYSE:CMS
NYSE:PNW
NYSE:WEC
NYSE:ES
NYSE:EVRG
NYSE:AWK
NYSE:PEG
NYSE:PPL
NYSE:FE
NASDAQ:LNT
NYSE:NI
NYSE:DTE
NYSE:SRE
NYSE:NEE
NASDAQ:XEL
NYSE:NRG
NYSE:AES
NYSE:DUK
NYSE:D
NYSE:SO
NYSE:CNP
NYSE:AEE
NYSE:ATO
NYSE:EIX
NASDAQ:EXC

Can you add cik to the dataset?

Hi,

Thanks for putting the list together. We are using your data to build stock market dashboards; if you could add one more column with the CIK number, that would be great. Thanks!

can't use

I've tried for three days to download your S&P spreadsheet with data. It seems we are speaking different languages. My idea of easy to use is when you press download and it downloads. I've tried Excel and Google Docs. I've read your stuff at the bottom of the page, but none of it makes sense. I've tried your help page, but I want help, not to chat.

Cannot import name 'pipeline'

Hi everyone,

When I try to make the package, I get the following:

```
python test_data.py
Traceback (most recent call last):
  File "test_data.py", line 10, in <module>
    from goodtables import pipeline as _pipeline
ImportError: cannot import name 'pipeline'
make: *** [valid.txt] Error 1
```

Even after updating goodtables (it only installed xlrd-1.1.0), I'm still running into the issue.

Goodtables version: 1.5.1
Python version: 3.6.4

Anyone else having this problem/know how to fix it?

Thanks

[scripts] Rework travis automation so we can run again

We want to rework the Travis automation so we can run it again:

  • only commit and push if a change in data
  • push to github
  • push to datahub

Acceptance criteria

  • only commit and push if a change in data
    • push to GitHub
    • publish to DataHub

Tasks

  • refactor travis.yml since it is outdated
  • add publishing to DataHub

Analysis

After trying to run the script, it turns out a bunch of libraries are outdated, for example datapackage-py, goodtables-py and so on. And the test script at scripts/test_data.py is not working.
I suggest removing this test script or writing something else.

For the DataHub part, we can add this snippet, which runs only in the deploy stage:

```yaml
language: node_js
node_js:
- '8'
install:
- npm install -g git+https://github.com/datahq/datahub-cli.git
script: data push ./ --published
env:
  global:
    id: core
    username: core
  TRAVIS_SECURE_ENV_VARS: true
```

Using the Travis configuration, we can add the token as an environment variable.

Automate data updates with travis

@rgrp, my doc on continuous processing with Travis is nearly finished, and I'd like this s-and-p-500-companies project to be the pilot.

When I try to add this project to Travis, I get the message "You require admin rights to enable these repositories." Could you either give me admin rights to the repository (I'm not sure what that means) or enable Travis to run this project? Thanks...

During installation ran into an OSError on OS X 10.11.6

System Version: OS X 10.11.6 (15G31)
Kernel Version: Darwin 15.6.0

$ pip install -r requirements.txt
Requirement already satisfied (use --upgrade to upgrade): xlrd==0.9.3 in /Library/Python/2.7/site-packages (from -r requirements.txt (line 1))
Requirement already satisfied (use --upgrade to upgrade): unicodecsv in /Library/Python/2.7/site-packages (from -r requirements.txt (line 2))
Requirement already satisfied (use --upgrade to upgrade): jinja2 in /Library/Python/2.7/site-packages (from -r requirements.txt (line 3))
Requirement already satisfied (use --upgrade to upgrade): beautifulsoup4 in /Library/Python/2.7/site-packages (from -r requirements.txt (line 4))
Collecting datapackage (from -r requirements.txt (line 5))
Downloading datapackage-0.8.0.tar.gz
Collecting goodtables (from -r requirements.txt (line 6))
Downloading goodtables-0.7.5.tar.gz
Requirement already satisfied (use --upgrade to upgrade): MarkupSafe in /Library/Python/2.7/site-packages (from jinja2->-r requirements.txt (line 3))
Collecting six>=1.10.0 (from datapackage->-r requirements.txt (line 5))
Downloading six-1.10.0-py2.py3-none-any.whl
Collecting requests>=2.8.0 (from datapackage->-r requirements.txt (line 5))
Downloading requests-2.10.0-py2.py3-none-any.whl (506kB)
100% |████████████████████████████████| 512kB 1.2MB/s
Collecting jsonschema>=2.5.1 (from datapackage->-r requirements.txt (line 5))
Downloading jsonschema-2.5.1-py2.py3-none-any.whl
Collecting tabulator>=0.4.0 (from datapackage->-r requirements.txt (line 5))
Downloading tabulator-0.4.0-py2.py3-none-any.whl
Collecting jsontableschema>=0.5.1 (from datapackage->-r requirements.txt (line 5))
Downloading jsontableschema-0.6.5-py2.py3-none-any.whl (41kB)
100% |████████████████████████████████| 51kB 9.9MB/s
Collecting Click>=3.3 (from goodtables->-r requirements.txt (line 6))
Downloading click-6.6.tar.gz (283kB)
100% |████████████████████████████████| 286kB 1.5MB/s
Collecting cchardet>=1.0.0 (from goodtables->-r requirements.txt (line 6))
Downloading cchardet-1.0.0.tar.gz (609kB)
100% |████████████████████████████████| 614kB 1.1MB/s
Collecting tellme>=0.2.4 (from goodtables->-r requirements.txt (line 6))
Downloading tellme-0.2.6.tar.gz
Collecting functools32; python_version == "2.7" (from jsonschema>=2.5.1->datapackage->-r requirements.txt (line 5))
Downloading functools32-3.2.3-2.zip
Collecting chardet>=2.0 (from tabulator>=0.4.0->datapackage->-r requirements.txt (line 5))
Downloading chardet-2.3.0.tar.gz (164kB)
100% |████████████████████████████████| 174kB 1.7MB/s
Collecting linear-tsv>=0.99.1 (from tabulator>=0.4.0->datapackage->-r requirements.txt (line 5))
Downloading linear-tsv-0.99.1.tar.gz
Collecting openpyxl>=2.0 (from tabulator>=0.4.0->datapackage->-r requirements.txt (line 5))
Downloading openpyxl-2.3.5.tar.gz (141kB)
100% |████████████████████████████████| 143kB 3.8MB/s
Collecting ijson>=2.0 (from tabulator>=0.4.0->datapackage->-r requirements.txt (line 5))
Downloading ijson-2.3-py2.py3-none-any.whl
Collecting python-dateutil>=2.4.0 (from jsontableschema>=0.5.1->datapackage->-r requirements.txt (line 5))
Downloading python_dateutil-2.5.3-py2.py3-none-any.whl (201kB)
100% |████████████████████████████████| 204kB 1.7MB/s
Collecting rfc3986>=0.3.0 (from jsontableschema>=0.5.1->datapackage->-r requirements.txt (line 5))
Downloading rfc3986-0.3.1-py2.py3-none-any.whl
Collecting future>=0.15.2 (from jsontableschema>=0.5.1->datapackage->-r requirements.txt (line 5))
Downloading future-0.15.2.tar.gz (1.6MB)
100% |████████████████████████████████| 1.6MB 552kB/s
Collecting PyYAML>=3.11 (from tellme>=0.2.4->goodtables->-r requirements.txt (line 6))
Downloading PyYAML-3.11.zip (371kB)
100% |████████████████████████████████| 378kB 1.4MB/s
Collecting dataset>=0.5.5 (from tellme>=0.2.4->goodtables->-r requirements.txt (line 6))
Downloading dataset-0.6.4.tar.gz
Collecting tabulate>=0.7.4 (from tellme>=0.2.4->goodtables->-r requirements.txt (line 6))
Downloading tabulate-0.7.5.tar.gz
Collecting jdcal (from openpyxl>=2.0->tabulator>=0.4.0->datapackage->-r requirements.txt (line 5))
Downloading jdcal-1.2.tar.gz
Collecting et_xmlfile (from openpyxl>=2.0->tabulator>=0.4.0->datapackage->-r requirements.txt (line 5))
Downloading et_xmlfile-1.0.1.tar.gz
Collecting sqlalchemy>=0.9.1 (from dataset>=0.5.5->tellme>=0.2.4->goodtables->-r requirements.txt (line 6))
Downloading SQLAlchemy-1.0.14.tar.gz (4.8MB)
100% |████████████████████████████████| 4.8MB 237kB/s
Collecting alembic>=0.6.2 (from dataset>=0.5.5->tellme>=0.2.4->goodtables->-r requirements.txt (line 6))
Downloading alembic-0.8.7.tar.gz (968kB)
100% |████████████████████████████████| 972kB 846kB/s
Collecting normality>=0.2.2 (from dataset>=0.5.5->tellme>=0.2.4->goodtables->-r requirements.txt (line 6))
Downloading normality-0.2.4-py2-none-any.whl
Collecting Mako (from alembic>=0.6.2->dataset>=0.5.5->tellme>=0.2.4->goodtables->-r requirements.txt (line 6))
Downloading Mako-1.0.4.tar.gz (574kB)
100% |████████████████████████████████| 583kB 1.1MB/s
Collecting python-editor>=0.3 (from alembic>=0.6.2->dataset>=0.5.5->tellme>=0.2.4->goodtables->-r requirements.txt (line 6))
Downloading python-editor-1.0.1.tar.gz
Installing collected packages: six, requests, functools32, jsonschema, chardet, linear-tsv, python-dateutil, rfc3986, future, Click, jsontableschema, jdcal, et-xmlfile, openpyxl, ijson, tabulator, datapackage, cchardet, PyYAML, sqlalchemy, Mako, python-editor, alembic, normality, dataset, tabulate, tellme, goodtables
Found existing installation: six 1.4.1
DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
Uninstalling six-1.4.1:
Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/pip-8.1.2-py2.7.egg/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/Library/Python/2.7/site-packages/pip-8.1.2-py2.7.egg/pip/commands/install.py", line 317, in run
    prefix=options.prefix_path,
  File "/Library/Python/2.7/site-packages/pip-8.1.2-py2.7.egg/pip/req/req_set.py", line 736, in install
    requirement.uninstall(auto_confirm=True)
  File "/Library/Python/2.7/site-packages/pip-8.1.2-py2.7.egg/pip/req/req_install.py", line 742, in uninstall
    paths_to_remove.remove(auto_confirm)
  File "/Library/Python/2.7/site-packages/pip-8.1.2-py2.7.egg/pip/req/req_uninstall.py", line 115, in remove
    renames(path, new_path)
  File "/Library/Python/2.7/site-packages/pip-8.1.2-py2.7.egg/pip/utils/__init__.py", line 267, in renames
    shutil.move(old, new)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 302, in move
    copy2(src, real_dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 131, in copy2
    copystat(src, dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 103, in copystat
    os.chflags(dst, st.st_flags)
OSError: [Errno 1] Operation not permitted: '/var/folders/tk/dblrc8gd4cn2kn04khqd10cm0000gp/T/pip-9oMFA8-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'

Remove POM

POM (Pepco Holdings Inc.) is now a private holding company, so it should be removed from the list.

POM Pepco Holdings Inc. Utilities 26.52 4.07 20.64 1.28 17.35 21.61

Remove constituents-financials?

Hello, cool project. I propose that the constituents-financials file be removed from this data package.

The constituents.csv table is useful in its own right, containing symbol, name, and sector. This file currently still builds, but the overall Makefile is failing for multiple reasons, one of which is that the Yahoo Finance API seems to have changed its terms, or is perhaps going away altogether, which is causing the constituents-financials build to fail.

My assumption is that stock quotes and other financial data will forever depend on free services, which should be considered, at a minimum, less dependable than the Wikipedia dependency for just the constituents data.

The financials are useful, but they could be provided by a separate data package that validates only the financials.

Simplify data validation done in data_test.py

The data_test.py step is failing because the pipeline class is no longer part of the goodtables package. I found the datapackage-pipelines project, which appears to target a different use case than one-off simple validation.

I simply created https://github.com/noahg/s-and-p-500-csv/blob/master/scripts/validate.py as a quick way to check the validity of the newly generated csv (that it conforms to the datapackage.json).

My question, perhaps for @zelima: would my validate.py script suffice for this project going forward?

It's not clear to me what the organization's preference would be, as I'm finding varying validation steps (or none at all) across other, more recently updated datasets. Thanks!

Outdated S&P500 companies

Hi (I am moving this issue to this tracker)

The S&P 500 dataset contains outdated constituents. Maybe there is an issue with the logic that parses Wikipedia and removes eliminated/changed symbols?

E.g. search for DOW (Dow Chemical Company, ticker gone due to merger) or RAI (Reynolds American, ticker gone due to buyout). I think the full list of removed symbols as of now is: AN, BCR, R, DD, SPLS, CHK, BBBY, DLPH, SIG, LVLT, BHI, RIG, DNB, YHOO, DOW, FTR, PDCO, SNI, COH, TSO, MJN, SWN, HAR, PCLN, HCN, MNK, FSLR, TGNA, WFM, URBN, MUR, CBG, LLTC, TDC, RAI.
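The pruning step this issue asks about can be sketched as a simple set difference between the dataset's tickers and a freshly scraped Wikipedia list. `stale_tickers` is a hypothetical helper name; the repository's actual update logic may differ.

```python
def stale_tickers(dataset_symbols, wikipedia_symbols):
    """Return tickers present in the dataset but missing from the current
    Wikipedia constituents list -- candidates for removal."""
    return sorted(set(dataset_symbols) - set(wikipedia_symbols))
```

Running this after each scrape and dropping the returned symbols would keep delisted constituents like DOW and RAI from lingering in the data.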

Relevant data set:

okfn data (containing outdated tickers):
https://pkgstore.datahub.io/core/s-and-p-500-companies/constituents_json/data/b0c0dbabbc66fa902dd40a9e5596263e/constituents_json.json

wikipedia source:
https://en.wikipedia.org/wiki/List_of_S%26P_500_companies

Sorry, I'd try to fix the code, but Python isn't my natural habitat. I have since written a Wikipedia scraper in Java, but that's probably of little help here.

Thanks
