maread99 / market_prices
Get meaningful OHLCV datasets
License: MIT License
NumPy 1.25 requires Python >= 3.9.
Bump minimum version to 3.9, as #9.
It looks like Yahoo Finance blocked all non-US IPs on certain endpoints as of a few days ago.
For example, attempting to hit
https://query2.finance.yahoo.com/v6/finance/quoteSummary/AMD?modules=summaryProfile
from a non-US IP will result in a 404 error.
Hitting it using a US VPN provides the expected result.
Follow along here:
ranaroussi/yfinance#1729
Same issue as for gerrymanoim/exchange_calendars#207 - see notes there.
The 'install and import' check following the upload to the main PyPI repo is NOT installing and importing the uploaded new release, but rather the prior release.
However, the check following the upload to the test PyPI repo does SEEM to be checking the newly uploaded version although, as noted in gerrymanoim/exchange_calendars#207, this may not be the case.
Hi there, I'm trying to test the library but I'm getting this error:
RetryError: HTTPSConnectionPool(host='query2.finance.yahoo.com', port=443): Max retries exceeded with url: /v1/test/getcrumb (Caused by ResponseError('too many 500 error responses'))
PS: this is the first run; other libraries (yfinance) are working as intended.
Any hint?
Bugs with pandas 1.5 need addressing.
In the meantime the dependency is as in requirements.txt, i.e. pandas <= 1.4.4
TODO for advancing minimum python version to 3.9:
- `intervals.py`: `_BaseInterval.daily_bi`, `_BaseInterval.intraday_bis`: `classmethod` decorator will be deprecated.
- `Optional`, `Union`, `List` etc can be removed in favour of builtin types and the `|` operator WITH THE EXCEPTION of not being able to use the `|` operator in parameter annotations of functions decorated with `@valimp.parse` (`get_type_hints` does not support the `|` operator until py 3.10 - noted forward to migrate these to `|` as well when bump min py version to 3.10). (NB everywhere, can still use `dict` as opposed to `Dict`, and `Union[str, None]` as opposed to `Optional[str]`.)
- `@functools.lru_cache(maxsize=None)` to `@functools.cache`
- `pytz` in favor of `zoneinfo`: ... standard library `zoneinfo` as its default timezone package. If it looks like pandas 2.0 could be released any time soon then might want to address this now. If it still looks to be a way off then could roll this item forwards (to a dedicated issue), although in any event pandas already supports `zoneinfo` and would be better off depending on a stdlib library than a third-party one.
- `exchange-calendars`
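A minimal sketch of the kind of changes the TODO items above describe, assuming Python 3.9. The function names are illustrative only and are not taken from `market_prices`.

```python
import functools
from typing import Union


# Builtin generics replace typing.List / typing.Dict from Python 3.9.
def count_by_symbol(symbols: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for symbol in symbols:
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


# Where annotations are resolved at runtime (for example via
# typing.get_type_hints), Union[...] is kept until the minimum is 3.10.
def describe(value: Union[str, None] = None) -> str:
    return "missing" if value is None else value


# functools.cache (new in 3.9) behaves as lru_cache(maxsize=None).
@functools.cache
def square(n: int) -> int:
    return n * n
```

`square` keeps the `cache_info()` and `cache_clear()` helpers, so code that inspects the cache should not need to change.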
Tests in `test_yahoo.py` can be flaky (SEE following comment for update). The module includes integration tests of public methods (for example, `PricesYahoo.get`) that call the Yahoo API. These tests are not isolated; rather, they are susceptible to inconsistent returns from the API, not least the 'missing prices for certain sessions' issue explained here (NB the issue arises as a result of the high frequency of requests associated with executing the full test suite; it does not affect general usage or tests executed in isolation).
If one of these tests fails, either:
- `test_base_prices.py` and redefine to get prices from a locally stored resource.
Over time these flaky tests were weeded out as bugs were caught and the moderation of test inputs was tightened up (for example, through the use of `test_yahoo.get_valid_sessions`). However, some flaky tests WILL remain, and it remains a matter of investigating or migrating any that fail due to missing price data so that another weed can be pulled out.
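As a rough illustration of testing against a locally stored resource rather than the live API, the sketch below parses price rows from an inline CSV string. The data, `load_local_prices` and the test are all hypothetical; in practice the CSV would be a file saved alongside the tests.

```python
import csv
import io

# Hypothetical stand-in for a locally stored resource.
LOCAL_PRICES_CSV = """date,open,high,low,close,volume
2023-01-03,10.0,11.0,9.5,10.5,1000
2023-01-04,10.5,12.0,10.0,11.5,1500
"""


def load_local_prices(text: str) -> list[dict]:
    """Parse OHLCV rows from CSV text, converting numeric fields to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed = {"date": row["date"]}
        for key in ("open", "high", "low", "close", "volume"):
            parsed[key] = float(row[key])
        rows.append(parsed)
    return rows


def test_high_not_below_low():
    # Deterministic: the input never changes between runs.
    for row in load_local_prices(LOCAL_PRICES_CSV):
        assert row["high"] >= row["low"]
```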
Separately, the tests of `test_daterange.py` employ `hypothesis`, which subjects the tests to dynamic inputs. Whilst flaky tests here are now rare, there's always the possibility of a failure on a combination of inputs that `hypothesis` hadn't tried before. Again, it's a matter of investigating there and then and either fixing the code being tested or the test itself.
It would be preferable for documentation to be built with Sphinx or MkDocs and published to a dedicated docs site.
Current documentation consists of:
It's all there, but it would be better presented and easier to navigate if it were on a dedicated docs site.
It hopefully wouldn't be a huge task; the docs files are all .md and the docstrings are (aka 'should be') in numpy format.
Which one, Sphinx or MkDocs? Sphinx is probably more flexible, although would likely involve more work.
Given the number and nature of dev dependencies, dependabot would raise PRs pretty much every day (if it weren't limited to 5). Merging them all would clog up the commit history (I currently ignore them and raise a single PR manually every couple of weeks or so).
I'm thinking the ideal would be:
pip-compile --upgrade pyproject.toml
pip-compile --upgrade --extra=dev --output-file=requirements_dev.txt pyproject.toml
(output location could change if set up dependabot for main requirements only...)
- `market_prices` to fail (as the tests executed when the PR is raised will show as failing).
- `requirements.txt` in a separate directory to the `requirements_dev.txt` file.
- `requirements_dev.txt` to a sub-directory (have a look through the workflows to change the location of any of them that look at it). EDIT - Done #105
- `requirements_dev.txt`. #116 moves `requirements.txt` to a dedicated folder and sets dependabot to look only at that folder.
The following dependabot issues cover ongoing conversations WRT making it easier to define / ignore the lock files that dependabot acts on:
`parsing.parsing_start_end` handles parsing to dates/times of both an `xcals.ExchangeCalendar` and the gregorian calendar. Review whether this would be better handled by independent functions, perhaps with common code refactored out.
`daterange.GetterDaily` accommodates getting dateranges for both daily and monthly prices, although the former is assessed against an `xcals.ExchangeCalendar` and the latter against the gregorian calendar.
Review whether it would be cleaner to define a dedicated `GetterMonthly` class, perhaps with `GetterDaily` and `GetterMonthly` having a common base class.
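One possible shape for such a split, as a hedged sketch only. The `*Sketch` classes and their methods are hypothetical and are not the library's `daterange` API. The daily getter snaps to a supplied list of session dates, standing in for an exchange calendar, while the monthly getter snaps to gregorian month boundaries.

```python
import abc
import datetime as dt


class GetterBaseSketch(abc.ABC):
    """Hypothetical common base: shared validation, calendar-specific snapping."""

    def daterange(self, start: dt.date, end: dt.date) -> tuple[dt.date, dt.date]:
        if start > end:
            raise ValueError("start must not be later than end")
        return self._snap_start(start), self._snap_end(end)

    @abc.abstractmethod
    def _snap_start(self, date: dt.date) -> dt.date: ...

    @abc.abstractmethod
    def _snap_end(self, date: dt.date) -> dt.date: ...


class GetterDailySketch(GetterBaseSketch):
    """Snaps to sessions (a plain list stands in for an exchange calendar)."""

    def __init__(self, sessions: list[dt.date]):
        self._sessions = sorted(sessions)

    def _snap_start(self, date: dt.date) -> dt.date:
        # First session on or after `date`.
        return next(s for s in self._sessions if s >= date)

    def _snap_end(self, date: dt.date) -> dt.date:
        # Last session on or before `date`.
        return next(s for s in reversed(self._sessions) if s <= date)


class GetterMonthlySketch(GetterBaseSketch):
    """Snaps to full gregorian months."""

    def _snap_start(self, date: dt.date) -> dt.date:
        # First day of a month, on or after `date`.
        if date.day == 1:
            return date
        if date.month == 12:
            return dt.date(date.year + 1, 1, 1)
        return dt.date(date.year, date.month + 1, 1)

    def _snap_end(self, date: dt.date) -> dt.date:
        # Last day of a month, on or before `date`.
        if (date + dt.timedelta(days=1)).day == 1:
            return date
        return date.replace(day=1) - dt.timedelta(days=1)
```

The shared `daterange` method is where any code common to both getters would live, which is the refactoring question the issue raises.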
Would it be worth including an option to suppress `market_prices` warnings?
If anyone would like this feature, please 👍 this comment.
Ideal would be to be able to select which warnings to suppress, with an option to suppress all.
Perhaps implement by introducing a config file?
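In the meantime, the standard library `warnings` filters can already do this kind of selective suppression. The sketch below assumes the library's warnings share a common base class. The two warning classes and `noisy` are hypothetical stand-ins defined here for illustration, not names from `market_prices`.

```python
import warnings


class MarketPricesWarningSketch(UserWarning):
    """Hypothetical base class for all of a library's warnings."""


class PricesMissingWarningSketch(MarketPricesWarningSketch):
    """Hypothetical specific warning."""


def noisy() -> int:
    warnings.warn("prices missing for some sessions", PricesMissingWarningSketch)
    return 1


def call_quietly(func, category=MarketPricesWarningSketch):
    """Call `func` with warnings of `category` (and its subclasses) ignored."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category)
        return func()
```

Passing the base class suppresses everything derived from it, while passing a specific subclass suppresses only that warning.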
It seems that the output price dataframe does not include the "adj. close" column from Yahoo Finance (or am I missing it somewhere?). As mentioned in the Yahoo Finance footnote:
"Close price adjusted for splits. Adjusted close price adjusted for splits and dividend and/or capital gain distributions."
So the adjusted close is more critical in some, if not most, cases.
PricesYahoo("VUX.V")
market_prices.errors.CalendarError: Unable to ascertain calendar for symbol 'VUX.V'
https://finance.yahoo.com/quote/VUX.V?p=VUX.V&.tsrc=fin-srch
It seems that ETF symbols are not supported, is there any planned work for that?
#176 added a short-term patch to support pydantic v2. This just detects (in a pretty precarious manner) if user has imported pydantic v2 and if so it imports v1 (which is accessible via v2) over the v2 version. This is not a long-term solution.
Need to migrate to pydantic v2
Should a `lead_symbol = "all"` option be introduced to evaluate the period against a composite of all underlying calendars as opposed to a specific calendar?
The period would be evaluated such that:
- `start` and `end` evaluate to sessions/times of the composite calendar.
Please 👍 this comment if you would find this option useful.
Without giving it too much thought, might require:
- `daterange.Getter_` subclasses to work off a `calendar_utils.CompositeCalendar`? Or would it be necessary to define dedicated `GetterCompDaily` and `GetterCompIntraday` classes?
Would definitely require:
❯ /usr/bin/python3
Python 3.9.6 (default, May 7 2023, 23:32:45)
[Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from market_prices import PricesYahoo
>>> prices = PricesYahoo("MSFT")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ilankleiman/Library/Python/3.9/lib/python/site-packages/valimp/valimp.py", line 935, in wrapped_f
    return f(**new_as_kwargs)
  File "/Users/ilankleiman/Library/Python/3.9/lib/python/site-packages/market_prices/prices/yahoo.py", line 353, in __init__
    calendars = self._ascertain_calendars(calendars)
  File "/Users/ilankleiman/Library/Python/3.9/lib/python/site-packages/market_prices/prices/yahoo.py", line 389, in _ascertain_calendars
    exchange = self._yahoo_exchange_name[s]
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/functools.py", line 969, in __get__
    val = self.func(instance)
  File "/Users/ilankleiman/Library/Python/3.9/lib/python/site-packages/market_prices/prices/yahoo.py", line 369, in _yahoo_exchange_name
    d[s] = self._ticker.price[s]["exchangeName"]
TypeError: string indices must be integers
latest release, 0.11
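The final frame implies that the value looked up for the symbol was a `str` rather than a `dict`, so the `["exchangeName"]` lookup fails with an unhelpful `TypeError`. A defensive check along the following lines would surface the underlying response instead. This is a sketch only: `exchange_names` and the sample payloads are hypothetical and this is not the library's code.

```python
def exchange_names(price_by_symbol: dict) -> dict[str, str]:
    """Map each symbol to its exchange name, failing clearly on non-dict values."""
    names = {}
    for symbol, info in price_by_symbol.items():
        if not isinstance(info, dict):
            # Report whatever came back in place of the expected mapping.
            raise ValueError(
                f"No price metadata for {symbol!r}; received: {info!r}"
            )
        names[symbol] = info["exchangeName"]
    return names
```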
`daterange.GetterIntraday.daterange` is relatively slow and is called multiple times with each call to `prices.base.PricesBase.get` (indeed, an instance of `daterange.GetterIntraday` is created multiple times).
Review method to see if it can be sped up, to include:
- Could calls to `exchange_calendar` methods be made with `_parse=False`? Would it make much of a difference when all added up?
I ran the following example:
from market_prices import PricesYahoo
prices = PricesYahoo("MSFT") # prices for a single instrument, Microsoft
prices.get("5min", minutes=40) # last 40 minutes of prices at 5 minute intervals
and got the following error:
TypeError Traceback (most recent call last)
in
1 from market_prices import PricesYahoo
----> 2 prices = PricesYahoo("MSFT") # prices for a single instrument, Microsoft
3 prices.get("5min", minutes=40) # last 40 minutes of prices at 5 minute intervals
e:\anaconda3\lib\site-packages\pydantic\decorator.cp38-win_amd64.pyd in pydantic.decorator.validate_arguments.validate.wrapper_function()
e:\anaconda3\lib\site-packages\pydantic\decorator.cp38-win_amd64.pyd in pydantic.decorator.ValidatedFunction.call()
e:\anaconda3\lib\site-packages\pydantic\decorator.cp38-win_amd64.pyd in pydantic.decorator.ValidatedFunction.execute()
e:\anaconda3\lib\site-packages\market_prices\prices\yahoo.py in __init__(self, symbols, calendars, lead_symbol, delays)
348 self._cache_vol_bug_adj_start: None | tuple[pd.Timestamp, pd.Timestamp] = None
349 self._set_daily_bi_limit()
--> 350 super().__init__(symbols, calendars, lead_symbol, delays)
351
352 # Methods called via constructor
e:\anaconda3\lib\site-packages\market_prices\prices\base.py in __init__(self, symbols, calendars, lead_symbol, delays)
440 self._set_pdata()
441 self.trading_indexes: dict[BI, pd.IntervalIndex]
--> 442 self._set_trading_indexes()
443 self.indices_aligned: dict[BI, pd.Series]
...
603 # evaluate index as nano array
<__array_function__ internals> in concatenate(*args, **kwargs)
TypeError: concatenate() got an unexpected keyword argument 'dtype'
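This error is consistent with an installed NumPy that predates the `dtype` parameter of `numpy.concatenate`, which was added in NumPy 1.20. The cause is not confirmed in the report. A quick way to check an environment is sketched below; `supports_concatenate_dtype` is a hypothetical helper.

```python
def supports_concatenate_dtype(numpy_version: str) -> bool:
    """True if the given NumPy version accepts concatenate(..., dtype=...)."""
    # Only major and minor are compared, so suffixes such as "rc1" are ignored.
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) >= (1, 20)
```

Typical usage would be `supports_concatenate_dtype(numpy.__version__)`; if it returns `False`, upgrading NumPy is the likely remedy.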
`Optional` and `Union` can now be migrated to using the `|` operator in the following circumstances:
- Parameter annotations of functions decorated with `@valimp.parse` (`get_type_hints` did not support the `|` operator prior to py 3.10). For example, `Optional[Union[str, float, int]]` can now be annotated as `str | float | int | None`.
- `Symbols = Union[list[str], str]` can now be defined as `Symbols = list[str] | str`
- `Optional` and `Union`, including with the docs, and remove / revise.
- `from __future__ import annotations` lines from the top of modules.
TO REVIEW losing the right limit on `data.Data`
`data.Data.rl` (the right limit up to which data can be requested for a specific base interval) is currently set to 'now' + base interval (bi). The `+ bi` allows data that includes the live interval to be queried and requested, BUT only when querying at the bi. If the end of the live interval is queried based on a downsample interval (ds_interval) that is longer than the bi, the data comes up as not available or an error is raised, since it is reasonably assumed that data is not available to the right of the right limit.
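A worked example of the arithmetic, under the simplifying assumption that intervals are aligned to midnight (in practice they would be aligned to the session open). `interval_end` is a hypothetical helper.

```python
import datetime as dt


def interval_end(now: dt.datetime, interval: dt.timedelta) -> dt.datetime:
    """End of the live interval containing `now` (intervals aligned to midnight)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    completed = (now - midnight) // interval  # whole intervals elapsed
    return midnight + (completed + 1) * interval


now = dt.datetime(2023, 6, 1, 10, 32)
bi = dt.timedelta(minutes=5)
ds_interval = dt.timedelta(minutes=15)

right_limit = now + bi                        # 10:37
live_bi_end = interval_end(now, bi)           # 10:35, within the right limit
live_ds_end = interval_end(now, ds_interval)  # 10:45, beyond the right limit
```

The live 5 minute interval ends inside the right limit, whereas the live 15 minute interval ends after it, which is the case that the current workarounds have to handle.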
Accommodating this currently involves three mini hacks in the following methods of `base.py`:
- `_get_bi_table_intraday`
- `_bis_available_end`
- `_bis_available`
If we were able to lose the hard right limit then these methods could be tidied up. There's a comment above the relevant lines in each of these methods.
Any changes would likely involve quite a bit of reworking of the tests for `data.Data`.
Same issue as gerrymanoim/exchange_calendars#207.
The release workflow deploys to PyPI ok, although fails on the subsequent install and import check.
As noted here, looks like an error in the workflow's while loop.
Workflow failed on checking the 0.10.2 release although until then had always passed (never previously having entered the while loop).