
maread99 / market_prices

Get meaningful OHLCV datasets

License: MIT License

Python 100.00%
bitcoin-data datasets financial-data ohlc ohlcv pandas python stock-data yahoo-finance

market_prices's Issues

too many 500 error responses

Hi there, I'm trying to test the library but I'm getting this error:
RetryError: HTTPSConnectionPool(host='query2.finance.yahoo.com', port=443): Max retries exceeded with url: /v1/test/getcrumb (Caused by ResponseError('too many 500 error responses'))

PS: it is the first run; other libraries (e.g. yfinance) are working as intended.

Any hint?

Minimum Python version to 3.9 TODO

TODO for advancing the minimum Python version to 3.9:

intervals.py

  • Look to change the following enum class methods to class properties:
    • _BaseInterval.daily_bi
    • _BaseInterval.intraday_bis
      EDIT 22/06/07 - no longer an option, as from 3.11 wrapping other decorators with the classmethod decorator is deprecated.
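The deprecated pattern can be illustrated with a minimal sketch (the member names below are hypothetical stand-ins, not the actual _BaseInterval definition):

```python
import datetime
import enum

# Chaining @classmethod over @property was the route to a "class
# property" on 3.9/3.10, but that chaining is deprecated from 3.11
# (and removed in 3.13), so the accessors stay as plain classmethods.
class Interval(enum.Enum):  # hypothetical stand-in for _BaseInterval
    T5 = datetime.timedelta(minutes=5)
    D1 = datetime.timedelta(days=1)

    @classmethod
    def daily_bi(cls) -> "Interval":
        return cls.D1

    @classmethod
    def intraday_bis(cls) -> list["Interval"]:
        return [m for m in cls if m.value < datetime.timedelta(days=1)]
```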

pydantic

  • given changes with pydantic v2, migrate to in-house input validation and drop the pydantic dependency

types

  • Typing classes such as Optional, Union, List etc. can be removed in favour of builtin types and the | operator, WITH THE EXCEPTION of parameter annotations of functions decorated with @valimp.parse, where the | operator cannot be used (get_type_hints does not support the | operator until py 3.10 - noted forward to migrate these to | as well when the minimum py version is bumped to 3.10). (NB everywhere, can still use dict as opposed to Dict, and Union[str, None] as opposed to Optional[str].)
    • Also remove these deprecated type generics from all the docs in favour of builtins and the | operator.
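A quick sketch of what does and doesn't change under py 3.9 (PEP 585), with illustrative values:

```python
from typing import Optional, Union

# From py 3.9 the builtins are subscriptable (PEP 585), so typing.Dict,
# typing.List etc. can go everywhere; Union/Optional can only give way
# to the | operator from py 3.10 (PEP 604 / get_type_hints support).
symbols: dict[str, list[str]] = {"MSFT": ["XNYS"]}  # illustrative data

# Optional[X] is merely sugar for Union[X, None], so either spelling works.
assert Optional[str] == Union[str, None]
```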

functools

  • Change any use of @functools.lru_cache(maxsize=None) to @functools.cache
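The two spellings are equivalent; a minimal illustration:

```python
import functools

# functools.cache (added py 3.9) is simply lru_cache(maxsize=None)
# under a shorter name, i.e. an unbounded memoising cache.
@functools.cache
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(30)  # fast thanks to memoisation of the recursive calls
```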

timezones

  • Drop pytz in favour of the zoneinfo standard library (either now or later). It looks likely that pandas 2.0 will switch to zoneinfo as its default timezone package. If it looks like pandas 2.0 could be released any time soon then it might be worth addressing this now; if it looks still to be a way off then this item could be rolled forward (to a dedicated issue). In any event pandas already supports zoneinfo, and it would be better to depend on a stdlib library than a third-party one. See gerrymanoim/exchange_calendars#322. Move over at the same time as exchange-calendars.
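The migration itself is mechanical; a sketch of the pytz → zoneinfo change (timestamp chosen for illustration only):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib from py 3.9

# pytz requires tz.localize(naive_dt); with zoneinfo the tzinfo is
# passed directly to the constructor (or to .replace()).
dt = datetime(2023, 6, 1, 14, 30, tzinfo=ZoneInfo("America/New_York"))
assert dt.utcoffset().total_seconds() == -4 * 3600  # EDT in June
```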

Migration of flaky tests

Tests in test_yahoo.py can be flaky (SEE following comment for update). The module includes integration tests of public methods (for example, PricesYahoo.get) that call the yahoo API. These tests are not isolated; rather, they are susceptible to inconsistent returns from the API, not least the 'missing prices for certain sessions' issue explained here (NB the issue arises as a result of the high frequency of requests associated with executing the full test suite; it does not affect general usage or tests executed in isolation).

If one of these tests fails, either:

  • it's still susceptible to flaky-listed sessions - the test needs revising. EDIT - migrate it to test_base_prices.py and redefine it to get prices from a locally stored resource.
  • it's failing on a nature of inputs that it hasn't been tested against before (or, if it has, the failure wasn't addressed). In which case:
    • either there's a bug in the code that needs fixing.
    • or there's a bug in the test that needs fixing.

Over time these flaky tests were weeded out as bugs were caught and the moderation of test inputs was tightened up (for example, through the use of test_yahoo.get_valid_sessions). However, some flaky tests WILL remain, and it's a matter of investigating and migrating any of them that fail due to missing price data so that another weed can be pulled out.

Separately, tests in test_daterange.py employ hypothesis, which subjects the tests to dynamic inputs. Whilst flaky tests here are now rare, there's always the possibility of a failure on a combination of inputs that hypothesis hadn't tried before. Again, it's a matter of investigating there and then and either fixing the code being tested or the test itself.

Documentation (to a doc site)

It would be preferable for documentation to be built with Sphinx or MkDocs and published to a dedicated docs site.

Current documentation consists of:

  • README.md.
  • .ipynb notebook tutorials in docs/tutorials.
  • various .md files in the docs folder.
  • method docstrings (which are comprehensive).

It's all there, but it would be better presented and easier to navigate if it were on a dedicated docs site.

It hopefully wouldn't be a huge task; the docs files are all .md and the docstrings are (or at least should be) in numpy format.

Which one, Sphinx or MkDocs? Sphinx is probably more flexible, although would likely involve more work.

Calm down dependabot

Given the number and nature of dev dependencies, dependabot would raise PRs pretty much every day (if it weren't limited to 5). Merging them all would clog up the commit history (I currently ignore them and raise a single PR manually every couple of weeks or so).

I'm thinking the ideal would be:

  • Update dependencies once a fortnight, or even monthly, either:
    • as at present, manually, by running the following pip-compile commands and raising a PR with the changes:
      • pip-compile --upgrade pyproject.toml
      • pip-compile --upgrade --extra=dev --output-file=requirements_dev.txt pyproject.toml (output location could change if set up dependabot for main requirements only...)
    • set up a GH workflow to automate the above, set to run once a month (see gerrymanoim/exchange_calendars#238).
  • Use dependabot ONLY for the main requirements (not the dev). Don't merge the PRs that dependabot raises (rely instead on the above) but having dependabot raise the PRs will give an immediate heads up if a dependency upgrade causes market_prices to fail (as the tests executed when the PR is raised will show as failing).
    • This would require having requirements.txt in a separate directory from the requirements_dev.txt file. Probably could just move requirements_dev.txt to a sub-directory (and look through the workflows to change the location in any that reference it). EDIT - Done #105. EDIT - that didn't work: dependabot looks in the configured directory and all its subdirectories, i.e. under the previous arrangement, with the configured directory as the project root, dependabot could still find requirements_dev.txt. #116 moves requirements.txt to a dedicated folder and sets dependabot to look only at that folder. The following dependabot issues cover ongoing conversations WRT making it easier to define / ignore the lock files that dependabot acts on:

Refactor `parsing_start_end`

parsing.parsing_start_end handles parsing to dates/times of both an xcals.ExchangeCalendar and the gregorian calendar. Review whether this would be better handled by independent functions, perhaps with common code refactored out.

Refactor from `GetterDaily` to new `GetterMonthly` class

daterange.GetterDaily accommodates getting dateranges for both daily and monthly prices, although the former is assessed against an xcals.ExchangeCalendar and the latter against the gregorian calendar.

Review whether it would be cleaner to define a dedicated GetterMonthly class, perhaps with GetterDaily and GetterMonthly sharing a common base class.

Make warnings optional?

Would it be worth including an option to suppress market_prices warnings?

If anyone would like this feature, please 👍 this comment.

Ideal would be to be able to select which warnings to suppress, with an option to suppress all.

Perhaps implement by introducing a config file?
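If implemented with the stdlib warnings machinery, suppression could be a one-liner for users. The MarketPricesWarning base class below is an assumption for the sketch, not an existing market_prices name:

```python
import warnings

class MarketPricesWarning(UserWarning):
    """Assumed common base class for market_prices warnings."""

# Suppress only market_prices warnings; other warnings still surface.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.filterwarnings("ignore", category=MarketPricesWarning)
    warnings.warn("library warning", MarketPricesWarning)  # suppressed
    warnings.warn("other warning", UserWarning)            # shown

assert [w.category for w in caught] == [UserWarning]
```

A config file could then simply map warning names to such filter calls at import time.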

Adjusted close price

It seems that the output price dataframe does not include the "adj. close" column from Yahoo Finance (or am I missing it somewhere?). As mentioned in the Yahoo Finance footnote:

"Close price adjusted for splits. Adjusted close price adjusted for splits and dividend and/or capital gain distributions."

So the adjusted close is the more critical price in some, if not most, cases.
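For illustration (made-up numbers), the adjustment back-propagates splits and distributions into earlier closes so that the series is comparable across time:

```python
# Illustrative only: a 2-for-1 split takes effect before the final bar,
# so earlier closes are multiplied by the cumulative factor 0.5 to make
# them comparable with post-split prices.
closes = [100.0, 102.0, 51.5]
factors = [0.5, 0.5, 1.0]  # cumulative split/dividend adjustment per bar
adj_close = [c * f for c, f in zip(closes, factors)]
assert adj_close == [50.0, 51.0, 51.5]
```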

Include a `lead_symbol = "all"` option?

Should a lead_symbol = "all" option be introduced to evaluate the period against a composite of all underlying calendars, as opposed to a specific calendar?

The period would be evaluated such that:

  • start and end evaluate to sessions/times of the composite calendar.
  • durations are evaluated against the minutes / sessions of the composite calendar.
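As a rough sketch of the intended semantics (dates and calendars invented for illustration), a day would be a session of the composite if it is a session of any underlying calendar:

```python
# Invented sessions for two hypothetical exchange calendars.
sessions_a = {"2023-06-01", "2023-06-02"}    # e.g. one exchange
sessions_b = {"2023-06-02", "2023-06-05"}    # e.g. another exchange
composite = sorted(sessions_a | sessions_b)  # union of all sessions
assert composite == ["2023-06-01", "2023-06-02", "2023-06-05"]
```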

Please 👍 this comment if you would find this option useful.

Without giving it too much thought, implementing this might require:

  • Could an option be added to existing daterange.Getter_ subclasses to work off a calendar_utils.CompositeCalendar? Or would it be necessary to define dedicated GetterCompDaily and GetterCompIntraday classes?

Would definitely require:

  • updating relevant docs.
  • updating relevant tutorials.

TypeError: string indices must be integers

โฏ /usr/bin/python3
Python 3.9.6 (default, May  7 2023, 23:32:45)
[Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from market_prices import PricesYahoo
>>> prices = PricesYahoo("MSFT")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ilankleiman/Library/Python/3.9/lib/python/site-packages/valimp/valimp.py", line 935, in wrapped_f
    return f(**new_as_kwargs)
  File "/Users/ilankleiman/Library/Python/3.9/lib/python/site-packages/market_prices/prices/yahoo.py", line 353, in __init__
    calendars = self._ascertain_calendars(calendars)
  File "/Users/ilankleiman/Library/Python/3.9/lib/python/site-packages/market_prices/prices/yahoo.py", line 389, in _ascertain_calendars
    exchange = self._yahoo_exchange_name[s]
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/functools.py", line 969, in __get__
    val = self.func(instance)
  File "/Users/ilankleiman/Library/Python/3.9/lib/python/site-packages/market_prices/prices/yahoo.py", line 369, in _yahoo_exchange_name
    d[s] = self._ticker.price[s]["exchangeName"]
TypeError: string indices must be integers

latest release, 0.11

Review `daterange.GetterIntraday.daterange` for performance

daterange.GetterIntraday.daterange is relatively slow and is called multiple times with each call to prices.base.PricesBase.get (indeed, an instance of daterange.GetterIntraday is created multiple times).

Review the method to see if it can be sped up, including:

  • can calls to exchange_calendar methods be made with _parse=False? Would it make much of a difference when all added up?

Define proxies for `PricesYahoo`

I ran the following example:

from market_prices import PricesYahoo
prices = PricesYahoo("MSFT")  # prices for a single instrument, Microsoft
prices.get("5min", minutes=40)  # last 40 minutes of prices at 5 minute intervals

and got the following error:

TypeError Traceback (most recent call last)
in
1 from market_prices import PricesYahoo
----> 2 prices = PricesYahoo("MSFT") # prices for a single instrument, Microsoft
3 prices.get("5min", minutes=40) # last 40 minutes of prices at 5 minute intervals

e:\anaconda3\lib\site-packages\pydantic\decorator.cp38-win_amd64.pyd in pydantic.decorator.validate_arguments.validate.wrapper_function()

e:\anaconda3\lib\site-packages\pydantic\decorator.cp38-win_amd64.pyd in pydantic.decorator.ValidatedFunction.call()

e:\anaconda3\lib\site-packages\pydantic\decorator.cp38-win_amd64.pyd in pydantic.decorator.ValidatedFunction.execute()

e:\anaconda3\lib\site-packages\market_prices\prices\yahoo.py in __init__(self, symbols, calendars, lead_symbol, delays)
348 self._cache_vol_bug_adj_start: None | tuple[pd.Timestamp, pd.Timestamp] = None
349 self._set_daily_bi_limit()
--> 350 super().__init__(symbols, calendars, lead_symbol, delays)
351
352 # Methods called via constructor

e:\anaconda3\lib\site-packages\market_prices\prices\base.py in __init__(self, symbols, calendars, lead_symbol, delays)
440 self._set_pdata()
441 self.trading_indexes: dict[BI, pd.IntervalIndex]
--> 442 self._set_trading_indexes()
443 self.indices_aligned: dict[BI, pd.Series]
...
603 # evaluate index as nano array

<__array_function__ internals> in concatenate(*args, **kwargs)

TypeError: concatenate() got an unexpected keyword argument 'dtype'

Minimum Python version to 3.10 TODO

type annotations

  • Use of Optional and Union can now be migrated to using the | operator in the following circumstances:
    • parameter annotations of functions decorated with @valimp.parse (get_type_hints did not support the | operator prior to py 3.10). For example, Optional[Union[str, float, int]] can now be annotated as str | float | int | None.
    • creating type aliases, for example Symbols = Union[list[str], str] can now be defined as Symbols = list[str] | str
    • Indeed, search across the library for any references to Optional and Union, including within the docs, and remove / revise.
  • Should also be able to lose the from __future__ import annotations lines from the top of modules.
  • Can probably now revert the fix implemented to accommodate the bug in pandas 2.1.0 concerning treatment of origin. The intention in pandas-dev/pandas#55064 was to revert in 2.1.1. The changes required to revert the fix are noted in comments on the commit.

Lose hard right limit on data.Data?

TO REVIEW losing the right limit on data.Data

data.Data.rl (the right limit up to which data can be requested for a specific base interval) is currently set to 'now' + base interval (bi). The + bi provides for querying and requesting data that includes the live interval - BUT only if querying for the bi. If the end of the 'now' live interval is queried based on a downsample interval (ds_interval) that's longer than the bi then the data comes up as not available, or an error is raised - the code reasonably assumes that data is not available beyond the right limit.
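A toy illustration of the timing (all values invented for the sketch):

```python
from datetime import datetime, timedelta, timezone

bi = timedelta(minutes=5)            # base interval
ds_interval = timedelta(minutes=15)  # downsample interval
now = datetime(2023, 6, 1, 14, 32, tzinfo=timezone.utc)
rl = now + bi                        # right limit, i.e. 14:37

# The live 15-minute interval runs 14:30-14:45; its end lies beyond the
# right limit, so a query for it looks like a request for unavailable data.
live_ds_start = datetime(2023, 6, 1, 14, 30, tzinfo=timezone.utc)
live_ds_end = live_ds_start + ds_interval
assert live_ds_end > rl
```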

Accommodating this currently involves three mini hacks in the following methods of base.py:

  • _get_bi_table_intraday
  • _bis_available_end
  • _bis_available

If the hard right limit could be lost then these methods could be tidied up. There's a comment above the relevant lines in each of these methods.

Any changes would likely involve quite a bit of reworking the tests for data.Data.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.