
stefan-jansen / machine-learning-for-trading


Code for Machine Learning for Algorithmic Trading, 2nd edition.

Home Page: https://ml4trading.io

Topics: machine-learning, trading, investment, finance, data-science, investment-strategies, artificial-intelligence, trading-strategies, deep-learning, synthetic-data

machine-learning-for-trading's Introduction

ML for Trading - 2nd Edition

This book aims to show how ML can add value to algorithmic trading strategies in a practical yet comprehensive way. It covers a broad range of ML techniques from linear regression to deep reinforcement learning and demonstrates how to build, backtest, and evaluate a trading strategy driven by model predictions.

In four parts with 23 chapters plus an appendix, its more than 800 pages cover:

  • important aspects of data sourcing, financial feature engineering, and portfolio management,
  • the design and evaluation of long-short strategies based on supervised and unsupervised ML algorithms,
  • how to extract tradeable signals from financial text data like SEC filings, earnings call transcripts or financial news,
  • how to use deep learning models like CNNs and RNNs with market and alternative data, generate synthetic data with generative adversarial networks, and train a trading agent using deep reinforcement learning

This repo contains over 150 notebooks that put the concepts, algorithms, and use cases discussed in the book into action. They provide numerous examples that show:

  • how to work with and extract signals from market, fundamental and alternative text and image data,
  • how to train and tune models that predict returns for different asset classes and investment horizons, including how to replicate recently published research, and
  • how to design, backtest, and evaluate trading strategies.

We highly recommend reviewing the notebooks while reading the book; they are usually in an executed state and often contain additional information omitted from the book due to space constraints.

In addition to the information in this repo, the book's website contains chapter summaries and additional information.

Join the ML4T Community!

To make it easy for readers to ask questions about the book's content and code examples, as well as to discuss the development and implementation of their own strategies and industry developments, we host an online platform.

Please join our community and connect with fellow traders interested in leveraging ML for trading strategies, share your experience, and learn from each other!

What's new in the 2nd Edition?

First and foremost, this book demonstrates how you can extract signals from a diverse set of data sources and design trading strategies for different asset classes using a broad range of supervised, unsupervised, and reinforcement learning algorithms. It also provides relevant mathematical and statistical knowledge to facilitate the tuning of an algorithm or the interpretation of the results. Furthermore, it covers the financial background that will help you work with market and fundamental data, extract informative features, and manage the performance of a trading strategy.

From a practical standpoint, the 2nd edition aims to equip you with the conceptual understanding and tools to develop your own ML-based trading strategies. To this end, it frames ML as a critical element in a process rather than a standalone exercise, introducing the end-to-end ML for trading workflow from data sourcing, feature engineering, and model optimization to strategy design and backtesting.

More specifically, the ML4T workflow starts with generating ideas for a well-defined investment universe, collecting relevant data, and extracting informative features. It also involves designing, tuning, and evaluating ML models suited to the predictive task. Finally, it requires developing trading strategies to act on the models' predictive signals, as well as simulating and evaluating their performance on historical data using a backtesting engine. Once you decide to execute an algorithmic strategy in a real market, you will find yourself iterating over this workflow repeatedly to incorporate new information and a changing environment.

The second edition's emphasis on the ML4T workflow translates into a new chapter on strategy backtesting, a new appendix describing over 100 different alpha factors, and many new practical applications. We have also rewritten most of the existing content for clarity and readability.

The trading applications now use a broader range of data sources beyond daily US equity prices, including international stocks and ETFs. The book also demonstrates how to use ML for an intraday strategy with minute-frequency equity data. Furthermore, it extends the coverage of alternative data sources to include SEC filings for sentiment analysis and return forecasts, as well as satellite images to classify land use.

Another innovation of the second edition is to replicate several trading applications recently published in top journals.

All applications now use the latest available (at the time of writing) software versions such as pandas 1.0 and TensorFlow 2.2. There is also a customized version of Zipline that makes it easy to include machine learning model predictions when designing a trading strategy.

Installation, data sources and bug reports

The code examples rely on a wide range of Python libraries from the data science and finance domains.

It is not necessary to try to install all libraries at once because this increases the likelihood of encountering version conflicts. Instead, we recommend that you install the libraries required for a specific chapter as you go along.

Update March 2022: zipline-reloaded, pyfolio-reloaded, alphalens-reloaded, and empyrical-reloaded are now available on the conda-forge channel. The channel ml4t only contains outdated versions and will soon be removed.

Update April 2021: with the update of Zipline, it is no longer necessary to use Docker. The installation instructions now refer to OS-specific environment files that should make it easier to run the notebooks.

Update February 2021: code sample release 2.0 updates the conda environments provided by the Docker image to Python 3.8, Pandas 1.2, and TensorFlow 2.2, among others; the Zipline backtesting environment now uses Python 3.6.

  • The installation directory contains detailed instructions on setting up and using a Docker image to run the notebooks. It also contains configuration files for setting up various conda environments and installing the packages used in the notebooks directly on your machine, if you prefer (and, depending on your system, are prepared to go the extra mile).
  • To download and preprocess many of the data sources used in this book, see the instructions in the README file alongside various notebooks in the data directory.

If you have any difficulties installing the environments, downloading the data, or running the code, please raise a GitHub issue in this repo; see GitHub's documentation on working with issues.

Update: You can download the algoseek data used in the book here. See instructions for preprocessing in Chapter 2 and an intraday example with a gradient boosting model in Chapter 12.

Update: The figures directory contains color versions of the charts used in the book.

Outline & Chapter Summary

The book has four parts that address different challenges: sourcing and working with market, fundamental, and alternative data, developing ML solutions to various predictive tasks in the trading context, and designing and evaluating a trading strategy that relies on predictive signals generated by an ML model.

The directory for each chapter contains a README with additional information on content, code examples and additional resources.

Part 1: From Data to Strategy Development

Part 2: Machine Learning for Trading: Fundamentals

Part 3: Natural Language Processing for Trading

Part 4: Deep & Reinforcement Learning

Part 1: From Data to Strategy Development

The first part provides a framework for developing trading strategies driven by machine learning (ML). It focuses on the data that power the ML algorithms and strategies discussed in this book, outlines how to engineer and evaluate features suitable for ML models, and shows how to manage and measure a portfolio's performance while executing a trading strategy.

01 Machine Learning for Trading: From Idea to Execution

This chapter explores industry trends that have led to the emergence of ML as a source of competitive advantage in the investment industry. We will also look at where ML fits into the investment process to enable algorithmic trading strategies.

More specifically, it covers the following topics:

  • Key trends behind the rise of ML in the investment industry
  • The design and execution of a trading strategy that leverages ML
  • Popular use cases for ML in trading

02 Market & Fundamental Data: Sources and Techniques

This chapter shows how to work with market and fundamental data and describes critical aspects of the environment that they reflect. For example, familiarity with various order types and the trading infrastructure matters not only for the interpretation of the data but also for the correct design of backtest simulations. We also illustrate how to use Python to access and manipulate trading and financial statement data.

Practical examples demonstrate how to work with trading data from NASDAQ tick data and Algoseek minute bar data with a rich set of attributes capturing the demand-supply dynamic that we will later use for an ML-based intraday strategy. We also cover various data provider APIs and how to source financial statement information from the SEC.

In particular, this chapter covers:

  • How market data reflects the structure of the trading environment
  • Working with intraday trade and quotes data at minute frequency
  • Reconstructing the limit order book from tick data using NASDAQ ITCH
  • Summarizing tick data using various types of bars
  • Working with eXtensible Business Reporting Language (XBRL)-encoded electronic filings
  • Parsing and combining market and fundamental data to create a P/E series
  • How to access various market and fundamental data sources using Python
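
As a minimal, self-contained illustration of bar construction (a sketch on simulated ticks, not the book's NASDAQ ITCH or Algoseek code), the following pandas snippet aggregates trades into one-minute OHLCV time bars:

import numpy as np
import pandas as pd

# Simulate a stream of ticks: a random-walk price and random trade sizes
rng = np.random.default_rng(42)
n = 10_000
ticks = pd.DataFrame(
    {"price": 100 + rng.standard_normal(n).cumsum() * 0.01,
     "size": rng.integers(1, 100, n)},
    index=pd.date_range("2021-01-04 09:30", periods=n, freq="100ms"))

# Resample the ticks into one-minute bars: OHLC for price, sum for volume
bars = ticks.price.resample("1min").ohlc()
bars["volume"] = ticks["size"].resample("1min").sum()
print(bars.head())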

03 Alternative Data for Finance: Categories and Use Cases

This chapter outlines categories and use cases of alternative data, describes criteria to assess the exploding number of sources and providers, and summarizes the current market landscape.

It also demonstrates how to create alternative data sets by scraping websites, such as collecting earnings call transcripts for use with natural language processing (NLP) and sentiment analysis algorithms in the third part of the book.

More specifically, this chapter covers:

  • Which new sources of signals have emerged during the alternative data revolution
  • How individuals, businesses, and sensors generate a diverse set of alternative data
  • Important categories and providers of alternative data
  • Evaluating how the burgeoning supply of alternative data can be used for trading
  • Working with alternative data in Python, such as by scraping the internet

04 Financial Feature Engineering: How to Research Alpha Factors

If you are already familiar with ML, you know that feature engineering is a crucial ingredient for successful predictions. It matters at least as much in the trading domain, where academic and industry researchers have investigated for decades what drives asset markets and prices, and which features help to explain or predict price movements.

This chapter outlines the key takeaways of this research as a starting point for your own quest for alpha factors. It also presents essential tools to compute and test alpha factors, highlighting how the NumPy, pandas, and TA-Lib libraries facilitate the manipulation of data, and introduces popular smoothing techniques like wavelets and the Kalman filter that help reduce noise in data. After reading it, you will know about:

  • Which categories of factors exist, why they work, and how to measure them,
  • Creating alpha factors using NumPy, pandas, and TA-Lib,
  • How to de-noise data using wavelets and the Kalman filter,
  • Using Zipline to test individual and multiple alpha factors,
  • How to use Alphalens to evaluate predictive performance.
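
For a flavor of the factor research workflow, here is a minimal sketch on simulated prices (the notebooks use TA-Lib, Zipline, and Alphalens on real data): it computes a 12-month momentum factor that skips the most recent month and measures its daily rank information coefficient (IC) against one-month forward returns:

import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Simulate daily prices for 50 stocks
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=500, freq="B")
prices = pd.DataFrame(
    100 * np.exp(rng.standard_normal((500, 50)).cumsum(axis=0) * 0.01),
    index=dates, columns=[f"S{i}" for i in range(50)])

# Factor: trailing 252-day return, skipping the most recent 21 days
momentum = prices.shift(21).pct_change(231)
# Target: 21-day forward return
fwd_returns = prices.pct_change(21).shift(-21)

# Daily cross-sectional rank correlation between factor and forward return
ic = pd.Series({d: spearmanr(momentum.loc[d], fwd_returns.loc[d])[0]
                for d in dates[300:-21]})
print(f"mean IC: {ic.mean():.3f}")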

05 Portfolio Optimization and Performance Evaluation

Alpha factors generate signals that an algorithmic strategy translates into trades, which, in turn, produce long and short positions. The returns and risk of the resulting portfolio determine whether the strategy meets the investment objectives.

There are several approaches to optimize portfolios. These include the application of machine learning (ML) to learn hierarchical relationships among assets and treat them as complements or substitutes when designing the portfolio's risk profile. This chapter covers:

  • How to measure portfolio risk and return
  • Managing portfolio weights using mean-variance optimization and alternatives
  • Using machine learning to optimize asset allocation in a portfolio context
  • Simulating trades and creating a portfolio based on alpha factors using Zipline
  • How to evaluate portfolio performance using pyfolio
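
For a sense of what mean-variance optimization involves, here is a minimal scipy sketch on simulated returns (the notebooks use Zipline and pyfolio for the full workflow): it maximizes the Sharpe ratio subject to long-only, fully invested weights:

import numpy as np
from scipy.optimize import minimize

# Simulated daily returns for four assets
rng = np.random.default_rng(1)
returns = rng.normal(0.0005, 0.01, size=(1_000, 4))
mu, cov = returns.mean(axis=0), np.cov(returns.T)

def neg_sharpe(w, mu=mu, cov=cov):
    # negative annualized Sharpe ratio, so that minimizing maximizes Sharpe
    ret = w @ mu * 252
    vol = np.sqrt(w @ cov @ w * 252)
    return -ret / vol

n = len(mu)
res = minimize(
    neg_sharpe,
    x0=np.full(n, 1 / n),                                      # start equal-weighted
    bounds=[(0, 1)] * n,                                       # long-only
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1})  # fully invested
print("optimal weights:", res.x.round(3))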

Part 2: Machine Learning for Trading: Fundamentals

The second part covers the fundamental supervised and unsupervised learning algorithms and illustrates their application to trading strategies. It also introduces the Quantopian platform that allows you to leverage and combine the data and ML techniques developed in this book to implement algorithmic strategies that execute trades in live markets.

06 The Machine Learning Process

This chapter kicks off Part 2 that illustrates how you can use a range of supervised and unsupervised ML models for trading. We will explain each model's assumptions and use cases before we demonstrate relevant applications using various Python libraries.

There are several aspects that many of these models and their applications have in common. This chapter covers these common aspects so that we can focus on model-specific usage in the following chapters. It sets the stage by outlining how to formulate, train, tune, and evaluate the predictive performance of ML models as a systematic workflow. The content includes:

  • How supervised and unsupervised learning from data works
  • Training and evaluating supervised learning models for regression and classification tasks
  • How the bias-variance trade-off impacts predictive performance
  • How to diagnose and address prediction errors due to overfitting
  • Using cross-validation to optimize hyperparameters with a focus on time-series data
  • Why financial data requires additional attention when testing out-of-sample
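
A minimal sketch of the walk-forward cross-validation idea, using scikit-learn's TimeSeriesSplit, which trains only on observations that precede each validation window and thereby avoids look-ahead bias:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)  # stand-in for time-ordered features
for fold, (train_idx, val_idx) in enumerate(TimeSeriesSplit(n_splits=5).split(X)):
    # each training set ends strictly before its validation window begins
    print(f"fold {fold}: train ends at {train_idx[-1]}, "
          f"validate on {val_idx[0]}-{val_idx[-1]}")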

07 Linear Models: From Risk Factors to Return Forecasts

Linear models are standard tools for inference and prediction in regression and classification contexts. Numerous widely used asset pricing models rely on linear regression. Regularized models like Ridge and Lasso regression often yield better predictions by limiting the risk of overfitting. Typical regression applications identify risk factors that drive asset returns to manage risks or predict returns. Classification problems, on the other hand, include directional price forecasts.

Chapter 07 covers the following topics:

  • How linear regression works and which assumptions it makes
  • Training and diagnosing linear regression models
  • Using linear regression to predict stock returns
  • Using regularization to improve predictive performance
  • How logistic regression works
  • Converting a regression into a classification problem
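
To illustrate, here is a minimal ridge regression sketch on simulated returns, a stand-in for the chapter's stock-return models, which use real data and richer features:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
returns = rng.normal(0, 0.01, 1_000)

# Features: the five most recent lagged returns; target: the next-day return
X = np.column_stack([np.roll(returns, lag) for lag in range(1, 6)])[5:-1]
y = returns[6:]

split = 800  # chronological split: train on the past, test on the future
model = Ridge(alpha=1.0).fit(X[:split], y[:split])
print("out-of-sample R^2:", r2_score(y[split:], model.predict(X[split:])))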

08 The ML4T Workflow: From Model to Strategy Backtesting

This chapter presents an end-to-end perspective on designing, simulating, and evaluating a trading strategy driven by an ML algorithm. We will demonstrate in detail how to backtest an ML-driven strategy in a historical market context using the Python libraries backtrader and Zipline. The ML4T workflow ultimately aims to gather evidence from historical data that helps decide whether to deploy a candidate strategy in a live market and put financial resources at risk. A realistic simulation of your strategy needs to faithfully represent how security markets operate and how trades execute. Also, several methodological aspects require attention to avoid biased results and false discoveries that will lead to poor investment decisions.

More specifically, after working through this chapter you will be able to:

  • Plan and implement end-to-end strategy backtesting
  • Understand and avoid critical pitfalls when implementing backtests
  • Discuss the advantages and disadvantages of vectorized vs event-driven backtesting engines
  • Identify and evaluate the key components of an event-driven backtester
  • Design and execute the ML4T workflow using data sources at minute and daily frequencies, with ML models trained separately or as part of the backtest
  • Use Zipline and backtrader to design and evaluate your own strategies
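
To convey the vectorized flavor at toy scale (event-driven engines like Zipline and backtrader model execution far more faithfully), here is a minimal sketch that goes long after an up day on simulated prices; note the shift that prevents trading on information before it exists:

import numpy as np
import pandas as pd

# Simulated price series
rng = np.random.default_rng(13)
prices = pd.Series(100 * np.exp(rng.normal(0, 0.01, 1_000).cumsum()))
returns = prices.pct_change()

# Signal: long after an up day, flat otherwise
signal = (returns > 0).astype(int)
# Shift the signal so each day trades on yesterday's information only
strategy_returns = signal.shift(1) * returns
print("cumulative return:", round((1 + strategy_returns.fillna(0)).prod(), 3))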

09 Time Series Models for Volatility Forecasts and Statistical Arbitrage

This chapter focuses on models that extract signals from a time series' history to predict future values for the same time series. Time series models are in widespread use due to the time dimension inherent to trading. It presents tools to diagnose time series characteristics such as stationarity and extract features that capture potentially useful patterns. It also introduces univariate and multivariate time series models to forecast macro data and volatility patterns. Finally, it explains how cointegration identifies common trends across time series and shows how to develop a pairs trading strategy based on this crucial concept.

In particular, it covers:

  • How to use time-series analysis to prepare and inform the modeling process
  • Estimating and diagnosing univariate autoregressive and moving-average models
  • Building autoregressive conditional heteroskedasticity (ARCH) models to predict volatility
  • How to build multivariate vector autoregressive models
  • Using cointegration to develop a pairs trading strategy
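
As a minimal illustration of the cointegration test underlying pairs trading, the following statsmodels sketch applies the Engle-Granger test to two simulated series that share a common stochastic trend:

import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(3)
common_trend = rng.standard_normal(1_000).cumsum()     # shared random walk
asset_a = common_trend + rng.standard_normal(1_000)    # noisy exposure to it
asset_b = 0.8 * common_trend + rng.standard_normal(1_000)

t_stat, p_value, _ = coint(asset_a, asset_b)
print(f"p-value: {p_value:.4f}")  # a small p-value suggests a cointegrated pair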

10 Bayesian ML: Dynamic Sharpe Ratios and Pairs Trading

Bayesian statistics allows us to quantify uncertainty about future events and refine estimates in a principled way as new information arrives. This dynamic approach adapts well to the evolving nature of financial markets. Bayesian approaches to ML enable new insights into the uncertainty around statistical metrics, parameter estimates, and predictions. The applications range from more granular risk management to dynamic updates of predictive models that incorporate changes in the market environment.

More specifically, this chapter covers:

  • How Bayesian statistics applies to machine learning
  • Probabilistic programming with PyMC3
  • Defining and training machine learning models using PyMC3
  • How to run state-of-the-art sampling methods to conduct approximate inference
  • Bayesian ML applications to compute dynamic Sharpe ratios, dynamic pairs trading hedge ratios, and estimate stochastic volatility
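
A minimal PyMC3 sketch of the Bayesian approach, with toy priors and simulated returns (far simpler than the chapter's models): it estimates a posterior distribution for an annualized Sharpe ratio rather than a point estimate:

import numpy as np
import pymc3 as pm

# One year of simulated daily strategy returns
rng = np.random.default_rng(4)
returns = rng.normal(0.001, 0.02, 250)

with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=0.01)      # prior on the mean return
    sigma = pm.HalfNormal("sigma", sigma=0.05)  # prior on volatility
    pm.Normal("obs", mu=mu, sigma=sigma, observed=returns)
    pm.Deterministic("sharpe", mu / sigma * np.sqrt(252))
    trace = pm.sample(1_000, tune=1_000)

print("posterior mean annualized Sharpe:", trace["sharpe"].mean().round(2))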

11 Random Forests: A Long-Short Strategy for Japanese Stocks

This chapter applies decision trees and random forests to trading. Decision trees learn rules from data that encode nonlinear input-output relationships. We show how to train a decision tree to make predictions for regression and classification problems, visualize and interpret the rules learned by the model, and tune the model's hyperparameters to optimize the bias-variance tradeoff and prevent overfitting.

The second part of the chapter introduces ensemble models that combine multiple decision trees in a randomized fashion to produce a single prediction with a lower error. It concludes with a long-short strategy for Japanese equities based on trading signals generated by a random forest model.

In short, this chapter covers:

  • Use decision trees for regression and classification
  • Gain insights from decision trees and visualize the rules learned from the data
  • Understand why ensemble models tend to deliver superior results
  • Use bootstrap aggregation to address the overfitting challenges of decision trees
  • Train, tune, and interpret random forests
  • Employ a random forest to design and evaluate a profitable trading strategy
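
A minimal scikit-learn sketch of the pattern, with simulated features standing in for the chapter's Japanese equity data: a random forest classifying the direction of next-day returns:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(8)
X = rng.standard_normal((2_000, 10))                      # lagged features
# Direction label driven by a noisy combination of two features
y = (X[:, 0] - X[:, 1] + rng.standard_normal(2_000) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=0)
clf.fit(X[:1_500], y[:1_500])                             # train on the past
print("holdout accuracy:", clf.score(X[1_500:], y[1_500:]).round(3))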

12 Boosting your Trading Strategy

Gradient boosting is an alternative tree-based ensemble algorithm that often produces better results than random forests. The critical difference is that boosting modifies the data used to train each tree based on the cumulative errors made by the model. While random forests train many trees independently using random subsets of the data, boosting proceeds sequentially and reweights the data. This chapter shows how state-of-the-art libraries achieve impressive performance and apply boosting to both daily and high-frequency data to backtest an intraday trading strategy.

More specifically, we will cover the following topics:

  • How boosting differs from bagging, and how gradient boosting evolved from adaptive boosting
  • Designing and tuning adaptive and gradient boosting models with scikit-learn
  • Building, optimizing, and evaluating gradient boosting models on large datasets with the state-of-the-art implementations XGBoost, LightGBM, and CatBoost
  • Interpreting and gaining insights from gradient boosting models using SHAP values
  • Using boosting with high-frequency data to design an intraday strategy
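
A minimal LightGBM sketch on simulated features with a faint nonlinear signal (the chapter tunes XGBoost, LightGBM, and CatBoost on real market data):

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(5)
X = rng.standard_normal((5_000, 10))
# Target: a noisy nonlinear interaction of two features
y = (X[:, 0] * X[:, 1] + rng.standard_normal(5_000) > 0).astype(int)

train_set = lgb.Dataset(X[:4_000], label=y[:4_000])
params = {"objective": "binary", "learning_rate": 0.1,
          "num_leaves": 31, "verbose": -1}
booster = lgb.train(params, train_set, num_boost_round=100)

preds = booster.predict(X[4_000:])
print(f"holdout accuracy: {((preds > 0.5) == y[4_000:]).mean():.2%}")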

13 Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning

Dimensionality reduction and clustering are the main tasks for unsupervised learning:

  • Dimensionality reduction transforms the existing features into a new, smaller set while minimizing the loss of information. A broad range of algorithms exists that differ by how they measure the loss of information, whether they apply linear or non-linear transformations or the constraints they impose on the new feature set.
  • Clustering algorithms identify and group similar observations or features instead of identifying new features. Algorithms differ in how they define the similarity of observations and their assumptions about the resulting groups.

More specifically, this chapter covers:

  • How principal and independent component analysis (PCA and ICA) perform linear dimensionality reduction
  • Identifying data-driven risk factors and eigenportfolios from asset returns using PCA
  • Effectively visualizing nonlinear, high-dimensional data using manifold learning
  • Using t-SNE and UMAP to explore high-dimensional image data
  • How k-means, hierarchical, and density-based clustering algorithms work
  • Using agglomerative clustering to build robust portfolios with hierarchical risk parity
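
A minimal sketch of the data-driven risk factor idea: applying PCA to simulated returns that contain one common market factor; the first component's normalized loadings form an "eigenportfolio":

import numpy as np
from sklearn.decomposition import PCA

# Simulate 30 assets exposed to a single common market factor plus noise
rng = np.random.default_rng(6)
market = rng.normal(0.0005, 0.01, 1_000)
betas = rng.uniform(0.5, 1.5, 30)
returns = np.outer(market, betas) + rng.normal(0, 0.005, (1_000, 30))

pca = PCA(n_components=5).fit(returns)
print("variance explained:", pca.explained_variance_ratio_.round(3))

# Normalized loadings of the dominant component give an eigenportfolio
eigenportfolio = pca.components_[0] / pca.components_[0].sum()
print("first eigenportfolio weights:", eigenportfolio.round(3))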

Part 3: Natural Language Processing for Trading

Text data are rich in content, yet unstructured in format and hence require more preprocessing so that a machine learning algorithm can extract the potential signal. The critical challenge consists of converting text into a numerical format for use by an algorithm, while simultaneously expressing the semantics or meaning of the content.

The next three chapters cover several techniques that capture language nuances readily understandable to humans so that machine learning algorithms can also interpret them.

14 Text Data for Trading: Sentiment Analysis

Text data is very rich in content but highly unstructured so that it requires more preprocessing to enable an ML algorithm to extract relevant information. A key challenge consists of converting text into a numerical format without losing its meaning. This chapter shows how to represent documents as vectors of token counts by creating a document-term matrix that, in turn, serves as input for text classification and sentiment analysis. It also introduces the Naive Bayes algorithm and compares its performance to linear and tree-based models.

In particular, this chapter covers:

  • What the fundamental NLP workflow looks like
  • How to build a multilingual feature extraction pipeline using spaCy and TextBlob
  • Performing NLP tasks like part-of-speech tagging or named entity recognition
  • Converting tokens to numbers using the document-term matrix
  • Classifying news using the naive Bayes model
  • How to perform sentiment analysis using different ML algorithms
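
The document-term matrix plus naive Bayes pattern in miniature, on a made-up four-document corpus (the chapter applies it to financial news at scale):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = [
    "earnings beat expectations and shares rallied",
    "profit warning sent the stock tumbling",
    "record revenue lifted guidance for the year",
    "the company missed estimates and cut its outlook",
]
labels = [1, 0, 1, 0]  # 1 = positive sentiment, 0 = negative

dtm = CountVectorizer().fit(docs)  # maps tokens to count-vector columns
clf = MultinomialNB().fit(dtm.transform(docs), labels)
print(clf.predict(dtm.transform(["shares rallied on record revenue"])))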

15 Topic Modeling: Summarizing Financial News

This chapter uses unsupervised learning to model latent topics and extract hidden themes from documents. These themes can generate detailed insights into a large corpus of financial reports. Topic models automate the creation of sophisticated, interpretable text features that, in turn, can help extract trading signals from extensive collections of texts. They speed up document review, enable the clustering of similar documents, and produce annotations useful for predictive modeling. Applications include identifying critical themes in company disclosures, earnings call transcripts or contracts, and annotation based on sentiment analysis or using returns of related assets.

More specifically, it covers:

  • How topic modeling has evolved, what it achieves, and why it matters
  • Reducing the dimensionality of the DTM using latent semantic indexing
  • Extracting topics with probabilistic latent semantic analysis (pLSA)
  • How latent Dirichlet allocation (LDA) improves pLSA to become the most popular topic model
  • Visualizing and evaluating topic modeling results
  • Running LDA using scikit-learn and gensim
  • How to apply topic modeling to collections of earnings calls and financial news articles
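
A minimal scikit-learn LDA sketch on a toy corpus (the notebooks run LDA on earnings calls and financial news with both scikit-learn and gensim):

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["rates inflation fed policy", "earnings revenue profit margin",
        "fed hikes rates again", "margin pressure hit earnings"]
dtm = CountVectorizer().fit_transform(docs)  # document-term matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)
print(lda.transform(dtm).round(2))  # per-document topic weights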

16 Word Embeddings for Earnings Calls and SEC Filings

This chapter uses neural networks to learn a vector representation of individual semantic units like a word or a paragraph. These vectors are dense with a few hundred real-valued entries, compared to the higher-dimensional sparse vectors of the bag-of-words model. As a result, these vectors embed or locate each semantic unit in a continuous vector space.

Embeddings result from training a model to relate tokens to their context with the benefit that similar usage implies a similar vector. As a result, they encode semantic aspects like relationships among words through their relative location. They are powerful features that we will use with deep learning models in the following chapters.

More specifically, in this chapter, we will cover:

  • What word embeddings are and how they capture semantic information
  • How to obtain and use pre-trained word vectors
  • Which network architectures are most effective at training word2vec models
  • How to train a word2vec model using TensorFlow and gensim
  • Visualizing and evaluating the quality of word vectors
  • How to train a word2vec model on SEC filings to predict stock price moves
  • How doc2vec extends word2vec and helps with sentiment analysis
  • Why the transformer's attention mechanism had such an impact on NLP
  • How to fine-tune pre-trained BERT models on financial data
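
A minimal gensim word2vec sketch on a tiny artificial corpus (the chapter trains on SEC filings and also implements the model in TensorFlow); the keyword names assume gensim 4, which renamed size to vector_size and iter to epochs:

from gensim.models import Word2Vec

# A tiny, repeated corpus so that co-occurrence statistics have some weight
sentences = [["revenue", "grew", "strongly"],
             ["sales", "grew", "strongly"],
             ["margins", "declined", "sharply"]] * 100

model = Word2Vec(sentences, vector_size=32, window=2, min_count=1, epochs=20)
# Words used in similar contexts end up close together in vector space
print(model.wv.most_similar("revenue", topn=2))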

Part 4: Deep & Reinforcement Learning

Part four explains and demonstrates how to leverage deep learning for algorithmic trading. The powerful capabilities of deep learning algorithms to identify patterns in unstructured data make it particularly suitable for alternative data like images and text.

The sample applications show, for example, how to combine text and price data to predict earnings surprises from SEC filings, generate synthetic time series to expand the amount of training data, and train a trading agent using deep reinforcement learning. Several of these applications replicate research recently published in top journals.

17 Deep Learning for Trading

This chapter presents feedforward neural networks (NN) and demonstrates how to efficiently train large models using backpropagation while managing the risks of overfitting. It also shows how to use TensorFlow 2.0 and PyTorch and how to optimize a NN architecture to generate trading signals. In the following chapters, we will build on this foundation to apply various architectures to different investment applications with a focus on alternative data. These include recurrent NN tailored to sequential data like time series or natural language and convolutional NN, particularly well suited to image data. We will also cover deep unsupervised learning, such as how to create synthetic data using Generative Adversarial Networks (GAN). Moreover, we will discuss reinforcement learning to train agents that interactively learn from their environment.

In particular, this chapter will cover:

  • How DL solves AI challenges in complex domains
  • Key innovations that have propelled DL to its current popularity
  • How feedforward networks learn representations from data
  • Designing and training deep neural networks (NNs) in Python
  • Implementing deep NNs using Keras, TensorFlow, and PyTorch
  • Building and tuning a deep NN to predict asset returns
  • Designing and backtesting a trading strategy based on deep NN signals
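
A minimal Keras sketch of a feedforward network with dropout regularization on simulated data, mirroring the chapter's return-prediction workflow at toy scale:

import numpy as np
from tensorflow import keras

# Simulated features and a noisy linear target standing in for returns
rng = np.random.default_rng(7)
X = rng.standard_normal((2_000, 20))
y = X @ rng.normal(0, 0.1, 20) + rng.normal(0, 0.1, 2_000)

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    keras.layers.Dropout(0.2),  # regularization against overfitting
    keras.layers.Dense(1),      # predicted return
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, validation_split=0.2, epochs=10, batch_size=64, verbose=0)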

18 CNN for Financial Time Series and Satellite Images

CNN architectures continue to evolve. This chapter describes building blocks common to successful applications, demonstrates how transfer learning can speed up learning, and how to use CNNs for object detection. CNNs can generate trading signals from images or time-series data. Satellite data can anticipate commodity trends via aerial images of agricultural areas, mines, or transport networks. Camera footage can help predict consumer activity; we show how to build a CNN that classifies economic activity in satellite images. CNNs can also deliver high-quality time-series classification results by exploiting their structural similarity with images, and we design a strategy based on time-series data formatted like images.

More specifically, this chapter covers:

  • How CNNs employ several building blocks to efficiently model grid-like data
  • Training, tuning and regularizing CNNs for images and time series data using TensorFlow
  • Using transfer learning to streamline CNNs, even with less data
  • Designing a trading strategy using return predictions by a CNN trained on time-series data formatted like images
  • How to classify economic activity based on satellite images

19 RNN for Multivariate Time Series and Sentiment Analysis

Recurrent neural networks (RNNs) compute each output as a function of the previous output and new data, effectively creating a model with memory that shares parameters across a deeper computational graph. Prominent architectures include Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) that address the challenges of learning long-range dependencies. RNNs are designed to map one or more input sequences to one or more output sequences and are particularly well suited to natural language. They can also be applied to univariate and multivariate time series to predict market or fundamental data. This chapter covers how RNNs can model alternative text data using the word embeddings that we covered in Chapter 16 to classify the sentiment expressed in documents.

More specifically, this chapter addresses:

  • How recurrent connections allow RNNs to memorize patterns and model a hidden state
  • Unrolling and analyzing the computational graph of RNNs
  • How gated units learn to regulate RNN memory from data to enable long-range dependencies
  • Designing and training RNNs for univariate and multivariate time series in Python
  • How to learn word embeddings or use pretrained word vectors for sentiment analysis with RNNs
  • Building a bidirectional RNN to predict stock returns using custom word embeddings
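
A minimal Keras LSTM sketch for univariate sequence prediction on a simulated noisy sine wave (the chapter's models handle multivariate market data and text):

import numpy as np
from tensorflow import keras

rng = np.random.default_rng(9)
series = np.sin(np.arange(1_200) / 20) + rng.normal(0, 0.1, 1_200)

# Predict the next value from the previous 30 observations
window = 30
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

model = keras.Sequential([
    keras.layers.LSTM(16, input_shape=(window, 1)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[..., None], y, epochs=5, batch_size=32, verbose=0)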

20 Autoencoders for Conditional Risk Factors and Asset Pricing

This chapter shows how to leverage unsupervised deep learning for trading. We also discuss autoencoders, namely, neural networks trained to reproduce the input while learning a new representation encoded by the parameters of a hidden layer. Autoencoders have long been used for nonlinear dimensionality reduction, leveraging the NN architectures we covered in the last three chapters. We replicate a recent AQR paper that shows how autoencoders can underpin a trading strategy. We will use a deep neural network that relies on an autoencoder to extract risk factors and predict equity returns, conditioned on a range of equity attributes.

More specifically, in this chapter you will learn about:

  • Which types of autoencoders are of practical use and how they work
  • Building and training autoencoders using Python
  • Using autoencoders to extract data-driven risk factors that take into account asset characteristics to predict returns

21 Generative Adversarial Nets for Synthetic Time Series Data

This chapter introduces generative adversarial networks (GAN). GANs train a generator and a discriminator network in a competitive setting so that the generator learns to produce samples that the discriminator cannot distinguish from a given class of training data. The goal is to yield a generative model capable of producing synthetic samples representative of this class. While most popular with image data, GANs have also been used to generate synthetic time-series data in the medical domain. Subsequent experiments with financial data explored whether GANs can produce alternative price trajectories useful for ML training or strategy backtests. We replicate the 2019 NeurIPS Time-Series GAN paper to illustrate the approach and demonstrate the results.

More specifically, in this chapter you will learn about:

  • How GANs work, why they are useful, and how they could be applied to trading
  • Designing and training GANs using TensorFlow 2
  • Generating synthetic financial data to expand the inputs available for training ML models and backtesting

22 Deep Reinforcement Learning: Building a Trading Agent

Reinforcement Learning (RL) models goal-directed learning by an agent that interacts with a stochastic environment. RL optimizes the agent's decisions concerning a long-term objective by learning the value of states and actions from a reward signal. The ultimate goal is to derive a policy that encodes behavioral rules and maps states to actions. This chapter shows how to formulate and solve an RL problem. It covers model-based and model-free methods, introduces the OpenAI Gym environment, and combines deep learning with RL to train an agent that navigates a complex environment. Finally, we'll show you how to adapt RL to algorithmic trading by modeling an agent that interacts with the financial market while trying to optimize an objective function.

More specifically, this chapter will cover:

  • Define a Markov decision problem (MDP)
  • Use value and policy iteration to solve an MDP
  • Apply Q-learning in an environment with discrete states and actions
  • Build and train a deep Q-learning agent in a continuous environment
  • Use the OpenAI Gym to design a custom market environment and train an RL agent to trade stocks
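
To make the value-learning idea concrete, here is a minimal tabular Q-learning sketch in a made-up two-regime market (the chapter builds deep Q-learning agents on OpenAI Gym environments):

import numpy as np

rng = np.random.default_rng(11)
n_states, n_actions = 2, 2  # states: down/up regime; actions: flat/long
q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1

state = 0
for _ in range(20_000):
    # epsilon-greedy action selection
    action = rng.integers(n_actions) if rng.random() < eps else int(q[state].argmax())
    # regimes are persistent: 80% chance of staying in the same state
    next_state = state if rng.random() < 0.8 else 1 - state
    # a long position earns +1 in an up regime and -1 in a down regime
    reward = (1 if next_state == 1 else -1) * action
    # Q-learning update toward reward plus discounted best future value
    q[state, action] += alpha * (reward + gamma * q[next_state].max()
                                 - q[state, action])
    state = next_state

print(q.round(2))  # the agent learns to go long only in the up regime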

23 Conclusions and Next Steps

In this concluding chapter, we will briefly summarize the essential tools, applications, and lessons learned throughout the book to avoid losing sight of the big picture after so much detail. We will then identify areas that we did not cover but would be worth focusing on as you expand on the many machine learning techniques we introduced and become productive in their daily use.

In sum, in this chapter, we will:

  • Review key takeaways and lessons learned
  • Point out the next steps to build on the techniques in this book
  • Suggest ways to incorporate ML into your investment process

24 Appendix - Alpha Factor Library

Throughout this book, we emphasized how the smart design of features, including appropriate preprocessing and denoising, typically leads to an effective strategy. This appendix synthesizes some of the lessons learned on feature engineering and provides additional information on this vital topic.

To this end, we focus on the broad range of indicators implemented by TA-Lib (see Chapter 4) and WorldQuant's 101 Formulaic Alphas paper (Kakushadze 2016), which presents real-life quantitative trading factors used in production with an average holding period of 0.6-6.4 days.

This chapter covers:

  • How to compute several dozen technical indicators using TA-Lib and NumPy/pandas
  • Creating the formulaic alphas described in the above paper
  • Evaluating the predictive quality of the results using various metrics, from rank correlation and mutual information to feature importance, SHAP values, and Alphalens
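
For flavor, here is a minimal pandas sketch of a rank-based formulaic alpha in the paper's style; this particular formula is illustrative, not one of the 101 alphas:

import numpy as np
import pandas as pd

# Simulated daily prices for 10 stocks
rng = np.random.default_rng(12)
prices = pd.DataFrame(
    100 + rng.standard_normal((250, 10)).cumsum(axis=0),
    index=pd.date_range("2020-01-01", periods=250, freq="B"),
    columns=[f"S{i}" for i in range(10)])

# alpha = -rank(delta(close, 5)): bet against recent 5-day winners
alpha = -prices.diff(5).rank(axis=1, pct=True)
print(alpha.tail(3))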

machine-learning-for-trading's People

Contributors

franz101, geoffworks17, jmprathab, karlosq, leehbi, mg-ding, minggnim, norbertraus, ryanrussell, sherrytp, ssilverac, stefan-jansen, thebetauser, tomas-rampas


machine-learning-for-trading's Issues

issue when setting up new environment using yml file provided

Hello
I have tried to create an environment using the environment.yml file provided but I get the following error:
Solving environment: failed
ResolvePackageNotFound:
gcc_linux-64=7.2.0
binutils_impl_linux-64=2.28.1
gxx_linux-64=7.2.0
gst-plugins-base=1.14.0
gstreamer=1.14.0
gmp=6.1.2
pango=1.42.4
dbus=1.13.2
gcc_impl_linux-64=7.2.0
binutils_linux-64=7.2.0
gxx_impl_linux-64=7.2.0
ncurses=6.1
libgcc-ng=8.2.0
libstdcxx-ng=8.2.0
libuuid=1.0.3
readline=7.0
expat=2.2.6
fribidi=1.0.5
libgfortran-ng=7.3.0
graphviz=2.40.1
libedit=3.1.20170329
Is there any way I can fix this? Thanks for the help.

Issues with datetime in 02_rebuild_nasdaq_order_book.ipynb

I've got an issue when converting timestamp to integer, so I changed:
buy_per_min.timestamp = buy_per_min.timestamp.add(utc_offset).values.astype(int)
Same for sell_per_min and trades_per_min
But now I receive the error
OSError: [Errno 22] Invalid argument
in the last plot, in the line
xticks = [datetime.fromtimestamp(ts / 1e9).strftime('%H:%M') for ts in ax.get_xticks()]
because all timestamps are negative.

earnings to csv

Good morning,
I correctly run sa_selenium.py in Spyder.
The code correctly scrapes the Seeking Alpha website, but I can't save the results to a csv file.
I have written my folder path in rows 23, 24, 25.
I have modified row 32 to use 'html.parser'.
I have added the geckodriver path to row 89.
I have modified rows 114 and 115.

I think I'm wrong with the csv paths or file declarations; please help me solve this problem.
Ty Ale

import re
from pathlib import Path
from random import random
from time import sleep
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup
from furl import furl
from selenium import webdriver

transcript_path = Path('transcripts')

def store_result(meta, participants, content):
    """Save parse content to csv"""
    path = transcript_path / 'parsed' / meta['symbol']
    if not path.exists():
        path.mkdir(parents=True, exist_ok=True)
    pd.DataFrame(content, columns=['speaker', 'q&a', 'content']).to_csv(path / '/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/content.csv', index=False)
    pd.DataFrame(participants, columns=['type', 'name']).to_csv(path / '/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/participants.csv', index=False)
    pd.Series(meta).to_csv(path / '/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/earnings.csv')


def parse_html(html):
    """Main html parser function"""
    date_pattern = re.compile(r'(\d{2})-(\d{2})-(\d{2})')
    quarter_pattern = re.compile(r'(\bQ\d\b)')
    soup = BeautifulSoup(html, 'html.parser')

    meta, participants, content = {}, [], []
    h1 = soup.find('h1', itemprop='headline')
    if h1 is None:
        return
    h1 = h1.text
    meta['company'] = h1[:h1.find('(')].strip()
    meta['symbol'] = h1[h1.find('(') + 1:h1.find(')')]

    title = soup.find('div', class_='title')
    if title is None:
        return
    title = title.text
    print(title)
    match = date_pattern.search(title)
    if match:
        m, d, y = match.groups()
        meta['month'] = int(m)
        meta['day'] = int(d)
        meta['year'] = int(y)

    match = quarter_pattern.search(title)
    if match:
        meta['quarter'] = match.group(0)

    qa = 0
    speaker_types = ['Executives', 'Analysts']
    for header in [p.parent for p in soup.find_all('strong')]:
        text = header.text.strip()
        if text.lower().startswith('copyright'):
            continue
        elif text.lower().startswith('question-and'):
            qa = 1
            continue
        elif any([type in text for type in speaker_types]):
            for participant in header.find_next_siblings('p'):
                if participant.find('strong'):
                    break
                else:
                    participants.append([text, participant.text])
        else:
            p = []
            for participant in header.find_next_siblings('p'):
                if participant.find('strong'):
                    break
                else:
                    p.append(participant.text)
            content.append([header.text, qa, '\n'.join(p)])
    return meta, participants, content


SA_URL = 'https://seekingalpha.com/'
TRANSCRIPT = re.compile('Earnings Call Transcript')

next_page = True
page = 1
driver = webdriver.Firefox(executable_path='/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/geckodriver')
while next_page:
    print(f'Page: {page}')
    url = f'{SA_URL}/earnings/earnings-call-transcripts/{page}'
    driver.get(urljoin(SA_URL, url))
    sleep(8 + (random() - .5) * 2)
    response = driver.page_source
    page += 1
    soup = BeautifulSoup(response, 'html.parser')
    links = soup.find_all(name='a', string=TRANSCRIPT)
    if len(links) == 0:
        next_page = False
    else:
        for link in links:
            transcript_url = link.attrs.get('href')
            article_url = furl(urljoin(SA_URL, transcript_url)).add({'part': 'single'})
            driver.get(article_url.url)
            html = driver.page_source
            result = parse_html(html)
            if result is not None:
                meta, participants, content = result
                meta['link'] = link
                store_result(meta, participants, content)
                sleep(8 + (random() - .5) * 2)

driver.close()
earnings = pd.read_csv('earnings.csv')
#print(earnings)

searching alpha - feature engineering

Hi Stefan,
I have gone through the algo (lagged returns, etc.), but at the end I did not understand how you identified the alpha factor. I'm basically missing the conclusion.

Thanks, Angelo

zipline backtest with_pf_optimization error

chapter 5 - 05_strategy_evaluation
02_backtest_with_pf_optimization.ipynb throws the error: Cannot compare tz-naive and tz-aware timestamps

backtest = run_algorithm(start=start,
                         end=end,
                         initialize=initialize,
                         before_trading_start=before_trading_start,
                         bundle='quandl',
                         capital_base=capital_base)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
4 before_trading_start=before_trading_start,
5 bundle='quandl',
----> 6 capital_base=capital_base)
7
8 # backtest = run_algorithm(start=start,

C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\utils\run_algo.py in run_algorithm(start, end, initialize, capital_base, handle_data, before_trading_start, analyze, data_frequency, data, bundle, bundle_timestamp, trading_calendar, metrics_set, default_extension, extensions, strict_extensions, environ, blotter)
428 local_namespace=False,
429 environ=environ,
--> 430 blotter=blotter,
431 )

C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\utils\run_algo.py in _run(handle_data, initialize, before_trading_start, analyze, algofile, algotext, defines, data_frequency, capital_base, data, bundle, bundle_timestamp, start, end, output, trading_calendar, print_algo, metrics_set, local_namespace, environ, blotter)
212 capital_base=capital_base,
213 data_frequency=data_frequency,
--> 214 trading_calendar=trading_calendar,
215 ),
216 metrics_set=metrics_set,

C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\utils\factory.py in create_simulation_parameters(year, start, end, capital_base, num_days, data_frequency, emission_rate, trading_calendar)
60 data_frequency=data_frequency,
61 emission_rate=emission_rate,
---> 62 trading_calendar=trading_calendar,
63 )
64

C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\finance\trading.py in init(self, start_session, end_session, trading_calendar, capital_base, emission_rate, data_frequency, arena)
146 assert start_session <= end_session,
147 "Period start falls after period end."
--> 148 assert start_session <= trading_calendar.last_trading_session,
149 "Period start falls after the last known trading day."
150 assert end_session >= trading_calendar.first_trading_session, \

pandas/_libs/tslib.pyx in pandas._libs.tslib._Timestamp.richcmp()

pandas/_libs/tslib.pyx in pandas._libs.tslib._Timestamp._assert_tzawareness_compat()

TypeError: Cannot compare tz-naive and tz-aware timestamps
Additional details:
Env - Windows 10
Python - python=3.5.6
Pandas - pandas=0.22.0

Misclassification DataFrame is badly composed in LDA notebook

The week 7 practice notebook 04_lda_with_sklearn.ipynb contains an incorrectly constructed pandas DataFrame. It groups the test set predictions by ground truth topic, which reorders the rows. However, it leaves the headings and articles in their original, ungrouped order as it concatenates them into the DataFrame. The rows are thus jumbled chaotically.

test_assignments = test_opt_eval.groupby(level='topic').idxmax(axis=1)
test_assignments = test_assignments.reset_index(-1, drop=True).to_frame('predicted').reset_index()
test_assignments['heading'] = test_docs.heading.values
test_assignments['article'] = test_docs.article.values
test_assignments.head(6)

# Output:
#      topic predicted  heading                           article
# 0 Business Topic 4    Kilroy launches 'Veritas' party   Ex-BBC chat show host and East Midlands MEP R...
# 1 Business Topic 4    Radcliffe eyes hard line on drugs Paula Radcliffe has called for all athletes f...
# 2 Business Topic 4    S Korean consumers spending again South Korea looks set to sustain its revival ...
# 3 Business Topic 4    Quiksilver moves for Rossignol    Shares of Skis Rossignol, the world's largest...
# 4 Business Topic 4    Britons fed up with net service   A survey conducted by PC Pro Magazine has rev...
# 5 Business Topic 4    Scissor Sisters triumph at Brits  US band Scissor Sisters led the winners at th.

According to the code's output, the ground truth classification for all 6 articles is "Business." However, the classifications are in fact "Politics," "Sport," "Business," "Business," "Tech," and "Entertainment."

Here's code that fixes the problem:

test_assignments = pd.DataFrame(test_eval.idxmax(axis=1), columns=["prediction"])
test_assignments['heading'] = test_docs.heading.values
test_assignments['article'] = test_docs.article.values
test_assignments.head()

# Output
#    topic   prediction heading                           article
# Politics   Topic 5    Kilroy launches 'Veritas' party   Ex-BBC chat show host and East Midlands MEP R...
# Sport      Topic 3    Radcliffe eyes hard line on drugs Paula Radcliffe has called for all athletes f...
# Business   Topic 2    S Korean consumers spending again South Korea looks set to sustain its revival ...
# Business   Topic 2    Quiksilver moves for Rossignol    Shares of Skis Rossignol, the world's largest...
# Tech       Topic 4    Britons fed up with net service   A survey conducted by PC Pro Magazine has rev...

You have my permission to use my code and analysis; I would only request the courtesy of being credited for any contribution I make toward your final product. (Thanks!)

Cannot reproduce 01_parse_itch_order_flow_messages.ipynb

I cannot reproduce 01_parse_itch_order_flow_messages.ipynb. There are numerous errors; some I could resolve by changing variable names, but not in the cell that begins: "The following code processes the binary file and produces the parsed orders stored by message type:"

I receive the error:
error Traceback (most recent call last)
in
17 # read & store message
18 record = data.read(message_size - 1)
---> 19 message = message_fields[message_type]._make(unpack(fstring[message_type], record))
20 messages[message_type].append(message)
21

error: bad char in struct format

Getting UnsortedIndexError when trying to read/filter assets.h5 file in 04_alpha_factor_research/00_data/feature_engineering notebook

Hi:

I created the assets.h5 file with data/create_datasets.ipynb and it looks fine:

$ h5ls -f assets.h5 
/fred                    Group
/quandl                  Group
/sp500                   Group
/us_equities             Group

However, 04_alpha_factor_research/00_data/feature_engineering.ipynb throws the following error when it tries to read and filter the prices data set:

DATA_STORE = '../../data/assets.h5'
with pd.HDFStore(DATA_STORE) as store:
    prices = store['quandl/wiki/prices'].loc[idx['2000':'2018', :], 'adj_close'].unstack('ticker')
    stocks = store['us_equities/stocks'].loc[:, ['marketcap', 'ipoyear', 'sector']]
[...]
UnsortedIndexError: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [0], lexsort depth 0'

The prices data seems fine; it just appears to be the filtering that is breaking.

I am still fairly new to Pandas, but I got the filter to work by explicitly creating a date range:

prices = store['quandl/wiki/prices'].loc[ pd.date_range(start='1/1/2000', end='12/31/2018'), idx['adj_close'] ].unstack('ticker')

Thanks!
Jeffrey

01_latent_semantic_indexing - bug

when running this code I keep getting this error.

docs = pd.DataFrame(doc_list, columns=['Category', 'Heading', 'Article'])
docs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2226 entries, 0 to 2225
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Category  2226 non-null   object
 1   Heading   2226 non-null   object
 2   Article   2226 non-null   object
dtypes: object(3)
memory usage: 52.3+ KB
train_docs, test_docs = train_test_split(docs,
                                         stratify=docs.Category,
                                         test_size=50,
                                         random_state=42)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-7ce1db248c17> in <module>
----> 1 train_docs, test_docs = train_test_split(docs,
      2                                          stratify=docs.Category,
      3                                          test_size=50,
      4                                          random_state=42)

~\miniconda3\envs\torch\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
   2150                      random_state=random_state)
   2151 
-> 2152         train, test = next(cv.split(X=arrays[0], y=stratify))
   2153 
   2154     return list(chain.from_iterable((_safe_indexing(a, train),

~\miniconda3\envs\torch\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
   1339         """
   1340         X, y, groups = indexable(X, y, groups)
-> 1341         for train, test in self._iter_indices(X, y, groups):
   1342             yield train, test
   1343 

~\miniconda3\envs\torch\lib\site-packages\sklearn\model_selection\_split.py in _iter_indices(self, X, y, groups)
   1666         class_counts = np.bincount(y_indices)
   1667         if np.min(class_counts) < 2:
-> 1668             raise ValueError("The least populated class in y has only 1"
   1669                              " member, which is too few. The minimum"
   1670                              " number of groups for any class cannot"

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

Issue in downloading stooq data

Hello Stefan,
I am getting the following error in create_stooq_data.ipynb while trying to download US assets from stooq. Please note I am on Windows 10. Appreciate your help.
Thanks
Sabir

tse stocks
stooq/data/daily/jp/tse stocks
500
1000
1500
2000
2500
3000
3500

No. of observations per asset
count    3672.000000
mean     2804.119281
std      1176.615453
min         1.000000
25%      2146.000000
50%      3041.000000
75%      3621.000000
max      4905.000000
dtype: float64
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 10296726 entries, (1301.JP, 2005-03-22 00:00:00) to (9997.JP, 2019-12-30 00:00:00)
Data columns (total 5 columns):
open      10296726 non-null float64
high      10296726 non-null float64
low       10296726 non-null float64
close     10296726 non-null float64
volume    10296726 non-null int64
dtypes: float64(4), int64(1)
memory usage: 432.1+ MB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3719 entries, 0 to 3718
Data columns (total 2 columns):
ticker    3719 non-null object
name      3719 non-null object
dtypes: object(2)
memory usage: 58.2+ KB
None

nasdaq etfs
stooq/data/daily/us/nasdaq etfs
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-0e99a64c5acf> in <module>
     11         prices, tickers = get_stooq_prices_and_tickers(frequency=frequency, 
     12                                                        market=market,
---> 13                                                        asset_class=asset_class)
     14 
     15         prices = prices.sort_index().loc[idx[:, '2000': '2019'], :]

<ipython-input-21-65e60f1965c0> in get_stooq_prices_and_tickers(frequency, market, asset_class)
     39 
     40 #     print(prices)
---> 41     prices = (pd.concat(prices, ignore_index=True)
     42               .rename(columns=str.lower)
     43               .set_index(['ticker', date_label])

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    253         verify_integrity=verify_integrity,
    254         copy=copy,
--> 255         sort=sort,
    256     )
    257 

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    302 
    303         if len(objs) == 0:
--> 304             raise ValueError("No objects to concatenate")
    305 
    306         if keys is None:

ValueError: No objects to concatenate
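For what it's worth, this error just means the list passed to pd.concat was empty, i.e. no price files were parsed for that asset class (here, nasdaq etfs). A defensive sketch of a guard inside get_stooq_prices_and_tickers (hypothetical, not the notebook's code; it reuses the function's own prices, market, asset_class, and date_label names):

if not prices:   # empty list -> pd.concat raises 'No objects to concatenate'
    raise FileNotFoundError(f'no stooq files found for {market}/{asset_class} '
                            '- check the path the notebook prints above')
prices = (pd.concat(prices, ignore_index=True)
          .rename(columns=str.lower)
          .set_index(['ticker', date_label]))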

missing csv file

The file data/create_datasets.ipynb has the following line

df = pd.read_csv('us_equities_meta_data.csv')

However, this csv file is not present. I can't seem to find it anywhere in the distribution. Can you tell me where to locate it?

Regarding the Russian translation

Hi, Stefan.

I'm working on a Russian translation of your book for a Saint Petersburg publisher and have found some omissions so far in the text of ch. 17 on DL (formulas) and ch. 18 on CNNs (figure #1, on filters).
Can you indicate where to find the amended versions of the chapters, or advise on what to do?

By the way, I found your book very enlightening, an excellent add-on to the Advances in FML by Marcos de Prado.

B.R.,
Andrey Logunov

.numpy() in RL

Is there source code for this particular function?
Just giving us an environment.yml without explaining how the code works doesn't seem like a good way to learn and generalize...

Installation

When I run docker run -it -v $(pwd):/home/packt/ml4t -p 8888:8888 -e QUANDL_API_KEY=myapi*- --name ml4t appliedai/packt:latest bash

from the command line after installing Docker Desktop and allocating 4 GB of RAM, I get this:

docker: error during connect: Post https://192.168.99.100:2376/v1.40/containers/create?name=ml4t: dial tcp 192.168.99.100:2376: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
See 'docker run --help'.

Any help to get me on the right track would be greatly appreciated; so far I've just been trying to run the code in PyCharm or Jupyter notebooks with a bit of trouble.

docker install in GCP

Hi,
I'm trying to set up the environment on GCP and AWS because my laptop is underpowered.
I succeeded with docker run.
But when it comes to 'zipline ingest', it works on AWS, while on GCP I get the error below.

PermissionError: [Errno 13] Permission denied: '/home/packt/ml4t/data/.zipline'

Is there any solution?

KR

OpenTable Spyder macOS problem

Good morning,
I'm trying to run the Spyder files in the OpenTable folder. I have a Mac with Catalina and a problem with geckodriver; it seems there is a bug:
https://firefox-source-docs.mozilla.org/testing/geckodriver/Notarization.html
mozilla/geckodriver#1629

I have installed geckodriver with Homebrew, but it doesn't work anyway.
I receive this Spyder console error: WebDriverException: 'geckodriver' executable needs to be in PATH.

Please help me resolve this issue.
Ty Ale
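A minimal sketch that sidesteps the PATH lookup by pointing Selenium's (3.x) Firefox driver at the Homebrew binary directly; the path below assumes Homebrew's default link location:

from selenium import webdriver

# `brew install geckodriver` links the binary into /usr/local/bin by default
driver = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')

On Catalina, the notarization issue in the Mozilla link above may additionally require removing the quarantine attribute from the downloaded binary.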

zipline data ingest for backtesting

Hi, I tried running the zipline backtest in Chapter 12, and upon following the notebook cell by cell, I came across the following error:

"ValueError: Failed to find any assets with country_code 'US' that traded between 2016-01-13 00:00:00+00:00 and 2016-01-21 00:00:00+00:00.
This probably means that your asset db is old or that it has incorrect country/exchange metadata."

This happened for any start_date I tried; the first week always raises this ValueError.

I also followed the book's instruction on ingesting quandl data.
Any suggestions?
Thanks in advance!

Also, for Chapter 11, how is the 'stooq' bundle that is used for backtesting ingested?

Issue with 06_conditional_autoencoder_for_asset_pricing_model

20_autoencoders_for_conditional_risk_factors/06_conditional_autoencoder_for_asset_pricing_model.ipynb

Hi Stefan, I want to thank you first for the nice webinar aired on YouTube a few days back. However, I ran into the error below (list index out of range); here are some screenshots:
Screenshot from 2020-10-13 22-36-51
Screenshot from 2020-10-13 22-36-24
You will notice in the second screenshot that the error comes from the lines or dates, one of which is out of line with the other between training and testing.
Your input is highly appreciated.
Best regards

07_linear_models - Ridge and Lasso regressions run forever

Hi Stefan,
Ridge and Lasso regression take a lot of time to execute, with a big list of warnings:
Ridge regression: Wall time: 25min 26s
Lasso: my laptop stopped responding after almost 1 hr and I had to reboot it

My Env:
Windows 10
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
[01]: Intel64 Family 6 Model 78 Stepping 3 GenuineIntel ~2601 Mhz
Python: Python 3.6.10
Virtual Env: ml4trading
Warnings:

C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\pipeline.py:331: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
  Xt = transform.transform(Xt)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\preprocessing\data.py:625: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
  return self.partial_fit(X, y)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\base.py:465: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
  return self.fit(X, y, **fit_params).transform(X)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\linear_model\coordinate_descent.py:492: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\pipeline.py:331: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
  Xt = transform.transform(Xt)
......
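The DataConversionWarnings are harmless (StandardScaler just upcasts the uint8 dummies), but the ConvergenceWarning suggests coordinate descent is spinning on very small alphas. A hedged sketch of mitigations (X and y stand in for the notebook's features and target; the parameter values are illustrative, not the book's):

import numpy as np
from sklearn.linear_model import Lasso

X = X.astype(np.float64)   # silence the dtype-conversion warnings up front
# cap iterations and loosen the tolerance so tiny alphas cannot run unbounded;
# alternatively, drop the smallest alphas from the search grid
model = Lasso(alpha=1e-4, max_iter=10_000, tol=1e-3)
model.fit(X, y)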


Ch11_06_alphalens_signals_quality

Hi,

In Ch11, Notebook 06_alphalens_signals_quality.

test_tickers = best_predictions.index.get_level_values('ticker').unique()
trade_prices = get_trade_prices(test_tickers)
trade_prices.info()

The variable best_predictions is not defined before its usage. Please advise. Thanks!
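best_predictions is produced by an earlier notebook in the chapter; a hedged sketch of the missing load step (the store path and key below are hypothetical placeholders - check the preceding notebook for the actual ones):

import pandas as pd

best_predictions = pd.read_hdf('data.h5', 'predictions/best')  # hypothetical store/key
test_tickers = best_predictions.index.get_level_values('ticker').unique()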

Cannot access Jupyter Notebook from Docker container

Thank you for sharing this awesome work.

I am using Windows and following the Docker workflow; currently I am stuck at step 7, opening the Jupyter notebook.

(base) packt@1154ebfb6453:~/ ml4t$ conda activate ml4t

(ml4t) packt@1154ebfb6453:~/ml4t$ jupyter notebook --ip 0.0.0.0 --no-browser --allow-root
[I 09:37:37.174 NotebookApp] [nb_conda_kernels] enabled, 4 kernels found
[I 09:37:37.184 NotebookApp] Writing notebook server cookie secret to
/home/packt/.local/share/jupyter/runtime/notebook_cookie_secret
[I 09:37:39.115 NotebookApp] JupyterLab extension loaded from /opt/conda/envs/ml4t/lib/python3.7/site-packages/jupyterlab
[I 09:37:39.115 NotebookApp] JupyterLab application directory is /opt/conda/envs/ml4t/share/jupyter/lab
[I 09:37:39.118 NotebookApp] Serving notebooks from local directory: /home/packt/ml4t
[I 09:37:39.118 NotebookApp] The Jupyter Notebook is running at:
[I 09:37:39.118 NotebookApp] http://1154ebfb6453:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f
[I 09:37:39.118 NotebookApp] or http://127.0.0.1:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f
[I 09:37:39.118 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 09:37:39.122 NotebookApp]

To access the notebook, open this file in a browser:
    file:///home/packt/.local/share/jupyter/runtime/nbserver-296-open.html
Or copy and paste one of these URLs:
    http://1154ebfb6453:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f
 or http://127.0.0.1:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f

I copy and paste this URL http://127.0.0.1:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f into the browser to open Jupyter notebook, but was prompted with an authentication page as follows:

[screenshot of the Jupyter authentication page]

I have tried the following but still fail to access the Jupyter notebook:

  • entering the authentication token code ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f

  • starting the server with jupyter notebook --ip 0.0.0.0 --no-browser --allow-root --NotebookApp.token='' to disable token authentication

Thank you for your help.

ResolvePackageNotFound: error (window os)

Hello. I recently bought the book (first edition) and am enjoying it. Thank you.

I'm trying to run the code.

Using 'conda env create -f environment.yml'

I was trying to install the environment, but the following error occurred.

What should I do? (I am a Windows OS user.)

`ResolvePackageNotFound:

binutils_impl_linux-64=2.28.1
gxx_impl_linux-64=7.2.0
gxx_linux-64=7.2.0
libgcc-ng=8.2.0
libstdcxx-ng=8.2.0
readline=7.0
gcc_linux-64=7.2.0
gmp=6.1.2
libuuid=1.0.3
gstreamer=1.14.0
graphviz=2.40.1
dbus=1.13.2
binutils_linux-64=7.2.0
expat=2.2.6
libgfortran-ng=7.3.0
gcc_impl_linux-64=7.2.0
ncurses=6.1
gst-plugins-base=1.14.0
libedit=3.1.20170329`

Installation \Environments\ no Windows path

Stefan - congrats on the new book. I am trying to replicate the environments you have provided, but the directory has installations for Linux only. Zipline, on the other hand, is very fiddly. I will attempt the Docker images but would have preferred to work on my local computer. Thanks for your work and insights.

LDA model converges poorly due to bad hyperparameters

In the notebook lda_with_sklearn, the LDA model predicts only 3 topics for documents in 5 classes.

By performing hyperparameter optimization with sklearn.model_selection.GridSearchCV, I was able to determine the following:

  • Root cause is badly chosen min_df and max_df parameters to the TfidfVectorizer instance.
  • Setting max_df = 0.11 and min_df = 0.026 produces excellent results.

My code and analysis are in my public Github repo here. You have my permission to use my code and analysis; I would only request the courtesy of being credited for any contribution I make toward your final product. (Thanks!)
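A condensed sketch of the grid search described above (not the poster's actual code; docs stands in for the notebook's corpus and the grid values are illustrative):

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipe = Pipeline([('vec', TfidfVectorizer(stop_words='english')),
                 ('lda', LatentDirichletAllocation(n_components=5, random_state=42))])
param_grid = {'vec__max_df': [0.11, 0.25, 0.5],
              'vec__min_df': [0.01, 0.026, 0.05]}
# with no scoring argument, GridSearchCV falls back to the pipeline's score(),
# i.e. the LDA model's approximate log-likelihood
search = GridSearchCV(pipe, param_grid, cv=3).fit(docs)  # docs: list of raw texts (assumed)
print(search.best_params_)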

Error running NASDAQ File

AttributeError Traceback (most recent call last)
in
1 file_name = may_be_download(FTP_URL + SOURCE_FILE)
----> 2 date = file_name.split('.')[0]

AttributeError: 'PosixPath' object has no attribute 'split'
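A minimal fix sketch: the traceback shows may_be_download now returns a pathlib.Path, which has no .split(); splitting the path's name string restores the original behavior.

date = file_name.name.split('.')[0]  # Path.name is a plain string filename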

NameError Traceback (most recent call last)
in
----> 1 message_labels = (df.loc[:, ['message_type', 'notes']]
2 .dropna()
3 .rename(columns={'notes': 'name'}))
4 message_labels.name = (message_labels.name
5 .str.lower()

NameError: name 'df' is not defined

storing predictions code

The code in the analyzing-cross-validation sections of the random forest and boosting chapters is very misleading and unclear. Why are there different labels for stored metrics and evaluated metrics in the HDF file? Why are different HDF files used when running CV/storing and when evaluating?

It seems like the author ran the cross-validation code snippet somewhere else and pasted it into the repo, resulting in different file and metric names?

Docker Installation

Hi there,

I'm attempting to do a fresh install of the ml4t setup within Docker Desktop on Windows 10 Pro.
I'm having trouble with the docker run -it -v line you provided, even after downloading the repository and switching into that folder with "cd" - whether it be the unzipped machine-learning-for-trading/ folder or down to env/linux/ where the .yaml file for zipline etc. is.

Each time I run docker run -it -v $(pwd):/home/packt/ml4t -p 8888:8888 -e QUANDL_API_KEY= --name ml4t appliedai/packt:latest bash

I get back "the system cannot find the file specified".

I have also linked the file within Docker > Resources.

When I put the docker run command from Docker's getting-started example into my cmd terminal, I can pull Docker's "getting started" repo no problem; but no matter what I try, I just get "the system cannot find the file specified" when trying to set up.

I've just wiped my whole machine to start fresh for this. I was thinking of setting back up with Linux Mint, because I saw someone else set up on Linux with no problem, but Docker Desktop isn't for Linux, so I went with Windows since I had already upgraded to Pro for it - and still face this issue either way. I also tried setting up with Anaconda last night, but every time I set up and activated zipline_env I was hit with a giant list of missing dependencies, so I clearly need to set up the image in the Docker environment to make it work; I just can't get the file located to do so.

I'm super excited to get this set up, though.

I greatly appreciate your help with this and the insights you're sharing through the book. I was also wondering if you have an ebook or audio copy, so I could listen as well and consume the information in multiple ways - I just don't have any money to buy the ebook on Packt right now, or I would; I could see myself sliding in another 15-20 hours just listening to the book on top of what I'm already trying to consume. Just thought I'd ask so I can use my time as efficiently as possible; this is my sole task/goal right now.

Thanks again, have a good one.
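One hedged pointer: cmd.exe does not expand $(pwd), so Docker receives a literal string it cannot resolve. The Windows-native equivalents are %cd% in Command Prompt and ${pwd} in PowerShell, quoted in case the path contains spaces:

docker run -it -v "%cd%":/home/packt/ml4t -p 8888:8888 -e QUANDL_API_KEY=<your key> --name ml4t appliedai/packt:latest bash

Note that "the system cannot find the file specified" from Docker on Windows can also simply mean the Docker Desktop daemon is not running yet.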

Windows .yml file

Stefan,

Is it fair to say that readers of the book using Windows should focus on the Docker installation method? I tried creating a conda environment on a Windows machine using the Linux .yml file and got the below:

ResolvePackageNotFound:

libxkbcommon=0.10.0
libgcc-ng=9.2.0
gmp=6.2.0
gxx_impl_linux-64=7.5.0
libedit=3.1.20191231
gxx_linux-64=7.5.0
libgfortran-ng=7.5.0
ld_impl_linux-64=2.34
dbus=1.13.14
graphviz=2.42.3
nspr=4.25
libuuid=2.32.1
libxgboost=1.0.2
libtool=2.4.6
readline=8.0
gcc_linux-64=7.5.0
_openmp_mutex=4.5
nss=3.47
gst-plugins-base=1.14.5
gcc_impl_linux-64=7.5.0
ta-lib-base=0.4.0
binutils_linux-64=2.34
py-xgboost=1.0.2
libgomp=9.2.0
ncurses=6.2
libstdcxx-ng=9.2.0
gstreamer=1.14.5
xgboost=1.0.2
binutils_impl_linux-64=2.34
As you mentioned in an earlier comment, I deleted these from the Linux file and tried again to create the conda environment, but it appeared to result in way too many conflicts. When do you expect to have a tested Windows .yml file (you mentioned this was something you were looking into), or should I just proceed with the Docker installation approach?

Thanks for your help - I'm looking forward to working through the book!

create_message_spec.py

Hello Stefan,

I bought your book on Packt and am trying to do the exercises, but I am a bit confused: where is the file create_message_spec.py? I didn't understand what to do.

Chapter 2 - 01_build_itch_order_book.ipynb

could you please share the notebook for pgportfolio

I've been working on deconstructing pgportfolio, which I found you pointed to in the README of 22_deep_reinforcement_learning. I decided to wander off into the Packt library and came across a new RL book released in Sept 2020, listened to it fairly quickly with a Google Chrome extension, and it gave me a lot of inspiration for a trader. I wrote out my first hypothesis: that reinforcement learning will let me learn the strategy the agent uses, so I could at least implement that for my own strategy and stop losing money - as of Oct 1st I wasted the last of my October funds on poor signals, which gave me ultra motivation to commit to my fullest capability.

Anyway, I know it's not your code, but I was really hoping you might be kind enough to share the code you used for pgportfolio to obtain the four-fold returns, so I could play with it further and try to deconstruct it into my own system. It would really help if I could have it working from the start. I've been taking notes on their system for the majority of the week to understand the architecture; I've only tried to implement the code a couple of times and haven't figured it out yet.
I was really hoping you might be able to help me with a quick consult to get me on the right track, so I could see what code you put in what order and then pull up pgportfolio in a Jupyter notebook and dissect it.

thanks so much once again

--> 145 return registry.make(id, **kwargs)

Hi, I was wondering if you could point me in the right direction; I'm getting this error. I thought it might have been something to do with loading the Quandl data, but that seemed fine when I loaded it in another notebook. I thought I had it all working, then my computer went to sleep overnight and it would not work anymore. I am in the ml4t-dl conda environment using Docker, inside JupyterLab. Last night I first had to reinstall the TensorFlow estimator because of an error it was throwing, and then this was the next one; I can't get past it. My Quandl API key works well with other notebooks. I thought you may be more familiar with this code than I am. Thank you kindly.

INFO:trading_env:trading_env logger started.
INFO:trading_env:loading data for AAPL...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-10-f07228b2136b> in <module>
----> 1 trading_environment = gym.make('trading-v0')
      2 trading_environment.env.trading_days = trading_days
      3 trading_environment.env.trading_cost_bps = 1e-3
      4 trading_environment.env.time_cost_bps = 1e-4
      5 trading_environment.env.ticker = 'AAPL'

/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/gym/envs/registration.py in make(id, **kwargs)
    143 
    144 def make(id, **kwargs):
--> 145     return registry.make(id, **kwargs)
    146 
    147 def spec(id):

/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/gym/envs/registration.py in make(self, path, **kwargs)
     88             logger.info('Making new env: %s', path)
     89         spec = self.spec(path)
---> 90         env = spec.make(**kwargs)
     91         # We used to have people override _reset/_step rather than
     92         # reset/step. Set _gym_disable_underscore_compat = True on

/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/gym/envs/registration.py in make(self, **kwargs)
     58         else:
     59             cls = load(self.entry_point)
---> 60             env = cls(**_kwargs)
     61 
     62         # Make the environment aware of which spec it came from.

~/ml4t/22_deep_reinforcement_learning/trading_env.py in __init__(self, trading_days, trading_cost_bps, time_cost_bps, ticker)
    234         self.time_cost_bps = time_cost_bps
    235         self.data_source = DataSource(trading_days=self.trading_days,
--> 236                                       ticker=ticker)
    237         self.simulator = TradingSimulator(steps=self.trading_days,
    238                                           trading_cost_bps=self.trading_cost_bps,

~/ml4t/22_deep_reinforcement_learning/trading_env.py in __init__(self, trading_days, ticker, normalize)
     62         self.trading_days = trading_days
     63         self.normalize = normalize
---> 64         self.data = self.load_data()
     65         self.preprocess_data()
     66         self.min_values = self.data.min()

~/ml4t/22_deep_reinforcement_learning/trading_env.py in load_data(self)
     73         idx = pd.IndexSlice
     74         with pd.HDFStore('../data/assets.h5') as store:
---> 75             df = (store['quandl/wiki/prices']
     76                   .loc[idx[:, self.ticker],
     77                        ['adj_close', 'adj_volume', 'adj_low', 'adj_high']]

/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/pandas/io/pytables.py in __getitem__(self, key)
    551 
    552     def __getitem__(self, key: str):
--> 553         return self.get(key)
    554 
    555     def __setitem__(self, key: str, value):

/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/pandas/io/pytables.py in get(self, key)
    744         group = self.get_node(key)
    745         if group is None:
--> 746             raise KeyError(f"No object named {key} in the file")
    747         return self._read_group(group)
    748 

KeyError: 'No object named quandl/wiki/prices in the file'
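A quick diagnostic sketch: list what the HDF5 store actually contains; if 'quandl/wiki/prices' is absent, the create_datasets notebook that builds assets.h5 needs to be re-run.

import pandas as pd

with pd.HDFStore('../data/assets.h5') as store:   # same path trading_env.py uses
    print(store.keys())                           # should include '/quandl/wiki/prices'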

link might not be correct

I cannot run https://github.com/stefan-jansen/machine-learning-for-trading/blob/master/data/create_datasets.ipynb

I get a KeyError: 'code' in the section where we download the Wiki Prices metadata.

It seems that the link (https://www.quandl.com/databases/WIKIP/documentation) in section 3.1 is also pointing to the prices data set, while it should point to a data set with metadata. The referenced data set does not contain the required 'code' and 'name' columns.

Has the metadata been moved?

2nd edition publishing date

Hi Stefan,
What are the publishing and availability dates for the 2nd edition? Is it the end of this month? Eagerly waiting for it.
Thanks

Create_stooq_data.ipynb error storing data

I was running create_stooq_data.ipynb

load some Japanese and all US assets for 2000-2019

markets = {'jp': ['tse stocks'],
           'us': ['nasdaq etfs', 'nasdaq stocks', 'nyse etfs', 'nyse stocks', 'nysemkt stocks']}
frequency = 'daily'

idx = pd.IndexSlice
for market, asset_classes in markets.items():
    for asset_class in asset_classes:
        print(f'\n{asset_class}')
        prices, tickers = get_stooq_prices_and_tickers(frequency=frequency,
                                                       market=market,
                                                       asset_class=asset_class)

        prices = prices.sort_index().loc[idx[:, '2000': '2019'], :]
        names = prices.index.names
        prices = (prices
                  .reset_index()
                  .drop_duplicates()
                  .set_index(names)
                  .sort_index())

        print('\nNo. of observations per asset')
        print(prices.groupby('ticker').size().describe())
        key = f'stooq/{market}/{asset_class.replace(" ", "/")}/'

        print(prices.info(null_counts=True))

        prices.to_hdf(DATA_STORE, key + 'prices', format='t')

        print(tickers.info())
        tickers.to_hdf(DATA_STORE, key + 'tickers', format='t')

got this error.

tse stocks
stooq/data/daily/jp/tse stocks

ValueError Traceback (most recent call last)
in
9 for asset_class in asset_classes:
10 print(f'\n{asset_class}')
---> 11 prices, tickers = get_stooq_prices_and_tickers(frequency=frequency,market=market,asset_class=asset_class)
12
13 prices = prices.sort_index().loc[idx[:, '2000': '2019'], :]

in get_stooq_prices_and_tickers(frequency, market, asset_class)
38 file.unlink()
39
---> 40 prices = (pd.concat(prices, ignore_index=True)
41 .rename(columns=str.lower)
42 .set_index(['ticker', date_label])

/opt/conda/envs/ml4t/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
279 verify_integrity=verify_integrity,
280 copy=copy,
--> 281 sort=sort,
282 )
283

/opt/conda/envs/ml4t/lib/python3.7/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
327
328 if len(objs) == 0:
--> 329 raise ValueError("No objects to concatenate")
330
331 if keys is None:

ValueError: No objects to concatenate

Any help is appreciated.
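Same root cause as the earlier Stooq report: no files were parsed under the printed path, so pd.concat receives an empty list. A diagnostic sketch (directory layout taken from the printout above; Stooq's bulk downloads ship daily data as .txt files, as far as I know):

from pathlib import Path

path = Path('stooq/data/daily/jp/tse stocks')  # the path printed before the failure
files = list(path.rglob('*.txt'))
print(len(files))                              # 0 here is exactly what triggers the error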

do not have permission in /home/packt/ml4t

When I install via Docker on Windows 10,

after entering zipline ingest it seems that permission to mkdir in /home/packt/ml4t is denied.
How can I add permission to this folder?

(ml4t-zipline) packt@6463f5cb63ce:~$ zipline ingest
Traceback (most recent call last):
File "/opt/conda/envs/ml4t-zipline/bin/zipline", line 8, in
sys.exit(main())
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 829, in call
return self.main(*args, **kwargs)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 1256, in invoke
Command.invoke(self, ctx)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/main.py", line 60, in main
os.environ,
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/run_algo.py", line 249, in load_extensions
pth.ensure_file(default_extension_path)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/paths.py", line 58, in ensure_file
ensure_directory_containing(path)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/paths.py", line 45, in ensure_directory_containing
ensure_directory(os.path.dirname(path))
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/paths.py", line 30, in ensure_directory
os.makedirs(path)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/os.py", line 231, in makedirs
makedirs(head, mode, exist_ok)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/os.py", line 241, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/packt/ml4t/data'
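A hedged note on the likely cause: the host directory mounted at /home/packt/ml4t is owned by a different UID than the container's packt user, so any mkdir underneath it fails. Relaxing permissions on the mounted folder on the host, or pointing zipline elsewhere by setting the ZIPLINE_ROOT environment variable to a writable location inside the container, may clear the Errno 13.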

Thank you for your kind reply.

Thank you for your kind reply.
I found the environment_windows.yml in the first_edition branch, installed it, and it is working well.
I have one additional question.

Where can I find the zipline installation files for Windows in Python 3.5 environments to use for Chapters 4 and 5?

In this chapter's folder, I could only find a file for Linux. Do you provide an installation file for Windows OS?

Hi @silent0506,

thank you for your interest in the book. You can just delete these packages from the environment.yml file. However, the first_edition branch also contains a file tailored to Windows; you may want to use this instead.

The second edition was released a few weeks ago and contains a lot of additional material; I would highly recommend you review the notebooks in this repo as well - you may find them quite useful.

Thanks

Originally posted by @silent0506 in #36 (comment)

Train Agent - TypeError: 'tensorflow.python.framework.ops.EagerTensor' object does not support item assignment

Hi, I managed to get the data to re-download correctly, but I've run into this error in the block of code under the "Train Agent" section:

`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
13 0.0 if done else 1.0)
14 if ddqn.train:
---> 15 ddqn.experience_replay()
16 if done:
17 break

in experience_replay(self)
107
108 q_values = self.online_network.predict_on_batch(states)
--> 109 q_values[[self.idx, actions]] = targets
110
111 loss = self.online_network.train_on_batch(x=states, y=q_values)

TypeError: 'tensorflow.python.framework.ops.EagerTensor' object does not support item assignment`

I greatly appreciate your guidance; thanks again.
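A minimal sketch of a likely fix: in TensorFlow 2.x, predict_on_batch returns an immutable EagerTensor, so converting it to a NumPy array before the item assignment in experience_replay should restore the intended behavior.

q_values = self.online_network.predict_on_batch(states).numpy()  # EagerTensor -> ndarray
q_values[self.idx, actions] = targets                            # item assignment now works
loss = self.online_network.train_on_batch(x=states, y=q_values)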

05_strategy_evaluation - Mean Reversion Strategy

Hi Stefan, I am trying to test the mean reversion strategy you provide in Chapter 05_strategy_evaluation, 01_backtest_with_trades.ipynb, with my own custom bundle of Indian equities. It doesn't create any output for
returns, positions, transactions = extract_rets_pos_txn_from_zipline(backtest)

Any help is appreciated. The code looks as below

class MeanReversion(CustomFactor):
    """Compute ratio of latest monthly return to 12m average,
       normalized by std dev of monthly returns"""
    inputs = [Returns(window_length=MONTH)]
    window_length = YEAR

    def compute(self, today, assets, out, monthly_returns):
        df = pd.DataFrame(monthly_returns)
        out[:] = df.iloc[-1].sub(df.mean()).div(df.std())

def compute_factors():
    """Create factor pipeline incl. mean reversion,
        filtered by 30d Dollar Volume; capture factor ranks"""
    mean_reversion = MeanReversion()
    dollar_volume = AverageDollarVolume(window_length=30)
    return Pipeline(columns={'longs'  : mean_reversion.bottom(N_LONGS),
                             'shorts' : mean_reversion.top(N_SHORTS),
                             'ranking': mean_reversion.rank(ascending=False)},
                    screen=dollar_volume.top(VOL_SCREEN))

def rebalance(context, data):
    """Compute long, short and obsolete holdings; place trade orders"""
    factor_data = context.factor_data
    assets = factor_data.index

    longs = assets[factor_data.longs]
    shorts = assets[factor_data.shorts]
    divest = context.portfolio.positions.keys() - longs.union(shorts)

    exec_trades(data, assets=divest, target_percent=0)
    exec_trades(data, assets=longs, target_percent=1 / N_LONGS if N_LONGS else 0)
    exec_trades(data, assets=shorts, target_percent=-1 / N_SHORTS if N_SHORTS else 0)

def exec_trades(data, assets, target_percent):
    """Place orders for assets using target portfolio percentage"""
    for asset in assets:
        if data.can_trade(asset) and not get_open_orders(asset):
            order_target_percent(asset, target_percent)

def before_trading_start(context, data):
    """Run factor pipeline"""
    context.factor_data = pipeline_output('factor_pipeline')
    record(factor_data=context.factor_data.ranking)
    assets = context.factor_data.index
    record(prices=data.current(assets, 'price'))

def initialize(context):
    """Setup: register pipeline, schedule rebalancing,
        and set trading params"""
    set_benchmark(symbol('INFY'))
    attach_pipeline(compute_factors(), 'factor_pipeline')
    schedule_function(rebalance,
                      date_rules.week_start(),
                      time_rules.market_open(),)

    set_commission(us_equities=commission.PerShare(cost=0.00075, min_trade_cost=.01))
    set_slippage(us_equities=slippage.VolumeShareSlippage(volume_limit=0.0025, price_impact=0.01))

backtest = run_algorithm(start=start,
                         end=end,
                         initialize=initialize,
                         before_trading_start=before_trading_start,
                         capital_base=capital_base,
                         data_frequency = 'daily', 
                         bundle= 'nse_data')
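A small diagnostic sketch, assuming pyfolio's standard helper: check whether the backtest produced any transactions at all; with a custom bundle, an empty result usually means no orders were ever filled (e.g. the dollar-volume screen or data.can_trade() filtered everything out).

from pyfolio.utils import extract_rets_pos_txn_from_zipline

returns, positions, transactions = extract_rets_pos_txn_from_zipline(backtest)
# the perf DataFrame stores one list of fills per day in its 'transactions' column
print(backtest.transactions.apply(len).sum())  # zero means nothing ever traded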

Installation/Docker issue

I've downloaded Docker for Windows but have encountered a couple of issues when attempting to follow the steps here: https://github.com/PacktPublishing/Machine-Learning-for-Algorithmic-Trading-Second-Edition/blob/44c03418255a196c74b698c7d8a1cb82d5c7fa5f/installation/README.md.

It throws the following error:
docker: invalid reference format.

Any feedback is much appreciated!
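One hedged pointer: docker: invalid reference format usually means Docker mis-parsed the command line, commonly because the $(pwd) mount path contains spaces (e.g. an unzipped "machine learning for trading" folder) or because the shell did not expand the variable. Quoting the volume argument, -v "$(pwd)":/home/packt/ml4t, or running from a path without spaces may resolve it.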

Env setup

Hi Stefan,
Eagerly waiting for the 2nd edition of your book. Meanwhile, I am working with the 1st edition and your GitHub repo.
I am using a Windows 10 environment.
I set up env_zipline as per the environment.yml under
machine-learning-for-trading/02_market_and_fundamental_data/03_data_providers/05_zipline/
I received errors for the following packages, removed them from environment.yml, and it works with no issues:

  - libgfortran-ng=7.3.0
  - libgfortran=3.0.0
  - ncurses=6.2
  - readline=7.0
  - gst-plugins-base=1.14.5
  - libstdcxx-ng=9.2.0
  - libgomp=9.2.0
  - gstreamer=1.14.5
  - dbus=1.13.12
  - requests=2.14.2
  - libgcc-ng=9.2.0
  - libuuid=2.32.1
  - glib=2.63.1
  - _openmp_mutex=4.5
  - libedit=3.1.20181209

However, there is another environment.yml under
machine-learning-for-trading/05_strategy_evaluation/02_risk_metrics_pyfolio/ to set up the backtesting environment. When I ran this on my Windows system, it threw the big list of packages not found below. Appreciate your help.


`  - markupsafe==1.0=py35h14c3975_1
  - lz4-c==1.8.1.2=h14c3975_0
  - gcc_linux-64==7.3.0=h553295d_7
  - theano==1.0.2=py35h6bb024c_0
  - tk==8.6.8=hbc83047_0
  - nbformat==4.4.0=py35h12e6e07_0
  - h5py==2.8.0=py35h989c5e5_3
  - libgcc-ng==9.1.0=hdf63c60_0
  - traitlets==4.3.2=py35ha522a97_0
  - dbus==1.13.6=h746ee38_0
  - mkl_random==1.0.1=py35h4414c95_1
  - tornado==5.1.1=py35h7b6447c_0
  - cyordereddict==1.0.0=py35h470a237_2
  - glib==2.56.2=hd408876_0
  - ncurses==6.1=he6710b0_1
  - bzip2==1.0.6=h14c3975_5
  - hdf5==1.10.2=hba1933b_1
  - sqlite==3.28.0=h7b6447c_0
  - gst-plugins-base==1.14.0=hbbd80ab_1
  - libffi==3.2.1=hd88cf55_4
  - pcre==8.43=he6710b0_0
  - ptyprocess==0.6.0=py35_0
  - libgpuarray==0.7.6=h14c3975_0
  - numexpr==2.6.8=py35hd89afb7_0
  - zeromq==4.2.5=hf484d3e_1
  - bottleneck==1.2.1=py35h035aef0_1
  - python==3.5.6=hc3d631a_0
  - snappy==1.1.7=hbae5bb6_3
  - mistune==0.8.3=py35h14c3975_1
  - lxml==4.2.5=py35hefd8a0e_0
  - gcc_impl_linux-64==7.3.0=habb00fd_1
  - numpy==1.14.6=py35h3b04361_4
  - scikit-learn==0.20.0=py35h4989274_1
  - binutils_impl_linux-64==2.31.1=h6176602_1
  - libsodium==1.0.16=h1bed415_0
  - jpeg==9b=h024ee3a_2
  - jupyter_console==5.2.0=py35h4044a63_1
  - sqlalchemy==1.2.11=py35h7b6447c_0
  - wcwidth==0.1.7=py35hcd08066_0
  - kiwisolver==1.0.1=py35hf484d3e_0
  - zstd==1.3.7=h0b5b093_0
  - libpng==1.6.37=hbc83047_0
  - pytables==3.4.4=py35ha205bf6_0
  - libxslt==1.1.33=h7d1a2b0_0
  - cffi==1.11.5=py35he75722e_1
  - libuuid==1.0.3=h1bed415_2
  - cython==0.28.5=py35hf484d3e_0
  - mkl_fft==1.0.6=py35h7dd41cf_0
  - jsonschema==2.6.0=py35h4395190_0
  - icu==58.2=h9c2bf20_1
  - xz==5.2.4=h14c3975_4
  - sip==4.19.8=py35hf484d3e_0
  - gstreamer==1.14.0=hb453b48_1
  - readline==7.0=h7b6447c_5
  - qt==5.9.6=h8703b6f_2
  - fontconfig==2.13.0=h9420a91_0
  - cryptography==2.3.1=py35hc365091_0
  - pandas==0.22.0=py35hf484d3e_0
  - blosc==1.16.3=hd408876_0
  - statsmodels==0.9.0=py35h3010b51_0
  - freetype==2.9.1=h8a8886c_1
  - matplotlib==3.0.0=py35h5429711_0
  - scipy==1.1.0=py35hd20e5f9_0
  - expat==2.2.6=he6710b0_0
  - numpy-base==1.14.6=py35h81de0dd_4
  - pygpu==0.7.6=py35h3010b51_0
  - gxx_impl_linux-64==7.3.0=hdf63c60_1
  - gmp==6.1.2=h6c8ec71_1
  - mkl-service==1.1.2=py35h90e4bf4_5
  - binutils_linux-64==2.31.1=h6176602_7
  - libstdcxx-ng==9.1.0=hdf63c60_0
  - libgfortran-ng==7.3.0=hdf63c60_0
  - ipython_genutils==0.2.0=py35hc9e07d0_0
  - contextlib2==0.5.5=py35h6690dba_0
  - cycler==0.10.0=py35hc4d5149_0
  - libxcb==1.13=h1bed415_1
  - intel-openmp==2019.4=243
  - openssl==1.0.2s=h7b6447c_0
  - zlib==1.2.11=h7b6447c_3
  - pyzmq==17.1.2=py35h14c3975_0
  - pyqt==5.9.2=py35h05f1152_2
  - prompt_toolkit==1.0.15=py35hc09de7a_0
  - pickleshare==0.7.4=py35hd57304d_0
  - click==6.7=py35h353a69f_0
  - libxml2==2.9.9=hea5a465_1
  - lzo==2.10=h49e0be7_2
  - gxx_linux-64==7.3.0=h553295d_7
  - testpath==0.3.1=py35had42eaf_0
  - libedit==3.1.20181209=hc058e9b_0
  - wrapt==1.10.11=py35h14c3975_2`

get_data.py script missing in repo?

At location 2646 "Useful pandas and NumPy methods" in the book it states "The notebook uses data generated by the get_data.py script in the data folder in the root directory of the GitHub repo and stored in HDF5 format for faster access".

I can't find the get_data.py script in this repo - is it missing? I assume it generates the assets.h5 file used in many of the other chapters' notebooks.

feature_engineering error

In the feature_engineering notebook, I keep getting the error below when trying to use pd.IndexSlice.
I'm not sure what it means or how to overcome this issue.

TypeError: unhashable type: 'slice'
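A minimal sketch of how pd.IndexSlice is meant to be used (prices is an illustrative frame with a (ticker, date) MultiIndex; the unhashable-slice TypeError typically appears when the slicer is passed to [] instead of .loc, or the index is unsorted):

import pandas as pd

idx = pd.IndexSlice
prices = prices.sort_index()                    # .loc slicing needs a sorted MultiIndex
sample = prices.loc[idx[:, '2010':'2017'], :]   # ok; prices[idx[...]] raises the TypeError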
