g0bel1n / ddql-optimal-execution

Double Deep Q-Learning for Optimal Execution implementation

License: MIT License

Python 100.00%

deep-learning double-dqn finance rl

ddql-optimal-execution's Introduction

Lucas Saban

Quant Researcher @QRT

Previously a double-degree Master's student at ENSAE Paris and MVA (ENS Paris-Saclay). Interested in Machine Learning, Deep Learning and Swarm Intelligence.

I open source some of my projects here.

Currently working on:

Quantools: [Financial Engineering | Visualization | High Performance Python] A toolbox for quants (ML-oriented)
DDQL: [RL | Optimal Execution | Trading] Implementation of a paper.

Contact: lucas[dot]saban[at]ensae[dot]fr

Awards and Competitions

1st place at the 2022 Ostrum Asset Management Finance Game. Deep Recurrent Reinforcement Learning for Dynamic Portfolio Optimization.
1st place at the 2022 Hi Paris Hackathon delivered by Capgemini. Segmentation and classification computer vision tasks.
1st place at the 2022 ENSAE Hackathon by Eleven Strategy and Data For Good. We used computer vision, NLP and audio segmentation to evaluate Bechdel scores.
2nd place at the 2021 ENSAE Hackathon by Capgemini.

Some projects

Academy

Leveraging latent representations for efficient textual OOD detection. Paper, Repo (not published).
Large-scale joint hyperparameter and feature selection using multi-objective heuristic optimization. Report, Presentation and Repo.
Implementation of Fast Shapelet discovery for time series classification. Original Paper, Report and Repo.
Implementation of Double Deep Q-Learning for Optimal Trading Execution. Original Paper, Report and Repo.

Industry

Spatio-temporal spectral clustering algorithm based on HDBSCAN for dynamic event recognition. The source is private, but I am happy to talk about it.
A full-stack image classification project using TensorFlow, transfer learning and CNNs. The front end is an iOS app that I coded in Swift. It was built with Augustin Cramer.

A scikit-learn plugin for binary classification tasks, delivered as a PyPI package. The source is available here.

An Ant-Search Simulation with GUI using C++ and SFML

The ants adapt to changes in their environment. They quickly find a new path by following the pheromone trails.

A data visualization project using GeoPandas: Gravity Law Model applied to inter-city traffic

gRavlaw

ddql-optimal-execution's Issues

check if this is correct and clarify numpy/torch/State object

https://api.github.com/g0bel1n/DDQL-optimal-execution/blob/8fb5bdca42a467e52167801c55360b4aa2c26fba/src/agent/_agent.py#L70

from ._neural_net import QNet
from ._utils import get_device
from ._state import State

import torch
import torch.nn as nn
import torch.optim as optim

import numpy as np
import scipy


from typing import Optional

class Agent:

    def __init__(
        self,
        state_dict: Optional[dict] = None,
        greedy_decay_rate: float = .1,
        target_update_rate: int = 100,
        initial_greediness: float = .2,
        mode: str = 'train',
        lr: float = 1e-3,
    ) -> None:

        self.device = get_device()
        print(f"Using {self.device} device")

        self.main_net = QNet().to(self.device)
        self.target_net = QNet().to(self.device)

    
        if state_dict is not None:
            self.main_net.load_state_dict(state_dict)
            self.target_net.load_state_dict(state_dict)

        self.greedy_decay_rate = greedy_decay_rate
        self.target_update_rate = target_update_rate
        self.greediness= initial_greediness

        self.mode = mode

        self.learning_step = 0

        if self.mode == 'train':
            self.optimizer = optim.RMSprop(self.main_net.parameters(), lr=lr)
            self.loss_fn = nn.MSELoss()



    def train(self) -> None:
        self.main_net.train()
        self.mode = 'train'

    def eval(self) -> None:
        self.main_net.eval()
        self.mode = 'eval'
    

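    def _get_action(self, state) -> torch.Tensor:

        if np.random.rand() < self.greediness and self.mode == 'train':
            # Explore: draw a random quantity to trade from the current inventory.
            action = np.random.binomial(state['inventory'], 1/state['inventory'])
        else:
            # Exploit: score every feasible integer quantity 0..inventory and keep the best.
            # Assumption: QNet(state, action) returns a single Q-value, as in _complete_target.
            candidates = range(int(state['inventory']) + 1)
            action = max(candidates, key=lambda a: self.main_net(state, a).item())
        
        return action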
    
    def _update_target_net(self) -> None:
        self.target_net.load_state_dict(self.main_net.state_dict())


    def _complete_target(self, experience_batch : torch.Tensor) -> torch.Tensor:
        # torch.cat expects a sequence of tensors, so the two index tensors are passed as a tuple.
        ids = torch.cat((torch.where(experience_batch['done'] == 0)[0], torch.where(experience_batch['predone'] == 0)[0]))
        for experience in experience_batch[ids]:
            constraints = ({'type': 'ineq', 'fun': lambda x: x}, {'type': 'ineq', 'fun': lambda x: experience['inventory'] - x}) 
            # TODO: check if this is correct and clarify numpy/torch/State object
            # assignees: g0bel1n
            best_action = scipy.optimize.minimize(lambda x: -self.main_net(experience['next_state'], x), np.array([0.]), constraints=constraints).x
            target_complement = experience['gamma'] * self.target_net(experience['next_state'],best_action)
            experience['target'] += target_complement

        return experience_batch

    
    def learn(self, experience_batch : torch.Tensor) -> None:
        experience_batch = self._complete_target(experience_batch)
        dataloader = torch.utils.data.DataLoader(experience_batch, batch_size=32, shuffle=True)
        for batch in dataloader:
            target = batch['target'].unsqueeze(1)
            pred = self.main_net(batch['state'], batch['action'])
            loss = self.loss_fn(pred, target)
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()

        self.learning_step += 1
        self.greediness = max(0.01, self.greediness * self.greedy_decay_rate)
        if self.learning_step % self.target_update_rate == 0:
            self._update_target_net()
            print(f"Target network updated at step {self.learning_step} with greediness {self.greediness:.2f}")

        

    def __call__(self, state) -> torch.Tensor:
        return self._get_action(state)
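
The target completion in `_complete_target` follows the double Q-learning pattern: the main network selects the next action and the target network evaluates it. The sketch below illustrates that split with plain NumPy arrays standing in for network outputs. It assumes a discrete action grid rather than the constrained continuous optimisation used in the snippet above, and the function name `double_dqn_targets` and its arguments are hypothetical, not part of the repository.

```python
import numpy as np


def double_dqn_targets(rewards, dones, q_main_next, q_target_next, gamma=0.99):
    """Double DQN targets for a batch (illustrative sketch, discrete actions).

    q_main_next and q_target_next have shape (batch, n_actions) and hold the
    Q-values of the next state under the main and target networks.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    q_main_next = np.asarray(q_main_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)

    # Selection: the main network picks the greedy next action.
    best_actions = q_main_next.argmax(axis=1)
    # Evaluation: the target network scores that chosen action.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions receive no bootstrap term.
    return rewards + gamma * np.where(dones, 0.0, next_values)


targets = double_dqn_targets(
    rewards=[1.0, 2.0],
    dones=[False, True],
    q_main_next=[[0.1, 0.9], [0.5, 0.2]],
    q_target_next=[[3.0, 4.0], [5.0, 6.0]],
    gamma=0.5,
)
print(targets)
```

Keeping everything as arrays of one type until the final target is built is one way to avoid the numpy/torch/State mixing that the TODO asks about.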

implement create_fake_LOB_data

following the same output data structure as in the paper

DDQL-optimal-execution/src/data/_utils.py

Line 18 in dcb42a3

# TODO: implement create_fake_LOB_data

        raise ValueError("return_type must be either 'numpy' or 'torch'")


# TODO: implement create_fake_LOB_data
# following the same output data structure as in the paper
def create_fake_LOB_data(
    n_samples: int = 1000, n_features: int = 10, n_classes: int = 2
) -> tuple:
    """Creates fake LOB data for testing purposes."""
    
    pass
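
As a starting point for this issue, here is a minimal sketch of a synthetic limit order book generator. It does not reproduce the paper's data structure, which the issue leaves open. The function name `fake_lob_sketch`, the `n_levels` and `seed` parameters, the tick size and the random-walk price model are all assumptions made for illustration.

```python
import numpy as np


def fake_lob_sketch(n_samples=1000, n_levels=5, seed=0):
    """Generate synthetic LOB snapshots (illustrative assumptions throughout)."""
    rng = np.random.default_rng(seed)
    tick = 0.01

    # Mid-price modelled as a Gaussian random walk starting near 100.
    mid = 100.0 + np.cumsum(rng.normal(0.0, 0.05, size=n_samples))
    # Strictly positive half-spread so the best ask stays above the best bid.
    half_spread = rng.uniform(0.01, 0.05, size=n_samples)

    levels = np.arange(n_levels)
    # Level 0 is the best quote; deeper levels move one tick away from the mid.
    bid_prices = (mid - half_spread)[:, None] - tick * levels
    ask_prices = (mid + half_spread)[:, None] + tick * levels
    bid_volumes = rng.integers(1, 1000, size=(n_samples, n_levels))
    ask_volumes = rng.integers(1, 1000, size=(n_samples, n_levels))
    return bid_prices, ask_prices, bid_volumes, ask_volumes


bid_p, ask_p, bid_v, ask_v = fake_lob_sketch(n_samples=50, n_levels=3, seed=1)
print(bid_p.shape, ask_p.shape)
```

Fixing the seed keeps the generated data reproducible, which is convenient when the output is used in tests.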

implement inventory_action_transformer according to Appendix A.1

and add tests for it in tests/test_data_utils.py (if possible) (might need to implement a function to compute the inverse of the transformation)

DDQL-optimal-execution/src/data/_data_processing.py

Line 31 in 97a6b7c

# TODO: implement inventory_action_transformer according to Appendix A.1

    :param inv_act_pairs: a tensor of shape (batch_size, 2) where the first column is the inventory and
    the second column is the action
    """
    # TODO: implement inventory_action_transformer according to Appendix A.1 
    # and add tests for it in tests/test_data_utils.py (if possible) (might need to implement a function to compute the inverse of the transformation)
    pass
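
The issue suggests testing the transformation through its inverse. The sketch below shows that round-trip pattern only. The transform used here is a stand-in, not the Appendix A.1 formula: it rescales inventory to [-1, 1] and replaces the action by the fraction of inventory traded, also rescaled to [-1, 1]. The names `transform_sketch`, `inverse_sketch` and `q0` (the initial inventory) are hypothetical.

```python
import numpy as np


def transform_sketch(inv_act_pairs, q0):
    """Stand-in transform: (inventory, action) -> two features in [-1, 1]."""
    pairs = np.asarray(inv_act_pairs, dtype=float)
    q, x = pairs[:, 0], pairs[:, 1]
    q_scaled = 2.0 * q / q0 - 1.0
    # Fraction of the inventory traded, defined as 0 when the inventory is empty.
    frac = np.divide(x, q, out=np.zeros_like(x), where=q > 0)
    x_scaled = 2.0 * frac - 1.0
    return np.stack([q_scaled, x_scaled], axis=1)


def inverse_sketch(transformed, q0):
    """Inverse of transform_sketch, recovering (inventory, action)."""
    t = np.asarray(transformed, dtype=float)
    q = (t[:, 0] + 1.0) / 2.0 * q0
    x = (t[:, 1] + 1.0) / 2.0 * q
    return np.stack([q, x], axis=1)


# Rows are (inventory, action) pairs, matching the (batch_size, 2) layout.
pairs = np.array([[100.0, 0.0], [100.0, 100.0], [50.0, 20.0], [0.0, 0.0]])
features = transform_sketch(pairs, q0=100.0)
recovered = inverse_sketch(features, q0=100.0)
print(np.allclose(recovered, pairs))
```

A test in tests/test_data_utils.py could follow the same shape: transform a batch of feasible pairs, invert, and compare with the input.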

Implement pretraining on boundary cases

https://api.github.com/g0bel1n/DDQL-optimal-execution/blob/11b26100d3b569ce76a89fc4cb73d62558234539/src/_ddql.py#L46

    def _pretrain(self, n_steps: int = 1000) -> dict:
        # TODO: Implement pretraining on boundary cases 
        #  assignees: g0bel1n

        state_dict = dict()
        return state_dict

load episodes from csv ?

https://api.github.com/g0bel1n/DDQL-optimal-execution/blob/8fb5bdca42a467e52167801c55360b4aa2c26fba/src/environnement/_env.py#L11

import pandas as pd 
from ._state import State

class MarketEnvironnement:
    
    def __init__(self, initial_inventory : float = 100.) -> None:

        self.historical_data = pd.read_csv("data/historical_data.csv")
        self.historical_data = self.historical_data.set_index("Date")

        # TODO: load episodes from csv ?
        # assignees: g0bel1n



        self.horizon = self.historical_data.shape[0]

        self.current_step = 0
        self.cols = self.historical_data.columns

        # Convert to lists first so that `+` concatenates instead of adding element-wise.
        initial_state = list(self.historical_data.iloc[self.current_step, :].values) + [initial_inventory, self.current_step]
        states_elements = list(self.cols) + ['inventory', 'step']

        self.done = False

        self.state = State(states_elements, initial_state)    

    def step(self, action: int) -> tuple:
        # Execute one time step within the environment
        self.current_step += 1

        reward = self._get_reward(action)

        self.done = self.current_step == self.horizon - 1

        self.state['inventory'] = self.state['inventory'] + action
        self.state['step'] = self.state['step'] + 1


        if not self.done:
            # `**` needs a mapping of column name to value, not a bare array.
            self.state.update_state(**self.historical_data.iloc[self.current_step, :].to_dict())

        return None
    
    def get_trading_episodes(self) -> tuple:
        # Return the trading episodes
        return None
    
    
    def _get_reward(self, action: int) -> float:
        return self.state[-1] if action == 0 else -self.state[-1]
    
    def reset(self) -> None:
        # Reset the state of the environment to an initial state
        self.current_step = 0
        self.state = self.historical_data.iloc[self.current_step, :].values
        return self.state
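
One possible answer to the TODO is to split the historical data into one episode per trading day. The sketch below assumes the CSV has a `Date` column holding timestamps, as in the snippet above. The `price` and `volume` columns, the sample rows and the function name `load_episodes` are hypothetical.

```python
import io

import pandas as pd

# Inline sample standing in for data/historical_data.csv (hypothetical columns).
csv_text = """Date,price,volume
2023-01-02 09:30:00,100.0,10
2023-01-02 09:31:00,100.5,12
2023-01-03 09:30:00,101.0,8
2023-01-03 09:31:00,100.8,9
2023-01-03 09:32:00,100.9,11
"""


def load_episodes(csv_source):
    """Return a list of DataFrames, one per calendar day, in date order."""
    data = pd.read_csv(csv_source, parse_dates=["Date"]).set_index("Date")
    # normalize() drops the time of day, so rows from the same day share a key.
    return [day for _, day in data.groupby(data.index.normalize())]


episodes = load_episodes(io.StringIO(csv_text))
print([len(episode) for episode in episodes])
```

With episodes stored as a list, `reset` could pick one episode and set `horizon` from its length instead of using the whole file.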
