
ai4finance-foundation / rlsolver


Solvers for NP-hard and NP-complete problems with an emphasis on high-performance GPU computing.

Home Page: https://ai4finance.org

License: MIT License

Python 99.35% MATLAB 0.65%
gpu-acceleration learning massively-parallel optimization reinforcement solver

rlsolver's Introduction

RLSolver: High-performance GPU-based Solvers for Nonconvex and NP-Complete Problems

We aim to show that reinforcement learning (RL) or machine learning (ML) on GPUs delivers the best benchmark performance for large-scale nonconvex and NP-complete problems. With the help of GPU computing, RL can obtain high-quality solutions within a short time.

Sub-repos

Key Technologies

  • RL/ML tricks such as learn to optimize and curriculum learning.
  • OR tricks such as local search and tabu search.
  • Massively parallel sampling of Markov chain Monte Carlo (MCMC) simulations on GPU using thousands of CUDA cores and tensor cores.
  • Podracer scheduling on a GPU cloud such as DGX-2 SuperPod.
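
As a rough, CPU-only illustration of the sampling idea in the list above, the sketch below runs several independent Metropolis chains on a tiny max-cut instance and keeps the best cut found. The names `cut_value` and `metropolis_maxcut`, the temperature, and the step counts are assumptions made for this example and are not taken from this repo; on a GPU the chains would be advanced as one batch rather than in a Python loop.

```python
import math
import random


def cut_value(edges, side):
    """Total weight of edges whose endpoints are on different sides."""
    return sum(w for u, v, w in edges if side[u] != side[v])


def metropolis_maxcut(edges, num_nodes, num_chains=8, num_steps=200, temperature=0.5, seed=0):
    """Run independent Metropolis chains and return (best_cut, best_side)."""
    rng = random.Random(seed)
    best_cut, best_side = -1, None
    for _ in range(num_chains):
        side = [rng.random() < 0.5 for _ in range(num_nodes)]
        cur = cut_value(edges, side)
        if cur > best_cut:
            best_cut, best_side = cur, side[:]
        for _ in range(num_steps):
            node = rng.randrange(num_nodes)
            side[node] = not side[node]  # propose flipping one node
            new = cut_value(edges, side)
            # Always accept improvements; accept worse moves with Boltzmann probability.
            if new >= cur or rng.random() < math.exp((new - cur) / temperature):
                cur = new
                if cur > best_cut:
                    best_cut, best_side = cur, side[:]
            else:
                side[node] = not side[node]  # reject: undo the flip
    return best_cut, best_side


# A 4-cycle with unit weights; no cut can exceed its 4 edges.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
best_cut, best_side = metropolis_maxcut(edges, num_nodes=4)
print(best_cut, best_side)
```

Because the chains never interact, they map naturally onto a batch dimension, which is what makes this kind of sampler a good fit for thousands of GPU cores.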

Key References

  • Mazyavkina, Nina, et al. "Reinforcement learning for combinatorial optimization: A survey." Computers & Operations Research 134 (2021): 105400.

  • Bengio, Yoshua, Andrea Lodi, and Antoine Prouvost. "Machine learning for combinatorial optimization: a methodological tour d’horizon." European Journal of Operational Research 290.2 (2021): 405-421.

  • Peng, Yun, Byron Choi, and Jianliang Xu. "Graph learning for combinatorial optimization: a survey of state-of-the-art." Data Science and Engineering 6, no. 2 (2021): 119-141.

  • Nair, Vinod, et al. "Solving mixed integer programs using neural networks." arXiv preprint arXiv:2012.13349 (2020).

  • Makoviychuk, Viktor, et al. "Isaac Gym: High performance GPU based physics simulation for robot learning." Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). 2021.

Workflow

Datasets

  • Maxcut:

    1. Gset is stored in the "data" folder of this repo. The number of nodes ranges from 800 to 10000.

    2. Syn is synthetic data generated by calling the function generate_write in util.py. The number of nodes ranges from 10 to 50000. Part of the synthetic data is stored in the "data" folder of this repo. Users who need all of the synthetic data can refer to Google Drive or Baidu Wangpan (CODE: hojh, for users in China).

  • TSP: TSPLIB
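
For orientation, here is a minimal sketch of reading a max-cut graph in the text layout that the GraphMaxCutEnv code later on this page expects: a header line `num_nodes num_edges`, then one `node0 node1 weight` line per edge with 1-based node ids. The helper names `parse_graph_text` and `cut_value` are hypothetical, and the tiny instance is made up for illustration.

```python
def parse_graph_text(text):
    """Parse 'num_nodes num_edges' followed by 'node0 node1 weight' lines (1-based ids)."""
    rows = [[int(tok) for tok in line.split()]
            for line in text.strip().splitlines() if line.strip()]
    num_nodes, num_edges = rows[0]
    edges = [(n0 - 1, n1 - 1, w) for n0, n1, w in rows[1:]]  # shift to 0-based ids
    if len(edges) != num_edges:
        raise ValueError(f"header says {num_edges} edges, found {len(edges)}")
    return num_nodes, edges


def cut_value(edges, side):
    """Total weight of edges whose endpoints fall on different sides."""
    return sum(w for n0, n1, w in edges if side[n0] != side[n1])


sample = """
4 5
1 2 1
2 3 1
3 4 1
4 1 1
1 3 1
"""
num_nodes, edges = parse_graph_text(sample)
side = [True, False, True, False]  # nodes 0 and 2 on one side, 1 and 3 on the other
print(num_nodes, len(edges), cut_value(edges, side))  # 4 5 4
```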

Benchmarks

  • Learning to branch

code 2023 AAAI Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories

code 2021 AAAI Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies

  • Learning to cut

code 2020 ICML Reinforcement learning for integer programming: Learning to cut

  • RL/ML-based heuristic

code (greedy) 2017 NeurIPS Learning Combinatorial Optimization Algorithms over Graphs

code (local search) 2023, A Monte Carlo Policy Gradient Method with Local Search for Binary Optimization

code (LKH for TSP) 2021 AAAI Combining reinforcement learning with Lin-Kernighan-Helsgaun algorithm for the traveling salesman problem

  • Variational annealing

code (VCA_RNN) 2023 Machine_Learning Supplementing recurrent neural networks with annealing to solve combinatorial optimization problems

code (VNA) 2021 Nature_Machine_Intelligence Variational neural annealing

  • Discrete sampling

code (iSCO) 2023 ICML Revisiting sampling for combinatorial optimization

Solvers to Compare with

Gurobi is the state-of-the-art solver. A license is required; professors and students at universities can obtain an academic license for free.

SCIP is a well-known open-source solver, and its simplex is commonly used in "learning to branch/cut". It is free to use.

Other Solvers

COPT: a mathematical optimization solver for large-scale problems.

CPLEX: a high-performance mathematical programming solver for linear programming, mixed integer programming, and quadratic programming.

Xpress: an extraordinarily powerful, field-installable Solver Engine.

BiqMac: a solver only for binary quadratic problems or maxcut. Users upload a txt file, but the response time is not guaranteed. We recommend downloading the sources and running it on a local computer instead.

Store Results

Partial results are stored in the "result" folder of this repo. All results are stored in Google Drive or Baidu Wangpan (CODE: hojh, for users in China).

For maxcut, please refer to Maxcut. For TSP, please refer to TSP.

Performance

  • Maxcut
  • TSP
  • Quantum circuits
  • MIMO
  • Compressive sensing

File Structure

RLSolver
└──helloworld
   └──maxcut
        └──data
        └──result
        └──util.py
        └──mcmc.py
        └──l2a.py (ours)
        └──baseline
            └──greedy.py
            └──gurobi.py
            └──random_walk.py
            └──simulated_annealing.py
            └──variational_classical_annealing_RNN
            └──variational_neural_annealing
└──benchmark
   └──maxcut.md
   └──graph_partitioning.md
   └──tsp.md
   └──tnco.md
└──rlsolver (main folder)
   └──util.py
   └──data
      └──graph
      └──quantum_circuits
      └──milp_coefs
      └──binary_coefs
   └──problems
      └──maxcut
          └──baseline
          └──mcmc.py
          └──l2a.py(ours)
      └──tnco
          └──baseline
          └──mcmc.py
          └──l2a.py(ours)
      └──mimo
          └──baseline
          └──mcmc.py
          └──l2a.py(ours)




Finished

  • MIMO
  • Maxcut
  • TNCO
  • quantum circuits

TODO

  • TSP
  • VRP (Vehicle routing problem)
  • Graph partitioning
  • Minimum vertex cover
  • MILP
  • Portfolio allocation

Related Websites

rlsolver's People

Contributors

bruceyanghy, shixun404, spicywei, yangletliu, yonv1943, zhangaipi, zhumingpassional


rlsolver's Issues

One question

Updates to the Start-End matrix have no effect, and even forcing the change does not alter it; this is the current problem with Ring.

📝 Generate a fixed graph with a constant random seed

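Generate a fixed undirected graph with a constant random seed

Once the random seed is fixed, the algorithm produces a fixed sequence of pseudo-random numbers, and generate_graph(), the function that randomly generates an undirected graph, "consumes" these pseudo-random numbers.

Given the number of nodes num_nodes, the graph generation method g_type, and the graph index valid_i, the code below generates the same fixed graph directly, on any device.

I recommend scheme 1, which avoids storing many txt files that represent undirected graphs (so there is no need to worry about losing files). Specifying the following three pieces of information is enough to generate the graph; for example, graph_name = 'powerlaw_100_ID042' maps to a unique undirected graph:

  • number of nodes in the undirected graph: num_nodes=100
  • graph generation method: g_type='powerlaw'
  • graph index: valid_i=42

Note: the graphs with 300 nodes were generated with scheme 3. We switched to scheme 1 only after 2023-08-17.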


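Scheme 1: use the graph index valid_i as the random seed. This has the lowest compute cost, but the random seed changes for every graph.

import random
for valid_i in range(6):
    random.seed(valid_i)
    graph, num_nodes, num_edges = generate_graph(num_nodes=num_nodes, g_type=g_type)
random.seed()  # restore the default (unseeded) behavior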

Scheme 2: after fixing the random seed, consume a fixed quantity of random numbers. Its cost lies between scheme 1 and scheme 3.

import random
for valid_i in range(6):
    random.seed(0)
    [random.random() for _ in range(valid_i * num_nodes)]  # discard a fixed number of draws
    graph, num_nodes, num_edges = generate_graph(num_nodes=num_nodes, g_type=g_type)
random.seed()  # restore the default (unseeded) behavior

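Scheme 3: after fixing the random seed, call the generation function directly, which automatically consumes the required quantity of random numbers. This costs more compute when valid_i is large.

import random
for valid_i in range(6):
    random.seed(0)
    graph_tuples = [generate_graph(num_nodes=num_nodes, g_type=g_type) for _ in range(valid_i + 1)]
    graph, num_nodes, num_edges = graph_tuples[-1]
random.seed()  # restore the default (unseeded) behavior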
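
The seed-as-index idea can be sketched end to end as follows. Since generate_graph() itself is not shown in this issue, `generate_graph_stub` is a hypothetical stand-in that builds a simple random graph and ignores g_type; `parse_graph_name` and `graph_from_name` are likewise names invented for this example. The sketch also uses a private `random.Random` instance instead of reseeding the global generator, which is a design choice of this example rather than what the snippets above do.

```python
import random


def parse_graph_name(graph_name):
    """Split a name such as 'powerlaw_100_ID042' into (g_type, num_nodes, valid_i)."""
    g_type, num_nodes, graph_id = graph_name.rsplit("_", 2)
    return g_type, int(num_nodes), int(graph_id[2:])  # drop the 'ID' prefix


def generate_graph_stub(num_nodes, rng, edge_prob=0.3):
    """Hypothetical stand-in for generate_graph(): a random graph with unit weights."""
    return [(i, j, 1)
            for i in range(num_nodes)
            for j in range(i + 1, num_nodes)
            if rng.random() < edge_prob]


def graph_from_name(graph_name):
    """Regenerate the same graph from its name alone (the graph index is the seed)."""
    g_type, num_nodes, valid_i = parse_graph_name(graph_name)
    rng = random.Random(valid_i)  # private generator; global random state is untouched
    return generate_graph_stub(num_nodes, rng)  # g_type is ignored by this stub


print(parse_graph_name("powerlaw_100_ID042"))
print(graph_from_name("powerlaw_20_ID003") == graph_from_name("powerlaw_20_ID003"))
```

Using a private generator removes the need for the trailing random.seed() call, because no shared state is modified.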

A problem occurs in TNCO env


When N>1500, the existing code overflows and the result is inf. I tried to fix it by increasing the temp_power variable so that a larger range of floating-point numbers can be represented, but it didn't work.
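
One common way around this kind of overflow is to keep every cost as a logarithm and add costs in log space. The TNCO environment code is not shown in this issue, so the sketch below is only a generic illustration: `log2_add` and `log2_total_cost` are hypothetical helpers, and whether they fit the role of temp_power in the real code is an assumption.

```python
import math


def log2_add(log_a, log_b):
    """Return log2(2**log_a + 2**log_b) without forming the huge numbers."""
    hi, lo = max(log_a, log_b), min(log_a, log_b)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))  # lo - hi <= 0, so this cannot overflow


def log2_total_cost(log2_step_costs):
    """Accumulate per-step costs that are given as base-2 logarithms."""
    total = log2_step_costs[0]
    for log_cost in log2_step_costs[1:]:
        total = log2_add(total, log_cost)
    return total


# 2.0 ** 2000 overflows a float, but its logarithm is simply 2000.0.
total = log2_total_cost([2000.0, 2000.0, 1990.0])
print(total, math.isfinite(total))
```

The total stays an ordinary float near 2001 even though the underlying cost is far beyond the float range, and comparing two contraction orders only requires comparing their logarithms.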

🚚 single file MIMO for N16K16

Backup in a GitHub issue.

This is not the officially committed code; it is a single file that I changed to modify the cumulative gradient scheme.

import os
import time
import math
import wandb
import torch
import torch.nn as nn
import pickle as pkl
from tqdm import tqdm
from functorch import vmap
from argparse import ArgumentParser

'''zjh: 2023-01-17 17:47:03 base on wsx'''

'''network'''


class NetMIMO(nn.Module):
    def __init__(self, mid_dim=1024, k=4, n=4):
        super(NetMIMO, self).__init__()
        self.K = k  # k_antenna
        self.N = n  # n_user

        self.inp_dim = 2 * 3
        self.mid_dim = mid_dim

        self.net = nn.Sequential(
            BiConvNet(mid_dim=mid_dim, inp_dim=(self.inp_dim, k, n), out_dim=mid_dim * 4), nn.ReLU(),
            nn.Linear(mid_dim * 4, mid_dim * 2), nn.ReLU(),
            nn.Linear(mid_dim * 2, mid_dim * 1), nn.ReLU(),
            DenseNet(mid_dim * 1), nn.ReLU(),
            nn.Linear(mid_dim * 4, 2 * k * n), nn.Tanh(),
        )

    def forward(self, state):
        mat_h, mat_w, mat_p, mat_hw = state

        mat_h = (mat_h / (1e-5 + mat_h.norm(dim=1, keepdim=True))).reshape(-1, self.K * self.N)
        mat_w = (mat_w / (1e-5 + mat_w.norm(dim=1, keepdim=True))).reshape(-1, self.K * self.N)
        mat_hw = (mat_hw / (1e-5 + mat_hw.norm(dim=1, keepdim=True))).reshape(-1, self.K * self.N)

        vec_h = torch.cat((mat_h.real, mat_h.imag), dim=1)
        vec_w = torch.cat((mat_w.real, mat_w.imag), dim=1)
        vec_hw = torch.cat((mat_hw.real, mat_hw.imag), dim=1)

        vec_i = torch.cat((vec_h, vec_w, vec_hw), 1).reshape(-1, self.inp_dim, self.K, self.N)

        new_w = self.net(vec_i)
        new_w = new_w / torch.norm(new_w, dim=1, keepdim=True)
        new_w_real, new_w_imag = new_w.chunk(2, dim=1)
        return (new_w_real + new_w_imag * 1.j).reshape(-1, self.K, self.N)  # complex number


class DenseNet(nn.Module):
    def __init__(self, lay_dim):
        super().__init__()
        self.dense1 = nn.Sequential(nn.Linear(lay_dim * 1, lay_dim * 1), nn.Hardswish())
        self.dense2 = nn.Sequential(nn.Linear(lay_dim * 2, lay_dim * 2), nn.Hardswish())
        self.inp_dim = lay_dim
        self.out_dim = lay_dim

    def forward(self, x1):
        x2 = torch.cat((x1, self.dense1(x1)), dim=1)
        x3 = torch.cat((x2, self.dense2(x2)), dim=1)
        return x3


class BiConvNet(nn.Module):
    def __init__(self, mid_dim, inp_dim, out_dim):
        super().__init__()
        i_c_dim, i_h_dim, i_w_dim = inp_dim
        self.cnn_h = nn.Sequential(
            nn.Conv2d(i_c_dim * 1, mid_dim * 2, (1, i_w_dim), bias=True), nn.LeakyReLU(inplace=True),
            nn.Conv2d(mid_dim * 2, mid_dim * 1, (1, 1), bias=True), nn.ReLU(inplace=True), )
        self.linear_h = nn.Linear(i_h_dim * mid_dim, out_dim)
        self.cnn_w = nn.Sequential(
            nn.Conv2d(i_c_dim * 1, mid_dim * 2, (i_h_dim, 1), bias=True), nn.LeakyReLU(inplace=True),
            nn.Conv2d(mid_dim * 2, mid_dim * 1, (1, 1), bias=True), nn.ReLU(inplace=True), )
        self.linear_w = nn.Linear(i_w_dim * mid_dim, out_dim)

    def forward(self, state):
        ch = self.cnn_h(state)
        xh = self.linear_h(ch.reshape(ch.shape[0], -1))
        cw = self.cnn_w(state)
        xw = self.linear_w(cw.reshape(cw.shape[0], -1))
        return xw + xh


'''environment'''


class MIMOEnv:
    def __init__(self, k=4, n=4, p=10.0, noise_power=1, episode_length=6, num_env=4096, device=torch.device("cuda:0"),
                 reward_mode='sl', snr=10):
        self.N = n  # #antennas
        self.K = k  # #users
        self.P = p  # Power
        self.noise_power = noise_power
        self.reward_mode = reward_mode

        self.num_env = num_env
        self.episode_length = episode_length
        self.num_x = 1000
        self.epsilon = 1
        self.snr = snr
        self.if_test = False

        self.device = device

        '''reset'''
        self.vec_h = None
        self.mat_w = None
        self.mat_h = None
        self.X = None
        self.num_steps = 0
        self.done = False

        self.basis_vec, _ = torch.linalg.qr(
            torch.rand(2 * self.K * self.N, 2 * self.K * self.N, dtype=torch.float, device=self.device)
        )  # QR decomposition, return the matrixQ and matrixR

        self.subspace_dim = 2 * k * n
        if self.reward_mode == 'empirical':
            self.get_vec_reward = vmap(self.get_reward_empirical, in_dims=(0, 0, None), out_dims=(0, 0))
        elif self.reward_mode == 'analytical':
            self.get_vec_reward = vmap(self.get_reward_analytical, in_dims=(0, 0, None), out_dims=(0, 0))
        elif self.reward_mode == 'supervised_mmse' or self.reward_mode == 'supervised_mmse_curriculum':
            self.get_vec_reward = vmap(self.get_reward_supervised_mmse, in_dims=(0, 0, 0), out_dims=(0, 0))
        elif self.reward_mode == 'rl':
            self.subspace_dim = 1  # 2 * K * N
            self.get_vec_reward = None
        else:
            raise ValueError(f"{self.reward_mode} should be in {REWARD_MODES}")

        self.get_vec_sum_rate = vmap(self.get_sum_rate, in_dims=(0, 0), out_dims=(0, 0))

        with open(f"./K{self.K}N{self.N}Samples=100.pkl", 'rb') as f:
            self.test_H = torch.as_tensor(pkl.load(f), dtype=torch.cfloat, device=self.device)

    def reset(self, if_test=False, test_p=None, if_mmse=False):
        if self.subspace_dim <= 2 * self.K * self.N:
            self.vec_h = self.generate_channel_batch(self.N, self.K, self.num_env, self.subspace_dim, self.basis_vec)
        else:
            self.vec_h = torch.randn(self.num_env, 2 * self.K * self.N, dtype=torch.cfloat,
                                     device=self.device) / math.sqrt(2)

        '''get mat_h'''
        self.if_test = False
        if if_test:
            self.if_test = True
            self.mat_h = self.test_H * math.sqrt(test_p)
            # print(self.mat_H.shape)
        else:
            self.mat_h = (self.vec_h[:, :self.K * self.N] + self.vec_h[:, self.K * self.N:] * 1.j).reshape(-1, self.K,
                                                                                                           self.N)
            self.mat_h *= math.sqrt(self.P)
            # variable SNR
            # self.mat_H[:self.mat_H.shape[0] // 3] *= math.sqrt(10)
            # self.mat_H[self.mat_H.shape[0] // 3:2 * self.mat_H.shape[0] // 3] *= math.sqrt(10 ** 1.5)
            # self.mat_H[2 * self.mat_H.shape[0] // 3:] *= math.sqrt(10 ** 2)

        '''get mat_w'''
        if if_mmse:
            self.mat_w, _ = compute_mmse_beamformer(
                self.mat_h, k=self.K, n=self.N, noise_power=self.noise_power, device=self.device)
        elif self.reward_mode == 'rl':
            vec_w = torch.randn((self.mat_h.shape[0], self.K * self.N), dtype=torch.cfloat, device=self.device)
            vec_w = vec_w / torch.norm(vec_w, dim=1, keepdim=True)
            self.mat_w = vec_w.reshape(-1, self.K, self.N)
        else:
            self.mat_w = torch.zeros_like(self.mat_h, device=self.device)  # self.mat_H.conj().transpose(-1, -2)

        '''get self.X'''
        if self.reward_mode == 'supervised_mmse' or self.reward_mode == 'supervised_mmse_curriculum':
            self.X, _ = compute_mmse_beamformer(self.mat_h, k=self.K, n=self.N, noise_power=self.noise_power,
                                                device=self.device)
        else:
            self.X = torch.randn(self.K, self.num_x, dtype=torch.cfloat).to(self.device)

        '''get mat_hw'''
        mat_hw = torch.bmm(self.mat_h, self.mat_w.transpose(-1, -2))

        self.num_steps = 0
        self.done = False
        return self.mat_h, self.mat_w, self.P, mat_hw

    def step(self, action):
        if self.reward_mode == 'rl' or (not self.if_test):
            sum_rate, mat_hw = self.get_vec_sum_rate(self.mat_h, action)
            reward = sum_rate
            self.mat_w = action.detach()
        else:
            obj, mat_hw = self.get_vec_reward(self.mat_h, action, self.X)
            sum_rate, mat_hw = self.get_vec_sum_rate(self.mat_h, action)
            if self.reward_mode == "supervised_mmse_curriculum":
                reward = (-obj / obj.norm(keepdim=True) * (1 - self.epsilon) +
                          sum_rate / sum_rate.norm(keepdim=True) * self.epsilon)
            else:
                reward = -obj

        self.num_steps += 1
        self.done = self.num_steps >= self.episode_length
        return (self.mat_h, self.mat_w, self.P, mat_hw.detach()), reward, self.done, sum_rate.detach()

    def generate_channel_batch(self, n, k, batch_size, subspace_dim, basis_vectors):
        coordinates = torch.randn(batch_size, subspace_dim, 1, device=self.device)
        basis_vectors_batch = basis_vectors[:subspace_dim].T.repeat(batch_size, 1).reshape(-1, 2 * k * n, subspace_dim)
        vec_channel = torch.bmm(basis_vectors_batch, coordinates).reshape(-1, 2 * k * n)
        return vec_channel / math.sqrt(2)

    @staticmethod
    def get_sum_rate(h, w):
        mat_hw = torch.matmul(h, w.T)
        mat_s = torch.abs(mat_hw.diag()) ** 2
        mat_i = torch.sum(torch.abs(mat_hw) ** 2, dim=-1) - torch.abs(mat_hw.diag()) ** 2
        noise = 1  # / self.P_
        sinr = mat_s / (mat_i + noise)
        reward = torch.log2(1 + sinr).sum(dim=-1)
        return reward, mat_hw

    def get_reward_analytical(self, h, w):
        mat_hw = torch.matmul(h, w.T)
        mat_hw_t = mat_hw.T.conj()

        trace = torch.trace(torch.matmul(mat_hw, mat_hw_t) - mat_hw - mat_hw_t).real
        return (trace + self.K * (1 / self.snr)).mean() / self.K, mat_hw

    def get_reward_empirical(self, h, w, x):
        mat_hw = torch.matmul(h, w.T)
        mat_hwx = torch.matmul(mat_hw / mat_hw.norm(keepdim=True), self.X)
        mat_norm = (mat_hwx - x).norm(dim=1, keepdim=True) ** 2
        return mat_norm.mean().real / 1000, mat_hw  # todo why / 1000?

    @staticmethod
    def get_reward_supervised_mmse(h, w, w_mmse):
        mat_hw = torch.matmul(h, w.T)
        return (torch.abs(w - w_mmse) ** 2).mean(), mat_hw


def compute_mmse_beamformer(mat_h, k=4, n=4, noise_power=1, device=torch.device("cuda:0")):
    p = torch.diag_embed(torch.ones(mat_h.shape[0], 1, device=device).repeat(1, k)).to(torch.cfloat)
    eye_n = torch.diag_embed((torch.zeros(mat_h.shape[0], n, device=device) + noise_power))
    denominator = torch.inverse(eye_n + torch.bmm(mat_h.conj().transpose(1, 2), torch.bmm(p / k, mat_h)))
    wslnr_max = torch.bmm(denominator, mat_h.conj().transpose(1, 2)).transpose(1, 2)
    wslnr_max = wslnr_max / wslnr_max.norm(dim=2, keepdim=True)
    mat_w = torch.bmm(wslnr_max, torch.sqrt(p / k))
    mat_hw = torch.bmm(mat_h, mat_w.transpose(-1, -2))
    mat_s = torch.abs(torch.diagonal(mat_hw, dim1=-2, dim2=-1)) ** 2
    mat_i = torch.sum(torch.abs(mat_hw) ** 2, dim=-1) - torch.abs(torch.diagonal(mat_hw, dim1=-2, dim2=-1)) ** 2
    sinr = mat_s / (mat_i + noise_power)
    return mat_w, torch.log2(1 + sinr).sum(dim=-1).unsqueeze(-1)


'''train'''


def get_cwd(env_name):
    file_list = os.listdir()
    if env_name not in file_list:
        os.mkdir(env_name)
    file_list = os.listdir('./{}/'.format(env_name))
    max_exp_id = 0
    for exp_id in file_list:
        if int(exp_id) + 1 > max_exp_id:
            max_exp_id = int(exp_id) + 1
    os.mkdir('./{}/{}/'.format(env_name, max_exp_id))
    os.mkdir('./{}/{}/{}/'.format(env_name, max_exp_id, 'source_code'))
    os.mkdir('./{}/{}/{}/'.format(env_name, max_exp_id, 'logs'))
    return f"./{env_name}/{max_exp_id}"


def run(gpu_id: int = 0, n_user: int = 8, n_mode: int = 3):
    k_antenna = n_user
    reward_mode = REWARD_MODES[n_mode]

    snr = 10
    power = 10.0 ** (snr / 10)  # SNR in dB converted to linear power; passed to MIMOEnv as p
    mid_dim = 1024
    noise_power = 1
    learning_rate = 5e-5
    if_save = False
    if_wandb = False

    '''init save'''
    save_dir = get_cwd(f"MIMO_N{n_user}_K{k_antenna}_SNR{snr}")  # cwd (current work directory): folder to save network
    if if_save:
        import shutil
        [shutil.copy2(file, f"{save_dir}/source_code")
         for file in ["env_mimo.py", "net_mimo.py", "train.py"]]

    '''init wandb'''
    wandb_name = f"{reward_mode}_H_CL_REINFORCE_N{n_user}_K{k_antenna}_SNR{snr}"
    if if_wandb:
        config = {
            'method': 'REINFORCE',
            'objective': reward_mode,
            'SNR': snr,
            'mid_dim': mid_dim,
            'num_subspace_dim_update': 2,
            'path': save_dir,
            'num_env': 1024
        }
        wandb.init(
            project=f'REINFORCE_' + 'H' + f'_N{n_user}K{k_antenna}',
            entity="beamforming",
            sync_tensorboard=True,
            config=config,
            name=wandb_name,
            monitor_gym=True,
            save_code=True,
        )

    '''init train'''
    device = torch.device(f"cuda:{gpu_id}" if torch.cuda.is_available() else "cpu")
    net = NetMIMO(mid_dim=mid_dim, k=k_antenna, n=n_user).to(device)
    optimizer = torch.optim.AdamW(net.parameters(), lr=learning_rate)

    num_epochs = 1000000
    num_epochs_per_subspace = 1200
    num_epochs_to_save_model = 1e5
    num_env = 512
    epoch_end_switch = 20000

    env_mimo_relay = MIMOEnv(k=k_antenna, n=n_user, p=power, noise_power=noise_power, device=device, num_env=num_env,
                             reward_mode=reward_mode, episode_length=6)
    pbar = tqdm(range(num_epochs))
    sum_rate = torch.zeros(100, env_mimo_relay.episode_length, 2)
    sum_rate_train = torch.zeros(num_env, env_mimo_relay.episode_length, 1)
    test_powers = [10 ** 1, 10 ** 2]
    start_time = time.time()
    for epoch in pbar:
        state = env_mimo_relay.reset()
        loss = torch.zeros((), dtype=torch.float32, device=device).detach()
        while True:
            action = net(state)
            next_state, reward, done, _ = env_mimo_relay.step(action)
            loss -= reward.mean()  # the same update applies to every reward mode
            if (epoch + 1) % 5 == 0:
                sum_rate_train[:, env_mimo_relay.num_steps - 1, 0] = _.squeeze()
            state = next_state
            if done:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                break
        if epoch % 20 == 0 and env_mimo_relay.reward_mode == 'supervised_mmse_curriculum':
            env_mimo_relay.epsilon = min(1, (epoch / epoch_end_switch))
        if os.path.isfile(os.path.join(save_dir, "change_to_sr")):
            env_mimo_relay.reward_mode = "rl"
        if epoch == epoch_end_switch:
            env_mimo_relay.reward_mode = "rl"
        if (epoch + 1) % num_epochs_to_save_model == 0 and if_save:
            torch.save(net.state_dict(), os.path.join(save_dir, f"{epoch}.pth"))
        if (epoch + 1) % num_epochs_per_subspace == 0 and env_mimo_relay.subspace_dim <= 2 * k_antenna * n_user:
            env_mimo_relay.subspace_dim += int((2 * k_antenna * n_user) / 128)
        if (epoch + 1) % 5 == 0:
            with torch.no_grad():
                for i_p in range(2):
                    state = env_mimo_relay.reset(if_test=True, test_p=test_powers[i_p])
                    while True:
                        action = net(state)
                        next_state, _, done, reward = env_mimo_relay.step(action)
                        sum_rate[:, env_mimo_relay.num_steps - 1, i_p] = reward.squeeze()
                        state = next_state
                        if done:
                            break
                description = f"id: {epoch} | test_sum_rate_SNR=10: {sum_rate[:, :, 0].max(dim=1)[0].mean()} " \
                              f"| test_sum_rate_SNR=20:{sum_rate[:, :, 1].max(dim=1)[0].mean()}" \
                              f"| training_loss: {loss.mean().item() / env_mimo_relay.episode_length:.3f} " \
                              f"| gpu memory:{torch.cuda.memory_allocated():3d} " \
                              f"| elapsed_time:{time.time() - start_time}"  # todo
                pbar.set_description(description)
                if if_save:
                    log_path = open(os.path.join(save_dir, 'logs', 'log.txt'), "a+")
                    log_path.write(description + '\n')
                    log_path.close()
            if if_wandb:
                wandb.log({f"train_sum_rate_SNR=10": sum_rate_train[:, :, 0].max(dim=1)[0].mean(),
                           f"test_sum_rate_SNR=10": sum_rate[:, :, 0].max(dim=1)[0].mean(),
                           f"test_sum_rate_SNR=20": sum_rate[:, :, 1].max(dim=1)[0].mean(),
                           "training_loss": loss.mean().item() / env_mimo_relay.episode_length,
                           "elapsed_time": (time.time() - start_time)})


if __name__ == "__main__":
    Parser = ArgumentParser(description='ArgumentParser for ElegantRL')
    Parser.add_argument('--gpu', type=int, default=0, help='GPU device ID for training')
    Parser.add_argument('--user', type=int, default=8, help='the number of user, equal to the number of antenna')
    Parser.add_argument('--mode', type=int, default=3, help="3=RL+curriculum, 4=supervised_MMSE+curriculum")

    Args = Parser.parse_args()
    GPU_ID = Args.gpu
    N_USER = Args.user
    N_MODE = Args.mode

    REWARD_MODES = ['empirical', 'analytical', 'supervised_mmse', 'rl', 'supervised_mmse_curriculum']
    run(gpu_id=GPU_ID, n_user=N_USER, n_mode=N_MODE)
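
To make the objective in get_sum_rate easier to check by hand, the following is a dependency-free sketch of the same formula using plain Python complex numbers. It mirrors the script's conventions (h times the transpose of w without conjugation, noise fixed to 1); the function name `sum_rate` and the identity-matrix test case are made up for this illustration.

```python
import math


def sum_rate(h, w, noise=1.0):
    """Sum rate for K users, given channel h and beamformer w as lists of complex rows."""
    k = len(h)
    # hw[i][j] = sum_m h[i][m] * w[j][m], i.e. h @ w.T without conjugation.
    hw = [[sum(h[i][m] * w[j][m] for m in range(len(h[i]))) for j in range(k)]
          for i in range(k)]
    rate = 0.0
    for i in range(k):
        signal = abs(hw[i][i]) ** 2
        interference = sum(abs(hw[i][j]) ** 2 for j in range(k)) - signal
        rate += math.log2(1.0 + signal / (interference + noise))
    return rate


# Identity channel and beamformer: each user sees signal 1, interference 0, noise 1.
eye = [[1 + 0j, 0j], [0j, 1 + 0j]]
print(sum_rate(eye, eye))  # 2 users * log2(1 + 1/1) = 2.0
```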

📝 GraphMaxCut: the solution_x of graph set 14, 15, 22, 49, 55, 70

the solution_x of graph_set 14, 15, 22, 49, 50, 55, 70

Graph Score
g14 3029
g15 2995
g22 13167
g49 5712
g50 10012
g55 10017
g70 9358

SlnX14 denotes Solution X of Graph 14
"""base64 string""" # the score of this solution.

SlnX14 = """
2dbChJAXfdo2GRp49ecgPjQwfRSIJqcfANlGZMwwAOZzjFXMXFcLYmRi27fT49J38CH8NUHFf8nLFzuUQh_LltgK6ofnt0P2NEwwUdURMPtFC8ZhlzftdQJj
MQ1aEyvV6RqIv8
"""  # 3029

SlnX15 = """
AqOThDGdbjuzr7FAoXiCBgbwlc3lsy9bo2vb$KWBJ51OOIjkGcCBKwsQtu0zrlerpyWzpoAdOyeeoSMb3SoG$DSge$TNnfvsDwiivHk3JNFWTdXtiea2IEAg
$hxSFRWpKuqFo
"""  # 2995

SlnX22 = """
15M1wnHOEdRzp8Zws3XH0KzUDip$CCuC0dhgj9onOtyhWEbpH6hTzmSNozsdeNqM6jW7mwIC_nfrh1TYS8uz8St1$gz4rvd$8H5IfzhOOg3bXg7VS$4GtxyJ
$YothXUyFlOmUzaRwD31BeyYHKZPejOTcvXfcXsL$m$utCAABFN77HQCYHq5eNgTY9hy8oSivuLd09Crlz5AKI7u7ZNBKZPtOEM$sA00vFU9QmkX7e6JzPS3
32fItphr4PhDkGBFADHgVNBREHqPMO5BeIYPAah8chsT2ZesqvwFF2PE0zUbyNJY$2iYA1RCZWVxbPo1CqfgQaE5ddDWrF
"""  # 13167

SlnX49 = """
gggggrL9LLLLLggL9LLLLKgjQggggfLMhQggggjLAbLLLLQgbQLLLLLAhLgggggLLhMgggghLQhLLLLLgjQbLLLLIgLLAggggbALggggggggggrLLLLMfLLL
LLLLLLKgggggbQgggggggggfLLLLLgLLLLLLLLLLQgggggsfKggggggggrLLLLLggjLLLLLLLKggggggLLAgggggggfLLLLLMghLLLLLLLLQgggggbLIgggg
ggggrLLLLLggrLArLLLLLgggggfLIghLKggLghLLLLLQhLLQgjLIhLIgggggLIggLLQgrKgrLLLLLhLLMggLMgjKggggfQgggjLMgbLQjLLLLQLLLLIgjKgg
sgggggMggggfLQjLLLLLLLIbLLLLQgrAgggggggrgggggLKfLLLLLLRKfLLLLMgfQggggghMjQggggbLQLLLLLLQbALLLLLAgsgggggfLgMgggghLKbLLLLL
AfKbLLLLIgjgggggjLQj
"""  # 5712

SlnX50 = """
3$yNJNKF0O_oM3vugH093O1fuLz3oC8kX4Gmjj7TtPTrL4juGHoRVsD7OHNjVZj6BjbU5pIckCFn98vC4ZEUohXPoU6ENeshvFtzxZK$D_T0ZFRvAgjxxsfD
Z__hc$C8$kzZDo9JPfxDh7KCbCn5pTuGX9pNnC6XTn57C5FqEl26knwUxd6Y7GfF5wqJvcHRA3Z_yoPuBo8sw3K7m6PFtwo1czNu7drNj6te66N$qhRONsUE
qXjOu9pCTFa$by9AfSTIMjiOxkj2I1klq3gKeS7Ibv4vQn6kVx4uUU0Gt083Wt1Go_2E7pxBzrEFEqz3o3KPIwjxj78ZcGuaCfjzuMYFEYIafBBW9cGItE0S
ZwXrCvdjLc46j5xHCwbAyoEjeiq4gwxEbufV49dDyN981WqDzb5$QD9tYneyc8Ow7$5LM04fNrCceq2T1qUitZmQNwGWVYZVCObLLeHzjxZf16NeKePyA3tO
byOPWuKEpX2VuwCGl_kgGxUQHxu9Lj83uMjsw99YNrwHbXYu5vXTObMeLb$lwg0h4g34bg9hnw4KCSXhqwHNPwJqUqbPolanMUIY4gGHAgJesJZPDp2$Pi7I
5ihjuBfCjs9jZKZ$yvvR0zvS4ZihdPfJ3nSfiRUd$yn_BAIOsUPFA912ye1eSlAFEAzefeOG7LZlgexCIfy9bRmMpuGrqrqsb$8THT_Io2biNsT5FhcBgIKB
TnTiubOn$noF4o0Rq96SeKBu6mgc$B75zVYvegCYrLdSVfYEJlqn9okrC8MttD0YtdBWH27BWIYaQk2LUrV0jb4$$pqlHo9xoWlXusOw0qWtN$Hcty
"""  # 10012

SlnX55 = """
2zIr9xJvYT_ZNOrNX06oRyw2eSmGJ825qGQ5V5LClEsnGzndFFLMTwfyO4ra4xuFSLXGHJErt$SxrPwawntywfNP9nfX7TX4DKs8c9wUfkVvWRMtmPjbWG7o
RBppqloq2mwguPhW$QL$V64WcxyRR9SYldF7OEr4Qb2hsS2p$4Lp2cUUYyhsFSRJy97JM1OUxgSl26B2$ZS2z9_gOGH1uicbuQT6dyGRNWwY6NqAvbYX4u_T
aneOixd4t0DcXWtEwTR8UOcGc$5d0ZzahbTb7xIyh3HWYTG_mjutsfDS10ceGHrGLPny0eQJx009AZLpjHjASozZcDmun6lmQOV$zwYtdwTsXga3Q$es0DbJ
HGsRh5eLLKQa5iFLMc406y8bQdOrdJUPWXOSgZzyXPRcSJD2e1OED7VmB5C_54VCvden70073UJB0nHWyji4Zpdsnuaz6stg2Vv99AJSepe2FHcAa1bu7FOV
Ypm_v278KbF5Hhf_vXhnAlpXcbEYq0oFKnn7Oxg6TWZX5Hn_4_szADYWZ_uT6PqNBCBvXCgZ_$orpSOx8Sw4gh9IsnWy9aPagzaHoEncwIlgGkTpqvjESi55
bvNmRhLaI$4KAaUImWPZzRn6oZNbeOb7Al9qbak5c0Q9V7eTsbdnftfShP8lJAMRQkUnVALIgxdPCSGtvWoTscoq7mzosvsoQU1ZoPXMoACIogh4yc6hcvVz
84b1qrwCAsjBYqJVnn1kdSpRP7gsiYyxQK8yJkZvr80nSCoLodqsbytQm$32RgAC2_AMfFNwVqFOTIhGMjSqalW5LvgeVHS8z0N03o8amunh4wmsfU

"""  # 10017
SlnX70 = """
3AcGoeUhKdVU72ZiLyfxgUIhgNQQdEfI569nHq$V5IB8ZsZiZY63iDDSZ6VJsToTMwXTCurzbeCHfyYL_CfRGW1vej_thDlqh$Ee7R5bcXVLISB_MiL6dOyJ
KWhBoU1YXPKuqol2GvULXW105Mh7XJEnQZXOh_NUfbRU2dtM$BRWf17EBl3VyhzOO_yGxi5YRd$dJihU5rRvYLhghN4jyEQZVEp$sxv$5w7vLHfyUrx3u4jO
knlNNiBq9B5p3opgCGog8w38H9SNoUZDjc$XDAit$A8MpKF3ouZqtS0bs46RaY7ZetlJJtlnUC5YKSeafaApJD$mY54ouIdSmH3pl30dsn9_Z3c0XdJ_$60Y
Hm3lf1Z$cYXDSCd6UlcqWGwPkAX7412M_i77J3YopwgsgvlJeyfGPDn84UmP1U9420mp3GiCGj1UupE7ccZc4hTczRs41FGvmi3lcu5IJYmpBAYQYmX0Thqz
PqSQT06_eLTSamhfCwFya11kc4hh1aUbbcpiv5XQqdWf8SvDt6UOtN__mYniq5g5YywE05WBWzio6bRZUr0mfJX7p6Nn34AcQQ_zbSLVcsZLENUyM29RKfkw
E1b43g04$mq0BgeZwVM5HMtt3mI$Q15CMOEriM19KD3I$Wx48af6oJqyN69UpyYy1uSuP64L0Rom3DdVOrDAw7uSFOt0WNus_$ZvSQzVEFLdQb0d1yNC9Lu7
AZWerkE$JrhU1Hfa2W$WyGwu6bpT_a3ZKsBEhUH6GCA3P45DrYrd57K8z$lE88iz1Se9ICPg2sxN9ignqQyjvSlKWQWe_eXSigDOBcOQZC9DX_iulwo2aI5A
uMRrvdjhmjef8EMJZFhkLp2u5gv1Gx3ISl7A5rxd5NvQ372XXJTYi0ZYXr8mAVRO826_tIs27Akoar1Kjy2SjRlrK1zCSq2XroIYV82voeoGGZubHjp1A2Ov
eS_kpXyV5cWCNhVomaD$knBhbL7Y6lPf5UkLfiz48PquooOh7aL6lsa3d7c0cwoRksn6h8RGcT9vvKDc01aIP5q6GEgAR7Suey_iE2HMc2Fv_Zr4ZYrm3MTu
4YHS1DN9g6GJsDCvwsp9LqYayT2wQA$H54I4edj7LrFb7ZlJjbVF57StGESAggUnldR6vsGvUkyRMHHjp3l9f29oIzf6QKztQHTj59uEqFDoypgm4ZvMlFub
z1rHQ4WSytr3OnZta42Au5Y7xatFdbl74dtyMJlXwVM9nNypB6VR_6fAkf_8phP7AtjEXzo0_4tkN59LphlPaQq2pLRqsNxskI9z2Q5TdoFmeb1xWkICPS4o
i5AB4LozUwz80VpPo4LDtVGIhs6jYME31mO0k6FoVh2i8gLRmL0wvqsBncNiaRNDahR5z1xuuGSERZQW1Gh98J7PBsYeVGsejp316q7yTPWR7nBnPQinyv1d
ULnnlmDdG8xSq6g1slTFR16ngzfCOgMNRdjbmpo8O1a9jpIJ8nrCYxiv$cFNedJJ3eRktLMXQ8QvqG_b7InRUfpmyYEFeR8bCvZTg7wZ_RgRRtP2$D3jApeD
sVKJ0hVuqQRNI3bWuEZNVPC_BhuQjsTMrKSQSH6YZCp7d5uRBpJbHI$fQT7s8RCkW5zA6GcY0Qt3YJS5j8mH918L28Os9$rK9MYIVlAMd7_
"""  # 9358

check the solution_x

def check_sln_x():
    graph_name, sln_x = 'gset_14', SlnX14

    env = GraphMaxCutEnv(graph_name=graph_name)
    best_sln_x = env.str_to_bool(sln_x)
    best_score = env.get_scores(best_sln_x.unsqueeze(0)).squeeze(0)
    print(f"NumNodes {env.num_nodes}  NumEdges {env.num_edges}")
    print(f"score {best_score}  sln_x \n{env.bool_to_str(best_sln_x)}")

Convert the solution_x from a base64 string to a bool tensor:

graph_name, sln_x = 'gset_14', SlnX14
env = GraphMaxCutEnv(graph_name=graph_name)
best_sln_x = env.str_to_bool(sln_x)
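
The implementation of env.str_to_bool and env.bool_to_str is not shown in this issue, so the following is only a sketch of how a 0/1 assignment could be packed into a 64-symbol string. The alphabet (digits, letters, `_` and `$`, matching the characters that appear in the strings above), its ordering, and the fixed 6-bit grouping are all assumptions; this sketch will not necessarily decode the SlnX strings listed here.

```python
# Assumed symbol order; the real encoder may use a different one.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_$"


def bools_to_str(bits):
    """Pack booleans into 6-bit groups, one alphabet character per group."""
    padded = list(bits) + [False] * (-len(bits) % 6)  # pad to a multiple of 6
    chars = []
    for start in range(0, len(padded), 6):
        value = 0
        for bit in padded[start:start + 6]:
            value = (value << 1) | int(bit)
        chars.append(ALPHABET[value])
    return "".join(chars)


def str_to_bools(text, num_nodes):
    """Invert bools_to_str; whitespace is ignored and padding bits are dropped."""
    bits = []
    for char in "".join(text.split()):
        value = ALPHABET.index(char)
        bits.extend(bool((value >> shift) & 1) for shift in range(5, -1, -1))
    return bits[:num_nodes]


solution = [True, False, True, True, False, False, True, False]
encoded = bools_to_str(solution)
print(encoded, str_to_bools(encoded, num_nodes=len(solution)) == solution)
```

Six bits per character is what makes such strings compact: an 800-node solution fits in 134 characters under this scheme.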

import os
from typing import List, Union

import numpy as np
from torch import Tensor


def write_result(result: Union[Tensor, List, np.ndarray], filename: str = 'result/result.txt'):
    # Each output line is "<node id> <set id>", both 1-based.
    num_nodes = len(result)
    directory = os.path.dirname(filename)
    if directory:
        os.makedirs(directory, exist_ok=True)  # also handles nested folders
    with open(filename, 'w', encoding="UTF-8") as file:
        for node in range(num_nodes):
            file.write(f'{node + 1} {int(result[node] + 1)}\n')
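
A result file in this format can be read back with a few lines. The helper `read_result` below is hypothetical (it is not part of the repo) and assumes one `node label` pair per line, both 1-based, as written by write_result.

```python
import os
import tempfile


def read_result(filename):
    """Read lines of 'node_id label' (both 1-based) back into a 0/1 list."""
    labels = {}
    with open(filename, "r", encoding="UTF-8") as file:
        for line in file:
            if line.strip():
                node, label = line.split()
                labels[int(node) - 1] = int(label) - 1
    return [labels[i] for i in range(len(labels))]


# Round trip using the same line format as write_result.
solution = [0, 1, 1, 0]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "result.txt")
    with open(path, "w", encoding="UTF-8") as file:
        for node, label in enumerate(solution):
            file.write(f"{node + 1} {label + 1}\n")
    restored = read_result(path)
print(restored)  # [0, 1, 1, 0]
```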

Tensor Chain Contraction Demo: REINFORCE and Brute Force

We have developed a training demo that utilizes REINFORCE and a brute force baseline to find the best contraction order for a tensor chain. We welcome any suggestions or feedback on this demo and environment!

Update Jan 10, 2023, Extend the environment design from tensor train to tensor networks.

@spicywei Wei, @Yonv1943 Jiahao, and Shixun extended the formulation of the tensor train environment to tensor networks.
classical_simulation_01102023.pptx

Update Jan 09, 2023

  • #7 Thanks to Wei @spicywei and Shixun @shixun404, who developed a tensor train demo that finds the optimal contraction order for a tensor train with 4 tensors.

Update Jan 06, 2023

ElegantRL_Solver Website Development

This issue pertains to website development. Please don't hesitate to contact us if you have any feedback or suggestions. We look forward to hearing from you! 😃 😃

📝 update GraphMaxCutEnv to VecEnv

import torch as th
import numpy as np
from torch import Tensor


class GraphMaxCutEnv:
    def __init__(self, num_envs=8, device=th.device('cpu')):
        txt_path = "./graph_set_G14.txt"

        with open(txt_path, 'r') as file:
            lines = file.readlines()
            lines = [[int(i1) for i1 in i0.split()] for i0 in lines]

        num_nodes, num_edges = lines[0]
        edge_to_n0_n1_dist = [(i[0] - 1, i[1] - 1, i[2]) for i in lines[1:]]

        '''
        n0: index of node0
        n1: index of node1
        dt: distance between node0 and node1
        p0: the probability that node0 is in the set, (1-p0): the probability that node0 is in the other set
        p1: the probability that node1 is in the set, (1-p1): the probability that node1 is in the other set
        '''

        n0_to_n1s = [[] for _ in range(num_nodes)]  # map node0_id to its neighboring node1_id values
        n0_to_dts = [[] for _ in range(num_nodes)]  # map node0_id to the distances between node0 and each node1
        for n0, n1, dist in edge_to_n0_n1_dist:
            n0_to_n1s[n0].append(n1)
            n0_to_dts[n0].append(dist)
        n0_to_n1s = [th.tensor(node1s, dtype=th.long, device=device) for node1s in n0_to_n1s]
        n0_to_dts = [th.tensor(node1s, dtype=th.long, device=device) for node1s in n0_to_dts]  # dists == 1
        assert num_nodes == len(n0_to_n1s)
        assert num_nodes == len(n0_to_dts)
        assert num_edges == sum([len(n0_to_n1) for n0_to_n1 in n0_to_n1s])
        assert num_edges == sum([len(n0_to_dt) for n0_to_dt in n0_to_dts])

        self.num_envs = num_envs
        self.num_nodes = len(n0_to_n1s)
        self.num_edges = sum([len(n0_to_n1) for n0_to_n1 in n0_to_n1s])
        self.n0_to_n1s = n0_to_n1s
        self.device = device

        '''for high-performance computing, the empty items of n0_to_n1s are removed'''
        v2_ids = [i for i, n1 in enumerate(n0_to_n1s) if n1.shape[0] > 0]
        self.v2_ids = v2_ids
        self.v2_n0_to_n1s = [n0_to_n1s[idx] for idx in v2_ids]
        self.v2_num_nodes = len(v2_ids)

    def get_objective(self, p0s):
        assert p0s.shape == (self.num_envs, self.num_nodes)

        sum_dts = []
        for env_i in range(self.num_envs):
            p0 = p0s[env_i]
            n0_to_p1 = []
            for n1 in self.n0_to_n1s:
                p1 = p0[n1]
                n0_to_p1.append(p1)

            sum_dt = []
            for _p0, _p1 in zip(p0, n0_to_p1):
                # dt = _p0 * (1-_p1) + _p1 * (1-_p0)  # equivalent to the line below
                dt = _p0 + _p1 - 2 * _p0 * _p1
                sum_dt.append(dt.sum(dim=0))
            sum_dt = th.stack(sum_dt).sum(dim=-1)
            sum_dts.append(sum_dt)
        sum_dts = th.hstack(sum_dts)
        return sum_dts

    def get_objectives_v1(self, p0s):  # version 1
        device = p0s.device
        env_is = th.arange(self.num_envs, device=device)
        num_envs = self.num_envs
        num_nodes = self.num_nodes

        n0s_to_p1 = []
        for n1 in self.n0_to_n1s:
            num_n1 = n1.shape[0]
            if num_n1 == 0:  # for high-performance computing, the empty items of n0_to_n1s can be removed
                p1s = th.zeros((num_envs, 0), dtype=th.float32, device=device)
            else:
                env_js = env_is.repeat(num_n1, 1).T.reshape(num_envs * num_n1)
                n1s = n1.repeat(num_envs)
                p1s = p0s[env_js, n1s].reshape(num_envs, num_n1)
            n0s_to_p1.append(p1s)

        sum_dts = th.zeros((num_envs, num_nodes), dtype=th.float32, device=device)
        for node_i in range(num_nodes):
            _p0 = p0s[:, node_i].unsqueeze(1)
            _p1 = n0s_to_p1[node_i]

            dt = _p0 + _p1 - 2 * _p0 * _p1
            sum_dts[:, node_i] = dt.sum(dim=-1)
        return sum_dts.sum(dim=-1)

    def get_objectives(self, p0s):  # version 2
        device = p0s.device
        env_is = th.arange(self.num_envs, device=device)
        num_envs = self.num_envs
        # num_nodes = self.num_nodes
        v2_num_nodes = len(self.v2_ids)

        v2_p0s = p0s[:, self.v2_ids]

        n0s_to_p1 = []
        for n1 in self.v2_n0_to_n1s:
            num_n1 = n1.shape[0]
            env_js = env_is.repeat(num_n1, 1).T.reshape(num_envs * num_n1)
            n1s = n1.repeat(num_envs)
            p1s = p0s[env_js, n1s].reshape(num_envs, num_n1)
            n0s_to_p1.append(p1s)

        sum_dts = th.zeros((num_envs, v2_num_nodes), dtype=th.float32, device=device)
        for node_i in range(v2_num_nodes):
            _p0 = v2_p0s[:, node_i].unsqueeze(1)
            _p1 = n0s_to_p1[node_i]

            dt = _p0 + _p1 - 2 * _p0 * _p1
            sum_dts[:, node_i] = dt.sum(dim=-1)
        return sum_dts.sum(dim=-1)

    def get_rand_p0s(self):
        device = self.device
        return th.rand((self.num_envs, self.num_nodes), dtype=th.float32, device=device)


def check_env():
    th.manual_seed(0)
    env = GraphMaxCutEnv(num_envs=6)

    p0s = env.get_rand_p0s()
    print(env.get_objective(p0s))
    print(env.get_objectives_v1(p0s))
    print(env.get_objectives(p0s))


check_env()
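
The per-node Python loops above can in principle be replaced by a single gather over an edge list. The sketch below is an illustration under assumptions, not the repository's implementation: the function name `get_objectives_vectorized` and the `(num_edges, 2)` edge tensor are introduced here, and all edge weights are taken to be 1 (as the `dists == 1` comment above suggests).

```python
import torch as th


def get_objectives_vectorized(p0s, edges):
    """p0s: (num_envs, num_nodes) probabilities; edges: (num_edges, 2) long tensor of node pairs."""
    p_n0 = p0s[:, edges[:, 0]]  # (num_envs, num_edges), probability of the first endpoint
    p_n1 = p0s[:, edges[:, 1]]  # (num_envs, num_edges), probability of the second endpoint
    # probability that the two endpoints of an edge end up in different sets
    dt = p_n0 + p_n1 - 2 * p_n0 * p_n1
    return dt.sum(dim=1)  # expected cut value of each env


edges = th.tensor([[0, 1], [1, 2], [0, 2]])  # toy triangle graph
p0s = th.tensor([[1.0, 0.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [0.5, 0.5, 0.5]])
scores = get_objectives_vectorized(p0s, edges)
print(scores)
```

With hard 0/1 assignments the value is the exact cut size; with probabilities it is the expected cut size if the nodes are sampled independently.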

📝 update graph max cut

Enumerate all solutions exhaustively and verify them in batches with the GPU simulator

exhaustion_search()
exhaustion_search_result()

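Given an input graph, e.g. g14, we take its first num_limit nodes to obtain a smaller problem:

  • Theta denotes the solution of the problem
  • G14 means the graph g14 was selected
  • L30 means the first 30 nodes of this graph were taken
  • "0Hpfvw" is the base64 representation of the optimal solution found by exhaustive search
  • best_score 91 means the optimal solution scores 91
  • count 2*1 means there are two optimal solutions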
ThetaG14L10 = "2U"  # best_score 21  count 2*2
ThetaG14L12 = "09v"  # best_score 27  count 2*6
ThetaG14L14 = "0nd"  # best_score 34  count 2*2
ThetaG14L16 = "2US"  # best_score 40  count 2*9
ThetaG14L18 = "09vo"  # best_score 47  count 2*23
ThetaG14L20 = "1QQb"  # best_score 54  count 2*1
ThetaG14L22 = "1E_E"  # best_score 61  count 2*1
ThetaG14L24 = "09xeR"  # best_score 68  count 2*4
ThetaG14L26 = "17dBj"  # best_score 76  count 2*1
ThetaG14L28 = "4SwUU"  # best_score 83  count 2*4
ThetaG14L30 = "0Hpfvw"  # best_score 91  count 2*1

We stop at num_limit=30 because solving that case already takes 4 hours.
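
A minimal sketch of such an exhaustive search on a toy graph is given below. It is not the repository's `exhaustion_search()`; the function name `exhaustive_max_cut` and the 4-node ring are assumptions chosen for illustration. All 2**n assignments are scored in one batch, which is also why the cost doubles with every added node.

```python
import torch as th


def exhaustive_max_cut(num_nodes, edges):
    """Score every 0/1 assignment of a small graph in one batch and return the optima."""
    ids = th.arange(2 ** num_nodes)  # one integer per assignment
    shifts = th.arange(num_nodes)
    xs = ((ids.unsqueeze(1) // (2 ** shifts)) % 2).bool()  # (2**n, n) bit table
    cut = xs[:, edges[:, 0]] ^ xs[:, edges[:, 1]]  # True where an edge is cut
    scores = cut.sum(dim=1)
    best_score = scores.max()
    best_xs = xs[scores == best_score]  # every assignment reaching the best score
    return int(best_score), best_xs


edges = th.tensor([[0, 1], [1, 2], [2, 3], [3, 0]])  # toy ring with 4 nodes
best_score, best_xs = exhaustive_max_cut(num_nodes=4, edges=edges)
print(best_score, best_xs.shape[0])
```

On this ring the two optimal assignments are complements of each other, which corresponds to the `count 2*1` notation used above.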


From version 1 to version 4, progressively more advanced methods are used to iteratively update the solution of the problem

run_v1_update_theta_by_grad()
run_v2_update_theta_by_adam()
run_v3_update_theta_by_opti()
run_v4_update_theta_by_opti()

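From version 1 onward (only version 1 is listed so far), progressively more advanced methods are used to generate the solution of the problem by auto-regression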

run_v1_generate_theta_by_auto_regression()

unit tests

Check that the environment works correctly, and check the scores of the solutions found by the search.

check_env()
check_theta()

📝 The tricks in Learn to optimize in TNCO sycamore

Below, theta denotes the "solution" of the TNCO task. It is a tensor that represents the contraction order of a quantum circuit; computing theta.argsort() yields the ordered edge_id values, i.e. the order in which the edges are contracted.

  1. Maintain two ReplayBuffers: one stores the theta produced by the model in real time during iteration, and the other stores the theta with good scores. This keeps good theta from being evicted by the FIFO rule of the ReplayBuffer.

Compute keep_score
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L418-L419

Use keep_score to find the theta that should be saved to buffer0
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L372-L375

  2. Start the search near better solutions

historical_theta is a theta with a good score; we add noise to it and start iterating from its neighborhood
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L402-L408

We also start iterating from theta randomly selected from buffer0, which stores the better theta
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L410-L419

  3. After training the iterator, use the iterator for inference

As can be seen
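
The two-buffer idea from tricks 1 and 2 can be sketched roughly as follows. This is a simplified illustration, not the code in TNCO_H2O.py: the class `TwoBuffers`, its method names and the noise scale are hypothetical, and a lower score is assumed to be better (the score being a log10 multiplication count).

```python
import torch as th


class TwoBuffers:
    """A FIFO buffer for recent thetas plus a small buffer that keeps the best ones."""

    def __init__(self, fifo_size, keep_size):
        self.fifo_size = fifo_size
        self.keep_size = keep_size
        self.fifo = []  # recent (score, theta) pairs, evicted first-in-first-out
        self.keep = []  # best (score, theta) pairs seen so far, never evicted by age

    def update(self, theta, score):
        self.fifo.append((score, theta))
        if len(self.fifo) > self.fifo_size:
            self.fifo.pop(0)  # the oldest entry leaves, however good it was
        self.keep.append((score, theta))
        self.keep.sort(key=lambda item: item[0])  # assumption: lower score is better
        del self.keep[self.keep_size:]

    def sample_start(self, noise_std=0.1):
        """Pick a good theta at random and perturb it to start a new search nearby."""
        idx = int(th.randint(0, len(self.keep), (1,)))
        _, theta = self.keep[idx]
        return theta + noise_std * th.randn_like(theta)


th.manual_seed(0)
buffers = TwoBuffers(fifo_size=2, keep_size=2)
for score in (5.0, 1.0, 9.0, 3.0):
    buffers.update(theta=th.rand(6), score=score)

fifo_scores = [score for score, _ in buffers.fifo]
keep_scores = [score for score, _ in buffers.keep]
start_theta = buffers.sample_start()
print(fifo_scores, keep_scores, start_theta.argsort())
```

In this run the theta with score 1.0 has already left the FIFO buffer but is still available as a starting point through the second buffer.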

Generative Meta-Learning for Large-Scale Non-Convex Optimization (RL)

Hello!

You can find it here: https://github.com/kayuksel/generative-opt

Just change the following lines for combinatorial optimization.

I also implemented Fast CMA-ES and Tabu Search in PyTorch.

Here is Fast CMA-ES: https://github.com/kayuksel/torch-tsp-es/

Let me know if Tabu Search would also be helpful, I can share.

Please don't forget to contribute back, and cite when possible.

Sincerely,
Kamer

class Generator(nn.Module):
    def __init__(self, noise_dim = 0):
        super(Generator, self).__init__()
        def block(in_feat, out_feat):
            return [nn.Linear(in_feat, out_feat), nn.Tanh()]
        self.model = nn.Sequential(
            *block(noise_dim+args.cnndim, 512), *block(512, 1024), nn.Linear(1024, len(assets)))
        init_weights(self)
        self.extract = Extractor(args.cnndim)
    def forward(self, x):
        mu = self.model(self.extract(x))
        return torch.bernoulli(mu.sigmoid())

actor = Generator(args.noise).to(device)
opt = torch.optim.AdamW(filter(lambda p: p.requires_grad, actor.parameters()), lr=1e-3)

best_reward = None

for epoch in range(args.iter):
    torch.cuda.empty_cache()
    weights = actor(torch.randn((args.batch, args.noise)).to(device))
    weights = weights / weights.sum(dim=1).reshape(-1, 1)

    loss = calculate_reward(weights.clone(), valid_data[:-test_size], index[:-test_size], True)
    opt.zero_grad()
    loss.mean().backward()
    nn.utils.clip_grad_norm_(actor.parameters(), 1.0)
    opt.step()

    with torch.no_grad():
        weights = weights[loss.argmin()]  # assumption: 'rewards' is undefined in this snippet; the per-sample loss is used instead
        test_reward = calculate_reward(weights.unsqueeze(0), 
            valid_data[-test_size:], index[-test_size:])[0]

📝 convert nodes_id to edges_id and convert back

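The simulation environment needs one feature:

Convert a list that stores the contraction order from recording the pairs of nodes being contracted to recording the edges that correspond to those node pairs.

See the following in TNCO_env.py:

First create the simulation environment class and select the circuit to convert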

def unit_test_convert_node2s_to_edge_sorts():
    gpu_id = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    device = th.device(f'cuda:{gpu_id}' if th.cuda.is_available() and gpu_id >= 0 else 'cpu')

    nodes_list, ban_edges = NodesSycamoreN12M14, 0
    # nodes_list, ban_edges = NodesSycamoreN14M14, 0
    # nodes_list, ban_edges = NodesSycamoreN53M12, 0
    # nodes_list, ban_edges = get_nodes_list_of_tensor_train(len_list=8), 8
    # nodes_list, ban_edges = get_nodes_list_of_tensor_train(len_list=100), 100
    # nodes_list, ban_edges = get_nodes_list_of_tensor_train(len_list=2000), 2000
    # from TNCO_env import get_nodes_list_of_tensor_tree
    # nodes_list, ban_edges = get_nodes_list_of_tensor_tree(depth=3), 2 ** (3 - 1)

    env = TensorNetworkEnv(nodes_list=nodes_list, ban_edges=ban_edges, device=device)
    print(f"\nnum_nodes      {env.num_nodes:9}"
          f"\nnum_edges      {env.num_edges:9}"
          f"\nban_edges      {env.ban_edges:9}")

The code below demonstrates converting edge_ary to node2s and then back to edge_ary. It calls two functions:

  • edge_ary → edge_sort → node2s: node2s = env.convert_edge_sort_to_node2s(edge_sort=edge_ary.argsort(dim=0))
  • node2s → edge_sort: edge_sort = env.convert_node2s_to_edge_sort(node2s=node2s).to(device)
    num_envs = 6

    # th.save(edge_arys, 'temp.pth')
    # edge_arys = th.load('temp.pth', map_location=device)

    edge_arys = th.rand((num_envs, env.num_edges - env.ban_edges), device=device)
    edge_ary = edge_arys[0]
    print(edge_ary.argsort().shape)
    print(edge_ary.argsort())
    node2s = env.convert_edge_sort_to_node2s(edge_sort=edge_ary.argsort(dim=0))
    edge_sort = env.convert_node2s_to_edge_sort(node2s=node2s).to(device)
    print(edge_sort.shape)
    print(edge_sort)

    print(edge_sort - edge_ary.argsort())
    edge_sorts = edge_sort.unsqueeze(0)
    multiple_times = env.get_log10_multiple_times(edge_sorts=edge_sorts)
    print(f"multiple_times(log10) {multiple_times.numpy()}")

The output is (for this circuit: nodes_list, ban_edges = NodesSycamoreN12M14, 0):

num_nodes             51
num_edges             99
ban_edges              0
torch.Size([99])
tensor([30, 36, 49, 35, 55, 65,  0, 28, 61, 52, 45, 69, 10, 21, 83, 18, 56,  9,
        14, 70, 39, 19, 74, 43, 68, 75, 60, 81, 29, 47, 94, 24, 58, 77, 64, 15,
        13, 72, 87, 32, 71, 51, 85,  6, 44, 34, 96, 40, 38, 97, 46, 53, 82, 84,
        22, 90, 25, 23, 33, 92,  1, 62, 42, 91, 67, 93, 26, 98, 79, 12, 16, 27,
        78, 95,  8, 11, 80, 20,  4, 57, 73, 54,  2,  7, 66,  3,  5, 88, 37, 59,
        17, 48, 50, 41, 86, 89, 76, 63, 31])
torch.Size([99])
tensor([30, 36, 49, 35, 55, 65,  0, 28, 61, 52, 45, 69, 10, 21, 83, 18, 56,  9,
        14, 70, 39, 19, 74, 43, 68, 75, 60, 81, 29, 47, 94, 24, 58, 77, 64, 15,
        13, 72, 87, 32, 71, 51, 85,  6, 44, 34, 96, 40, 38, 97, 46, 53, 82, 84,
        22, 90, 25, 23, 33, 92,  1, 62, 42, 91, 67, 93, 26, 98, 79, 12, 16, 27,
        78, 95,  8, 11, 80, 20,  4, 57, 73, 54,  2,  7, 66,  3,  5, 88, 37, 59,
        17, 48, 50, 41, 86, 89, 76, 63, 31], dtype=torch.int32)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0])
multiple_times(log10) [12.06995569]
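
The round trip can be illustrated on a toy graph with the sketch below. It is a simplified stand-in, not the `TensorNetworkEnv` methods: the two helper functions are hypothetical, and they assume that every pair of nodes is joined by at most one edge and that node pairs refer to the original node ids.

```python
def edge_sort_to_node2s(edge_sort, edge_to_nodes):
    """Replace each edge id in the contraction order by the pair of nodes it joins."""
    return [edge_to_nodes[edge_id] for edge_id in edge_sort]


def node2s_to_edge_sort(node2s, edge_to_nodes):
    """Look up the edge id of each node pair, ignoring the order inside a pair."""
    nodes_to_edge = {frozenset(pair): edge_id for edge_id, pair in enumerate(edge_to_nodes)}
    return [nodes_to_edge[frozenset(pair)] for pair in node2s]


edge_to_nodes = [(0, 1), (1, 2), (2, 3), (3, 0)]  # toy ring with 4 nodes and 4 edges
edge_sort = [2, 0, 3, 1]  # contract edge 2 first, then 0, 3, 1

node2s = edge_sort_to_node2s(edge_sort, edge_to_nodes)
edge_sort_back = node2s_to_edge_sort(node2s, edge_to_nodes)
print(node2s)
print(edge_sort_back)
```

As in the output above, subtracting the recovered order from the original one should give all zeros.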

⚗️ pytorch.grad

Get the gradient of network parameters:

import torch


class Net(torch.nn.Module):
    def __init__(self, inp_dim=4, out_dim=2):
        super().__init__()
        self.net = torch.nn.Linear(inp_dim, out_dim)

    def forward(self, inp):
        return self.net(inp)


def run():
    batch_size = 3
    inp_dim = 4
    out_dim = 2

    net = Net(inp_dim, out_dim)

    inp = torch.ones((batch_size, inp_dim))
    out = net(inp)
    assert out.shape == (batch_size, out_dim)
    lab = torch.ones_like(out)
    obj = torch.abs(out - lab).mean(dim=1)
    assert obj.shape == (batch_size,)

    # optimizer.zero_grad()
    out.sum().backward()
    # optimizer.step()
    
    for param in net.parameters():
        print(param.shape, param.grad)
    """print
    torch.Size([2, 4])  tensor([[3., 3., 3., 3.], [3., 3., 3., 3.]])
    torch.Size([2])     tensor([3., 3.])
    """


if __name__ == '__main__':
    run()

✨ update 'Learn to optimize' to batch size mode

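We define the solution that the problem asks for as theta. For example, in the MISO problem, theta.shape==(2, 8, 8), where 2 stands for the real and imaginary parts of a complex number and (8, 8) are the number of users and the number of base-station antennas, respectively.

This is the previous scheme:

  • It uses an LSTM, which needs to keep two hidden states: the hidden state and the cell state
  • When solving for theta, the input of the LSTM model is a single solution theta, inp.shape=(2*8*8, 1), and the output is the gradient of that solution, grad out.shape=inp.shape

The different features of the solution theta share the same LSTM model parameters.

In the MISO problem this is appropriate, because in the matrix theta any user and any antenna, as well as the real and imaginary parts, are interchangeable. They are therefore flattened and placed on the batch size (parallel) dimension, so that the different features of theta share the same LSTM model parameters.

https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/a7cd35b66a99600386efe1b642dc9b1453ed10f7/rlsolver/rlsolver_learn2opt/tensor_train/L2O_H_term.py#L52-L68

To generalize 'Learn to optimize' to other problems, the feature dimension of theta has to be moved from the batch size dimension to inp_dim or out_dim. After this change training becomes slower, but the highest score reached after training does not change.

This makes the adjusted code applicable to the tensor contraction task (TNCO) @spicywei and the graph max cut task @shixun404.

The adjusted code is as follows:

  • It uses a GRU, which only needs to keep one hidden state
  • When solving for theta, the input of the GRU model is a batch of solutions theta, inp.shape=(batch_size, 2*8*8), and the output is the gradient of these solutions, grad out.shape=inp.shape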
class OptimizerOpti(nn.Module):
    def __init__(self, opt_dim, hid_dim):
        super().__init__()
        self.opt_dim = opt_dim
        self.hid_dim = hid_dim
        self.num_rec = 4

        self.activation = nn.Tanh()
        self.recurs1 = nn.GRUCell(opt_dim, hid_dim)
        self.recurs2 = nn.GRUCell(hid_dim, hid_dim)
        self.output = nn.Linear(hid_dim * 2, opt_dim)

    def forward(self, inp0, hid0):
        hid1 = self.activation(self.recurs1(inp0, hid0[0]))
        hid2 = self.activation(self.recurs2(hid1, hid0[1]))
        hid = th.cat((hid1, hid2), dim=1)
        return self.output(hid), (hid1, hid2)

Full code: https://github.com/AI4Finance-Foundation/ElegantRL_Solver/pull/119/files#diff-e03802a5a83ef6f88ad30c077b0f4cec4b4f6cc21f3cbac771087ea2824618ba
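
To show the shapes involved, here is a minimal sketch of how a module like this could be unrolled and meta-trained. It is an illustration under assumptions, not the code of the linked PR: `toy_objective` and `unroll` are hypothetical, the network output is applied as an additive step on theta, and the objective is a toy quadratic rather than the MISO objective.

```python
import torch as th
import torch.nn as nn


class OptimizerOpti(nn.Module):  # same layers as the class above
    def __init__(self, opt_dim, hid_dim):
        super().__init__()
        self.hid_dim = hid_dim
        self.activation = nn.Tanh()
        self.recurs1 = nn.GRUCell(opt_dim, hid_dim)
        self.recurs2 = nn.GRUCell(hid_dim, hid_dim)
        self.output = nn.Linear(hid_dim * 2, opt_dim)

    def forward(self, inp0, hid0):
        hid1 = self.activation(self.recurs1(inp0, hid0[0]))
        hid2 = self.activation(self.recurs2(hid1, hid0[1]))
        return self.output(th.cat((hid1, hid2), dim=1)), (hid1, hid2)


def toy_objective(theta):  # hypothetical stand-in for the real task objective
    return ((theta - 1.0) ** 2).sum(dim=1)  # shape (batch_size,)


def unroll(opti, theta, num_steps):
    """Apply the learned optimizer num_steps times and accumulate the objective."""
    batch_size = theta.shape[0]
    hid = (th.zeros(batch_size, opti.hid_dim), th.zeros(batch_size, opti.hid_dim))
    total = 0.0
    for _ in range(num_steps):
        update, hid = opti(theta, hid)  # (batch_size, opt_dim) in, same shape out
        theta = theta + update  # assumption: the output is used as an additive step
        total = total + toy_objective(theta).mean()
    return theta, total


th.manual_seed(0)
batch_size, opt_dim, hid_dim = 3, 2 * 8 * 8, 32
opti = OptimizerOpti(opt_dim, hid_dim)
meta_optimizer = th.optim.Adam(opti.parameters(), lr=1e-3)

theta0 = th.randn(batch_size, opt_dim)
theta, loss = unroll(opti, theta0, num_steps=4)
meta_optimizer.zero_grad()
loss.backward()  # gradients flow through all unrolled steps into the GRU weights
meta_optimizer.step()
print(theta.shape, loss.item())
```

Because each row of the batch is a complete flattened theta, the batch dimension stays free for independent solutions.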


Compare

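On the MISO problem, the old method (which builds in more human prior knowledge) is definitely faster than the new one (the time needed to reach the best result is about 1:3), but both reach the same highest score.

Below are the results of the old method: the LSTM that uses the batch size (parallel) dimension as the theta feature dimension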

start training
    MMSE     5.598    15.900    31.134

     L2O     6.078    11.540    12.724    TimeUsed         9
     L2O     6.491    18.491    35.106    TimeUsed       160
     L2O     6.108    18.211    35.091    TimeUsed       311
     L2O     6.459    18.308    33.260    TimeUsed       465
     L2O     6.454    18.421    34.214    TimeUsed       615
     L2O     6.459    18.347    33.878    TimeUsed       754
     L2O     6.430    18.030    34.527    TimeUsed       892
     L2O     6.440    18.366    34.233    TimeUsed      1033

Below are the results of the new method: the GRU that separates the batch size (parallel) dimension from the theta feature dimension

    MMSE     5.598    15.900    31.134

training start
     L2O     1.088     2.968     4.889    TimeUsed         7
     L2O     5.991    16.585    21.792    TimeUsed       313
     L2O     6.250    17.582    26.524    TimeUsed       613
     L2O     6.063    16.946    28.046    TimeUsed       927
     L2O     6.313    18.009    30.573    TimeUsed      1234
     L2O     6.189    17.932    32.204    TimeUsed      1551
     L2O     6.098    17.880    33.794    TimeUsed      1860
     L2O     6.299    17.909    34.490    TimeUsed      2165
     L2O     6.233    17.952    34.699    TimeUsed      2471
     L2O     6.285    18.033    34.079    TimeUsed      2785

📝 Graph: Learning Combinatorial Optimization Algorithms over Graphs

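For combinatorial optimization problems on graphs, such as graph max cut, the following paper proposes a scheme that encodes the graph structure:

Reproducing the code above directly is not easy because of version issues. I have the following suggestions:

  • Reproduce their PyTorch version
  • Install PyTorch version 0.8.5
  • Install the latest versions of rdkit and boost

If you run into problems while reproducing, feel free to raise them in this issue.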


Suggestion: reproduce their PyTorch version:

Dec. 22, 2017 update: pytorch version of structure2vec
For people who prefer python, here is the pytorch implementation of s2v:
https://github.com/Hanjun-Dai/pytorch_structure2vec

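Suggestion: install PyTorch version 0.8.5

The PyTorch versions in 2017 were 0.4 ~ 0.8:

  • With a PyTorch version that is too old, a new computer may not be able to find a matching cuda
  • A PyTorch version that is too new may not be compatible with the old code
  • So we suggest downloading PyTorch version 0.8.5 (from 2018) and letting conda install the corresponding cuda version automatically

PyTorch 0.x code often contains the lines below, which remain compatible up to 0.8.5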
https://github.com/Hanjun-Dai/pytorch_structure2vec/blob/bcf20c90f21e468f862f13e2f5809a52cd247d4e/graph_classification/main.py#L7C1-L8C4

from torch.autograd import Variable
from torch.nn.parameter import Parameter

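Suggestion: install the latest versions of rdkit and boost

For the C++ dependencies, I suggest installing the latest rdkit and boost (rdkit is still maintained). Only if compilation still fails with the latest versions should you downgrade them and switch the C++ compiler to the matching older version (which can be looked up online)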

The following versions have been tested. But newer versions should also be fine.
rdkit : [Q3 2017 Release](https://github.com/rdkit/rdkit/releases/tag/Release_2017_09_1, Release_2017_09_2)
boost : Boost 1.61.0, 1.65.1

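Run

Follow the Readme


I previously extracted just the PyTorch code and ran its inference; only PyTorch is needed for that. The rdkit and boost libraries are used to provide the "training labels". If you do not want a full reproduction and only want to understand the network structure, you can skip them.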
https://github.com/Hanjun-Dai/pytorch_structure2vec/tree/master/s2v_lib

✨ DataParallel and DistributedDataParallel for speed up training.

DataParallel: multi-threading on a single machine with multiple GPUs.

DistributedDataParallel: multi-processing on a single machine or multiple machines with multiple GPUs.

DataParallel is very easy to add to the code, but it brings a smaller speedup.

DistributedDataParallel is a little trickier to use because it needs to be started from the command line, but it gives a significant speedup with 4 high-memory GPUs on a single machine.
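
As a minimal sketch of the "easy" option, the helper below wraps a model in `nn.DataParallel` only when more than one GPU is visible, so the same script also runs on a CPU-only or single-GPU machine. The helper name `wrap_for_multi_gpu` and the toy `nn.Linear` model are assumptions for illustration.

```python
import torch as th
import torch.nn as nn


def wrap_for_multi_gpu(model):
    """Wrap the model in nn.DataParallel only when more than one GPU is visible."""
    if th.cuda.is_available() and th.cuda.device_count() > 1:
        # each forward pass splits the batch across the GPUs and gathers the outputs
        model = nn.DataParallel(model.cuda())
    return model


model = wrap_for_multi_gpu(nn.Linear(4, 2))
device = next(model.parameters()).device
out = model(th.ones((8, 4), device=device))
print(out.shape)
```

DistributedDataParallel needs more setup: one process per GPU (commonly launched with `torchrun --nproc_per_node=4 train.py`, where `train.py` is a placeholder script name), a call to `torch.distributed.init_process_group`, and wrapping the model in `torch.nn.parallel.DistributedDataParallel` inside each process.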

📝 update the result of sycamore-n53 m12, m14, m16, m20

result: log10(2) must be added to every result below; see the following discussion:
#102 (comment)

sycamore   Result1   Result2   NumSamples   UsedTime
n53 m12     15.478    16.449       185856      56930
n53 m14     16.610    17.748       173568      54152
n53 m16     22.014    32.511       153088      52947
n53 m18
n53 m20     21.782    22.585       148992      58058
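
The correction is plain arithmetic in log space. Assuming the reported values are log10 of a multiplication count (as the name `get_log10_multiple_times` used earlier suggests), adding log10(2) corresponds to doubling that count. A short sketch with two of the Result1 values from the table:

```python
import math

# Result1 values copied from the table above
raw_results = {"n53 m12": 15.478, "n53 m14": 16.610}

# adding log10(2) in log space is the same as multiplying the count by 2
adjusted = {name: score + math.log10(2) for name, score in raw_results.items()}

for name, score in adjusted.items():
    print(f"{name}  {score:.3f}")
```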

sycamore n53 m12

      228       22.789    2.196e+01    TimeUsed     54169
      232       24.986    2.228e+01    TimeUsed     55085
      236       25.889    2.051e+01    TimeUsed     56010
      240       20.804    1.967e+01    TimeUsed     56930
| buffer.save_or_load_history(): Save ./task_TNCO_00/replay_buffer_states.pth    torch.Size([185856, 414])
| buffer.save_or_load_history(): Save ./task_TNCO_00/replay_buffer_scores.pth    torch.Size([185856, 1])

min_score:    15.478
avg_score:    24.392 ±     4.663
max_score:    49.068
best_result:
tensor([235, 371, 215, 137, 246,  62, 320, 178, 147, 325,  31,  17, 274, 333,
        234, 389, 142,  49, 311, 351, 271, 218,  89, 121,  96, 401,  25, 230,
          8, 369, 350, 257, 318, 248, 229, 236,  52,  14, 292, 139, 383, 343,
        207, 195, 209,  47, 394, 355, 329, 149,  53, 130, 372, 398, 273, 339,
        278, 314, 298,  28, 206,  98, 384,  74, 217, 256,  65, 134, 354, 323,
        167, 166, 390, 151, 190, 182, 382, 367, 128,  18,  10, 408, 321, 119,
        344, 100, 199, 120, 181, 405, 179, 288, 411, 140, 330, 305, 264, 336,
        208, 356, 168,  60, 266, 348, 242,  59, 268, 397, 214, 243, 143, 263,
        270,  87, 388, 125,  29, 204,   5,  16, 191, 282, 118, 322,  81, 104,
        211, 228,  34, 296,  76, 152, 114, 392, 406,  24, 171, 244, 116, 306,
        359, 138, 362, 338, 290, 227,  12, 308, 172, 237, 379, 197, 254, 259,
        146, 176, 252, 366, 332,  22,  43, 216,  99,  79, 275, 196,  67, 258,
        342, 294, 373, 198,  56, 283, 108, 346,  64, 175,  88, 102,  27, 135,
        123, 357, 324,  19,   7,  55, 319, 162,   9, 349, 193,  41, 192, 155,
        412, 352,  72, 186,  69, 160, 386, 267, 150, 312, 205,  20, 309, 364,
        272, 164,  82,  23,  33, 358,  68, 107, 345, 285, 226,  95,  85, 145,
        284,  94,  38, 233, 180, 378,   2,  42, 303, 300, 387, 360, 327,  91,
         32, 109, 240, 260, 287, 115, 184,  86, 249,  21, 203,  75,  78, 341,
        317, 286,   1, 276, 253, 131, 251, 241, 328,  36, 315, 310, 110,  11,
        245,  37, 377, 381,  40, 307, 297,   4, 158, 289,  83, 188, 111, 293,
        337, 396, 361, 280,  50, 380, 368,  84,  51,  54, 340, 212, 370, 169,
        154, 326, 385, 255,  44, 201, 232, 103, 353, 409, 262, 156, 291, 101,
        250, 200, 247,  80, 105, 194, 113, 157, 402, 265, 159, 174,  57, 133,
        141, 185,  58, 238, 106, 334, 129, 136, 313, 127,   3, 365,  61, 277,
         73,  92, 391,  46,  71, 213, 231,  26, 399, 144, 210, 304, 375, 374,
        148, 269, 331,  30,  13, 410, 363, 165, 400, 316, 124, 222,  93, 153,
        117,  39,  70, 170,   6, 132,  77, 224,  66,  63,  48, 239, 413, 302,
        376, 219,  45, 126, 173, 221, 223, 183, 404, 177,  97, 279, 187, 407,
        189,  15, 281, 395, 393, 403, 301, 112,  35,  90, 295, 225, 299,   0,
        163, 261, 220, 122, 202, 161, 335, 347], device='cuda:0')

      180       18.700    1.616e+01    TimeUsed     48409
      184       19.871    1.838e+01    TimeUsed     49428
      188       21.397    1.973e+01    TimeUsed     50447
      192       24.096    1.993e+01    TimeUsed     51460
| buffer.save_or_load_history(): Save ./task_TNCO_04/replay_buffer_states.pth    torch.Size([165376, 414])
| buffer.save_or_load_history(): Save ./task_TNCO_04/replay_buffer_scores.pth    torch.Size([165376, 1])

min_score:    16.449
avg_score:    25.135 ±     5.301
max_score:    49.670
best_result:
tensor([396,  88, 166, 366, 408,  81, 167,  51, 243, 238, 148,  90,   5, 222,
        159, 305, 361, 198,  22, 295, 321, 128, 339, 310, 169, 219, 375, 224,
        409, 372,  38, 241, 247, 146, 338, 200,  36,  52,  54, 178, 226, 234,
        173, 192, 117, 260, 108, 278, 387, 147, 245, 399, 227, 275,   1, 329,
        411, 140, 341, 152,  55, 132,  17, 312, 168, 297,  92, 385, 345, 237,
        119, 120, 102, 101,  44,  75,  94,  89, 332, 307,  74,  85, 212, 181,
        118, 348,  23, 412, 413,  69, 196, 386, 235,  43, 383,  45, 210, 172,
        286, 223, 256, 144, 183, 322, 253,  80, 261, 407, 291, 113, 301, 343,
        353, 216, 115, 378, 134, 158,  86,  73, 404,  35,  66, 106, 251, 136,
        161, 137, 274, 246,   8, 162,  97,  32, 157,  18,  91, 303, 410, 349,
        290, 177,  53, 397, 111, 373, 127, 105,  28, 201,  84, 104, 125, 346,
        323,  49, 347, 271, 304, 355, 208, 309, 124,  37, 344, 306, 360, 389,
         31, 377, 392,  30,  39, 284, 163, 351,  10, 255, 395, 186, 382, 184,
         82, 126, 130, 330, 142, 264, 342, 296, 123, 250,  20, 257, 313,  46,
        268,  34, 289, 170, 308, 380, 150, 265, 371, 213,  93, 154, 333, 262,
         61, 151, 232,  87,  56, 285, 206, 267, 263, 340, 217, 114, 317, 107,
         63, 390, 121, 139,   6, 194, 225, 336, 242, 112, 320, 356, 248, 391,
        283, 379,  29,  83,  64, 281, 292, 135,  42, 276, 324,  33, 402, 103,
        204, 314, 156, 193, 364,  68, 244, 352, 110, 369, 187, 272, 214, 365,
        252, 393, 359, 211, 164, 368, 400, 195,  40, 116, 279, 205, 259, 337,
        319, 199,   7, 240,  77,  72, 370, 209, 145, 122, 207,  60, 405, 327,
        160,  21, 273,  62, 334, 406, 100,  14, 197,  26, 311, 403,  16, 354,
        129, 269,  71, 109, 315, 374, 203, 266, 220, 335, 175, 230, 302, 202,
         19, 376, 153, 174, 328, 388, 188,  96, 138,  98, 287,  25, 326, 190,
        191, 239, 288,  76,  50, 179, 299,  48,   3,  78,  11, 228, 280,  57,
        215, 282,   2, 398, 149,  15,  47, 182, 300,  79, 236,  12, 249, 233,
        131, 218, 362, 363,  58,  67, 357, 155,   0,  41,  24, 270, 293, 325,
        229, 133,   9,  13, 277,   4, 298, 254,  65, 171, 358, 350, 180, 185,
        316, 367, 221, 331, 143,  59, 258, 394, 165, 176, 401, 318, 189,  70,
         27, 381, 231, 294, 141, 384,  95,  99], device='cuda:4')

🐛 contracting in some orders on a circuit with a ring structure may give incorrect multiplication counts.

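Contracting circuits with ring structures, such as sycamore, tensor grid and tensor ring, triggers a bug that makes the multiplication count incorrect
(the tensor train and tensor tree we tested happen to have no ring structure)

The bug is only triggered when there is a ring structure and the tensor nodes are contracted in certain orders

The output below was obtained on a small sycamore circuit, NodesSycamoreN12M14; checking it line by line revealed the bug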

num_nodes             51
num_edges             99
ban_edges              0

Rough notes for now.

This is the print code

'''calculate the multiple and avoid repeat'''
contract_dims = node_dims_arys[node_i0] + node_dims_arys[node_i1]  # dimensions (and sources) of the tensors adjacent to the contracted node
contract_bool = node_bool_arys[node_i0] | node_bool_arys[node_i1]  # which original nodes the contracted node is composed of
# assert contract_dims.shape == (num_nodes, )
# assert contract_bool.shape == (num_nodes, )

print(';;;', i, node_i0, node_i1)
print(node_dims_arys[node_i0].numpy().astype(int))
print(node_dims_arys[node_i1].numpy().astype(int))
print(contract_dims.numpy().astype(int))
print(contract_bool.numpy().astype(int))

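This is the printed output. It shows that an impossible contraction was performed on nodes that had already been contracted, which produced extra multiplications.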

;;; 52 tensor(3) tensor(9)
[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 352   0   0   0   0  96   0   0   0 128   0   0   0   0 320   0 128   0   0   0  64   0  64  64   0   0   0   0 192   0  64  64   0]
[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 352   0   0   0   0  96   0   0   0 128   0   0   0   0 320   0 128   0   0   0  64   0  64  64   0   0   0   0 192   0  64  64   0]
[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 704   0   0   0   0 192   0   0   0 256   0   0   0   0 640   0 256   0   0   0 128   0 128 128   0   0   0   0 384   0 128 128   0]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 0 1 0 1 0 1 0 1 0 0 1 1 0 0 0 1 0 0 1]

🐛 found a bug in TNCO env, and an explanation of this env's data structure

张量收缩计算图解.pptx


#92

PR 92 has been submitted to fix this bug


https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/52b4dc3ac5b8461772751a7294f5c9c10fdba5a5/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_env.py#L269-L271

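The code above has a bug. The intent is to change what the pointers stored in the list point to, but the rightmost equals sign is an [assignment] operation that makes the pointer point to a new address, which is incorrect.

It should be changed to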

node_dims_tens = th.stack([self.node_dims_ten.clone() for _ in range(num_envs)])
node_bool_tens = th.stack([self.node_bool_ten.clone() for _ in range(num_envs)])
for i in range(run_edges):
    ...
    for j in range(num_envs):
        ...
        node_dims_arys = node_dims_tens[j]
        node_bool_arys = node_bool_tens[j]
        ...
        node_dims_arys[contract_bool] = contract_dims.repeat(1, 1)  # use the bool mask to refresh all contracted nodes with the same information
        node_bool_arys[contract_bool] = contract_bool.repeat(1, 1)  # use the bool mask to refresh all contracted nodes with the same information
        ...

Example:
Initialize arys = [torch.zeros(2) + i for i in range(5)] and print arys

[tensor([0., 0.]),
 tensor([1., 1.]),
 tensor([2., 2.]),
 tensor([3., 3.]),
 tensor([4., 4.])]

Redirect the pointers with arys[0] = arys[1] = arys[2] = torch.zeros(2) -1 and print arys

[tensor([-1., -1.]),
 tensor([-1., -1.]),
 tensor([-1., -1.]),
 tensor([3., 3.]),
 tensor([4., 4.])]

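Modify arys[0] with the assignment arys[0] = torch.zeros(2) + 0. Print arys again to check whether the pointers still point to the right place

[tensor([0., 0.]),       -----> arys[0] was changed from -1 to 0
 tensor([-1., -1.]),    -----> arys[1] did not change to 0 together with arys[0]; this is wrong
 tensor([-1., -1.]),    -----> arys[2] did not change to 0 together with arys[0]; this is wrong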
 tensor([3., 3.]),
 tensor([4., 4.])]

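An assignment must not be used here, because it changes the address the pointer points to; use arys[0][:] = torch.zeros(2) + 0 instead
Re-run the command that redirects the pointers: arys[0] = arys[1] = arys[2] = torch.zeros(2) -1
Re-run the modification of arys[0] as an in-place write: arys[0][:] = torch.zeros(2) + 0
This gives the expected result: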

[tensor([0., 0.]),
 tensor([0., 0.]),
 tensor([0., 0.]),
 tensor([3., 3.]),
 tensor([4., 4.])]
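
The steps of this example can be collected into one small script. It is a sketch of the demonstration above, with the helper name `make_arys` and the variable names `rebound` and `in_place` introduced here for illustration.

```python
import torch


def make_arys():
    arys = [torch.zeros(2) + i for i in range(5)]
    arys[0] = arys[1] = arys[2] = torch.zeros(2) - 1  # three list slots share one tensor
    return arys


# Rebinding: slot 0 now refers to a new tensor, slots 1 and 2 keep the old one.
rebound = make_arys()
rebound[0] = torch.zeros(2) + 0

# In-place write: the shared tensor itself is modified, so slots 0, 1 and 2 all change.
in_place = make_arys()
in_place[0][:] = torch.zeros(2) + 0

print(rebound)
print(in_place)
```

The identity checks `rebound[0] is rebound[1]` and `in_place[0] is in_place[1]` make the difference explicit: the first is False after rebinding, the second stays True after the in-place write.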
