mveres01 / pytorch-drl4vrp

Implementation of: Nazari, Mohammadreza, et al. "Deep Reinforcement Learning for Solving the Vehicle Routing Problem." arXiv preprint arXiv:1802.04240 (2018).

pytorch-drl4vrp

Currently, Traveling Salesman Problems and Vehicle Routing Problems are supported. See the tasks/ folder for details.

Requirements:

  • Python 3.6
  • pytorch=0.4.1
  • matplotlib

To Run

Run by calling python trainer.py

The task and its complexity can be changed through the "task" and "nodes" flags:

python trainer.py --task=vrp --nodes=10

To restore a checkpoint, you must specify the path to a folder that has "actor.pt" and "critic.pt" checkpoints. Sample weights can be found here

python trainer.py --task=vrp --nodes=10 --checkpoint=vrp10

Differences from paper:

  • Uses a GRU instead of LSTM for the decoder network
  • Critic takes the raw static and dynamic input states and predicts a reward
  • Uses demand scaling (MAX_DEMAND / MAX_VEHICLE_CAPACITY) and gives the VRP depot a negative value proportional to the missing capacity (unsure whether the paper does this)

TSP Sample Tours:

Left: TSP with 20 cities

Right: TSP with 50 cities

VRP Sample Tours:

Left: VRP with 10 cities + load 20

Right: VRP with 20 cities + load 30

TSP

The following masking scheme is used for the TSP:

  1. If a salesman has visited a city, it is not allowed to re-visit it.
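A minimal sketch of this rule in plain Python, for a single tour rather than a batch. The repository itself works on PyTorch tensors; the function name `update_tsp_mask` and the list-based mask are assumptions made for illustration. The mask holds 1 for cities that may still be chosen and 0 for cities already visited.

```python
def update_tsp_mask(mask, chosen_idx):
    """Return a copy of `mask` with the chosen city disabled (set to 0)."""
    new_mask = list(mask)
    new_mask[chosen_idx] = 0
    return new_mask


# Four cities, all available at the start of the tour.
mask = [1, 1, 1, 1]
mask = update_tsp_mask(mask, 2)  # visit city 2
mask = update_tsp_mask(mask, 0)  # visit city 0

# Cities that the salesman may still choose.
allowed = [i for i, m in enumerate(mask) if m == 1]
print(allowed)  # [1, 3]
```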

VRP

The VRP deals with dynamic elements (load 'L', demand 'D') that change every time the vehicle / salesman visits a city. Each city is generated at a random location with a random demand in the range [1, 9]. The salesman has an initial capacity that changes with the complexity of the problem (e.g. the number of nodes).

The following masking scheme is used for the VRP:

  1. If there is no demand remaining at any city, end the tour. Note that this means the vehicle must return to the depot to complete the tour
  2. The vehicle can visit any city, as long as it is able to fully satisfy the demand there (easy to modify for partial trips if needed)
  3. The vehicle may not visit the depot more than once in a row (to speed up training)
  4. A vehicle may only visit the depot twice or more in a row if it has completed its route and is waiting for other vehicles to finish (e.g. when training in a minibatch setting)
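The four rules above can be sketched for a single vehicle in plain Python. This is an illustration under stated assumptions, not the repository's batched PyTorch code: the function name `vrp_mask`, the list layout (index 0 is the depot), and the non-strict comparison `demand <= load` are all choices made here.

```python
def vrp_mask(demands, load, at_depot):
    """Return 0/1 flags per node; index 0 is the depot (hypothetical layout)."""
    cities = demands[1:]

    # Rules 1 and 4: with no demand left, only the depot remains selectable,
    # so a finished vehicle simply keeps "visiting" the depot.
    if all(d == 0 for d in cities):
        return [1] + [0] * len(cities)

    # Rule 2: a city is allowed only if its demand can be fully satisfied.
    city_mask = [1 if 0 < d <= load else 0 for d in cities]

    # Rule 3: no two depot visits in a row while work remains.
    depot_ok = 0 if at_depot else 1

    # Defensive fallback (an assumption of this sketch): never return an
    # all-zero mask, so the vehicle can always move somewhere.
    if not any(city_mask):
        depot_ok = 1
    return [depot_ok] + city_mask


print(vrp_mask([0, 3, 5, 0], load=4, at_depot=False))
print(vrp_mask([0, 3, 5, 0], load=4, at_depot=True))
print(vrp_mask([0, 0, 0, 0], load=2, at_depot=True))
```

In the first call the vehicle may return to the depot or serve the city with demand 3; the city with demand 5 is masked because the load of 4 cannot cover it.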

In this project the following dynamic updates are used:

  1. If a vehicle visits a city, its load changes according to: Load = Load - Demand_i, and the demand at the city changes according to: Demand_i = (Demand_i - Load)+, where (x)+ = max(x, 0)
  2. Returning to the depot refills the vehicle's load. The depot is given a "negative" demand whose magnitude grows in proportion to the amount of load missing from the vehicle
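A minimal single-vehicle sketch of these two updates, assuming loads and demands are normalized so that a full vehicle has load 1.0. The function name `update_dynamic`, the `capacity` parameter, and the clamp on the load are assumptions of this sketch; the repository performs the equivalent update on batched tensors.

```python
def update_dynamic(load, demands, chosen_idx, capacity=1.0):
    """Apply one step of the load/demand update; index 0 is the depot."""
    demands = list(demands)
    if chosen_idx == 0:
        # Update 2: returning to the depot refills the vehicle,
        # so no load is missing and the depot demand goes back to 0.
        load = capacity
        demands[0] = 0.0
    else:
        d = demands[chosen_idx]
        # Update 1: Load = Load - Demand_i (clamped defensively at 0) and
        # Demand_i = (Demand_i - Load)+, using the load before the visit.
        new_load = max(load - d, 0.0)
        demands[chosen_idx] = max(d - load, 0.0)
        load = new_load
        # The depot's "negative demand" equals minus the missing load.
        demands[0] = load - capacity
    return load, demands


load, demands = update_dynamic(1.0, [0.0, 0.25, 0.5], chosen_idx=2)
print(load, demands)
load, demands = update_dynamic(load, demands, chosen_idx=0)
print(load, demands)
```

After serving the city with demand 0.5 the vehicle carries 0.5 and the depot entry becomes -0.5; returning to the depot restores the load to 1.0 and resets the depot entry to 0.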

Results:

Tour Accuracy

This repo only implements the "Greedy" approach at test time, which selects the city with the highest probability at each step. Tour lengths for this project and for the corresponding paper are reported below. The gap in tour length can likely be reduced through hyperparameter search, which has not been conducted here.
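As a small illustration of what greedy selection means for a single decoding step, the sketch below picks the index with the highest probability. The sampling variant is included only for contrast and is not something this README describes for test time; both function names are hypothetical.

```python
import random


def greedy_choice(probs):
    """Pick the index of the most probable city (ties go to the lowest index)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def sampled_choice(probs, rng):
    """Draw an index at random, weighted by the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.1, 0.7, 0.2]
print(greedy_choice(probs))                     # 1
print(sampled_choice(probs, random.Random(0)))  # varies with the seed
```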

Task            Paper ("Greedy")   This
TSP20           3.97               4.032
TSP50           6.08               6.226
TSP100          8.44               -
VRP10 Cap 20    4.84               5.082
VRP20 Cap 30    6.59               6.904
VRP50 Cap 40    11.39              -
VRP100 Cap 50   17.23              -

Training Time

On a Tesla P-100 GPU, the following training times are observed. Results were obtained by taking the total time for the first 100 training iterations (with the respective batch sizes) and converting it into the appropriate time unit. Note that for the VRP in particular, the models are relatively untrained during this period, so these numbers may be slightly inaccurate and YMMV.

Task     Batch Size   Sec / 100 Updates   Min / Epoch   Hours / Epoch   Hours / 20 Epochs
TSP20    128          8.23                10.71         0.18            3.57
TSP20    256          11.90               7.75          0.13            2.58
TSP20    512          19.10               6.22          0.10            2.07
TSP50    128          21.64               28.17         0.47            9.39
TSP50    256          31.33               20.40         0.34            6.80
TSP50    512          51.70               16.83         0.28            5.61
TSP100   128          48.27               62.85         1.05            20.95
TSP100   256          73.51               47.85         0.80            15.95

Task     Batch Size   Sec / 100 Updates   Min / Epoch   Hours / Epoch   Hours / 20 Epochs
VRP10    128          12.15               15.82         0.26            5.27
VRP10    256          15.75               10.25         0.17            3.42
VRP10    512          23.30               7.58          0.13            2.53
VRP20    128          21.45               27.93         0.47            9.31
VRP20    256          28.29               18.42         0.31            6.14
VRP20    512          43.20               14.06         0.23            4.69
VRP50    128          53.59               69.77         1.16            23.26
VRP50    256          77.25               50.29         0.84            16.76
VRP50    512          127.73              41.58         0.69            13.86
VRP100   128          130.06              169.35        2.82            56.45
VRP100   64           95.03               247.48        4.12            82.49
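The conversion from seconds per 100 updates to time per epoch can be sketched as follows. The epoch size of 1,000,000 training samples is an assumption: it is not stated in this README but is inferred from the table, whose entries it reproduces to within rounding. The function name `epoch_times` is hypothetical.

```python
def epoch_times(sec_per_100_updates, batch_size,
                samples_per_epoch=1_000_000, epochs=20):
    """Return (minutes per epoch, hours per epoch, hours for `epochs` epochs)."""
    # Assumed: one epoch processes `samples_per_epoch` samples in batches.
    updates_per_epoch = samples_per_epoch / batch_size
    sec_per_epoch = sec_per_100_updates / 100 * updates_per_epoch
    minutes = sec_per_epoch / 60
    hours = minutes / 60
    return minutes, hours, hours * epochs


# TSP20 with batch size 128, taken from the first row of the table.
minutes, hours, total_hours = epoch_times(8.23, 128)
print(f"{minutes:.2f} min/epoch, {hours:.2f} h/epoch, {total_hours:.2f} h total")
```

For the TSP20 row with batch size 128 this gives roughly 10.7 minutes per epoch and about 3.6 hours for 20 epochs, in line with the table.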

Acknowledgements:

Thanks to https://github.com/pemami4911/neural-combinatorial-rl-pytorch for insight into a bug with the random number generator on the GPU

pytorch-drl4vrp's Issues

help me

all_demands[visit_idx, 0] = -1. + new_load[visit_idx].view(-1)

Hello, thanks for your solid work.
Why does the demand for the depot need to be reduced? Can you tell me the practical significance and function of this line of code?

application results

Hi,
Sorry to raise an issue again.

I ran this code with the checkpoint for vrp20 with load 30 and max demand 9; however, the best result is 827 (units) for case A-N33-K5, whose optimal result is 661. Most heuristic algorithms can get that result.

A-N33-K5 is a standard instance which has 33 nodes and a vehicle capacity of 100.
You can download the case from this website: http://www.bernabe.dorronsoro.es/vrp/ I scaled the coordinates down to the [0, 1] range to avoid large numbers.
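One way such a scaling could be done is sketched below. It is an illustration only, not the code used for the numbers in this post: the function names are hypothetical, and a single common factor is assumed for both axes so that tour lengths can be converted back to the instance's original units.

```python
import math


def scale_to_unit_square(coords):
    """Shift and scale (x, y) pairs into [0, 1] using one common factor."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y)
    if span == 0:
        raise ValueError("all points coincide; nothing to scale")
    scaled = [((x - min_x) / span, (y - min_y) / span) for x, y in coords]
    return scaled, span


def tour_length(points, order):
    """Length of the closed tour visiting `points` in the given order."""
    total = 0.0
    for a, b in zip(order, order[1:] + order[:1]):
        total += math.hypot(points[a][0] - points[b][0],
                            points[a][1] - points[b][1])
    return total


coords = [(0, 0), (30, 0), (30, 40)]
scaled, span = scale_to_unit_square(coords)
scaled_len = tour_length(scaled, [0, 1, 2])
# Multiplying by the common factor recovers the length in original units.
print(scaled_len, scaled_len * span)
```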

I tried a trained vrp37 weight with load 44; however, the result is even worse.

In the application part I used 1 for the train-size and valid-size. I also tried A-N60-K9, but the result is 1943, which is also far from the optimal result (1354). (I used the trained weight for VRP50.)

I wonder how I can improve the result. Change the weights? Retrain the model? Is it due to the batch size or learning rate? Or is the model not robust enough? (I got a pretty good result when using the vrp20 weight on a vrp37 real-case problem, so I used to think the model was quite robust.)

Thank you so much for your time and help.

Different way of computing context

Hi, many thanks for the excellent code. I got one little question (please feel free to correct me):
In the paper, the authors compute context vector with attention vector * embedded inputs (static and dynamic hidden)

image

But in your code, you applied context vector computation with static hidden only and then cat it with static hidden:

image

I'm a little confused about this. May I ask if there is any particular reason? Thank you!

split-delivery

Hi,
In the paper, the authors mention that if the whole-delivery constraint is relaxed, the result gets better.

However, if I omit the constraint (demands.lt(load)), the result is always worse, even if I retrain the model.

In the paper, even for greedy decoding, split delivery is better than whole delivery.

Do you know what may cause the difference here?

Thank you very much for your help!

Problem with the results plotting...

So I ran your code without making any modifications for vrp and 10 nodes (as well as 20)... and the graph looks as if there are only two nodes: the depot and one more node... what is wrong?

time to train the model and how to use it for new cases

Hello,

I tried to train the model, but it took a very long time to finish. I wonder what the final result of the training process is. Will it give me a trained model? How can I use this model on my own case, for example 100 cities with given demands and their longitude and latitude?

Thank you very much!

Does not work under torch 1.0.1 version

Hi,

When I try to run trainer.py with task = 'vrp', the error message below comes up. Is it a version issue? (device = 'cpu')

Traceback (most recent call last):
  File "D:/LG_RL/pytorch-drl4vrp-master/trainer.py", line 390, in <module>
    train_vrp(args)
  File "D:/LG_RL/pytorch-drl4vrp-master/trainer.py", line 348, in train_vrp
    train(actor, critic, **kwargs)
  File "D:/LG_RL/pytorch-drl4vrp-master/trainer.py", line 164, in train
    tour_indices, tour_logp = actor(static, dynamic, x0)
  File "C:\Users\hsko0\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\LG_RL\pytorch-drl4vrp-master\model.py", line 224, in forward
    dynamic = self.update_fn(dynamic, ptr.data)
  File "D:\LG_RL\pytorch-drl4vrp-master\tasks\vrp.py", line 135, in update_dynamic
    return torch.Tensor(tensor.data, device=dynamic.device)
TypeError: new(): data must be a sequence (got Tensor)

Process finished with exit code 1

Best,
Hyeseon

pytorch 1.0 error

Hello,

I have updated my pytorch to the latest 1.0 version but still use python 3.6.
When I run the code I get the following traceback. I wonder if I can run the code with the latest pytorch 1.0? Thank you so much for your time and help!

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\pytorch-drl4vrp-master\trainer.py", line 390, in <module>
    train_vrp(args)
  File "C:\Users\Administrator\Desktop\pytorch-drl4vrp-master\trainer.py", line 348, in train_vrp
    train(actor, critic, **kwargs)
  File "C:\Users\Administrator\Desktop\pytorch-drl4vrp-master\trainer.py", line 164, in train
    tour_indices, tour_logp = actor(static, dynamic, x0)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\pytorch-drl4vrp-master\model.py", line 192, in forward
    dynamic_hidden = self.dynamic_encoder(dynamic)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\pytorch-drl4vrp-master\model.py", line 17, in forward
    output = self.conv(input)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\conv.py", line 187, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected object of scalar type Double but got scalar type Float for argument #2 'weight'
