dotchen / worldonrails

(ICCV 2021, Oral) RL and distillation in CARLA using a factorized world model

Home Page: https://dotchen.github.io/world_on_rails/

License: MIT License

Python 90.08% Ruby 0.08% HTML 1.53% SCSS 0.01% JavaScript 0.09% Dockerfile 0.16% CSS 0.15% XSLT 7.83% Shell 0.08%
reinforcement-learning distillation autonomous-driving carla-simulator iccv2021

worldonrails's Introduction

World on Rails

teaser

Learning to drive from a world on rails
Dian Chen, Vladlen Koltun, Philipp Krähenbühl
arXiv technical report (arXiv 2105.00636)


This repo contains code for our paper Learning to drive from a world on rails.

ProcGen code coming soon.

Reference

If you find our repo or paper useful, please cite us as

@inproceedings{chen2021learning,
  title={Learning to drive from a world on rails},
  author={Chen, Dian and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={ICCV},
  year={2021}
}

Updates

  • We have released the pre-computed Q values in our dataset! Check DATASET.md for details.
  • Check out our website for demo videos!

Getting Started

  • To run CARLA and train the models, make sure you are using a machine with at least a mid-range GPU.
  • Please follow INSTALL.md to setup the environment.

Training

  • Please refer to RAILS.md on how to train our World-on-Rails agent.
  • Please refer to LBC.md on how to train the LBC agent.

Evaluation

If you are evaluating the pretrained weights, make sure you launch CARLA with -vulkan!

Leaderboard routes

python evaluate.py --agent-config=[PATH TO CONFIG]

NoCrash routes

python evaluate_nocrash.py --town={Town01,Town02} --weather={train,test} --agent-config=[PATH TO CONFIG] --resume
  • Use defaults for RAILS, and --agent=autoagents/lbc_agent for LBC.
  • To print a readable table, use
python -m scripts.view_nocrash_results [PATH TO CONFIG.YAML]

Pretrained weights

Dataset

We also release the data we used to train our leaderboard model. Check out DATASET.md for more details.

Acknowledgements

The leaderboard code is built from the original leaderboard repo. The scenario runner code is from the original scenario_runner repo. The waypointer.py GPS coordinate conversion code is built from Marin Toromanoff's leaderboard submission.

License

This repo is released under the MIT License (please refer to the LICENSE file for details). The leaderboard repo which our leaderboard folder builds upon is under the MIT License.

worldonrails's People

Contributors

dotchen


worldonrails's Issues

config files for Q-collector

I started collecting data for Phase 1 using RAILS. Everything works except that some configuration files are missing. According to the logs, they should be in the experiments/ folder. For now, I created the folder, added a config.yaml, and made changes so it works.

@dotchen: Are the config files for the Q-collector similar to config.yaml, or are there any significant changes? Thanks.

Question regarding the lane images

Hi,
May I know what the thickness of the lines in the lane images in the asset folder means? For example, in Town03, I cannot understand why some lines are thicker than others.

How to evaluate the dense traffic in nocrash?

Hello again!
Thanks so much again for helping me fix the error. I am glad to report that I have successfully trained the NoCrash model. I have collected about 150K frames (almost 186 GB) and tested on Town01 with train weather. Here is the report:
https://wandb.ai/sunhaoyi/carla_train_phase2/reports/Project-Dashboard--Vmlldzo3NjY3MzE?accessToken=gwm97gty3n5dvf24l82dk3qxz24ller7ndn5128kzjif4qyerppqlk9wnnwnp220 The results are as follows:

Town01,0,1,78,225,71.45,0,409.5
Town01,0,3,78,225,100.0,2,452.25
Town01,0,6,78,225,29.41,0,58.1
Town01,0,8,78,225,27.94,0,63.55
Town01,0,1,103,21,100.0,0,188.0
Town01,0,3,103,21,100.0,0,187.85
Town01,0,6,103,21,0.0,0,180.05
Town01,0,8,103,21,60.82,0,328.35
Town01,0,1,127,87,100.0,0,232.55
Town01,0,3,127,87,100.0,0,233.65
Town01,0,6,127,87,72.93,0,367.55
Town01,0,8,127,87,100.0,0,232.8
Town01,0,1,19,103,28.63,0,243.6
Town01,0,3,19,103,100.0,0,266.1
Town01,0,6,19,103,23.7,0,212.15
Town01,0,8,19,103,100.0,0,356.4
Town01,0,1,230,210,100.0,0,36.95
Town01,0,3,230,210,61.43,0,136.05
Town01,0,6,230,210,100.0,0,36.95
Town01,0,8,230,210,100.0,0,35.65
Town01,0,1,250,190,27.93,0,209.1
Town01,0,3,250,190,100.0,0,140.35
Town01,0,6,250,190,21.53,0,203.2
Town01,0,8,250,190,21.31,0,201.45
Town01,0,1,220,118,57.3,0,273.65
Town01,0,3,220,118,57.3,0,273.65
Town01,0,6,220,118,29.49,0,210.0
Town01,0,8,220,118,55.27,0,238.65
Town01,0,1,200,224,100.0,0,255.25
Town01,0,3,200,224,100.0,0,256.5
Town01,0,6,200,224,0.15,0,183.0
Town01,0,8,200,224,41.71,0,331.05
Town01,0,1,11,17,100.0,0,134.4
Town01,0,3,11,17,100.0,0,134.85
Town01,0,6,11,17,100.0,0,134.0
Town01,0,8,11,17,100.0,0,135.95
Town01,0,1,78,245,100.0,0,153.5
Town01,0,3,78,245,100.0,0,153.75
Town01,0,6,78,245,52.75,0,60.4
Town01,0,8,78,245,48.52,0,65.0
Town01,0,1,3,175,31.46,0,274.3
Town01,0,3,3,175,100.0,0,169.1
Town01,0,6,3,175,23.38,0,231.95
Town01,0,8,3,175,7.7,0,26.0
Town01,0,1,92,112,100.0,0,221.0
Town01,0,3,92,112,100.0,0,220.55
Town01,0,6,92,112,100.0,0,221.75
Town01,0,8,92,112,100.0,0,219.8
Town01,0,1,233,238,100.0,0,223.25
Town01,0,3,233,238,100.0,0,224.25
Town01,0,6,233,238,100.0,0,224.5
Town01,0,8,233,238,100.0,0,223.15
Town01,0,1,4,54,100.0,0,164.9
Town01,0,3,4,54,100.0,0,164.8
Town01,0,6,4,54,100.0,0,164.5
Town01,0,8,4,54,100.0,0,165.05

1. Any suggestions about collecting data?

It seems that the trained model is not as good as your pretrained model; I guess the data we collected is not enough. The only change I made was a batch size of 64. Besides, I have checked the semantic segmentation: perhaps because of the insufficient data, it is hard to segment pedestrians and traffic lights, and the average score on train town + train weather is about 70-80.

2. How do I change the NoCrash traffic parameters?

After checking the args, I cannot find the [empty, regular, dense] traffic settings, and the default seems to be empty. After checking the code, it looks like we need to change car_amounts and ped_amounts; is there any way to change them?
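
As an illustration only (not code from this repo): a minimal sketch of exposing the traffic level as a command-line flag, assuming car_amounts and ped_amounts are the values handed to the spawn logic. The numbers in the mapping are placeholders, not the benchmark's actual settings.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--traffic', choices=['empty', 'regular', 'dense'], default='empty')
args = parser.parse_args()

# Placeholder mapping from traffic level to (num vehicles, num pedestrians);
# the values used by the actual NoCrash benchmark are not confirmed here.
TRAFFIC_LEVELS = {
    'empty':   (0, 0),
    'regular': (20, 50),
    'dense':   (100, 200),
}
car_amounts, ped_amounts = TRAFFIC_LEVELS[args.traffic]
print(f'Spawning {car_amounts} vehicles and {ped_amounts} pedestrians')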

3. Why do we need to train the semantic segmentation?

I have read your code and found the loss in main_model, I guess the segmentation helps to train the model here?

loss = act_loss + weight * seg_loss

Due to my poor comprehension, I am a little confused about the policy distillation. It is very similar to the form of the loss in knowledge distillation, i.e. distillation loss + student loss. The input to both is the wide + narrow images; I think the output of the teacher net is the Q-table and the output of the student is the policy. Then the distillation loss is the KL of act_outputs and act_probs, and the student loss is the cross-entropy of wide_seg_outputs and the ground truth wide_sems.
What if we only use the act, such as loss = act_loss + weight * distill_act_loss? Besides, we only need the Q-table to choose an action, and although the segmentation sometimes looks bad, the Q-table still shows the right decision. (For example, when there is a person in front, the segmentation does not show red, but the Q-table shows brake=0.99.)
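
For reference, a minimal sketch of the combined loss described above, using hypothetical tensor names (act_logits for the student's action logits, act_probs for the teacher's Q-derived action distribution, seg_logits/seg_labels for the segmentation head); this is an illustration, not the repo's exact implementation.

import torch
import torch.nn.functional as F

def combined_loss(act_logits, act_probs, seg_logits, seg_labels, seg_weight=0.05):
    # Distillation term: KL divergence between the teacher's action distribution
    # and the student's predicted distribution.
    act_loss = F.kl_div(F.log_softmax(act_logits, dim=1), act_probs, reduction='batchmean')
    # Auxiliary term: per-pixel cross-entropy on the semantic segmentation head.
    seg_loss = F.cross_entropy(seg_logits, seg_labels)
    return act_loss + seg_weight * seg_loss

# The "only use the act" variant discussed above amounts to returning act_loss alone.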

That's all; I will read your paper and code again. Later I will collect data again for the leaderboard; from the last issue, it really needs 1M frames. Thank you so much!

Training error

First of all, thank you for publishing this work.
I would like to retrain only the Q table by manipulating the reward function. So I downloaded the 100 GB dataset and ran the data_phase2 module. However, because the dataset you provided doesn't have any *.mdb files, lmdb produces an error here: the read-only flag is true and it reports "no such file or directory". The path I set is the root of this 1M-record dataset.

        # Load dataset
        for full_path in glob.glob(f'{data_dir}/**'):
            txn = lmdb.open(
                full_path,
                max_readers=1, readonly=True,
                lock=False, readahead=False, meminit=False).begin(write=False)

Is there anything that I am doing wrong? Thanks.
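
One possible guard, sketched under the assumption that each converted run directory should contain an LMDB data.mdb file: skip directories that do not, so the read-only open does not fail. This is a hypothetical workaround, not the authors' fix.

import glob
import os
import lmdb

data_dir = '/path/to/rails_dataset'  # placeholder root directory

txns = []
for full_path in glob.glob(f'{data_dir}/**'):
    # Skip runs that were never converted to LMDB (no data.mdb inside).
    if not os.path.exists(os.path.join(full_path, 'data.mdb')):
        print(f'Skipping {full_path}: no LMDB database found')
        continue
    txn = lmdb.open(
        full_path,
        max_readers=1, readonly=True,
        lock=False, readahead=False, meminit=False).begin(write=False)
    txns.append(txn)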

Stage 2 training

Hi, I was wondering how much time it took to train the main model with 1M frames. Currently I am using 4 GPUs with a batch size of 128, and one epoch has taken 20 hours and is 35% complete.

Will increasing the batch size speed up the computation, or would it affect the training results as well?

python -m rails.data_phase0 --num-runners=[NUM RUNNERS] --port=[WORLD PORT] exits with no output data; it works when ray local_mode=True

I run CARLA and WorldOnRails on a server (Ubuntu 18.04, aioredis==1.3.0).

To launch CARLA I changed scripts/launch_carla.sh to:
DISPLAY= ${CARLA_ROOT}/CarlaUE4.sh -world-port=$port -opengl -quality-level=low -resx=400 -resy=300

Since there is no display, I disable the pygame screen by setting the SDL video driver in leaderboard_evaluator.py:
+++ b/leaderboard/leaderboard/leaderboard_evaluator.py
@@ -23,7 +23,7 @@ import pkg_resources
import sys
import carla
import signal

+os.environ['SDL_VIDEODRIVER'] = 'dummy'

When I run python -m rails.data_phase0 --num-runners=[NUM RUNNERS] --port=[WORLD PORT], the program runs and then exits with no output data and no error.

But when I change this line in rails/data_phase0.py:

ray.init(logging_level=40, local_mode=True, log_to_driver=False)

(local_mode=True instead of the default False), it runs fine.

What is happening?

lbc_agent


Hi,

I want to test the lbc_agent and changed the agent to "--agent=autoagents/lbc_agent". But when creating "config_lbc.yaml", I could not find the weights for the LBC agent. Are they provided? Or can you provide the weights for running the LBC agent?

Configure environment variables

Hi, and thank you for your amazing work!
I just wanted to run your code but got confused by the "Configure environment variables" part and don't know exactly what I should do. Can you explain it in more detail, please?
I'm waiting for your answer and appreciate it.
Thank you,
Zahra

Ground truth appears to be occasionally incorrect at intersections

Hi Dian,

It sometimes looks like the ground truth computed by the action labeler gives incorrect supervision at red lights. See the example below:

media_images_train_image_85_a9a91a81

I don't think I've changed any of the action labeling logic, but it looks like the ground truth assigns a very low probability to braking even though that would be the correct action in this situation. Have you seen anything like this/have any suggestions for fixing the issue? In general, it looks like red light infractions make up the vast majority of infractions, and they often lead to other infractions like vehicle collisions or route deviations.

infraction_pie

Small issues in running code

Hey Dian,

I'm working with your code right now and thought I'd quickly point out some small issues I encountered while working on it. These are mostly small things, and I may be forgetting others:

  1. during environment setup, I had to do the following to get the right environment: (1) create conda environment from environment.yaml file (as stated in INSTALL.md), (2) pip install -r requirements.txt for the requirements files in scenario_runner and leaderboard directories, (3) uninstall the CARLA 0.9.5 that gets installed as part of one of the requirements.txt files (pip uninstall carla), (4) easy_install the CARLA .egg file that comes with the CARLA 0.9.10.1 repo
  2. the carla server launch script doesn't exactly line up with the INSTALL.md specification (accepts $2 and $3 args but there's no $1 arg?). I rewrote the script to add a third argument that specifies how many GPUs you're running with, and sets CUDA_VISIBLE_DEVICES appropriately. So for 4 runners and 4 GPUS, it starts 1 CARLA server per GPU. Slightly modified script here
  3. when you run data_phase1, there's a map_utils.py bit of code that doesn't fly, which relies upon code included in CARLA's PythonAPI - (here is one of the offending lines). The method expects 3 args but 4 are provided. I kept the last arg and removed the third arg since I think the 3rd arg is supposed to represent the maximum lookahead distance or something.
  4. python -m rails.data_phase2 isn't able to import WORActionLabeler and running a grep from the repo root doesn't turn up any code that has that. I did find a RAILSActionLabeler though, which I'm assuming is what is supposed to be imported?

I'll update this post/thread if I find anything else!

Question about converter between world and camera frame

Hi,

I would like to know what the parameter offset in lbc.models.converter::Converter means physically.
Say I want to build an agent with a front RGB camera input (with H=180, W=320, crop_top=crop_bottom=10), and I set up the converter as follows:
self.converter = Converter(w=320, h=160, fov=50, offset=6.0, scale=[1.5,1.5]).to(self.device)
But I am not sure whether the offset parameter is set correctly.

about data collecting in data_phase0

Hi Chen, thanks for all the open-source work; I am trying to follow it.
I came here from the data collection in LAV, since you said there are similarities between the two. I found that the random collector returns a random control including throttle and steer. I am confused how the agent can reach the goal under random control. Do WOR and LAV use the same agent?
How does it choose random routes?

Data collection for LBC

Hi,
Is there a way to modify the autoagent collector to use the CARLA autopilot as the expert for collecting data, instead of using the computationally expensive q_collector agent?

Thank you very much.

Error in train_phase0 - config.balanced_cmd

Hi,

After the data collection process using data_phase0, I am trying to run train_phase0. However, I am getting the following error

Traceback (most recent call last):
  File "/home/himangi/anaconda3/envs/world_on_rails/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/himangi/anaconda3/envs/world_on_rails/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/himangi/carla/WorldOnRails/rails/train_phase0.py", line 53, in <module>
    main(args)
  File "/home/himangi/carla/WorldOnRails/rails/train_phase0.py", line 11, in main
    data = data_loader('ego', args)
  File "/home/himangi/carla/WorldOnRails/rails/datasets/__init__.py", line 17, in data_loader
    if config.balanced_cmd:
AttributeError: 'Namespace' object has no attribute 'balanced_cmd'

At line 11 of train_phase0, the data loader is called, and the datasets __init__.py has an if statement on config.balanced_cmd. I am unable to find this argument in the config.yaml file. Can you please let me know whether balanced_cmd has to be set to True or False?

Thanks,
Himangi
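
A possible stop-gap, sketched under the assumption that False (no command balancing) is an acceptable default when the key is missing from the config; the value the authors intended is not confirmed here.

from argparse import Namespace

def get_balanced_cmd(config):
    # Fall back to False when the loaded config has no balanced_cmd attribute.
    # Whether False matches the authors' intended default is an assumption.
    return getattr(config, 'balanced_cmd', False)

# Example: the Namespace from the traceback above has no balanced_cmd attribute.
config = Namespace()
print(get_balanced_cmd(config))  # -> False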

lmdb.error

Hi,

When I try to run train_phase0 for the RAILS algorithm, I am using the correct path after downloading the dataset from Box, but LMDB gives an error:
lmdb.Error: /rails_dataset/main_trajs6_converted2/lkshghbpjg: No such file or directory

What's the right way to stop data_phase runs when enough data has been generated?

Hey Dian,

It's me again - had a question about how to correctly stop the data_phase methods once enough data has been collected. The workflow I've been going with is: spin up CARLA servers with launch_carla.sh, and run python -m data_phase1 for example. I check the target directory that data is written to and call ray stop once enough has been generated. But, I'm not sure if this is okay.

I'm asking because I collected a data_phase1 dataset and ran data_phase2, but got the following:

(wor) [aaronhua@trinity-0-11 WorldOnRails]$ python -m rails.data_phase2 --num-workers=4
Traceback (most recent call last):
  File "/home/aaronhua/anaconda3/envs/wor/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/aaronhua/anaconda3/envs/wor/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/aaronhua/WorldOnRails/rails/data_phase2.py", line 59, in <module>
    main(args)
  File "/home/aaronhua/WorldOnRails/rails/data_phase2.py", line 13, in main
    total_frames = ray.get(dataset.num_frames.remote())
  File "/home/aaronhua/anaconda3/envs/wor/lib/python3.7/site-packages/ray/worker.py", line 1379, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::RemoteMainDataset.num_frames() (pid=13194, ip=10.1.1.11)
  File "python/ray/_raylet.pyx", line 422, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 456, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 415, in ray._raylet.execute_task.function_executor
  File "/home/aaronhua/WorldOnRails/rails/datasets/main_dataset.py", line 216, in __init__
    super().__init__(*args, **kwargs)
  File "/home/aaronhua/WorldOnRails/rails/datasets/main_dataset.py", line 123, in __init__
    n = int(txn.get('len'.encode()))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

I've double-checked that the main_data_dir specified in config.yaml and the data-dir argument in rails/data_phase2 point to the correct directory, and each of the runs within the data directory is at least a couple tens of megabytes. Strangely, I was previously just Ctrl-C'ing the processes, which seemed to work fine (I am running train_phase2 on a different set of data currently with no issue). I was under the impression that ray stop would be the "correct" way to stop data processes.

Question about LBC dataset

Hi, Dian! Thanks for releasing this wonderful work. I currently have some questions about the setup of the LBC dataset (for training the cheating agent).
In your original paper, the expert for the privileged agent is the autopilot policy, so I think the agent used to collect the dataset should be the autopilot agent.
But according to the README of LBC in this repo, the dataset follows RAILS' setup procedure and seems to be built by (1) collecting a dataset with random agents, (2) Q-learning, and (3) collecting a dataset with the Q-agent. This procedure is different from what is specified in the original paper.

I'm wondering if I've misunderstood something. If yes, please kindly correct me. If not, do these two types of dataset lead to a performance gap on the privileged agent?

Minor errors regarding the released dataset

Hello,
Sorry to disturb you again. After reading the FAQ carefully, I downloaded your released datasets and found some small mistakes. There is also a problem launching CARLA, which I think leads to the errors when training the ego model.

1. The dataset is missing keys and values

For example,

real_file='/home/shy/Desktop/aazijolwvf'
json_file=real_file+'/data.json'
data = json.load(open(json_file))
data['0']
{'loc': [655.82763671875, 30.739931106567383, 0.03367812931537628],
 'rot': [-91.5350341796875],
 'spd': [2.312199831008911],
 'cmd': [4.0],
 'lbl_00': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_00_00000.png',
 'lbl_01': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_01_00000.png',
 'lbl_02': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_02_00000.png',
 'lbl_03': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_03_00000.png',
 'lbl_04': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_04_00000.png',
 'lbl_05': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_05_00000.png',
 'lbl_06': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_06_00000.png',
 'lbl_07': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_07_00000.png',
 'lbl_08': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_08_00000.png',
 'lbl_09': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_09_00000.png',
 'lbl_10': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_10_00000.png',
 'lbl_11': '/ssd2/dian/challenge_data/main_trajs6_converted/aazijolwvf/rgbs/lbl_11_00000.png',
 'wide_rgb_{}': '/rgbs/narr_rgb_2_00000.jpg',
 'wide_sem_{}': '/rgbs/wide_sem_2_00000.png',
 'narr_sem_{}': '/rgbs/narr_sem_2_00000.png'}

It seems that the narr_rgb_{} key and value are missing, and the file names do not correspond to the values in this json file. Besides, the value of wide_rgb_{} also needs to be changed. The code I used to fix it is provided here:

import json
real_file='/home/shy/Desktop/aazijolwvf'
#json_file ='/home/shy/Desktop/aazijolwvf/data.json'
json_file=real_file+'/data.json'
json_save_file=real_file+'/data_new.json'
data = json.load(open(json_file))

for i in range(data['len']):
    idx = str(i)
    data[idx]['narr_rgb_{}']=data[idx]['wide_rgb_{}']
    wide_value = data[idx]['wide_rgb_{}'][:6]+'wide'+data[idx]['wide_rgb_{}'][10:]
    #print("wide value is", wide_value)
    data[idx]['wide_rgb_{}']=wide_value

    # data['0'].keys()
    key_change_lbl = [key for key in data[idx].keys()][4:16]
    key_change_rgb=[ [key for key in data[idx].keys()][16],[key for key in data[idx].keys()][19] ]
    key_change_sem=[ [key for key in data[idx].keys()][17],[key for key in data[idx].keys()][18] ]
    for key in key_change_lbl:
        #print("data",data['0'][key])
        real_str = data[idx][key][58:]
        data_new=real_file+real_str
        data[idx][key] = data_new
    for key in key_change_rgb:
    # remove_rgb
        real_str_first= data[idx][key][:10]
        real_str_last= data[idx][key][14:]
        data_new=real_file+real_str_first+real_str_last
        data[idx][key] = data_new

    for key in key_change_sem:
        real_str= data[idx][key]
        data_new=real_file+real_str
        data[idx][key] = data_new
    

with open(json_save_file, 'w') as outfile:
    json.dump(data, outfile)

And for testing, we pick a random index; the code follows:

import numpy as np
import matplotlib.pyplot as plt

idx = np.random.randint(0, 200)
print("The index is", idx)
idx = str(idx)
key_images = [key for key in data[idx].keys()][4:]
for key in key_images[12:]:
    print(data[idx][key])
    x = plt.imread(data[idx][key])
    # print("x.shape", x.shape)
    plt.figure()
    plt.imshow(x)

And finally the datasets can be loaded.

2. What do the "6+: lane centers" channels mean?

I found that channels 6-11 are lane centers. What is the difference between them, and how are they collected? The direction in image no. 11 seems different; I guess it may be the pedestrians' lane center?
Kazam_screenshot_00001

3. Is Vulkan necessary for CARLA?

I failed to launch CARLA with the provided launch shell script. According to the FAQ, this might be a Vulkan error; however, I can still launch CARLA using OpenGL, and I successfully tested the examples in PythonAPI. After launching CARLA, I decided to train the ego_model, and first of all I need to collect the dataset.

$ ./CarlaUE4.sh -fps 10 -world-port 2000

Then I tried

$ python -m rails.data_phase0 --num-runners=1 --port=2000
(world_on_rails) shy@carla:~/WorldOnRails0526$ python -m rails.data_phase0 --num-runners=1 --port=2000
pygame 2.0.1 (SDL 2.0.14, Python 3.7.9)
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/shy/WorldOnRails0526/rails/data_phase0.py", line 53, in <module>
    main(args)
  File "/home/shy/WorldOnRails0526/rails/data_phase0.py", line 16, in main
    runner = ScenarioRunner.remote(args, scenario, route, port=port, tm_port=tm_port)
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/site-packages/ray/actor.py", line 407, in remote
    return self._remote(args=args, kwargs=kwargs)
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/site-packages/ray/actor.py", line 658, in _remote
    kwargs)
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/site-packages/ray/signature.py", line 116, in flatten_args
    raise TypeError(str(exc)) from None
TypeError: missing a required argument: 'route'

I am not sure how to fix this problem; I just did the following (did I set the right scenario_class?):

runner = ScenarioRunner.remote(args,scenario_class='route_scenario', scenario=scenario, route=route, port=port, tm_port=tm_port)

After that, there is no error, but the code seems to just wait at ray.wait(jobs, num_returns=args.num_runners) and no data is saved. Then I turned to phase 1:

python -m rails.data_phase1 --scenario=train_scenario --num-runners=1 --port 2000
(world_on_rails) shy@carla:~/WorldOnRails0526$ python -m rails.data_phase1 --scenario=train_scenario --num-runners=1 --port 2000
pygame 2.0.1 (SDL 2.0.14, Python 3.7.9)
Hello from the pygame community. https://www.pygame.org/contribute.html
(pid=15754) pygame 2.0.1 (SDL 2.0.14, Python 3.7.9)
(pid=15754) Hello from the pygame community. https://www.pygame.org/contribute.html
(pid=raylet) Traceback (most recent call last):
(pid=raylet)   File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/site-packages/ray/new_dashboard/agent.py", line 317, in <module>
(pid=raylet)     raise e
(pid=raylet)   File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/site-packages/ray/new_dashboard/agent.py", line 306, in <module>
(pid=raylet)     loop.run_until_complete(agent.run())
(pid=raylet)   File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
(pid=raylet)     return future.result()
(pid=raylet)   File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/site-packages/ray/new_dashboard/agent.py", line 180, in run
(pid=raylet)     agent_ip_address=self.ip))
(pid=raylet)   File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/site-packages/grpc/aio/_call.py", line 286, in __await__
(pid=raylet)     self._cython_call._status)
(pid=raylet) grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
(pid=raylet) 	status = StatusCode.UNAVAILABLE
(pid=raylet) 	details = "DNS resolution failed for service: https:"
(pid=raylet) 	debug_error_string = "{"created":"@1622118924.664088343","description":"Resolver transient failure","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":2132,"referenced_errors":[{"created":"@1622118924.664086974","description":"DNS resolution failed for service: https:","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc","file_line":361,"grpc_status":14,"referenced_errors":[{"created":"@1622118924.664067929","description":"C-ares status is not ARES_SUCCESS qtype=A name=https is_balancer=0: Could not contact DNS servers","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc","file_line":716,"referenced_errors":[{"created":"@1622118924.663999265","description":"C-ares status is not ARES_SUCCESS qtype=AAAA name=https is_balancer=0: Could not contact DNS servers","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc","file_line":716}]}]}]}"
(pid=raylet) >
(pid=raylet) [the same DNS resolution traceback repeats several more times]
2021-05-27 20:35:37,323	ERROR worker.py:980 -- Possible unhandled error from worker: ray::ScenarioRunner.run() (pid=15754, ip=219.216.98.5)
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 415, in ray._raylet.execute_task.function_executor
  File "/home/shy/WorldOnRails0526/runners/scenario_runner.py", line 27, in run
    return self.runner.run(self.args)
  File "/home/shy/WorldOnRails0526/leaderboard/leaderboard/leaderboard_evaluator.py", line 417, in run
    self.statistics_manager.clear_record(args.checkpoint)
  File "/home/shy/WorldOnRails0526/leaderboard/leaderboard/utils/statistics_manager.py", line 355, in clear_record
    with open(endpoint, 'w') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'results/00_simulation_results.json'

The same error appears in phase 2. I guess these errors are caused by ray. I am now confused by these data collection errors, and I would appreciate it if you could help me fix them. Thank you again!

Requesting some more training details

Hey again, just wanted to ask for a bit more training detail. I'm trying to reproduce results (computing benchmark metrics on the Leaderboard validation routes), and the pretrained weights seem to do better than the weights I train on my own setup. I have two questions about how main_model_10.th was trained:

  1. How large was the main_dataset used to train that model?
  2. How long did it take to train the model? I'm assuming it was 10 epochs total, which would have taken multiple days for the 100 GB main dataset I was using.

Also, thank you for the reply on the other issue - I'll need to mull over your answers and read through the code some more to understand what's going on. I'll reply to that issue once I've thought about it some more!

Project permission

Hi. I wanted to run your code but I got an error saying I need your permission to run the project in Wandb.
Can you please give me permission?
Thank you

LeaderBoard model evaluation

Hi, I am using the dataset provided in the repository. I used the provided data to compute Q-values and then trained the policy network. I evaluated the model I trained on the leaderboard test routes and compared it to the model provided in the repo (https://utexas.app.box.com/s/8lcl7istkr23dtjqqiyu0v8is7ha5u2r). The evaluation results vary a lot between these models. Basically, I am comparing the model I trained for 10 epochs against the baseline.

Screenshot from 2021-09-27 07-39-51

Results on the left are from my model and on the right are from your model. Can you please let me know why there might be a difference in the results?

Question about the scenario file in this code


Hi Dian,

Thank you for your great work! I have one simple question regarding this file:
https://github.com/dotchen/WorldOnRails/blob/release/assets/all_towns_traffic_scenarios.json

This file seems to have many more scenarios than the CARLA official scenario files, including both the old one and the new one updated 4 months ago:
the old one (with 38894 lines): https://github.com/bradyz/leaderboard/blob/7104247204e0e591d43640d2c98490533bc5bbb8/data/all_towns_traffic_scenarios_public.json
the new one (with 14986 lines): https://github.com/carla-simulator/leaderboard/blob/master/data/all_towns_traffic_scenarios_public.json

So my question is: did you create this file with 102605 lines (https://github.com/dotchen/WorldOnRails/blob/release/assets/all_towns_traffic_scenarios.json) yourself? Or where did you get this scenario file?

Thanks in advance!
Xinshuo

LBC_ agent error

Hi Dian,
About the NoCrash routes: I tried to use LBC, but I get the following error:

Could not set up the required agent:

'LBCAgent' object has no attribute 'crop_top'

Traceback (most recent call last):
File "/home//carla1/WorldOnRails-release/leaderboard/leaderboard/nocrash_evaluator.py", line 239, in _load_and_run_scenario
self.agent_instance = getattr(self.module_agent, agent_class_name)(args.agent_config)
File "/home/
/carla1/WorldOnRails-release/leaderboard/leaderboard/autoagents/autonomous_agent.py", line 45, in init
self.setup(path_to_conf_file)
File "autoagents/lbc_agent.py", line 51, in setup
height=240-self.crop_top-self.crop_bottom, width=480,
AttributeError: 'LBCAgent' object has no attribute 'crop_top'

I really didn't find that attribute.
Is there anything I missed?

thank you in advance!

Increasing the reward for staying in lane

A lot of runs fail prematurely because the image agent goes the wrong way and causes a terminating route deviation infraction. This could be something like taking a wrong turn at an intersection, or taking an exit on the highway when the agent should have continued straight.

In the paper, "the agent receives a reward of +1 for staying in the target lane at the desired position, orientation and speed, and is smoothly penalized for deviating from the lane down to a value of 0". I would like to try increasing this reward component to have a maximal value greater than +1 (perhaps +3 or something).

How would you recommend I go about this?
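
For illustration, a sketch of a lane-keeping reward with a configurable maximum, matching the shape quoted from the paper (full reward in lane, smoothly decaying to 0). The linear decay and the deviation cutoff are assumptions; the sketch only shows where a maximum greater than +1 would enter.

def lane_reward(lateral_deviation, max_deviation=2.0, max_reward=3.0):
    # Full reward when centered in the target lane, decaying smoothly to 0
    # once the deviation reaches max_deviation (both values are placeholders).
    frac = max(0.0, 1.0 - abs(lateral_deviation) / max_deviation)
    return max_reward * frac

print(lane_reward(0.0))   # 3.0 when centered in the target lane
print(lane_reward(1.0))   # 1.5 halfway to the cutoff
print(lane_reward(2.5))   # 0.0 once fully deviated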

Change bird view to dashboard view

Hello!
I want to change the camera view I get when running evaluate_nocrash.py from the bird's-eye view to a dashboard camera.
Things I tried to do:
- I tried to modify the code to get the view from a dashboard sensor, but I couldn't pinpoint exactly what I needed to change and where.
- I thought about recording the session and then viewing it from the sensor I wanted, but the --record argument is only available in nocrash_evaluator.py, and I am having a hard time running that as well.
So can you guide me in the right direction?
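
For reference, a generic CARLA snippet (not WorldOnRails evaluator code) that attaches a front-facing RGB camera to an existing vehicle and saves frames to disk; it assumes a CARLA server on localhost:2000 and at least one spawned vehicle, and the mounting offsets are placeholders.

import carla

client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.get_world()

# Assume the ego vehicle is the first (or only) vehicle in the world.
vehicle = world.get_actors().filter('vehicle.*')[0]

blueprint = world.get_blueprint_library().find('sensor.camera.rgb')
blueprint.set_attribute('image_size_x', '800')
blueprint.set_attribute('image_size_y', '600')

# Mount roughly at the windshield: 1.5 m forward, 2.4 m up from the vehicle origin.
transform = carla.Transform(carla.Location(x=1.5, z=2.4))
camera = world.spawn_actor(blueprint, transform, attach_to=vehicle)
camera.listen(lambda image: image.save_to_disk(f'dashboard/{image.frame:06d}.png'))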

Issues with running the validation routes

Hey Dian,

I've noticed an issue when deploying agents on the validation routes. On some of them, non-ego agents will be stuck in front of the ego agent which causes it to halt until timeout. The following shows route completion/driving score of the driving agent on all validation routes, each run for 5 repetitions

overall_score_metrics_0

Some of them, like route_09 and route_20 reach the same score every single time. Taking a look at route_20's debug videos, you see stuff like this

image

This vehicle just sits there for ages without doing anything, and causes the ego agent to timeout even though the ego agent didn't do anything wrong. Have you seen anything like this in your own experiments?

compatibility with new versions

Hello,
I am very interested in your project. The new version of CARLA adds many features and fixes some bugs, and compatibility with the latest version would allow more extensions. Will it be possible to release the training code or model under CARLA 0.9.12+? Or does the code already work with CARLA 0.9.12+, or is it relatively easy to extend it for that?

Thanks, in advance!

VisualizationDataset Class in main_dataset.py

Hey Dian,
I successfully collected the data for phase 1 by installing torch 1.4.0, and now I am confused by some functions in main_dataset.py.

1. MainDataset class

full_path='/home/shy/Desktop/WorldOnRails/DataAndModel/PhaseData_12/'
config_path = '/home/shy/Desktop/WorldOnRails/config.yaml'
dataset = MainDataset(full_path, config_path)
wide_rgb, wide_sem, narr_rgb, lbls, locs, rots, spds, cmd = dataset[30]

Then I can load the data and it returns the right values. Do I need to delete the small datasets myself?

/home/shy/Desktop/WorldOnRails/DataAndModel/PhaseData_12/udmprkyurt  is too small. consider deleting it.
/home/shy/Desktop/WorldOnRails/DataAndModel/PhaseData_12/: 17768 frames (x3)

Besides, the collected dataset includes wide_rgb, wide_sem and narr_rgb, but there is no narr_sem here, and I found narr_sem is in the LabeledMainDataset class. So is narr_sem necessary for training?

2. VisualizationDataset and LabeledMainDataset classes

I failed to run these classes; they return "a bytes-like object is required, not 'NoneType'". Checking the repository, it seems these classes are not used anywhere; I guess they are meant to generate the RGB/semantic images.

Vdataset = VisualizationDataset(full_path, config_path)
idx=30
lmdb_txn = Vdataset.txn_map[idx]
index = Vdataset.idx_map[idx]

locs = Vdataset.__class__.access('loc', lmdb_txn, index, 6, dtype=np.float32)
rots = Vdataset.__class__.access('rot', lmdb_txn, index, 5, dtype=np.float32)
spds = Vdataset.__class__.access('spd', lmdb_txn, index, 5, dtype=np.float32).flatten()
lbls = Vdataset.__class__.access('lbl', lmdb_txn, index+1, 5, dtype=np.uint8).reshape(-1,96,96,12)
#maps = Vdataset.__class__.access('map', lmdb_txn, index+1, 5, dtype=np.uint8).reshape(-1,1536,1536,12)

#rgb = Vdataset.__class__.access('rgb',  lmdb_txn, index, 1, dtype=np.uint8).reshape(720,1280,3)
cmd = Vdataset.__class__.access('cmd', lmdb_txn, index, 1, dtype=np.float32).flatten()
#act = Vdataset.__class__.access('act', lmdb_txn, index, 1, dtype=np.float32).flatten()

After testing, I found that maps, rgb and act cannot be loaded successfully (maybe the class is not needed). Thank you again!

Issues about the map

Hello, I want to know more about the map.
First, does the channel 'stop' represent the location of traffic lights?
Second, what information do the 6 channels of 'waypoints' represent?
Thank you!
image

Issues about ego vehicle

Thank you for your work!
I want to change the physical parameters of the ego vehicle, so before running I want to use my custom-configured vehicle, for example by calling a function like vehicle.get_physics_control().
Where do I need to look?
Looking forward to your answer.
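
For reference, a generic CARLA snippet (not WorldOnRails-specific) showing the physics-control API; it assumes a running server and an already spawned ego vehicle, and the mass/drag values are placeholders.

import carla

client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.get_world()

# Assume the ego vehicle is the first (or only) vehicle in the world.
vehicle = world.get_actors().filter('vehicle.*')[0]

physics = vehicle.get_physics_control()
physics.mass = 1500.0           # kg, placeholder value
physics.drag_coefficient = 0.3  # placeholder value
vehicle.apply_physics_control(physics)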

Waypoint Rewards related issue

I am not able to find where the waypoint rewards are smoothly penalized based on distance, as written in the paper. I see that the orientations and speeds are interpolated while calculating the waypoint rewards, but I do not see any processing that penalizes based on distance down to a value of 0.

It will be very helpful if somebody can point me in the right direction.
Thanks

Some import errors


Hello,
Thanks for the code. I have downloaded the files and successfully evaluated the model; the final results show excellent performance. However, I met some problems when training the model from scratch.

1. Import errors

There are some import errors, such as from common.augmenter import augment failing with ModuleNotFoundError: No module named 'common'.
I have fixed some of them by adding the import path with sys, for example:

import ray
import math
import numpy as np
import yaml
import torch
import torch.nn.functional as F
from torch import nn, optim
from models import EgoModel, CameraModel
from bellman import BellmanUpdater
from datasets.main_dataset import MainDataset
import sys
sys.path.append("/home/shy/WorldOnRails-nomodel/utils")
import __init__ 

It would be more convenient if you could fix these problems.
2. How to use your released dataset

After downloading your dataset, I tried to train the models. However, it fails to load the data. I have checked FAQ.md, which says we need to write a custom data loader. I'm sorry, I'm not familiar with CARLA data loading; can you provide some ideas on how to use your released dataset? I would appreciate it if you could provide that code here.

3. What does act_val mean?
In rails.py line 55, I am confused by this act_val. Why does it have dimension 4? My understanding is that the action has three dimensions (steering, throttle and brake), which are discretized, and softmax then gives the maximum-probability action at this point.

Reproducing Results

Hi,

Thank you for providing the code. I was trying to reproduce the results on the NoCrash benchmark from the paper, and my reproduced results (last column) are shown below in comparison to the results given by your implementation (third column). I am using the pre-trained model for the NoCrash benchmark.

Town, Weather            Traffic   RAILS (paper)   Reproduced
Town 01, Train Weather   Empty     98              100
Town 01, Train Weather   Regular   100             95
Town 01, Train Weather   Dense     96              82
Town 01, Test Weather    Empty     90              80
Town 01, Test Weather    Regular   90              39
Town 01, Test Weather    Dense     84              34
Town 02, Train Weather   Empty     94              78
Town 02, Train Weather   Regular   89              63
Town 02, Train Weather   Dense     74              46
Town 02, Test Weather    Empty     78              36
Town 02, Test Weather    Regular   82              34
Town 02, Test Weather    Dense     66              24

Can you please let me know what the issue might be? I followed the installation instructions and I am using the evaluate_nocrash.py command from the NoCrash routes section.

Any help would be greatly appreciated.

Thanks,
Himangi

RAILS ego model training

Hi, I completed collecting data from phase_0. I was able to collect around 178 episodes with 6 workers. Next, I started training the ego model with the collected data and I am encountering the following issue.

image

@dotchen: Should I perform any pre-processing on the collected data that is in wandb? Thanks in advance!

Issues with running leaderboard routes

Hello! Thanks for your work, but I have some questions:

  1. How to control the number of cars and pedestrians?
  2. How to create my own route instead of the training or testing routes?

Missing citation

Dear authors,

great work! Please add a reference to WildDash in your paper next to images from the dataset to satisfy WildDash's CC-BY-NC license.

Greetings,
Oliver

Question about data collecting of LBC in this version

Hi,
I would like to know how to set the stop conditions for the LBC agent during data collection. By reading the code, I found that the 'q_collecter' and 'random_collecter' keep collecting and only save the data once a collision occurs. Now I want to try to train this version of the LBC agent. I think the termination condition for the LBC agent's data collection should be arriving at the destination, but I don't know how to implement it. Can you give me some suggestions or share this part of your code? Thank you, and I look forward to your reply.
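
Not from the repo, but a minimal sketch of one possible termination check, assuming the collector has access to the ego vehicle's location and the planned route as (transform, road_option) pairs; all names here are placeholders.

```python
def reached_destination(ego_location, route, threshold=5.0):
    """Hypothetical stop condition: terminate once the ego vehicle is within
    `threshold` meters of the final waypoint of the route."""
    goal = route[-1][0].location        # assumes (transform, road_option) tuples
    dx = ego_location.x - goal.x
    dy = ego_location.y - goal.y
    return (dx * dx + dy * dy) ** 0.5 < threshold

# Inside the collector loop (pseudocode):
#   if reached_destination(ego_vehicle.get_location(), self._route):
#       save_episode_and_stop()
```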

Continuously writing data to disk rather than doing it all in one go

I asked about the set_synchronous_mode timeout issue in the CARLA repository and it might be a lot of hassle to address.

I'd like to instead try changing how the data is written to disk during data_phase1. Currently, various arrays keep track of RGB images and other data, and lmdb writes them to disk at the end of the run or after a collision. Is it possible to use lmdb to continuously write data to disk during the run, rather than doing it all at the end?

I'm not very familiar with the library so any help is appreciated!
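
I have not tried this inside the repo's collector, but with plain lmdb the incremental pattern would look roughly like the sketch below: open the environment once with a generous map_size and commit a small write transaction per frame (or per chunk of frames) instead of buffering everything until the end. The key names, map size, and array shapes are assumptions.

```python
import lmdb
import numpy as np

env = lmdb.open('episode_000', map_size=50 * 1024**3)   # assumed 50 GB upper bound

def write_frame(idx, rgb, measurements):
    """Commit one frame immediately instead of buffering until the end of the run."""
    with env.begin(write=True) as txn:
        txn.put(f'rgb_{idx}'.encode(), rgb.tobytes())                   # assumed key layout
        txn.put(f'measurements_{idx}'.encode(), measurements.tobytes())

for idx in range(1000):
    rgb = np.zeros((240, 480, 3), dtype=np.uint8)    # stand-in for the camera frame
    meas = np.zeros(8, dtype=np.float32)             # stand-in for ego measurements
    write_frame(idx, rgb, meas)

with env.begin(write=True) as txn:
    txn.put(b'len', b'1000')    # write the final frame count once at the end
```

The trade-off is that many tiny commits are slower than one big one, so committing every N frames is probably a better middle ground.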

data_phase2 ray actor dies

Hey Dian,

Trying to run data_phase2, I get the following Ray error (it seems to have an issue with the RemoteMainDataset constructor?). I did some debugging by replacing all the @ray.remote decorators and .remote() calls with the non-ray versions, and the code runs with no issue (although the progress bar didn't move past 0 frames after a minute or two; I'm not sure whether it's supposed to take that long).

Did you ever see anything like this/know what I should do?

(wor) aaron@Aarons-Machine:~/workspace/carla/WorldOnRails$ RAY_PDB=1 python -m rails.data_phase2 --num-workers=12
Traceback (most recent call last):
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/runpy.py", line 193, in _run_module_as_main
2021-05-29 14:45:49,862 WARNING worker.py:1034 -- Traceback (most recent call last):
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/ray/function_manager.py", line 251, in get_execution_info
    info = self._function_execution_info[job_id][function_id]
KeyError: FunctionID(41f68a98bcf1c9ebc84e01b0819040089631493c)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "python/ray/_raylet.pyx", line 550, in ray._raylet.task_execution_handler
  File "python/ray/_raylet.pyx", line 364, in ray._raylet.execute_task
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/ray/function_manager.py", line 256, in get_execution_info
    raise KeyError(message)
KeyError: 'Error occurs in get_execution_info: job_id: JobID(01000000), function_descriptor: {type=PythonFunctionDescriptor, module_name=rails.datasets.main_dataset, class_name=RemoteMainDataset, function_name=__init__, function_hash=084f10af-7af1-46d7-8dda-ada171c2aad9}. Message: FunctionID(41f68a98bcf1c9ebc84e01b0819040089631493c)'
An unexpected internal error occurred while the worker was executing a task.
    "__main__", mod_spec)
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/runpy.py", line 85, in _run_code
2021-05-29 14:45:49,862 WARNING worker.py:1034 -- A worker died or was killed while executing task ffffffffffffffffcb230a5701000000.
    exec(code, run_globals)
  File "/home/aaron/workspace/carla/WorldOnRails/rails/data_phase2.py", line 67, in <module>
    main(args)
  File "/home/aaron/workspace/carla/WorldOnRails/rails/data_phase2.py", line 13, in main
    total_frames = ray.get(dataset.num_frames.remote())
  File "/home/aaron/anaconda3/envs/wor/lib/python3.7/site-packages/ray/worker.py", line 1381, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
(wor) aaron@Aarons-Machine:~/workspace/carla/WorldOnRails$

Traffic Manager times out when setting synchronous mode

Raising a new issue:

While running data_phase1 in parallel, I often get the following error, always when trying to turn traffic_manager's sync mode on or off.

File "/home/aaronhua/WorldOnRails/leaderboard/leaderboard/leaderboard_evaluator.py", line 162, in _cleanup
self.traffic_manager.set_synchronous_mode(False)
RuntimeError: rpc::timeout: Timeout of 2000ms while calling RPC function 'set_synchronous_mode'

(pid=55313) File "/home/aaronhua/WorldOnRails/leaderboard/leaderboard/leaderboard_evaluator.py", line 238, in _load_and_wait_for_world
(pid=55313) self.traffic_manager.set_synchronous_mode(True)
(pid=55313) RuntimeError: rpc::timeout: Timeout of 2000ms while calling RPC function 'set_synchronous_mode'

My guess is that when running in parallel, if one worker happens to be writing data to disk at the end of a run while another worker is trying to reset the simulator, the entire system slows down and triggers the 2-second traffic_manager timeout. I modified scripts/launch_carla to default to a server timeout of about 20 minutes, but I still get the 2 s timeout message above. I've looked around to see whether the traffic_manager-specific timeout can be specified, but there's no comprehensive list of CarlaUE4.sh command-line options.

The closest I can find is here, implying that the C++ implementation of the TrafficManager has a method that sets the set_synchronous_mode timeout. The Python bindings don't appear to expose this method, though. Do you have an idea of how I could resolve this issue?

EDIT: as for your comment in the old thread, I've tried looking for errors that precede this one but haven't been able to find any. The only thing is ERROR: failed to destroy actor 11571 : unable to destroy actor: not found and the like, which appears when cleaning up a completed run and doesn't appear to actually break anything. I thought it was a timeout issue because I'm usually able to progress through a few routes and repetitions and gather valid data before workers will start to fail because of the error.
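
For what it's worth, a workaround I would try (untested; it only raises the client-side RPC timeout and may well not change the traffic manager's internal 2000 ms limit) is to give the client a long timeout and retry the call a few times. The ports below are placeholders.

```python
import time
import carla

client = carla.Client('localhost', 2000)        # placeholder ports
client.set_timeout(120.0)                        # generous client-side RPC timeout
traffic_manager = client.get_trafficmanager(8000)

def set_tm_sync(enabled, retries=5, wait=2.0):
    """Retry set_synchronous_mode, since it can time out while another
    worker is busy writing its episode to disk."""
    for attempt in range(retries):
        try:
            traffic_manager.set_synchronous_mode(enabled)
            return
        except RuntimeError:
            time.sleep(wait * (attempt + 1))
    raise RuntimeError('traffic manager still timing out after retries')
```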

No --agent-config in evaluate.py

Hi. First of all, thank you for your amazing open-source implementation.
In the evaluation part, when I run the 'python evaluate.py --agent-config=[PATH TO CONFIG]' command, I get this error:
pygame 2.5.2 (SDL 2.28.2, Python 3.7.12)
Hello from the pygame community. https://www.pygame.org/contribute.html
usage: evaluate.py [-h] [--host HOST]
[--trafficManagerSeed TRAFFICMANAGERSEED]
[--timeout TIMEOUT] [--port PORT]
[--repetitions REPETITIONS] [--track TRACK]
[--resume RESUME] [--checkpoint CHECKPOINT]
evaluate.py: error: unrecognized arguments: --agent-config=/home/zahra/WorldOnRails/main_model_10.th

And --agent-config is not listed as an option for me.

I also have a question about using test mode and the pretrained weights.
How should we use them, and where should we specify them for evaluation?

I look forward to your response.
Thanks

Evaluating the original leaderboard submission


When I tried to use the pretrained model on the original leaderboard, the sensor configuration in image_agent.py does not match the WOR team's submission on the leaderboard benchmark.
[screenshot of the leaderboard sensor configuration not shown]

And for the image_agent posted in this repo:

  1. the collision and stitch_camera sensors are not allowed on the original leaderboard
  2. the width and height are fixed by the original leaderboard
        sensors = [
            {'type': 'sensor.collision', 'id': 'COLLISION'},
            {'type': 'sensor.speedometer', 'id': 'EGO'},
            {'type': 'sensor.other.gnss', 'x': 0., 'y': 0.0, 'z': self.camera_z, 'id': 'GPS'},
            {'type': 'sensor.stitch_camera.rgb', 'x': self.camera_x, 'y': 0, 'z': self.camera_z, 'roll': 0.0, 'pitch': 0.0, 'yaw': 0.0,
            'width': 160, 'height': 240, 'fov': 60, 'id': f'Wide_RGB'},
            {'type': 'sensor.camera.rgb', 'x': self.camera_x, 'y': 0, 'z': self.camera_z, 'roll': 0.0, 'pitch': 0.0, 'yaw': 0.0,
            'width': 384, 'height': 240, 'fov': 50, 'id': f'Narrow_RGB'},
        ]

I want to evaluate on the original leaderboard because I want to understand the failure cases of WOR and LBC, since analysis of the failure cases is lacking.

If possible, could you please provide the config/agent and pretrained model files submitted to the leaderboard (for example, in another branch for the original leaderboard), or upload the result files from the leaderboard submission? Either one would be enough to analyze the failure cases.
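
While waiting for the authors, here is a rough sketch of what a leaderboard-legal sensor list might look like: the collision sensor and the stitched camera are dropped, and the wide view is approximated with extra regular RGB cameras at different yaw angles. The resolutions, FOVs, and yaw values are my own guesses, not the settings from the actual leaderboard submission.

```python
def sensors(self):
    # Hypothetical leaderboard-legal configuration (no sensor.collision,
    # no sensor.stitch_camera); camera parameters are assumptions.
    return [
        {'type': 'sensor.speedometer', 'id': 'EGO'},
        {'type': 'sensor.other.gnss', 'x': 0.0, 'y': 0.0, 'z': self.camera_z, 'id': 'GPS'},
        {'type': 'sensor.camera.rgb', 'x': self.camera_x, 'y': 0, 'z': self.camera_z,
         'roll': 0.0, 'pitch': 0.0, 'yaw': -60.0,
         'width': 160, 'height': 240, 'fov': 60, 'id': 'Left_RGB'},
        {'type': 'sensor.camera.rgb', 'x': self.camera_x, 'y': 0, 'z': self.camera_z,
         'roll': 0.0, 'pitch': 0.0, 'yaw': 0.0,
         'width': 384, 'height': 240, 'fov': 50, 'id': 'Narrow_RGB'},
        {'type': 'sensor.camera.rgb', 'x': self.camera_x, 'y': 0, 'z': self.camera_z,
         'roll': 0.0, 'pitch': 0.0, 'yaw': 60.0,
         'width': 160, 'height': 240, 'fov': 60, 'id': 'Right_RGB'},
    ]
```

Any agent using such a configuration would also need to stitch or otherwise combine the side cameras itself, since the stitching pseudo-sensor is no longer doing that work.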

Error in data collection - data_phase0.py

Hi,

When I try to run data_phase0.py, I get an error - TypeError: missing a required argument: 'route' - at line 16, in the ScenarioRunner.remote call. One of the arguments seems to be missing from the remote call; maybe it is scenario_class. Can you please let me know how to add the scenario_class argument to the call? Is it similar to data_phase1.py?

Any help would be greatly appreciated.

Thanks,
Himangi

Is this correct in policy()?


    if self.all_speeds:
        act_output = self.act_head(embed).view(-1, self.num_cmds, self.num_speeds, self.num_steers + self.num_throts + 1)
        act_output = action_logits(act_output, self.num_steers, self.num_throts)

        # Action logits
        steer_logits = act_output[0, cmd, :, :self.num_steers]
        throt_logits = act_output[0, cmd, :, self.num_steers:self.num_steers + self.num_throts]
        brake_logits = act_output[0, cmd, :, -1]


    def action_logits(raw_logits, num_steers, num_throts):

        steer_logits = raw_logits[..., :num_steers]
        throt_logits = raw_logits[..., num_steers:num_steers + num_throts]
        brake_logits = raw_logits[..., -1:]

        steer_logits = steer_logits.repeat(1, 1, 1, num_throts)          # shape (1, 6, 4, 27)
        throt_logits = throt_logits.repeat_interleave(num_steers, -1)    # shape (1, 6, 4, 27)

        act_logits = torch.cat([steer_logits + throt_logits, brake_logits], dim=-1)

        return act_logits

Before action_logits(), act_output's shape is (1, 6, 4, 13); after action_logits(), it is (1, 6, 4, 28). So throt_logits = act_output[0, cmd, :, 9:9+3] does not look correct.
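
To make the layout question concrete, here is a small self-contained example of the repeat / repeat_interleave expansion with tiny sizes; it only shows how a joint steer-throttle index maps back to separate steer and throttle bins, and does not claim anything about what the repo intends.

```python
import torch

num_steers, num_throts = 3, 2                     # tiny sizes for illustration
steer = torch.arange(num_steers).float()          # [s0, s1, s2]
throt = 10 * torch.arange(num_throts).float()     # [t0, t1], scaled for readability

# Same expansion as in action_logits(), applied to 1-D tensors:
steer_rep = steer.repeat(num_throts)              # [s0, s1, s2, s0, s1, s2]
throt_rep = throt.repeat_interleave(num_steers)   # [t0, t0, t0, t1, t1, t1]
joint = steer_rep + throt_rep                     # one joint logit per (steer, throttle) pair

k = int(torch.argmax(joint))   # index into the expanded (num_steers * num_throts) vector
steer_idx = k % num_steers     # recover the steer bin
throt_idx = k // num_steers    # recover the throttle bin
print(joint, steer_idx, throt_idx)
```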

Extending agent for Town03

Hello,

This is about the NoCrash routes.
I am trying to run the agent (pre-trained RAILS) in Town03.
I modified part of the code so the program can load Town03. However, I found that when I set a new starting point and ending point (a straight line), the agent can't drive straight. It often drifts to the rightmost of the two lanes, even when the start and end points are in the left lane, and once it is in the rightmost lane it cannot overtake a stationary vehicle ahead.
Is there something wrong?
I would appreciate it if you could give me some advice.

Look forward to your reply.

Some theory questions

Hey Dian, this is a bit of a different issue. I'm trying to match up WOR as described in the paper with the code. Equation 2 in the paper shows how you compute the Q values at a given state by using the ego model to forward-simulate different actions, and using the reward structure to compute rewards for a particular imagined trajectory. Two questions here:

  1. I get how you can use the forward model and explicit reward structure to compute r(L^ego, L^world, a), but how is the cost-to-go term V_{t+1} computed?
  2. With a 5-step planning horizon and 28 discrete actions, I would think this becomes pretty expensive to fully tabularize (28^5 possible trajectories, I think) - is that what happens? Or do you assume that the candidate action is repeated over the 5 steps? (A rough sketch of one possibility follows below.)
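
Not an answer from the authors, just a sketch of the backward-induction reading of Equation 2 that avoids the 28^5 blow-up: if the cost-to-go V_{t+1} is tabulated on a discretized ego state (position, orientation, and speed bins), each step only costs |states| x |actions| evaluations, so no action sequences need to be enumerated. All sizes and names below are illustrative assumptions.

```python
import numpy as np

num_states, num_actions, horizon = 1000, 28, 5
# next_state[s, a]: assumed discretized ego-model transition; reward[s, a]: per-step reward
next_state = np.random.randint(num_states, size=(num_states, num_actions))
reward = np.random.rand(num_states, num_actions)
gamma = 1.0

V = np.zeros(num_states)                   # V_T = 0 at the end of the horizon
for t in reversed(range(horizon)):
    Q = reward + gamma * V[next_state]     # Q_t(s, a) = r(s, a) + V_{t+1}(f(s, a))
    V = Q.max(axis=1)                      # V_t(s) = max_a Q_t(s, a)
# cost: horizon * num_states * num_actions evaluations, not num_actions ** horizon
```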

Also please let me know if you'd rather I email these questions instead of making this a Github issue.

tqdm bar seems stuck at 0/50000 and the Q-labeling fails to continue

Hello,
After running all the previous steps as suggested, I have trained the ego model and successfully collected about 186 GB of NoCrash data; the next step is to label the Q values. So I ran python -m rails.data_phase2 --num-workers=4, and it shows the following:

|         | 0/53267 [00:00<?, ?it/s

I have also checked my GPUs, and they show the ray workers running:

+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 2080    Off  | 00000000:86:00.0 Off |                  N/A |
| 50%   71C    P2   153W / 215W |   2480MiB /  7952MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 2080    Off  | 00000000:AF:00.0 Off |                  N/A |
| 51%   72C    P2   152W / 215W |   2480MiB /  7952MiB |     98%      Default |

|    2     14845      C   ray::RAILSActionLabeler.run()               1229MiB |
|    2     14855      C   ray::RAILSActionLabeler.run()               1229MiB |
|    3     14898      C   ray::RAILSActionLabeler.run()               1229MiB |
|    3     14919      C   ray::RAILSActionLabeler.run()               1229MiB 

When I tried CTRL+C, it shows:

Traceback (most recent call last):
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/shy/Desktop/WorldOnRails/rails/data_phase2.py", line 58, in <module>
    main(args)
  File "/home/shy/Desktop/WorldOnRails/rails/data_phase2.py", line 24, in main
    current_frames = ray.get(logger.total_frames.remote())
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/site-packages/ray/worker.py", line 1372, in get
    object_refs, timeout=timeout)
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/site-packages/ray/worker.py", line 304, in get_objects
    object_refs, self.current_task_id, timeout_ms)
  File "python/ray/_raylet.pyx", line 869, in ray._raylet.CoreWorker.get_objects
  File "python/ray/_raylet.pyx", line 142, in ray._raylet.check_status
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/site-packages/ray/node.py", line 868, in _kill_process_type
    process.wait(timeout_seconds)
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/subprocess.py", line 1019, in wait
    return self._wait(timeout=timeout)
  File "/home/shy/anaconda3/envs/world_on_rails/lib/python3.7/subprocess.py", line 1647, in _wait
    time.sleep(delay)
KeyboardInterrupt
  0%|                              

It seems it is just waiting now? (But we do not need to launch CARLA in this phase.)
The data-dir is set to the collected data directory, and the config.yaml is set to the NoCrash config (default='/home/shy/Desktop/WorldOnRails/experiments/config_nocrash.yaml'); I just copied config.yaml into the experiments folder. Thanks a lot!
