
circuit_training's Introduction

Google Research

This repository contains code released by Google Research.

All datasets in this repository are released under the CC BY 4.0 International license, which can be found here: https://creativecommons.org/licenses/by/4.0/legalcode. All source files in this repository are released under the Apache 2.0 license, the text of which can be found in the LICENSE file.


Because the repo is large, we recommend you download only the subdirectory of interest:

SUBDIR=foo
svn export https://github.com/google-research/google-research/trunk/$SUBDIR

If you'd like to submit a pull request, you'll need to clone the repository; we recommend making a shallow clone (without history).

git clone git@github.com:google-research/google-research.git --depth=1

Disclaimer: This is not an official Google product.

Updated in 2023.

circuit_training's People

Contributors

annagoldie, esonghori, joewjiang, rchen152, roopaliv, sguada, summer-yue, tfboyd


circuit_training's Issues

What is the 'Routes per micron, Routes used by macros' in initial.plc?

Hello,

Thanks for sharing your project.

Your documentation is intuitive, but there are some things I cannot understand.

In the initial.plc file, you define the terms Routes per micron and Routes used by macros.

I think these terms describe a kind of routing capacity for each macro and standard cell.

I would appreciate it if you could explain them in detail.

Also, I want to extract these values from the LEF/DEF file format.

It would be very helpful if you could give me some advice on extracting them.

Thanks
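For readers with the same question: the repository does not document these fields, but one plausible reading is that Routes per micron is the available routing-track density of the technology, roughly the sum of 1/pitch over the routing layers in each preferred direction. A minimal sketch under that assumption, using a hypothetical flattened LEF excerpt (real LEF files spread each LAYER block over several lines):

```python
import re

# Hypothetical, flattened LEF excerpt; values are made up.
LEF = """
LAYER metal1 ; TYPE ROUTING ; DIRECTION HORIZONTAL ; PITCH 0.19 ; END metal1
LAYER metal2 ; TYPE ROUTING ; DIRECTION VERTICAL ; PITCH 0.19 ; END metal2
LAYER metal3 ; TYPE ROUTING ; DIRECTION HORIZONTAL ; PITCH 0.25 ; END metal3
"""

def routes_per_micron(lef_text):
    """Estimate routing-track density (tracks/micron) per direction by
    summing 1/pitch over ROUTING layers -- an assumption, not the
    documented definition."""
    hor = ver = 0.0
    for m in re.finditer(
            r"DIRECTION\s+(HORIZONTAL|VERTICAL)\s*;.*?PITCH\s+([\d.]+)",
            lef_text):
        if m.group(1) == "HORIZONTAL":
            hor += 1.0 / float(m.group(2))
        else:
            ver += 1.0 / float(m.group(2))
    return hor, ver

hor, ver = routes_per_micron(LEF)
print(round(hor, 2), round(ver, 2))  # 9.26 5.26
```

Routes used by macros would then be the analogous density on the layers left unblocked by macros; both numbers should be validated against your own flow before use.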

ppo_train is erroring out when dreamplace_core is imported

Hi,

I am trying to run CT with DREAMPlace and I am getting the following error in the training log.

W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at matmul_op_impl.h:622 : INTERNAL: cublas error

Here is a screenshot of the error (attached to the issue).

So I tried running CT using FD and was still getting the same error. Then I observed that if I do not import dreamplace_core, I do not have any such problem.

Can you please help me figure out why I am getting this error when I import dreamplace_core, and how to fix it?

Thanks,
Sayak

How do you generate .plc file using LEF/DEF translator?

I have been attempting to use the LEF/DEF translator for circuit training, but with the given example the script only generates the .pb.txt file upon completion. From my understanding, circuit training requires both the .plc and the .pb.txt file unless otherwise stated. I would really appreciate some assistance/guidance with this.
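For anyone blocked on the same step: there is no official spec for the .plc format, but the published initial.plc examples suggest a '#' comment header (canvas and grid dimensions) followed by one row per placed node: index, x, y, orientation, fixed flag. A minimal writer under that inference (layout and field names are guesses from the examples, not authoritative):

```python
# Hypothetical .plc writer; the format is inferred from published
# initial.plc examples, not from an official spec.
def write_plc(path, cols, rows, width, height, placements):
    lines = [
        "# Placement file for Circuit Training",
        f"# Columns : {cols}  Rows : {rows}",
        f"# Width : {width}  Height : {height}",
        "# node_index x y orientation fixed",
    ]
    # placements: {node_index: (x, y, orientation, is_fixed)}
    for idx, (x, y, orient, fixed) in sorted(placements.items()):
        lines.append(f"{idx} {x} {y} {orient} {int(fixed)}")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_plc("initial.plc", cols=27, rows=27, width=356.592, height=356.64,
          placements={0: (10.0, 20.0, "N", True),
                      5: (150.0, 80.0, "N", False)})
```

Alternatively, loading the generated .pb.txt with the plc client and saving the resulting placement may produce a valid .plc directly; either way, diff the output against a known-good example such as Ariane's initial.plc.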

netlist.pb and plc files' format

We are trying to write a Tcl script to do the transform from LEF/DEF to pb.netlist, so there are a few questions we want to confirm first.

  1. Are pb.netlist and its generated plc file all the input needed?
  2. The node rst_ni seems to be defined as a port and connected to several clusters (Grp_1204/Grp_1203...), but why is clk_i not? For example:

name: "clk_i"
attr {
  key: "side"
  ...
node {
  name: "rst_ni"
  input: "Grp_1204/Pinput"
  input: "Grp_1203/Pinput"
  input: "Grp_791/Pinput"
Docker smoke test test_save_file_train_step fails: 'PlaceDB' object has no attribute 'flatten_nested_map'

Here is the log out of the smoke test:

root@882a26d2ae64:/workspace# python3.9 -m circuit_training.environment.environment_test
2023-04-12 02:40:09.994638: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-04-12 02:40:10.246438: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-04-12 02:40:11.468534: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-04-12 02:40:11.475281: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-12 02:40:14.729153: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.25.8) or chardet (5.1.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Running tests under Python 3.9.16: /usr/bin/python3.9
[ RUN      ] EnvironmentTest.test_action_space
I0412 02:40:28.219354 139657645123392 environment.py:216] ***Num node to place***:2
I0412 02:40:28.223868 139657645123392 placement_util.py:447] node_order: descending_size_macro_first
I0412 02:40:28.245094 139657645123392 observation_extractor.py:320] Pad a tensor with shape (5,) by 42000
I0412 02:40:28.255223 139657645123392 observation_extractor.py:334] Pad a tensor with shape (5,) by 3500
INFO:tensorflow:time(__main__.EnvironmentTest.test_action_space): 0.34s
I0412 02:40:28.288695 139657645123392 test_util.py:2467] time(__main__.EnvironmentTest.test_action_space): 0.34s
[       OK ] EnvironmentTest.test_action_space
[ RUN      ] EnvironmentTest.test_create_and_obs_space
I0412 02:40:28.368595 139657645123392 environment.py:216] ***Num node to place***:2
I0412 02:40:28.370536 139657645123392 placement_util.py:447] node_order: descending_size_macro_first
I0412 02:40:28.376635 139657645123392 observation_extractor.py:320] Pad a tensor with shape (5,) by 42000
I0412 02:40:28.377619 139657645123392 observation_extractor.py:334] Pad a tensor with shape (5,) by 3500
INFO:tensorflow:time(__main__.EnvironmentTest.test_create_and_obs_space): 0.11s
I0412 02:40:28.401192 139657645123392 test_util.py:2467] time(__main__.EnvironmentTest.test_create_and_obs_space): 0.11s
[       OK ] EnvironmentTest.test_create_and_obs_space
[ RUN      ] EnvironmentTest.test_infisible
I0412 02:40:28.482485 139657645123392 environment.py:216] ***Num node to place***:2
I0412 02:40:28.485958 139657645123392 placement_util.py:447] node_order: descending_size_macro_first
I0412 02:40:28.503776 139657645123392 observation_extractor.py:320] Pad a tensor with shape (5,) by 42000
I0412 02:40:28.505141 139657645123392 observation_extractor.py:334] Pad a tensor with shape (5,) by 3500
INFO:tensorflow:time(__main__.EnvironmentTest.test_infisible): 0.12s
I0412 02:40:28.524590 139657645123392 test_util.py:2467] time(__main__.EnvironmentTest.test_infisible): 0.12s
[       OK ] EnvironmentTest.test_infisible
[ RUN      ] EnvironmentTest.test_save_file_train_step
I0412 02:40:28.815208 139657645123392 environment.py:216] ***Num node to place***:2
I0412 02:40:28.817175 139657645123392 placement_util.py:447] node_order: descending_size_macro_first
I0412 02:40:28.824875 139657645123392 observation_extractor.py:320] Pad a tensor with shape (5,) by 42000
I0412 02:40:28.828542 139657645123392 observation_extractor.py:334] Pad a tensor with shape (5,) by 3500
I0412 02:40:28.889569 139657645123392 placement_util.py:447] node_order: random
I0412 02:40:28.895797 139657645123392 dreamplace_util.py:104] Update num_bins_x and num_bins_y: (128, 128)
I0412 02:40:28.902512 139657645123392 plc_converter.py:141] Node 8 is placed at (170.000000, 230.000000).
I0412 02:40:28.907457 139657645123392 plc_converter.py:141] Node 2 is placed at (375.000000, 375.000000).
I0412 02:40:28.909322 139657645123392 plc_converter.py:141] Node 3 is placed at (125.000000, 125.000000).
I0412 02:40:28.911084 139657645123392 plc_converter.py:141] Node 0 is placed at (0.000000, 100.000000).
I0412 02:40:28.912482 139657645123392 plc_converter.py:141] Node 1 is placed at (499.000000, 499.000000).
INFO:tensorflow:time(__main__.EnvironmentTest.test_save_file_train_step): 0.4s
I0412 02:40:28.929174 139657645123392 test_util.py:2467] time(__main__.EnvironmentTest.test_save_file_train_step): 0.4s
[  FAILED  ] EnvironmentTest.test_save_file_train_step
[ RUN      ] EnvironmentTest.test_session
[  SKIPPED ] EnvironmentTest.test_session
[ RUN      ] EnvironmentTest.test_validate_circuite_env
I0412 02:40:29.045726 139657645123392 environment.py:216] ***Num node to place***:2
I0412 02:40:29.048648 139657645123392 placement_util.py:447] node_order: descending_size_macro_first
I0412 02:40:29.060537 139657645123392 observation_extractor.py:320] Pad a tensor with shape (5,) by 42000
I0412 02:40:29.064798 139657645123392 observation_extractor.py:334] Pad a tensor with shape (5,) by 3500
INFO:tensorflow:time(__main__.EnvironmentTest.test_validate_circuite_env): 0.25s
I0412 02:40:29.188475 139657645123392 test_util.py:2467] time(__main__.EnvironmentTest.test_validate_circuite_env): 0.25s
[       OK ] EnvironmentTest.test_validate_circuite_env
[ RUN      ] EnvironmentTest.test_wrap_tfpy_environment
I0412 02:40:29.263710 139657645123392 environment.py:216] ***Num node to place***:2
I0412 02:40:29.265571 139657645123392 placement_util.py:447] node_order: descending_size_macro_first
I0412 02:40:29.270780 139657645123392 observation_extractor.py:320] Pad a tensor with shape (5,) by 42000
I0412 02:40:29.271743 139657645123392 observation_extractor.py:334] Pad a tensor with shape (5,) by 3500
/usr/lib/python3.9/multiprocessing/pool.py:265: ResourceWarning: unclosed running multiprocessing pool <multiprocessing.pool.ThreadPool state=RUN pool_size=1>
  _warn(f"unclosed running multiprocessing pool {self!r}",
ResourceWarning: Enable tracemalloc to get the object allocation traceback
INFO:tensorflow:time(__main__.EnvironmentTest.test_wrap_tfpy_environment): 0.13s
I0412 02:40:29.319782 139657645123392 test_util.py:2467] time(__main__.EnvironmentTest.test_wrap_tfpy_environment): 0.13s
[       OK ] EnvironmentTest.test_wrap_tfpy_environment
======================================================================
ERROR: test_save_file_train_step (__main__.EnvironmentTest)
EnvironmentTest.test_save_file_train_step
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspace/circuit_training/environment/environment_test.py", line 149, in test_save_file_train_step
    obs, _, done, _ = env.step(action)
  File "/workspace/circuit_training/environment/environment.py", line 575, in step
    cost, info = self.call_analytical_placer_and_get_cost()
  File "/workspace/circuit_training/environment/environment.py", line 459, in call_analytical_placer_and_get_cost
    self._save_placement(cost)
  File "/workspace/circuit_training/environment/environment.py", line 427, in _save_placement
    self._run_cd()
  File "/workspace/circuit_training/environment/environment.py", line 389, in _run_cd
    cd = cd_placer.CoordinateDescentPlacer(plc=self._plc, cost_fn=cost_fn)
  File "/usr/local/lib/python3.9/dist-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.9/dist-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/usr/local/lib/python3.9/dist-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/workspace/circuit_training/environment/coordinate_descent_placer.py", line 149, in __init__
    self._dreamplace = dreamplace_core.SoftMacroPlacer(
  File "/usr/local/lib/python3.9/dist-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.9/dist-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/usr/local/lib/python3.9/dist-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/workspace/circuit_training/dreamplace/dreamplace_core.py", line 34, in __init__
    self.placedb_plc = placedb_plc.PlacedbPlc(plc, params, hard_macro_order)
  File "/workspace/circuit_training/dreamplace/placedb_plc.py", line 31, in __init__
    self.placedb = self.converter.convert(plc, hard_macro_order)
  File "/workspace/circuit_training/dreamplace/plc_converter.py", line 524, in convert
    convert_to_ndarray(db)
  File "/workspace/circuit_training/dreamplace/plc_converter.py", line 404, in convert_to_ndarray
    db.flat_node2pin_map, db.flat_node2pin_start_map = db.flatten_nested_map(
AttributeError: 'PlaceDB' object has no attribute 'flatten_nested_map'
  In call to configurable 'SoftMacroPlacer' (<class 'circuit_training.dreamplace.dreamplace_core.SoftMacroPlacer'>)
  In call to configurable 'CoordinateDescentPlacer' (<class 'circuit_training.environment.coordinate_descent_placer.CoordinateDescentPlacer'>)

Looks like it is calling a wrong version of DREAMPlace?

Request to provide a GPU-Accelerated Base Docker Image for CT

Hi,

When I use base_image=nvidia/cuda:11.4.2-cudnn8-devel-ubuntu20.04 to build the Docker environment following the steps given here, I encounter the following error:

2023-04-03 17:42:04.452598: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:417] Loaded runtime CuDNN library: 8.5.0 but source was compiled with: 8.6.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2023-04-03 17:42:04.455368: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at conv_grad_input_ops.cc:385 : UNIMPLEMENTED: DNN library is not found.

Could you kindly recommend a suitable base_image that is compatible with tf-agent[reverb] v0.16?

Thanks,
Sayak

e2e_smoke test failed (19 Illegal instruction (core dumped) "$@")

Environment

OS: Ubuntu 20.04 Focal
CPU: Ryzen ThreadRipper
GPU: GPU Passthrough Nvidia 3090

Code Used

$ export CT_VERSION=0.0.3
$ git clone https://github.com/google-research/circuit_training.git
$ git -C $(pwd)/circuit_training checkout r${CT_VERSION}

$ export REPO_ROOT=$(pwd)/circuit_training
$ export TF_AGENTS_PIP_VERSION=tf-agents[reverb]
$ export PYTHON_VERSION=python3.9
$ export DREAMPLACE_PATTERN=dreamplace_20230414_2835324_${PYTHON_VERSION}.tar.gz
$ mkdir -p ${REPO_ROOT}/logs
$ docker build --pull --no-cache --tag circuit_training:core \
    --build-arg tf_agents_version="${TF_AGENTS_PIP_VERSION}" \
    --build-arg dreamplace_version="${DREAMPLACE_PATTERN}" \
    --build-arg placement_cost_binary="plc_wrapper_main_${CT_VERSION}" \
    -f "${REPO_ROOT}"/tools/docker/ubuntu_circuit_training ${REPO_ROOT}/tools/docker/
$ docker run --rm -v ${REPO_ROOT}:/workspace --workdir /workspace circuit_training:core \
    bash tools/e2e_smoke_test.sh --root_dir /workspace/logs

Code Error given

test@gpu:~$ docker run --rm -v ${REPO_ROOT}:/workspace --workdir /workspace circuit_training:core \
>     bash tools/e2e_smoke_test.sh --root_dir /workspace/logs
--root_dir
/workspace/logs
FYI: Local logs (--script_logs) cannot write to gcs. It is just a pipe.
Reverb server set to 127.0.0.1:8008
Starting Reveb Server in the background.
Logging reverb job  to /workspace/logs/reverb.log.
Starting 4 collect jobs.
Start collect job 1 in the background...
Logging collect job 1 to /workspace/logs/collect_1.log.
Start collect job 2 in the background...
Logging collect job 2 to /workspace/logs/collect_2.log.
Start collect job 3 in the background...
Logging collect job 3 to /workspace/logs/collect_3.log.
Start collect job 4 in the background...
Logging collect job 4 to /workspace/logs/collect_4.log.
Start Training job in the background but logging to console.
It has been ~0m. Sleeping 60s waiting for error or end.
tools/e2e_smoke_test.sh: line 132:    19 Illegal instruction     (core dumped) "$@"
Collect job failed (SIGUSR1). Check /workspace/logs/collect_*.log.
Exiting with code 8.
test@gpu:~$ 

Log Files

test@gpu:~$ cat circuit_training/logs/collect_*.log
tools/e2e_smoke_test.sh: line 126:    13 Illegal instruction     (core dumped) "$@"
tools/e2e_smoke_test.sh: line 126:    15 Illegal instruction     (core dumped) "$@"
tools/e2e_smoke_test.sh: line 126:    17 Illegal instruction     (core dumped) "$@"
tools/e2e_smoke_test.sh: line 126:    18 Illegal instruction     (core dumped) "$@"

Thoughts

This issue seems to occur solely on this specific setup/computer; I have tried running it on other computers and had no core dump issues. I would really appreciate help getting this resolved.

Modifying Circuit Training Code

Hi,

The MacroPlacement team has an open-sourced version of plc_wrapper_main (under our interpretation). We provide guidance on how to plug it into Circuit Training in our GitHub documentation, but this requires us to modify and publish circuit_training code with ~100 lines of changes.

We do have a reference/citation and a disclaimer on our GitHub page, but we would like to know whether this is permissible and whether there is any further action we need to take.

source of x,y coordinates in netlist.pb.txt and plc.get_node_width_height()?

I plotted the nodes from the Ariane RISC-V example netlist file.

In summary:

  • Parsed the netlist.pb.txt file using plc client.
  • Got the canvas width and height with w,h = plc.get_canvas_width_height()
  • Got the x,y coordinates of each node in the netlist with n_w, n_h = plc.get_node_width_height()
  • Plotted each node on the canvas of size (w,h).

The resultant plot is attached to the issue.

In the plot, the blue blocks are macros while the red blocks are clusters of standard cells. Can you please share what the source of the node coordinates is?
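For anyone reproducing such a plot: the per-node x/y attributes stored in netlist.pb.txt appear to be the source of these coordinates (my reading, not a documented guarantee), and treating each (x, y) as a node center, the drawing step reduces to converting center plus size into the lower-left-corner rectangle matplotlib expects:

```python
# Sketch of the plotting step. Assumes (x, y) is the node *center*;
# the node sizes below are made up for illustration.
def to_rect(cx, cy, w, h):
    """Return (lower-left x, lower-left y, width, height)."""
    return (cx - w / 2.0, cy - h / 2.0, w, h)

# (center_x, center_y, width, height, is_macro)
nodes = [(100.0, 120.0, 40.0, 30.0, True),
         (250.0, 300.0, 10.0, 10.0, False)]
rects = [to_rect(x, y, w, h) for x, y, w, h, _ in nodes]
print(rects[0])  # (80.0, 105.0, 40.0, 30.0)
```

Each resulting tuple can feed matplotlib.patches.Rectangle((llx, lly), w, h), colored blue for macros and red for standard-cell clusters as in the plot described.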

Possibly the question is related to issue #25

Support LEF/DEF

Please add a comment if you need/want LEF/DEF support so we can prioritize asks.

I have a question about the coordinate system.

Hello. I am Seungju Kim.

Thanks for your repository.

I have a question about the coordinate system.

I want to know exactly where the origin (0,0) is.

I guess that the origin is at one of the canvas corners.

I would appreciate it if you could specify which corner (upper left, upper right, lower left, lower right).

Thanks

plc_client.get_canvas_width_height unexplained behavior

Hi!
Great work on circuit training.

I came across this issue while parsing the Ariane RISC netlist.pb.txt file with plc_client. plc_client.get_canvas_width_height() gives the canvas size: (408.06, 408.06). But I cannot find the source/logic behind this canvas size.
Note that:

  • The netlist.pb.txt file has no explicit mention of canvas width and height.

  • The initial.plc file has a different size: # Width : 356.592 Height : 356.640. Also, I have tried changing the canvas size in initial.plc to random numbers, but this does not affect the outcome of plc_client.get_canvas_width_height(), which proves that the canvas size in initial.plc is not linked to the one returned by plc_client.get_canvas_width_height().

  • Also, for the macro_10x10 netlist file, the plc client reports a canvas size of (645.49, 645.49). This proves that the canvas size varies with the netlist file and is not a fixed initial value.

So, where do canvas sizes like (408.06, 408.06) and (645.49, 645.49) come from?

plc_client.py behavior

Hi,

It appears to me that plc_client is not fully open-sourced. However, I don't fully understand its behaviour. It seems to me that it is trying to establish some socket connection to a tempfile and retrieve function returns from that socket connection.

Is it connected to some Google-end server or some binary file on the host machine?
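From what the repository shows, no Google-side server is involved: the Docker build fetches the plc_wrapper_main binary into /usr/local/bin, and plc_client launches that local binary, exchanging calls over a socket in a temp directory. The actual wire protocol is not public; the toy sketch below only illustrates the local-RPC pattern (the method name and JSON framing are invented, and both endpoints live in one process via socketpair()):

```python
import json
import socket

# Toy stand-in for the pattern: a local request/response exchange over a
# socket pair. In circuit_training the server end would be the
# plc_wrapper_main binary launched as a subprocess, not in-process code.
server, client = socket.socketpair()

def fake_wrapper_main(conn):
    """Pretend server: answer one JSON-encoded request."""
    request = json.loads(conn.recv(4096).decode())
    if request["method"] == "get_canvas_width_height":
        conn.sendall(json.dumps({"result": [408.06, 408.06]}).encode())

client.sendall(json.dumps({"method": "get_canvas_width_height"}).encode())
fake_wrapper_main(server)
print(json.loads(client.recv(4096).decode()))  # {'result': [408.06, 408.06]}
```

So the answer to the question is "a binary file on the host machine": function calls on the Python side are marshalled to the local wrapper process, and its replies are returned as the function results.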

How to load and use a pre-trained model (weights)?

Hi!

I've been playing around with circuit_training and now there's one question that's bothering me a lot.

Let's say I trained a model from scratch for Ariane, and now I want to use the obtained pre-trained weights for macro_tiles_10x10. What are my next steps?

I've tried to find any options that allow loading a pre-trained model, but I failed.

Dear author, I still have some questions.

As I show below, how do you normalize the wirelength? Does the plc.get_cost function consider area?

In addition, I notice that the values of wirelength and density are of different orders of magnitude, but they are multiplied by the same weight. Is that a problem?

In the experiment, the cost is calculated as the weighted sum of wirelength, congestion, and density. However, these values differ between netlists. Since the weighted cost of a large netlist may be larger than that of a small netlist, how do you ensure that the reward of a large netlist is not always less than that of a small netlist?
Thank you so much in advance.
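On the normalization question, a common way to make such a weighted sum comparable across netlists is to scale each term into roughly [0, 1] before weighting, e.g. dividing total HPWL by the number of nets times the canvas half-perimeter. Whether circuit_training does exactly this I cannot confirm from this thread; the sketch below (hypothetical weights and values) only illustrates the idea:

```python
def normalized_cost(hpwls, canvas_w, canvas_h,
                    congestion, density,
                    w_wl=1.0, w_cong=0.5, w_dens=1.0):
    """Hypothetical scale-free combination: total HPWL is divided by
    (number of nets * canvas half-perimeter), so a big netlist on a big
    canvas is comparable to a small one before the weighted sum."""
    wirelength = sum(hpwls) / (len(hpwls) * (canvas_w + canvas_h))
    return w_wl * wirelength + w_cong * congestion + w_dens * density

# Made-up per-net HPWLs on a 400x400 canvas, with congestion/density
# already expressed as fractions.
cost = normalized_cost([120.0, 80.0, 200.0], 400.0, 400.0,
                       congestion=0.6, density=0.4)
print(round(cost, 4))  # 0.8667
```

With each term bounded this way, the weights express relative importance rather than compensating for raw magnitude differences, which addresses the orders-of-magnitude concern above.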

Information on Dataset

If we want to reproduce the results, where should we take the data from? There is no such information present in the README.

How to use this code with less computing resource?

Hi,

I have a question about circuit training for Ariane RISC-V, following the guidance you have provided. You used a lot of computing resources, including 20 96-vCPU machines and 8 V100 GPUs; however, we do not have that many computing resources. Does that mean we cannot get a reasonable training result (episode return increasing over training) like the one presented in your TensorBoard? If we can get a good training result with our server configuration, would you please tell us how to adjust the hyperparameters in your source code? Thanks!

SHA256 optimized asic

Do you recommend this framework for designing an ASIC that is optimized for sha256?

Related question: In the paper accompanying this framework, it's stated that "We group millions of standard cells into a few thousand clusters using hMETIS (Karypis & Kumar, 1998)". I was wondering if this is too coarse of an approximation for optimizing SHA256 specifically.

Appreciate your help, thank you

Docker smoke test ModuleNotFoundError No module named 'absl'

While trying out the Docker image, I get this error:

~/work/circuit-training$ docker run -it --rm -v $(pwd):/workspace --workdir /workspace circuit_training:core bash
root@7fd85732683b:/workspace/circuit-training# python3 -m circuit_training.environment.environment_test
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/workspace/circuit-training/circuit_training/environment/environment_test.py", line 19, in <module>
    from absl import flags
ModuleNotFoundError: No module named 'absl'

I think the build command went through (cached here):

~/work/circuit-training$ docker build --tag circuit_training:core -f tools/docker/ubuntu_circuit_training tools/docker/
[+] Building 1.5s (20/20) FINISHED                                                                                                                                                                          
 => [internal] load build definition from ubuntu_circuit_training                                                                                                                                      0.0s
 => => transferring dockerfile: 3.85kB                                                                                                                                                                 0.0s
 => [internal] load .dockerignore                                                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                                                        0.0s
 => [internal] load metadata for docker.io/library/ubuntu:20.04                                                                                                                                        1.4s
 => [ 1/16] FROM docker.io/library/ubuntu:20.04@sha256:24a0df437301598d1a4b62ddf59fa0ed2969150d70d748c84225e6501e9c36b9                                                                                0.0s
 => CACHED [ 2/16] RUN apt-get -o Acquire::Retries=3 -y update && apt-get -o Acquire::Retries=3 -y install -y --no-install-recommends         software-properties-common         curl         tmux     0.0s
 => CACHED [ 3/16] RUN add-apt-repository ppa:deadsnakes/ppa                                                                                                                                           0.0s
 => CACHED [ 4/16] RUN apt-get -o Acquire::Retries=3 -y update && apt-get -o Acquire::Retries=3 -y install -y --no-install-recommends         python3.9-dev         python3.9-distutils         &&     0.0s
 => CACHED [ 5/16] RUN curl https://storage.googleapis.com/rl-infra-public/circuit-training/placement_cost/plc_wrapper_main      -o  /usr/local/bin/plc_wrapper_main                                   0.0s
 => CACHED [ 6/16] RUN chmod 555 /usr/local/bin/plc_wrapper_main                                                                                                                                       0.0s
 => CACHED [ 7/16] RUN curl -O https://bootstrap.pypa.io/get-pip.py                                                                                                                                    0.0s
 => CACHED [ 8/16] RUN python3.9 get-pip.py                                                                                                                                                            0.0s
 => CACHED [ 9/16] RUN python3.9 -mpip --no-cache-dir install tf-agents[reverb] sortedcontainers tox pytest                                                                                            0.0s
 => CACHED [10/16] RUN apt-get update         && apt-get install -y             wget             flex             libcairo2-dev             libboost-all-dev                                           0.0s
 => CACHED [11/16] RUN python3.9 -mpip install pyunpack>=0.1.2         patool>=1.12         timeout-decorator>=0.5.0         matplotlib>=2.2.2         cairocffi>=0.9.0         pkgconfig>=1.4.0       0.0s
 => CACHED [12/16] RUN python3.9 -mpip install numpy==1.23.5 # Required by tensorflow until fixed ~TF 2.12+                                                                                            0.0s
 => CACHED [13/16] RUN mkdir -p /dreamplace                                                                                                                                                            0.0s
 => CACHED [14/16] RUN curl https://storage.googleapis.com/rl-infra-public/circuit-training/dreamplace/dreamplace_python3.9.tar.gz -o /dreamplace/dreamplace.tar.gz                                    0.0s
 => CACHED [15/16] RUN tar xzf /dreamplace/dreamplace.tar.gz -C /dreamplace/                                                                                                                           0.0s
 => CACHED [16/16] RUN python3.9 -m pip freeze                                                                                                                                                         0.0s
 => exporting to image                                                                                                                                                                                 0.0s
 => => exporting layers                                                                                                                                                                                0.0s
 => => writing image sha256:650e0e691d2ae5db40e76416db4f47b6317b16ecb7b6c2093aa44757bc814b5c                                                                                                           0.0s
 => => naming to docker.io/library/circuit_training:core

Did I miss something or is the 'absl' package actually missing from the docker image?

Failed training for test_data: macro_tiles_10x10 and sample_clustered

We have trouble running two of the test datasets, macro_tiles_10x10 and sample_clustered, although we succeeded in executing the training job for Ariane. The problem is that the tmux session for the collect job is stuck at model_id 0 / step 0 and does not progress, as shown in the figure attached to the issue.

For these two test datasets, we used the same commands as for Ariane except for the following changes:

$ export NETLIST_FILE=./circuit_training/environment/test_data/sample_clustered/netlist.pb.txt
$ export INIT_PLACEMENT=./circuit_training/environment/test_data/sample_clustered/initial.plc

  1. Are we required to make other changes to the commands?
  2. Should we modify the hyperparameters, such as learning_rate and batch_size, to fit each test dataset?

CPU RAM Usage

I'm having issues with the amount of CPU RAM used by train_ppo.py. Specifically, over the course of training the memory usage steadily increases until there is no memory left, causing an error. This seems odd, as I would expect the total memory usage to be roughly constant over the course of training, since the model and dataset are both of fixed size. Does anyone have an idea as to why this is the case? Has anyone else experienced similar issues and been able to alleviate them?
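A steady climb like this usually means some Python object graph grows each iteration (for example, a buffer that is appended to and never cleared). The standard-library tracemalloc module can localize the growth by diffing snapshots taken a few training iterations apart; below is a self-contained sketch with a simulated leak standing in for the training loop:

```python
import tracemalloc

# Diff heap snapshots between "iterations" to find which source line
# keeps allocating. The ever-growing list simulates a leak.
tracemalloc.start()
leak = []

before = tracemalloc.take_snapshot()
for step in range(3):  # stand-in for a few training iterations
    leak.extend(bytearray(100_000) for _ in range(5))  # simulated leak
after = tracemalloc.take_snapshot()

# The top entries point at the lines whose allocations grew the most.
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```

Run the same pattern around a few real train_ppo.py iterations and the top size_diff entries should point at whatever structure is accumulating.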

CPU docker build e2e Smoke test failing

System Information

  • OS: Ubuntu 20.04
  • CUDA: 11.7
  • cuDNN: 8.5.0

I have followed the instructions and checked out the r0.0.3 version as well. This is the main error message in the collect_X.log files:

Traceback (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/workspace/circuit_training/learning/train_ppo.py", line 154, in
multiprocessing.handle_main(functools.partial(app.run, main))
File "/usr/local/lib/python3.9/dist-packages/tf_agents/system/default/multiprocessing_core.py", line 77, in handle_main
return app.run(parent_main_fn, *args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/workspace/circuit_training/learning/train_ppo.py", line 134, in main
train_ppo_lib.train(
File "/usr/local/lib/python3.9/dist-packages/gin/config.py", line 1605, in gin_wrapper
utils.augment_exception_message_and_reraise(e, err_str)
File "/usr/local/lib/python3.9/dist-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
raise proxy.with_traceback(exception.__traceback__) from None
File "/usr/local/lib/python3.9/dist-packages/gin/config.py", line 1582, in gin_wrapper
return fn(*new_args, **new_kwargs)
File "/workspace/circuit_training/learning/train_ppo_lib.py", line 258, in train
save_model_trigger = triggers.PolicySavedModelTrigger(
File "/usr/local/lib/python3.9/dist-packages/tf_agents/train/triggers.py", line 127, in __init__
self._raw_policy_saver = self._build_saver(raw_policy, batch_size,
File "/usr/local/lib/python3.9/dist-packages/tf_agents/train/triggers.py", line 168, in _build_saver
saver = policy_saver.PolicySaver(
File "/usr/local/lib/python3.9/dist-packages/tf_agents/policies/policy_saver.py", line 383, in __init__
polymorphic_action_fn.get_concrete_function(
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 1258, in get_concrete_function
concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 1238, in _get_concrete_function_garbage_collected
self._initialize(args, kwargs, add_initializers_to=initializers)
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 763, in _initialize
self._variable_creation_fn # pylint: disable=protected-access
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 171, in _get_concrete_function_internal_garbage_collected
concrete_function, _ = self._maybe_define_concrete_function(args, kwargs)
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 166, in _maybe_define_concrete_function
return self._maybe_define_function(args, kwargs)
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 356, in _maybe_define_function
self._function_spec.make_canonicalized_monomorphic_type(
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/polymorphic_function/function_spec.py", line 345, in make_canonicalized_monomorphic_type
function_type_lib.canonicalize_to_monomorphic(
File "/usr/local/lib/python3.9/dist-packages/tensorflow/core/function/polymorphism/function_type.py", line 419, in canonicalize_to_monomorphic
_make_validated_mono_param(name, arg, poly_parameter.kind,
File "/usr/local/lib/python3.9/dist-packages/tensorflow/core/function/polymorphism/function_type.py", line 359, in _make_validated_mono_param
mono_type = trace_type.from_value(value, type_context)
File "/usr/local/lib/python3.9/dist-packages/tensorflow/core/function/trace_type/trace_type_builder.py", line 194, in from_value
named_tuple_type, tuple(from_value(c, context) for c in value))
File "/usr/local/lib/python3.9/dist-packages/tensorflow/core/function/trace_type/trace_type_builder.py", line 194, in <genexpr>
named_tuple_type, tuple(from_value(c, context) for c in value))
File "/usr/local/lib/python3.9/dist-packages/tensorflow/core/function/trace_type/trace_type_builder.py", line 176, in from_value
elif isinstance(value, trace.SupportsTracingProtocol):
File "/usr/local/lib/python3.9/dist-packages/typing_extensions.py", line 604, in __instancecheck__
val = inspect.getattr_static(instance, attr)
File "/usr/lib/python3.9/inspect.py", line 1624, in getattr_static
instance_result = _check_instance(obj, attr)
File "/usr/lib/python3.9/inspect.py", line 1571, in _check_instance
instance_dict = object.__getattribute__(obj, "__dict__")
TypeError: this __dict__ descriptor does not support '_DictWrapper' objects
In call to configurable 'train' (<function train at 0x7f431bc254c0>)

Also see below for the logfile in full.
collect_1.log

Issues while running Docker image

I am getting errors when running Docker:
[Screenshot: Screen Shot 2022-02-06 at 2:53:01 PM]

I used the run commands from here: https://github.com/google-research/circuit_training/blob/main/docs/ARIANE.md#docker

$ docker build --tag circuit_training:core -f tools/docker/ubuntu_circuit_training tools/docker/
Full message:

=> ERROR [9/9] RUN python3 -mpip --no-cache-dir install tf-agents[reverb]                               7.2s
------
 > [9/9] RUN python3 -mpip --no-cache-dir install tf-agents[reverb]:
#12 0.492 Collecting tf-agents[reverb]
#12 0.755   Downloading tf_agents-0.11.0-py3-none-any.whl (1.3 MB)
#12 0.896      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 9.3 MB/s eta 0:00:00
#12 0.959 Collecting wrapt>=1.11.1
#12 1.004   Downloading wrapt-1.13.3.tar.gz (48 kB)
#12 1.005      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.9/48.9 KB 232.0 MB/s eta 0:00:00
#12 1.014   Preparing metadata (setup.py): started
#12 1.128   Preparing metadata (setup.py): finished with status 'done'
#12 1.150 Collecting gym>=0.17.0
#12 1.186   Downloading gym-0.21.0.tar.gz (1.5 MB)
#12 1.221      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 46.0 MB/s eta 0:00:00
#12 1.278   Preparing metadata (setup.py): started
#12 1.387   Preparing metadata (setup.py): finished with status 'done'
#12 1.532 Collecting numpy>=1.13.3
#12 1.565   Downloading numpy-1.22.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (13.4 MB)
#12 1.874      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.4/13.4 MB 45.6 MB/s eta 0:00:00
#12 1.903 Collecting cloudpickle>=1.3
#12 1.935   Downloading cloudpickle-2.0.0-py3-none-any.whl (25 kB)
#12 1.948 Collecting gin-config>=0.4.0
#12 1.981   Downloading gin_config-0.5.0-py3-none-any.whl (61 kB)
#12 1.982      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.3/61.3 KB 333.0 MB/s eta 0:00:00
#12 2.097 Collecting pillow
#12 2.137   Downloading Pillow-9.0.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (4.2 MB)
#12 2.220      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.2/4.2 MB 51.4 MB/s eta 0:00:00
#12 2.225 Requirement already satisfied: six>=1.10.0 in /usr/lib/python3/dist-packages (from tf-agents[reverb]) (1.14.0)
#12 2.241 Collecting absl-py>=0.6.1
#12 2.272   Downloading absl_py-1.0.0-py3-none-any.whl (126 kB)
#12 2.274      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 126.7/126.7 KB 369.2 MB/s eta 0:00:00
#12 2.383 Collecting protobuf>=3.11.3
#12 2.516   Downloading protobuf-3.19.4-cp38-cp38-manylinux2014_aarch64.whl (913 kB)
#12 2.540      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 913.6/913.6 KB 42.1 MB/s eta 0:00:00
#12 2.561 Collecting tensorflow-probability>=0.14.1
#12 2.768   Downloading tensorflow_probability-0.15.0-py2.py3-none-any.whl (5.7 MB)
#12 2.879      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 52.7 MB/s eta 0:00:00
#12 2.902 Collecting typing-extensions>=3.7.4.3
#12 2.935   Downloading typing_extensions-4.0.1-py3-none-any.whl (22 kB)
#12 2.997 Collecting tf-agents[reverb]
#12 3.118   Downloading tf_agents-0.10.0-py3-none-any.whl (1.3 MB)
#12 3.159      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 33.2 MB/s eta 0:00:00
#12 3.589   Downloading tf_agents-0.9.0-py3-none-any.whl (1.3 MB)
#12 3.634      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 38.4 MB/s eta 0:00:00
#12 3.766   Downloading tf_agents-0.8.0-py3-none-any.whl (1.2 MB)
#12 3.796      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 44.1 MB/s eta 0:00:00
#12 3.812 Collecting tensorflow-probability==0.12.2
#12 4.065   Downloading tensorflow_probability-0.12.2-py2.py3-none-any.whl (4.8 MB)
#12 4.461      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.8/4.8 MB 12.3 MB/s eta 0:00:00
#12 4.475 Collecting tf-agents[reverb]
#12 4.596   Downloading tf_agents-0.7.1-py3-none-any.whl (1.2 MB)
#12 4.617      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 55.1 MB/s eta 0:00:00
#12 4.757   Downloading tf_agents-0.6.0-py3-none-any.whl (1.1 MB)
#12 4.781      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 49.7 MB/s eta 0:00:00
#12 4.808 Collecting cloudpickle==1.3
#12 4.913   Downloading cloudpickle-1.3.0-py2.py3-none-any.whl (26 kB)
#12 4.920 Collecting tf-agents[reverb]
#12 5.046   Downloading tf_agents-0.5.0-py3-none-any.whl (933 kB)
#12 5.070      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 933.3/933.3 KB 51.4 MB/s eta 0:00:00
#12 5.080 WARNING: tf-agents 0.5.0 does not provide the extra 'reverb'
#12 5.084 Collecting gin-config==0.1.3
#12 5.188   Downloading gin_config-0.1.3-py3-none-any.whl (43 kB)
#12 5.191      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.3/43.3 KB 187.0 MB/s eta 0:00:00
#12 5.225 Collecting decorator
#12 5.264   Downloading decorator-5.1.1-py3-none-any.whl (9.1 kB)
#12 5.281 Collecting gast>=0.3.2
#12 5.313   Downloading gast-0.5.3-py3-none-any.whl (19 kB)
#12 5.332 Collecting dm-tree
#12 5.437   Downloading dm-tree-0.1.6.tar.gz (33 kB)
#12 5.447   Preparing metadata (setup.py): started
#12 5.556   Preparing metadata (setup.py): finished with status 'done'
#12 5.563 Building wheels for collected packages: dm-tree
#12 5.564   Building wheel for dm-tree (setup.py): started
#12 5.671   Building wheel for dm-tree (setup.py): finished with status 'error'
#12 5.674   error: subprocess-exited-with-error
#12 5.674
#12 5.674   × python setup.py bdist_wheel did not run successfully.
#12 5.674   │ exit code: 1
#12 5.674   ╰─> [5 lines of output]
#12 5.674       running bdist_wheel
#12 5.674       running build
#12 5.674       running build_py
#12 5.674       running build_ext
#12 5.674       error: command 'bazel' failed: No such file or directory
#12 5.674       [end of output]
#12 5.674
#12 5.674   note: This error originates from a subprocess, and is likely not a problem with pip.
#12 5.675   ERROR: Failed building wheel for dm-tree
#12 5.675   Running setup.py clean for dm-tree
#12 5.770 Failed to build dm-tree
#12 5.806 Installing collected packages: protobuf, numpy, gin-config, gast, dm-tree, decorator, cloudpickle, absl-py, tensorflow-probability, tf-agents
#12 6.866   Running setup.py install for dm-tree: started
#12 6.965   Running setup.py install for dm-tree: finished with status 'error'
#12 6.967   error: subprocess-exited-with-error
#12 6.967
#12 6.967   × Running setup.py install for dm-tree did not run successfully.
#12 6.967   │ exit code: 1
#12 6.967   ╰─> [7 lines of output]
#12 6.967       running install
#12 6.967       /usr/local/lib/python3.8/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
#12 6.967         warnings.warn(
#12 6.967       running build
#12 6.967       running build_py
#12 6.967       running build_ext
#12 6.967       error: command 'bazel' failed: No such file or directory
#12 6.967       [end of output]
#12 6.967
#12 6.967   note: This error originates from a subprocess, and is likely not a problem with pip.
#12 6.968 error: legacy-install-failure
#12 6.968
#12 6.968 × Encountered error while trying to install package.
#12 6.968 ╰─> dm-tree
#12 6.968
#12 6.968 note: This is an issue with the package mentioned above, not pip.
#12 6.968 hint: See above for output from the failure.
------
executor failed running [/bin/sh -c $python_version -mpip --no-cache-dir install $tf_agents_version]: exit code: 1

Running Environment:

  • OS: macOS
  • Python Version: Python 3
  • Docker Version: 20.10.12
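For what it's worth, the log above shows pip falling back to the dm-tree 0.1.6 source tarball (which requires bazel) after the resolver walked through aarch64 wheels, which is consistent with building on an Apple Silicon Mac. Two hedged, unverified workarounds (assuming Docker Desktop with amd64 emulation, and assuming a newer dm-tree release ships aarch64 wheels):

```shell
# Option 1 (assumption: Docker Desktop on Apple Silicon): build the image for
# linux/amd64, where prebuilt wheels exist for all the dependencies.
docker build --platform linux/amd64 \
  --tag circuit_training:core -f tools/docker/ubuntu_circuit_training tools/docker/

# Option 2 (assumption: dm-tree >= 0.1.7 publishes wheels for your platform):
# install a newer dm-tree first so pip never attempts the bazel source build.
# python3 -m pip install "dm-tree>=0.1.7"
# python3 -m pip install tf-agents[reverb]
```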

Add documentation for the plc_wrapper API

The source code for the plc_wrapper binary contains internal Google IP, which is why it is not open-sourced. An example of how to interact with plc_wrapper can be found in plc_client_test.py. However, complete documentation for its API is still needed.

macOS installation

Is there any special handling for macOS? plc_wrapper_main is an ELF binary, which is not compatible with macOS.

Configurations for running on GPU?

I'm trying to run the training process on a single GPU with 32 GB of memory, which gives me an out-of-memory error.

If I use multiple GPUs, how do I execute the training process? All of the training jobs are interrelated, right? Can I assign a different GPU to each process using CUDA_VISIBLE_DEVICES?

Can I use the CPU version of the code directly, or do I have to make changes to run it in a GPU environment?

Any help is highly appreciated,

@esonghori please let me know.

Thanks,
Mohan.
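On the CUDA_VISIBLE_DEVICES part of the question above: yes, each process only sees the GPUs named in its own CUDA_VISIBLE_DEVICES, so the trainer and the collect/eval jobs can be pinned to different devices. A minimal sketch of the mechanism (the module names and the job-to-GPU split are illustrative assumptions, not the project's documented layout; the variable must be set before TensorFlow is imported in each process):

```python
import os

# Hypothetical job-to-GPU assignment: each launched process sees only the
# device listed in its CUDA_VISIBLE_DEVICES, so jobs don't contend for one GPU.
jobs = {
    "circuit_training.learning.train_ppo": "0",
    "circuit_training.learning.eval": "1",
}

for module, gpu in jobs.items():
    # Copy the parent environment and override the visible device.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    # A real launcher would do something like:
    # subprocess.Popen(["python3", "-m", module, ...], env=env)
    print(f"{module} -> CUDA_VISIBLE_DEVICES={gpu}")
```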

Type annotation, variable length mismatch error while running the latest CircuitTraining

Hi
I was running an old release of Circuit Training (CT) (commit tag: 91e14fd). I integrated DREAMPlace into it and observed that my runs were running out of CPU memory after ~50 iterations of training.

I am trying to run the latest release of CT. Here are the details of my environment:

I am getting the following errors:

  • Lots of type annotation errors (link to one of the errors). There are many more in coordinate_descent_placer.py and model_lib.py. Are these errors real, or am I supposed to use Python 3.11?
  • I see train_ppo no longer supports --num_episodes_per_iteration=16, but it is still in the README.
  • Shape mismatch errors in eval_job and train_job after removing --num_episodes_per_iteration. I have attached the eval.log and train.log files.

Could you please suggest a way to ensure that I am using the correct environment, even when using Docker?

I tried running python3 -m circuit_training.environment.environment_test and (after fixing the type annotation errors) got this error: ERROR: test_save_file_train_step (__main__.EnvironmentTest)

Could you please tell me whether this is an environment problem on my end or a bug in the latest release of CT?

Thanks for your help,
Sayak

Cannot get hMETIS

The hMETIS website is not working, so I cannot download hMETIS!
Could you provide the hMETIS binary file?

Error while training PPO agent

python3 -m circuit_training.learning.train_ppo --root_dir=${ROOT_DIR} --std_cell_placer_mode=FD --replay_buffer_server_address=${REVERB_SERVER} --variable_container_server_address=${REVERB_SERVER} --sequence_length=134 --gin_bindings='train.num_iterations=200' --netlist_file=${NETLIST_FILE} --init_placement=${INIT_PLACEMENT} --global_seed=${GLOBAL_SEED}

I'm running train_ppo with the above command and getting the error below. I have launched the Reverb server, and I am not using Docker.

2023-06-19 14:13:39.798528: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-06-19 14:13:39.830979: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:7697] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-06-19 14:13:39.831030: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-06-19 14:13:39.831050: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-06-19 14:13:39.837003: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-06-19 14:13:39.837239: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-19 14:13:40.386722: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
I0619 14:13:42.663258 140531667562496 train_ppo.py:92] global seed=111
I0619 14:13:44.359349 140531667562496 environment.py:216] Num node to place:133
I0619 14:13:44.449179 140531667562496 placement_util.py:447] node_order: descending_size_macro_first
W0619 14:13:44.953677 140531667562496 plc_client.py:96] JSONDecode Error for GetMacroAndClusteredPortAdjacency
Expecting value: line 1 column 219265 (char 219264)
I0619 14:13:44.953792 140531667562496 plc_client.py:98] Looking for more data for GetMacroAndClusteredPortAdjacency on connection:0/256
W0619 14:13:44.963603 140531667562496 plc_client.py:96] JSONDecode Error for GetMacroAndClusteredPortAdjacency
Expecting ',' delimiter: line 1 column 438529 (char 438528)
I0619 14:13:44.963649 140531667562496 plc_client.py:98] Looking for more data for GetMacroAndClusteredPortAdjacency on connection:1/256
W0619 14:13:44.979751 140531667562496 plc_client.py:96] JSONDecode Error for GetMacroAndClusteredPortAdjacency
Expecting value: line 1 column 657793 (char 657792)
I0619 14:13:44.979862 140531667562496 plc_client.py:98] Looking for more data for GetMacroAndClusteredPortAdjacency on connection:2/256
W0619 14:13:45.003510 140531667562496 plc_client.py:96] JSONDecode Error for GetMacroAndClusteredPortAdjacency
Expecting ',' delimiter: line 1 column 877057 (char 877056)
I0619 14:13:45.003634 140531667562496 plc_client.py:98] Looking for more data for GetMacroAndClusteredPortAdjacency on connection:3/256
W0619 14:13:45.031592 140531667562496 plc_client.py:96] JSONDecode Error for GetMacroAndClusteredPortAdjacency
Expecting value: line 1 column 1096321 (char 1096320)
I0619 14:13:45.031708 140531667562496 plc_client.py:98] Looking for more data for GetMacroAndClusteredPortAdjacency on connection:4/256
W0619 14:13:45.062884 140531667562496 plc_client.py:96] JSONDecode Error for GetMacroAndClusteredPortAdjacency
Expecting value: line 1 column 1315585 (char 1315584)
I0619 14:13:45.063026 140531667562496 plc_client.py:98] Looking for more data for GetMacroAndClusteredPortAdjacency on connection:5/256
W0619 14:13:45.100739 140531667562496 plc_client.py:96] JSONDecode Error for GetMacroAndClusteredPortAdjacency
Expecting value: line 1 column 1534849 (char 1534848)
I0619 14:13:45.100874 140531667562496 plc_client.py:98] Looking for more data for GetMacroAndClusteredPortAdjacency on connection:6/256
W0619 14:13:45.140887 140531667562496 plc_client.py:96] JSONDecode Error for GetMacroAndClusteredPortAdjacency
Expecting value: line 1 column 1754113 (char 1754112)
I0619 14:13:45.141026 140531667562496 plc_client.py:98] Looking for more data for GetMacroAndClusteredPortAdjacency on connection:7/256
I0619 14:13:45.491459 140531667562496 observation_extractor.py:320] Pad a tensor with shape (8312,) by 42000
I0619 14:13:45.491828 140531667562496 observation_extractor.py:334] Pad a tensor with shape (953,) by 3500
I0619 14:13:45.568008 140531667562496 train_ppo_lib.py:214] Initialize iteration at: init_iteration 0.
I0619 14:13:45.569200 140531667562496 train_ppo_lib.py:229] Initialize train_step at 0
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/a0393510local/circuit_training/circuit_training/learning/train_ppo.py", line 154, in <module>
multiprocessing.handle_main(functools.partial(app.run, main))
File "/home/a0393510local/.local/lib/python3.10/site-packages/tf_agents/system/default/multiprocessing_core.py", line 77, in handle_main
return app.run(parent_main_fn, *args, **kwargs)
File "/home/a0393510local/.local/lib/python3.10/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/a0393510local/.local/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/home/a0393510local/.local/lib/python3.10/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/a0393510local/.local/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/home/a0393510local/circuit_training/circuit_training/learning/train_ppo.py", line 134, in main
train_ppo_lib.train(
File "/home/a0393510local/.local/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
utils.augment_exception_message_and_reraise(e, err_str)
File "/home/a0393510local/.local/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
raise proxy.with_traceback(exception.__traceback__) from None
File "/home/a0393510local/.local/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
return fn(*new_args, **new_kwargs)
File "/home/a0393510local/circuit_training/circuit_training/learning/train_ppo_lib.py", line 272, in train
variable_container = reverb_variable_container.ReverbVariableContainer(
File "/home/a0393510local/.local/lib/python3.10/site-packages/tf_agents/experimental/distributed/reverb_variable_container.py", line 70, in __init__
server_info = reverb.Client(server_address).server_info()
File "/home/a0393510local/.local/lib/python3.10/site-packages/reverb/client.py", line 500, in server_info
info_proto_strings = self._client.ServerInfo(timeout or 0)
RuntimeError: Socket closed
In call to configurable 'train' (<function train at 0x7fcf2c355480>)
Could anyone suggest how to resolve this?
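A note on "RuntimeError: Socket closed": it usually just means nothing was listening at ${REVERB_SERVER} when train_ppo connected, so the first thing to verify is that the Reverb server is actually up and the address matches. A small, dependency-free reachability check (a diagnostic sketch, not part of circuit_training; host and port are placeholders for your REVERB_SERVER value):

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: a REVERB_SERVER of "localhost:8008" would be checked like this.
print(port_is_open("localhost", 8008))
```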

AttributeError: 'PPOLossInfo' object has no attribute 'clip_fraction'

I get an error after building the Docker image. The tests were OK. Next, I ran the example script:

... 

python3 -m circuit_training.learning.train_ppo \
  --root_dir=${ROOT_DIR} \
  --replay_buffer_server_address=${REVERB_SERVER} \
  --variable_container_server_address=${REVERB_SERVER} \
  --num_episodes_per_iteration=16 \
  --global_batch_size=64 \
  --netlist_file=${NETLIST_FILE} \
  --init_placement=${INIT_PLACEMENT}
  
  ...

Everything fails at:

File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 1312, in run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 2888, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 3689, in _call_for_each_replica
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 642, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/tf_agent.py", line 336, in train
    loss_info = self._train_fn(
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/utils/common.py", line 188, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/workspace/circuit_training/learning/agent.py", line 415, in _train
    data=loss_info.extra.clip_fraction,
AttributeError: 'PPOLossInfo' object has no attribute 'clip_fraction'
  In call to configurable 'Learner' (<class 'tf_agents.train.learner.Learner'>)

Do we perhaps need to pin an exact tf-agents version?

TF version in docker:

tensorflow==2.9.1
tensorflow-estimator==2.9.0
tensorflow-io-gcs-filesystem==0.26.0
tensorflow-probability==0.17.0
tf-agents==0.13.0
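This does look like a tf-agents version mismatch: the clip_fraction field is only present on PPOLossInfo.extra in some tf-agents releases, so pinning the version the repo was tested against is the real fix. As a stop-gap, the field can be read defensively; a sketch using a namedtuple as a stand-in for the real loss info (the 0.0 fallback is my choice, not the library's):

```python
from collections import namedtuple

# Stand-in for tf_agents' PPOLossInfo.extra on a release that lacks the
# clip_fraction field.
OldExtra = namedtuple("OldExtra", ["policy_gradient_loss"])
extra = OldExtra(policy_gradient_loss=0.5)

# Defensive read: fall back to 0.0 when the attribute is missing instead of
# raising AttributeError at agent.py's summary-writing call site.
clip_fraction = getattr(extra, "clip_fraction", 0.0)
print(clip_fraction)
```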

_allow_variable_length_episodes is not an attribute of CircuittrainingPPOLearner

Hello,

I am trying to run the Quick Start commands. After setting everything up on gcloud, when I try running the training script, I get the following error:

Attribute error

I0418 03:03:54.281682 140093922326336 environment.py:240] ***Num node to place***:133
I0418 03:03:54.305619 140093922326336 train_ppo_lib.py:104] Using GRL agent networks.
I0418 03:04:32.939774 140093922326336 common.py:1007] No checkpoint available at ./logs/run_00/111/train/checkpoints
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/dragonballz180796/circuit-training/circuit_training/learning/train_ppo.py", line 130, in <module>
    multiprocessing.handle_main(functools.partial(app.run, main))
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tf_agents/system/default/multiprocessing_core.py", line 78, in handle_main
    return app.run(parent_main_fn, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/dragonballz180796/circuit-training/circuit_training/learning/train_ppo.py", line 108, in main
    train_ppo_lib.train(
  File "/home/dragonballz180796/circuit-training/circuit_training/learning/train_ppo_lib.py", line 194, in train
    learner = learner_lib.CircuittrainingPPOLearner(
  File "/home/dragonballz180796/circuit-training/circuit_training/learning/learner.py", line 154, in __init__
    self._create_datasets(strategy)
  File "/home/dragonballz180796/circuit-training/circuit_training/learning/learner.py", line 236, in _create_datasets
    self._train_dataset = make_dataset(0)
  File "/home/dragonballz180796/circuit-training/circuit_training/learning/learner.py", line 228, in make_dataset
    return tf.data.experimental.Counter().flat_map(_make_dataset)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2092, in flat_map
    return FlatMapDataset(self, map_func, name=name)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 5327, in __init__
    self._map_func = structured_function.StructuredFunctionWrapper(
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 271, in __init__
    self._function = fn_factory()
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2567, in get_concrete_function
    graph_function = self._get_concrete_function_garbage_collected(
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2533, in _get_concrete_function_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2711, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2627, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1141, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 248, in wrapped_fn
    ret = wrapper_helper(*args)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 177, in wrapper_helper
    ret = autograph.tf_convert(self._func, ag_ctx)(*nested_args)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 692, in wrapper
    raise e.ag_error_metadata.to_exception(e)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 689, in wrapper
    return converted_call(f, args, kwargs, options=options)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 439, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/__autograph_generated_file2z08qsdk.py", line 13, in tf___make_dataset
    train_dataset = ag__.converted_call(ag__.ld(train_dataset).filter, (ag__.ld(_filter_invalid_episodes),), None, fscope)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 377, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 459, in _call_unconverted
    return f(*args)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3946, in filter
    return DatasetV1Adapter(super(DatasetV1, self).filter(predicate, name=name))
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2246, in filter
    return FilterDataset(self, predicate, name=name)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 5477, in __init__
    wrapped_func = structured_function.StructuredFunctionWrapper(
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 271, in __init__
    self._function = fn_factory()
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2567, in get_concrete_function
    graph_function = self._get_concrete_function_garbage_collected(
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2533, in _get_concrete_function_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2711, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2627, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1141, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 248, in wrapped_fn
    ret = wrapper_helper(*args)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 177, in wrapper_helper
    ret = autograph.tf_convert(self._func, ag_ctx)(*nested_args)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 692, in wrapper
    raise e.ag_error_metadata.to_exception(e)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 689, in wrapper
    return converted_call(f, args, kwargs, options=options)
  File "/home/dragonballz180796/.local/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 439, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/__autograph_generated_filetax59fu6.py", line 40, in tf___filter_invalid_episodes
    ag__.if_stmt(ag__.ld(self)._allow_variable_length_episodes, if_body, else_body, get_state, set_state, ('do_return', 'retval_'), 2)
AttributeError: in user code:

    File "/home/dragonballz180796/circuit-training/circuit_training/learning/learner.py", line 184, in _make_dataset  *
        train_dataset = train_dataset.filter(_filter_invalid_episodes)
    File "/home/dragonballz180796/circuit-training/circuit_training/learning/learner.py", line 168, in _filter_invalid_episodes  *
        if self._allow_variable_length_episodes:

    AttributeError: 'CircuittrainingPPOLearner' object has no attribute '_allow_variable_length_episodes'

I can see that the code is unable to read this attribute inside this internal function: https://github.com/google-research/circuit_training/blob/main/circuit_training/learning/learner.py#L168
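A quick sketch of a possible stopgap, in case it helps anyone: since the error is that `_allow_variable_length_episodes` is read but apparently never stored on the learner, a defensive `getattr` with a default avoids the `AttributeError`. This is only the pattern, shown on a minimal stand-in class, not the actual CT code or the official fix; the `False` default is an assumption.

```python
# Hypothetical stopgap for the AttributeError: read the flag defensively.
# Stand-in class; the real code lives in circuit_training/learning/learner.py.
class Learner:
    def _filter_invalid_episodes(self):
        # getattr supplies False when __init__ never stored the flag.
        if getattr(self, '_allow_variable_length_episodes', False):
            return 'variable-length path'
        return 'fixed-length path'

print(Learner()._filter_invalid_episodes())  # fixed-length path
```

The cleaner fix, of course, is to make sure `__init__` assigns `self._allow_variable_length_episodes` before the dataset filter is traced.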

Command used:

python3 -m circuit_training.learning.train_ppo --root_dir=${ROOT_DIR} --replay_buffer_server_address=${REVERB_SERVER} --variable_container_server_address=${REVERB_SERVER} --num_episodes_per_iteration=16 --global_batch_size=64 --netlist_file=${NETLIST_FILE} --init_placement=${INIT_PLACEMENT}

I will try to debug from my side, but I wanted to ask if there's a quick fix too.

Thanks,
Mrinal

Program blocked

Hi there! I ran into some problems when I'm running the project.
I followed the README.md, but when execution reached this line, it blocked and never returned. I have no idea how this could happen. Could you give me some advice? Thanks a lot!

# learner.py
loss_info = self._generic_learner.run(self._steps_per_iter,
                                      self._train_iterator)

Missing get_dreamplace_params function

Hi,
I have tried to run DREAMPlace to place the soft macros for the Ariane testcase available in the test_data directory using the following command:

python3 -m circuit_training.dreamplace.dreamplace_main --netlist_file=circuit_training/environment/test_data/ariane/netlist.pb.txt --init_placement=circuit_training/environment/test_data/ariane/initial.plc --output_dir=./output

The error message (shown in my screenshot) is: AttributeError: module 'dreamplace.Params' has no attribute 'get_dreamplace_params'

I have used the ubuntu_circuit_training docker build.

Also, I have looked into the DREAMPlace Params.py and I do not see any get_dreamplace_params function. So could you please also provide the changes required in DREAMPlace to make it work with CT?

Thanks,
Sayak
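For anyone hitting this: a hedged workaround (not an official fix) is to supply the missing module-level helper yourself before CT looks it up. The factory below shows the pattern on a generic params class; the `load(json_file)` method is an assumption based on upstream DREAMPlace's `Params` API, so verify it against your DREAMPlace checkout.

```python
# Hypothetical shim for "module 'dreamplace.Params' has no attribute
# 'get_dreamplace_params'". All names here are inferred from the error
# message, not taken from CT or DREAMPlace source.
def make_get_dreamplace_params(params_cls):
    """Build a get_dreamplace_params() on top of any params class that
    exposes a DREAMPlace-style load(json_file) method."""
    def get_dreamplace_params(json_file):
        params = params_cls()
        params.load(json_file)  # assumed DREAMPlace Params API
        return params
    return get_dreamplace_params

# Intended usage (inside the CT docker, before dreamplace_main runs):
#   import dreamplace.Params as Params
#   Params.get_dreamplace_params = make_get_dreamplace_params(Params.Params)
```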

Errors using grouper_main.py script

Hello, I just translated my LEF/DEF files and generated a .pb.txt file, but I am having issues setting up and running the pre-processing stage before I can use CT. grouper_main.py has multiple issues with importing modules, and it also complains about a missing BLOCK_NAME (it says to add it inside the .pb.txt file, but this is not mentioned in the README tutorial). What environment and dependencies should I install to get this to work, or should it be run in the docker? Here are some example errors that I get.

(my_venv) vboxuser@HELPME:~/circuit_training$ python3.9 circuit_training/grouping/grouper_main.py --output_dir=$OUTPUT_DIR --netlist_file=$NETLIST_FILE --block_name=$BLOCK_NAME --hmetis_dir=$HMETIS_DIR
Traceback (most recent call last):
  File "/home/vboxuser/circuit_training/circuit_training/grouping/grouper_main.py", line 36, in <module>
    from circuit_training.grouping import grouper
ModuleNotFoundError: No module named 'circuit_training'
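A likely cause of this first error: running the script by file path means Python never puts the repository root on `sys.path`, so the `circuit_training` package can't be found. A hedged workaround, assuming the repo was cloned to `$HOME/circuit_training` (adjust the path), is to run from the repo root with the package on `PYTHONPATH`:

```shell
# Assumes the repo was cloned to $HOME/circuit_training; adjust to taste.
REPO_ROOT="$HOME/circuit_training"
cd "$REPO_ROOT" 2>/dev/null || true
export PYTHONPATH="$PWD:$PYTHONPATH"
# Running as a module (-m) makes Python resolve the circuit_training
# package from PYTHONPATH instead of the script's own directory:
# python3.9 -m circuit_training.grouping.grouper_main \
#   --output_dir=$OUTPUT_DIR --netlist_file=$NETLIST_FILE \
#   --block_name=$BLOCK_NAME --hmetis_dir=$HMETIS_DIR
```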

whenever I run this inside the docker I no longer get the error but instead get another error stating about adding the block_name inside the pb.txt file :

root@b0b80216d06c:/workspace# python3.9 circuit_training/grouping/grouper_main.py --output_dir=$OUTPUT_DIR --netlist_file=$NETLIST_FILE --block_name=$BLOCK_NAME --hmetis_dir=$HMETIS_DIR --plc_wrapper_main=$PLC_WRAPPER_MAIN
2023-06-16 16:04:44.672398: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-06-16 16:04:44.749733: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-06-16 16:04:44.750115: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-16 16:04:45.543137: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.25.8) or chardet (5.1.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
W0616 16:04:46.501832 140120903472960 placement_util.py:231] block_name is not set. Please add the block_name in:
/workspace/eda-ml/Test2/defs/nangate45_gcd_io.pb.txt
or in:
None
Traceback (most recent call last):
  File "/workspace/circuit_training/grouping/grouper_main.py", line 62, in <module>
    app.run(main)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/workspace/circuit_training/grouping/grouper_main.py", line 53, in main
    grouped_plc, placement_file = grouper.group_stdcells(
  File "/workspace/circuit_training/grouping/grouper.py", line 429, in group_stdcells
    plc = create_placement_cost_fn(netlist_file=netlist_file)
  File "/workspace/circuit_training/environment/placement_util.py", line 236, in create_placement_cost
    plc = plc_client.PlacementCost(netlist_file, macro_macro_x_spacing,
  File "/workspace/circuit_training/environment/plc_client.py", line 68, in __init__
    self.process = subprocess.Popen([str(a) for a in args])
  File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.9/subprocess.py", line 1837, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/workspace/eda-ml/plc_wrapper_main'
Exception ignored in: <function PlacementCost.__del__ at 0x7f707048b940>
Traceback (most recent call last):
  File "/workspace/circuit_training/environment/plc_client.py", line 114, in __del__
AttributeError: 'function' object has no attribute 'close'
root@b0b80216d06c:/workspace#

Can someone please point me in the right direction?
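For what it's worth, the final `PermissionError: [Errno 13]` usually just means the `plc_wrapper_main` binary lost its execute bit when it was downloaded or copied into the container. A likely fix, using the path from the traceback above (adjust to your checkout):

```shell
# Path taken from the traceback; adjust to where your binary lives.
BIN=/workspace/eda-ml/plc_wrapper_main
if [ -f "$BIN" ]; then
  chmod +x "$BIN"   # restore the execute bit so subprocess.Popen can exec it
fi
# Re-run grouper_main.py once `test -x "$BIN"` succeeds.
```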

Flow with OpenROAD-flow?

Hello,
I wanted to ask whether there is a way to use the OpenROAD-flow scripts and transfer the LEF/DEF contents for use with Circuit Training. So far I have been trying the translator provided by TILOS, but ran into an issue where macros were not being placed, so the output could not be fed into grouper_main.py. I would really like this feature so that I can use it for my project. I would like to know if this is possible, considering that ORFS uses Yosys (logic synthesis) while MP utilized Cadence iSpatial (physical synthesis).

Could you share the plc_wrapper_main

Thanks a lot for sharing such a great project!
I am digging into the sub-modules; could you share some information about the plc? It seems like an EDA tool that can read DEF/LEF, place cells, handle blockages, and so on. Are there any docs for its API?

How to generate .plc file

Hi,

I have implemented the LEF/DEF converter, but I have some questions about how to generate the .plc file.

  1. How do we determine the density cost?
  2. How do we set the wirelength and congestion costs?
  3. How do we set up the router parameters?
  4. What is the meaning of the smooth factor?

Thank you in advance!

Potential bug in getting blockages

Hi,

I think there is a minor issue with get_blockages_from_comments. The function signature allows the filename argument to be either a string or a list of strings, but the function body treats a string as if it were a list.

If you run placement_util_test.py, blockage extraction fails with an error like the one in my screenshot.

This is potentially an issue for create_placement_cost_using_common_arguments since it passes a string to retrieve blockages. This function is also used in grouper.py.

Could you please confirm this? Thanks in advance!
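The normalization the issue is asking for can be sketched like this: accept either a single filename or a list of filenames, as the signature promises, by wrapping a lone string before iterating. The names and body below are illustrative, not the actual placement_util code.

```python
# Minimal sketch of the string-vs-list normalization (hypothetical names).
def get_blockages_from_comments(filenames):
    if isinstance(filenames, str):   # a lone string is wrapped, not iterated
        filenames = [filenames]
    blockages = []
    for filename in filenames:
        blockages.append(filename)   # stand-in for per-file comment parsing
    return blockages

print(get_blockages_from_comments('a.plc'))             # ['a.plc']
print(get_blockages_from_comments(['a.plc', 'b.plc']))  # ['a.plc', 'b.plc']
```

Without the `isinstance` guard, iterating a bare string would visit it character by character, which matches the failed blockage extraction described above.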

During training, are all macros' orientations always 'N'?

Hello, I am Seungju Kim.

Thanks for sharing this wonderful repo.

It is very helpful for our project.

I have a question about macro's orientation.

I believe that orientation optimization is performed only after all macros are placed, rather than taking orientation into account during training.

I conducted an experiment on the effect of optimization on performance.

I could see that this had a very big impact on metrics such as wirelength, congestion, and density.

I'd like to know why orientation optimization is not considered during training.

Thanks

CT not working in ubuntu22.04 VM

I have a VM set up for running circuit training. While I was able to get the grouping code to work correctly, when I started the e2e_smoke test, after a few minutes I received an error message about SoftMacroPlacer() timing out.

0724 19:54:09.211993 140645097709568 NonLinearPlace.py:318] iteration  124, ( 124,  0,  0), Obj 2.997260E+03, DensityWeight 1.436298E-06, HPWL 1.045501E+05, Overflow 9.715901E-01, MaxDensity 2.036E+02, gamma 1.922252E+02, time 241.354ms
I0724 19:54:09.215112 140645097709568 NonLinearPlace.py:327] full step 2189.637 ms
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/workspace/circuit_training/learning/train_ppo.py", line 375, in <module>
    multiprocessing.handle_main(functools.partial(app.run, main))
  File "/usr/local/lib/python3.9/dist-packages/tf_agents/system/default/multiprocessing_core.py", line 77, in handle_main
    return app.run(parent_main_fn, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/workspace/circuit_training/learning/train_ppo.py", line 327, in main
    env = create_env_fn()
  File "/workspace/circuit_training/environment/environment.py", line 629, in create_circuit_environment
    env = CircuitEnv(*args, **kwarg)
  File "/usr/local/lib/python3.9/dist-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.9/dist-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/usr/local/lib/python3.9/dist-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/workspace/circuit_training/environment/environment.py", line 285, in __init__
    converged = self._dreamplace.place()
  File "/workspace/circuit_training/dreamplace/dreamplace_core.py", line 65, in place
    return decorated_place()
  File "/usr/local/lib/python3.9/dist-packages/timeout_decorator/timeout_decorator.py", line 82, in new_function
    return function(*args, **kwargs)
  File "/workspace/circuit_training/dreamplace/dreamplace_core.py", line 62, in decorated_place
    return self._place()
  File "/workspace/circuit_training/dreamplace/dreamplace_core.py", line 49, in _place
    metrics = nonlinear_place(self.params, self.placedb_plc.placedb)
  File "/dreamplace/dreamplace/NonLinearPlace.py", line 448, in __call__
    one_descent_step(
  File "/dreamplace/dreamplace/NonLinearPlace.py", line 309, in one_descent_step
    optimizer.step()
  File "/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/dreamplace/dreamplace/NesterovAcceleratedGradientOptimizer.py", line 121, in step
    f_kp1, g_kp1 = obj_and_grad_fn(v_kp1)
  File "/dreamplace/dreamplace/PlaceObj.py", line 387, in obj_and_grad_fn
    obj = self.obj_fn(pos)
  File "/dreamplace/dreamplace/PlaceObj.py", line 287, in obj_fn
    self.density = self.op_collections.density_op(pos)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/dreamplace/dreamplace/ops/electric_potential/electric_potential.py", line 521, in forward
    return ElectricPotentialFunction.apply(
  File "/dreamplace/dreamplace/ops/electric_potential/electric_potential.py", line 96, in forward
    density_map = ElectricDensityMapFunction.forward(
  File "/dreamplace/dreamplace/ops/electric_potential/electric_overflow.py", line 98, in forward
    output = electric_potential_cpp.density_map(
  File "/usr/local/lib/python3.9/dist-packages/timeout_decorator/timeout_decorator.py", line 69, in handler
    _raise_exception(timeout_exception, exception_message)
  File "/usr/local/lib/python3.9/dist-packages/timeout_decorator/timeout_decorator.py", line 47, in _raise_exception
    raise exception(exception_message)
timeout_decorator.timeout_decorator.TimeoutError: 'SoftMacroPlacer place() timed out.'
  In call to configurable 'CircuitEnv' (<class 'circuit_training.environment.environment.CircuitEnv'>)
Collect job failed (SIGUSR1). Check /workspace/logs/test/collect_*.log.
Exiting with code 8.
