
A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations

License: Other

Python 27.96% C 0.33% C++ 38.72% Shell 2.24% Starlark 26.77% LLVM 2.45% Makefile 1.32% Dockerfile 0.22%
graph-representation programming-languages llvm control-flow data-flow compiler-irs llvm-ir machine-learning graph-neural-networks


ProGraML: Program Graphs for Machine Learning


An expressive, language-independent representation of programs.

Check the website for more information.

Introduction

ProGraML is a representation for programs as input to a machine learning model. The key features are:

  1. Simple: Everything is available through a pip install; no compilation required. Supports several programming languages (C, C++, LLVM-IR, XLA) and several graph formats (NetworkX, DGL, Graphviz, JSON) out of the box.

  2. Expressive: Captures every control, data, and call relation across entire programs. The representation is independent of the source language. Features and labels can be added at any granularity to support whole-program, per-instruction, or per-relation reasoning tasks.

  3. Fast: The core graph construction is implemented in C++ with a low overhead interface to Python. Every API method supports simple and efficient parallelization through an executor parameter.
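The executor-based parallelism described in point (3) can be sketched with Python's standard concurrent.futures. Here `build_graph` is a hypothetical stand-in for a real graph-construction call such as `programl.from_cpp()`; the real API accepts an executor parameter directly, as noted above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a graph-construction call such as
# programl.from_cpp(); real API names and return types differ.
def build_graph(source: str) -> dict:
    return {"source": source, "nodes": len(source.split())}

sources = [
    "int main() { return 0; }",
    "int f(int x) { return x + 1; }",
]

# The pattern the executor parameter enables: construct many graphs in
# parallel across a pool of workers.
with ThreadPoolExecutor(max_workers=4) as executor:
    graphs = list(executor.map(build_graph, sources))
```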

To get stuck in and play around with our graph representation, visit the website linked above.

Or if papers are more your ☕, have a read of ours (see the Citation section below).

Supported Programming Languages

The following programming languages and compiler IRs are supported out-of-the-box:

| Language | API Calls | Supported Versions |
|----------|-----------|--------------------|
| C | `programl.from_cpp()`, `programl.from_clang()` | Up to ISO C 2017 |
| C++ | `programl.from_cpp()`, `programl.from_clang()` | Up to ISO C++ 2020 DIS |
| LLVM-IR | `programl.from_llvm_ir()` | 3.8.0, 6.0.0, 10.0.0 |
| XLA | `programl.from_xla_hlo_proto()` | 2.0.0 |

Is your favorite language not supported here? Submit a feature request!

Getting Started

Install the latest release of the Python package using:

pip install -U programl

The API is very simple, comprising graph creation ops, graph transform ops, and graph serialization ops. Here is a quick demo of each:

>>> import programl as pg

# Construct a program graph from C++:
>>> G = pg.from_cpp("""
... #include <iostream>
...
... int main(int argc, char** argv) {
...   std::cout << "Hello, world!" << std::endl;
...   return 0;
... }
... """)

# A program graph is a protocol buffer:
>>> type(G).__name__
'ProgramGraph'

# Convert the graph to NetworkX:
>>> pg.to_networkx(G)
<networkx.classes.multidigraph.MultiDiGraph at 0x7fbcf40a2fa0>

# Save the graph for later:
>>> pg.save_graphs('file.data', [G])

For further details check out the API reference.

Contributing

Patches, bug reports, and feature requests are welcome! Please use the issue tracker to file a bug report or ask a question. If you would like to help out with the code, please read this document.

Citation

If you use ProGraML in any of your work, please cite this paper:

@inproceedings{cummins2021a,
  title={{ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations}},
  author={Cummins, Chris and Fisches, Zacharias and Ben-Nun, Tal and Hoefler, Torsten and O'Boyle, Michael and Leather, Hugh},
  booktitle={Thirty-eighth International Conference on Machine Learning (ICML)},
  year={2021}
}


programl's Issues

Add support for test sharding

The current test suite is far from comprehensive, yet it still requires about an hour to run when there aren't any cached results to re-use. Much of this time is spent in long-running integration tests which use parametrised test fixtures to run a small-ish test case with dozens of permutations of parameters.

This has the downside of slowing down the iterative devel/debug cycle. To mitigate this we could use test sharding to run parts of the larger tests concurrently.

Bazel has support for test sharding built in, the hard part would be determining how to integrate that into pytest.

Once done, we could use a Travis build matrix to use the sharding, enabling a greater subset of the test suite to be run, see #45.
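A minimal sketch of the pytest side, assuming Bazel's documented TEST_TOTAL_SHARDS and TEST_SHARD_INDEX environment variables; `select_shard` and the test-item names are illustrative, not existing project code. A `pytest_collection_modifyitems` hook in a conftest.py could call this to drop out-of-shard items.

```python
import os

def select_shard(items, total=None, index=None):
    """Keep only the collected test items assigned to this shard.

    Bazel communicates sharding via the TEST_TOTAL_SHARDS and
    TEST_SHARD_INDEX environment variables; items are assigned
    round-robin so shards stay roughly balanced.
    """
    total = int(total if total is not None
                else os.environ.get("TEST_TOTAL_SHARDS", 1))
    index = int(index if index is not None
                else os.environ.get("TEST_SHARD_INDEX", 0))
    return [item for i, item in enumerate(items) if i % total == index]

tests = [f"test_case_{i}" for i in range(10)]
shards = [select_shard(tests, total=3, index=i) for i in range(3)]

# Every test runs in exactly one shard.
assert sorted(sum(shards, [])) == sorted(tests)
```

Note that Bazel also expects a sharding-aware runner to touch the file named by TEST_SHARD_STATUS_FILE, which the hook would need to do as well.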

Add modules for generating random data

  • //deeplearning/ml4pl/testing:random_programl_generator
  • //deeplearning/ml4pl/testing:random_networkx_generator
  • //deeplearning/ml4pl/testing:random_graph_tuple_generator
  • //deeplearning/ml4pl/testing:random_graph_tuple_database_generator
  • //deeplearning/ml4pl/testing:random_log_database_generator
  • Refactor existing tests to use these functions.
  • Add "real" graph tests to existing test suites.

Remove back edges from labelled graph tuples.

For each of the labelled graph datasets, remove the back edges from graph tuples, and instead create the back edges when required by models.

This reduces the size of the datasets by trading increased compute in the models for reduced storage and network bandwidth.
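One way the models could recreate the back edges at load time, sketched with plain edge tuples; `add_backward_edges` and the flow labels are illustrative, not the project's API.

```python
def add_backward_edges(edges):
    """Given (source, destination, flow) edge tuples, return the edge
    list with a reversed copy of each edge appended.

    Sketch of reconstructing back edges in the model at load time,
    rather than storing them redundantly in the dataset.
    """
    return edges + [(dst, src, flow) for (src, dst, flow) in edges]

forward = [(0, 1, "control"), (1, 2, "data")]
both = add_backward_edges(forward)
```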

Refactor model class

  • Define the new base model class.
  • Update the logging API to write model batch/epoch results and checkpoints.
  • Refactor Zero-R to the new class style.
  • Refactor GGNN to the new class style.
  • Refactor LSTM to the new class style.
  • Add integration tests for all model classes using random graph tuple database generator.

KeyError in ggnn_test.py

Reproduce:
bazel test //deeplearning/ml4pl/models/ggnn/...

Error:

      # Test that model saw every graph in the database.
>     assert results.graph_count == graph_db.split_counts[epoch_type.value]
E     KeyError: 0

/home/zacharias/ml4pl/deeplearning/ml4pl/models/ggnn/ggnn_test.py:174: KeyError

Reduce epoch "warm-up" time

The gap between starting an epoch and receiving the first batch can be quite long for large datasets. I suspect that this is because the BufferedGraphReader first reads the IDs and sizes of all graphs in the table. For cases where limit is much smaller than the size of the table, we can reduce this latency by inserting an offset into the SQL query, rather than reading the entire result set and discarding most of it:

      # When we are limiting the number of rows and not reading the table in
      # order, pick a random starting point in the list of IDs.
      if limit and order != BufferedGraphReaderOrder.IN_ORDER:
        batch_start = random.randint(
          0, max(len(self.ids_and_sizes) - limit - 1, 0)
        )
        self.ids_and_sizes = self.ids_and_sizes[
          batch_start : batch_start + limit
        ]
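The proposed push-down might look like the following against an illustrative SQLite table; the table and column names are hypothetical, not the project's schema. The random window is selected in SQL via LIMIT/OFFSET instead of fetching every (id, size) row and slicing in Python.

```python
import random
import sqlite3

# Hypothetical graphs table standing in for the real graph database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graphs (id INTEGER PRIMARY KEY, size INTEGER)")
conn.executemany(
    "INSERT INTO graphs (id, size) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 1001)],
)

limit = 50
(total,) = conn.execute("SELECT COUNT(*) FROM graphs").fetchone()
offset = random.randint(0, max(total - limit, 0))

# Push the random window into the query instead of reading the entire
# result set and discarding most of it.
rows = conn.execute(
    "SELECT id, size FROM graphs ORDER BY id LIMIT ? OFFSET ?",
    (limit, offset),
).fetchall()
```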

Memory grows over time during run script

I'm not sure if this is specific to the GGNN or applies to all classifiers, but memory consumption of long running jobs grows steadily before being killed by the OS when it reaches system capacity.

EDIT: This is not specific to the GGNN; see my comments below.

graph_builder.py not working yet

The generated nxgraph doesn't have a type attribute on its nodes, so construction fails in /phd/deeplearning/ml4pl/graphs/nx_utils.py, line 38, in NodeTypeIterator, at if data["type"] == node_type: with KeyError: 'type'.

Full trace below:

Traceback (most recent call last):
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.py", line 274, in <module>
    app.Run(main)
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/labm8/py/app.py", line 168, in Run
    RunWithArgs(RunWithoutArgs)
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/labm8/py/app.py", line 144, in RunWithArgs
    absl_app.run(DoMain, argv=argv)
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/pypi__absl_py_0_7_0/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/pypi__absl_py_0_7_0/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/labm8/py/app.py", line 141, in DoMain
    main(argv)
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/labm8/py/app.py", line 166, in RunWithoutArgs
    main()
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.py", line 269, in main
    graph_proto = builder.Build(bytecode, opt)
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/graph_builder.py", line 92, in Build
    graphs = [self.CreateControlAndDataFlowUnion(cfg) for cfg in cfgs]
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/graph_builder.py", line 92, in <listcomp>
    graphs = [self.CreateControlAndDataFlowUnion(cfg) for cfg in cfgs]
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/graph_builder.py", line 132, in CreateControlAndDataFlowUnion
    self.MaybeAddDataFlowElements(g, ffg.tag_hook)
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/graph_builder.py", line 179, in MaybeAddDataFlowElements
    for statement, data in nx_utils.StatementNodeIterator(g):
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/deeplearning/ml4pl/graphs/nx_utils.py", line 44, in StatementNodeIterator
    yield from NodeTypeIterator(g, programl_pb2.Node.STATEMENT)
  File "/private/var/tmp/_bazel_Zacharias/56319df27ced911066fd99c97e9dce78/execroot/phd/bazel-out/darwin-fastbuild/bin/deeplearning/ml4pl/graphs/unlabelled/llvm2graph/llvm2graph.runfiles/phd/deeplearning/ml4pl/graphs/nx_utils.py", line 38, in NodeTypeIterator
    if data["type"] == node_type:
KeyError: 'type'

GGNN expects >1 graphs in batch when graph_x_dimensionality > 0

Just as we finished the call :) the CPU-only devmap run failed:

$ bazel run //deeplearning/ml4pl/models/ggnn -- --graph_db='file:///var/phd/db/cc1.mysql?programl_devmap_amd' --log_db='file:///var/phd/db/cc1.mysql?programl_scratch_logs' --graph_batch_node_count=10000 --vmodule='*'=3
...

I1211 18:18:53 progress.py:92] Batch 48 with 16 graphs: accuracy=62.50%, precision=0.625, recall=0.625, f1=0.625 in 2s 465ms
I1211 18:18:54 progress.py:92] Batch 49 with 10 graphs: accuracy=30.00%, precision=0.090, recall=0.300, f1=0.138 in 1s 187ms
I1211 18:18:56 progress.py:92] Batch 50 with 8 graphs: accuracy=25.00%, precision=0.375, recall=0.250, f1=0.300 in 2s 536ms
Train epoch 2:  83%|███████████████████████████████████████████████████████████████████████████████████                 | 452/544 [02:00<00:24,  3.76 graph/s, acc=0.589, loss=1.21, prec=0.434, rec=0.589]
Exception in thread Thread-12:
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.6.5/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/cec/.cache/bazel/_bazel_cec/d1665aef25bbeeb91c01df7ddc90dba7/execroot/phd/bazel-out/k8-fastbuild/bin/deeplearning/ml4pl/models/ggnn/ggnn.runfiles/phd/deeplearning/ml4pl/models/classifier_base.py", line 407, in Run
    batch_results = self.model.RunBatch(self.epoch_type, batch)
  File "/home/cec/.cache/bazel/_bazel_cec/d1665aef25bbeeb91c01df7ddc90dba7/execroot/phd/bazel-out/k8-fastbuild/bin/deeplearning/ml4pl/models/ggnn/ggnn.runfiles/phd/deeplearning/ml4pl/models/ggnn/ggnn.py", line 329, in RunBatch
    outputs = self.model(*model_inputs)
  File "/home/cec/.cache/bazel/_bazel_cec/d1665aef25bbeeb91c01df7ddc90dba7/execroot/phd/bazel-out/k8-fastbuild/bin/deeplearning/ml4pl/models/ggnn/ggnn.runfiles/pypi__torch_1_3_1/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cec/.cache/bazel/_bazel_cec/d1665aef25bbeeb91c01df7ddc90dba7/execroot/phd/bazel-out/k8-fastbuild/bin/deeplearning/ml4pl/models/ggnn/ggnn.runfiles/phd/deeplearning/ml4pl/models/ggnn/ggnn_modules.py", line 69, in forward
    prediction, num_graphs, graph_nodes_list, aux_in
  File "/home/cec/.cache/bazel/_bazel_cec/d1665aef25bbeeb91c01df7ddc90dba7/execroot/phd/bazel-out/k8-fastbuild/bin/deeplearning/ml4pl/models/ggnn/ggnn.runfiles/pypi__torch_1_3_1/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cec/.cache/bazel/_bazel_cec/d1665aef25bbeeb91c01df7ddc90dba7/execroot/phd/bazel-out/k8-fastbuild/bin/deeplearning/ml4pl/models/ggnn/ggnn.runfiles/phd/deeplearning/ml4pl/models/ggnn/ggnn_modules.py", line 471, in forward
    return self.feed_forward(aggregate_features), graph_features
  File "/home/cec/.cache/bazel/_bazel_cec/d1665aef25bbeeb91c01df7ddc90dba7/execroot/phd/bazel-out/k8-fastbuild/bin/deeplearning/ml4pl/models/ggnn/ggnn.runfiles/pypi__torch_1_3_1/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cec/.cache/bazel/_bazel_cec/d1665aef25bbeeb91c01df7ddc90dba7/execroot/phd/bazel-out/k8-fastbuild/bin/deeplearning/ml4pl/models/ggnn/ggnn.runfiles/pypi__torch_1_3_1/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/cec/.cache/bazel/_bazel_cec/d1665aef25bbeeb91c01df7ddc90dba7/execroot/phd/bazel-out/k8-fastbuild/bin/deeplearning/ml4pl/models/ggnn/ggnn.runfiles/pypi__torch_1_3_1/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cec/.cache/bazel/_bazel_cec/d1665aef25bbeeb91c01df7ddc90dba7/execroot/phd/bazel-out/k8-fastbuild/bin/deeplearning/ml4pl/models/ggnn/ggnn.runfiles/pypi__torch_1_3_1/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/home/cec/.cache/bazel/_bazel_cec/d1665aef25bbeeb91c01df7ddc90dba7/execroot/phd/bazel-out/k8-fastbuild/bin/deeplearning/ml4pl/models/ggnn/ggnn.runfiles/pypi__torch_1_3_1/torch/nn/functional.py", line 1666, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 4])

See also #27.

100% pass rate on test suite

A test suite is only useful when the results can be trusted, and presently, mid-way through a large refactor, many of the tests are broken.

Support n-dimensional embedding indices in GGNN

Rather than a single embedding table for {node selector/statement representation}, add support for arbitrary embedding dimensionalities.

This isn't a priority at the moment as we don't have a use-case for it.

UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape

In GGNN model:

/home/cec/phd/tools/venv/phd/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

This appears to be related to the number of timesteps we unroll for.

Make LLVM alias set parsing more robust

Many of the opt pointer lists can't be parsed, e.g.

(float addrspace(1)** %6, 8), (float addrspace(1)** %7, 8), (float addrspace(1)* %14, 4), (float addrspace(1)* %19, 4), (float addrspace(1)* %25, 4), (float addrspace(1)* %30, 4), (float addrspace(1)* %37, 4), (float addrspace(1)* %42, 4), (float addrspace(1)* %49, 4), (float addrspace(1)* %54, 4), (float addrspace(1)* %61, 4), (float addrspace(1)* %66, 4), (float addrspace(1)* %73, 4), (float addrspace(1)* %78, 4), (float addrspace(1)* %85, 4), (float addrspace(1)* %90, 4), (float addrspace(1)* %97, 4), (float addrspace(1)* %102, 4), (float addrspace(1)* %109, 4), (float addrspace(1)* %114, 4), (float addrspace(1)* %121, 4), (float addrspace(1)* %126, 4), (float addrspace(1)* %133, 4), (float addrspace(1)* %138, 4), (float addrspace(1)* %145, 4), (float addrspace(1)* %150, 4), (float addrspace(1)* %157, 4), (float addrspace(1)* %162, 4), (float addrspace(1)* %169, 4), (float addrspace(1)* %174, 4), (float addrspace(1)* %181, 4), (float addrspace(1)* %186, 4), (float addrspace(1)* %193, 4), (float addrspace(1)* %198, 4), (float addrspace(1)* %205, 4), (float addrspace(1)* %210, 4), (float addrspace(1)* %217, 4), (float addrspace(1)* %222, 4), (float addrspace(1)* %229, 4), (float addrspace(1)* %234, 4), (float addrspace(1)* %241, 4), (float addrspace(1)* %246, 4), (float addrspace(1)* %253, 4), (float addrspace(1)* %258, 4), (float addrspace(1)* %265, 4), (float addrspace(1)* %270, 4), (float addrspace(1)* %277, 4), (float addrspace(1)* %282, 4), (float addrspace(1)* %289, 4), (float addrspace(1)* %294, 4), (float addrspace(1)* %301, 4), (float addrspace(1)* %306, 4), (float addrspace(1)* %313, 4), (float addrspace(1)* %318, 4), (float addrspace(1)* %325, 4), (float addrspace(1)* %330, 4), (float addrspace(1)* %337, 4), (float addrspace(1)* %342, 4), (float addrspace(1)* %349, 4), (float addrspace(1)* %354, 4), (float addrspace(1)* %361, 4), (float addrspace(1)* %366, 4), (float addrspace(1)* %373, 4), (float addrspace(1)* %378, 4), (float addrspace(1)* %385, 4), 
(float addrspace(1)* %390, 4), (float addrspace(1)* %397, 4), (float addrspace(1)* %402, 4), (float addrspace(1)* %409, 4), (float addrspace(1)* %414, 4), (float addrspace(1)* %421, 4), (float addrspace(1)* %426, 4), (float addrspace(1)* %433, 4), (float addrspace(1)* %438, 4), (float addrspace(1)* %445, 4), (float addrspace(1)* %450, 4), (float addrspace(1)* %457, 4), (float addrspace(1)* %462, 4), (float addrspace(1)* %469, 4), (float addrspace(1)* %474, 4), (float addrspace(1)* %481, 4), (float addrspace(1)* %486, 4), (float addrspace(1)* %493, 4), (float addrspace(1)* %498, 4), (float addrspace(1)* %505, 4), (float addrspace(1)* %510, 4), (float addrspace(1)* %517, 4), (float addrspace(1)* %522, 4), (float addrspace(1)* %529, 4), (float addrspace(1)* %534, 4), (float addrspace(1)* %541, 4), (float addrspace(1)* %546, 4), (float addrspace(1)* %553, 4), (float addrspace(1)* %558, 4), (float addrspace(1)* %565, 4), (float addrspace(1)* %570, 4), (float addrspace(1)* %577, 4), (float addrspace(1)* %582, 4), (float addrspace(1)* %589, 4), (float addrspace(1)* %594, 4), (float addrspace(1)* %601, 4), (float addrspace(1)* %606, 4), (float addrspace(1)* %613, 4), (float addrspace(1)* %618, 4), (float addrspace(1)* %625, 4), (float addrspace(1)* %630, 4), (float addrspace(1)* %637, 4), (float addrspace(1)* %642, 4), (float addrspace(1)* %649, 4), (float addrspace(1)* %655, 4), (float addrspace(1)* %659, 4), (float addrspace(1)* %664, 4), (float addrspace(1)* %670, 4), (float addrspace(1)* %675, 4), (float addrspace(1)* %682, 4), (float addrspace(1)* %687, 4), (float addrspace(1)* %694, 4), (float addrspace(1)* %699, 4), (float addrspace(1)* %706, 4), (float addrspace(1)* %711, 4), (float addrspace(1)* %718, 4), (float addrspace(1)* %723, 4), (float addrspace(1)* %730, 4), (float addrspace(1)* %735, 4), (float addrspace(1)* %742, 4), (float addrspace(1)* %747, 4), (float addrspace(1)* %754, 4), (float addrspace(1)* %759, 4), (float addrspace(1)* %766, 4), (float 
addrspace(1)* %771, 4), (float addrspace(1)* %778, 4), (float addrspace(1)* %783, 4), (float addrspace(1)* %790, 4), (float addrspace(1)* %795, 4), (float addrspace(1)* %802, 4), (float addrspace(1)* %807, 4), (float addrspace(1)* %814, 4), (float addrspace(1)* %819, 4), (float addrspace(1)* %826, 4), (float addrspace(1)* %831, 4), (float addrspace(1)* %838, 4), (float addrspace(1)* %843, 4), (float addrspace(1)* %850, 4), (float addrspace(1)* %855, 4), (float addrspace(1)* %862, 4), (float addrspace(1)* %867, 4), (float addrspace(1)* %874, 4), (float addrspace(1)* %880, 4)' (alias_set.py:95:MakeAliasSetGraphs() -> ValueError)

Note: This replaces github.com/ChrisCummins/ml4pl/issues/14

Tidy up run ID.

Consider a human-friendly name, and fix race condition with multiple jobs starting at the same time.

GGNN loss is NaN

Command to reproduce:

$ bazel run //deeplearning/ml4pl/experiments/devmap:run_models -- --model ggnn --dataset amd --tag_suffix=test

Re-implement GGNN in PyTorch

Tracking issue for re-implementing the GGNN using PyTorch.

This issue will be closed once the model achieves feature parity with the previous Tensorflow implementation:

  • Forward pass
  • Support for edge positions
  • Support for node embeddings
  • Support for dynamic unrolling strategies
  • Support for graph and node-level classification
  • Support for saving and loading from checkpoints

See also #24.

Get Travis CI test runner working

Run a subset of the test suite on Travis CI using the phd_build docker image. Some thought must go into deciding which tests to run, as many of them (e.g. GGNN integration tests) are too heavy to run in the 30-minute window provided to a Travis test job.

Using batch.graph_count to detect end-of-batches is fragile

There are legitimate reasons for a batch generator to produce an empty batch before reaching the end of the input graph iterator. However, we use the batch.graph_count to determine when we have reached the end of the batches:

  def Run(self) -> None:
    """Run the epoch worker thread."""
    rolling_results = batches.RollingResults()

    for i, batch in enumerate(self.batch_iterator.batches):
      self.batch_count += 1
      self.ctx.i += batch.graph_count

      # Record the graph IDs.
      for graph_id in batch.graph_ids:
        self.graph_ids.add(graph_id)

      # Check that at least one batch is produced.
      if not i and not batch.graph_count:
        raise OSError("No batches generated!")

      # We have run out of graphs.
      if not batch.graph_count:
        break

This causes the epoch to exit early, before having seen all of the batches.
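A sketch of the alternative: let the epoch terminate on iterator exhaustion instead of on an empty-batch sentinel. `run_epoch` and the list-of-lists batches are illustrative stand-ins for the real worker and batch objects.

```python
def run_epoch(batches):
    """Consume every batch the generator yields, including empty ones.

    Relying on iterator exhaustion (the for-loop ending naturally)
    rather than a graph_count == 0 sentinel means a legitimately empty
    batch mid-stream no longer terminates the epoch early.
    """
    seen_batches = 0
    seen_graphs = 0
    for batch in batches:  # ends only when the generator is exhausted
        seen_batches += 1
        seen_graphs += len(batch)
    return seen_batches, seen_graphs

# An empty batch in the middle of the stream is now handled correctly.
stream = [[1, 2, 3], [], [4, 5]]
counts = run_epoch(stream)
```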

Run IDs creation is not truly atomic

The critical section of run ID assignment contains race conditions when multiple processes are running concurrently on a single machine. This manifests itself as failing tests for parameterized model tests.
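One way to make the claim atomic is to let the database arbitrate through a uniqueness constraint, so concurrent processes can never both win. This SQLite sketch is illustrative; the table, function, and run-ID names are not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY)")

def claim_run_id(conn, run_id):
    """Atomically claim a run ID.

    The PRIMARY KEY constraint makes the database the arbiter of races:
    exactly one concurrent claimant succeeds, the rest get an
    IntegrityError and can retry with a new ID.
    """
    try:
        with conn:
            conn.execute("INSERT INTO runs (run_id) VALUES (?)", (run_id,))
        return True
    except sqlite3.IntegrityError:
        return False

# "ggnn_devmap_001" is a made-up human-friendly run ID.
first = claim_run_id(conn, "ggnn_devmap_001")
second = claim_run_id(conn, "ggnn_devmap_001")
```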

Create an id_split database

Replace the old GraphMeta.group column with a separate database for storing ID->split mappings. This will remove the need for separate corpus/devmap graph databases, and can be re-used for any database with a numeric ID field.
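A minimal sketch of what such an id_split database could look like; the table and column names are hypothetical. Because the table only references a numeric ID, it can annotate any database with a numeric ID field, as the issue proposes.

```python
import sqlite3

# Standalone id -> split mapping, decoupled from the graph databases
# it annotates. Schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE id_split (
        id INTEGER PRIMARY KEY,   -- numeric ID in some other database
        split INTEGER NOT NULL    -- e.g. 0=train, 1=val, 2=test
    )"""
)
conn.executemany(
    "INSERT INTO id_split VALUES (?, ?)",
    [(i, i % 3) for i in range(9)],
)

train_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM id_split WHERE split = 0 ORDER BY id"
    )
]
```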

Edge position modelling is flawed.

related to a part of #27.
This is the problem:

our position embeddings are useless in their current form, I think:
imagine you have 2 incoming edges with position embeddings p_1 and p_2, and states h_1, h_2 coming across these edges, all of the same edge type. An example could be c = a / b.
Then the incoming message m has this lousy property (with A being a parameter matrix):

m = A(p_1 + h_1) + A(p_2 + h_2) = A(p_1 + h_2) + A(p_2 + h_1)

by linearity of A and commutativity of addition,

meaning we cannot distinguish between a / b and b / a like this.
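The symmetry is easy to check numerically. This pure-Python sketch applies an arbitrary parameter matrix A to both pairings and confirms they produce the same message; the values are illustrative.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def add(u, v):
    """Elementwise vector addition."""
    return [a + b for a, b in zip(u, v)]

A = [[1.0, 2.0], [3.0, 4.0]]        # parameter matrix
p1, p2 = [1.0, 0.0], [0.0, 1.0]     # position embeddings
h1, h2 = [5.0, 6.0], [7.0, 8.0]     # incoming states

# Correct pairing vs. positions swapped between the two states:
m = add(matvec(A, add(p1, h1)), matvec(A, add(p2, h2)))
m_swapped = add(matvec(A, add(p1, h2)), matvec(A, add(p2, h1)))
```

Both sums collapse to A(p_1 + p_2 + h_1 + h_2), so the pairing of positions with states is lost, which is exactly the problem described above.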

New labelled graph representations

  • Define the new graph tuple representation.
  • Create a new graph tuple database.
  • Port graph annotators to the new graph tuple representation.
  • Write a new script to populate graph tuple databases with analyses.
  • Create new analysis datasets.
  • Create new devmap dataset.

There are five distinct representations we have considered so far:

  1. ProGraML Graph (Graph Tuple).
  2. DeepTune-source (sequence).
  3. DeepTune-ir (sequence).
  4. DeepTune-inst2vec (sequence).
  5. XFG (Graph Tuple).

Refactor CDFG construction to generate ProGraML protos.

  • Copy //deeplearning/ml4pl/graphs/unlabelled/cdfg:control_and_data_flow_graph to //deeplearning/ml4pl/graphs/unlabelled/llvm2graph:graph_builder and refactor to return a ProGraML proto rather than a networkx graph.
  • Add a //deeplearning/ml4pl/graphs/unlabelled/llvm2graph binary to generate protos from the command line.
