google / gematria

Machine learning for machine code.

License: Apache License 2.0

Starlark 5.41% Python 47.50% C++ 43.38% Dockerfile 0.03% CMake 0.13% Assembly 3.12% C 0.12% Shell 0.31%
compiler machine-code machine-learning performance-analysis

gematria's Issues

Handle additional syscall cases/terminator instructions

There are other ways to make syscalls on x86, including int 0x80 and sysenter, that we should probably also be sanitizing for security reasons.

We also need to make sure that we're handling all terminator instructions. We seem to be, but switching to MCInstrDesc::isTerminator probably makes sense and avoids having to enumerate the cases manually.
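The sanitization step described above can be sketched as a simple mnemonic filter (a Python sketch of the idea only; the real check lives in the C++ tooling, and the mnemonic set here is illustrative, not the actual list):

```python
# Illustrative set of kernel-entry mnemonics to sanitize against; a real
# implementation would derive this from LLVM's instruction tables rather
# than hard-coding strings.
SYSCALL_LIKE_MNEMONICS = {"syscall", "sysenter", "int"}

def is_block_safe(instructions):
    """Returns False if any instruction in the block can enter the kernel."""
    for inst in instructions:
        mnemonic = inst.strip().split()[0].lower()
        if mnemonic in SYSCALL_LIKE_MNEMONICS:
            return False
    return True
```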

Parallelize memory annotations

The current script in ./gematria/datasets/convert_bhive_to_exegesis_inputs.cc runs sequentially, which is a problem when using the Exegesis annotator, which isn't particularly fast. Annotation is easy to parallelize because we don't care about the timings at all while running the annotations. This should be doable with some refactoring and use of LLVM's threading APIs.
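Since timings don't matter during annotation, the per-block work reduces to a parallel map (a Python sketch of the shape of the change; the actual fix would use LLVM's threading APIs in the C++ tool, and `annotate` is a placeholder for the per-block annotation call):

```python
from concurrent.futures import ThreadPoolExecutor

def annotate_blocks(blocks, annotate, max_workers=None):
    """Run the annotator over all blocks in parallel.

    Results come back in input order, and no cross-block state is shared,
    so fanning the blocks out across workers is safe.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(annotate, blocks))
```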

Annotator running out of processes

After a while (roughly 1,000 blocks in my testing), the annotator begins to fail on every block with the following message:

Failed to find addresses for block '488B442410488B7808837C240C00': INTERNAL: Failed to create child process: Resource temporarily unavailable
Block disassembly:
                movq    16(%rsp), %rax
                movq    8(%rax), %rdi
                cmpl    $0, 12(%rsp)

This is presumably because the underlying exegesis code is keeping processes around (although I have yet to confirm that hypothesis). More debugging is needed.
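EAGAIN from fork() ("Resource temporarily unavailable") is consistent with exited child processes never being reaped, so zombies count against the process limit. A minimal mitigation sketch, assuming the leak really is unreaped children (which is unconfirmed):

```python
import os

def reap_exited_children():
    """Reap any exited child processes without blocking.

    Calling this between annotation runs would keep zombies from
    accumulating against RLIMIT_NPROC and eventually making fork()
    fail with EAGAIN.
    """
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no child processes at all
        if pid == 0:
            return  # children exist, but none have exited yet
```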

GRANITE Performance model with context

As of now, the GRANITE model is basic-block oriented (resp. trace-oriented), i.e. it doesn't use any information about code that was executed before or after the basic block. We believe that adding such information may provide additional context and improve the precision of the predictions.

The GRANITE model can be extended to cover such context by:

  1. adding the code graphs of the preceding basic block (in execution order) and of the following basic block,
  2. modifying the predictor to compute throughput only from the instructions in the input basic block.

This modification will also require an extension to the data collection methodology to collect basic blocks and their throughput with the execution context.
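Step 2 amounts to masking the aggregation so that context instructions shape the embeddings but not the output. A sketch with a hypothetical helper (the real model aggregates inside the graph network):

```python
import numpy as np

def central_block_throughput(per_instruction_costs, in_central_block):
    """Sum per-instruction cost predictions over the input block only.

    Instructions from the preceding/following context blocks participate
    in message passing but are masked out of the throughput estimate.
    """
    costs = np.asarray(per_instruction_costs, dtype=float)
    mask = np.asarray(in_central_block, dtype=bool)
    return float(costs[mask].sum())
```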

Error when training model with rep mov instruction

Traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1379, in _do_call
    return fn(*args)
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1362, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1455, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = 8 is not in [0, 8)
         [[{{node encoder_1/edge_model/embed/embedding_lookup}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 109, in <module>
    app.run(main)
  File "/usr/local/lib/python3.11/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.11/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 48, in main
    main_function.run_gematria_model_from_command_line_flags(
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 871, in run_gematria_model_from_command_line_flags
    model.train(
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/model_base.py", line 1535, in train
    stats = run_one_epoch()
            ^^^^^^^^^^^^^^^
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/model_base.py", line 1500, in run_one_epoch
    return self.train_mini_batch(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/model_base.py", line 1628, in train_mini_batch
    return self.train_batch(sess, train_schedule)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/model_base.py", line 1590, in train_batch
    (_, stats) = sess.run((self._train_step, stats_ops), feed_dict=schedule)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/training/monitored_session.py", line 778, in run
    return self._sess.run(
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/training/monitored_session.py", line 1307, in run
    return self._sess.run(
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/training/monitored_session.py", line 1397, in run
    return self._sess.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/training/monitored_session.py", line 1464, in run
    outputs = _WrappedSession.run(
              ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/training/monitored_session.py", line 1228, in run
    return self._sess.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 969, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1192, in _run
    results = self._do_run(handle, final_targets, final_fetches,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1372, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1398, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node 'encoder_1/edge_model/embed/embedding_lookup' defined at (most recent call last):
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 109, in <module>
      app.run(main)
    File "/usr/local/lib/python3.11/dist-packages/absl/app.py", line 308, in run
      _run_main(main, args)
    File "/usr/local/lib/python3.11/dist-packages/absl/app.py", line 254, in _run_main
      sys.exit(main(argv))
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 48, in main
      main_function.run_gematria_model_from_command_line_flags(
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 803, in run_gematria_model_from_command_line_flags
      model.initialize()
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/model_base.py", line 391, in initialize
      self._create_tf_graph()
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/graph_builder_model_base.py", line 170, in _create_tf_graph
      super()._create_tf_graph()
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/token_model.py", line 200, in _create_tf_graph
      super()._create_tf_graph()
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/gnn_model_base.py", line 238, in _create_tf_graph
      self._graphs_tuple_outputs = self._create_graph_network()
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/gnn_model_base.py", line 353, in _create_graph_network
      graphs_tuple = layer.module(graphs_tuple)
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 397, in __call__
      return self._call(*args, **kwargs)
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 419, in _call
      outputs, subgraph_name_scope = self._template(*args, **kwargs)
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 227, in _build_wrapper
      output = self._build(*args, **kwargs)
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/graph_nets_repo/graph_nets/modules.py", line 409, in _build
      edges=self._edge_model(graph.edges, **edge_model_kwargs),
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 397, in __call__
      return self._call(*args, **kwargs)
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 419, in _call
      outputs, subgraph_name_scope = self._template(*args, **kwargs)
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 227, in _build_wrapper
      output = self._build(*args, **kwargs)
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/graph_nets_repo/graph_nets/_base.py", line 112, in _build
      return self._model(*args, **kwargs)
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 397, in __call__
      return self._call(*args, **kwargs)
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 419, in _call
      outputs, subgraph_name_scope = self._template(*args, **kwargs)
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 227, in _build_wrapper
      output = self._build(*args, **kwargs)
    File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/embed.py", line 182, in _build
      return tf.nn.embedding_lookup(embeddings, ids, name="embedding_lookup")
Node: 'encoder_1/edge_model/embed/embedding_lookup'
indices[0] = 8 is not in [0, 8)
         [[{{node encoder_1/edge_model/embed/embedding_lookup}}]]

Original stack trace for 'encoder_1/edge_model/embed/embedding_lookup':
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 109, in <module>
    app.run(main)
  File "/usr/local/lib/python3.11/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.11/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 48, in main
    main_function.run_gematria_model_from_command_line_flags(
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 803, in run_gematria_model_from_command_line_flags
    model.initialize()
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/model_base.py", line 391, in initialize
    self._create_tf_graph()
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/graph_builder_model_base.py", line 170, in _create_tf_graph
    super()._create_tf_graph()
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/token_model.py", line 200, in _create_tf_graph
    super()._create_tf_graph()
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/gnn_model_base.py", line 238, in _create_tf_graph
    self._graphs_tuple_outputs = self._create_graph_network()
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/gnn_model_base.py", line 353, in _create_graph_network
    graphs_tuple = layer.module(graphs_tuple)
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 397, in __call__
    return self._call(*args, **kwargs)
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 419, in _call
    outputs, subgraph_name_scope = self._template(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/template.py", line 398, in __call__
    return self._call_func(args, kwargs)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/template.py", line 368, in _call_func
    result = self._func(*args, **kwargs)
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 227, in _build_wrapper
    output = self._build(*args, **kwargs)
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/graph_nets_repo/graph_nets/modules.py", line 409, in _build
    edges=self._edge_model(graph.edges, **edge_model_kwargs),
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 397, in __call__
    return self._call(*args, **kwargs)
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 419, in _call
    outputs, subgraph_name_scope = self._template(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/template.py", line 398, in __call__
    return self._call_func(args, kwargs)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/template.py", line 368, in _call_func
    result = self._func(*args, **kwargs)
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 227, in _build_wrapper
    output = self._build(*args, **kwargs)
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/graph_nets_repo/graph_nets/_base.py", line 112, in _build
    return self._model(*args, **kwargs)
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 397, in __call__
    return self._call(*args, **kwargs)
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 419, in _call
    outputs, subgraph_name_scope = self._template(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/template.py", line 398, in __call__
    return self._call_func(args, kwargs)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/template.py", line 368, in _call_func
    result = self._func(*args, **kwargs)
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/base.py", line 227, in _build_wrapper
    output = self._build(*args, **kwargs)
  File "/tmp/bazel-cache/_bazel_aidengro/ab2551f03460bb1db9bd438eba2ec331/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/sonnet_repo/sonnet/python/modules/embed.py", line 182, in _build
    return tf.nn.embedding_lookup(embeddings, ids, name="embedding_lookup")
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/embedding_ops.py", line 326, in embedding_lookup
    return _embedding_lookup_and_transform(
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/embedding_ops.py", line 145, in _embedding_lookup_and_transform
    array_ops.gather(params[0], ids, name=name), ids, max_norm)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 576, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/array_ops.py", line 5138, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3982, in gather_v2
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
    ret = Operation.from_node_def(

With the following command line invocation:

bazel run //gematria/granite/python:run_granite_model -- --gematria_action=train --gematria_checkpoint_dir=/tmp/test_model/ --gematria_learning_rate=0.001 --gematria_loss_type=mean_absolute_error --gematria_training_num_epochs=100000 --gematria_tokens_file=/data/vocab_10u7.txt  --gematria_input_file=/tmp/test.tfrecord  --gematria_max_blocks_in_batch=100 --gematria_learning_rate_schedule=cosine --gematria_decay_steps=100000

With the tfrecord dataset produced from the following csv:

f3b801000000,1

With the patch from #107 applied.
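The InvalidArgumentError above ("indices[0] = 8 is not in [0, 8)") means the edge-embedding lookup received an id equal to the table size, i.e. the graph builder produced an edge/token type not covered by the vocabulary the model was built with — plausibly one introduced by the REP prefix. A minimal analogue of the failing check (a sketch, not the TensorFlow internals):

```python
import numpy as np

def checked_embedding_lookup(table, ids):
    """Like tf.nn.embedding_lookup, but with an explicit bounds check."""
    table = np.asarray(table)
    ids = np.asarray(ids)
    if ids.size and (ids.min() < 0 or ids.max() >= table.shape[0]):
        raise IndexError(
            f"indices out of range [0, {table.shape[0]}): {ids.tolist()}")
    return table[ids]
```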

TEST llvm-cm X86/multi_func.s FAILED

Hello, after building tflite, there is an error with llvm-cm:

(env) $                         ninja check-llvm-tools-llvm-cm
[0/1] Running llvm-cm tests
llvm-lit: /home/hrong1/llvm-src/llvm-project/llvm/utils/lit/lit/llvm/config.py:502: note: using yaml2obj: /home/hrong1/llvm-src/cmake-build/bin/yaml2obj
llvm-lit: /home/hrong1/llvm-src/llvm-project/llvm/utils/lit/lit/llvm/config.py:502: note: using llvm-cm: /home/hrong1/llvm-src/cmake-build/bin/llvm-cm
llvm-lit: /home/hrong1/llvm-src/llvm-project/llvm/utils/lit/lit/llvm/config.py:502: note: using split-file: /home/hrong1/llvm-src/cmake-build/bin/split-file
llvm-lit: /home/hrong1/llvm-src/llvm-project/llvm/utils/lit/lit/llvm/config.py:502: note: using llvm-mc: /home/hrong1/llvm-src/cmake-build/bin/llvm-mc
FAIL: llvm-cm :: X86/multi_func.s (11 of 11)
******************** TEST 'llvm-cm :: X86/multi_func.s' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/hrong1/llvm-src/cmake-build/bin/llvm-mc -o /home/hrong1/llvm-src/cmake-build/X86/Output/multi_func.s.tmp.o --filetype=obj -triple=x86_64-unknown-linux-gnu /home/hrong1/gematria/llvm_cm/test/X86/multi_func.s
# executed command: /home/hrong1/llvm-src/cmake-build/bin/llvm-mc -o /home/hrong1/llvm-src/cmake-build/X86/Output/multi_func.s.tmp.o --filetype=obj -triple=x86_64-unknown-linux-gnu /home/hrong1/gematria/llvm_cm/test/X86/multi_func.s
# RUN: at line 3
/home/hrong1/llvm-src/cmake-build/bin/llvm-cm /home/hrong1/llvm-src/cmake-build/X86/Output/multi_func.s.tmp.o -csv=/home/hrong1/gematria/llvm_cm/test/X86/Inputs/multi-func.csv -granite_model=/home/hrong1/gematria/llvm_cm/test/X86/Inputs/gb-token-mit-2022_12_02.tflite -evaluator=granite | /home/hrong1/llvm-src/cmake-build/bin/FileCheck /home/hrong1/gematria/llvm_cm/test/X86/multi_func.s
# executed command: /home/hrong1/llvm-src/cmake-build/bin/llvm-cm /home/hrong1/llvm-src/cmake-build/X86/Output/multi_func.s.tmp.o -csv=/home/hrong1/gematria/llvm_cm/test/X86/Inputs/multi-func.csv -granite_model=/home/hrong1/gematria/llvm_cm/test/X86/Inputs/gb-token-mit-2022_12_02.tflite -evaluator=granite
# .---command stderr------------
# | Unexpected node token: 'RIP'
# `-----------------------------
# executed command: /home/hrong1/llvm-src/cmake-build/bin/FileCheck /home/hrong1/gematria/llvm_cm/test/X86/multi_func.s
# .---command stderr------------
# | /home/hrong1/gematria/llvm_cm/test/X86/multi_func.s:8:15: error: CHECK-NEXT: expected string not found in input
# | # CHECK-NEXT: Calculated Frequency: 8.342712e+03
# |               ^
# | <stdin>:1:11: note: scanning from here
# | <reverse>:
# |           ^
# | <stdin>:2:1: note: possible intended match here
# | Calculated Frequency: 8.342695e+03
# | ^
# |
# | Input file: <stdin>
# | Check file: /home/hrong1/gematria/llvm_cm/test/X86/multi_func.s
# |
# | -dump-input=help explains the following input dump.
# |
# | Input was:
# | <<<<<<
# |           1: <reverse>:
# | next:8'0               X~ error: no match found
# |           2: Calculated Frequency: 8.342695e+03
# | next:8'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | next:8'1     ?                                   possible intended match
# |           3: <tallestBillboard>:
# | next:8'0     ~~~~~~~~~~~~~~~~~~~~~
# |           4: Calculated Frequency: 2.928508e+05
# | next:8'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           5: <isMatch>:
# | next:8'0     ~~~~~~~~~~~~
# |           6: Calculated Frequency: 8.204262e+02
# | next:8'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           7: <bubbleSort>:
# | next:8'0     ~~~~~~~~~~~~~~~
# |           .
# |           .
# |           .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

********************
********************
Failed Tests (1):
  llvm-cm :: X86/multi_func.s


Testing Time: 0.59s

Total Discovered Tests: 11
  Passed: 10 (90.91%)
  Failed:  1 (9.09%)
FAILED: tools/gematria/llvm_cm/CMakeFiles/check-llvm-tools-llvm-cm /home/hrong1/llvm-src/cmake-build/tools/gematria/llvm_cm/CMakeFiles/check-llvm-tools-llvm-cm
cd /home/hrong1/llvm-src/cmake-build/tools/gematria/llvm_cm && /home/hrong1/gematria/env/bin/python3 /home/hrong1/llvm-src/cmake-build/./bin/llvm-lit -sv /home/hrong1/llvm-src/cmake-build/tools/gematria/llvm_cm
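The mismatch is in the last digits of the computed frequency (8.342695e+03 vs the expected 8.342712e+03), which points at exact string matching of floating-point output. A tolerance-based comparison would absorb platform-dependent round-off; sketched here in Python for illustration only (lit/FileCheck itself compares text, so the actual fix would likely loosen the CHECK patterns instead):

```python
import math

def frequencies_match(expected, actual, rel_tol=1e-4):
    """Compare predicted frequencies with a relative tolerance instead of
    an exact decimal-string match, so small float round-off differences
    across platforms do not fail the test."""
    return math.isclose(expected, actual, rel_tol=rel_tol)
```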

Bazel test failed: bhive_importer_test

Hi, after building gematria (bazel build ...), I ran bazel test ..., but it failed at bhive_importer_test.test_x86_parse_csv_line, and the error log indicates:

e_importer_test/test.log 
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //gematria/datasets/python:bhive_importer_test
-----------------------------------------------------------------------------
Running tests under Python 3.10.0: /home/gematria/gematria_env/bin/python3
[ RUN      ] BhiveImporterTest.test_x86_basic_block_proto_from_bytes
[       OK ] BhiveImporterTest.test_x86_basic_block_proto_from_bytes
[ RUN      ] BhiveImporterTest.test_x86_basic_block_proto_from_hex
[       OK ] BhiveImporterTest.test_x86_basic_block_proto_from_hex
[ RUN      ] BhiveImporterTest.test_x86_nonstandard_columns
[       OK ] BhiveImporterTest.test_x86_nonstandard_columns
[ RUN      ] BhiveImporterTest.test_x86_parse_csv_line
[  FAILED  ] BhiveImporterTest.test_x86_parse_csv_line
======================================================================
ERROR: test_x86_parse_csv_line (__main__.BhiveImporterTest)
BhiveImporterTest.test_x86_parse_csv_line
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lukez/.cache/bazel/_bazel_lukez/6ec059981b607312b48b2c4811597fe7/sandbox/linux-sandbox/6/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/datasets/python/bhive_importer_test.runfiles/com_google_gematria/gematria/datasets/python/bhive_importer_test.py", line 203, in test_x86_parse_csv_line
    block_proto = importer.basic_block_with_throughput_proto_from_csv_line(
TypeError: basic_block_with_throughput_proto_from_csv_line(): incompatible function arguments. The following argument types are supported:
    1. (self: gematria.datasets.python.bhive_importer.BHiveImporter, source_name: str, line: str, machine_code_hex_column_index: int, throughput_column_index: int, throughput_scaling: float = 1.0, base_address: int = 0) -> gematria::BasicBlockWithThroughputProto

Invoked with: <gematria.datasets.python.bhive_importer.BHiveImporter object at 0x7ffb153321f0>; kwargs: source_name='test: made-up', line='4829d38b44246c8b54246848c1fb034829d04839c3,10', base_address=600, throughput_scaling=2.0

----------------------------------------------------------------------
Ran 4 tests in 0.069s

FAILED (errors=1)

Do you know how to address this issue? I am running this on an x86 Intel Broadwell machine.
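The signature in the error message has no defaults for the two column-index parameters, while the logged invocation omits both, which would explain the rejection (pybind11 reports this as "incompatible function arguments" rather than the usual missing-argument message). A pure-Python stand-in, hypothetical and only for illustrating the mismatch:

```python
# Hypothetical stand-in mirroring the signature from the error message.
def basic_block_with_throughput_proto_from_csv_line(
    source_name,
    line,
    machine_code_hex_column_index,
    throughput_column_index,
    throughput_scaling=1.0,
    base_address=0,
):
    return (source_name, line)  # stand-in for the real proto construction

# The call from the test log: both *_column_index arguments are missing,
# so the call fails before the body runs.
try:
    basic_block_with_throughput_proto_from_csv_line(
        source_name="test: made-up",
        line="4829d38b44246c8b54246848c1fb034829d04839c3,10",
        base_address=600,
        throughput_scaling=2.0,
    )
except TypeError as e:
    failure = str(e)  # same failure mode as in the test log
```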

Assign requires shapes of both tensors to match.

Hello, I am trying to train the GRANITE model and get an error about mismatched tensor shapes:

env USE_BAZEL_VERSION=6.4.0  ../bazelisk-linux-amd64   run //gematria/granite/python:run_granite_model -- --gematria_action=train --gematria_checkpoint_dir=/tmp/test_model/ --gematria_training_num_epochs=10  --gematria_input_file=/tmp/basic_blocks_with_throughput.tfrecord  --gematria_tokens_file=/tmp/tokens.txt
INFO: Analyzed target //gematria/granite/python:run_granite_model (0 packages loaded, 85 targets configured).
INFO: Found 1 target...
Target //gematria/granite/python:run_granite_model up-to-date:
  bazel-bin/gematria/granite/python/run_granite_model
INFO: Elapsed time: 0.168s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/gematria/granite/python/run_granite_model '--gematria_action=train' '--gematria_checkpoint_dir=/tmp/test_model/' '--gematria_training_num_epochs=10' '--gematria_input_file=/tmp/basic_blocks_with_throughput.tfrecord' '--gematria_tokens_file=/tmp/tokens.txt'
2024-06-07 11:42:19.573765: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-07 11:42:19.575373: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-06-07 11:42:19.603053: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-07 11:42:19.603109: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-07 11:42:19.604100: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-07 11:42:19.609623: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-06-07 11:42:19.609868: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-07 11:42:20.147141: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:tensorflow:From /home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/compat/v2_compat.py:108: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
W0607 11:42:21.881521 139849213219904 model_base.py:683] ModelBase._output_tensor has invalid name. Expected ModelBase.output_tensor, found concat/concat:0.
W0607 11:42:22.258888 139849213219904 model_base.py:902] ModelBase._synchronous_training is True with a single worker.
I0607 11:42:23.650717 139849213219904 timer.py:61] Creating model: TokenGraphBuilderModel: 2.525757s
I0607 11:42:23.650949 139849213219904 timer.py:61] Loading basic blocks: 0.000080s
WARNING:tensorflow:From /home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py:545: StopAtStepHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
W0607 11:42:23.651067 139849213219904 deprecation.py:50] From /home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py:545: StopAtStepHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py:579: StepCounterHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
W0607 11:42:23.882867 139849213219904 deprecation.py:50] From /home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py:579: StepCounterHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/basic_session_run_hooks.py:686: SecondOrStepTimer.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
W0607 11:42:23.883046 139849213219904 deprecation.py:50] From /home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/basic_session_run_hooks.py:686: SecondOrStepTimer.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py:586: SummarySaverHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
W0607 11:42:23.883132 139849213219904 deprecation.py:50] From /home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py:586: SummarySaverHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py:597: CheckpointSaverHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
W0607 11:42:23.883212 139849213219904 deprecation.py:50] From /home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py:597: CheckpointSaverHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Create CheckpointSaverHook.
I0607 11:42:23.883271 139849213219904 basic_session_run_hooks.py:557] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0607 11:42:25.717828 139849213219904 monitored_session.py:240] Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/test_model/model.ckpt-0
I0607 11:42:25.724106 139849213219904 saver.py:1413] Restoring parameters from /tmp/test_model/model.ckpt-0
2024-06-07 11:42:25.754281: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
Traceback (most recent call last):
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/client/session.py", line 1402, in _do_call
    return fn(*args)
           ^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/client/session.py", line 1385, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/client/session.py", line 1478, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [53,128] rhs shape= [9,128]
         [[{{node save/Assign_12}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 1418, in restore
    sess.run(self.saver_def.restore_op_name,
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/client/session.py", line 972, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/client/session.py", line 1215, in _run
    results = self._do_run(handle, final_targets, final_fetches,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/client/session.py", line 1395, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/client/session.py", line 1421, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node 'save/Assign_12' defined at (most recent call last):
    File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 109, in <module>
    File "/home/hrong1/gematria/env/lib/python3.11/site-packages/absl/app.py", line 308, in run
    File "/home/hrong1/gematria/env/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
    File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 48, in main
    File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 893, in run_gematria_model_from_command_line_flags
    File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 565, in _monitored_training_session_from_flags
Node: 'save/Assign_12'
Assign requires shapes of both tensors to match. lhs shape= [53,128] rhs shape= [9,128]
         [[{{node save/Assign_12}}]]

Original stack trace for 'save/Assign_12':
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 109, in <module>
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/absl/app.py", line 308, in run
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 48, in main
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 893, in run_gematria_model_from_command_line_flags
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 565, in _monitored_training_session_from_flags
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 934, in __init__
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 946, in build
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 974, in _build
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 543, in _build_internal
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 383, in _AddRestoreOps
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 86, in restore
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/ops/state_ops.py", line 353, in assign
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/ops/gen_state_ops.py", line 61, in assign
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 2652, in _create_op_internal
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 1160, in from_node_def


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 109, in <module>
    app.run(main)
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 48, in main
    main_function.run_gematria_model_from_command_line_flags(
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 893, in run_gematria_model_from_command_line_flags
    session = _monitored_training_session_from_flags(model, is_chief)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 571, in _monitored_training_session_from_flags
    return tf.train.MonitoredTrainingSession(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py", line 606, in MonitoredTrainingSession
    return MonitoredSession(
           ^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py", line 1050, in __init__
    super(MonitoredSession, self).__init__(
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py", line 753, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py", line 1259, in __init__
    _WrappedSession.__init__(self, self._create_session())
                                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py", line 1264, in _create_session
    return self._sess_creator.create_session()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py", line 906, in create_session
    self.tf_sess = self._session_creator.create_session()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/monitored_session.py", line 665, in create_session
    return self._get_session_manager().prepare_session(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/session_manager.py", line 320, in prepare_session
    sess, is_loaded_from_checkpoint = self._restore_checkpoint(
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/session_manager.py", line 254, in _restore_checkpoint
    _restore_checkpoint_and_maybe_run_saved_model_initializers(
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/session_manager.py", line 71, in _restore_checkpoint_and_maybe_run_saved_model_initializers
    saver.restore(sess, path)
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 1454, in restore
    raise _wrap_restore_error_with_msg(
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Graph execution error:

Detected at node 'save/Assign_12' defined at (most recent call last):
    File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 109, in <module>
    File "/home/hrong1/gematria/env/lib/python3.11/site-packages/absl/app.py", line 308, in run
    File "/home/hrong1/gematria/env/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
    File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 48, in main
    File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 893, in run_gematria_model_from_command_line_flags
    File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 565, in _monitored_training_session_from_flags
Node: 'save/Assign_12'
Assign requires shapes of both tensors to match. lhs shape= [53,128] rhs shape= [9,128]
         [[{{node save/Assign_12}}]]

Original stack trace for 'save/Assign_12':
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 109, in <module>
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/absl/app.py", line 308, in run
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/granite/python/run_granite_model.py", line 48, in main
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 893, in run_gematria_model_from_command_line_flags
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/granite/python/run_granite_model.runfiles/com_google_gematria/gematria/model/python/main_function.py", line 565, in _monitored_training_session_from_flags
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 934, in __init__
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 946, in build
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 974, in _build
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 543, in _build_internal
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saver.py", line 383, in _AddRestoreOps
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 86, in restore
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/ops/state_ops.py", line 353, in assign
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/ops/gen_state_ops.py", line 61, in assign
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 2652, in _create_op_internal
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 1160, in from_node_def

The input tfrecord was converted from gematria/testing/testdata/basic_blocks_with_throughput.pbtxt.

Any suggestions?

Or do you have a working example that I could try instead? At this stage I just want to see how to train the model.

Thanks,
Hongbo
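For what it's worth, the shapes in the error suggest a token-embedding table with 9 rows in the checkpoint versus 53 rows in the current graph. That pattern would arise if /tmp/test_model/ holds a checkpoint created with a different --gematria_tokens_file than the current run; this is only a guess, but deleting the checkpoint directory (or keeping the tokens file consistent between runs) would be the first thing to try. A small sketch of the shape check that fails (a hypothetical mirror of TF's save/Assign behavior, not TensorFlow code):

```python
# Hypothetical mirror of the save/Assign shape check; not TensorFlow code.
def restore_assign(dst, src):
  dst_shape = (len(dst), len(dst[0]))
  src_shape = (len(src), len(src[0]))
  if dst_shape != src_shape:
    raise ValueError(
        'Assign requires shapes of both tensors to match. '
        f'lhs shape= [{dst_shape[0]},{dst_shape[1]}] '
        f'rhs shape= [{src_shape[0]},{src_shape[1]}]'
    )
  for i, row in enumerate(src):
    dst[i] = list(row)


graph_embedding = [[0.0] * 128 for _ in range(53)]  # 53 tokens in current tokens.txt
ckpt_embedding = [[0.0] * 128 for _ in range(9)]    # 9 tokens when the ckpt was saved
```

With these shapes, `restore_assign(graph_embedding, ckpt_embedding)` raises the same kind of mismatch error as the restore step in the log.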

FindAccessedAddrsExegesisTest: can't run 'latency' mode

Hello, after building gematria, there is one test failure, in FindAccessedAddrsExegesisTest:

(env)$ env USE_BAZEL_VERSION=6.4.0 ../bazelisk-linux-amd64 test ...
...
//gematria/datasets:find_accessed_addrs_exegesis_test                    FAILED in 0.8s
  /home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/testlogs/gematria/datasets/find_accessed_addrs_exegesis_test/test.log

Executed 1 out of 51 tests: 50 tests pass and 1 fails locally.

Here is the content of find_accessed_addrs_exegesis_test/test.log:

Executing tests from //gematria/datasets:find_accessed_addrs_exegesis_test
-----------------------------------------------------------------------------
Running main() from gmock_main.cc
[==========] Running 5 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 5 tests from FindAccessedAddrsExegesisTest
[ RUN      ] FindAccessedAddrsExegesisTest.ExegesisNoAccess
Failure value returned from cantFail wrapped call
can't run 'latency' mode, sched model does not define a cycle counter. You can pass --benchmark-phase=... to skip the actual benchmarking or --use-dummy-perf-counters to not query the kernel for real event counts.
UNREACHABLE executed at external/llvm-project/llvm/include/llvm/Support/Error.h:790!

Is this expected? This looks like an important test to fix, since it appears to measure instruction cycle counts, which I assume is core gematria functionality.
Thanks!

Update llvm-cm against recent LLVM changes

llvm-cm needs to be updated to reflect some recent LLVM changes:

  • BBRanges are now used to represent the basic blocks in a function. llvm-cm needs to support these for cases like basic block sections and split machine functions (and should at the very least have test coverage for them).
  • -mbb-profile-dump no longer exists, and instead PGOAnalysisMap should be used. Tests/code needs to be updated for this.

This (at least the second part) is sort of a prerequisite for #55.

error in converting bhive to tfrecord

Hello, I am following the example at https://github.com/google/gematria/blob/main/g3doc/obtaining-training-data.md to convert the BHive dataset to a tfrecord file, and I get the following error:

(env) ~/gematria$ curl -L https://raw.githubusercontent.com/ithemal/bhive/5f1d50077ac0779fd227b261dcf517862c7104bd/benchmark/throughput/skl.csv > skl.csv
(env) ~/gematria$ env USE_BAZEL_VERSION=6.4.0  ../bazelisk-linux-amd64  run //gematria/datasets/python:import_from_bhive -- \
    --gematria_input_csv=skl.csv \
    --gematria_output_tfrecord=skl.tfrecord \
    --gematria_throughput_source_name="bhive: skl"
INFO: Analyzed target //gematria/datasets/python:import_from_bhive (9 packages loaded, 3974 targets configured).
INFO: Found 1 target...
INFO: From Compiling llvm/lib/Support/Process.cpp:
In file included from external/llvm-project/llvm/lib/Support/Process.cpp:123:
external/llvm-project/llvm/lib/Support/Unix/Process.inc:101:10: warning: 'mallinfo' is deprecated [-Wdeprecated-declarations]
  mi = ::mallinfo();
         ^
/usr/include/malloc.h:114:48: note: 'mallinfo' has been explicitly marked deprecated here
extern struct mallinfo mallinfo (void) __THROW __MALLOC_DEPRECATED;
                                               ^
/usr/include/malloc.h:32:30: note: expanded from macro '__MALLOC_DEPRECATED'
# define __MALLOC_DEPRECATED __attribute_deprecated__
                             ^
/usr/include/x86_64-linux-gnu/sys/cdefs.h:339:51: note: expanded from macro '__attribute_deprecated__'
# define __attribute_deprecated__ __attribute__ ((__deprecated__))
                                                  ^
1 warning generated.
INFO: From Compiling llvm/lib/Support/Process.cpp [for tool]:
In file included from external/llvm-project/llvm/lib/Support/Process.cpp:123:
external/llvm-project/llvm/lib/Support/Unix/Process.inc:101:10: warning: 'mallinfo' is deprecated [-Wdeprecated-declarations]
  mi = ::mallinfo();
         ^
/usr/include/malloc.h:114:48: note: 'mallinfo' has been explicitly marked deprecated here
extern struct mallinfo mallinfo (void) __THROW __MALLOC_DEPRECATED;
                                               ^
/usr/include/malloc.h:32:30: note: expanded from macro '__MALLOC_DEPRECATED'
# define __MALLOC_DEPRECATED __attribute_deprecated__
                             ^
/usr/include/x86_64-linux-gnu/sys/cdefs.h:339:51: note: expanded from macro '__attribute_deprecated__'
# define __attribute_deprecated__ __attribute__ ((__deprecated__))
                                                  ^
1 warning generated.
Target //gematria/datasets/python:import_from_bhive up-to-date:
  bazel-bin/gematria/datasets/python/import_from_bhive
INFO: Elapsed time: 144.058s, Critical Path: 31.45s
INFO: 782 processes: 3 internal, 779 linux-sandbox.
INFO: Build completed successfully, 782 total actions
INFO: Running command line: bazel-bin/gematria/datasets/python/import_from_bhive '--gematria_input_csv=skl.csv' '--gematria_output_tfrecord=skl.tfrecord' '--gematria_throughput_source_name=bhive: skl'
2024-06-11 11:24:53.336711: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-11 11:24:53.338850: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-06-11 11:24:53.372145: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-11 11:24:53.372199: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-11 11:24:53.373893: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-11 11:24:53.382875: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-06-11 11:24:53.383089: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/datasets/python/import_from_bhive.runfiles/com_google_gematria/gematria/datasets/python/import_from_bhive.py", line 36, in <module>
    import tensorflow as tf
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/__init__.py", line 48, in <module>
    from tensorflow._api.v2 import __internal__
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/_api/v2/__internal__/__init__.py", line 8, in <module>
    from tensorflow._api.v2.__internal__ import autograph
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/_api/v2/__internal__/autograph/__init__.py", line 8, in <module>
    from tensorflow.python.autograph.core.ag_ctx import control_status_ctx # line: 34
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/autograph/core/ag_ctx.py", line 21, in <module>
    from tensorflow.python.autograph.utils import ag_logging
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/autograph/utils/__init__.py", line 17, in <module>
    from tensorflow.python.autograph.utils.context_managers import control_dependency_on_returns
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/autograph/utils/context_managers.py", line 19, in <module>
    from tensorflow.python.framework import ops
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 44, in <module>
    from tensorflow.python.client import pywrap_tf_session
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/client/pywrap_tf_session.py", line 25, in <module>
    from tensorflow.python.util import tf_stack
  File "/home/hrong1/gematria/env/lib/python3.11/site-packages/tensorflow/python/util/tf_stack.py", line 22, in <module>
    from tensorflow.python.util import _tf_stack
ImportError: generic_type: cannot initialize type "StatusCode": an object with that name is already defined

I'm wondering if this is expected or if there is something wrong with my environment. My environment uses the same requirements.in as master, except that tensorflow-probability>=0.19.0 is changed to tensorflow-probability==0.23.0.

Any suggestion? Thanks!
Hongbo
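This error usually means two different native protobuf extensions (e.g. a pip-installed one and a bazel-built one) registered the same pybind11 type in one process. As an unverified workaround sketch, forcing the pure-Python protobuf backend before the first protobuf import can sometimes sidestep the clash:

```python
import os

# Must be set before the first `import google.protobuf` anywhere in the
# process; the backend is chosen once at import time. This trades
# performance for avoiding the duplicate native-type registration.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
```

Whether this applies here depends on which module defines `StatusCode` first, so treat it as a diagnostic step rather than a fix.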

Implement tooling for python formatting

There is currently no Python formatting tooling in the repository, although the code presumably follows the Google Python style guide.

I'd like to propose using yapf:

  • It seems to be the standard for Google open source projects.
  • It does require some reformatting of existing code (based on my testing; I might be missing a flag or something).
  • Is fairly well maintained.

The only issue is that yapf currently doesn't support match-case statements because some of its third-party dependencies are unmaintained. This is being tracked in this issue, and some work has been going on recently in this PR, but nothing has been finalized yet. As a result, yapf currently fails when formatting gematria. Building from a fork seems like it should work in the meantime.

Noting here that other formatters like black or pyink don't have this issue.
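For reference, a minimal yapf configuration, assuming the repository follows the Google style (the `pyproject.toml` placement is one of several locations yapf searches, and the two-space indent is an assumption based on the existing code):

```toml
[tool.yapf]
based_on_style = "google"
indent_width = 2
```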

Segmentation fault with cpp protos in compile_modules_lib

When adding any protobuf library as a dependency to compile_modules_lib, we get a segmentation fault (at least in compile_modules_lib_test) rather than the expected behavior:

Fatal Python error: Segmentation fault

Thread 0x00007f3a20e00640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/worker/data_plane.py", line 255 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007f3ad2b9d1c0 (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 1048 in _run_bundle
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 811 in _execute_bundle
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 483 in run_stages
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 228 in run_via_runner_api
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 204 in run_pipeline
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/direct/direct_runner.py", line 128 in run_pipeline
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/pipeline.py", line 587 in run
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/pipeline.py", line 563 in run
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/testing/test_pipeline.py", line 115 in run
  File "/usr/local/lib/python3.10/dist-packages/apache_beam/pipeline.py", line 613 in __exit__
  File "/root/.cache/bazel/_bazel_root/5b63f27bc35a3d0572c069ebf1768159/sandbox/linux-sandbox/8748/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/datasets/pipelines/compile_modules_lib_test.runfiles/com_google_gematria/gematria/datasets/pipelines/compile_modules_lib_test.py", line 129 in test_get_bbs
  File "/usr/lib/python3.10/unittest/case.py", line 549 in _callTestMethod
  File "/usr/lib/python3.10/unittest/case.py", line 591 in run
  File "/usr/lib/python3.10/unittest/case.py", line 650 in __call__
  File "/usr/lib/python3.10/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.10/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.10/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.10/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.10/unittest/runner.py", line 184 in run
  File "/usr/lib/python3.10/unittest/main.py", line 271 in runTests
  File "/usr/lib/python3.10/unittest/main.py", line 101 in __init__
  File "/usr/local/lib/python3.10/dist-packages/absl/testing/absltest.py", line 2653 in _run_and_get_tests_result
  File "/usr/local/lib/python3.10/dist-packages/absl/testing/absltest.py", line 2689 in run_tests
  File "/usr/local/lib/python3.10/dist-packages/absl/testing/absltest.py", line 2234 in main_function
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254 in _run_main
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308 in run
  File "/usr/local/lib/python3.10/dist-packages/absl/testing/absltest.py", line 2236 in _run_in_app
  File "/usr/local/lib/python3.10/dist-packages/absl/testing/absltest.py", line 2131 in main
  File "/root/.cache/bazel/_bazel_root/5b63f27bc35a3d0572c069ebf1768159/sandbox/linux-sandbox/8748/execroot/com_google_gematria/bazel-out/k8-fastbuild/bin/gematria/datasets/pipelines/compile_modules_lib_test.runfiles/com_google_gematria/gematria/datasets/pipelines/compile_modules_lib_test.py", line 142 in <module>

Extension modules: google.protobuf.pyext._message, google3.net.proto2.python.internal.cpp._message, apache_beam.coders.stream, grpc._cython.cygrpc, apache_beam.utils.windowed_value, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, fastavro._logical_readers, fastavro._schema, zstandard.backend_c, fastavro._read, fastavro._logical_writers, fastavro._validation, fastavro._write, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, apache_beam.coders.coder_impl, apache_beam.transforms.cy_dataflow_distribution_counter, apache_beam.transforms.cy_combiners, charset_normalizer.md, apache_beam.utils.counters, apache_beam.runners.common, apache_beam.transforms.stats, apache_beam.metrics.cells, 
apache_beam.runners.worker.statesampler_fast, apache_beam.metrics.execution, bson._cbson, pymongo._cmessage, pyarrow._parquet, pyarrow._fs, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, crcmod._crcfunext, regex._regex, apache_beam.runners.worker.opcounters, apache_beam.runners.worker.operations (total: 89)

The main difference between the two setups seems to be that without any proto dependency defined in bazel, we use the system-installed protobuf version, whereas when a proto dependency is specified, we use the bazel-managed protobuf version and automatically use the cpp backend. I have not been able to check whether the issue reproduces with the system-installed protobuf using the cpp backend, as that backend is not installed by default.

Incompatibility with Python 3.12

Python 3.12 is supported only from TensorFlow 2.16 onwards - which drops the tf.estimator API used by the version of tensorflow-ranking that Gematria currently depends on. Until tensorflow-ranking gains compatibility with TensorFlow 2.16 (and hence Python 3.12), we'll have to stick with Python 3.11 and TensorFlow 2.15.
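A sketch of the corresponding pin in requirements.in (the exact environment marker mirrors the style already used in the file):

```
tensorflow>=2.15.0,<2.16; sys_platform=='linux'
```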

semantics of inverse_throughput_cycles and prefix_inverse_throughput_cycles

Hello, is it accurate to say that inverse_throughput_cycles refers to CPI (cycles per instruction)?

I do not understand the description of prefix_inverse_throughput_cycles in throughput.py: "The number of cycles of inverse throughput of the prefixes of the basic block.". There can be prefixes like REP for instructions, but what are the prefixes of a basic block?

Thanks!
Hongbo

build failure: no such attribute 'exec_tools' in 'genrule' rule

Hello, I followed the instructions in README to install gematria, but the build failed for an issue: no such attribute 'exec_tools' in 'genrule' rule. Below are the details.

This is the version of bazelisk:

(env) [gematria]$ ../bazelisk-linux-amd64 version
Bazelisk version: v1.19.0
WARNING: Output base '/data/nfs_home/hrong1/.cache/bazel/_bazel_hrong1/fe3956cd0667122177c09d0692bd5c86' is on NFS. This may lead to surprising failures and undetermined behavior.
Build label: 7.1.1
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Thu Mar 21 18:08:37 2024 (1711044517)
Build timestamp: 1711044517
Build timestamp as int: 1711044517

And this is the build:

(env) [gematria]$ ../bazelisk-linux-amd64 build ...
WARNING: Output base '/data/nfs_home/hrong1/.cache/bazel/_bazel_hrong1/fe3956cd0667122177c09d0692bd5c86' is on NFS. This may lead to surprising failures and undetermined behavior.
Starting local Bazel server and connecting to it...
WARNING: --enable_bzlmod is set, but no MODULE.bazel file was found at the workspace root. Bazel will create an empty MODULE.bazel file. Please consider migrating your external dependencies from WORKSPACE to MODULE.bazel. For more details, please refer to https://github.com/bazelbuild/bazel/issues/18958.
DEBUG: Rule 'com_google_protobuf' indicated that a canonical reproducible form can be obtained by modifying arguments commit = "a74f54b724bdc2fe0bfc271f4dc0ceb159805625" and dropping ["tag"]
DEBUG: Repository com_google_protobuf instantiated at:
  /data/nfs_home/hrong1/gematria/WORKSPACE:16:15: in <toplevel>
Repository rule git_repository defined at:
  /data/nfs_home/hrong1/.cache/bazel/_bazel_hrong1/fe3956cd0667122177c09d0692bd5c86/external/bazel_tools/tools/build_defs/repo/git.bzl:189:33: in <toplevel>
ERROR: /data/nfs_home/hrong1/.cache/bazel/_bazel_hrong1/fe3956cd0667122177c09d0692bd5c86/external/com_google_protobuf/python/BUILD.bazel:123:13: @@com_google_protobuf//python:aarch64_test_genrule: no such attribute 'exec_tools' in 'genrule' rule (did you mean 'executable'?)
ERROR: /data/nfs_home/hrong1/.cache/bazel/_bazel_hrong1/fe3956cd0667122177c09d0692bd5c86/external/com_google_protobuf/python/BUILD.bazel:131:12: @@com_google_protobuf//python:x86_64_test_genrule: no such attribute 'exec_tools' in 'genrule' rule (did you mean 'executable'?)
ERROR: /data/nfs_home/hrong1/.cache/bazel/_bazel_hrong1/fe3956cd0667122177c09d0692bd5c86/external/com_google_protobuf/python/BUILD.bazel:17:11: errors encountered resolving select() keys for @@com_google_protobuf//python:protobuf_python
ERROR: Analysis of target '//gematria/proto:canonicalized_instruction_py_pb2' failed; build aborted: Analysis failed
INFO: Elapsed time: 123.120s, Critical Path: 0.03s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully
FAILED:
    Fetching repository @@pybind11; Cloning tags/v2.10.3 of https://github.com/pybind/pybind11.git
    Fetching repository @@pybind11_abseil_repo; Cloning 1caf1890443e8e303bf88850d3c27d5422903168 of https://github.com/pybind/pybind11_abseil.git
    Fetching repository @@sonnet_repo; Cloning cd5b5fa48e15e4d020f744968f5209949ebe750f of https://github.com/deepmind/sonnet.git
    Fetching repository @@graph_nets_repo; Cloning adf25162ba21bb0ae176c35483a74fb0c9dff576 of https://github.com/deepmind/graph_nets.git
    Fetching repository @@rules_license~; starting
    Fetching repository @@protobuf~; starting
    Fetching repository @@rules_java~; starting
    Fetching repository @@apple_support~; starting

Does anyone have any idea? Thanks!
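Bazel 7 removed the `exec_tools` attribute from genrule, and the pinned com_google_protobuf revision still uses it, so one workaround is pinning bazelisk to a 6.x release (6.4.0 here is an assumption based on what has worked elsewhere in this repository):

```shell
# Pin Bazel 6.x via bazelisk in the workspace root; bazelisk reads this
# file and downloads the matching release automatically.
echo "6.4.0" > .bazelversion
# One-off alternative: USE_BAZEL_VERSION=6.4.0 bazelisk build ...
```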

Parallelize benchmarking

With the large scale of our datasets (potentially 10^8 BBs), we will need a reasonably fast way to benchmark basic blocks. Parallelizing this is an obvious first step. It needs a couple of things implemented on the LLVM side:

  • Shared memory names (used for memory annotations) need to be based on the thread ID in addition to the process ID.
  • There needs to be an option to pin a benchmarking process to a specific core within llvm-exegesis.

(There might be more on the llvm-exegesis side).

Then, we need to do the following:

  • Implement parallel benchmarking using LLVM threading primitives.
  • Validate that running on multiple threads doesn't impact results (using validation counters).
  • Ship it.
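The dispatch side of the plan above can be sketched as follows. This is a hypothetical illustration, not the planned implementation (which would use LLVM threading primitives in C++): `run_exegesis` is a stub standing in for an llvm-exegesis invocation, and the round-robin core assignment is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
import os
import threading


def shared_memory_name() -> str:
  # Per-worker shared memory name: the PID alone is not unique across
  # worker threads, so include the thread ID as well.
  return f"gematria_{os.getpid()}_{threading.get_ident()}"


def run_exegesis(block_hex: str, core: int) -> float:
  # Stub: a real implementation would run llvm-exegesis pinned to `core`
  # and parse the measured inverse throughput from its output.
  return float(len(block_hex))


def benchmark_all(blocks, num_workers=4):
  # Fan the blocks out over a fixed-size pool, one pinned core per worker,
  # preserving input order in the results.
  with ThreadPoolExecutor(max_workers=num_workers) as pool:
    futures = [
        pool.submit(run_exegesis, block, i % num_workers)
        for i, block in enumerate(blocks)
    ]
    return [f.result() for f in futures]


print(benchmark_all(["3b31", "85c044897c2460"]))  # → [4.0, 14.0]
```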

tokens.txt

Hi,

I'm trying to follow the g3doc inference-api.md documentation, but when I run the command I'm missing the /tmp/tokens.txt file. Could you please let me know how to generate this file?

Thanks,
Z

Write comparison script

It would be good to validate that the benchmarking numbers we're getting match previous results (like BHive and uica-eval) to ensure that we aren't doing anything egregiously wrong. To do this we need to do a couple of things:

  • Write a script (probably python) that can compare CSVs in the BHive format and identify (major) discrepancies.
  • Do a benchmarking run using our tooling against one of these datasets.
  • Run the comparison script, observe the results.
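The comparison step could look roughly like this; the 20% relative tolerance is an illustrative assumption, not a decided threshold:

```python
import csv
import io


def read_bhive_csv(f):
  # BHive format: one `hex,inverse_throughput` pair per line.
  return {block_hex: float(cycles) for block_hex, cycles in csv.reader(f)}


def find_discrepancies(ours, reference, rel_tol=0.2):
  # Report blocks present in both datasets whose measurements diverge by
  # more than `rel_tol` relative to the reference value.
  discrepancies = {}
  for block, cycles in ours.items():
    ref = reference.get(block)
    if ref and abs(cycles - ref) / ref > rel_tol:
      discrepancies[block] = (cycles, ref)
  return discrepancies


ours = read_bhive_csv(io.StringIO("3b31,45.0\n85c044897c2460,98.0\n"))
ref = read_bhive_csv(io.StringIO("3b31,44.0\n85c044897c2460,150.0\n"))
print(find_discrepancies(ours, ref))  # → {'85c044897c2460': (98.0, 150.0)}
```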

Bus error in annotator

Using the parallelized annotator:

Bus error (core dumped)

Need to see if this is reproducible and debug why it is happening. It seems this happened in the parent process rather than being a signal received in the child process, which would have been handled through ptrace.

Too many open files error

Failed to annotate block: INTERNAL: Failed to create a pipe for interprocess communication between llvm-exegesis and the benchmarking subprocess: Too many open files

More investigation is needed. Probably an issue on the LLVM side, but opening here first in case there is some complicated interaction.

--blocks_per_json_file flag not working as expected

Using the following CSV, test.csv:

85c044897c2460,98.000000
3b31,45.000000

With the following command line invocation, assuming ./json exists:

./bazel-bin/gematria/datasets/convert_bhive_to_llvm_exegesis_input --json_output_dir=./json --bhive_csv=./test.csv --blocks_per_json_file=1

We get the following in ./json:

0.json  1.json  2.json

Note that we should only get two files.

0.json:

[
  {
    "Hex": "85c044897c2460",
    "MemoryDefinitions": [
      {
        "Name": "MEM",
        "Size": 4096,
        "Value": 305419776
      }
    ],
    "MemoryMappings": [
      {
        "Address": 65536,
        "Value": "MEM"
      }
    ]
  }
]

1.json:

[
  {
    "Hex": "3b31",
    "MemoryDefinitions": [
      {
        "Name": "MEM",
        "Size": 4096,
        "Value": 305419776
      }
    ],
    "MemoryMappings": [
      {
        "Address": 65536,
        "Value": "MEM"
      }
    ]
  }
]

2.json:

[]

We see one of the blocks duplicated (the second block shows up twice), we get an extra file, and the extra file is empty. This needs to be fixed.
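For reference, the intended chunking behavior can be sketched as follows: n blocks with k blocks per file should produce ceil(n / k) files, each non-empty and containing every block exactly once. This is an illustration of the expected semantics, not the tool's actual code:

```python
def chunk_blocks(blocks, blocks_per_file):
  # Split `blocks` into consecutive chunks of at most `blocks_per_file`;
  # the last chunk may be shorter, and no chunk is ever empty.
  return [
      blocks[i:i + blocks_per_file]
      for i in range(0, len(blocks), blocks_per_file)
  ]


print(chunk_blocks(["85c044897c2460", "3b31"], blocks_per_file=1))
# → [['85c044897c2460'], ['3b31']]
```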

Implement benchmarking script

In order to construct large-scale BB datasets, we need a script that can perform these benchmarking runs, taking in annotated basic blocks from the annotation script (most likely in JSON), and then returning them with throughput information.
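The per-block invocation such a script would build might look like this; the flag set shown is a plausible subset of llvm-exegesis options, not the exact set the final script will use:

```python
def exegesis_command(snippet_file: str, mode: str = "inverse_throughput"):
  # Build the argv for one llvm-exegesis run over an annotated snippet
  # file; the caller would execute this with subprocess and parse the
  # reported measurement back into the dataset.
  return [
      "llvm-exegesis",
      f"--mode={mode}",
      f"--snippets-file={snippet_file}",
  ]


print(exegesis_command("/tmp/block0.test"))
```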

Snippet causing remappings of the same address in the exegesis annotator

# LLVM-EXEGESIS-DEFREG EFLAGS 12345600
# LLVM-EXEGESIS-DEFREG RCX 12345600
# LLVM-EXEGESIS-DEFREG RDI 12345600
# LLVM-EXEGESIS-DEFREG RIP 12345600
# LLVM-EXEGESIS-DEFREG XMM2 12345600
# LLVM-EXEGESIS-LOOP-REGISTER RDX
        movzbl  (%rcx), %eax
        movd    %edi, %xmm0
        pshufd  $0, %xmm0, %xmm0
        movdqa  (%rip), %xmm1
        pand    %xmm0, %xmm1
        pand    (%rip), %xmm0
        pxor    %xmm2, %xmm2
        movdqa  %xmm0, %xmm3
        pcmpeqd %xmm2, %xmm3
        movdqa  %xmm1, %xmm4
        pcmpeqd %xmm2, %xmm4
        packssdw        %xmm3, %xmm4
        packsswb        %xmm4, %xmm4
        movdqa  %xmm4, %xmm3
        pandn   (%rip), %xmm3
        movb    %al, (%rip)
        pand    (%rip), %xmm4
        por     %xmm3, %xmm4
        movq    %xmm4, (%rip)
        testb   $1, %dil
        movl    $45, %eax
        movl    $120, %ecx
        cmovel  %eax, %ecx
        movb    %cl, (%rip)
        testl   $2048, %edi

This snippet causes the exegesis annotator to map the same address over and over (though eventually it moves on to another page). It is not clear why this behavior occurs; more investigation is needed.

Add layering check/header parsing to bazel flags

Adding features = ["layering_check"] to all our packages incrementally (and eventually to somewhere common) would be ideal. It would prevent transitive dependency issues and would also ease imports into google3, where this check is enabled by default.
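A minimal per-package sketch (note that layering_check requires a toolchain with module-map support, i.e. clang):

```starlark
# BUILD: enable the check for every target in this package while rolling
# it out incrementally.
package(features = ["layering_check"])
```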

how to generate llvm_mnemonic and memory alias group ID?

Hello, for a sequence of instructions executed on an emulator, how can we generate the right llvm_mnemonic values and memory alias group IDs? In the emulator, I can see the opcode, operands, and memory locations accessed.

Also, can we get rid of llvm_mnemonic entirely? Its information should already be expressed by the opcode and operands. In basic_block.cc, Instruction::AddTokensToList() does not seem to treat llvm_mnemonic as a token either, so getting rid of it should not affect the sonnet.Embed layer, I guess?

Thanks!
Hongbo

Remove the mocking of LayerNormalization in tests

As discussed in the review of #132, we're patching tf_keras.layers.LayerNormalization with itself in some of our tests, e.g. in gematria/granite/python/gnn_model_base_test.py. We should investigate whether this mocking is still needed, and if not, remove it.

convert_bhive_to_llvm_exegesis_inputs tests failing when binary is optimized

I'm observing the following failures:

FAIL: //gematria/datasets/convert_bhive_to_llvm_exegesis_input_tests:conversion.test_lit_test (see /root/.cache/bazel/_bazel_root/5b63f27bc35a3d0572c069ebf1768159/execroot/com_google_gematria/bazel-out/k8-opt/testlogs/gematria/datasets/convert_bhive_to_llvm_exegesis_input_tests/conversion.test_lit_test/test.log)
FAIL: //gematria/datasets/convert_bhive_to_llvm_exegesis_input_tests:loop_register.test_lit_test (see /root/.cache/bazel/_bazel_root/5b63f27bc35a3d0572c069ebf1768159/execroot/com_google_gematria/bazel-out/k8-opt/testlogs/gematria/datasets/convert_bhive_to_llvm_exegesis_input_tests/loop_register.test_lit_test/test.log)
FAIL: //gematria/datasets/convert_bhive_to_llvm_exegesis_input_tests:max_bb_count.test_lit_test (see /root/.cache/bazel/_bazel_root/5b63f27bc35a3d0572c069ebf1768159/execroot/com_google_gematria/bazel-out/k8-opt/testlogs/gematria/datasets/convert_bhive_to_llvm_exegesis_input_tests/max_bb_count.test_lit_test/test.log)

specifically when I run blaze test -c opt .... These failures notably do not appear when running blaze test ....

cannot create event unhalted_core_cycles

Hello, I checked out the latest gematria (5409714) on Ubuntu 22.04.4 LTS in WSL, running on a 12th Gen Intel(R) Core(TM) i7-1270P CPU.

I made no change except:

requirements.in
-tensorflow-probability>=0.19.0
+tensorflow-probability>=0.23.0
-tensorflow>=2.11.0; sys_platform=='linux'
+tensorflow>=2.15.1; sys_platform=='linux'

Then build, and test:

(env) (base) ~/gematria$   env USE_BAZEL_VERSION=6.4.0 ../bazelisk-linux-amd64 test ...
.....
FAIL: //gematria/datasets:find_accessed_addrs_exegesis_test (see /home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/testlogs/gematria/datasets/find_accessed_addrs_exegesis_test/test.log)
[6,387 / 6,636] 22 / 52 tests, 1 failed; 16 actions running; last test: //gematria/llvm/python:canonicalizer_test
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/compiler/plugin_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/any_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/duration_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/descriptor_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/api_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/empty_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/field_mask_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/wrappers_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/source_context_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/struct_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/timestamp_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.
INFO: From ProtoCompile external/com_google_protobuf/python/google/protobuf/type_pb2.py:
external/com_google_protobuf/.: warning: directory does not exist.

The log file of the failed test: /home/hrong1/.cache/bazel/_bazel_hrong1/32246067180bfaeac7e17e4449bcdc84/execroot/com_google_gematria/bazel-out/k8-fastbuild/testlogs/gematria/datasets/find_accessed_addrs_exegesis_test/test.log:

Executing tests from //gematria/datasets:find_accessed_addrs_exegesis_test
-----------------------------------------------------------------------------
Running main() from gmock_main.cc
[==========] Running 5 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 5 tests from FindAccessedAddrsExegesisTest
[ RUN      ] FindAccessedAddrsExegesisTest.ExegesisNoAccess
event not found - cannot create event unhalted_core_cycles
gematria/datasets/find_accessed_addrs_exegesis_test.cc:91: Failure
Value of: static_cast<bool>(AddrsOrErr)
  Actual: false
Expected: true

Any idea? Thanks!
Hongbo

Incorrect canonicalized instruction

Looking at the following block:

basic_block {
  machine_instructions {
    assembly: "\tmovl\t$7, %eax"
    machine_code: "\270\007\000\000\000"
  }
  machine_instructions {
    address: 5
    assembly: "\trep\t\tmovl\t$1, %eax"
    machine_code: "\363\270\001\000\000\000"
  }
  canonicalized_instructions {
    mnemonic: "MOV"
    llvm_mnemonic: "MOV32ri"
    output_operands {
      register_name: "EAX"
    }
    input_operands {
      immediate_value: 7
    }
  }
  canonicalized_instructions {
    mnemonic: "MOV\tEAX,"
    prefixes: "REP"
    llvm_mnemonic: "MOV32ri"
    output_operands {
      register_name: "EAX"
    }
    input_operands {
      immediate_value: 1
    }
  }
}
inverse_throughputs {
  source: "zen2"
  inverse_throughput_cycles: 100.0
}

The mnemonic for the second canonicalized instruction is incorrect: for some reason it also includes the register. This causes issues when trying to train a model, as it results in an out-of-bounds embedding table access that makes the job fail.
