Metadata difference between original LLVM file and produced Graph (ProGraML issue, closed)

cheukwaylee commented on June 11, 2024

Metadata difference between original LLVM file and produced Graph

Comments (5)

cheukwaylee commented on June 11, 2024

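An example to make it clearer: below are the corresponding parts of the .ll and .gexf files. The instructions are the same, but the bitwidth metadata references differ (!257, !189, !190 in the .ll file versus !189, !190, !191 in the graph).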

In the original LLVM IR (.ll file):

  %empty = alloca i32, !bitwidth !257
  %k = alloca i7, !bitwidth !189
  %j = alloca i6, !bitwidth !190

In the produced graph (.gexf file):

      <node id="1216" label="1216">
        <attvalues>
          <attvalue for="0" value="4" />
          <attvalue for="4" value="{'full_text': ['%empty = alloca i32, !bitwidth !189']}" />
          <attvalue for="1" value="3" />
          <attvalue for="2" value="alloca" />
          <attvalue for="3" value="0" />
        </attvalues>
      </node>
      <node id="1217" label="1217">
        <attvalues>
          <attvalue for="0" value="4" />
          <attvalue for="4" value="{'full_text': ['%k = alloca i7, !bitwidth !190']}" />
          <attvalue for="1" value="3" />
          <attvalue for="2" value="alloca" />
          <attvalue for="3" value="0" />
        </attvalues>
      </node>
      <node id="1218" label="1218">
        <attvalues>
          <attvalue for="0" value="4" />
          <attvalue for="4" value="{'full_text': ['%j = alloca i6, !bitwidth !191']}" />
          <attvalue for="1" value="3" />
          <attvalue for="2" value="alloca" />
          <attvalue for="3" value="0" />
        </attvalues>
      </node>
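The !N numbers only mean something relative to the "!N = ..." definitions at the end of the same .ll file, so the two listings cannot be compared by index alone. Below is a minimal sketch that resolves each "!bitwidth !N" in a .ll file to the definition it names, i.e. it embeds the value instead of the index. It works on the IR text with std::regex rather than through the LLVM API. The helper names CollectMetadataDefs and InlineBitwidth are hypothetical, and the sketch assumes one metadata definition per line and at most one !bitwidth attachment per instruction. The thread never shows what !189 actually contains; the !{i32 7} used below is made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: collect numbered metadata definitions such as
// "!189 = !{i32 7}" from LLVM textual IR into a map {189 -> "!{i32 7}"}.
// Named metadata like "!llvm.module.flags = ..." is skipped (no number).
std::map<int, std::string> CollectMetadataDefs(const std::string& module) {
  static const std::regex defRe(R"(^!(\d+) = (.*)$)");
  std::map<int, std::string> defs;
  std::istringstream in(module);
  std::string line;
  while (std::getline(in, line)) {
    std::smatch m;
    if (std::regex_match(line, m, defRe)) {
      defs[std::stoi(m[1].str())] = m[2].str();
    }
  }
  return defs;
}

// Hypothetical helper: replace "!bitwidth !N" with "!bitwidth <definition>"
// so the value, not the index, travels with the instruction text.
std::string InlineBitwidth(const std::string& inst,
                           const std::map<int, std::string>& defs) {
  static const std::regex refRe(R"(!bitwidth !(\d+))");
  std::smatch m;
  if (!std::regex_search(inst, m, refRe)) return inst;  // no attachment
  auto it = defs.find(std::stoi(m[1].str()));
  if (it == defs.end()) return inst;  // unknown index: leave text untouched
  return m.prefix().str() + "!bitwidth " + it->second + m.suffix().str();
}
```

For example, if the module contained the line !189 = !{i32 7}, InlineBitwidth would turn "%k = alloca i7, !bitwidth !189" into "%k = alloca i7, !bitwidth !{i32 7}". This only helps when the instruction text and the definitions come from the same printout; it cannot repair the renumbered references already stored in the .gexf file.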

ChrisCummins commented on June 11, 2024

Hi @cheukwaylee, sorry for the slow reply.

That's an interesting bug. I believe the bitwidth values (!257, !189, etc.) are references to metadata strings. The numbers themselves are only indexes into a list of strings. My guess is that when ProGraML serializes the text for the instructions, the indices change, causing the values to be lost. You may want to consider directly embedding the value for those bitwidths in the full_text, rather than the index. Here are a couple of code pointers:

This is where nodes are constructed from functions, including setting full_text:

ProGraML/programl/ir/llvm/internal/program_graph_builder.cc

Lines 300 to 322 in cd7d293

Node* ProgramGraphBuilder::AddLlvmInstruction(const ::llvm::Instruction* instruction,
                                              const Function* function) {
  const LlvmTextComponents text = textEncoder_.Encode(instruction);
  Node* node = AddInstruction(text.opcode_name, function);
  node->set_block(blockCount_);
  graph::AddScalarFeature(node, "full_text", text.text);

#if PROGRAML_LLVM_VERSION_MAJOR > 3
  // Add profiling information features, if available.
  uint64_t profTotalWeight;
  if (instruction->extractProfTotalWeight(profTotalWeight)) {
    graph::AddScalarFeature(node, "llvm_profile_total_weight", profTotalWeight);
  }
  uint64_t profTrueWeight;
  uint64_t profFalseWeight;
  if (instruction->extractProfMetadata(profTrueWeight, profFalseWeight)) {
    graph::AddScalarFeature(node, "llvm_profile_true_weight", profTrueWeight);
    graph::AddScalarFeature(node, "llvm_profile_false_weight", profFalseWeight);
  }
#endif

  return node;
}

This is the logic for serializing an instruction into the full_text representation:

ProGraML/programl/ir/llvm/internal/text_encoder.cc

Lines 99 to 129 in cd7d293

LlvmTextComponents TextEncoder::Encode(const ::llvm::Instruction* instruction) {
  // Return from cache if available.
  auto it = instruction_cache_.find(instruction);
  if (it != instruction_cache_.end()) {
    return it->second;
  }

  LlvmTextComponents encoded;
  encoded.text = PrintToString(*instruction);
  encoded.opcode_name = instruction->getOpcodeName(instruction->getOpcode());

  const size_t snipAt = encoded.text.find(" = ");

  // An instruction without a LHS.
  if (snipAt == string::npos) {
    instruction_cache_.insert({instruction, encoded});
    return encoded;
  }

  encoded.lhs_type = PrintToString(*instruction->getType());
  encoded.lhs_identifier = encoded.text.substr(0, snipAt);

  std::stringstream instructionName;
  instructionName << encoded.lhs_type << ' ' << encoded.lhs_identifier;
  encoded.lhs = instructionName.str();

  encoded.rhs = encoded.text.substr(snipAt + 3);

  instruction_cache_.insert({instruction, encoded});
  return encoded;
}
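Because Encode() works purely on the printed text, whatever !N the printer emits for an attachment is copied verbatim into full_text (and into rhs). Below is a standalone sketch of just the splitting step, with plain strings in place of LLVM types. The printed instruction and its printed result type are passed in by hand, the cache is left out, and TextComponents and SplitInstructionText are hypothetical stand-ins rather than ProGraML's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Plain-string stand-in for the LlvmTextComponents fields set by Encode().
struct TextComponents {
  std::string text, lhs_type, lhs_identifier, lhs, rhs;
};

// Mirrors the splitting logic of TextEncoder::Encode(): everything before
// " = " is the LHS identifier, everything after it is the RHS.
TextComponents SplitInstructionText(const std::string& text,
                                    const std::string& lhsType) {
  TextComponents encoded;
  encoded.text = text;
  const std::size_t snipAt = text.find(" = ");
  if (snipAt == std::string::npos) return encoded;  // no LHS, e.g. "ret void"
  encoded.lhs_type = lhsType;
  encoded.lhs_identifier = text.substr(0, snipAt);
  encoded.lhs = lhsType + ' ' + encoded.lhs_identifier;
  encoded.rhs = text.substr(snipAt + 3);  // skip the 3 chars of " = "
  return encoded;
}
```

For example, splitting "%k = alloca i7, !bitwidth !190" with the type string "i7*" gives lhs_identifier "%k", lhs "i7* %k", and rhs "alloca i7, !bitwidth !190". An instruction with no " = ", such as "ret void", leaves the LHS fields and rhs empty.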

Let me know if that helps.

Cheers,
Chris

cheukwaylee commented on June 11, 2024

Hi @ChrisCummins, many thanks for your kind feedback; it helps me a lot! At least for now I can make some modifications to ProGraML and recompile it.

You reminded me that something like !257 is nothing but a reference to a metadata node. We suppose that !bitwidth !257 describes how complex the LLVM instruction is, so it should be treated as a node feature.

What we want is to extract the metadata info. I got the MDNode using the following call with the metadata kind "bitwidth". Unfortunately, I don't know how to convert it to a simple string and then pass it to ProGraML (as a node attribute).

::llvm::MDNode* MDN = instruction->getMetadata("bitwidth");

After reading the blog post http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html, I learned that an MDNode is a tuple that can reference arbitrary LLVM IR values in the program as well as other metadata, so it should be extracted recursively. Each MDNode can have many operands. My draft code is below, but it doesn't work :(

int numOperand = MDN->getNumOperands();
for (size_t i = 0; i < numOperand; i++) {
  const ::llvm::MDOperand& MDOpRef = MDN->getOperand(i);
  ::llvm::MDString* MDStr = ::llvm::cast<::llvm::MDString>(MDOpRef);
  ::llvm::StringRef StrRef = MDStr->getString();

  encoded.bitwidthAttr += StrRef.str(); // add to ProGraML
}
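One likely reason the draft fails: llvm::cast<llvm::MDString> asserts (in builds with assertions enabled) whenever an operand is not an MDString, for instance if the tuple holds an integer constant or a nested MDNode. Also, getMetadata("bitwidth") returns nullptr for instructions without that attachment. With the real API, the usual pattern is to test each operand with llvm::dyn_cast<llvm::MDString>, llvm::mdconst::dyn_extract<llvm::ConstantInt>, or llvm::dyn_cast<llvm::MDNode> and recurse into nested nodes. The sketch below does not use LLVM at all; it models an operand as a hypothetical variant of string, integer, or nested tuple to show the dispatch-and-recurse shape. Its output only loosely mimics the !{...} syntax (integers are printed without their type).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <variant>
#include <vector>

struct Tuple;  // hypothetical stand-in for an MDNode

// One operand: a string (like MDString), an integer constant
// (like a ConstantInt wrapped as metadata), or a nested tuple.
using Operand = std::variant<std::string, long long, std::shared_ptr<Tuple>>;

struct Tuple {
  std::vector<Operand> operands;
};

std::string ToString(const Tuple& t);

// Dispatch on the operand kind instead of assuming every operand is a string.
std::string OperandToString(const Operand& op) {
  if (const auto* s = std::get_if<std::string>(&op)) return "!\"" + *s + "\"";
  if (const auto* i = std::get_if<long long>(&op)) return std::to_string(*i);
  const auto& nested = std::get<std::shared_ptr<Tuple>>(op);
  return nested ? ToString(*nested) : "null";  // recurse into nested tuples
}

// Recursively flatten a tuple into "!{op0, op1, ...}".
std::string ToString(const Tuple& t) {
  std::string out = "!{";
  for (std::size_t i = 0; i < t.operands.size(); ++i) {
    if (i > 0) out += ", ";
    out += OperandToString(t.operands[i]);
  }
  return out + "}";
}
```

For example, a tuple holding the string "width" and the integer 7 flattens to !{!"width", 7}, and an outer tuple holding 32 and that tuple flattens to !{32, !{!"width", 7}}. The resulting string is what would then be handed to ProGraML as a node feature.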

I also found a repo, https://github.com/madhur13490/LLVM-Metadata-Visualizer, that visualizes metadata as an undirected graph to show its internal reference relationships. However, it seems to be implemented through text analysis. I believe it would be better to extract this through the LLVM library interface.

Once I can figure out the reference relationships of every metadata node, I can convert them to a string, pass it to ProGraML, and eventually store it as the node's feature.

Any suggestions would be highly appreciated!!!

ChrisCummins commented on June 11, 2024

> What we want is to extract the metadata info. I got the MDNode using the following call with the metadata kind "bitwidth". Unfortunately, I don't know how to convert it to a simple string and then pass it to ProGraML (as a node attribute).

That sounds like a good plan. I can't help you with the MDNode to string conversion (it may be worth asking on the LLVM Discourse), but once you have the string, you could add it to the ProGraML graph as a node feature:

Node* ProgramGraphBuilder::AddLlvmInstruction(const ::llvm::Instruction* instruction,
                                              const Function* function) {
  const LlvmTextComponents text = textEncoder_.Encode(instruction);
  Node* node = AddInstruction(text.opcode_name, function);
  // ...
  const std::string bitwidthAsString = /* implement this bit */
  graph::AddScalarFeature(node, "bitwidth", bitwidthAsString);
  // ...
}

Cheers,
Chris

cheukwaylee commented on June 11, 2024

Thanks for your kind answer! I will have a look at the LLVM side.

Merry Christmas and happy New Year!
Zhuowei
