
Generate FileDescriptorProto accessors (quickbuffers, closed)

PeterJohnson commented on June 22, 2024

Generate FileDescriptorProto accessors

Comments (18)

ennerf commented on June 22, 2024

Sorry, I forgot to reply earlier.

I was thinking of generating the FileDescriptorProto data in the parent wrapper file. I think the bytes for the nested message descriptors are all contained inside it, so it should be possible to access each message descriptor by offset and length. Each message then only needs to generate a small wrapper and a String for its full identifier.
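To make the offset/length idea concrete, here is a minimal sketch that assumes the outer class holds one serialized FileDescriptorProto as a byte[]. The names DescriptorSlices and Slice are hypothetical, not quickbuffers API. It walks the top-level protobuf wire format and records where each message_type entry (field 4) starts and how long it is, without copying any bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not quickbuffers API): locates nested message descriptors by offset/length.
public class DescriptorSlices {
    /** Location of one DescriptorProto inside a serialized FileDescriptorProto. */
    public record Slice(int offset, int length) {}

    /** Reads a base-128 varint at pos[0] and advances pos[0] past it. */
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
    }

    /** Returns offset/length of every top-level message_type (field 4) entry. */
    public static List<Slice> findMessageTypes(byte[] file) {
        List<Slice> out = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < file.length) {
            long tag = readVarint(file, pos);
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 0x7);
            switch (wireType) {
                case 0 -> readVarint(file, pos);            // varint: skip the value
                case 1 -> pos[0] += 8;                      // fixed64
                case 2 -> {                                 // length-delimited
                    int len = (int) readVarint(file, pos);
                    if (field == 4) {
                        out.add(new Slice(pos[0], len));    // bytes stay in the shared array
                    }
                    pos[0] += len;
                }
                case 5 -> pos[0] += 4;                      // fixed32
                default -> throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-encoded FileDescriptorProto: name = "a.proto", message_type { name = "Foo" }
        byte[] file = {0x0A, 7, 'a', '.', 'p', 'r', 'o', 't', 'o',
                       0x22, 5, 0x0A, 3, 'F', 'o', 'o'};
        System.out.println(findMessageTypes(file)); // prints [Slice[offset=11, length=5]]
    }
}
```

Nested types live inside each DescriptorProto (its nested_type field, number 3), so reaching them would mean running the same scan again within a slice.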

The FileDescriptor could potentially also provide a List or Map of all identifiers. Creating a reduced version of the Google descriptor messages, with unknown-field storage enabled, should work as well.

ennerf commented on June 22, 2024

The combination does get a bit messy from an API perspective. Could you generate a descriptor set file with protoc --descriptor_set_out=<filename>.desc and load it as a resource at runtime?

```shell
protoc --include_imports --descriptor_set_out=<filename>.desc <file1>.proto <file2>.proto <file3>.proto
```
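A minimal Java sketch of the runtime side, assuming the .desc file is packaged on the classpath; DescriptorSetLoader, loadResource, and splitFiles are hypothetical names. A FileDescriptorSet is just `repeated FileDescriptorProto file = 1`, so it can be split into per-file byte arrays without any generated descriptor classes. With --include_imports, protoc also writes every transitive dependency into the set.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: loads a protoc descriptor set and splits it into per-file entries.
public class DescriptorSetLoader {
    /** Reads a classpath resource (e.g. a hypothetical "/app.desc") fully into memory. */
    public static byte[] loadResource(String name) throws IOException {
        try (InputStream in = DescriptorSetLoader.class.getResourceAsStream(name)) {
            if (in == null) {
                throw new FileNotFoundException(name);
            }
            return in.readAllBytes();
        }
    }

    /** Reads a base-128 varint at pos[0] and advances pos[0] past it. */
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
    }

    /** Splits a serialized FileDescriptorSet into serialized FileDescriptorProto entries. */
    public static List<byte[]> splitFiles(byte[] set) {
        List<byte[]> files = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < set.length) {
            long tag = readVarint(set, pos);
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 0x7);
            switch (wireType) {
                case 0 -> readVarint(set, pos);
                case 1 -> pos[0] += 8;
                case 2 -> {
                    int len = (int) readVarint(set, pos);
                    if (field == 1) { // repeated FileDescriptorProto file = 1
                        files.add(Arrays.copyOfRange(set, pos[0], pos[0] + len));
                    }
                    pos[0] += len; // other fields are skipped
                }
                case 5 -> pos[0] += 4;
                default -> throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return files;
    }
}
```

Usage would be along the lines of splitFiles(loadResource("/app.desc")), where the resource path is only an example.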

PeterJohnson commented on June 22, 2024

To give a bit more detail... my use case is the following:

- We provide a library that provides classes and protobuf serialization for those classes, plus hooks for serializing protobuf-serializable classes to the network (or to log files). We also provide applications on the other side of the network/log files to view what's been sent.
- Third parties provide libraries with their own classes and protobufs for those classes; these protobufs can depend on protobufs defined in our library.
- End users create applications and may or may not create their own protobufs (most do not). We want them to be able to transparently send either our classes or the third-party classes over the network. The end-user API needs to be extremely easy to use and somewhat dynamic: the end user shouldn't have to declare globally in their code which classes they're going to send. They just send them, and the base library handles publishing the descriptor under the hood.
- In the network case, multiple applications built and distributed by different parties (e.g. both third parties and end users) can connect and send data to each other. There is no guarantee of a common, complete set of protobufs across these applications, since they're built at different times by different people. There will be overlap (common individual protobufs from the base library or third-party libraries), but the complete set will not be common to all applications on the network. The network is pub/sub; individual topics have a type string and can be published or subscribed to by any application that knows how to handle that type string.

So the problem is how to publish the descriptor data from the applications over the network (using the API provided by the base library) such that it is dynamically introspectable by tools.

We could have the build system for each application generate a file descriptor set for all of the descriptors for all of the libraries it uses (and anything the application itself has) and then publish that to the network to a unique location per application. It can't only publish the descriptors it actually uses because there's no way to get that information at build time or at runtime (with QB). This will result in duplication (as application 1 and application 2 will both publish file descriptor sets that contain base library descriptors, for example), and I'm not sure exactly how big the complete descriptor set is going to get (and this size is of course multiplied by the number of applications running). Tools will need to maintain separate descriptor databases for each application and figure out which descriptor to use from which database for a given type string, but that's relatively easy to do.
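On the tool side, the per-application descriptor databases described above might look roughly like this minimal sketch. DescriptorDatabases and its methods are hypothetical, and type strings are treated as opaque keys. Because a tool may not know which application published a given value, lookup returns every candidate, and hasConflict flags type strings whose published descriptors differ between applications.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical tool-side registry: one descriptor database per publishing application.
public class DescriptorDatabases {
    // application id -> (type string -> serialized descriptor)
    private final Map<String, Map<String, byte[]>> byApp = new TreeMap<>();

    public void put(String app, String typeString, byte[] descriptor) {
        byApp.computeIfAbsent(app, k -> new LinkedHashMap<>()).put(typeString, descriptor);
    }

    /** Drops everything one application published, e.g. when it disconnects. */
    public void removeApp(String app) {
        byApp.remove(app);
    }

    /** Every application that published a descriptor for this type string, sorted by app id. */
    public Map<String, byte[]> candidates(String typeString) {
        Map<String, byte[]> out = new TreeMap<>();
        byApp.forEach((app, types) -> {
            byte[] d = types.get(typeString);
            if (d != null) {
                out.put(app, d);
            }
        });
        return out;
    }

    /** True if at least two applications published different bytes for the same type string. */
    public boolean hasConflict(String typeString) {
        byte[] first = null;
        for (byte[] d : candidates(typeString).values()) {
            if (first == null) {
                first = d;
            } else if (!Arrays.equals(first, d)) {
                return true;
            }
        }
        return false;
    }

    /** Picks one candidate deterministically: the one from the lowest app id. */
    public Optional<byte[]> pick(String typeString) {
        return candidates(typeString).values().stream().findFirst();
    }
}
```

Picking the lowest application id is an arbitrary but stable rule; a real tool might prefer the most recent publisher instead.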

If each generated file provides access to its filename, file descriptor proto, and dependencies, we can walk this tree at runtime to either publish file descriptors individually or build and publish a file descriptor set (although that may not be possible given just a byte[] per file descriptor), this time with only the file descriptors that the particular application actually uses at runtime. My original thought was to publish file descriptors to a common location, indexed only by file name. The downside is that different applications could publish different versions of the same file and conflict with each other in a way tools can't discover (for debugging purposes: the tools don't know which application published a particular typed value, so they have to pick one). So it may be better to make this publishing unique per application as well (effectively a per-application file descriptor set). The main gain of this approach is then avoiding duplicate copies of all the library file descriptors, most of which go unused, since most applications use only a tiny subset. It also makes conflicts between file descriptors substantially less likely, because each application publishes only the files it actually uses.
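On the byte[] question: FileDescriptorSet is just `repeated FileDescriptorProto file = 1`, and an embedded message is encoded as its own serialized bytes prefixed by a tag and a length. So a set can be assembled from per-file byte[] values without parsing them. A minimal sketch follows (DescriptorSetWriter and toFileDescriptorSet are hypothetical names); it does not deduplicate or reorder entries, so the caller has to pass each file once.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical helper: wraps serialized FileDescriptorProto bytes into a FileDescriptorSet.
public class DescriptorSetWriter {
    /** Writes an unsigned base-128 varint. */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Encodes each entry as field 1 (wire type 2) of a FileDescriptorSet. */
    public static byte[] toFileDescriptorSet(List<byte[]> files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] file : files) {
            writeVarint(out, (1 << 3) | 2); // tag 0x0A: field 1, length-delimited
            writeVarint(out, file.length);
            out.write(file, 0, file.length); // message bytes copied verbatim
        }
        return out.toByteArray();
    }
}
```

The result should parse as a regular FileDescriptorSet in any protobuf implementation, since it is byte-for-byte what a generated serializer would emit for those entries.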

PeterJohnson commented on June 22, 2024

Thinking about this more, I had an idea: generate the FileDescriptorSet at build time, load it at runtime, and also manipulate it at runtime by extracting only the FileDescriptorProtos we care about. To avoid pulling in the entire upstream Google descriptors, I could have "lightweight" versions of those protobufs (e.g. my own version with only some of the fields defined). As long as store_unknown_fields=true, can I use QB to parse the "lightweight" FileDescriptorSet/FileDescriptorProto and output them intact, either individually or as a new FileDescriptorSet? I still need to think through how the generation process will work in the multiple-library-and-user-builds scenario, and whether for that reason it might still be beneficial to embed the descriptors in the generated code instead.
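A different route from lightweight generated messages, sketched under the assumption that each FileDescriptorProto is already available as raw bytes (for example, split out of the loaded set): read only its name field (field 1) and pass the selected entries through untouched, so unknown fields are preserved trivially. DescriptorFilter, fileName, and select are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: selects FileDescriptorProto entries by file name without full parsing.
public class DescriptorFilter {
    /** Reads a base-128 varint at pos[0] and advances pos[0] past it. */
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
    }

    /** Returns the name (field 1) of a serialized FileDescriptorProto, if present. */
    public static Optional<String> fileName(byte[] file) {
        int[] pos = {0};
        while (pos[0] < file.length) {
            long tag = readVarint(file, pos);
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 0x7);
            switch (wireType) {
                case 0 -> readVarint(file, pos);
                case 1 -> pos[0] += 8;
                case 2 -> {
                    int len = (int) readVarint(file, pos);
                    if (field == 1) {
                        return Optional.of(new String(file, pos[0], len, StandardCharsets.UTF_8));
                    }
                    pos[0] += len;
                }
                case 5 -> pos[0] += 4;
                default -> throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return Optional.empty();
    }

    /** Keeps only wanted files; the byte arrays are passed through unchanged. */
    public static List<byte[]> select(List<byte[]> files, Set<String> wanted) {
        List<byte[]> out = new ArrayList<>();
        for (byte[] f : files) {
            if (fileName(f).filter(wanted::contains).isPresent()) {
                out.add(f);
            }
        }
        return out;
    }
}
```

The selected entries can then be published individually, or wrapped back into a FileDescriptorSet by prefixing each with the tag byte 0x0A and its varint length.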

ennerf commented on June 22, 2024

FYI, I just finished a large project I was working on. I need to take care of a few smaller high-priority items, but I should hopefully be able to get to this early next week.

PeterJohnson commented on June 22, 2024

Any update on this?

ennerf commented on June 22, 2024

Sorry, it got delayed a lot and I'm still a bit confused about the requirements.

From what I saw in the protobuf-java API I think you are looking for equivalents for the two methods:

```java
String fullName = MyProtoType.getDescriptor().getFullName();
byte[] fileDescriptor = MyProtoType.getDescriptor().getFile().toProto().toByteArray();
```

but I wonder how that works with protos that are defined in other files? I saw a MyProtoType.getDescriptor().getFile().getDependencies(), but how would those be serialized? Is there some top-level descriptor message that includes everything? Can you provide a code snippet that shows how you would write the descriptor with the official API?

PeterJohnson commented on June 22, 2024

What I'm currently doing in C++ is the following. Note that I'm not actually publishing a FileDescriptorSet as such; instead, I publish (via the callback function fn) the serialized FileDescriptorProto of each file in the whole dependency tree, starting from the proto's own file descriptor.

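```cpp
#include <cstdint>
#include <span>
#include <string_view>
#include <vector>

#include <fmt/format.h>
#include <google/protobuf/arena.h>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/descriptor.pb.h>
#include <google/protobuf/message.h>

// function_ref and detail::SerializeProtobuf come from the project's own headers.

using google::protobuf::Arena;
using google::protobuf::FileDescriptor;
using google::protobuf::FileDescriptorProto;

// Emits each file in the dependency tree, dependencies before dependents.
static void ForEachProtobufDescriptorImpl(
    const FileDescriptor* desc,
    function_ref<bool(std::string_view filename)> wants,
    function_ref<void(std::string_view typeString,
                      std::span<const uint8_t> schema)>
        fn,
    Arena* arena) {
  // Skip this file (and its dependencies) if the caller doesn't want it.
  if (!wants(desc->name())) {
    return;
  }
  for (int i = 0, ndep = desc->dependency_count(); i < ndep; ++i) {
    ForEachProtobufDescriptorImpl(desc->dependency(i), wants, fn, arena);
  }
  FileDescriptorProto* descproto =
      Arena::CreateMessage<FileDescriptorProto>(arena);
  descproto->Clear();  // CopyTo() expects an empty target
  desc->CopyTo(descproto);
  std::vector<uint8_t> buf;
  detail::SerializeProtobuf(buf, *descproto);
  // Arena-owned messages are freed with the arena; only heap-allocated ones may be deleted.
  if (arena == nullptr) {
    delete descproto;
  }
  fn(fmt::format("proto:{}", desc->name()), buf);
}

void detail::ForEachProtobufSchema(
    const google::protobuf::Message& msg,
    function_ref<bool(std::string_view filename)> wants,
    function_ref<void(std::string_view filename,
                      std::span<const uint8_t> descriptor)>
        fn) {
  ForEachProtobufDescriptorImpl(msg.GetDescriptor()->file(), wants, fn,
                                msg.GetArena());
}
```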
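For comparison, a hedged Java sketch of the same traversal, written against a hypothetical minimal interface (FileInfo with getName, toProtoBytes, getDependencies) rather than any existing quickbuffers API. A visited set stands in for the wants callback, so each file is emitted once and always after its dependencies.

```java
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical Java counterpart of ForEachProtobufDescriptorImpl.
public class SchemaWalker {
    /** Hypothetical minimal view of a generated file descriptor (not an existing API). */
    public interface FileInfo {
        String getName();
        byte[] toProtoBytes();
        List<FileInfo> getDependencies();
    }

    /** Simple in-memory FileInfo, for illustration only. */
    public record SimpleFile(String name, byte[] bytes, List<FileInfo> deps) implements FileInfo {
        @Override public String getName() { return name; }
        @Override public byte[] toProtoBytes() { return bytes; }
        @Override public List<FileInfo> getDependencies() { return deps; }
    }

    /** Emits every file reachable from `file`, dependencies first, each at most once. */
    public static void forEachSchema(FileInfo file, Set<String> published,
                                     BiConsumer<String, byte[]> fn) {
        if (!published.add(file.getName())) {
            return; // already emitted by this application
        }
        for (FileInfo dep : file.getDependencies()) {
            forEachSchema(dep, published, fn);
        }
        fn.accept("proto:" + file.getName(), file.toProtoBytes());
    }
}
```

Keeping the published set per application matches the per-application publishing discussed earlier in the thread; a shared set would suppress files another application already sent.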

ennerf commented on June 22, 2024

Thanks. Would an API like below (reduced from protobuf-java) work?

```java
class SomeGeneratedMessage extends ProtoMessage {
  public static Descriptor getDescriptor();
}

interface Descriptor {
  FileDescriptor getFile();
  String getName();
  String getFullName();
  byte[] toProtoBytes();
}

interface FileDescriptor {
  String getName();
  String getFullName();
  String getPackage();
  byte[] toProtoBytes();
  List<FileDescriptor> getDependencies();
}
```

PeterJohnson commented on June 22, 2024

That looks great!

ennerf commented on June 22, 2024

I implemented an initial version of the API above. The generated code currently looks like this: https://gist.github.com/ennerf/222e68f6b6ac5fb2600c58ec35804457

with each message class generating an accessor like this:

```java
public static class MessageSetCorrectExtension2 {
    // ...
    public static Descriptors.Descriptor getDescriptor() {
        return AllTypesOuterClass.internal_static_quickbuf_unittest_TestAllTypes_MessageSetCorrectExtension2_descriptor;
    }
    // ...
}
```

ennerf commented on June 22, 2024

I also added FileDescriptor::getAllContainedTypes and FileDescriptor::getAllKnownTypes so it's easier to create a lookup table of all types and their dependencies. Please double-check PR #57.

I removed the gpg requirement, so you should be able to run mvn clean install for local testing.

PeterJohnson commented on June 22, 2024

Thanks! I'll try it out this weekend.

PeterJohnson commented on June 22, 2024

~~So mvn clean install builds the Java artifacts, but how do I build the matching protoc-gen-quickbuf?~~

Never mind, figured it out.

ennerf commented on June 22, 2024

Sorry, I just got back from a conference. Do you have any more suggestions or is the PR state good as is?

PeterJohnson commented on June 22, 2024

It’s good as is for what I need. Thanks!

ennerf commented on June 22, 2024

Thanks for verifying. I'll get a release out soon.

ennerf commented on June 22, 2024

Version 1.3.2 is on Maven Central.
