Transilvania JUG



Deserializing arbitrary messages with Google Protocol Buffers

Google Protocol Buffers (protobuf for short) is a data (de)serialization library with good speed, compact output and cross-platform support (it supports Java, C++ and Python out of the box). It can also handle different versions of the same message (as in: old and new code running side by side) if its rules are followed. There is one corner case which it doesn’t handle well: parsing arbitrary messages (one example where you might need this is a logging daemon which sits on your network and logs all the messages going by for later debugging). The reason is that in protobuf the data travels without its metadata (i.e. the description of the message format: number of fields, their types, etc.) and it assumes that the metadata is present on both ends (it can be different versions of the metadata, though).
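To make the "data travels without its metadata" point concrete, here is a minimal sketch that walks raw protobuf bytes with no schema at all. It relies only on the documented wire format (a varint key holding the field number and wire type, followed by the value). The class name `WireDump` and its methods are made up for this illustration, and it only handles varint and length-delimited fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Walks raw protobuf bytes without a schema: only field numbers and wire types are visible. */
final class WireDump {

    /** Reads one base-128 varint starting at pos[0] and advances pos[0] past it. */
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            if (pos[0] >= data.length || shift > 63) {
                throw new IllegalArgumentException("malformed varint");
            }
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift; // low 7 bits carry the payload
            if ((b & 0x80) == 0) {                // high bit clear: last byte
                return result;
            }
            shift += 7;
        }
    }

    /** Describes every top-level field it can find; names and exact types are unknowable here. */
    static List<String> describe(byte[] data) {
        List<String> out = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            long key = readVarint(data, pos);
            long fieldNumber = key >>> 3;
            int wireType = (int) (key & 0x7);
            if (wireType == 0) {
                // Could be int32, int64, bool, enum...: the wire does not say which.
                out.add("field=" + fieldNumber + " wireType=0 value=" + readVarint(data, pos));
            } else if (wireType == 2) {
                // Could be a string, raw bytes or a nested message.
                int length = (int) readVarint(data, pos);
                if (length < 0 || length > data.length - pos[0]) {
                    throw new IllegalArgumentException("truncated field");
                }
                out.add("field=" + fieldNumber + " wireType=2 length=" + length);
                pos[0] += length;
            } else {
                throw new IllegalArgumentException("wire type " + wireType + " not handled in this sketch");
            }
        }
        return out;
    }
}
```

For the bytes `08 96 01`, `describe` reports `field=1 wireType=0 value=150`: you learn that field 1 holds a varint, but not what the field is called or whether it is an int32, a bool or an enum. That missing piece is exactly the metadata.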

To parse an arbitrary message you need access to its metadata; there is no way around it. What I would like to present in this post is a simplification of the usual solution (generating a “.desc” file – which involves). The solution works entirely at runtime, and no extra build steps are needed after the initial setup. This is how it works:

We set up a generic “Metadata” structure. This is the only thing that needs to be shared between the communicating parties:

package tutorial;

option java_package = "com.example.protoctest";
option java_outer_classname = "MetadataProtos";

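message Metadata {
  // Serialized descriptor (the description of the message format)
  required bytes field_descriptor = 1;
  // Index locating the message type inside that descriptor
  optional int32 field_descriptor_idx = 2;
  // Descriptors this one depends on, nested recursively
  repeated Metadata dependencies = 3;
}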

Next, we populate this structure for the message we want to send. You can look at the actual code for details. The most important aspect is that this needs to be done recursively, since protobuf allows types to be nested (although it does not support inheritance).
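The recursion can be sketched without the protobuf library. The version below is a plain-Java model, not the code from the post: `ProtoFile` and the Java `Metadata` class are hypothetical stand-ins, and the way the fields are filled in is my reading of the field names. With protobuf-java the descriptor bytes would come from something like `FileDescriptor.toProto()`, the imports from `FileDescriptor.getDependencies()`, and the result would go into the generated `MetadataProtos.Metadata` builder.

```java
import java.util.ArrayList;
import java.util.List;

final class MetadataPacker {

    /** Hypothetical stand-in for a compiled .proto file plus the files it imports. */
    static final class ProtoFile {
        final byte[] serializedDescriptor;
        final List<ProtoFile> imports = new ArrayList<>();

        ProtoFile(byte[] serializedDescriptor) {
            this.serializedDescriptor = serializedDescriptor;
        }
    }

    /** Plain-Java mirror of the Metadata message defined above. */
    static final class Metadata {
        final byte[] fieldDescriptor;
        final int fieldDescriptorIdx; // -1 models "optional field not set"
        final List<Metadata> dependencies = new ArrayList<>();

        Metadata(byte[] fieldDescriptor, int fieldDescriptorIdx) {
            this.fieldDescriptor = fieldDescriptor;
            this.fieldDescriptorIdx = fieldDescriptorIdx;
        }
    }

    /** Packs a file and, recursively, everything it imports. */
    static Metadata pack(ProtoFile file, int messageIndex) {
        Metadata metadata = new Metadata(file.serializedDescriptor, messageIndex);
        for (ProtoFile imported : file.imports) {
            // Assumption: only the root needs a message index, so imports leave it unset.
            metadata.dependencies.add(pack(imported, -1));
        }
        return metadata;
    }

    /** Counts the descriptors in a packed tree, handy as a sanity check. */
    static int size(Metadata metadata) {
        int total = 1;
        for (Metadata dependency : metadata.dependencies) {
            total += size(dependency);
        }
        return total;
    }
}
```

If a hypothetical `person.proto` imports `address.proto`, packing the first yields a tree of two descriptors, and the receiver gets everything it needs in a single `Metadata` value.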

Now we can ship the metadata and the data over to the client. Each communication channel/topic probably carries only a limited set of message types, so you won’t need to waste bandwidth by sending the metadata with every message (you can send it once per channel, or maybe every time a new client joins or the message version changes, etc.).
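One possible way to decide when the metadata has to go out again is to remember, per channel, which version was announced last. This is only a sketch under my own assumptions: the class `MetadataAnnouncer`, the string channel names and the string version labels are all invented for the example, and a real system would have to pick its own notion of "version".

```java
import java.util.HashMap;
import java.util.Map;

/** Tracks, per channel, which metadata version the listeners have already been sent. */
final class MetadataAnnouncer {

    private final Map<String, String> announcedVersion = new HashMap<>();

    /**
     * Returns true when the metadata must be sent before the next message,
     * i.e. on first use of the channel or when the version has changed.
     */
    boolean needsMetadata(String channel, String version) {
        String previous = announcedVersion.put(channel, version);
        return !version.equals(previous);
    }

    /** A newly joined client has seen nothing yet, so forget what was announced. */
    void clientJoined(String channel) {
        announcedVersion.remove(channel);
    }
}
```

The sender would call `needsMetadata` before each message and prepend the serialized metadata only when it returns true, which keeps the steady-state traffic down to the bare message bytes.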

At the client you can deserialize the metadata and use it together with DynamicMessage to reconstruct the contents of the message. Again, you can see the full details in the code.
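To show what having the metadata buys the client, here is a toy, library-free decoder that takes a field table (field number to name) and uses it to turn raw bytes into named values. It is not DynamicMessage and not the code from the post: `ToyDynamicDecoder` is a made-up name, the field table stands in for a real descriptor, and the sketch assumes every length-delimited field is a UTF-8 string. In protobuf-java the equivalent step would rebuild the descriptor with `FileDescriptor.buildFrom(...)` and then call `DynamicMessage.parseFrom(descriptor, bytes)`.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class ToyDynamicDecoder {

    /** Reads one base-128 varint starting at pos[0] and advances pos[0] past it. */
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            if (pos[0] >= data.length || shift > 63) {
                throw new IllegalArgumentException("malformed varint");
            }
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    /** Decodes the message, naming each field with the table that arrived as metadata. */
    static Map<String, Object> decode(Map<Integer, String> fieldNames, byte[] data) {
        Map<String, Object> message = new LinkedHashMap<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            long key = readVarint(data, pos);
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            // Fields missing from the table are kept under a placeholder name.
            String name = fieldNames.containsKey(fieldNumber)
                    ? fieldNames.get(fieldNumber)
                    : "unknown_" + fieldNumber;
            if (wireType == 0) {
                message.put(name, readVarint(data, pos)); // stored as a Long
            } else if (wireType == 2) {
                int length = (int) readVarint(data, pos);
                if (length < 0 || length > data.length - pos[0]) {
                    throw new IllegalArgumentException("truncated field");
                }
                // Assumption of this sketch: length-delimited means UTF-8 text.
                message.put(name, new String(data, pos[0], length, StandardCharsets.UTF_8));
                pos[0] += length;
            } else {
                throw new IllegalArgumentException("wire type " + wireType + " not handled in this sketch");
            }
        }
        return message;
    }
}
```

With the table `{1: "id", 2: "name"}` and the bytes `08 96 01 12 07 74 65 73 74 69 6e 67`, the result maps `id` to 150 and `name` to "testing". A logging daemon that knows nothing about the message at compile time can therefore still print it in a readable form.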

Hope this helps somebody facing a similar need! The code can also be used as a reference for how to use the maven-protoc plugin. Finally, a word of caution: make sure that your proto compiler (protoc) and the runtime libraries match as closely as possible in version (for example protoc 2.4.0 and protobuf-java 2.4.1 is OK, but 2.2.0 wouldn’t be), otherwise you will get some really hard to track down compile-time / runtime errors.

2 Responses to Deserializing arbitrary messages with Google Protocol Buffers

  1. Hitesh T says:

    Could not access the example code mentioned (hype-free).

    401: Anonymous caller does not have storage.objects.get access to google-code-archive/v2/
