Protocol Buffers: Display non-present fields with TextFormat

Recently I started evaluating Google’s Protocol Buffers library, also known as protobuf, for transferring data between different processes (e.g. a Java application communicating with a C++ application) and between layers of the same process (e.g. C#/C++ interoperability).

Protocol Buffers

In protobuf you define your message interface in a separate text file and invoke the protobuf compiler to generate access, query and serialization code for your target language. Besides type and field name, you can specify a default value for each message field you define. The default is returned when the field has not been explicitly set by the user. Messages can be serialized in a binary format and in a text format. The text format is similar to JSON and quite well suited (IMHO) for debugging or application configuration purposes.

Problem Description

The google::protobuf::TextFormat class skips all non-present fields when formatting a google::protobuf::Message. For example, given the following proto definition:

message Outer {

  message Inner {
    required int64 i = 1 [default = 5];
    optional string str = 2 [default = "hello world"];
  }

  enum Types {
    A = 0;
    B = 1;
    C = 2;
  }

  required Inner inner = 1;
  required float f = 2; // no default
  optional Types t = 3 [default = B];
}

google::protobuf::TextFormat behaves as follows:

BOOST_AUTO_TEST_CASE(output_empty)
{
  namespace pb = google::protobuf;

  Outer o;
  o.set_f(1.0f);
  o.mutable_inner()->set_i(5);

  std::string formatted;
  pb::TextFormat::PrintToString(o, &formatted);

  std::cout << formatted << std::endl;
  /*
   inner {
     i: 5
   }
   f: 1
  */
}

Only the fields that were explicitly assigned have been printed to the formatted string. Wouldn’t it be nice, especially for debugging or for generating default configuration files, to print all fields and use the defaults for those that have not been set?

A couple of people have requested this feature in the past, and some have provided quite large patches for google::protobuf::TextFormat. Unfortunately, Google will not implement this feature, but rather recommends:

[...] So, what I generally tell people who want to change TextFormat is to write your own class similar to TextFormat that implements your alternative encoding.

Reflection based Solution

I have come up with my own solution to this problem, which uses the reflection API that protobuf provides. By recursively traversing a google::protobuf::Message object, we can apply default values to fields that are currently non-present, thus making them present.

As far as I know, default values can be applied to scalar types (int, float, …), enumerations and strings (this includes bytes). If a default value is not explicitly set in the proto file, protobuf’s type-specific default values are used: zero for numeric types (false for booleans), the first enumeration entry for enumerations and the empty string for string types.

Without further ado, here’s the source code for my apply_protobuf_defaults function in C++:


#include <google/protobuf/message.h>
#include <google/protobuf/descriptor.h>

/** Apply default values to fields where fields are non-present
  *
  * \note This utility was designed to allow all message fields to be printed 
  *       in text format even if some of the fields haven't been set. The default 
  *       behaviour of google's TextFormat class is to skip fields without value.
  *       
  * \note The utility will preserve fields that already have values set.
  *
  * \param m message to apply defaults recursively
  * \param apply_no_default apply protocol buffer defaults to fields that 
  *                         have no user specified default value.
  *
  * Christoph Heindl, 2011
  * christoph.heindl@gmail.com
  */
  void apply_protobuf_defaults(::google::protobuf::Message *m, bool apply_no_default)
  {
    namespace pb = ::google::protobuf;

    const pb::Descriptor *d = m->GetDescriptor();
    const pb::Reflection *r = m->GetReflection();

    for (int i = 0; i < d->field_count(); ++i) 
    {
      // Retrieve field info
      const pb::FieldDescriptor *fd = d->field(i);
      const pb::FieldDescriptor::CppType fdt = fd->cpp_type();

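      if (fd->is_repeated())
      {
        // Repeated fields carry no default value, and HasField() and
        // MutableMessage() must not be called on them. Only descend
        // into message elements that already exist.
        if (pb::FieldDescriptor::CPPTYPE_MESSAGE == fdt)
        {
          for (int j = 0; j < r->FieldSize(*m, fd); ++j)
            apply_protobuf_defaults(r->MutableRepeatedMessage(m, fd, j), apply_no_default);
        }
      }
      else if (pb::FieldDescriptor::CPPTYPE_MESSAGE == fdt)
      {
        // Recursively apply defaults if a nested message/group
        // is encountered
        apply_protobuf_defaults(r->MutableMessage(m, fd), apply_no_default);
      }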
      else if ((apply_no_default || fd->has_default_value()) && !r->HasField(*m, fd)) 
      {
        // Field has default value and has not yet been set -> apply default
        switch(fdt) 
        {
        case pb::FieldDescriptor::CPPTYPE_INT32 :
          r->SetInt32(m, fd, fd->default_value_int32());
          break;
        case pb::FieldDescriptor::CPPTYPE_INT64 :
          r->SetInt64(m, fd, fd->default_value_int64());
          break;
        case pb::FieldDescriptor::CPPTYPE_UINT32 :
          r->SetUInt32(m, fd, fd->default_value_uint32());
          break;
        case pb::FieldDescriptor::CPPTYPE_UINT64 :
          r->SetUInt64(m, fd, fd->default_value_uint64());
          break;
        case pb::FieldDescriptor::CPPTYPE_DOUBLE :
          r->SetDouble(m, fd, fd->default_value_double());
          break;
        case pb::FieldDescriptor::CPPTYPE_FLOAT :
          r->SetFloat(m, fd, fd->default_value_float());
          break;
        case pb::FieldDescriptor::CPPTYPE_BOOL :
          r->SetBool(m, fd, fd->default_value_bool());
          break;
        case pb::FieldDescriptor::CPPTYPE_ENUM :
          r->SetEnum(m, fd, fd->default_value_enum());
          break;
        case pb::FieldDescriptor::CPPTYPE_STRING :
          r->SetString(m, fd, fd->default_value_string());
          break;
        default :
          // CPPTYPE_MESSAGE has already been handled above
          break;
        } // switch
      } // if
    } // for
  } // apply_protobuf_defaults

Using the proto definition from above, the following snippet now prints all fields:

BOOST_AUTO_TEST_CASE(output_applied)
{
  namespace pb = google::protobuf;

  Outer o;
  o.set_f(1.0f);
  o.mutable_inner()->set_i(5);

  // Apply defaults to non-present fields
  apply_protobuf_defaults(&o, true);

  std::string formatted;
  pb::TextFormat::PrintToString(o, &formatted);

  std::cout << formatted << std::endl;
  /*
  inner {
    i: 5
    str: "hello world"
  }
  f: 1
  t: B
  */
}

Defaults are only applied to those fields that are non-present. The pros and cons of my approach are listed below (might be incomplete).

Pros

  • No need to implement a custom format.
  • No patching of existing code.

Cons

  • Needs to be implemented for each supported language.
  • Changes object state (non-present fields become present).

If anyone is willing to provide a translation of the above function to other languages (Python, Java), I’d be happy to integrate it into the post.

2 thoughts on “Protocol Buffers: Display non-present fields with TextFormat”

  1. Your code throws an error for repeated fields
    (e.g. something like this in the .proto file: “repeated int32 just_another_array = 10;”).
    There should at least be a check for fd->is_repeated() before the HasField function is called.
    Also, repeated fields don’t have a default value.

    Despite this thanks for the help.
