Some time has passed since my first thoughts about mapping EMF Ecore to Google Protocol Buffers, and I would like to share the results I have achieved so far.

I started by implementing the “ProtoBuf message per concrete class” mapping. Soon the first problems appeared.

Problem #1 What if the base class is not abstract?

Continuing the example from the last post, suppose the Order class is not abstract. This does not fit nicely into the described mapping approach, because the Order message should only be a container for either a BookOrder or a CdOrder. The solution I came up with was to introduce a separate OrderRef Protocol Buffer message with the same structure as the previous Order message. As a result, the base class Order no longer has to be abstract.

The Protocol Buffer mapping would look like:
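What follows is only a minimal sketch in proto2 syntax, not a definitive listing. The attribute fields (`customer_name`, `isbn`, `artist`), the field numbers, and the repetition of inherited fields in each subclass message are assumptions made for illustration.

```proto
// orders.proto: illustrative sketch only
syntax = "proto2";
package orders;

// One message per non-abstract class. Order is concrete here.
message Order {
  optional string customer_name = 1;  // hypothetical attribute
}

message BookOrder {
  optional string customer_name = 1;  // inherited from Order (assumed to be repeated here)
  optional string isbn = 2;           // hypothetical attribute
}

message CdOrder {
  optional string customer_name = 1;  // inherited from Order (assumed to be repeated here)
  optional string artist = 2;         // hypothetical attribute
}

// Container used wherever the model refers to an Order.
// By convention, at most one of the fields is set.
message OrderRef {
  optional Order order = 1;
  optional BookOrder book_order = 2;
  optional CdOrder cd_order = 3;
}
```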

The pattern is to create a message for every non-abstract class and a *Ref message for every class that is used as a reference type somewhere. The *Ref message acts as a container with one field for every non-abstract class in the hierarchy.

The “ProtoBuf message references” described in the last post were easily integrated by adding the _internal_id field to every *Ref message. For a non-containment reference, only this field is set, and it holds the id of the referenced object.
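A sketch of how the id field might sit in a *Ref message, assuming an `int32` id and the field numbers shown (neither is stated above). The class messages are reduced to empty stubs:

```proto
syntax = "proto2";
package orders;

message Order {}
message BookOrder {}
message CdOrder {}

message OrderRef {
  // Id of the referenced object. For a non-containment reference this is
  // the only field that is set.
  optional int32 _internal_id = 1;  // type and number are assumptions

  // For a containment reference, one of these carries the contained object.
  optional Order order = 2;
  optional BookOrder book_order = 3;
  optional CdOrder cd_order = 4;
}
```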

Problem #2 What if the model is defined in more than one Ecore package?

The mapping described so far worked fine as long as the model was defined in only one Ecore package. To illustrate the problem we extend the example a bit. Suppose we have a second Ecore package extorders extending the existing package orders. The package extorders defines a Customer class having multiple Orders and a class DvdOrder being a subclass of Order.

The Protocol Buffer definition based on the mapping strategy shown so far would look like (only important fields are shown):
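As a hedged sketch in proto2 syntax: the file names, field names, and field numbers below are assumptions, and the class messages are left empty. The two definitions import each other, which is exactly what makes them invalid.

```proto
// orders.proto
syntax = "proto2";
package orders;

import "extorders.proto";  // PROBLEM: needed for DvdOrder, closes the cycle

message Order {}
message BookOrder {}
message CdOrder {}

message OrderRef {
  optional Order order = 1;
  optional BookOrder book_order = 2;
  optional CdOrder cd_order = 3;
  optional extorders.DvdOrder dvd_order = 4;  // PROBLEM: orders depends on extorders
}
```

```proto
// extorders.proto
syntax = "proto2";
package extorders;

import "orders.proto";  // needed for OrderRef

message DvdOrder {}

message Customer {
  repeated orders.OrderRef placed_orders = 1;  // hypothetical field name
}
```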

The problematic lines are the two cross-package references: the OrderRef message in orders needs a field for DvdOrder from extorders, while Customer in extorders refers to OrderRef from orders. The mapping leads to a cyclic dependency, which does not exist in the Ecore representation and, even worse, is not allowed by Protocol Buffers at all. Limiting the implementation to models consisting of only one Ecore package was not an option. Therefore, I had to develop a different mapping strategy.

The solution I developed is to use Protocol Buffers' extension feature. It allows a message to declare a range of field numbers as placeholders. These extension slots can be filled with fields defined somewhere else. The basic idea is to let the *Ref messages declare extensions from 1 to the maximum field number. Every message corresponding to a class in the class hierarchy defines an extension for its own *Ref message and for the *Ref messages of its superclasses. Thus, the example model is mapped the following way:
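This is a sketch in proto2 syntax rather than the exact generated output. It assumes that each class message nests its `extend` blocks, that extension numbers are assigned uniquely per *Ref message across all packages (how the numbers are chosen is not described here), and that `_internal_id` is an `int32`. Other attributes are omitted.

```proto
// orders.proto
syntax = "proto2";
package orders;

// The container only declares extension slots and no longer names any class.
message OrderRef {
  extensions 1 to max;
}

message Order {
  extend OrderRef {
    optional Order order_order = 1;
  }
  optional int32 _internal_id = 1;  // type is an assumption
}

message BookOrder {
  extend OrderRef {
    optional BookOrder order_bookOrder = 2;
  }
  optional int32 _internal_id = 1;
}

message CdOrder {
  extend OrderRef {
    optional CdOrder order_cdOrder = 3;
  }
  optional int32 _internal_id = 1;
}
```

```proto
// extorders.proto
syntax = "proto2";
package extorders;

import "orders.proto";  // one-way dependency, as in the Ecore model

message DvdOrder {
  // Fills a slot of the container defined in the other package.
  extend orders.OrderRef {
    optional DvdOrder order_dvdOrder = 4;
  }
  optional int32 _internal_id = 1;
}

message Customer {
  repeated orders.OrderRef placed_orders = 1;  // hypothetical field name
}
```

Only extorders.proto imports orders.proto, so the import graph matches the dependency direction of the Ecore packages.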

This way there is no more cyclic dependency and the above Protocol Buffer definition is valid. The extension fields have names like order_bookOrder or order_dvdOrder, following the pattern baseClass_class. This is necessary because extension field names have to be unique, and name collisions would occur if a class had more than one base class. To support non-containment references, the message contained in a field of the *Ref message has only its _internal_id field set.

Problem #3 What if there are cyclic dependencies between Ecore packages?

Dependencies between Ecore packages arise when one package uses the classes and data types defined in another. If two packages use each other's classes, a dependency cycle is created. This is easily done between a package and its subpackages without any Ecore tool complaining. For Ecore packages defined in two separate files, it is only possible if one generator model is used for both packages.

As this is so easy for packages and their subpackages and also quite common, as my mentor told me, I had to develop a solution for this. My idea for this is to flatten the package hierarchy and put all packages and subpackages into one Protocol Buffer package and hope there are no name collisions. Maybe I will prevent name collisions by including the subpackage name in the message name.
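A sketch of what the flattened result might look like, assuming a hypothetical subpackage `orders.shipping` with a class `Shipment` and a `Subpackage_Class` naming scheme. This is one possible way to include the subpackage name, not a settled choice. The *Ref messages are left out to keep the focus on naming.

```proto
// All Ecore packages and subpackages end up in one Protocol Buffer package,
// so mutual references need no imports.
syntax = "proto2";
package orders;

// From Ecore package "orders"
message Order {
  optional Shipping_Shipment shipment = 1;  // refers into the subpackage
}

// From hypothetical Ecore subpackage "orders.shipping"
message Shipping_Shipment {
  optional Order order = 1;  // refers back to the parent package
}
```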

In the rare case of cyclic dependencies between two or more Ecore packages defined in separate files, an exception is thrown.

The mapping described in this post is now implemented, except for the handling of some edge cases. When I am done with those, I will run some performance tests to find bottlenecks and compare the implementation to the existing XMIResourceImpl and BinaryResourceImpl.