DynaMix
1.3.7
A new take on polymorphism in C++
|
The performance of messages is indeed slower than regular function calls and even virtual function calls. Even though a call's algorithmic complexity is still O(1), a number of memory indirections happen in order to call the actual method behind the message.
Unfortunately it's hard to estimate exactly how much slower the message call is. With perfect cache locality and compiler optimizations a message call will take several nanoseconds to execute on a modern CPU. It would be a bit slower than a virtual vall and about as fast as a std::function
call.
Unfortunately perfect cache locality is hard to come by, and especially hard in polymorphic code. So, generally speaking if the programmer doesn't take special care to achieve cache locality for their object lists, a message call is about 2-3 times slower than a virtual method agaub about as fast a std::function
call, which again can be called negligible.
As, we've mentioned before if cache locality is an absolutely critical feature for the desired performance, mixin messages, virtual calls, std::function
, and other types of polymorphism will almost certainly be detrimental for this code, and the programmers are advised to do something else.
Here is the assembly generated by gcc 7.1 for int(int)
message call:
The message performance test compares virtual method calls vs std::function
calls vs DynaMix message calls.
For all tests a number of random objects of one of two possible types are generated. The noop
test calls a function with no arguments and no body, measuring the time only. The setter
test sets a member variable via a virtual setter function.
The multicast tests create random objects of one of three possible types, which all call three member polymorphic functions. Once again there's a noop
test with empty functions with no arguments and setter
test with virtual setter functions which set an integer.
In all example test the std::function
implementation is chosen as a baseline and the baseline column is the performance compared to it.
Here are some sample test results:
-O3
Name (baseline is *) | Objects | Total ms | ns/op | Baseline | Ops/second |
---|---|---|---|---|---|
virtual_noop | 200000 | 1.288 | 6 | 0.256 | 155225270.7 |
std_func_noop * | 200000 | 5.035 | 25 | - | 39725820.3 |
msg_noop | 200000 | 3.118 | 15 | 0.619 | 64151747.1 |
virtual_setter | 200000 | 1.540 | 7 | 0.254 | 129892565.9 |
std_func_setter * | 200000 | 6.070 | 30 | - | 32951002.8 |
msg_setter | 200000 | 4.509 | 22 | 0.743 | 44354926.3 |
(multi) virtual_noop | 200000 | 5.326 | 26 | 0.282 | 37549680.6 |
(multi) std_func_noop * | 200000 | 18.881 | 94 | - | 10592822.0 |
(multi) msg_noop | 200000 | 7.760 | 38 | 0.411 | 25774461.3 |
(multi) virtual_setter | 200000 | 6.192 | 30 | 0.279 | 32298656.6 |
(multi) std_func_setter * | 200000 | 22.219 | 111 | - | 9001301.1 |
(multi) msg_setter | 200000 | 8.877 | 44 | 0.400 | 22529979.2 |
/Od /EHsc /RTC1 /MDd /ZI
)/O2 /Ob2 /MD
Name (baseline is *) | Objects | Total ms | ns/op | Baseline | Ops/second |
---|---|---|---|---|---|
virtual_noop | 200000 | 0.930 | 4 | 0.204 | 215059082.1 |
std_func_noop * | 200000 | 4.549 | 22 | - | 43965397.5 |
msg_noop | 200000 | 2.841 | 14 | 0.625 | 70391577.8 |
virtual_setter | 200000 | 1.212 | 6 | 0.146 | 165015820.9 |
std_func_setter * | 200000 | 8.302 | 41 | - | 24089393.8 |
msg_setter | 200000 | 4.561 | 22 | 0.549 | 43853542.3 |
(multi) virtual_noop | 200000 | 4.115 | 20 | 0.331 | 48604622.1 |
(multi) std_func_noop * | 200000 | 12.443 | 62 | - | 16073708.9 |
(multi) msg_noop | 200000 | 5.343 | 26 | 0.429 | 37429695.3 |
(multi) virtual_setter | 200000 | 5.581 | 27 | 0.306 | 35837457.8 |
(multi) std_func_setter * | 200000 | 18.236 | 91 | - | 10967410.0 |
(multi) msg_setter | 200000 | 9.913 | 49 | 0.544 | 20175710.3 |
The stand-alone functions generated for messages typically have an if
statement in them. It's there so as to throw an exception if none of the mixins in an object implements the message. If you disable the library's exceptions those if
-s will be converted to assert
-s (which in non-debug compilations are simply ignored).
If you don't want to recompile the library with exceptions disabled, or if you just want all other exceptions, but not these, you can disable the throwing of exceptions from the message functions if you define DYNAMIX_NO_MSG_THROW
before including the DynaMix headers.
Note that if you disable the exceptions from the message functions, calling a message on an object that doesn't implement it, will certainly lead to undefined behavior and crashes.
Also have in mind, that removing the 'if'-s will improve the performance by only a small amount of nanoseconds per message call on a modern CPU. Situations where such a thing could be significant should be very very rare.
An object mutation can be a relatively slow operation.
Every mutation will invoke all mutation rules registered within the system. Their speed may vary and will depend on whether they end up changing the mutation or not. If they do change it, some allocation may take place. Even if they don't, each of them will be invoked by a virtual function and will have at least one 'if' check (possibly more, depending on the mutation rule).
If the mutation ends up creating a new type – a mixin combination that hasn't yet been met – this will also lead to the relatively slow process of initializing the internal data structures for that type. This will lead to some allocations and loops that generate the type's call table.
Even if the mutation doesn't generate a new type, it will have to find the existing one, which is a hash table lookup with key a bitset of size DYNAMIX_MAX_MIXINS
.
Finally, the mutation will change the object (unless it happens to be an identity mutation – adding and removing nothing). To change it it will have to allocate new mixin data for it – an array of pointers to mixins, then deallocate any mixins being removed and allocate any mixins being added, and finally deallocate the old mixin data for the object.
Using dynamix::object_type_template
or dynamix::same_type_mutator
will perform the first steps – the ones concerning the identification of the object's type – only once.
To reduce the allocations for the individual object's being mutated, you can add custom allocators to some of the mixins or to the entire domain.
The mutation performance test test creation of new objects and the mutation of existing ones.
dynamix::mutate
.dynamix::object_type_template
.dynamix::mutate
. Adding two mixins and removing one.dynamix::same_type_mutator
.dynamix::same_type_mutator
and a fast custom allocator.Here are some sample test results:
-O3
Name (baseline is *) | Objects | Total ms | ns/op | Baseline | Ops/second |
---|---|---|---|---|---|
create_mutate * | 10000 | 9.810 | 981 | - | 1019320.5 |
type_template | 10000 | 3.728 | 372 | 0.380 | 2682686.2 |
type_template_alloc | 10000 | 2.163 | 216 | 0.220 | 4623247.0 |
mutation * | 10000 | 8.311 | 831 | - | 1203170.2 |
same_type_mutator | 10000 | 2.944 | 294 | 0.354 | 3397164.9 |
same_type_mutator_alloc | 10000 | 1.336 | 133 | 0.161 | 7486296.3 |
/Od /EHsc /RTC1 /MDd /ZI
)/O2 /Ob2 /MD
Name (baseline is *) | Objects | Total ms | ns/op | Baseline | Ops/second |
---|---|---|---|---|---|
create_mutate * | 10000 | 16.103 | 1610 | - | 620993.2 |
type_template | 10000 | 5.468 | 546 | 0.340 | 1828722.2 |
type_template_alloc | 10000 | 2.749 | 274 | 0.171 | 3637850.5 |
mutation * | 10000 | 10.333 | 1033 | - | 967747.5 |
same_type_mutator | 10000 | 3.648 | 364 | 0.353 | 2741181.5 |
same_type_mutator_alloc | 10000 | 1.688 | 168 | 0.163 | 5923647.7 |
Some functionalities of the library are purposefully not thread-safe so as to avoid the synchronization overhead for users which don't need it. For those it is possible to implement thread-safety in the client code.
has
, get
, or implements
) is thread-safe.(For the complete, working source of this example see allocators.cpp)
DynaMix allows you to set custom allocators for the persistent pieces of memory the library may require.
The library allocates some memory on initialization, which happens at a global scope – before the entry point of a program. It also has some allocations which are for instances with a very short lifetime. Currently those are not covered by the allocators.
What you can control with the custom allocators is the new memory allocated for dynamix::object
instances - their internal mixin data. You can assign a global allocator to the library and you can also set individual allocators per mixin type.
First let's see how you can create a global allocator. Let's assume you have a couple of functions of your own that allocate and deallocate memory in some way specific to your needs:
To create a global allocator, you need to create a class derived from domain_allocator
and override its virtual methods.
The first two methods allocate a buffer for the mixin data pointers. Every object has pointers to its mixins within it. This is the array of such pointers. The class domain_allocator
has a static constant member – mixin_data_size
– which you should use to see the size of a single element in that array.
The methods also have arguments referring to the object for which the mixin data is being allocated. We won't be using it in this simple example.
The other two methods you need to overload allocate and deallocate the memory for an actual mixin class instance. As you may have already read, the buffer allocated for a mixin instance is bigger than needed because the library stores a pointer to the owning object immediately before the memory used by the mixin instance.
That's why this function is not as simple as the one for the mixin data array. It has to conform to the mixin (and also object
pointer) alignment.
The users are strongly advised to use the static method domain_allocator::mem_size_for_mixin
. It will appropriately calculate how much memory is needed for the mixin instance such that there is enough room at the beginning for the pointer to the owning object and the memory alignment is respected.
After you allocate the buffer you should take care of the other return value - the mixin offset. It calculates the offset of the actual mixin instance memory within the buffer, such that there is room for the owning object pointer in before it and all alignments are respected.
You are encouraged to use the static method domain_allocator::mixin_offset
for this purpose.
The mixin instance deallocation method can be trivial
To use the custom global allocator you need to instantiate it and then set it with set_global_allocator
. Unlike the mutation rules, the responsibility for the allocator instance is yours. You need to make sure that the lifetime of the instance is at least as long as the lifetime of all objects in the system.
Unfortunately this means that if you have global or static objects, you need to create a new pointer that is, in a way, a memory leak. If you do not have global or static objects, it should be safe for it to just be a local variable in your program's entry point function.
As we mentioned before, you can have an allocator specific for a mixin type.
A common case for such use is to have a per-frame allocator – one that has a preallocated buffer which is used much like a stack, with its pointer reset at the end of each simulation frame (or at the beginning each new one). Let's create such an allocator.
First, a mixin instance allocator is not necessarily bound to a concrete mixin type. You can have the same instance of such an allocator set for many mixins (which would be a common use of a per-frame allocator), but for our example let's create one that is bound to an instance. We will make it a template class because the code for each mixin type will be the same.
A mixin instance allocator needs to be derived from the class mixin_allocator
. You then need to overload its two virtual methods which are exactly the same as the mixin instance allocation/deallocation methods in domain_allocator
.
Now this class can be set as a mixin allocator for a given mixin type. A side effect of the fact that it's bound to the type is that it keeps mixin instances in a continuous buffer. With some changes (to take care of potential holes in the buffer) such an allocator can be used by a subsystem that works through mixins relying on them being in a continuous buffer to avoid cache misses.
To illustrate a usage for our mixin allocator, let's imagine we have a game. If a character in our game dies, it will be destroyed at the end of the current frame and should stop responding to any messages. We can create a mixin called dead_character
which implements all those the messages with a higher priority than the rest of the mixins. Since every object that has a dead_character
mixin will be destroyed by the end of the frame, it will be safe to use the per-frame allocator for it.
First let's create the mixin class and sample messages:
Now we define the mixin so that it uses the allocator, we just need to add it with "`&`" to the mixin feature list, just like we add messages. There are two ways to do so. The first one would be to do it like this:
DYNAMIX_DEFINE_MIXIN(dead_character, ... & dynamix::priority(1, die_msg) & dynamix::allocator<per_frame_allocator<dead_character>>());
This will create the instance of the allocator internally and we won't be able to get it. Since in our case we do care about the instance because we want to call its reset
method, we could use an alternative way, by just adding an actual instance of the allocator to the feature list:
If we share a mixin instance allocator between multiple mixins, the second way is also the way to go.
Now all mixin allocations and deallocations will pass through our mixin allocator:
Currently the maximum number of arguments you can have in a message is:
6
There simply is no message declaration macro for messages with more. If you need macros for messages with more arguments, you can do so, without having to rebuild the library.
In your DynaMix installation, in the gen
directory you will see a file, named arity
. It is a text file with a single number in it. Edit the file, setting the number to whichever value you need. Then run the script gen_message_macros.rb
(you will need a Ruby interpreter to do so). It will generate the file include/dynamix/gen/message_macros.hpp
with macros for messages with 0 to arity arguments.
If you use an include directory for DynaMix diferent from the one in your installation, you will have to manually copy the newly generated message macros file over the one you use.
As long as the library itself is shared (.dll
on Windows or .so
on Unix or Linux) it's safe to use in an application that has shared libraries which use DynaMix.
An interesting thing which you can accomplish with the library is to have optional plugins – dynamic libraries that aren't linked with the executable but may or may not be present, and if they are, they are being loaded dynamically (with LoadLibrary
or dlopen
).
Such plugin may add a mutation rule for its special mixins, or export functions that mutate objects.
For example this may be very useful for an engineering CAD system that could potentially have many different optional plugins for its different needs. Say, a plugin that extends the buildings with electrical wiring could simply mutate objects, adding a mixin mixin called electrical_wiring
that contains the appropriate functionality.
There is only one thing you need to remember when you're exporting mixins or messages from a dynamic library: to use the export macros: DYNAMIX_DECLARE_EXPORTED_MIXIN
and DYNAMIX_EXPORTED_xxx_MESSAGE_N
. They are exactly like their regular counterparts but for their first argument, which is the compiler specific export symbol (__declspec(dllexport)
for Visual C++ or, say, BOOST_SYMBOL_EXPORT
if you're using Boost).
One of the examples that come with the library illustrates how you can have the two types of dynamic libraries – one which you link with, and one plugin.
(For the complete, working source of this example see serialization.cpp)
Of course dealing with objects in a complex system means also having some way to save and load them. Be it from a file, database, or through a network. We did mention serialization before in our examples but we never gave an example of how you can serialize dynamix::object
-s.
The main issue would be how to convert the object's type information to a string and then use this string to create an object with the same type information. As for the concrete serialization of the data within the mixins, this example will only offer a very naive approach. We don't recommend that you use such an approach for your mixins as it is not safe, and not backward compatible. We only use it here because it's very easy to write and the focus of this example is on the object type information.
Basically what we'll do here to save and load the data inside the mixins is rely on two multicast messages – save
and load
– which will have a given priority for each mixin. Thus ensuring the order of execution to be the same in loading as it is in saving. Each save and load method of the mixin type will assume it has the right data to save and load.
So, let's declare and define our messages.
Now let's define some simple mixins. For this example we'll assume we're writing a company management system, which has a database of key individuals – employees and clients.
Now let's declare the save and load functions for an object.
Because in this example they're stand-alone functions, like the messages, we can't just name them save and load, because they'll clash with the message functions defined by the message macros. So let's just name them save_obj
and load_obj
.
Assuming those functions are written and work correctly, we could write some code with them like this.
First we create some objects:
Then we save them to some stream:
And finally use that stream to load those objects:
Now, let's see what we need to do to make the code from above work as expected.
First let's write the save function.
To save the object we'll need to write the names of its mixins. You might know that mixins also have id-s, but saving id-s is not a safe operation as they are generated based on the global instantiation order. This means that different programs with the same mixins (like a client or a server), or even the same program after a recompilation, could end up generating different id-s for the same mixins.
To get the names of the mixins in an object we could use object::get_mixin_names
and it's perfectly fine, but in order to make this example a bit more interesting, let's dive a bit into the library's internal structure.
If you've read the implementation notes or the debugging tutorial, you'll know that an object has a type information member which contains the mixin composition of the object in a std::vector
called _compact_mixins
. We'll use this vector to save the mixin names.
After we've stored the object type information, we can now save the data within its mixins via the save
message from above.
That was it. Now let's move to the code of the load_obj
function.
First we need to get the number of mixins we're loading. Let's do it in this simple fashion:
Now we'll create an object_type_template
which we'll use to store the loaded type and give it to a new object. The type template class (as all other mutator classes) has the method add
. Besides the way you're used to calling it – add<mixin>()
– you can also call it with a const char*
argument, which will be interpreted as a mixin name.
When being called like this, it will return bool
. True if a mixin of this name was found, and false if it wasn't. Note that this true or false value does not give you the information on whether the mixin will be added or removed from the object, but only a mixin of name exists in the domain. As you might remember the mutation rules (if such are added) will determine whether the mixin is actually added an removed.
Now what's left is to apply the template to the object:
The last thing we need to do when loading an object is to load the data within its mixins. The object is created with the appropriate mixins, so let's call the load
multicast message.
It should load the data correctly because the order of the save and load execution is the same – determined by their priority.
Here are some explanations that may help you make sense of the code of the library if you need to read it:
The overall structure of the library is based on a main class called domain
which holds all registered mixins and messages, and keeps the type registry.
The DYNAMIX_DEFINE_MIXIN
macro instantiates a class that is similar to a metafunction, as its only purpose is to globally instantiate itself, which in turn will lead to domain::register_mixin_type
being called.
It also generates a function that registers the mixin features.
domain::register_mixin_type
is a template method and it will appropriately fill a structure, containing the mixin type information – name, constructor, destructor, id – and will also call the generated function that registers its features.
The feature registration is composed of two parts: one global - to introduce the feature to the domain, and local called for the specific mixin type being registered. This means that a feature is globally registered multiple times - once for each of its uses for a mixin type. The first of those times will give it an id and fill the feature information structure appropriately. The other global registrations of a feature will see that it has a valid id, and will simply skip the rest of the code.
The local feature registration is performed by the class feature_parser
that has overloads for the supported mixin features: currently messages and allocators. The allocator registration is simple. It just sets the allocator member in the mixin type information structure to the appropriate value.
The message registration generates a caller function, based on the specific mixin. This caller function is a specific instantiation of a template function which is generated by the message declaration macros. Its template parameters are the mixin type and the actual member function in the mixin. The caller is then cast to void (*)()
to be stored in a vector in the mixin type information structure along with the caller functions for all of its messages.
This process of creating a caller function is based on the article The Impossibly Fast C++ Delegates by Sergey Ryazanov.
Each newly registered mixin and message get an id. The id-s are consecutive indexes in arrays (of mixin_type_info
and message_t
respectively) in the domain. Thus getting the information for a mixin or a message through its id is a O(1) operation.
The maximum numbers of registered messages and mixins are fixed through the constants DYNAMIX_MAX_MIXINS
and DYNAMIX_MAX_MESSAGES
in config.hpp
.
This allows us to have fixed-size arrays for both in the domain and per object type.
A new object type is initially identified via a bitset per mixin. The domain contains an unordered (hash) map where the key is such a bitset, and the value is an object type info.
The object type info consists of such a bitset (to mark which mixins are available in this type), a compact vector of mixin type information structures a cross indexing array (to indicate which mixin data is at which position in the compact array), and a call table.
The call table plays the same role as the virtual table in C++. It's a fixed size array for every message with non-null values for the messages that are implemented by that type. An element of that array is of type call_table_entry
. This is a union that, based on whether the message is a unicast or a multicast, will contain the message data or a begin and an end for a buffer or message datas.
When a type is requested for an object, first it's checked whether such a combination of mixins is an existing type. If not a new object type is created. This fills the mixin information bitset, vector and cross indexing array and then fills the call table. It will allocate a single buffer for all multicast messages within that type. When filling the call table the type creation process will choose the top priority unicast messages and sort the multicasts by priority. It will throw an exception if same-priority unicasts exist.
After the type is available the object data needs to be filled. The object consists of a pointer to its type and an array of the structure mixin_data_in_object
. This structure wraps a simple buffer that contains the mixin instance and a pointer to the owning object right in front of it. This is required for the need to get the owning object from within the code of the mixin class (made through dm_this
or object_of
). Thus, getting the owning object from the mixin is an offset from the this
pointer.
The message calling happens through the message functions which are generated by the message declaration macros.
The call consists of the following steps:
void (*)()
to the appropriate signature.For multicasts there is a for
loop for the last four steps.
Many people, upon seeing DynaMix for the first time, have expressed a concern with the seemingly excessive amount of macros the library's users are required to write.
The mixin definition and declaration macros are often mentioned as easy to remove, and indeed there is a way to reproduce almost all of their functionality without any macros. However not all of it can be reproduced.
One of the key features those macros provide is the global instantiation. Without them, the users will be required to provide explicit entry points for their subsystems and dynamic libraries, where they will have to call some mixin initialization functions. This is not as simple as it sounds. Here is a list of downsides that such explicit entry points may introduce:
The message macros are most likely impossible to remove. Unlike the mixin ones, each of them generates many lines of code. More than a hundred.
Probably the only way to remove them completely, would be to make the message calls by string. This will cause the calls to make hash table look-ups (or worse) and will prohibitively slow them down. Such a scenario will also reflect on the way mixins are registerd. The mixin messages would have to be set through something that resembles std::bind
adding yet more complexity to the user code.
Is is possible (and probably part of the future of the library) to create an external tool that makes the user code a bit nicer. It would resemble The Meta-Object Compiler of Qt, and similarly, would require a custom preprocessing step of the users' code.
Such a tool could theoretically solve more of the library's problems, like the need to call message(object)
instead of object.message()
at the very least, and many more...
Still, such an approach also has many opponents, as the code you write when you use it becomes effectively not-C++, but something that can be called a C++ dialect.