
Non-colliding Efficient type_info::hash_code Across Shared Libraries


The C++ standard library provides std::type_info and std::type_index for obtaining run-time type information about a type. Using them raises some efficiency and robustness issues, especially when dynamically loaded libraries are involved.

TL;DR: The -D__GXX_MERGED_TYPEINFO_NAMES -rdynamic compiler/linker options (for both the main program and the library) generate code that uses pointer comparison in std::type_info::operator==().

The typeid keyword is used to obtain a type's run-time type information. Quoting cppreference:
The typeid expression is an lvalue expression which refers to an object with static storage duration, of the polymorphic type const std::type_info or of some type derived from it.
std::type_info objects cannot be put in a std::vector because they are non-copyable and non-assignable. Of course, you can have a std::vector<const std::type_info *>, as the object returned by typeid has static storage duration. You could also use std::vector<std::type_index>. std::type_index contains a pointer to std::type_info, so copies are possible and cheap. It's also safer to use std::type_index in associative containers, because it delegates less-than, equality, and greater-than to the underlying std::type_info object. And that's what you want. Just using const std::type_info * would do pointer comparisons, and the result may be different.
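Here's a minimal sketch of std::type_index in a sequence container and in an associative container. The Shape, Circle, and Square types and the two helper functions are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical polymorphic hierarchy, used only for this illustration.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape {};
struct Square : Shape {};

// std::type_index is copyable, so it can live in a std::vector.
std::vector<std::type_index> dynamic_types(const std::vector<const Shape*>& shapes) {
  std::vector<std::type_index> out;
  for (const Shape* s : shapes)
    out.emplace_back(typeid(*s));  // dynamic type of the pointed-to object
  return out;
}

// std::set orders its keys with std::type_index::operator<, which
// delegates to std::type_info::before() rather than comparing addresses.
std::size_t count_distinct_types(const std::vector<const Shape*>& shapes) {
  const std::vector<std::type_index> types = dynamic_types(shapes);
  const std::set<std::type_index> unique(types.begin(), types.end());
  return unique.size();
}
```

Passing two Circle objects and one Square should yield three entries from dynamic_types and two from count_distinct_types.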

The real question I'm seeking an answer to is
Is C++ typeid machinery reliable for determining uniqueness of polymorphic types robustly, efficiently, portably, and across dynamically loaded modules?
This seems like a tall order. There's one caveat though. "Portability" for me is limited to RHEL7 Linux, MacOS 10.x, and maybe Windows 10 with the very latest toolchains (clang++ 7.x, g++ 8.x, Visual Studio 2017). I'm not worried about other platforms at the moment.

Robustness

The first step is to check whether std::type_info or std::type_index is the same for the same type and different for different types.
We have a few things to use for comparisons:
  • std::type_info::operator==()
  • std::type_info::name()
  • std::type_info::hash_code()
  • std::type_info *
Consider type_info::operator==. Equality comparison between two type_info objects returns true for the same types and false for different types, even when dynamically loaded libraries are involved. The question is how fast it is. We'll look at that a little later.

The worst function for determining equality appears to be type_info::name. Quoting cppreference: "No guarantees are given; in particular, the returned string can be identical for several types". I'm really bummed by that.

Next is type_info::hash_code. As hashes for two different types can collide, it is useless for determining type equality. The only thing the C++17 standard (n4713) says is
Within a single execution of the program, it shall return the same value for any two type_info objects which compare equal.
Hash calculation could also be slow, as it would typically be O(n) where n is the length of the mangled name. There's one implementation-specific hack though. Certain preprocessor macros (discussed below) enable type_info::hash_code to return a pointer to the type_info object. That's super-fast. But does it provide guarantees of uniqueness? Maybe so.
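Here's a small sketch of what the standard's wording does and doesn't allow. The same_type helper is hypothetical. Different hashes rule out equality, but equal hashes still need confirming with operator==.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical helper: hash_code() serves only as a quick rejection test.
bool same_type(const std::type_info& a, const std::type_info& b) {
  // Equal type_info objects must hash equally, so different hashes
  // prove that the types differ.
  if (a.hash_code() != b.hash_code())
    return false;
  // Equal hashes prove nothing (collisions are allowed); confirm with ==.
  return a == b;
}
```

Whether the pre-check pays off depends on how expensive hash_code() is on a given implementation, so treat it as an illustration of the guarantee rather than an optimization.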

That brings us to the last option: std::type_info *. If std::type_info::operator==() is implemented in terms of pointer comparisons, then we might get the best of both worlds. Fast, reliable type_info comparisons. Is there a way? Read on...

However, when shared libraries (.so on Linux, .dll on Windows) are in the picture, no such guarantee can be given. And it makes sense. As the shared library and the main program could be compiled completely independently, expecting typeid(Foo) to be the same object in the main program and in dynamically loaded libraries is wishful thinking. We'll tackle this issue after the next section.

Efficiency

If you look at std::type_info in libc++ and libstdc++ you will discover a couple of macros that directly determine efficiency of the comparison operators. It's _LIBCPP_HAS_NONUNIQUE_TYPEINFO in libc++ and __GXX_MERGED_TYPEINFO_NAMES in libstdc++ respectively. In the respective library implementations, they control whether std::type_info comparisons are simply pointer comparisons or much more expensive const char * comparisons. With long names of template instantiations, the cost of strcmp-like operations could be high.
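To make the two strategies concrete, here's a simplified model. It is not the actual library code: TypeRecord is a hypothetical stand-in for std::type_info that holds only the pointer to the mangled name.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for std::type_info: just the mangled-name pointer.
struct TypeRecord {
  const char* name;
};

// Pointer strategy: correct only if every type has exactly one name
// string in the whole process.
inline bool equal_by_pointer(const TypeRecord& a, const TypeRecord& b) {
  return a.name == b.name;
}

// String strategy: try the pointers first, then compare characters.
// The cost grows with the length of the mangled name.
inline bool equal_by_string(const TypeRecord& a, const TypeRecord& b) {
  return a.name == b.name || std::strcmp(a.name, b.name) == 0;
}
```

If two modules each carry their own copy of the same name string, equal_by_pointer reports a mismatch while equal_by_string still reports a match. That is the trade-off the macros select between.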

If you are interested in detailed performance numbers and library code, you may want to check out the Fun with typeid() blog post by David Holmes. The long and the short of it is that with _LIBCPP_HAS_NONUNIQUE_TYPEINFO disabled in libc++ and __GXX_MERGED_TYPEINFO_NAMES enabled in libstdc++, performance of std::type_info and std::type_index comparisons is an order of magnitude better (due to just pointer comparisons).

On my MacOS machine, _LIBCPP_HAS_NONUNIQUE_TYPEINFO is not defined by default. So things are good. On my RHEL7 box, __GXX_MERGED_TYPEINFO_NAMES is not defined. There's an explanation of why that's the case in libstdc++. It reads something like this.

// Determine whether typeinfo names for the same type are merged (in which
// case comparison can just compare pointers) or not (in which case strings
// must be compared), and whether comparison is to be implemented inline or
// not.  

// We used to do inline pointer comparison by default if weak symbols
// are available, but even with weak symbols sometimes names are not merged
// when objects are loaded with RTLD_LOCAL, so now we always use strcmp by
// default.  

// For ABI compatibility, we do the strcmp inline if weak symbols
// are available, and out-of-line if not.  Out-of-line pointer comparison
// is used where the object files are to be portable to multiple systems,
// some of which may not be able to use pointer comparison, but the
// particular system for which libstdc++ is being built can use pointer
// comparison; in particular for most ARM EABI systems, where the ABI
// specifies out-of-line comparison.  

// The compiler's target configuration
// can override the defaults by defining __GXX_TYPEINFO_EQUALITY_INLINE to
// 1 or 0 to indicate whether or not comparison is inline, and
// __GXX_MERGED_TYPEINFO_NAMES to 1 or 0 to indicate whether or not pointer
// comparison can be used.
That's dense! I'm unclear about what merged really means in this context. What is being merged with what? Anyone?

The best part is the last sentence. The standard library authors are permitting setting an otherwise internal macro (starts with __) to enable pointer comparisons. So there seems to be light at the end of the tunnel.

One thing I'm not 100% sure about is the phrase "target configuration". A compiler's target is the machine that assembly code is generated for. On my machine, gcc -v prints Target: x86_64-redhat-linux. That is, the resulting code is suitable for running on x86_64-redhat-linux, so this is a native build. I'm unclear whether the compiler and the standard library itself must be built with the same preprocessor macro. If you are curious about what build, host, and target machines are for a compiler, see gcc configure terms and history.

The following invocation of the compiler seems to produce code that uses pointer comparisons in type_info::operator==.
g++ -std=c++11 -D__GXX_MERGED_TYPEINFO_NAMES -ldl -o test test.cpp

Dynamically Loaded Libraries

There's another wrinkle which appears to be around dynamic loading of shared libraries. Something about "weak symbols" and RTLD_LOCAL. What in the world are those things?

In the man pages for dlopen---a library function to load shared library files (*.so) at run-time---you will find RTLD_LOCAL. Quoting man pages:
This is the converse of RTLD_GLOBAL, and the default if neither flag is specified. Symbols defined in this library are not made available to resolve references in subsequently loaded libraries.
So if your program uses dynamically loaded libraries and the libraries rely on a globally known definition of the std::type_info object for Foo, you might be out of luck if the libraries are opened using the default flags or explicitly with RTLD_LOCAL. Such libraries, even if compiled with __GXX_MERGED_TYPEINFO_NAMES, will use their own local definitions of that object. Obviously, if your program relies on a globally unique definition, as in std::set<std::type_index> or some similar shenanigans, your program is likely to explode.

Ok, so, I can't open the libraries with RTLD_LOCAL or default. I've to use RTLD_GLOBAL. Easy.

To be extra careful, I threw in a run-time check to ensure the main program and the shared-library file agree on the definition of std::type_info of Foo.

The Foo header file.
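// foo.h
#ifndef FOO_H
#define FOO_H

#include <typeinfo>  // std::type_info

// The namespace is not named "test" because a namespace cannot share
// its name with the global function test() declared below.
namespace demo {
class Foo {
  virtual ~Foo() = default;
};
}
using namespace demo;
// Exported by libfoo.so. C linkage keeps the symbol name unmangled for dlsym().
extern "C" void test(const std::type_info &);

#endif  // FOO_H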
The Foo implementation file.
// foo.cpp (shared-library implementation)
#include <iostream>
#include <typeinfo>
#include <cassert>

#include "foo.h"

// Compares the caller's type_info for Foo with the one this library sees.
extern "C" void test(const std::type_info &other)
{
  assert(other == typeid(Foo));
  std::cout << "typeid equality = " << std::boolalpha << (other == typeid(Foo)) << std::endl;
  assert(other.hash_code() == typeid(Foo).hash_code());
  std::cout << "typeid hash_code equality = " << std::boolalpha << (other.hash_code() == typeid(Foo).hash_code()) << std::endl;
  std::cout << "typeid name: module=" << typeid(Foo).name() << ", other=" << other.name() << std::endl;
}
And the main program (robust_typeid.cpp)
#include <typeinfo>
#include <iostream>
#include <string>
#include <unistd.h>
#include <dlfcn.h>

#include "foo.h"

int main(void) {
  char cwd[1024];
  if (!getcwd(cwd, sizeof(cwd)))
    return 1;
  std::string path = std::string(cwd) + "/libfoo.so";
  // dlopen needs RTLD_LAZY or RTLD_NOW; RTLD_GLOBAL is an extra modifier.
  void *handle = dlopen(path.c_str(), RTLD_NOW | RTLD_GLOBAL);

  std::cout << "handle = " << handle << "\n";
  if (!handle) {
    std::cerr << dlerror() << "\n";
    return 1;
  }

  using TestFunctionType = void (*)(const std::type_info &);
  // Look up the library's test() function by name.
  TestFunctionType test_ptr = reinterpret_cast<TestFunctionType>(dlsym(handle, "test"));

  if (test_ptr)
    test_ptr(typeid(Foo));  // pass Foo's type_info as seen by the main module

  dlclose(handle);
}
The program loads libfoo.so dynamically and calls the test function in the library. The main module passes a reference to Foo's std::type_info object (as observed by the main module) to function test. The function checks if they agree on the uniqueness of std::type_info object for Foo.

Finally, the compiler options.
# Create libfoo.so
$ clang++ -std=c++11 -D__GXX_MERGED_TYPEINFO_NAMES -fpic -shared foo.cpp -o libfoo.so
# Create the main program
$ clang++ -std=c++11 -D__GXX_MERGED_TYPEINFO_NAMES -ldl -o robust_typeid robust_typeid.cpp
# Run
$ ./robust_typeid
It crashes with an assertion failure. Ouch!
handle = 0x85dcf0
robust_typeid: foo.cpp:9: void test(const std::type_info &): Assertion other == typeid(Foo) failed.
Aborted (core dumped)
My suspicion turned out to be right. Something's not right.

With some Google-fu, I found gcc's linker flag -rdynamic (or -export-dynamic). Quoting the man pages:
This instructs the linker to add all symbols, not only used ones, to the dynamic symbol table. This option is needed for some uses of dlopen
Let's try it: add -rdynamic to both compile commands above and run again.
Voilà!

These two options seem to enable the best of both worlds: fast, reliable type_info comparisons. Additionally, the type_info::hash_code function returns a pointer. Does that make it non-colliding? Is -D__GXX_MERGED_TYPEINFO_NAMES -rdynamic really a silver bullet? Let me know what you think. Comment on reddit/r/cpp.
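If you want to check what hash_code actually returns on your own toolchain, here's a small probe. HashKind and classify_hash are hypothetical names, and the probe makes no claim about which answer a given standard library produces. It only compares the hash with two addresses that an implementation could plausibly reuse.

```cpp
#include <cassert>
#include <cstddef>
#include <typeinfo>

// Hypothetical classification of how hash_code() appears to be derived.
enum class HashKind { NameAddress, ObjectAddress, Other };

inline HashKind classify_hash(const std::type_info& ti) {
  const std::size_t h = ti.hash_code();
  // Does the hash equal the address of the name string?
  if (h == reinterpret_cast<std::size_t>(ti.name()))
    return HashKind::NameAddress;
  // Does the hash equal the address of the type_info object itself?
  if (h == reinterpret_cast<std::size_t>(&ti))
    return HashKind::ObjectAddress;
  // Otherwise it is presumably computed from the name's characters.
  return HashKind::Other;
}
```

If the probe reports an address-based hash, the hash is only as unique as that address. That brings you back to whether the main program and the library share a single copy.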
