The C++ standard library provides std::type_info and std::type_index for obtaining run-time type information about a type. Using them raises some efficiency and robustness issues, especially when dynamically loaded libraries are involved.
TL;DR: The -D__GXX_MERGED_TYPEINFO_NAMES -rdynamic compiler/linker options (applied to both the main program and the library) generate code that uses pointer comparison in std::type_info::operator==().
The typeid keyword is used to obtain a type's run-time type information. Quoting cppreference:
The typeid expression is an lvalue expression which refers to an object with static storage duration, of the polymorphic type const std::type_info or of some type derived from it.
std::type_info objects cannot be put in a std::vector because they are non-copyable and non-assignable. Of course, you can have a std::vector<const std::type_info *>, because the object returned by typeid has static storage duration. You could also use std::vector<std::type_index>. A std::type_index contains a pointer to a std::type_info, so copies are possible and cheap. It's also safer to use std::type_index in associative containers, because it delegates less-than, equality, and greater-than to the underlying std::type_info object. And that's what you want. Using a bare const std::type_info * would compare pointers instead, and the result may differ if the same type ends up represented by more than one std::type_info object.
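As a minimal sketch of the std::type_index approach described above, the following keys an associative container by type. The TypeLabels class and the Shape/Circle types are hypothetical names made up for this illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical registry that maps a type to a human-readable label.
class TypeLabels {
public:
    // Registers (or overwrites) the label for the static type T.
    template <typename T>
    void add(std::string label) {
        labels_[std::type_index(typeid(T))] = std::move(label);
    }

    // Looks up by the dynamic type of obj; returns "" when unregistered.
    template <typename T>
    std::string find(const T& obj) const {
        auto it = labels_.find(std::type_index(typeid(obj)));
        return it == labels_.end() ? std::string() : it->second;
    }

    std::size_t size() const { return labels_.size(); }

private:
    // std::type_index orders keys through the underlying std::type_info,
    // not through raw pointer values.
    std::map<std::type_index, std::string> labels_;
};

// Hypothetical polymorphic types used only for the example.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape {};
```

Calling labels.add<Circle>("circle") and then labels.find(ref), where ref is a const Shape& bound to a Circle, looks up the entry for Circle, because typeid on a polymorphic reference reports the dynamic type.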
The real question I'm seeking an answer to is:
Is C++ typeid machinery reliable for determining uniqueness of polymorphic types robustly, efficiently, portably, and across dynamically loaded modules?
This seems like a tall order. There's one caveat though. "Portability" for me is limited to RHEL7 Linux, MacOS 10.x, and maybe Windows 10, with the latest toolchains (clang++ 7.x, g++ 8.x, Visual Studio 2017). I'm not worried about other platforms at the moment.
Robustness
The first step is to check whether std::type_info or std::type_index is the same for the same type and different for different types.
We've a few things to use for comparisons:
std::type_info::operator==()
std::type_info::name()
std::type_info::hash_code()
std::type_info *
type_info::operator==. Equality comparison between two type_info objects returns true for the same types and false for different types, even when dynamically loaded libraries are involved. The question is how fast it is. We'll look at that a little later.
The worst function for determining equality appears to be type_info::name. Quoting cppreference: "No guarantees are given; in particular, the returned string can be identical for several types". I'm really bummed by that.
Next is type_info::hash_code. As hashes for two different types can collide, it is useless for determining type equality. The only thing the C++17 standard (n4713) says is:
Within a single execution of the program, it shall return the same value for any two type_info objects which compare equal.
Hash calculation could also be slow, as it would typically be O(n) where n is the length of the mangled name. There's one implementation-specific hack though. Certain preprocessor macros (discussed below) enable type_info::hash_code to return a pointer to the type_info object. That's super-fast. But does it provide guarantees of uniqueness? Maybe so.
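To illustrate the point about collisions, here is a small sketch of the two ways a hash can still be used safely: as a bucket selector in a hash table (where operator== settles collisions), and as a quick negative test. The names TypeNames, remember, knows, and same_type are made up for this example.

```cpp
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// std::hash<std::type_index> picks the bucket; equality of the keys
// then decides whether two entries really refer to the same type.
using TypeNames = std::unordered_map<std::type_index, std::string>;

template <typename T>
void remember(TypeNames& names, const std::string& label) {
    names[std::type_index(typeid(T))] = label;
}

template <typename T>
bool knows(const TypeNames& names) {
    return names.count(std::type_index(typeid(T))) != 0;
}

// Equal hash codes do not prove equal types, so confirm with operator==.
inline bool same_type(const std::type_info& a, const std::type_info& b) {
    if (a.hash_code() != b.hash_code()) {
        return false;  // equal types must hash equally, so these differ
    }
    return a == b;     // hashes match: let operator== decide
}
```

Whether the hash pre-check in same_type saves any time depends on how hash_code is implemented; it is shown only to make the one-directional guarantee concrete.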
That brings us to the last option: std::type_info *. If std::type_info::operator==() is implemented in terms of pointer comparisons, then we might get the best of both worlds: fast, reliable type_info comparisons. Is there a way? Read on...
However, when shared libraries (.so on Linux, .dll on Windows) are in the picture, no such guarantee can be given. And it makes sense. As the shared library and the main program could be compiled completely independently, expecting that typeid(Foo) is the same object in main and in dynamically loaded libraries is wishful thinking. We'll tackle this issue after the next section.
Efficiency
If you look at std::type_info in libc++ and libstdc++, you will discover a couple of macros that directly determine the efficiency of the comparison operators: _LIBCPP_HAS_NONUNIQUE_TYPEINFO in libc++ and __GXX_MERGED_TYPEINFO_NAMES in libstdc++. In the respective library implementations, they control whether std::type_info comparisons are simple pointer comparisons or much more expensive const char * comparisons. With long names of template instantiations, the cost of strcmp-like operations could be high.
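The difference between the two strategies can be modeled with a simplified sketch. This is not the library code: NameRecord is a hypothetical stand-in for the name pointer that a type_info carries, and the two functions only mimic the idea of "compare pointers" versus "compare strings".

```cpp
#include <cstring>

// Hypothetical stand-in for the mangled-name pointer held by a type_info.
struct NameRecord {
    const char* mangled_name;
};

// Strategy 1: a single pointer comparison. Correct only if every type has
// exactly one name string in the whole process (names are "merged").
inline bool equal_by_pointer(const NameRecord& a, const NameRecord& b) {
    return a.mangled_name == b.mangled_name;
}

// Strategy 2: compare the characters. Works even when two modules carry
// their own copy of the same name, at a cost linear in the name length.
inline bool equal_by_name(const NameRecord& a, const NameRecord& b) {
    return a.mangled_name == b.mangled_name ||
           std::strcmp(a.mangled_name, b.mangled_name) == 0;
}
```

If two modules each hold their own copy of the string for the same type, equal_by_pointer reports a mismatch while equal_by_name still reports a match, which is the trade-off the macros select between.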
If you are interested in detailed performance numbers and library code, you may want to check out the Fun with typeid() blog post by David Holmes. The long and the short of it is that with _LIBCPP_HAS_NONUNIQUE_TYPEINFO disabled in libc++ and __GXX_MERGED_TYPEINFO_NAMES enabled in libstdc++, performance of std::type_info and std::type_index comparisons is an order of magnitude better (due to just pointer comparisons).
On my MacOS machine, _LIBCPP_HAS_NONUNIQUE_TYPEINFO is not defined by default. So things are good. On my RHEL7 box, __GXX_MERGED_TYPEINFO_NAMES is not defined. There's an explanation in libstdc++ of why that's the case. It reads something like this.
// Determine whether typeinfo names for the same type are merged (in which
// case comparison can just compare pointers) or not (in which case strings
// must be compared), and whether comparison is to be implemented inline or
// not.
// We used to do inline pointer comparison by default if weak symbols
// are available, but even with weak symbols sometimes names are not merged
// when objects are loaded with RTLD_LOCAL, so now we always use strcmp by
// default.
// For ABI compatibility, we do the strcmp inline if weak symbols
// are available, and out-of-line if not. Out-of-line pointer comparison
// is used where the object files are to be portable to multiple systems,
// some of which may not be able to use pointer comparison, but the
// particular system for which libstdc++ is being built can use pointer
// comparison; in particular for most ARM EABI systems, where the ABI
// specifies out-of-line comparison.
// The compiler's target configuration
// can override the defaults by defining __GXX_TYPEINFO_EQUALITY_INLINE to
// 1 or 0 to indicate whether or not comparison is inline, and
// __GXX_MERGED_TYPEINFO_NAMES to 1 or 0 to indicate whether or not pointer
// comparison can be used.

That's dense! I'm unclear about what "merged" really means in this context. What is being merged with what? Anyone?
The best part is the last sentence. The standard library authors are permitting setting an otherwise internal macro (starts with __) to enable pointer comparisons. So there seems to be light at the end of the tunnel.
One thing I'm not 100% sure about is the term "target configuration". A compiler's target is the machine that assembly code is generated for. On my machine, gcc -v prints Target: x86_64-redhat-linux. I.e., the resulting code is suitable for running on x86_64-redhat-linux, which makes it a native build. I'm unclear whether the compiler and the standard library itself should be built with the same preprocessor macro. If you are curious about what build, host, and target machines are for a compiler, see gcc configure terms and history.
The following invocation of the compiler seems to produce code that uses pointer comparisons in type_info::operator==.
g++ -std=c++11 -D__GXX_MERGED_TYPEINFO_NAMES -ldl -o test test.cpp
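One way to double-check which strategy a translation unit was actually compiled with is to inspect the macro after including <typeinfo>. The sketch below assumes libstdc++, which appears to give the macro a default value in that header when the command line does not define it; the function name typeinfo_comparison_mode is made up for this example.

```cpp
#include <typeinfo>  // on libstdc++ this header may set a default for the macro

// Returns a short description of the comparison strategy selected at
// compile time for this translation unit.
inline const char* typeinfo_comparison_mode() {
#if defined(__GXX_MERGED_TYPEINFO_NAMES)
#  if __GXX_MERGED_TYPEINFO_NAMES
    return "pointer comparison (names assumed merged)";
#  else
    return "string comparison (names not assumed merged)";
#  endif
#else
    // Assumption: the macro is specific to libstdc++.
    return "unknown (macro not defined, probably not libstdc++)";
#endif
}
```

Printing the returned string from both the main program and the shared library is a cheap way to confirm that the two were built with the same setting. Note that -D__GXX_MERGED_TYPEINFO_NAMES without an explicit value defines the macro as 1.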
Dynamically Loaded Libraries
There's another wrinkle, which appears to be around dynamic loading of shared libraries. Something about "weak symbols" and RTLD_LOCAL. What in the world are those things?
In the man pages for dlopen, a library function to load shared library files (*.so) at run-time, you will find RTLD_LOCAL. Quoting man pages:
This is the converse of RTLD_GLOBAL, and the default if neither flag is specified. Symbols defined in this library are not made available to resolve references in subsequently loaded libraries.
So if your program uses dynamically loaded libraries and the libraries rely on a globally known definition of the std::type_info object for Foo, you might be out of luck if the libraries are opened using default flags or explicitly with RTLD_LOCAL. Such libraries, even if compiled with __GXX_TYPEINFO_EQUALITY_INLINE, will use their own local definitions of the std::type_info object for Foo. Obviously, if your program relies on a globally unique definition, as in std::set<std::type_index> or some similar shenanigans, your program is likely to explode.
Ok, so, I can't open the libraries with RTLD_LOCAL or the default. I've to use RTLD_GLOBAL. Easy.
To be extra careful, I threw in a run-time check to ensure the main program and the shared-library file agree on the definition of the std::type_info of Foo.
The Foo header file.
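// foo.h
#ifndef FOO_H
#define FOO_H

#include <typeinfo>  // std::type_info is used in the declaration below

// The namespace is called testns here so that it cannot clash with the
// global function test() declared below.
namespace testns {
class Foo {
public:
  virtual ~Foo() = default;
};
}
using namespace testns;

// C linkage keeps the symbol unmangled so that dlsym(handle, "test") finds it.
extern "C" void test(const std::type_info &);

#endif // FOO_H

The Foo implementation file.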
// foo.cpp (shared-library implementation)
#include <iostream>
#include <typeinfo>
#include <cassert>
#include "foo.h"

extern "C" void test(const std::type_info &other)
{
  assert(other == typeid(Foo));  // aborts if the two modules disagree
  std::cout << "typeid equality = " << std::boolalpha
            << (other == typeid(Foo)) << std::endl;
  assert(other.hash_code() == typeid(Foo).hash_code());
  std::cout << "typeid hash_code equality = " << std::boolalpha
            << (other.hash_code() == typeid(Foo).hash_code()) << std::endl;
  std::cout << "typeid name: module=" << typeid(Foo).name()
            << ", other=" << other.name() << std::endl;
}

And the main program (robust_typeid.cpp).
#include <typeinfo>
#include <iostream>
#include <string>
#include <unistd.h>
#include <dlfcn.h>
#include "foo.h"

int main(void) {
  char cwd[1024];
  if (!getcwd(cwd, sizeof(cwd))) return 1;  // current directory unavailable
  std::string path = std::string(cwd) + "/libfoo.so";
  // dlopen expects RTLD_LAZY or RTLD_NOW in addition to RTLD_GLOBAL.
  void *handle = dlopen(path.c_str(), RTLD_NOW | RTLD_GLOBAL);
  std::cout << "handle = " << handle << "\n";
  if (!handle) return 1;
  using TestFunctionType = void (*)(const std::type_info &);
  TestFunctionType test_ptr = reinterpret_cast<TestFunctionType>(dlsym(handle, "test"));
  if (test_ptr)
    test_ptr(typeid(Foo));  // pass the main module's view of Foo's type_info
  dlclose(handle);
}

The program loads libfoo.so dynamically and calls the test function in the library. The main module passes a reference to Foo's std::type_info object (as observed by the main module) to the function test. The function checks whether the two modules agree on the uniqueness of the std::type_info object for Foo.
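A variation on this check that does not abort on the first mismatch might look like the sketch below. It reports separately whether operator== agrees, whether both sides refer to the very same object, and whether the hash codes agree. TypeInfoAgreement and compare_type_info are hypothetical names introduced for this illustration.

```cpp
#include <typeinfo>

// Result of comparing two views of what should be the same type.
struct TypeInfoAgreement {
    bool equal;        // operator== considers them the same type
    bool same_object;  // both references name one type_info object
    bool same_hash;    // hash_code() values match
};

// Compares the caller's type_info with the callee's without asserting,
// so that all three observations can be logged.
inline TypeInfoAgreement compare_type_info(const std::type_info& mine,
                                           const std::type_info& theirs) {
    return TypeInfoAgreement{mine == theirs,
                             &mine == &theirs,
                             mine.hash_code() == theirs.hash_code()};
}
```

A library function could call compare_type_info(other, typeid(Foo)) and print the three flags. A result with equal set but same_object clear would suggest that each module has its own copy of the object and that comparison is falling back to names.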
Finally, the compiler options.
// Create libfoo.so
$ clang++ -std=c++11 -D__GXX_MERGED_TYPEINFO_NAMES -fpic -shared foo.cpp -o libfoo.so
// Create the main program
$ clang++ -std=c++11 -D__GXX_MERGED_TYPEINFO_NAMES -ldl -o robust_typeid robust_typeid.cpp
// Run
$ ./robust_typeid

It crashes with an assertion failure. Ouch!
handle = 0x85dcf0
robust_typeid: foo.cpp:9: void test(const std::type_info &): Assertion `other == typeid(Foo)' failed.
Aborted (core dumped)
The suspicion turned out to be right. Something's off.
With some google-fu, I found gcc's linker flag -rdynamic (or -export-dynamic). Quoting man pages:
This instructs the linker to add all symbols, not only used ones, to the dynamic symbol table. This option is needed for some uses of dlopen
Let's try.
Voilà!
These two options seem to enable the best of both worlds: fast, reliable type_info comparisons. Additionally, the type_info::hash_code function returns a pointer. Does that make it non-colliding? Is -D__GXX_MERGED_TYPEINFO_NAMES -rdynamic really a silver bullet? Let me know what you think. Comment on reddit/r/cpp.