charley charley - 2 months ago 9
C++ Question

Will C++ linker automatically inline functions (without "inline" keyword, without implementation in header)?

Will the C++ linker automatically inline "pass-through" functions, which are NOT defined in the header, and NOT explicitly requested to be "inlined" through the

inline
keyword?

For example, the following happens so often, and should always benefit from "inlining", that it seems every compiler vendor should have "automatically" handled it through "inlining" through the linker (in those cases where it is possible):

//FILE: MyA.hpp
class MyA
{
public:
int foo(void) const;
};

//FILE: MyB.hpp
class MyB
{
private:
MyA my_a_;
public:
int foo(void) const;
};

//FILE: MyB.cpp
// PLEASE SAY THIS FUNCTION IS "INLINED" BY THE LINKER, EVEN THOUGH
// IT WAS NOT IMPLICITLY/EXPLICITLY REQUESTED TO BE "INLINED"?
int MyB::foo(void)
{
return my_a_.foo();
}


I'm aware the MSVS linker will perform some "inlining" through its Link Time Code Generation (LTGCC), and that the GCC toolchain also supports Link Time Optimization (LTO) (see: Can the linker inline functions?).

Further, I'm aware that there are cases where this cannot be "inlined", such as when the implementation is not "available" to the linker (e.g., across shared library boundaries, where separate linking occurs).

However, if this is code is linked into a single executable that does not cross DLL/shared-lib boundaries, I'd expect the compiler/linker vendor to automatically inline the function, as a simple-and-obvious optimization (benefiting both performance-and-size)?

Are my hopes too naive?

Answer

Here's a quick test of your example (with a MyA::foo() implementation that simply returns 42). All these tests were with 32-bit targets - it's possible that different results might be seen with 64-bit targets. It's also worth noting that using the -flto option (GCC) or the /GL option (MSVC) results in full optimization - wherever MyB::foo() is called, it's simply replaced with 42.

With GCC (MinGW 4.5.1):

gcc -g -O3 -o test.exe myb.cpp mya.cpp test.cpp

the call to MyB::foo() was not optimized away. MyB::foo() itself was slightly optimized to:

Dump of assembler code for function MyB::foo() const:
   0x00401350 <+0>:     push   %ebp
   0x00401351 <+1>:     mov    %esp,%ebp
   0x00401353 <+3>:     sub    $0x8,%esp
=> 0x00401356 <+6>:     leave
   0x00401357 <+7>:     jmp    0x401360 <MyA::foo() const>

Which is the entry prologue is left in place, but immediately undone (the leave instruction) and the code jumps to MyA::foo() to do the real work. However, this is an optimization that the compiler (not the linker) is doing since it realizes that MyB::foo() is simply returning whatever MyA::foo() returns. I'm not sure why the prologue is left in.

MSVC 16 (from VS 2010) handled things a little differently:

MyB::foo() ended up as two jumps - one to a 'thunk' of some sort:

0:000> u myb!MyB::foo
myb!MyB::foo:
001a1030 e9d0ffffff      jmp     myb!ILT+0(?fooMyAQBEHXZ) (001a1005)

And the thunk simply jumped to MyA::foo():

myb!ILT+0(?fooMyAQBEHXZ):
001a1005 e936000000      jmp     myb!MyA::foo (001a1040)

Again - this was largely (entirely?) performed by the compiler, since if you look at the object code produced before linking, MyB::foo() is compiled to a plain jump to MyA::foo().

So to boil all this down - it looks like without explicitly invoking LTO/LTCG, linkers today are unwilling/unable to perform the optimization of removing the call to MyB::foo() altogether, even if MyB::foo() is a simple jump to MyA::foo().

So I guess if you want link time optimization, use the -flto (for GCC) or /GL (for the MSVC compiler) and /LTCG (for the MSVC linker) options.