Tijn Porcelijn - 2 months ago
Linux Question

MFXInit() in libmfx.a segfaults when called from shared object

(While Intel's forum is a more natural place to ask this question I'm posting it here hoping for more activity than Intel's total lack thereof -- so far)

I'm unable to create a dynamic link library that uses the Intel Media SDK (Linux server) to manipulate H.264 video, and I noticed a problem in the design of the MFX library. As I understand it, programs are supposed to link to the static library, like this:

$ g++ .... -L/opt/intel/mediasdk/lib/lin_x64 -lmfx

However, this libmfx.a library appears to delegate all calls to a dlopen'ed dynamic library, libmfxhw64.so. It is worth noting that the function names (and signatures) exposed by the static and dynamic libraries are identical, which is confusing and dangerous.

While I don't understand the rationale behind this design, it would not be a problem by itself were it not that some static/global initialization within the library apparently wreaks havoc when the (static) libmfx.a is included in a shared object, i.e.:

+------+     +-----------+
| main | <-- | mylib.so  |
+------+     |           |          +---------------+
             | libmfx.a  | (dlopen) | libmfxhw64.so |
             |         <-------------               |
             |+---------+|          |+-------------+|
             ||MFXInit()||          || MFXInit()   ||
             ||...      ||          || ...         ||
             ||         ||          ||             ||
             +===========+          +===============+

The above library could be assembled like this:

$ g++ -shared -o mylib.so my1.o my2.o -lmfx

And then (dynamically) linked to main like so:

$ g++ -o main main.o mylib.so -ldl

(Note that the additional -ldl is necessary to allow libmfx.a to dlopen libmfxhw64.so.)

Unfortunately, upon the first MFXInit() call, the program causes a segmentation fault (jumping to address 0x400). GDB backtrace:

#0 0x0000000000000400 in ?? ()
#1 0x00007ffff61fb4cd in MFXInit () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#2 0x00007ffff7bd3a1f in MFX_DISP_HANDLE::LoadSelectedDLL(char const*, eMfxImplType, int, int) () from ./lib-a.so
#3 0x00007ffff7bd12b1 in MFXInit () from ./lib-a.so
#4 0x00007ffff7bd09c8 in test_mfx () at lib.c:12
#5 0x0000000000400744 in main (argc=1, argv=0x7fffffffe0d8) at main.c:8

(Observe that MFXInit() at stack frame #3 is the dispatcher from libmfx.a, now inside lib-a.so, whereas the one at #1 is in libmfxhw64-p.so.)

Note that there is no crash when my library is created as a static library. Using breakpoints and the disassembler, I managed to capture the following backtrace snapshot. In both cases MFXInit() is at frame #1, but they appear to hit different versions of MFXQueryVersion() (absolute addresses are meaningless due to relocation):

#0 0x00007ffff6411980 in MFXQueryVersion () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#1 0x00007ffff640c4cd in MFXInit () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#2 0x000000000040484f in MFX_DISP_HANDLE::LoadSelectedDLL(char const*, eMfxImplType, int, int) ()
#3 0x00000000004020e1 in MFXInit ()
#4 0x0000000000401800 in test_mfx () at lib.c:12
#5 0x0000000000401794 in main (argc=1, argv=0x7fffffffe0e8) at main.c:8

Because both the static and the shared Intel libraries expose the same API functions, I can link directly against libmfxhw64.so, but I suppose that bypassing the static "dispatcher" comes without warranty(?)

Could someone explain Intel's idea behind this design? Specifically, why provide a static library that only delegates to a shared object with an identical interface?

Also, it appears that the SEGV is caused by static/global data in one of the two libraries. Is there a way to force a specific execution order on dynamically loaded static/global sections? What is the best approach to debug these kinds of problems?

Tested with Intel Media SDK R2 (Ubuntu 12) and Intel Media SDK 2015R3-R5 (CentOS 7, 1.13/1.15) on an Intel Haswell i7-4790 @ 3.6 GHz.

If you have a working Intel MSDK setup, please compile my example code to confirm the issue.


(OK, since no one seems eager, I'll do the inelegant thing and post an answer to my own question).

After considerable research into breaking the unintentional circular linking, I discovered that the ld option --exclude-libs provides solace. Essentially, I was looking for a way to keep the libmfx.a symbols out of the exported interface after they had been used to resolve the dependencies of lib.o while creating the shared object. This can be accomplished by building the .so like this:

g++ -shared -o lib-a.so lib.o -L/opt/intel/mediasdk/lib/lin_x64 -lmfx -Wl,--exclude-libs=libmfx

Once the library is created like this, Bob's your uncle:

g++ -o main-so-a main.o lib-a.so -ldl

(Note that libdl is still needed because Intel's MFX (now inside lib-a.so) still uses dlopen to discover libmfxhw64.so)

From the ld man page:

   --exclude-libs lib,lib,...
       Specifies a list of archive libraries from which symbols should not be
       automatically exported.  The library names may be delimited by commas or
       colons.  Specifying "--exclude-libs ALL" excludes symbols in all archive
       libraries from automatic export.  This option is available only for the
       i386 PE targeted port of the linker and for ELF targeted ports.  For i386
       PE, symbols explicitly listed in a .def file are still exported,
       regardless of this option.  For ELF targeted ports, symbols affected
       by this option will be treated as hidden.

So, essentially the trick is to make sure that the relevant ELF symbols are marked hidden. Normally this would be handled through #pragmas by the library developers (i.e. Intel), but due to their negligence this needs to be retrofitted in this case.
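Whether a symbol really ended up hidden can be checked from the dynamic symbol table. A small sketch, assuming GNU nm and awk; the helper name exports_symbol is invented for this example.

```shell
#!/bin/sh
# Succeeds (exit status 0) if shared object $1 exports a definition of symbol $2
# in its dynamic symbol table, i.e. the symbol is visible to other modules.
exports_symbol() {
    nm -D --defined-only "$1" | awk -v sym="$2" '
        {
            name = $NF
            sub(/@.*/, "", name)        # strip a symbol version suffix, if any
            if (name == sym) found = 1
        }
        END { exit !found }'
}

# Example against the library built above; the result depends on how it was linked:
# if exports_symbol lib-a.so MFXInit; then
#     echo "MFXInit is exported"
# else
#     echo "MFXInit is hidden"
# fi
```

For a lib-a.so linked with --exclude-libs=libmfx, I would expect MFXInit to be reported as hidden while the library's own entry points stay exported.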

I suppose the same could have been accomplished with a --version-script map file, but that might have turned out to be more fragile since we want to fully encapsulate libmfx.a anyway.
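For completeness, a version script variant might look roughly like this. It is an untested sketch for the Media SDK case: the helper write_export_map and the file name lib.map are made up, and it assumes that test_mfx (the function in lib.c from the backtraces) has C linkage, since C++ names would have to be listed in mangled form or inside an extern "C++" block.

```shell
#!/bin/sh
# Write an anonymous GNU ld version script to $1 that exports only the symbols
# named in the remaining arguments and makes every other symbol local.
write_export_map() {
    map=$1
    shift
    {
        echo "{"
        echo "  global:"
        for sym in "$@"; do
            echo "    $sym;"
        done
        echo "  local:"
        echo "    *;"
        echo "};"
    } > "$map"
}

# Possible use for the library in this post:
# write_export_map lib.map test_mfx
# g++ -shared -o lib-a.so lib.o -L/opt/intel/mediasdk/lib/lin_x64 -lmfx \
#     -Wl,--version-script=lib.map
```

The fragility mentioned above is visible here: every public function of the wrapper has to be listed by hand, whereas --exclude-libs only needs the archive name.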