Karthik Karthik - 1 month ago 19
C Question

Application is getting killed without any reason. Suspecting high BSS. How to debug it?

I have been running my application successfully on CentOS 6.6. Recently the hardware (motherboard and RAM) was updated, and the application now gets killed at startup for no apparent reason.

[root@localhost PktBlaster]# ./PktBlaster
Killed


File and ldd output

[root@localhost PktBlaster]# file PktBlaster
PktBlaster: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped

[root@localhost PktBlaster]# ldd PktBlaster
not a dynamic executable


Output of strace

[root@localhost PktBlaster]# strace ./PktBlaster
execve("./PktBlaster", ["./PktBlaster"], [/* 30 vars */] <unfinished ...>
+++ killed by SIGKILL +++
Killed


GDB

[root@localhost PktBlaster]# gdb PktBlaster
(gdb) break main
Breakpoint 1 at 0x43d664: file VTP.c, line 544.
(gdb) run
Starting program: /root/Veryx/PktBlaster/PktBlaster
During startup program terminated with signal SIGKILL, Killed.


While debugging, I observed that the BSS is huge (~6 GB). The system has 4 GB of RAM, and I think this could be the cause of the issue.

[root@localhost PktBlaster_1Gig]# size build/unix/bin/PktBlaster
   text    data         bss         dec       hex filename
 375551   55936  6747541120  6747972607 19235e3ff build/unix/bin/PktBlaster


The application contains many .h files and many data structures, so it is difficult for me to identify why the BSS has grown to 6 GB.

Could anyone please suggest how to identify which file is causing this, or any other easier way to debug it?

Answer

It seems that the problem really is the huge BSS size. In the comments I asked you to show the output of LD_DEBUG=all /lib64/ld-linux-x86-64.so.2 /path/to/exe.

/lib64/ld-linux-x86-64.so.2 is the runtime linker, which the OS uses to load your binary into process memory during the execve system call. The runtime linker is responsible for parsing the executable format, loading all segments and dependencies into memory, performing the required relocations, and so on. When it is run directly with the executable as its argument, it loads the binary itself, and setting the environment variable LD_DEBUG to all instructs it to generate debug output.

[root@localhost PktBlaster]# LD_DEBUG=all /lib64/ld-linux-x86-64.so.2 /root/Veryx/PktBlaster/PktBlaster
       851: file=/root/Veryx/PktBlaster/PktBlaster [0]; generating link map
/root/Veryx/PktBlaster/PktBlaster: error while loading shared libraries: /root/Veryx/PktBlaster/PktBlaster: cannot map zero-fill pages: Cannot allocate memory

Searching for this error message ("cannot map zero-fill pages") in the source code of the runtime linker (glibc-2.17, elf/dl-load.c, around line 1400), we see:

1393         if (zeroend > zeropage)
1394           {
1395         /* Map the remaining zero pages in from the zero fill FD.  */
1396         caddr_t mapat;
1397         mapat = __mmap ((caddr_t) zeropage, zeroend - zeropage,
1398                 c->prot, MAP_ANON|MAP_PRIVATE|MAP_FIXED,
1399                 -1, 0);
1400         if (__builtin_expect (mapat == MAP_FAILED, 0))
1401           {
1402             errstring = N_("cannot map zero-fill pages");
1403             goto call_lose_errno;
1404           }

The loader is in the process of loading the BSS segment. As an optimization, the BSS is not stored in the binary itself; only the number of bytes that must be initialized to zero is recorded. The loader tries to allocate a zero-initialized memory block through an anonymous mmap (MAP_ANON) and gets an error from the OS:

#define ENOMEM      12  /* Out of memory */

From man 2 mmap:

ENOMEM No memory is available, or the process's maximum number of mappings would have been exceeded.

So it seems that, for whatever reason, the OS cannot fulfill the loader's request for memory. Either some limit is in effect (systemd, a process limit, some security LKM, whatever), or there is simply not enough free memory available to the kernel.

To determine which object file contributes most of the BSS, dump the .bss symbols of every object file and look for the largest sizes:

objdump -j '.bss' -t *.o