friedrich friedrich - 1 month ago 10
C Question

SIGSTOP/SIGCONT POSIX behavior

I'm playing around with signals, SIGSTOP and SIGCONT in particular.
Here is a test program I wrote. The idea is to create a chain of N + 1
processes (including the main process). Each one has to wait for its child to stop, then stop
itself. The main process has to wake up its child when the latter has
stopped.

To do so, the f function recursively creates the process chain. Each
process calls sigsuspend to wait for SIGCHLD, apart from the last child,
which stops itself directly. When its child has stopped, a process
receives SIGCHLD and can then stop in turn. When the main process
receives SIGCHLD, it means that all the other processes are stopped, so
it sends SIGCONT to its child. Each process then sends SIGCONT to its own
child and exits, apart from the last child, which just exits.

I tried to keep it clear: I removed the return-code checks and wrote some
comments.

When executing the program, everything seems to be okay except for the
SIGCONT chain: some processes get awakened, but not all of them. Looking
at the running programs (with ps, for example), everything seems fine: no
blocked processes. I don't really get what could be wrong in this
program. Any help or hint would be welcome.

Here is a sample trace. As you can see, the "fork chain" went well, with each process suspending on SIGCHLD. Then the last child spawns and stops, which creates a "SIGCHLD chain" up through the parents because each process stops itself. When the main process is notified of a SIGCHLD, it sends SIGCONT to its child, which gets awakened and in turn sends SIGCONT to its own child, and so on. You can see that this chain is not complete:

$ ./bin/trycont
n pid log
0 6257 "suspending on SIGCHLD"
1 6258 "suspending on SIGCHLD"
2 6259 "suspending on SIGCHLD"
3 6260 "suspending on SIGCHLD"
4 6261 "suspending on SIGCHLD"
5 6262 "last child - stopping"
4 6261 "got SIGCHLD"
4 6261 "stopping"
3 6260 "got SIGCHLD"
3 6260 "stopping"
2 6259 "got SIGCHLD"
2 6259 "stopping"
1 6258 "got SIGCHLD"
1 6258 "stopping"
0 6257 "got SIGCHLD"
0 6257 "sending SIGCONT to 6258"
1 6258 "awakened - sending SIGCONT to 6259"
2 6259 "awakened - sending SIGCONT to 6260"
# <- not the expected trace


Here is the program (src/trycont.c):


#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>

/* number of created processes with fork
 */
#define N 5

#define printHeader() printf("n\tpid\tlog\n");
#define printMsg(i, p, str, ...) printf("%d\t%d\t" #str "\n", i, p, ##__VA_ARGS__)

void f(int n);
void handler(int sig);

sigset_t set;
struct sigaction action;

int main(int argc, char *argv[])
{
    /* mask SIGCHLD
     */
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    sigprocmask(SIG_SETMASK, &set, NULL);

    /* handler will be called when SIGCHLD is sent to the process
     * during the handler, SIGCHLD will be masked (sa_mask)
     */
    action.sa_mask = set;
    action.sa_handler = handler;
    action.sa_flags = 0;

    /* SIGCHLD will trigger action
     */
    sigaction(SIGCHLD, &action, NULL);

    /* start
     */
    printHeader();
    f(N);

    exit(EXIT_SUCCESS);
}

void f(int n)
{
    pid_t p, pc;
    int myIndex;

    myIndex = N - n;
    p = getpid();

    if (n == 0)
    {
        /* last child
         */
        printMsg(myIndex, p, "last child - stopping");
        kill(p, SIGSTOP);
        printMsg(myIndex, p, "END REACHED");
        exit(EXIT_SUCCESS);
    }

    pc = fork();

    if (pc == 0)
    {
        /* recursion
         */
        f(n - 1);

        /* never reached
         * because of exit
         */
    }

    /* father
     */

    /* suspending on SIGCHLD
     * need to unmask the signal
     * and suspend
     */
    printMsg(myIndex, p, "suspending on SIGCHLD");

    sigfillset(&set);
    sigdelset(&set, SIGCHLD);
    sigsuspend(&set);

    printMsg(myIndex, p, "got SIGCHLD");

    if (n < N)
    {
        /* child process
         * but not last
         */
        printMsg(myIndex, p, "stopping");
        kill(p, SIGSTOP);

        printMsg(myIndex, p, "awakened - sending SIGCONT to %d", pc);
        kill(pc, SIGCONT);
    }
    else
    {
        /* root process
         */
        printMsg(myIndex, p, "sending SIGCONT to %d", pc);
        kill(pc, SIGCONT);
    }

    exit(EXIT_SUCCESS);
}

void handler(int sig)
{
    switch (sig)
    {
    case SIGCHLD:
        /* when the process received SIGCHLD
         * we can ignore upcoming SIGCHLD
         */
        action.sa_handler = SIG_IGN;
        sigaction(SIGCHLD, &action, NULL);
        break;
    default:
        break;
    }
}


Here is a Makefile if you need one:

CC=gcc
DEFINES=-D_POSIX_C_SOURCE
STD=-std=c11 -Wall -Werror
OPTS=-O2
CFLAGS=$(STD) $(DEFINES) $(OPTS) -g
LDFLAGS=

SRC=src
OBJ=obj
BIN=bin

DIRS=$(BIN) $(OBJ)

.PHONY: mkdirs clean distclean

all: mkdirs $(BIN)/trycont

$(BIN)/%: $(OBJ)/%.o
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $<

$(OBJ)/%.o: $(SRC)/%.c
	$(CC) $(CFLAGS) -c -o $@ $<

mkdirs:
	- mkdir $(DIRS)

clean:
	rm -vf -- $(OBJ)/*.o

distclean: clean
	rm -vfr -- $(DIRS)

Answer

Some (all?) of your descendant processes are dying of a system-generated SIGHUP when the first process terminates.

This is expected POSIX behavior under certain circumstances.

When you start the root process from your shell, it is a process group leader, and its descendants are members of that group. When that leader terminates, the process group is orphaned. When the system detects a newly-orphaned process group in which any member is stopped, then every member of the process group is sent a SIGHUP followed by a SIGCONT.
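
To check the premise about group membership on your own system, something like the following sketch may help. The helper names is_group_leader and child_shares_group are made up for illustration and are not part of the original program; the sketch only relies on getpid, getpgrp, fork and waitpid.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the caller leads its process group. */
int is_group_leader(void)
{
    return getpid() == getpgrp();
}

/* Hypothetical helper: fork a child and report whether it ends up in the
 * parent's process group. Returns 1 if so, 0 if not, -1 on error. */
int child_shares_group(void)
{
    int status;
    pid_t parent_group = getpgrp();
    pid_t pc = fork();

    if (pc < 0)
        return -1;

    if (pc == 0) {
        /* child: exit code 0 means "same group as my parent" */
        _exit(getpgrp() == parent_group ? 0 : 1);
    }

    /* parent: collect the child's verdict */
    if (waitpid(pc, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status) == 0;
}
```

Calling is_group_leader() at the top of main in the root process should return 1 when the program is started from an interactive shell with job control; started some other way (from a script, for instance), it may well return 0.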

So, some of your descendant processes are still stopped when the leader terminates, and thus every member of the group receives a SIGHUP followed by a SIGCONT, which for practical purposes means they die of SIGHUP, since the default action for SIGHUP is to terminate the process.
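
One way to confirm this diagnosis might be to install a SIGHUP handler that logs the signal instead of letting the default action kill the process silently. The sketch below is only an illustration: on_sighup, install_sighup_logger and got_sighup are hypothetical names, and in the test program the installer would presumably be called in main before f(N) so that every forked process inherits the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; hypothetical name, not part of the original program. */
volatile sig_atomic_t got_sighup = 0;

static void on_sighup(int sig)
{
    static const char msg[] = "received SIGHUP\n";

    (void)sig;
    got_sighup = 1;

    /* write() is async-signal-safe, unlike printf() */
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing useful can be done about a failed write here */
    }
}

/* Install the logging handler for SIGHUP. Returns 0 on success, -1 on error. */
int install_sighup_logger(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);

    return sigaction(SIGHUP, &sa, NULL);
}
```

With the handler installed, the processes no longer die of SIGHUP, so the trace will look different from the original one; this is meant as a diagnostic aid, not as a fix.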

Exactly which descendants are still stopped (or even just merrily advancing toward exit()) is a timing race. On my system, the leader terminates so quickly that none of the descendants are able to print anything.
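
A possible way around the race is to keep each parent alive until its child has terminated, so that the group leader is the last process to exit and no member is still stopped when the group becomes orphaned. The following is a minimal sketch of that idea for a single parent/child pair, not a rewrite of the program above: resume_and_reap and the exit code 7 are made up for illustration, and it uses waitpid with WUNTRACED in place of the SIGCHLD/sigsuspend approach.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that stops itself, resume it, and
 * wait for it to terminate before returning. Returns the child's exit
 * status, or -1 on error. */
int resume_and_reap(void)
{
    int status;
    pid_t pc = fork();

    if (pc < 0)
        return -1;

    if (pc == 0) {
        /* child: stop, then exit with a recognizable code once continued */
        kill(getpid(), SIGSTOP);
        _exit(7);
    }

    /* wait until the child is reported as stopped */
    if (waitpid(pc, &status, WUNTRACED) < 0 || !WIFSTOPPED(status))
        return -1;

    kill(pc, SIGCONT);

    /* stay alive until the child has terminated */
    if (waitpid(pc, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status);
}
```

In the original program, the analogous change would presumably be a waitpid(pc, NULL, 0) after each kill(pc, SIGCONT). Because the handler there sets SIGCHLD to SIG_IGN, that call may fail with ECHILD once the child is gone, but on systems that follow the POSIX rule for ignored SIGCHLD it should still block until the child has terminated, which is all that is needed here.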
