user327407 user327407 -4 years ago 56
Linux Question

Why does behavior of set -e in a function change when that function is called in a compound command w/ || or &&?

I narrowed my problem to a simple example which puzzles me.

I have tested it with GNU bash 4.2.46 on Centos and 4.3.46 on Ubuntu.

Here is a bash function that returns a non-zero (error) return code when called alone, but whose behavior reverses when I chain another command to it with && or ||. It looks like a bug to me. Can someone explain why it behaves this way?

$ echo $0
/bin/bash
$ function TEST() {( set -e; set -o pipefail; echo OK; false; echo NOT REACHED; )}
$ type TEST
TEST is a function
TEST ()
{
( set -e;
set -o pipefail;
echo OK;
false;
echo NOT REACHED )
}
$ TEST
OK
$ echo $?
1
$ TEST || echo "NON ZERO"
OK
NOT REACHED
$ echo $?
0
$ TEST && echo "UNEXPECTED"
OK
NOT REACHED
UNEXPECTED
$ echo $?
0

Answer

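What you are seeing is the shell doing what it is specified to do. set -e is ignored for commands that run in the condition of an if, while or until statement, and for every command in a && or || list except the last one; the ERR trap follows the same rules. In both of your chained calls, TEST is not the last command of the list, so the false inside it no longer stops the subshell. This makes serious error handling more difficult than in other languages.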
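To make the difference concrete, here is a minimal sketch modelled on the TEST function from the question. The names run_checked, out_list, out_plain and rc_plain are mine, and the sketch assumes set -e is not active in the calling shell (otherwise the failed assignment would end the script).

```shell
#!/bin/bash
# run_checked is a hypothetical stand-in for TEST from the question.
run_checked() {
  (
    set -e
    echo "step 1"
    false                # set -e should stop the subshell here
    echo "NOT REACHED"
  )
}

# Called inside an || list: set -e is ignored in the whole function body,
# so both echo commands run and the function returns 0.
out_list=$(run_checked || echo "handled")

# Called as a simple command: set -e is honoured inside the subshell,
# which stops at false and returns 1.
out_plain=$(run_checked)
rc_plain=$?

printf 'in || list:\n%s\n' "$out_list"
printf 'alone (status %s):\n%s\n' "$rc_plain" "$out_plain"
```

So if you need set -e to stay effective inside the function, one option is to call it on its own and look at $? afterwards, instead of chaining it with || or &&.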

The root of all problems is that the shell does not distinguish between a non-zero code returned as a meaningful, intended status and one produced by a command failing in an uncontrolled manner. Furthermore, the special cases above disable checking at every depth of the call stack, not just the first level, which entirely hides nested failures from set -e and traps (this is pure evil if you ask me).

Here is a short example that shows what I mean.

#!/bin/bash

nested_function()
{
  returnn 0 ; # Deliberately misspelled: fails with "command not found"
}

test_function()
{
  if
    [[ some_test ]]
  then
    nested_function
  else
    return 1
  fi
}

set -e
trap 'echo Will never be called' ERR

if
  test_function
then
  echo "Test OK"
else
  echo "Test failed"
fi

There is an obvious bug in the first function. This function contains nothing that disables error checking, but since its call is nested inside an if block (and not even directly, mind you), that error is completely ignored: the script prints "Test failed" and carries on, and neither set -e nor the ERR trap reacts.
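The same effect can be measured rather than just observed. The following is a small sketch, not the script above: false stands in for the misspelled command, set -e is left off so the script can keep running, and err_count, result_if, count_after_if and count_after_plain are names I made up for the illustration.

```shell
#!/bin/bash
# Count how often the ERR trap runs (err_count is a hypothetical helper variable).
err_count=0
trap 'err_count=$((err_count + 1))' ERR

nested_function() {
  false                  # stands in for any command that fails unexpectedly
}

test_function() {
  nested_function
}

# 1. The failure happens two levels below an if condition: the trap is not run.
if test_function; then
  result_if="ok"
else
  result_if="failed"
fi
count_after_if=$err_count

# 2. The same call as a simple command: the trap runs once, at the call site.
test_function
count_after_plain=$err_count

echo "if branch: $result_if, traps after if: $count_after_if, after plain call: $count_after_plain"
```

The if statement only sees a non-zero status and takes the else branch; nothing tells it that the status came from a bug rather than from a deliberate return.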

You do not have that problem in, say, Java, where a return value is one thing and an exception is another, and where evaluating a return value in an if statement does not prevent an exception at any level of the call stack from doing its job. You have try/catch to handle exceptions, and there is no way to mix exceptions up with return codes; they are fundamentally different things (an exception object can be used as a return value, but returning it does not trigger the exception mechanism the way throwing it does).

If you want to have the same thing in shell programming, you have to build it for yourself. It can be done using a "try" function that is used in front of all calls and keeps state for each nested call, a "throw" equivalent that allows exceptions to be thrown (not as non-zero return codes, but stored inside variables), and trap ... ERR to intercept non-zero return codes and be able to do things like generate a stack trace and trigger a controlled exit (e.g. deleting temporary files, releasing other resources, performing notifications).
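To give an idea of the shape of such a scheme, here is a heavily simplified sketch. It is not a full implementation: it keeps a single global exception instead of per-call state, and the names try, throw, EXCEPTION, on_error and parse_port are all hypothetical. It assumes bash, since it relies on the ERR trap, set -E and the FUNCNAME and BASH_LINENO arrays.

```shell
#!/bin/bash
# Simplified sketch; try, throw, EXCEPTION, on_error and parse_port are hypothetical names.
set -E                   # errtrace: functions and subshells inherit the ERR trap

EXCEPTION=""

on_error() {
  # Any non-zero status that reaches the ERR trap is treated as a bug:
  # print a stack trace, then exit in a controlled way.
  local status=$1 i
  echo "unexpected status $status" >&2
  for ((i = 1; i < ${#FUNCNAME[@]}; i++)); do
    echo "  at ${FUNCNAME[i]} (line ${BASH_LINENO[i-1]})" >&2
  done
  exit "$status"
}
trap 'on_error $?' ERR

throw() {
  # Record an expected failure in a variable instead of in the exit status.
  EXCEPTION=$1
}

try() {
  # Clear the previous exception, then run the call as a simple command
  # so that the ERR trap stays effective inside it.
  EXCEPTION=""
  "$@"
}

parse_port() {
  case $1 in
    ''|*[!0-9]*) throw "not a number: $1"; return 0 ;;
  esac
  PORT=$1
}

try parse_port "80"
first_exception=$EXCEPTION

try parse_port "abc"
second_exception=$EXCEPTION
if [ -n "$EXCEPTION" ]; then
  echo "handled: $EXCEPTION"
fi
```

The point of the design is that the caller tests a variable, never the function call itself, so the call is never placed in an if condition or a || list and unexpected failures inside it still reach the trap.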

With this approach, "exceptions" are explicitly handled failures, and non-zero return codes are bugs. You trade away a bit of performance, I guess; it is not trivial to implement, and it requires a lot of discipline. In terms of ease of debugging, and of how much complexity you can build into a script without being overwhelmed when tracing the source of a problem, it is a game changer though.
