David Okwii - 6 months ago
Linux Question

Return value from Linux bash recursive function

I am facing some weird behavior with my bash script. It's basically a script that tries to ping a remote host a number of times if it fails the first time. I do this to rule out false alerts. I thought I could quickly achieve this by writing a recursive function that calls itself and attempts the ping again.

My problem is with the returned value. I've found that the function outputs its return value several times, once for each level of recursion. This is very odd. For instance, in my code below, the ip_up() function is supposed to return 1 when the remote host is up and 0 when it is down. However, when the remote host is down, the function returns 0 twice, which matches the number of recursive calls made.

What could be the problem with my code or is this how bash works?

#!/bin/bash
ip_up(){
    server_ip=$1
    trials=$2
    max_trials=2
    status=0
    echo "server ip is: $server_ip, trial $trials" >&2
    if ping -i 1 -c 3 "$server_ip" &> /dev/null
    then
        status=1
    else
        status=0
        while (( "$trials" < "$max_trials" )); do
            echo -e "$server_ip is down: Trial $trials, checking again after 1 sec" >&2
            sleep 1
            ((trials++))
            ip_up "$server_ip" "$trials"
        done
    fi
    echo "$status"
}

status=$(ip_up "$ip" 1)
echo -e "the returned status is: ====$status====\n"
if [ "$status" -eq 0 ]; then
msg="$timestamp: Server $hostname ($ip) is DOWN"; echo "$msg"
fi

<<'COMMENT'
//results

$ ./check_servers.sh
checking box1(173.36.232.6)
server ip is: 173.36.232.6, trial 1
173.36.232.6 is down: Trial 1, checking again after 1 sec
server ip is: 173.36.232.6, trial 2
the returned status is: ====0
0====

./check_servers.sh: line 41: [: 0
0: integer expression expected
Sat Jun 4 15:16:11 EAT 2016 box2 (173.36.232.7) is UP
checking box2 (173.36.232.7)
server ip is: 173.36.232.7, trial 1
the returned status is: ====1====

COMMENT

Answer

I can't imagine many circumstances where I'd use code with a one-second delay in the loop often enough to make it worth writing as a function; I'd use a relatively straightforward iterative script. However, it is far from impossible to turn the script into a function if you're sure that's a benefit to you; your circumstances are different from mine.

#!/bin/sh

[ $# = 1 ] || [ $# = 2 ] || { echo "Usage: $0 ip-address [max-trials]" >&2; exit 1; }
server_ip="$1"
maxtrials="${2:-2}"
trial=1

while echo "server: $server_ip, trial $trial" >&2
      ! ping -i 1 -c 3 "$server_ip" > /dev/null 2>&1 || exit 0
do
    trial=$(($trial + 1))
    [ "$trial" -gt "$maxtrials" ] && break
    echo "$0: $server_ip is down: checking again after 1 sec" >&2
    sleep 1
done

echo "$(date +'%Y-%m-%d %H:%M:%S'): Server $server_ip is DOWN"
exit 1

The first block of code sets up the controls, defaulting to 2 attempts.

The while loop control contains the echo and then attempts to ping the IP address (or host name). If the command succeeds (the host is pingable), then the ! ping status is false, so the || exit 0 is executed, and the script exits with a 0 status, indicating success (the host is pingable). If the command fails (the host is not pingable), then the ! ping status is true, so the || exit 0 is not executed, and the body of the loop is entered. It increments the trial number and breaks the loop if the limit is reached. Otherwise, it prints its message and sleeps and goes back to the start of the loop.
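
To see the loop control in isolation, here is a minimal sketch that replaces ping with a hypothetical stand-in, fake_ping, which fails a set number of times and then succeeds. The names fake_ping, failures_left and check are made up for this illustration, and return is used instead of exit so the loop can be run twice from one shell. The point it tries to show is that the while condition is a list of commands, and only the status of the last one decides whether the body runs.

```shell
#!/bin/sh
# Hypothetical stand-in for ping: fails while failures_left > 0, then succeeds.
failures_left=0
fake_ping() {
    if [ "$failures_left" -gt 0 ]; then
        failures_left=$((failures_left - 1))
        return 1
    fi
    return 0
}

# Same loop shape as the script above, wrapped in a function (an assumption
# made here so that it can be called repeatedly).
check() {
    maxtrials="$1"
    trial=1
    while echo "trial $trial" >&2
          ! fake_ping || return 0      # success leaves the function at once
    do
        trial=$((trial + 1))
        [ "$trial" -gt "$maxtrials" ] && break
    done
    return 1                           # every trial failed
}

failures_left=2
if check 2; then echo "host up"; else echo "host down"; fi

failures_left=1
if check 2; then echo "host up"; else echo "host down"; fi
```

With two failures queued and two trials allowed, the first call should report "host down"; with only one failure queued, the second trial succeeds and the call reports "host up". The trial messages go to standard error, as in the script.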

The end block is only reached if the exit 0 was not executed, so the ping failed and the server is 'down' (or non-existent). You then get a time-stamped message indicating that the server is down, and exit with a non-zero status to indicate failure.

There are probably a myriad other ways to do this. I'd probably be more consistent with the error messaging — for example, I might well save arg0="$(basename "$0" .sh)" and then use $arg0 as a prefix to all messages (or possibly add it after the timestamp). It's possible to adapt this to report that the server is up. The code works with POSIX shells, not just Bash (so dash accepts it, for example, as does Korn shell, but the Heirloom (Bourne) Shell doesn't because it doesn't like either $(…) or $((…))).
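
The consistent-prefix idea might look something like the sketch below. The helper names err and report are hypothetical, 192.0.2.1 is just a documentation address, and the `--` before "$0" is an extra guard I've added in case the name starts with a dash; none of this is part of the script above.

```shell
#!/bin/sh
# Program name without its directory or .sh suffix, used as a message prefix.
arg0="$(basename -- "$0" .sh)"

# Hypothetical helper: prefixed diagnostic on standard error.
err() {
    echo "$arg0: $*" >&2
}

# Hypothetical helper: time-stamped, prefixed report on standard output.
report() {
    echo "$(date +'%Y-%m-%d %H:%M:%S'): $arg0: $*"
}

err "192.0.2.1 is down: checking again after 1 sec"
report "Server 192.0.2.1 is DOWN"
```

Every message then carries the same prefix, whether it is a progress note on standard error or the final report on standard output.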

It would also be possible to write it as a simple counting loop which tests the status of ping, exits on success, and otherwise does the reporting and retry. However, it's tricky to avoid a final sleep 1 just before the loop exits without testing the value of $trial twice. The extra test isn't expensive at run-time, but it is a source of repetition, and DRY (Don't Repeat Yourself) is a worthwhile principle to live up to.

#!/bin/bash

[ $# = 1 ] || [ $# = 2 ] || { echo "Usage: $0 ip-address [max-trials]" >&2; exit 1; }
server_ip="$1"
maxtrials="${2:-2}"

for ((trial = 1; trial <= maxtrials; trial++))
do
    echo "server: $server_ip, trial $trial" >&2
    if ping -i 1 -c 3 "$server_ip" > /dev/null 2>&1
    then exit 0
    elif [ "$trial" -lt "$maxtrials" ]
    then
        echo "$0: $server_ip is down: checking again after 1 sec" >&2
        sleep 1
    fi
done

echo "$(date +'%Y-%m-%d %H:%M:%S'): Server $server_ip is DOWN"
exit 1

I'm not entirely keen on that, but it works with Bash and Korn shell.

Converting the last script to a function is basically trivial — change the exit statements into return statements, and wrap a function start and end around it:

#!/bin/bash

function upip()
{
    [ $# = 1 ] || [ $# = 2 ] || { echo "Usage: $0 ip-address [max-trials]" >&2; return 1; }
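    # Keep the working variables local so that sourcing the function
    # does not overwrite variables in the calling shell.
    local server_ip="$1"
    local maxtrials="${2:-2}"
    local trial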

    for ((trial = 1; trial <= maxtrials; trial++))
    do
        echo "server: $server_ip, trial $trial" >&2
        if ping -i 1 -c 3 "$server_ip" > /dev/null 2>&1
        then return 0
        elif [ "$trial" -lt "$maxtrials" ]
        then
            echo "$0: $server_ip is down: checking again after 1 sec" >&2
            sleep 1
        fi
    done

    echo "$(date +'%Y-%m-%d %H:%M:%S'): Server $server_ip is DOWN"
    return 1
}

With the function saved in upip-func.sh, I source the file and call it:

$ . upip-func.sh
$ upip www.google.com
server: www.google.com, trial 1
$ echo $?
0
$ upip ping.google.com
server: ping.google.com, trial 1
bash: ping.google.com is down: checking again after 1 sec
server: ping.google.com, trial 2
2016-06-06 00:35:18: Server ping.google.com is DOWN
$ echo $?
1
$ if upip www.google.com; then echo OK; else echo Fail; fi
server: www.google.com, trial 1
OK
$ if upip ping.google.com; then echo OK; else echo Fail; fi
server: ping.google.com, trial 1
bash: ping.google.com is down: checking again after 1 sec
server: ping.google.com, trial 2
2016-06-06 00:38:32: Server ping.google.com is DOWN
Fail
$
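
As for why the recursive version in the question printed its status twice, my reading is that every invocation of the function runs its own echo "$status", and the command substitution captures everything that all of them write to standard output. The sketch below tries to reproduce that with hypothetical stand-ins (noisy and quiet) and no ping at all; it is an illustration under those assumptions, not a rewrite of the original script.

```shell
#!/bin/sh
# Hypothetical recursive function that, like the one in the question,
# echoes a value at every level of recursion.
noisy() {
    if [ "$1" -lt 2 ]; then
        noisy $(( $1 + 1 ))     # the inner call writes its own "0" to stdout
    fi
    echo 0                      # the outer call then writes another "0"
}

# Variant that passes a return status up the call chain and prints nothing.
quiet() {
    if [ "$1" -lt 2 ]; then
        quiet $(( $1 + 1 ))
        return $?               # hand the inner status to the caller
    fi
    return 1                    # pretend the host is down
}

captured=$(noisy 1)
printf 'captured: ====%s====\n' "$captured"

if quiet 1; then echo "up"; else echo "down"; fi
```

The captured value holds two lines, each containing 0, which is the same shape as the "0, newline, 0" that upset the -eq test in the question. The quiet variant is tested directly by if, in the same way as upip above, so nothing needs to be captured.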