kaan kaan - 4 months ago 15
Bash Question

Bash script size limitation?

I have a bash script that, when run on RHEL or OS X, gives the following error:


line 62484: syntax error near unexpected token `newline'
line 62484: ` -o_gz'


This is an auto-generated script that works around a limitation introduced by the grid engine compute cluster used at my company. It consists entirely of a long chain of almost-identical if/elif clauses. I can't see anything special about the line the error points to. When I run only the part of the script before the error line, it works without problems, which makes me think there may be some limit on bash script length. The only reference I could find on the web was the comment by iAdjunct.

The part of the script around the error looks like this (with some simplifications):

.
.
.
.
elif [ $task_number -eq 2499 ]
then
/some/tool/executable \
-use_prephased_g \
-m \
/some/text/file \
-h \
/some/zipped/file \
-l \
-int \
45063854 \
46063853 \
-Ne \
20000 \
-o \
/some/output/file \
-verbose \
-o_gz #==============> ****THIS IS LINE 62484****
elif [ $task_number -eq 2500 ]
then
/some/tool/executable \
-use_prephased_g \
-m \
/some/other/text/file \
-h \
/some/other/zipped/file \
-l \
-int \
98232182 \
99232182 \
-Ne \
20000 \
-o \
/some/other/output/file \
-verbose \
-o_gz
elif [ $task_number -eq 2501 ]
.
.
.
.


Does this ring any bells for anyone?

Answer

Yes, this is a limitation of bash.

It's not exactly a size limit; rather, it is a limit on the depth of the parser stack, which has the effect of restricting the complexity of certain constructs. In particular, it restricts the number of elif clauses in a single if statement to about 2500.
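To check whether a particular bash build runs into this, one option is to generate chains of increasing length and ask bash to parse them without running them. This is only a minimal sketch: gen_chain and the echoed task bodies are made up for illustration, and the lengths tried are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit one if statement with $1 elif clauses.
gen_chain() {
  local n="$1" i=1
  printf 'if [ "$task_number" -eq 0 ]; then\n  echo "task 0"\n'
  while [ "$i" -le "$n" ]; do
    printf 'elif [ "$task_number" -eq %d ]; then\n  echo "task %d"\n' "$i" "$i"
    i=$((i + 1))
  done
  printf 'fi\n'
}

# bash -n reads the script from stdin and parses it without executing it.
for n in 1000 2000 3000 4000; do
  if gen_chain "$n" | bash -n 2>/dev/null; then
    echo "$n elif clauses: parsed"
  else
    echo "$n elif clauses: syntax error"
  fi
done
```

Which lengths fail, if any, depends on how the bash binary in question was built, so the output will vary between systems.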

There is a longer analysis of this problem with respect to a different syntactic construct (iterated pipes) in my answer to a question on the Unix & Linux stackexchange site.

case statements don't have this limitation, and the sample you provide certainly looks like a good match for a case statement.

(The difference with case statements is that the grammar for if conditional statements, like that of pipe constructs, is right recursive, while the grammar for case statements is left recursive. The reason the limitation on if statements is different from the limitation on pipes is that the grammatical construct for an elif clause has one more symbol, so each repetition uses four stack slots rather than three.)
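As a rough sketch of what the case version of the script in the question might look like (run_task is a name invented here, the command lines are shortened, and echo stands in for actually running the tool):

```shell
#!/usr/bin/env bash
# Dispatch on the task number with case instead of a long if/elif chain.
# run_task is a hypothetical wrapper; echo replaces the real tool invocation.
run_task() {
  local task_number="$1"
  case "$task_number" in
    2499)
      echo "/some/tool/executable -int 45063854 46063853 -o /some/output/file"
      ;;
    2500)
      echo "/some/tool/executable -int 98232182 99232182 -o /some/other/output/file"
      ;;
    *)
      echo "unknown task number: $task_number" >&2
      return 1
      ;;
  esac
}

run_task 2500
```

Note that case compares patterns as strings, so this assumes the task number arrives as a plain decimal without leading zeros.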

If the case statement doesn't work for you -- or even if it does -- you could try building a precompiled binary search tree of if statements:

# ':' is a no-op placeholder; bash rejects a then/else body that holds only a comment
if (( task_number < 8 )); then
  if (( task_number < 4 )); then
    if (( task_number < 2 )); then
      if (( task_number < 1 )); then
        : # do task 0
      else
        : # do task 1
      fi
    elif (( task_number < 3 )); then
      : # do task 2
    else
      : # do task 3
    fi
  elif (( task_number < 6 )); then
    if (( task_number < 5 )); then
      : # do task 4
    else
      : # do task 5
    fi
  elif (( task_number < 7 )); then
    : # do task 6
  else
    : # do task 7
  fi
elif (( task_number < 12 )); then
  if (( task_number < 10 )); then
    if (( task_number < 9 )); then
      : # do task 8
    else
      : # do task 9
    fi
  elif (( task_number < 11 )); then
    : # do task 10
  else
    : # do task 11
  fi
elif (( task_number < 14 )); then
  if (( task_number < 13 )); then
    : # do task 12
  else
    : # do task 13
  fi
elif (( task_number < 15 )); then
  : # do task 14
else
  : # do task 15
fi

Because each complete if statement occupies only a single stack node once it has been recognized, the complexity limitation applies to the nesting depth of the if statements rather than to the number of clauses. As an additional bonus, it will execute far fewer comparisons in the average case.
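Since the script is auto-generated anyway, the tree can be generated too. The following is a sketch under some assumptions: gen_tree is a hypothetical helper, the leaf command is an echo placeholder for the real tool invocation, and it emits a plain if/else at every level rather than the elif form shown above, which keeps the generator short while still bounding the nesting depth at about log2 of the number of tasks.

```shell
#!/usr/bin/env bash
# Hypothetical generator: emit a balanced tree of if/else tests that
# selects exactly one task number in the inclusive range [lo, hi].
gen_tree() {
  local lo="$1" hi="$2" indent="${3:-}" mid
  if [ "$lo" -eq "$hi" ]; then
    # Leaf: placeholder for the real command of task $lo.
    printf '%secho "task %d"\n' "$indent" "$lo"
    return
  fi
  # Split so that [lo, mid-1] and [mid, hi] are both non-empty.
  mid=$(( (lo + hi + 1) / 2 ))
  printf '%sif (( task_number < %d )); then\n' "$indent" "$mid"
  gen_tree "$lo" "$((mid - 1))" "$indent  "
  printf '%selse\n' "$indent"
  gen_tree "$mid" "$hi" "$indent  "
  printf '%sfi\n' "$indent"
}

# Generate a 16-task dispatcher and run it in bash for task 11.
gen_tree 0 15 | task_number=11 bash   # prints "task 11"
```

With this layout, 4096 tasks need only 12 levels of nesting. In practice the generated text would be written to the script file instead of being piped straight into bash.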
