heliophobicdude - 6 months ago
Bash Question

Sed replacing only part of a longer match with a shorter replacement:

So I'm measuring the total elapsed time of a C program. To do so, I have been running a shell script that uses sed to replace the value of a constant (below: N) defined somewhere in the middle of a line in my C program.

#define N 10 // This constant will be incremented by shell program


Before you tell me that I should be using a variable and timing the function that uses it, I have to time the whole execution of the program externally on a single run (meaning no reassignment of N).

I've been using the following in a shell script to help out:

tmp=$(sed "11s/[0-9][0-9]*/$INCREMENTINGVAR/" myprogram.c); printf "%s" "$tmp" > myprogram.c


That replaces a 3-digit number with whatever my INCREMENTINGVAR (the replacement) is. However, this doesn't seem to work properly for me when the replacement is 2 digits long: sed replaces only the first two characters and leaves the 3rd digit from the previous run in place.

TESTS=0
while [ $TESTS -lt 3 ]
do
    echo "This is test: $TESTS"
    INCREMENTINGVAR=10

    while [ "$INCREMENTINGVAR" -lt 10 ]
    do
        tmp=$(sed "11s/[0-9][0-9]*/$INCREMENTINGVAR/" myprogram.c); printf "%s" "$tmp" > myprogram.c
        rm -f myprog.c.bak
        echo "$INCREMENTINGVAR"
        gcc myprogram.c -o myprogram.out; ./myprogram.out
        INCREMENTINGVAR=$((INCREMENTINGVAR+5))
    done
    TESTS=$((TESTS+1))
done


Is there something I should do instead?

edit: Added whole shell script; Changed pattern for sed.

Answer

Do you simply want to replace whatever digit string is on line 11 with the new value? If so, you'd write:

sed -e "11s/[0-9][0-9]*/$INCREMENTINGVAR/"

That looks for a sequence of one or more digits and replaces it with the current value of $INCREMENTINGVAR. This will roll over from 9 to 10, from 99 to 100, from 999 to 1000, and so on. Indeed, there's nothing to stop you jumping from 1 to 987,654 if that's what you want to do.
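
If you want to convince yourself of that, here is a small sketch that runs the same substitution on throwaway lines fed through a pipe instead of the real file. The sample lines and the helper name replace_first_number are made up for illustration.

```shell
# Hypothetical helper: replace the first run of digits on each stdin line with $1.
replace_first_number() {
    sed -e "s/[0-9][0-9]*/$1/"
}

# A 3-digit value shrinks to 2 digits with nothing left behind.
shorter=$(printf '%s\n' '#define N 100 // constant' | replace_first_number 15)

# A 1-digit value grows to 6 digits.
longer=$(printf '%s\n' '#define N 9 // constant' | replace_first_number 987654)

printf '%s\n%s\n' "$shorter" "$longer"
```

Because the pattern matches the whole digit run, the length of the old number and the length of the new one are independent.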

With the GNU and BSD (Mac OS X) versions of sed, you can overwrite the file in place. The portable way (meaning it works the same with both the GNU and BSD variants of sed) is:

sed -i.bak -e "11s/[0-9][0-9]*/$INCREMENTINGVAR/" myprog.c
rm -f myprog.c.bak

This creates a backup file (and removes it). The problem is that GNU sed requires just -i and BSD sed requires -i '' (two arguments) to do an in situ change without a backup. You can decide that portability is not relevant.
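
If you would rather sidestep -i entirely, a different technique (not what the commands above do) is to write the edited text to a temporary file and move it over the original. The sketch below assumes mktemp is available, which is true on typical GNU and BSD systems but is not guaranteed by POSIX, and it works on a scratch file with made-up contents rather than the real myprog.c.

```shell
# Scratch file standing in for myprog.c (contents are made up).
src=$(mktemp)
printf '%s\n' '#include <stdio.h>' '#define N 100 // constant' > "$src"

INCREMENTINGVAR=15
tmp=$(mktemp)

# Edit into the temporary file; replace the original only if sed succeeded.
sed -e "/^#define N [0-9]/ s/[0-9][0-9]*/$INCREMENTINGVAR/" "$src" > "$tmp" &&
    mv "$tmp" "$src"

result=$(cat "$src")
printf '%s\n' "$result"
rm -f "$src"
```

One thing to keep in mind: mv replaces the file, so the result carries the permissions of the temporary file rather than those of the original.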


Note that using line number to identify what must be changed is delicate; trivial changes (a new header, more commentary) could change the line number. It would probably be better to use a context search:

sed -i.bak -e "/^#define N [0-9]/ s/[0-9][0-9]*/$INCREMENTINGVAR/" myprog.c
rm -f myprog.c.bak

This assumes a single space between define and N and between N and the number. If you might have extra blanks or tabs in the line, then you might write:

sed -i.bak -e "/^[[:space:]]*#[[:space:]]*define[[:space:]]\{1,\}N[[:space:]]\{1,\}[0-9]/ s/[0-9][0-9]*/$INCREMENTINGVAR/" myprog.c
rm -f myprog.c.bak

That looks for optional leading white space before the #, optional white space between the # and the define, mandatory white space (at least one, possibly many) between define and N, and mandatory white space again between N and the first digit of the number. But probably your input isn't that sloppy and a simpler search pattern (like the first option) is sufficient to meet your needs. You could also write code to normalize eccentrically formatted #define lines into a canonical representation — but again, you most probably don't need to.
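
As a rough check of that description, the sketch below runs the white-space-tolerant pattern (written with a single \{1,\} after N) over three made-up lines: a tidy define, a sloppy one with tabs, and a near miss that defines NN instead of N. The variable names pattern and out are just for this example.

```shell
# optional space, '#', optional space, 'define', 1+ space, 'N', 1+ space, a digit
pattern='^[[:space:]]*#[[:space:]]*define[[:space:]]\{1,\}N[[:space:]]\{1,\}[0-9]'
INCREMENTINGVAR=15

# \t in the printf format produces real tab characters in the sloppy line.
out=$(printf '#define N 100\n  #\tdefine\tN   100 // sloppy\n#define NN 100\n' |
    sed -e "/$pattern/ s/[0-9][0-9]*/$INCREMENTINGVAR/")

printf '%s\n' "$out"
```

The first two lines end up with 15 in place of 100, while the NN line is left alone because the pattern requires white space immediately after N.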

If somewhere else in the same file there is something like this:

#undef N
#define N 100000

you would have to worry about the pattern matching this line too. However, few files do that; it isn't likely to be a problem in practice (and if it is, the code in general probably has more problems than can be dealt with here). One possibility would be to limit the range to the first 30 lines, assuming the first #define N 123 is somewhere in that range and the second is not.

sed -i.bak -e "1,30 { /^[[:space:]]*#[[:space:]]*define[[:space:]]\{1,\}N[[:space:]]\{1,\}[0-9]/ s/[0-9][0-9]*/$INCREMENTINGVAR/; }" myprog.c
rm -f myprog.c.bak
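
To see the range limit at work, here is a scaled-down sketch: the input is a made-up five-line file, so the range is 1,3 instead of 1,30, and the simpler single-space pattern is used to keep it readable.

```shell
INCREMENTINGVAR=15

# Made-up input: the define we want on line 2, a later redefinition on line 5.
out=$(printf '%s\n' \
    '#include <stdio.h>' \
    '#define N 10' \
    'int main(void) { return 0; }' \
    '#undef N' \
    '#define N 100000' |
    sed -e "1,3 { /^#define N [0-9]/ s/[0-9][0-9]*/$INCREMENTINGVAR/; }")

printf '%s\n' "$out"
```

Only line 2 changes; the #define N 100000 on line 5 falls outside the range and is left untouched.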

There are multiple other tricks that could be pulled to limit the damage, with varying degrees of verbosity. For example:

sed -i.bak -e "1,/^[[:space:]]*#[[:space:]]*define[[:space:]]\{1,\}N[[:space:]]\{1,\}[0-9]\{1,\}/ \
                s/^[[:space:]]*#[[:space:]]*define[[:space:]]\{1,\}N[[:space:]]\{1,\}[0-9]\{1,\}/#define N $INCREMENTINGVAR/" myprog.c
rm -f myprog.c.bak
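
The idea behind that last command is to stop at the first matching #define, so later ones are never touched. A pared-down sketch on made-up input, using the simple single-space pattern and a hypothetical variable define_re to avoid repeating the regex:

```shell
INCREMENTINGVAR=15
define_re='^#define N [0-9][0-9]*'

# The range runs from line 1 to the first line matching define_re,
# so the substitution cannot reach the second define.
out=$(printf '%s\n' \
    '#include <stdio.h>' \
    '#define N 10' \
    '#undef N' \
    '#define N 100000' |
    sed -e "1,/$define_re/ s/$define_re/#define N $INCREMENTINGVAR/")

printf '%s\n' "$out"
```

One caveat: when a range starts at line 1, sed begins looking for the end pattern on line 2, so if the first #define N were on line 1 itself the range would run on to the next match. GNU sed accepts 0,/regex/ to handle that case, but that form is not portable.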

Working with regexes is generally a judgement call between specificity and verbosity — you can make things incredibly safe but incredibly difficult to read, or you can run a small risk that your more readable code will match something unintended.