Puneet Jain - 3 months ago
Bash Question

String manipulation via script

I am trying to get the substring between &DEST= and the next & or a line break.
For example:


  1. MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546


    In this I need to extract "SFO"

  2. MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546


    In this I need to extract "SANFRANSISCO"

  3. MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE


    In this I need to extract "SANJOSE"



I am reading a file line by line, and I need to update the text after &DEST= and put it back in the file. The modification is to mask the DEST value with X characters.

So, SFO should be replaced with XXX.
SANJOSE should be replaced with XXXXXXX.

Output:
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546

MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546

MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX


Please let me know how to achieve this in a script (preferably shell or bash).

Thanks.

Answer

Replacing airports with an equal number of Xs

Let's consider this test file:

$ cat file
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE

To replace each string after &DEST= with an equal number of X characters, using GNU sed:

$ sed -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX

To replace the file in-place:

sed -i -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file

The above was tested with GNU sed. For BSD (OSX) sed, try:

sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file

Or, to change in-place with BSD(OSX) sed, try:

sed -i '' -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file
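
If keeping a copy of the original file is preferable, one option is to attach a backup suffix directly to -i, a form that both GNU and BSD sed accept. The sketch below is only an illustration: the helper name mask_file_in_place and the .bak suffix are assumptions, not part of the original answer.

```shell
# mask_file_in_place is a hypothetical helper name chosen for this sketch.
# It masks every &DEST= value in the file given as $1 and leaves the
# untouched original next to it as "$1.bak".
mask_file_in_place() {
    sed -i.bak -E -e :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta "$1"
}
```

After running mask_file_in_place file, the masked text is in file and the original is in file.bak.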

If there is some reason why it is important to use the shell to read the file line-by-line:

while IFS= read -r line
do
   printf '%s\n' "$line" | sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta
done <file
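
The loop above starts one sed process per line. If that matters, the masking can probably be done with shell parameter expansion alone. The following is a minimal sketch under some assumptions: mask_dest and mask_file are hypothetical helper names, and unlike the sed loop this version masks only the first &DEST= value on each line.

```shell
# mask_dest and mask_file are hypothetical helper names for this sketch.
# mask_dest masks only the FIRST &DEST= value on a line.
mask_dest() {
    local line tag head rest value after masked
    line=$1
    tag='&DEST='
    case $line in
        *"$tag"*) ;;                         # tag present: continue below
        *) printf '%s\n' "$line"; return ;;  # no tag: print the line unchanged
    esac
    head=${line%%"$tag"*}      # text before the first &DEST=
    rest=${line#*"$tag"}       # text after the first &DEST=
    value=${rest%%&*}          # the value, up to the next & or end of line
    after=${rest#"$value"}     # whatever follows the value
    masked=
    while [ -n "$value" ]; do  # build one X per character of the value
        masked=${masked}X
        value=${value#?}
    done
    printf '%s%s%s%s\n' "$head" "$tag" "$masked" "$after"
}

# Read the file named in $1 line by line and print each masked line.
mask_file() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        mask_dest "$line"
    done <"$1"
}
```

Calling mask_file file prints the masked lines to standard output; redirecting that output to a temporary file and moving it over the original would be one way to update the file.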

How it works

Let's consider this code:

search_str="&DEST="
newfile=chart.txt
sed -E ':a; s/('"$search_str"'X*)[^X&]/\1X/; ta' "$newfile"

  • -E

    This tells sed to use Extended Regular Expressions (ERE). This has the advantage of requiring fewer backslashes to escape things.

  • :a

    This creates a label a.

  • s/('"$search_str"'X*)[^X&]/\1X/

    This looks for $search_str, followed by any number of X characters, followed by one character that is neither X nor &. Because of the parentheses, everything except that last character is saved in group 1. The match is replaced by group 1, written \1, followed by an X, so each pass masks exactly one more character.

  • ta

    In sed, t is a test command. If the substitution was made (meaning that some character needed to be replaced by X), then the test evaluates to true and, in that case, ta tells sed to jump to label a.

    This test-and-jump causes the substitution to be repeated as many times as necessary.
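
To see the loop at work, the single substitution can be applied one pass at a time from the shell, printing the line after each pass. This is only a sketch for illustration: trace_mask is a hypothetical helper name, and the sample line is a shortened version of the test data.

```shell
# trace_mask is a hypothetical helper name for this sketch.
# It applies the substitution WITHOUT the :a/ta loop, once per pass,
# and stops when a pass no longer changes the line (the point at which
# sed's t command would stop jumping back to label a).
trace_mask() {
    local line next
    line=$1
    while :; do
        printf '%s\n' "$line"
        next=$(printf '%s\n' "$line" | sed -E 's/(&DEST=X*)[^X&]/\1X/')
        if [ "$next" = "$line" ]; then
            break
        fi
        line=$next
    done
}

trace_mask 'REQ&DEST=SFO&ORIG=6546'
```

This prints the original line followed by one line per pass:

REQ&DEST=SFO&ORIG=6546
REQ&DEST=XFO&ORIG=6546
REQ&DEST=XXO&ORIG=6546
REQ&DEST=XXX&ORIG=6546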

Replacing multiple tags with one sed command

$ name='DEST|ORIG'; sed -E ':a; s/(&('"$name"')=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=XXXX
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=XXXX
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
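
If the list of tags varies, the alternation can be built from the arguments of a small function. This is a sketch under stated assumptions: mask_tags is a hypothetical helper name, it reads from standard input, and the tag names are assumed to contain no characters that are special in an extended regular expression.

```shell
# mask_tags is a hypothetical helper name for this sketch.
# Usage: mask_tags TAG [TAG...] <input
# Tag names are assumed to be plain words with no regex metacharacters.
mask_tags() {
    local IFS names
    IFS='|'
    names="$*"   # "$*" joins the arguments with the first character of IFS
    sed -E -e :a -e 's/(&('"$names"')=X*)[^X&]/\1X/' -e ta
}
```

For example, mask_tags DEST ORIG <file should produce the same output as the command above.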

Answer for original question

Using shell

$ s='MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546'
$ s=${s#*&DEST=}
$ echo "${s%%&*}"
SFO

How it works:

  • ${s#*&DEST=} is prefix removal. This removes all text up to and including the first occurrence of &DEST=.

  • ${s%%&*} is suffix removal. It removes all text from the first & to the end of the string.
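
The two expansions can be wrapped in a small function so that all three lines from the question are handled the same way, including the one where DEST is the last field. This is a sketch only: get_dest is a hypothetical helper name, and returning a non-zero status when the tag is missing is an assumption about the desired behavior.

```shell
# get_dest is a hypothetical helper name for this sketch.
# Prints the value that follows the first &DEST= in $1.
# Returns 1 (and prints nothing) if the line has no &DEST= at all.
get_dest() {
    local s
    s=$1
    case $s in
        *'&DEST='*) ;;      # tag present: continue below
        *) return 1 ;;      # tag missing: signal failure
    esac
    s=${s#*&DEST=}              # drop everything through the first &DEST=
    printf '%s\n' "${s%%&*}"    # drop everything from the next & onward
}
```

For instance, get_dest 'MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE' prints SANJOSE, because with no & left in the string the suffix removal leaves it unchanged.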

Using awk

$ echo 'MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546' | awk -F'[=\n]' '$1=="DEST"{print $2}' RS='&'
SFO

How it works:

  • -F'[=\n]'

    This tells awk to treat either an equal sign or a newline as the field separator.

  • $1=="DEST"{print $2}

    If the first field is DEST, then print the second field.

  • RS='&'

    This sets the record separator to &.
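
Because the newline is treated as a field separator, the same awk command also seems to work on multi-line input, including a line where DEST is the last field. The sketch below wraps it in a function for convenience; extract_dests is a hypothetical helper name, and the input is the test data from the question.

```shell
# extract_dests is a hypothetical helper name for this sketch.
# Prints every DEST value found in the named files, or in standard
# input when no file is given.
extract_dests() {
    awk -F'[=\n]' '$1=="DEST"{print $2}' RS='&' "$@"
}

printf '%s\n' \
    'MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546' \
    'MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546' \
    'MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE' | extract_dests
```

This prints SFO, SANFRANSISCO and SANJOSE, one per line.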