PesaThe - 3 months ago
Linux Question

Linux sed ^[:blank:] does not match dot

I have an input as follows:


INa.aa................... October 2010 after its previous U.S.-based owners failed to pay debts


My goal is to put brackets around every word starting with the letter i/I. So I issued this command:

sed 's/\<i[^[:blank:]]*\>/(&)/gi' input_data


Which returned this output:


(INa.aa)................... October 2010 after (its) previous U.S.-based owners failed to pay debts


What I don't get is why the [^[:blank:]]* doesn't also include the dots after INa.aa.

Thank you for any suggestions.

Answer

You are using the \> "end of word" escape. The GNU sed manual (referring to \b) defines a word boundary as a position where

the character to the left is a "word" character and the character to the right is a "non-word" character, or vice-versa

In the case of \>, the "vice-versa" does not apply: it matches only where a word character is followed by a non-word character or by the end of the line.
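To see the difference between \b and \>, here is a small sketch that marks every matching position with a | character. It assumes GNU sed (both escapes are GNU extensions), and the sample string 'ab, cd' is made up for illustration:

```shell
# Assumes GNU sed: \b and \> are GNU extensions.
line='ab, cd'   # hypothetical sample input

# \b matches at both the start and the end of each word.
both=$(printf '%s\n' "$line" | sed 's/\b/|/g')

# \> matches only at the end of each word.
ends=$(printf '%s\n' "$line" | sed 's/\>/|/g')

printf '%s\n%s\n' "$both" "$ends"
```

This should print |ab|, |cd| for \b and ab|, cd| for \>.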

What is a "word" character?

A "word" character is any letter or digit or the underscore character.

Everything else is a "non-word" character. You expect the boundary between the last period and the blank to match \>, but it doesn't: both the period and the blank are non-word characters. The end of word that your match stops at lies between the last a and the first period of the run.

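The period between the a's is also surrounded by word boundaries, but a period is not a blank, so the greedy [^[:blank:]]* runs straight through it and then backs up to the last position where \> still matches. That is why it's a part of the match.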
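A quick way to check this is to mark the end-of-word positions inside the first token. This is only a sketch, assuming GNU sed, and the run of dots is shortened to three for readability:

```shell
# Assumes GNU sed. The token is the first word of the question's input,
# with the trailing run of dots shortened.
token='INa.aa...'

# Insert "|" at every position where \> matches.
marked=$(printf '%s\n' "$token" | sed 's/\>/|/g')

printf '%s\n' "$marked"
```

This should print INa|.aa|... so there are two candidate ends of word, and none after the trailing dots. The greedy match settles on the later one, which gives (INa.aa) followed by the untouched dots.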

If you want to match everything up to the next blank, you can just skip the \> in your regex.
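As a sketch of that change, here is the original command with only the \> removed, run on the question's input (again assuming GNU sed, with the run of dots shortened to three):

```shell
# Assumes GNU sed: \< and the case-insensitive i flag are GNU extensions.
line='INa.aa... October 2010 after its previous U.S.-based owners failed to pay debts'

# Same command as in the question, minus the trailing \>.
fixed=$(printf '%s\n' "$line" | sed 's/\<i[^[:blank:]]*/(&)/gi')

printf '%s\n' "$fixed"
```

This should print (INa.aa...) October 2010 after (its) previous U.S.-based owners failed to pay debts, with the dots now inside the brackets.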