PesaThe - 5 months ago

Linux sed ^[:blank:] does not match dot

I have an input as follows:

INa.aa................... October 2010 after its previous U.S.-based owners failed to pay debts

My goal is to put brackets around every word starting with the letter i. So I issued a command:

sed 's/\<i[^[:blank:]]*\>/(&)/gi' input_data

Which returned this output:

(INa.aa)................... October 2010 after (its) previous U.S.-based owners failed to pay debts

What I don't get is: why doesn't the [^[:blank:]]* also include the dots after INa.aa?

Thank you for any suggestions.


You use the \> "end of word" escape. A word boundary is defined as

the character to the left is a "word" character and the character to the right is a "non-word" character, or vice-versa

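in the manual (referring to \b). In the case of \>, the "vice-versa" does not apply: it matches only where a "word" character on the left is followed by a "non-word" character (or the end of the line) on the right.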

What is a "word" character?

A "word" character is any letter or digit or the underscore character.

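And "non-word" characters are all the others. You expect the boundary between your periods and a blank to match \>, but it doesn't: both the period and the blank are non-word characters. [^[:blank:]]* is free to consume the periods, but \> must match immediately after it, so the match can only end where a word character is followed by a non-word character. The last such position is between the last a and the first of the trailing periods.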

The period between the a's is also surrounded by word boundaries, but since it is not a blank, [^[:blank:]]* matches straight through it, so it is part of the match.
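
To see this on a smaller input, here is a minimal sketch. It assumes GNU sed, since \> is a GNU extension, and the sample string and variable names are made up for illustration. The period inside the token is kept, while the trailing periods are left out because no end-of-word position exists after them:

```shell
# Hypothetical sample: an inner period, trailing periods, then a blank.
sample='ab.cd... rest'

# Same shape as the question's regex: non-blanks followed by \> (GNU sed).
with_boundary=$(printf '%s\n' "$sample" | sed 's/[^[:blank:]]*\>/(&)/')

printf '%s\n' "$with_boundary"
# prints: (ab.cd)... rest
```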

If you want to match everything up to the next blank, you can just skip the \> in your regex.
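
As a sketch of that change, again assuming GNU sed (for \< and the i flag), this runs the command from the question with the \> removed. The variable names are only for illustration:

```shell
# Input line from the question.
line='INa.aa................... October 2010 after its previous U.S.-based owners failed to pay debts'

# Same command as in the question, minus the trailing \>.
result=$(printf '%s\n' "$line" | sed 's/\<i[^[:blank:]]*/(&)/gi')

printf '%s\n' "$result"
# prints: (INa.aa...................) October 2010 after (its) previous U.S.-based owners failed to pay debts
```

The \< is kept so that only words starting with i are wrapped; an i in the middle of a word such as "previous" is still ignored.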