I am trying to find a regex for a bash shell script on Mac OS X that replaces dots (.) with linebreaks (\n) in a big text file.
However, dots that belong to common abbreviations such as tel., etc., Mr., Ms., U.S. and some others should not be replaced.
So far I am already using sed for simple replacements (but of course the part that skips the abbreviations is missing):
LC_ALL=C sed -i "" -e "s/.*SEARCH.*/REPLACEMENT/" ascii.txt
Example input:
Mr. Brown searches his fox. My tel. nr. can be found online. U.S. is a typical abbreviation for the United States.
Desired output:
Mr. Brown searches his fox.\n
My tel. nr. can be found online.\n
U.S. is a typical abbreviation for the United States.\n
You could use GNU sed like this:
sed -r 's/\./\n/g; s/(Mr|tel|nr|U|S)\n/\1./g; s/\n */\n/g'
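A minimal sketch that runs the command above on the sample sentence from the question, so the three steps can be seen separately. It assumes GNU sed; on macOS that usually means installing it separately (Homebrew's gnu-sed package, for example, installs it as `gsed`), so the script takes the sed binary from a `SED` variable, which is only a convenience added here, not part of the answer.

```shell
#!/usr/bin/env bash
# Assumption: $SED points at GNU sed (e.g. run with SED=gsed on macOS).
SED=${SED:-sed}

# Sample sentence taken from the question.
input='Mr. Brown searches his fox. My tel. nr. can be found online. U.S. is a typical abbreviation for the United States.'

# 1. turn every dot into a newline
# 2. put the dot back after the listed abbreviations
# 3. strip the spaces that now start each new line
out=$(printf '%s\n' "$input" |
  "$SED" -r 's/\./\n/g; s/(Mr|tel|nr|U|S)\n/\1./g; s/\n */\n/g')

printf '%s\n' "$out"
```

Note that this prints the sentences without their final dots (e.g. `Mr. Brown searches his fox`), because step 1 replaces the dot instead of adding a newline after it. If the dots should stay, as in the desired output, `s/\./.\n/g; s/(Mr|tel|nr|U|S)\.\n/\1./g; s/\n */\n/g` should produce that instead.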
If your sed implementation does not support extended regular expressions, you need to say something like
sed 's/\./\n/g; s/\(Mr\|tel\|nr\|U\|S\)\n/\1./g; s/\n */\n/g'
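The basic-regex spelling only escapes the grouping and alternation characters, so it should behave exactly like the `-r` version. The sketch below runs both on the question's sample and compares them. Both still rely on GNU extensions (`\|` alternation inside a basic regex and `\n` in the replacement), so `SED` is again assumed to point at GNU sed.

```shell
#!/usr/bin/env bash
# Assumption: $SED points at GNU sed (e.g. SED=gsed on macOS).
SED=${SED:-sed}

input='Mr. Brown searches his fox. My tel. nr. can be found online. U.S. is a typical abbreviation for the United States.'

# Extended regex (-r): ( ) and | need no backslashes.
ere=$(printf '%s\n' "$input" |
  "$SED" -r 's/\./\n/g; s/(Mr|tel|nr|U|S)\n/\1./g; s/\n */\n/g')

# Basic regex: \( \) and \| are escaped instead.
bre=$(printf '%s\n' "$input" |
  "$SED" 's/\./\n/g; s/\(Mr\|tel\|nr\|U\|S\)\n/\1./g; s/\n */\n/g')

if [ "$ere" = "$bre" ]; then
  echo "same output"
else
  echo "outputs differ" >&2
fi
```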
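If your sed implementation does not support that either, then you need to handle all abbreviations separately, e.g.
sed 's/\./\n/g; s/Mr\n/Mr./g; s/tel\n/tel./g; s/nr\n/nr./g; s/U\n/U./g; s/S\n/S./g; s/\n */\n/g'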
and so on. If your sed implementation can't handle that either, then it's time to look for another operating system.
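For what it's worth, POSIX sed can express the per-abbreviation variant: `\n` in a regex matches a newline embedded in the pattern space, and a newline is put into a replacement by writing a backslash followed by an actual newline. The sketch below builds such a script from a bash array. The function name `split_sentences` and the `abbrevs` list are illustrative choices, and the claim that the stock BSD sed on macOS runs it unchanged is an assumption based on it using only POSIX features.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split text on stdin into one sentence per line,
# keeping the dots of the abbreviations listed below.
# Only POSIX sed features are used (assumption: BSD sed accepts it as is).
abbrevs=(Mr Ms tel nr U S)

split_sentences() {
  local nl=$'\n' script a
  # Every dot becomes a newline (backslash + real newline in the replacement).
  script="s/\\./\\${nl}/g"
  # One command per abbreviation puts its dot back, e.g. s/Mr\n/Mr./g
  for a in "${abbrevs[@]}"; do
    script+="${nl}s/${a}\\n/${a}./g"
  done
  # Remove the spaces left at the start of each new line.
  script+="${nl}s/\\n */\\${nl}/g"
  sed "$script"
}

input='Mr. Brown searches his fox. My tel. nr. can be found online. U.S. is a typical abbreviation for the United States.'
out=$(printf '%s\n' "$input" | split_sentences)
printf '%s\n' "$out"
```

Adding an abbreviation is then just a matter of extending `abbrevs`, and a file can be processed with something like `split_sentences < ascii.txt`. Like the commands above, the patterns are not anchored to the start of a word, so a sentence ending in a word such as `GPU.` would also keep its dot and not be split.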