DomainsFeatured - 11 months ago
Linux Question

Fix Mismatch Between Data And Locale In Awk Command

I am receiving the following error:

awk: cmd. line:1: (FILENAME=- FNR=798) warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale.

The command I'm running is the following:

cat file.txt | awk 'length($0)<10000' > output-file.txt

The weird part is that if I pipe to other commands instead, such as the following, they run just fine without an error:

awk '{ sub("\r$", ""); print }'

Anyone see why I would get this error? Or, should I just ignore it?

Answer

Set the locale to C so that awk treats the input as single-byte characters and no longer tries to decode multibyte sequences. To do that, pass LC_ALL=C in awk's environment:

LC_ALL=C awk 'length($0)<10000' file.txt >output-file.txt

Also, you don't need cat here, since awk takes filename(s) as argument(s). Keep in mind that under the C locale, length($0) counts bytes rather than characters.
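
To see the effect on a small case, here is a minimal sketch, assuming a POSIX shell and any awk. The file names sample.txt and output-sample.txt, the sample lines, and the 10-character threshold are made up for the demo and are not from the question. The second line of the sample contains a lone 0xE9 byte (a Latin-1 "é"), which is not valid UTF-8 and is the kind of data that can trigger the warning under a UTF-8 locale.

```shell
# Create a throwaway input file (hypothetical name). Line 2 ends with the
# single byte 0xE9 (octal 351), which is invalid as UTF-8.
printf 'plain ascii\ncaf\351\nshort\n' > sample.txt

# Under LC_ALL=C each byte is one character, so awk has no multibyte
# sequences to validate. Here length($0) is the byte length of the line.
LC_ALL=C awk '{ print FNR ": " length($0) " bytes" }' sample.txt

# Same filter as in the answer, with a smaller threshold for the demo:
# keep only lines shorter than 10 bytes.
LC_ALL=C awk 'length($0) < 10' sample.txt > output-sample.txt
```

The first awk call reports 11, 4, and 5 bytes for the three lines, so output-sample.txt ends up with the second and third lines only. If the file is actually valid UTF-8 and you care about character counts rather than byte counts, running awk under a matching UTF-8 locale may be the better choice than forcing C.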