iqbal_cs - 1 month ago
Linux Question

Why does the 'tail' command return the whole file content when using -c+1 or -c-1?

As the title says, I cannot figure out why tail behaves this way. I have a file named myfile.txt, and its content is:

firstline
secondline
thirdline


So when I use:

tail -c-1 myfile.txt


or

tail -c+1 myfile.txt


it outputs:

firstline
secondline
thirdline


From man tail:


-c, --bytes=[+]NUM

output the last NUM bytes; or use -c +NUM to output starting
with byte NUM of each file

Answer
  • tail -c+1 myfile.txt is the same as cat myfile.txt: you're telling tail to start output with the first (+1) byte (-c), in other words: the whole file.

  • tail -c-1 myfile.txt (more typically: tail -c1 myfile.txt) outputs only the last byte in myfile.txt.
    Assuming that myfile.txt is a properly formatted text file that ends with a trailing \n, and uses either a single-byte encoding such as ASCII or one that has single-byte ASCII encoding as a subset, such as UTF-8, this will output just that \n, i.e., a blank line.
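
A quick way to check both claims is to compare byte counts and make the last byte visible. This is only a sketch: it recreates the question's file in a temporary location (the variable names are made up for illustration) and pipes the last byte through od -c so the newline shows up as \n instead of an apparently blank line.

```shell
# Recreate the three-line file from the question in a temporary file.
tmpfile=$(mktemp)
printf 'firstline\nsecondline\nthirdline\n' > "$tmpfile"

# -c+1 starts at byte 1, so its output is as large as the file itself.
full_size=$(wc -c < "$tmpfile" | tr -d '[:space:]')
plus_one_size=$(tail -c+1 "$tmpfile" | wc -c | tr -d '[:space:]')

# -c-1 returns only the last byte; od -c renders the newline as \n.
last_byte=$(tail -c-1 "$tmpfile" | od -An -c | tr -d '[:space:]')

echo "file size:      $full_size"      # 31
echo "tail -c+1 size: $plus_one_size"  # 31
echo "last byte:      $last_byte"      # \n

rm -f "$tmpfile"
```

The lone newline printed by tail -c-1 is easy to mistake for "no output" or, next to other terminal output, for something else entirely, which is why inspecting it with od or wc -c helps.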

To put tail's basic logic in general terms (covers both the GNU and the BSD/macOS implementation):

tail [-<unit-type>] [+-]<unit-count>

  • <unit-type>

    • defaults to -n, meaning lines; since -<unit-type> is optional, you often see only <unit-count> specified, which then invariably refers to lines (e.g., tail -3, tail +2).
    • -c refers to bytes(!) and is not UTF-8-aware in either implementation, so a count can land in the middle of a multi-byte character.
    • BSD/macOS tail additionally supports -b for 512-byte blocks.
  • If <unit-count> has no sign (e.g., 1), or an explicit minus (e.g., -1; never necessary), <unit-count> units are returned from the end of the input.

  • If <unit-count> is +-prefixed, the portion of the input that starts at position <unit-count> - taken as a 1-based(!) index - is returned, notably including that position; e.g., tail -n +2 requests everything starting from (and including) the 2nd line.

  • <unit-count> can be omitted, in which case it defaults to 10, but only if -<unit-type> is omitted as well; plain tail is therefore equivalent to tail -n 10 and outputs the input's last 10 lines.
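
The rules above can be tried out on a small input. The following is a minimal sketch using a hypothetical five-line input generated by a helper function (input is just an illustrative name); it sticks to the -n and -c forms that both implementations accept.

```shell
# Hypothetical input: five one-character lines, 10 bytes in total.
input() { printf 'a\nb\nc\nd\ne\n'; }

# Unsigned count: the last 2 lines.
last_two=$(input | tail -n 2)

# +-prefixed count: everything from line 2 (1-based, inclusive) onward.
from_second=$(input | tail -n +2)

# The same rule with bytes: everything from byte 9 onward ("e" plus newline).
from_byte_nine=$(input | tail -c +9)

# No options at all: the last 10 lines, which here is the whole input.
default_out=$(input | tail)

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' \
  "$last_two" "$from_second" "$from_byte_nine" "$default_out"
```

Note that command substitution strips trailing newlines, so the captured values end without one even though tail emitted it.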


If we apply this logic to the OP's follow-up question regarding the behavior of -c+0 and -c-0:

  • -c+0 is treated the same as -c+1 and therefore outputs the entire input (same as cat): you're asking for everything starting at the "zeroth" byte, which doesn't exist, but since 0 < 1, with 1 being the first actual byte position, you still get the entire input as output.

  • -c-0 outputs nothing at all, because you're asking to return zero bytes (in other words: nothing) from the end of the input.
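
Both edge cases can be confirmed by counting output bytes. This sketch assumes the same three-line content as in the question, fed through a pipe by an illustrative helper function, and relies on the behavior described above for GNU and BSD/macOS tail.

```shell
# Same content as myfile.txt in the question (31 bytes), supplied via a pipe.
input() { printf 'firstline\nsecondline\nthirdline\n'; }

# +0 is treated like +1: output starts at the first byte.
plus_zero=$(input | tail -c+0 | wc -c | tr -d '[:space:]')

# -0 requests zero bytes from the end: no output at all.
minus_zero=$(input | tail -c-0 | wc -c | tr -d '[:space:]')

echo "bytes from tail -c+0: $plus_zero"   # 31
echo "bytes from tail -c-0: $minus_zero"  # 0
```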
