Dirty Sock Sniffer - 3 months ago
R Question

Why do powers of 10 print in scientific notation at the 5th power?

I would like to know if and how the powers of 10 are related to the printing of scientific notation in the console. I've searched R docs and haven't found anything relevant, or that I really understand.

First off, my scipen and digits settings are:

unlist(options("scipen", "digits"))
# scipen digits
# 0 7


Now, powers of 10 are printed normally up to the 4th power, and then printing switches to scientific notation at the 5th power.

10^(1:4)
# [1] 10 100 1000 10000
10^(1:5)
# [1] 1e+01 1e+02 1e+03 1e+04 1e+05


Interestingly, this does not happen for some other numbers larger than 10.

11^(1:5)
# [1] 11 121 1331 14641 161051


Judging from the following, 5 digits seem significant.

100^(1:2)
# [1] 100 10000
100^(1:3)
# [1] 1e+02 1e+04 1e+06


So my questions then are:

Why is scientific notation activated between the 4th and 5th power for 10 and not for other numbers? Is the number 5 significant? Furthermore, why 5 and not a number closer to the maximum digits option of 22?

Answer

Well, the answer is actually there in the definition of scipen in ?options, although it's pretty hard to understand what it means without playing around with some examples:

‘scipen’: integer. A penalty to be applied when deciding to print numeric values in fixed or exponential notation. Positive values bias towards fixed and negative towards scientific notation: fixed notation will be preferred unless it is more than ‘scipen’ digits wider.

To see what that means, examine the following three pairs of identical numbers, each written once in scientific and once in fixed notation. In the first two cases, the width in characters of the fixed notation is less than or equal to the width of the scientific notation, so fixed notation is preferred.

In the third case, though, the fixed notation is wider (i.e. "more than 0 digits wider"), because the 5 zeros amount to more characters than the 4 characters used to represent the same value using e+nn. As a result, in that case scientific notation is preferred.

1e+03
1000
# [1] 1000

1e+04
10000
# [1] 10000

1e+05
100000      ## <- wider
# [1] 1e+05
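
If you'd rather count the characters than eyeball them, here is a minimal sketch that measures both widths directly. It assumes the default settings from the question (scipen = 0, digits = 7); the variable names are just for illustration.

```r
# Force each notation with format()'s `scientific` argument and count characters.
fixed_1e4 <- nchar(format(1e4, scientific = FALSE))  # "10000"  -> 5
sci_1e4   <- nchar(format(1e4, scientific = TRUE))   # "1e+04"  -> 5
fixed_1e5 <- nchar(format(1e5, scientific = FALSE))  # "100000" -> 6
sci_1e5   <- nchar(format(1e5, scientific = TRUE))   # "1e+05"  -> 5

# Fixed is kept on a tie (1e4) and dropped once it is wider (1e5).
printed_1e4 <- format(1e4)  # "10000"
printed_1e5 <- format(1e5)  # "1e+05"
print(c(printed_1e4, printed_1e5))
```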

Next, examine some numbers that also end with lots of zeros, but whose representation in scientific notation requires a decimal point (.). For these numbers, scientific notation will be used once you have 6 or more zeros (i.e. more than the 5 characters taken up by one . and the characters e+nn).

1.1e+06
1100000
# [1] 1100000


1.1e+07
11000000     ##  <- wider
# [1] 1.1e+07
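
The same comparison seems to account for the vectors in the question. This sketch assumes the default scipen = 0 and digits = 7, and that a vector is printed in a single notation chosen from its widest elements, which is consistent with the outputs shown in the question. The helper `widest` is a made-up name, not a base R function.

```r
# Widest string a vector needs when forced into one notation.
widest <- function(x, scientific) max(nchar(format(x, scientific = scientific)))

tens    <- 10^(1:5)
elevens <- 11^(1:5)

tens_fixed    <- widest(tens, FALSE)     # 6,  from "100000"
tens_sci      <- widest(tens, TRUE)      # 5,  from "1e+05"
elevens_fixed <- widest(elevens, FALSE)  # 6,  from "161051"
elevens_sci   <- widest(elevens, TRUE)   # 11, from "1.61051e+05"

print(c(tens_fixed, tens_sci, elevens_fixed, elevens_sci))
```

For 10^(1:5) fixed notation is one character wider, so scientific wins. For 11^(1:5) the mantissa digits make scientific notation much wider, so fixed notation stays.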

Reasoning about the tradeoff gets a bit trickier for most other numbers, for which the values of both options("scipen") and options("digits") come into play, but the general idea is exactly the same.
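
To see the penalty itself at work, here is a small sketch that changes scipen temporarily and formats each value on its own. `show_with_scipen` is a hypothetical helper written for this example, and digits is assumed to be at its default of 7.

```r
# Format each value separately under a temporary scipen, then restore the old setting.
show_with_scipen <- function(x, scipen) {
  old <- options(scipen = scipen)
  on.exit(options(old))
  vapply(x, format, character(1))
}

x <- c(1e5, 1e6)
out0 <- show_with_scipen(x, 0)     # "1e+05"  "1e+06"
out1 <- show_with_scipen(x, 1)     # "100000" "1e+06"
out2 <- show_with_scipen(x, 2)     # "100000" "1000000"
neg  <- show_with_scipen(1e4, -1)  # "1e+04"

print(list(out0, out1, out2, neg))
```

Each extra unit of scipen lets fixed notation be one character wider before R gives up on it, and a negative value tips even the tie at 1e4 towards scientific notation.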

To see some of the slightly surprising complications that come into play, you might want to paste the following into your console (perhaps after first trying to predict where within each series the switch to scientific notation will occur).

100001
1000001
10000001
100000001
1000000001
10000000001
100000000001
1000000000001

111111
1111111
11111111
111111111
1111111111
11111111111
111111111111
1111111111111
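
If you want to check your predictions without pasting the numbers one at a time, something along these lines should work. It is only a sketch: `notation_table` is a made-up helper, not part of base R, and it assumes the default scipen = 0 and digits = 7. The output is deliberately not shown, so you can still make your guesses first.

```r
# For each value on its own: width in fixed notation, width in scientific
# notation, and the string format() actually produces under current options.
notation_table <- function(x) {
  data.frame(
    fixed_width = vapply(x, function(v) nchar(format(v, scientific = FALSE)), integer(1)),
    sci_width   = vapply(x, function(v) nchar(format(v, scientific = TRUE)), integer(1)),
    printed     = vapply(x, format, character(1)),
    stringsAsFactors = FALSE
  )
}

ones_and_zeros <- c(100001, 1000001, 10000001, 100000001,
                    1000000001, 10000000001, 100000000001, 1000000000001)
all_ones <- c(111111, 1111111, 11111111, 111111111,
              1111111111, 11111111111, 111111111111, 1111111111111)

print(notation_table(ones_and_zeros))
print(notation_table(all_ones))
```

Note that the scientific width here depends on digits, because digits caps how many significant digits the mantissa can show.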