baz - 1 year ago 164
R Question

I have a set of data which looks something like this:

``````anim <- c(25499,25500,25501,25502,25503,25504)
sex  <- c(1,2,2,1,2,1)
wt   <- c(0.8,1.2,1.0,2.0,1.8,1.4)
data <- data.frame(anim,sex,wt)

data
anim sex  wt anim2
1 25499   1 0.8     2
2 25500   2 1.2     2
3 25501   2 1.0     2
4 25502   1 2.0     2
5 25503   2 1.8     2
6 25504   1 1.4     2
``````

I would like a zero to be added before each animal id:

``````data
anim sex  wt anim2
1 025499   1 0.8     2
2 025500   2 1.2     2
3 025501   2 1.0     2
4 025502   1 2.0     2
5 025503   2 1.8     2
6 025504   1 1.4     2
``````

And for interest's sake, what if I need to add two or three zeros before the animal IDs?

The short version: use `formatC` or `sprintf`.

The longer version:

There are several functions available for formatting numbers, including adding leading zeroes. Which one is best depends upon what other formatting you want to do.

The example from the question is quite easy, since all the values have the same number of digits to begin with, so let's also try a harder example: padding powers of 10 to width 8.

``````anim <- 25499:25504
x <- 10 ^ (0:5)
``````

`paste` (and its variant `paste0`) are often the first string manipulation functions that you come across. They aren't really designed for manipulating numbers, but they can be used for that. In the simple case where we always have to prepend a single zero, `paste0` is the best solution.

``````paste0("0", anim)
## [1] "025499" "025500" "025501" "025502" "025503" "025504"
``````

For the case where there are a variable number of digits in the numbers, you have to manually calculate how many zeroes to prepend, which is horrible enough that you should only do it out of morbid curiosity.
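
For the morbidly curious, a minimal sketch of the manual approach might look like the following. The helper name `pad_manually` is hypothetical (it is not part of base R or of any package mentioned here), and it assumes non-negative whole numbers.

```r
# Hypothetical helper: left-pad whole numbers with zeroes by hand.
pad_manually <- function(x, width) {
  # Convert to text in fixed notation so that 1e+05 becomes "100000".
  s <- format(x, scientific = FALSE, trim = TRUE)
  # Work out how many zeroes each element still needs (never negative).
  n_zeros <- pmax(width - nchar(s), 0)
  paste0(strrep("0", n_zeros), s)
}

x <- 10 ^ (0:5)
pad_manually(x, 8)
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "00100000"
```

It works, but the bookkeeping is exactly what the dedicated padding and formatting functions do for you.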

`str_pad` from `stringr` works similarly to `paste`, making it more explicit that you want to pad things.

``````library(stringr)
str_pad(anim, 6, pad = "0")
## [1] "025499" "025500" "025501" "025502" "025503" "025504"
``````

Again, it isn't really designed for use with numbers, so the harder case requires a little thinking about. We ought to just be able to say "pad with zeroes to width 8", but look at this output:

``````str_pad(x, 8, pad = "0")
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "0001e+05"
``````

You need to set the scientific penalty option so that numbers are always formatted using fixed notation (rather than scientific notation).

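``````library(withr)  # with_options() now lives in withr (formerly exported by devtools)
with_options(c(scipen = 999), str_pad(x, 8, pad = "0"))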
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "00100000"
``````

`stri_pad` in `stringi` works exactly like `str_pad` from `stringr`.

`formatC` is an interface to the C function `printf`. Using it requires some knowledge of the arcana of that underlying function (see link). In this case, the important points are the `width` argument, `format` being `"d"` for "integer", and a `"0"` `flag` for prepending zeroes.

``````formatC(anim, width = 6, format = "d", flag = "0")
## [1] "025499" "025500" "025501" "025502" "025503" "025504"
formatC(x, width = 8, format = "d", flag = "0")
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "00100000"
``````

This is my favourite solution, since it is easy to tinker with changing the width, and the function is powerful enough to make other formatting changes.
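
To illustrate that tinkering, and the "two or three zeros" part of the question, here is a short sketch. It assumes, as in the question, that every ID has five digits, so adding two zeroes means a width of 7. The `big.mark` call is only there to show one of the other formatting options.

```r
anim <- 25499:25504
x <- 10 ^ (0:5)

# Two leading zeroes: five digits plus two gives a width of 7.
formatC(anim, width = 7, format = "d", flag = "0")
## [1] "0025499" "0025500" "0025501" "0025502" "0025503" "0025504"

# Three leading zeroes: a width of 8.
formatC(anim, width = 8, format = "d", flag = "0")
## [1] "00025499" "00025500" "00025501" "00025502" "00025503" "00025504"

# A different formatting change: thousands separators instead of padding.
formatC(x, format = "d", big.mark = ",")
## [1] "1"       "10"      "100"     "1,000"   "10,000"  "100,000"
```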

`sprintf` is an interface to the C function of the same name; like `formatC` but with a different syntax.

``````sprintf("%06d", anim)
## [1] "025499" "025500" "025501" "025502" "025503" "025504"
sprintf("%08d", x)
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "00100000"
``````
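
Since the width is part of the format string, one way to make it adjustable is to build that string with `paste0`. This is only a sketch, and the helper name `zero_pad` is made up for illustration.

```r
# Hypothetical helper: build a format string such as "%08d" from a width.
zero_pad <- function(x, width) {
  fmt <- paste0("%0", width, "d")
  sprintf(fmt, x)
}

anim <- 25499:25504
zero_pad(anim, 7)
## [1] "0025499" "0025500" "0025501" "0025502" "0025503" "0025504"
zero_pad(anim, 8)
## [1] "00025499" "00025500" "00025501" "00025502" "00025503" "00025504"
```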

The main advantage of `sprintf` is that you can embed formatted numbers inside longer bits of text.

``````sprintf(
  "Animal ID %06d was a %s.",
  anim,
  sample(c("lion", "tiger"), length(anim), replace = TRUE)
)
## [1] "Animal ID 025499 was a tiger." "Animal ID 025500 was a tiger."
## [3] "Animal ID 025501 was a lion."  "Animal ID 025502 was a tiger."
## [5] "Animal ID 025503 was a tiger." "Animal ID 025504 was a lion."
``````

`format` is a generic function for formatting any kind of object, with a method for numbers. It works a little bit like `formatC`, but with yet another interface.

`prettyNum` is yet another formatting function, mostly for creating manual axis tick labels. It works particularly well for wide ranges of numbers.

The `scales` package has several functions such as `percent`, `date_format` and `dollar` for specialist format types.
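
As a brief illustration of the two base R functions mentioned above, `format` and `prettyNum`, here is a small sketch. The input values are arbitrary examples chosen for this illustration.

```r
anim <- 25499:25504

# format() pads to a minimum width with spaces rather than zeroes.
format(anim, width = 8)
## [1] "   25499" "   25500" "   25501" "   25502" "   25503" "   25504"

# prettyNum() with a thousands separator.
prettyNum(123456, big.mark = ",")
## [1] "123,456"
```

Neither adds leading zeroes on its own, which is why `formatC` and `sprintf` remain the more direct choices for that job.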