I have data like this:
x <- c("France:4|Morroco:8|Italy:2", "Scotland:6|Mexico:2", "Scotland:2")
> player_country_info <- data.frame(x)
> setnames(player_country_info, "player_country_data")
> names(player_country_info)
[1] "player_country_data"
> is.data.frame(player_country_info)
[1] TRUE
> head(player_country_info)
country_data
1 France:4|Morocco:8|Italy:2
2 Scotland:6|Mexico:2
3 Scotland:2
player_country_data.1 player_country_data.2 player_country_data.3
France:4 Morocco:8 Italy:2
Scotland:6 Mexico:2 NA
Scotland:2 NA NA
player_country_info %>% separate( col=player_country_data.1, into=c("country_name.1","player_count.1), sep=":")
country_name.1 player_count.1 country_name.2 player.2 country_name.3 player.3
France 4 Morocco 8 Italy 2
Scotland 6 Mexico 2
Scotland 2
With separate
from the tidyr
package:
library(tidyr)
country_info %>%
separate(country_data,
into = sprintf('%s.%s', rep(c('country','player.count'),3), rep(1:3, each=2)))
the result:
country.1 player.count.1 country.2 player.count.2 country.3 player.count.3
1 France 4 Morroco 8 Italy 2
2 Scotland 6 Mexico 2 <NA> <NA>
3 Scotland 2 <NA> <NA> <NA> <NA>
Separate automatically recognizes :
and |
as characters on which it has to separate. If you want to separate on a specific character, you need to specify that with the sep
argument. In this case you could use sep = '[:|]'
. This also prevents misbehavior of the automatic detection when there are missing values (see discussion in the comments).
With sprintf
you paste together the two vectors rep(c('country','player.count'),3)
and rep(1:3, each=2)
into a vector of column names where %s.%s
tells sprintf
to treat the two vectors are string-vectors and paste them together with a dot as separator. See ?sprintf
for more info. The each
argument tells rep
not to repete the whole vector a number of times, but to repete each element of the vector a number of times.