Zlo Zlo - 17 days ago 6
R Question

dplyr summarize by string

I have a dataframe that has numeric and string values, for example:

mydf <- data.frame(id = c(1, 2, 1, 2, 3, 4),
                   value = c(32, 12, 43, 6, 50, 20),
                   text = c('A', 'B', 'A', 'B', 'C', 'D'))


The value of the id variable always corresponds to the same value of the text variable, e.g., id == 1 will always be text == 'A'.

Now, I want to summarize this dataframe by id (or by text, since it's the same thing):

library(dplyr)

mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value))


This works nicely, but I also need the text variable, since I want to do text analysis.

However, when I add text to the dplyr pipe:

mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value),
            text = text)


I get the following error:


Error: expecting a single value


Since text is always the same for a given id, is it possible to append it to the summarized dataframe?

Answer

The summarize function has to reduce each group to a single value per output column, and text = text returns every value in the group, hence the error. We can either leave text out of summarize and put it alongside id in group_by, or use the first function within summarize:

# text should be in group_by to show up in result
mydf %>%
  group_by(id, text) %>%
  summarize(mean_value = mean(value))

# or within summarise use first function, to take the first value when grouped
mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value),
            text = first(text))
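
To compare the two approaches on the question's data, here is a minimal self-contained sketch. The names check, by_both and by_first are made up for illustration, and the .groups = "drop" argument assumes dplyr 1.0.0 or later (on older versions, leave it out and call ungroup() afterwards). The first step tests the assumption that each id maps to exactly one text value, since first(text) would otherwise silently discard the other values.

```r
library(dplyr)

mydf <- data.frame(id = c(1, 2, 1, 2, 3, 4),
                   value = c(32, 12, 43, 6, 50, 20),
                   text = c('A', 'B', 'A', 'B', 'C', 'D'))

# Verify the assumption: every id has exactly one distinct text value.
check <- mydf %>%
  group_by(id) %>%
  summarize(n_text = n_distinct(text))
stopifnot(all(check$n_text == 1))

# Approach 1: group by both columns, so text is carried along as a key.
# .groups = "drop" (assumes dplyr >= 1.0.0) returns an ungrouped result.
by_both <- mydf %>%
  group_by(id, text) %>%
  summarize(mean_value = mean(value), .groups = "drop")

# Approach 2: group by id only and keep the first text value per group.
by_first <- mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value),
            text = first(text))

print(by_both)
print(by_first)
```

Both results have one row per id with the same mean_value; only the column order differs (id, text, mean_value versus id, mean_value, text). Grouping by both columns is arguably the safer choice when the one-to-one mapping has not been checked, because an id with two different text values then shows up as two rows instead of being hidden.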