wishihadabettername - 1 year ago 131

R Question

I'm plotting a categorical variable, and instead of showing the counts for each category value, I'm looking for a way to get ggplot to display the percentage of values in that category. Of course, it is possible to create another variable with the calculated percentage and plot that one, but I have to do this several dozen times and I'm hoping to achieve it in one command.
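For comparison, the "create another variable with the calculated percentage" route can at least be wrapped in a helper so it isn't retyped for every variable. This is a minimal base R sketch, not ggplot2 functionality: the function name `percent_table` and its column names are hypothetical, and `useNA = "ifany"` assumes missing values should count as their own category.

```r
# Hypothetical helper: turn a vector or factor into a data frame of
# per-category counts and percentages (missing values get their own row).
percent_table <- function(x) {
  counts <- table(x, useNA = "ifany")
  data.frame(
    category = names(counts),
    count    = as.vector(counts),
    percent  = 100 * as.vector(counts) / sum(counts),
    stringsAsFactors = FALSE
  )
}

percent_table(factor(c("aa", "bb", "bb", "cc")))
```

This still produces one extra data frame per variable, which is the bookkeeping the question is trying to avoid.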

I was experimenting with something like

```
qplot(mydataf) +
  stat_bin(aes(n = nrow(mydataf), y = ..count../n)) +
  scale_y_continuous(formatter = "percent")
```

but I must be using it incorrectly, as I got errors.

To easily reproduce the setup, here's a simplified example:

```
library(ggplot2)

# NA marks the missing values (R has no `null` object)
mydata <- c("aa", "bb", NA, "bb", "cc", "aa", "aa", "aa", "ee", NA, "cc")
mydataf <- factor(mydata)
qplot(mydataf)  # this shows the count; I'm looking to see % displayed
```

In the real case I'll probably use ggplot instead of qplot, but the right way to use stat_bin still eludes me.

I've also tried these four approaches:

```
ggplot(mydataf, aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(formatter = 'percent') + geom_bar();

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) +
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) +
  scale_y_continuous(formatter = 'percent') + geom_bar();
```

but all 4 give:

`Error: ggplot2 doesn't know how to deal with data of class factor`

The same error appears for the simple case of

```
ggplot(data = mydataf, aes(levels(mydataf))) +
  geom_bar()
```

so it's clearly something about how ggplot interacts with a single vector. I'm scratching my head; googling that error turns up only a single result.
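The error comes from `ggplot()` expecting a data frame for its `data` argument rather than a bare factor. A minimal sketch for the simple case above, assuming a current ggplot2 release: wrapping the factor in a one-column data frame (the column name `category` is hypothetical) and mapping that column lets the plain count plot build.

```r
library(ggplot2)

# Missing values written as NA (R has no `null` object)
mydata  <- c("aa", "bb", NA, "bb", "cc", "aa", "aa", "aa", "ee", NA, "cc")
mydataf <- factor(mydata)

# ggplot() needs a data frame, so store the factor as a column and map it by name
df <- data.frame(category = mydataf)

p <- ggplot(df, aes(x = category)) +
  geom_bar()  # bar heights are still raw counts per category
```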


Answer

Since this question was answered, there have been some meaningful changes to the ggplot2 syntax. Summing up the discussion in the comments on the accepted answer:

```
require(ggplot2)
require(scales)
# mydataf is assumed here to be a data frame with a categorical column foo
p <- ggplot(mydataf, aes(x = foo)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  ## version 3.0.9
  # scale_y_continuous(labels = percent_format())
  ## version 3.1.0
  scale_y_continuous(labels = percent)
```

Here's a reproducible example using `mtcars`:

```
require(ggplot2)
require(scales)  # provides percent and percent_format
ggplot(mtcars, aes(x = factor(hp))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  ## scale_y_continuous(labels = percent_format()) #version 3.0.9
  scale_y_continuous(labels = percent) #version 3.1.0
```
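Since the question needs this for dozens of variables, the recipe above can be wrapped in a function so each variable takes one call. This is a sketch assuming ggplot2 3.3.0 or later, where `after_stat()` replaces the `..count..` notation (deprecated as of ggplot2 3.4.0), and scales 1.1.0 or later for `label_percent()`; the function name `plot_percent` and the column name `value` are hypothetical.

```r
library(ggplot2)
library(scales)

# Hypothetical helper: bar chart of each category's share of all values in x.
plot_percent <- function(x) {
  df <- data.frame(value = factor(x))  # ggplot() needs a data frame; factor() makes x discrete
  ggplot(df, aes(x = value)) +
    # after_stat() is evaluated after stat_count has computed `count`,
    # so each bar's height is its count divided by the total count
    geom_bar(aes(y = after_stat(count / sum(count)))) +
    scale_y_continuous(labels = label_percent()) +
    labs(x = NULL, y = "Percent")
}

p <- plot_percent(mtcars$hp)
```

Because the helper calls `factor()` itself, a numeric column such as `hp` can be passed directly.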

This question is currently the #1 hit on Google for 'ggplot count vs percentage histogram', so hopefully this helps distill all the information currently housed in the comments on the accepted answer.

**Remark:** If `hp` is not set as a factor, ggplot returns:
