mike1781 - 2 months ago
R Question

Adding sample n-values to Likert plot in R

The Bryer Likert package has many useful features for plotting diverging bar charts of Likert-type data. However, one basic feature is missing -- there does not appear to be any way to show the total number of sample points for each question/group when printing out a bar chart. If one wants to include the histogram chart, then these n-values will appear in the histogram. But often I find the histogram makes the entire plot too busy.

For example, using the pisa dataset, I can plot a diverging bar chart for results grouped by country below.

library(likert)
data(pisaitems)

# Extract the ST24Q items.
items28 <- pisaitems[, substr(names(pisaitems), 1, 5) == "ST24Q"]

# Create the likert object using country as a grouping variable.
l28g <- likert(items28, grouping = pisaitems$CNT)

# Optional - print a summary.
summary(l28g)

# Plot the bar chart.
plot(l28g)
The resulting plot should look like this:
[Image: diverging bar chart]

But unless I also include a histogram somehow (which I don't want to do), there is no option to report the number of data points underlying each group/question. Currently I have no way of knowing (just by looking at the bar chart) whether the results are based on 5,000 responses or 10 responses. This information is easily accessed from the underlying data in many ways, for example, the following code yields the number of data points by each country for question ST24Q01:

margin.table(table(pisaitems$CNT, items28$ST24Q01), 1)

Ideally, I could create the plot of the data and somewhere on the graph (perhaps off the right hand side, like the HH package does?) report the n-value for each bar on the chart (i.e., each question/country).

I've fooled around with the
function but have been so far unable to figure out how to include the n-values in the output, and then translate those to the final plot/chart.

Any insights much appreciated!


In this case the counts don't vary by question, so you only need one table for number of responses. Below are ways to put number of responses next to each question, for cases where the number of responses varies, or as a single table.
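If missing answers should not count as responses, a quick check shows whether the counts actually differ between questions. This is only a sketch on toy data, using base R: the data frame `toy` and its columns `Q1` and `Q2` are made up for illustration and stand in for `pisaitems` and the ST24Q items.

```r
# Toy stand-in for pisaitems: a grouping column plus two Likert items.
toy <- data.frame(
  CNT = c("A", "A", "A", "B", "B"),
  Q1  = c(1, 2, NA, 3, 4),
  Q2  = c(2, 2, 1, NA, NA)
)

# Non-missing responses per group (rows) and item (columns).
items <- toy[, c("Q1", "Q2")]
n_by_item <- sapply(items, function(x) tapply(!is.na(x), toy$CNT, sum))
print(n_by_item)

# TRUE when every item has the same count within each group,
# in which case a single table of counts is enough.
same_counts <- all(apply(n_by_item, 1, function(r) length(unique(r)) == 1))
print(same_counts)
```

When `same_counts` comes out `FALSE`, the per-question labels are the better fit; otherwise the single table is probably enough.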

Add Number of Responses by Question

One way to do this would be to modify the underlying code for likert.bar.plot to include the ability to add counts of responses. Here I've just hacked the output of likert.bar.plot to add the response counts after the fact.


First, get response counts by Item for each CNT. The variable=NA at the end is there because the data frame that likert.bar.plot builds when creating the plot contains a column called variable. Even though we don't use that column in our subsequent call to geom_text with the new data frame below, ggplot still expects that column to be present in the new data frame.

library(dplyr)
library(reshape2)

counts = pisaitems %>%
  select(CNT, matches("ST24Q")) %>% 
  melt(id.var="CNT", variable.name="Item") %>%
  count(CNT, Item) %>%
  mutate(variable=NA)

We use geom_text to add response counts by item, but we need to make a few other changes to the output of plot(l28g), as follows:

  1. Expand the y-axis limits using scale_y_continuous out to 150 so that the text values (which I've put at 145) will be visible. This overrides the y-scale in the original plot created by plot(l28g) (which calls likert.bar.plot to actually produce the plot).

  2. Set the visible y-axis range to stop at 110. We do this inside coord_flip(), which overrides the original coord_flip() from likert.bar.plot. We do this so that the text for the number of responses will be just to the right of the plot area, rather than inside it.

  3. Increase the right plot margin, so that there will be some space to the right of the plot.

  4. Turn off clipping, so that text printed outside the plot area will be visible.

Here's the plot code. It might take several seconds to render, so be patient.

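library(ggplot2)
library(grid)

p = plot(l28g) + 
  geom_text(data=counts, 
            aes(label=format(n,big.mark=","), x=CNT, y=145), 
            size=2.5, colour="grey30", hjust=1) +
  scale_y_continuous(limits=c(-100,150)) +
  coord_flip(ylim=c(-110,110)) +
  # Wider right margin for the counts (these margin values are an assumption; adjust as needed)
  theme(plot.margin=unit(c(0.5,3,0.5,0.5),"lines"))

# Turn off clipping
# http://stackoverflow.com/a/9691256/496488
p <- ggplot_gtable(ggplot_build(p))
p$layout$clip <- "off"
grid.draw(p)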
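
The same four steps can be tried on a small example that does not depend on likert. The sketch below uses made-up data (`toy`, `group`, `pct` and `n` are all hypothetical) and assumes a recent ggplot2 in which coord_flip() accepts clip="off", so clipping can be turned off without editing the gtable. The margin size is a guess and will likely need tuning for your device.

```r
library(ggplot2)

# Toy data: one bar per group plus a response count to print beside it.
toy <- data.frame(
  group = c("A", "B", "C"),
  pct   = c(60, 80, 45),
  n     = c(5000, 120, 10)
)

p <- ggplot(toy, aes(x = group, y = pct)) +
  geom_col() +
  # Place the labels at y = 145, beyond the visible range set below.
  geom_text(aes(y = 145, label = format(n, big.mark = ",")),
            hjust = 1, size = 3) +
  # ylim here zooms without dropping data; clip = "off" lets text
  # drawn outside the panel remain visible in the margin.
  coord_flip(ylim = c(0, 110), clip = "off") +
  # Widen the right margin (in points) to make room for the labels.
  theme(plot.margin = margin(5.5, 40, 5.5, 5.5))

print(p)
```

Because the zoom is done in coord_flip() rather than with scale limits, the label row at y = 145 is kept in the data and simply drawn outside the panel.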


Add Number of Responses in A Single Table

One option would be to create a table grob (grob = graphical object) and lay it out alongside or below the main plot. For example:


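library(dplyr)
library(grid)
library(gridExtra)

# Table theme; the smaller base font size is an assumption, adjust as needed
tt <- ttheme_default(base_size=9)

# Plot on the left, title plus table of counts on the right (layout widths are an assumption)
grid.arrange(plot(l28g),
             arrangeGrob(
                         textGrob("Number of Responses", gp=gpar(fontsize=10)),
                         tableGrob(pisaitems %>% 
                                     rename(Country=CNT) %>% 
                                     count(Country) %>%
                                     mutate(n=format(n, big.mark=",")), 
                                   theme=tt, rows=NULL),
                         ncol=1),
             widths=c(5,1))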
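
To see what the data frame handed to tableGrob looks like, here is a rough base R equivalent of the rename/count/mutate pipeline. It is only a sketch: the vector `cnt` is a tiny made-up stand-in for pisaitems$CNT.

```r
# Toy stand-in for pisaitems$CNT.
cnt <- factor(c("Canada", "Canada", "Mexico", "Mexico", "Mexico",
                "United States"))

# One row per country with a formatted count, ready for a table grob.
tab <- as.data.frame(table(Country = cnt), responseName = "n")
tab$n <- format(tab$n, big.mark = ",")
print(tab)
```

The result has one `Country` column and one character column `n`, which is the shape the table layout above expects.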
