lawyeR - 2 months ago

R Question

What I would like to do is:

**a)** have the plot produced by the `ggplot`

Here is some data:

```
dput(df)
structure(list(Firm = c("a verylongname", "b verylongname", "c verylongname",
"d verylongname", "e verylongname", "f verylongname", "g verylongname",
"h verylongname", "i verylongname", "j verylongname"), Sum = c(74,
77, 79, 82, 85, 85, 88, 90, 90, 92)), .Names = c("Firm", "Sum"
), row.names = c(NA, 10L), class = "data.frame")
```

Here is the `ggplot` code:

```
ggplot(df, aes(x = reorder(Firm, Sum, mean), y = Sum)) +
  geom_text(aes(label = Firm), size = 3, show.legend = FALSE,
            position = position_jitter(height = .9)) +
  theme(axis.text.x = element_blank()) +
  scale_x_discrete(expand = c(-1.1, 0)) + # to show the lower left name fully
  labs(x = "", y = "", title = "")
```

Notice one version of the plot still overlaps h and i -- each time I run the above code the locations of the text labels change.

BTW, this question ("conditional jitter") shifts the discrete values on the x-axis a bit, but I would like to shift the overlapping points (only) on the y-axis.

Answer

One option is to add a column to mark overlapping points and then plot those separately. A better option might be to directly shift the y-values of the overlapping points, so that we get direct control over their placement. I show both options below.

**Option 1 (jitter):** First, add a column to mark overlaps. In this case, because the points pretty much fall on a line, we can mark any points as overlapping if their y-values are too close. You can include more complex conditions if it's important to check whether the x-values are close as well.

```
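# Mark a row as "Overlap" if its Sum is within 1 of any other row's Sum.
# vapply() returns a character vector; lapply() would create a list column.
df$overlap = vapply(seq_len(nrow(df)), function(i) {
  if (min(abs(df[i, "Sum"] - df$Sum[-i])) <= 1) "Overlap" else "Ignore"
}, character(1))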
```
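If the x-values matter as well, the same check can take a second condition. Here is a minimal sketch under some assumptions of mine: `xpos`, `x_tol` and `y_tol` are made-up names and thresholds, and the rank of `Sum` stands in for the axis position that `reorder(Firm, Sum, mean)` gives each label.

```r
# Same data as in the question, rebuilt so the sketch runs on its own
df <- data.frame(
  Firm = paste(letters[1:10], "verylongname"),
  Sum  = c(74, 77, 79, 82, 85, 85, 88, 90, 90, 92),
  stringsAsFactors = FALSE
)

# Assumed stand-in for each label's position along the x-axis
df$xpos <- rank(df$Sum, ties.method = "first")

# Hypothetical thresholds: how close two labels may be before they collide
x_tol <- 2  # in axis positions
y_tol <- 1  # in units of Sum

# A row overlaps if some other row is close in both x and y
df$overlap <- vapply(seq_len(nrow(df)), function(i) {
  near_x <- abs(df$xpos[i] - df$xpos[-i]) <= x_tol
  near_y <- abs(df$Sum[i] - df$Sum[-i]) <= y_tol
  if (any(near_x & near_y)) "Overlap" else "Ignore"
}, character(1))
```

With this data the result is the same as the y-only check (the two 85s and the two 90s are marked), because the points fall roughly on a line. The extra condition only starts to matter when labels with similar y-values sit far apart on the x-axis.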

In the plot, I've colored the jittered points red so it's easy to tell which ones were affected.

```
library(ggplot2)

# Add set.seed() here to make jitter reproducible
ggplot(df, aes(x = reorder(Firm, Sum, mean))) +
  # Overlapping labels: jittered vertically only, drawn in red
  geom_text(data = df[df$overlap == "Overlap", ],
            aes(label = Firm, y = Sum), size = 3,
            position = position_jitter(width = 0, height = 1), colour = "red") +
  # Remaining labels: drawn at their true positions
  geom_text(data = df[df$overlap == "Ignore", ],
            aes(label = Firm, y = Sum), size = 3) +
  theme(axis.text.x = element_blank()) +
  scale_x_discrete(expand = c(-1.1, 0)) + # to show the lower left name fully
  labs(x = "", y = "", title = "")
```

**Option 2 (direct placement):** Another option is to directly control how much the labels are shifted, rather than taking whatever `jitter` happens to give us. In this case, we know that we want to shift each pair of points with the same y-value. More complex logic would be necessary in cases where we need to worry about both x and y values, more than two points in the same overlap, and/or where we need to shift values that are close, but not exactly the same.

```
library(dplyr)
library(ggplot2)

# Create a new column that shifts pairs of points with the same y-value by +/- 0.25
# (the two-element offset vector assumes ties come in pairs)
df = df %>%
  group_by(Sum) %>%
  mutate(SumNoOverlap = if (n() > 1) Sum + c(-0.25, 0.25) else Sum) %>%
  ungroup()

ggplot(df, aes(x = reorder(Firm, Sum, mean), y = SumNoOverlap)) +
  geom_text(aes(label = Firm), size = 3) +
  theme(axis.text.x = element_blank()) +
  scale_x_discrete(expand = c(-1.1, 0)) + # to show the lower left name fully
  labs(x = "", y = "", title = "")
```
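For the more complex cases mentioned above (more than two points in an overlap, or values that are close but not identical), here is one possible sketch in base R. The helper names `spread_offsets` and `shift_overlaps`, the `tol` and `step` arguments, and the example values in `sums` are all my own inventions for illustration. It groups sorted values into clusters wherever the gap to the previous value is at most `tol`, then spreads each cluster evenly around its original values.

```r
# Hypothetical example values: one triple tie and one near-tie
sums <- c(74, 77, 85, 85, 85, 90, 90.4, 92)

# Hypothetical helper: n evenly spaced offsets centred on zero,
# e.g. n = 2 gives -0.25, 0.25 and n = 3 gives -0.5, 0, 0.5 (for step = 0.5)
spread_offsets <- function(n, step = 0.5) {
  (seq_len(n) - (n + 1) / 2) * step
}

# Hypothetical helper: shift values that lie within `tol` of a neighbour
shift_overlaps <- function(y, tol = 0.5, step = 0.5) {
  ord <- order(y)
  sorted <- y[ord]
  # Start a new cluster whenever the gap to the previous value exceeds tol
  cluster <- cumsum(c(TRUE, diff(sorted) > tol))
  # Per-cluster offsets; clusters of size one get an offset of zero
  offsets <- ave(sorted, cluster,
                 FUN = function(v) spread_offsets(length(v), step))
  out <- numeric(length(y))
  out[ord] <- sorted + offsets  # put results back in the input order
  out
}

shifted <- shift_overlaps(sums)
```

With the question's data, `shift_overlaps(df$Sum, tol = 0)` should give the same shifts of -0.25 and +0.25 as the `dplyr` version above. One caveat: clusters are chained, so a long run of values each within `tol` of the next ends up in a single cluster even if its two ends are far apart.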

**Note:** To make jitter reproducible, add `set.seed(153)` (or whatever seed value you want) before the jittered plot code.
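As a small illustration of the note, here are two ways to pin the jitter down. The data frame `d` is a made-up stand-in rather than the question's `df`, and the second variant assumes a ggplot2 version in which `position_jitter()` accepts a `seed` argument (3.0.0 or later, as far as I know).

```r
library(ggplot2)

# Hypothetical stand-in data with one tied pair of y-values
d <- data.frame(Firm = c("a", "b", "c"), Sum = c(85, 85, 90))

# Variant A: seed the global RNG immediately before the plot code
set.seed(153)
p1 <- ggplot(d, aes(x = Firm, y = Sum, label = Firm)) +
  geom_text(position = position_jitter(width = 0, height = 1))

# Variant B: hand the seed to position_jitter() itself
# (assumes the installed ggplot2 supports the `seed` argument)
p2 <- ggplot(d, aes(x = Firm, y = Sum, label = Firm)) +
  geom_text(position = position_jitter(width = 0, height = 1, seed = 153))

# Building the layer twice should give identical jittered y-values
y_first  <- layer_data(p2, 1)$y
y_second <- layer_data(p2, 1)$y
```

Variant B keeps the seed next to the layer it affects, so the label positions do not depend on whatever random numbers were drawn earlier in the session.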

Source (Stackoverflow)
