I am trying to produce a geom_violin() plot overlaid with a geom_point() plot, in which the points are colored according to the subset of the data they belong to.
I get the error "Error in eval(expr, envir, enclos) : object 'ind' not found" when I pass the subset data frame to the second geom_point() call, and I have not been able to work out what I am doing wrong by experimenting or by searching for the error.
(Without that line, the code runs and generates the output I want, apart from the color coding of the points: PDF output when the second geom_point is commented out)
Here is the nonsense dataset I used to try and make this work (gene1,2,3 are rownames):
#Make gene names into rownames
rownames(df_small_raw) <- df_small_raw$Name
#Remove "Name" column
df_small_raw$Name <- NULL
matrix_trim_transp <- t(df_small_raw)
#Make matrix_trim_transp matrix into dataframe
df_trim_transp <- as.data.frame(as.matrix(matrix_trim_transp))
#Change name of dataframe after transposition is complete
df <- df_trim_transp
#Subset gene1 positive and negative cells
df.positive <- subset(df, gene1 > 0)
#Convert data in data frames to log scale
df.log <- log(df+1)
df.positive.log <- log(df.positive+1)
#Violin plot for each gene with all cells (positive and negative with color coded scatter)
plot <- ggplot(stack(df.log), aes(x = ind, y = values, fill=ind)) +
geom_point(position = position_jitterdodge(jitter.width=4)) +
geom_point(data=df.positive.log, aes(x = ind, y = values, fill=ind), position = position_jitterdodge(jitter.width=4), color="red") +
xlab("Gene") + ylab("Expression level (TPM log)") +
theme_classic(base_size = 14, base_family = "Helvetica") +
theme(axis.title.y=element_text(size=14, face="bold")) +
theme(axis.title.x=element_text(size=14, face="bold"))
plot + coord_cartesian(ylim = c(0, 8))
The code below overlays a coloured violin plot with a jittered set of points that are coloured according to whether the value is positive or not.
library(dplyr); library(ggplot2); library(tidyr)
#read in data.
df2 <- read.csv(textConnection(df), header=TRUE, row.names = 1)
# Add in the rownames and gather the dataset
df3 <- df2 %>%
  mutate(Gene = rownames(.)) %>%
  gather(., key = "cell", value = "value", -Gene) %>%
  mutate(positive = value > 0, absolute = abs(value), logabs = log(absolute + 1))
df3 %>%
  ggplot(., aes(x = Gene, y = logabs, fill = Gene)) +
  geom_violin() +
  geom_jitter(aes(colour = positive))
Is this what you were looking for?
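As for the original error, it seems to come from the second geom_point() layer: stack() is what creates the `values` and `ind` columns, and df.positive.log was never stacked, so ggplot cannot find `ind` in it. Here is a minimal sketch in base R; the `toy` data frame and its numbers are made up for illustration and are not your data.

```r
# Hypothetical toy data: rows are cells, columns are genes (values are made up)
toy <- data.frame(gene1 = c(0, 5, 12), gene2 = c(3, 0, 7), gene3 = c(1, 1, 0))

# Same steps as in the question: log-transform everything and the gene1-positive subset
toy.log <- log(toy + 1)
toy.positive.log <- log(subset(toy, gene1 > 0) + 1)

# stack() reshapes to long format and creates the columns 'values' and 'ind'
names(stack(toy.log))

# The unstacked subset still has only the gene columns, so aes(x = ind) cannot be evaluated
names(toy.positive.log)

# Stacking the subset as well gives the layer the columns it needs
toy.positive.long <- stack(toy.positive.log)
names(toy.positive.long)
```

So passing data = stack(df.positive.log) to the second geom_point() should get rid of the "object 'ind' not found" error, although I have not run this against your real data.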
EDIT: The read-in data line takes the data you presented above, pasted into a text string, and converts that string to a data frame. If you already have the data frame, it isn't necessary. It is only used because there was no dput() output available to use.
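As a rough sketch of that read-in step, assuming the data are pasted in as comma-separated text with the gene names in the first column (the `txt` string, the cell names and the numbers below are invented for illustration):

```r
# Hypothetical pasted data: gene names in the first column, one column per cell
txt <- "Name,cell1,cell2,cell3
gene1,0,5,12
gene2,3,0,7
gene3,1,1,0"

# textConnection() lets read.csv() treat the string as if it were a file;
# row.names = 1 turns the first column (gene names) into row names
toy2 <- read.csv(textConnection(txt), header = TRUE, row.names = 1)

# dput() prints R code that recreates the object, which can be pasted into a question
dput(toy2)
```

If you post the dput() output of your own data frame, others can recreate it directly and skip the text-parsing step.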