RAS - 1 year ago 137

R Question

I have a data frame in this format:

```
row.names            100           50           25            0
metabolite1  113417.2998   62594.7067   39460.7705 1.223243e+02
metabolite2 3494058.7972 2046871.7446 1261278.2476 6.422864e+03
```

The columns refer to the concentrations of quality controls (%): 100, 50, 25, 0.

Currently to plot a single graph I am extracting the data into a new data frame and plotting it like this:

```
library(ggplot2)

metabolite1 <- data.frame(Numbers = c(100, 50, 25, 0),
                          Signal = c(113417.2998, 62594.7067, 39460.7705, 122.3243))

# Extract the intercept and slope of the line of best fit
Coef <- coef(lm(Signal ~ Numbers, data = metabolite1))

# Plot the data with the fitted line
ggplot(metabolite1, aes(x = Numbers, y = Signal)) +
  geom_point() +
  xlim(0, 100) +
  geom_abline(intercept = Coef[1], slope = Coef[2])
```

This is extremely inefficient, and I am trying to find a way to plot a separate scatter plot for each row without creating a separate data frame for each one. What would be a better way to do this? I have 160 metabolites I need to produce graphs for. I have attempted to melt the data frame into the format:

```
Name        variable value
metabolite1 100      113417.2998
metabolite2 100      3494058.7972
metabolite1 50       62594.7067
metabolite2 50       2046871.7446
metabolite1 25       39460.7705
metabolite2 25       1261278.2476
metabolite1 0        1.223243e+02
metabolite2 0        6.422864e+03
```

and then use ggplot and faceting to plot the data:

```
ggplot(data = df, aes(x = variable, y = value)) +
  geom_point() +
  facet_grid(~ Name)
```

but the plots produced all have the same y axis scale which is not appropriate for the data I am working with. I'm assuming because of this I cannot use faceting to produce the plots.

EDIT: I do not know how to add separate lines of best fit to each plot without using geom_smooth, which I do not wish to do.

Answer

You're on the right track with your method of melting and faceting:

```
ggplot(data = df, aes(x = variable, y = value)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, lwd = .5, col = "black") +
  facet_wrap(~ Name, scales = "free_y")
```

This yields plots similar to those you get from running `ggplot` on subsets:

```
out <- lapply(list(metabolite1, metabolite2), function(d) {
  Coef <- coef(lm(Signal ~ Numbers, data = d))
  # plot data
  p <- ggplot(d, aes(x = Numbers, y = Signal)) +
    geom_point() +
    xlim(0, 100) +
    geom_abline(intercept = Coef[1], slope = Coef[2])
})
gridExtra::grid.arrange(out[[1]], out[[2]], nrow = 1)
```
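
Since the edit to the question asks for fit lines without `geom_smooth`, here is a minimal sketch of one way that could work with faceting: fit `lm` per metabolite, collect the intercepts and slopes in a small data frame, and hand that to `geom_abline`, which then draws the matching line in each facet. The names `wide` and `coefs` are my own, the data are just the two rows from the question, and I'm assuming `variable` should be numeric (if it came out of `melt` as a factor, it would need converting first).

```r
library(ggplot2)

# Wide data as in the question: one row per metabolite,
# one column per QC concentration (%). `wide` is a hypothetical name.
wide <- data.frame(
  `100` = c(113417.2998, 3494058.7972),
  `50`  = c(62594.7067, 2046871.7446),
  `25`  = c(39460.7705, 1261278.2476),
  `0`   = c(122.3243, 6422.864),
  row.names = c("metabolite1", "metabolite2"),
  check.names = FALSE
)

# Long format with a numeric concentration column.
# unlist() walks the data frame column by column, so Name cycles
# through the rows while variable repeats once per row.
df <- data.frame(
  Name     = rep(rownames(wide), times = ncol(wide)),
  variable = rep(as.numeric(colnames(wide)), each = nrow(wide)),
  value    = unlist(wide, use.names = FALSE)
)

# One linear fit per metabolite; keep intercept and slope alongside Name
# so the lines can be matched to facets.
coefs <- do.call(rbind, lapply(split(df, df$Name), function(d) {
  fit <- coef(lm(value ~ variable, data = d))
  data.frame(Name = d$Name[1], intercept = fit[[1]], slope = fit[[2]])
}))

# Faceted scatter plots with free y scales and one fitted line per panel
p <- ggplot(df, aes(x = variable, y = value)) +
  geom_point() +
  geom_abline(data = coefs, aes(intercept = intercept, slope = slope)) +
  xlim(0, 100) +
  facet_wrap(~ Name, scales = "free_y")
print(p)
```

With 160 metabolites a single faceted figure will probably get crowded, so it may be more practical to plot a subset of `Name` values at a time, or to keep the `lapply` approach above and save each plot with `ggsave()`.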