Shaxi Liver - 2 months ago
R Question

Find the deviation from the norm and make a graph

I expected this to be easier, but I am a little stuck; maybe I am just too tired today. Let's start with the data:

This is the data I will call the reference:

> dput(data_db)
structure(list(`Name` = c("Mark", "Taylor", "Greg",
"Matt", "Jose", "Tito"), `App` = c(13.8,
5.8, 5.7, 7, 2.2, 0.8)), .Names = c("Name", "App"
), row.names = c(1L, 2L, 3L, 4L, 5L, 7L), class = "data.frame")

It is a data frame with only two columns, and I would like to use the values stored in it as the reference.

This is the "experimental" data:

> dput(vec_app)
structure(c(11.2486020246044, 27.9095887912373, 2.66645609602021,
2.98274862650751, 4.59749360062788, 2.55364011307289, 11.7322396774642,
19.7441226589095, 28.5664707877918, 3.57742181540809, 2.49765817934088,
22.7248069645865, 2.19587564508074, 5.84484370131893, 16.5705533218457
), .Names = c("Mark_1", "Mark_2", "Taylor_1", "Taylor_2",
"Greg_1", "Greg_2", "Greg_3", "Matt_1", "Matt_2",
"Jose_1", "Jose_2", "Jose_3", "Jose_4", "Jose_5",

The data is stored as a named numeric vector. As we can see, the names in this vector are similar to those in the reference data. Each name is followed by an underscore (_) and the experiment number. As you can see, the number of experiments differs for each name.

Across all the experiments, I would like to find the value closest to the reference value for each name and plot it as a kind of "regression". See the attached example, drawn in Paint.

[Image: My drawing skills]

The red line shows the reference data. The blue dots represent the closest value for each name found in one of the experiments. Of course, there are more dots than in the data provided; it's just an example.

Hopefully you understand what I would like to show here, and maybe you can suggest another way to visualize it.


First, you need to extract the base name from each experiment label:

names_vec_app <- sub("([^_])_\\d+", "\\1", names(vec_app))
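To make the name extraction concrete, here is a minimal sketch on a few labels in the same "name_number" pattern as the question. The vector `exp_names` is made up for illustration, and the simpler pattern `"_\\d+$"` is an alternative to the regex above, not the one used in the answer; on labels of this shape both should give the same result.

```r
# Hypothetical labels following the "<name>_<experiment number>" pattern
exp_names <- c("Mark_1", "Mark_2", "Greg_3", "Jose_5")

# Alternative pattern: drop the trailing underscore and digits
base_names <- sub("_\\d+$", "", exp_names)

# The regex from the answer, applied to the same labels
base_names_orig <- sub("([^_])_\\d+", "\\1", exp_names)

print(base_names)
print(identical(base_names, base_names_orig))
```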

Then compute the difference between each experimental value and the reference value with the matching name in the first data.frame:

diff_app_ref <- vec_app-data_db$App[match(names_vec_app, data_db$Name)]
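The lookup relies on match(), which returns, for each base name, the row position of that name in the reference. A small sketch of how this lines the two objects up, using a two-name subset of the reference and rounded experimental values (the objects `ref`, `vals`, `base`, and `idx` are illustrative names, not from the original code):

```r
# Two-name subset of the reference and rounded experimental values
ref  <- data.frame(Name = c("Mark", "Taylor"), App = c(13.8, 5.8),
                   stringsAsFactors = FALSE)
vals <- c(Mark_1 = 11.25, Mark_2 = 27.91, Taylor_1 = 2.67, Taylor_2 = 2.98)

base <- sub("_\\d+$", "", names(vals))  # base name of each experiment
idx  <- match(base, ref$Name)           # reference row for each experiment

# Experimental value minus the reference value of the matching name
diffs <- vals - ref$App[idx]
print(idx)
print(diffs)
```

A name that does not occur in the reference would give NA in `idx`, and therefore NA in the difference.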

Finally, for each name, keep the difference with the smallest absolute value:

absminbyname <- aggregate(diff_app_ref ~ names_vec_app, FUN=function(x) x[which.min(abs(x))])
#  names_vec_app diff_app_ref
#1          Greg -1.102506399
#2          Jose -0.004124355
#3          Mark -2.551397975
#4          Matt 12.744122659
#5        Taylor -2.817251373
#6          Tito 15.770553322
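The aggregate() result keeps the closest difference but not which experiment it came from. If that label matters, one possible variant, sketched here on made-up rounded differences, is to split the named vector by base name and return the name of the closest element. The objects `diffs`, `base`, `closest`, and `closest_diff` are illustrative names.

```r
# Illustrative differences (experiment minus reference), named by experiment
diffs <- c(Mark_1 = -2.55, Mark_2 = 14.11, Taylor_1 = -3.13, Taylor_2 = -2.82)
base  <- sub("_\\d+$", "", names(diffs))

# For each base name, the label of the experiment with the smallest |difference|
closest <- sapply(split(diffs, base), function(x) names(x)[which.min(abs(x))])

# The corresponding differences, still named by experiment
closest_diff <- diffs[closest]
print(closest)
print(closest_diff)
```

Note that which.min() returns the first minimum, so ties are resolved in favor of the earlier experiment.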

Then you can plot your values in a way you find the most suitable.
For example:

# y-axis limits, rounded outward to whole numbers
ylims <- c(floor(min(absminbyname$diff_app_ref)), ceiling(max(absminbyname$diff_app_ref)))
plot(1:nrow(absminbyname), absminbyname$diff_app_ref, axes=FALSE, xlab="names",
     ylab="min difference", pch=19, col="blue", ylim=ylims)
abline(h=0, col="red")  # reference line: zero deviation
axis(2, at=ylims[1]:ylims[2])
axis(1, at=1:nrow(absminbyname), labels=absminbyname$names_vec_app)
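Since the question describes a "regression"-style picture with a red reference line, another option might be to plot the closest experimental value against the reference value and draw the identity line y = x, so that the vertical distance of each dot from the line is its deviation. This is only a sketch under assumptions: it uses a three-name subset of the reference, rounded closest values, and the illustrative names `ref`, `closest`, and `lims`.

```r
# Three-name subset of the reference and rounded closest experimental values
ref <- data.frame(Name = c("Mark", "Taylor", "Greg"), App = c(13.8, 5.8, 5.7),
                  stringsAsFactors = FALSE)
closest <- c(Mark = 11.25, Taylor = 2.98, Greg = 4.60)

# Align the closest values with the reference rows by name
ref$Closest <- unname(closest[ref$Name])

# Same limits on both axes so the identity line runs corner to corner
lims <- range(c(ref$App, ref$Closest))
plot(ref$App, ref$Closest, pch = 19, col = "blue", xlim = lims, ylim = lims,
     xlab = "reference value", ylab = "closest experimental value")
abline(0, 1, col = "red")                               # y = x: perfect agreement
text(ref$App, ref$Closest, labels = ref$Name, pos = 3)  # label each dot
```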
