agenis - 6 months ago 51

R Question

I want to represent the structure of a data.frame (or matrix, data.table, or whatever) in a single plot with color coding. I think this could be very useful for many people who handle various types of data and want to take in a table at a single glance.

Perhaps someone has already developed a package to do this, but I couldn't find one (just this). So here is a rough mockup of my "vision", a kind of heatmap that shows, in color codes:

- the NA locations,
- the class of variables (factors (how many levels?), numeric (with color gradient, zeros, outliers...), strings)
- dimensions
- etc.

So far I have just written a function to plot the NA locations. It goes like this:

    ggSTR <- function(data, alpha = 0.5) {
      require(ggplot2)
      DF <- data
      if (!is.matrix(data)) DF <- as.matrix(DF)
      # one row per cell: x is the column index for NA cells and 0 otherwise
      to.plot <- cbind.data.frame(
        'y' = rep(1:nrow(DF), each = ncol(DF)),
        'x' = as.logical(t(is.na(DF))) * rep(1:ncol(DF), nrow(DF)))
      size <- 20 / log(prod(dim(DF)))  # size of points depends on size of table
      g <- ggplot(data = to.plot) + aes(x, y) +
        geom_point(size = size, color = "red", alpha = alpha) +
        scale_y_reverse() + xlim(1, ncol(DF)) +  # xlim drops the x = 0 (non-NA) points
        ggtitle("location of NAs in the data frame")
      pc <- round(sum(is.na(DF)) / prod(dim(DF)) * 100, 2)  # % NA
      print(paste("percentage of NA data: ", pc))
      return(g)
    }

It takes any data.frame as input and returns this image:

It's too big a challenge for me to achieve the first image.

Answer

Have you encountered the CSV fingerprint service? It creates a similar image, although not with all the details you have outlined above, and it's not based on R. There is an R version of a similar idea at R-ohjelmointi.org, but the text is in Finnish. The main function is `csvSormenjalki()`. Maybe that could be adapted further to fulfill your whole vision?
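As a starting point for such an adaptation, here is a minimal sketch in R of the fingerprint idea. It is not the code behind either of the tools mentioned above: `cell_types()` and `plot_fingerprint()` are hypothetical helper names, and the categories and colors are arbitrary choices. Each cell is labelled with the class of its column, NA cells override that label, and the result is drawn as one tile per cell with `geom_tile()`.

```r
# Hypothetical helper: label every cell of a data frame with a coarse type.
# Returns a long data frame with one row per cell (x = column, y = row).
cell_types <- function(df) {
  df <- as.data.frame(df)
  kinds <- lapply(df, function(col) {
    kind <- if (is.numeric(col)) {
      "numeric"
    } else if (is.factor(col)) {
      "factor"
    } else if (is.character(col)) {
      "string"
    } else {
      "other"
    }
    # NA cells override the class of their column
    ifelse(is.na(col), "NA", kind)
  })
  data.frame(
    x = rep(seq_len(ncol(df)), each = nrow(df)),
    y = rep(seq_len(nrow(df)), times = ncol(df)),
    type = unlist(kinds, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: draw the cell types as a colored grid (needs ggplot2).
plot_fingerprint <- function(df) {
  cells <- cell_types(df)
  ggplot2::ggplot(cells, ggplot2::aes(x = x, y = y, fill = type)) +
    ggplot2::geom_tile() +
    ggplot2::scale_y_reverse() +  # first row at the top, as in the table
    ggplot2::scale_fill_manual(values = c(
      "numeric" = "steelblue", "factor" = "darkorange",
      "string" = "forestgreen", "other" = "grey60", "NA" = "red"
    )) +
    ggplot2::labs(
      title = sprintf("%d rows x %d columns", nrow(df), ncol(df)),
      x = "column", y = "row"
    )
}

# Small made-up data frame to try it on
demo <- data.frame(
  id    = 1:4,
  score = c(1.5, NA, 3.2, 0),
  group = factor(c("a", "b", NA, "a")),
  label = c("x", NA, "z", "w"),
  stringsAsFactors = FALSE
)
cells <- cell_types(demo)
print(table(cells$type))

# Only draw the plot if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  print(plot_fingerprint(demo))
}
```

Keeping the classification separate from the drawing means more categories (zeros, outliers, number of factor levels) could be added in `cell_types()` without touching the plotting code.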