ragesz - 2 months ago 10x
R Question

# R convert summary result (statistics with all dataframe columns) into dataframe

[I'm new to R...] I have this dataframe:

``````
df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170)) #create data.frame
names(df1) <- c('gender', 'age', 'height') #column names
``````

I want a summary of `df1` as a dataframe object that looks like this:

``````
         count     mean    std      min      25%      50%      75%      max
age    30.0000   3.5000 1.7370   1.0000   2.0000   3.5000   5.0000   6.0000
gender 30.0000   1.6667 0.4795   1.0000   1.0000   2.0000   2.0000   2.0000
height 30.0000 155.5000 8.8034 141.0000 148.2500 155.5000 162.7500 170.0000
``````

I've generated this in Python with `df1.describe().T`. How can I do this in R?

It would be a bonus if my summary dataframe also contained the "dtype", "null" (number of `NULL` values), "unique" (number of unique values) and "range" columns, to give comprehensive summary statistics:

``````
         count     mean    std      min      25%      50%      75%      max  null  unique  range  dtype
age    30.0000   3.5000 1.7370   1.0000   2.0000   3.5000   5.0000   6.0000     0       6      5  int64
gender 30.0000   1.6667 0.4795   1.0000   1.0000   2.0000   2.0000   2.0000     0       2      1  int64
height 30.0000 155.5000 8.8034 141.0000 148.2500 155.5000 162.7500 170.0000     0      30     29  int64
``````

The Python code that produces the above result is:

``````
df1.describe().T.join(pd.DataFrame(df1.isnull().sum(), columns=['null']))\
.join(pd.DataFrame.from_dict({i:df1[i].nunique() for i in df1.columns}, orient='index')\
.rename(columns={0:'unique'}))\
.join(pd.DataFrame.from_dict({i:(df1[i].max() - df1[i].min()) for i in df1.columns}, orient='index')\
.rename(columns={0:'range'}))\
.join(pd.DataFrame(df1.dtypes, columns=['dtype']))
``````

Thank you!

Answer

I commonly use a little function (adapted from a script found on the net) to do this kind of transformation:

``````
# Summary statistics for every column of a data frame, one row per column.
sumstats <- function(x) {
  # Wrap a statistic so it is applied to numeric columns only;
  # non-numeric columns get the placeholder "N*N".
  stat.k <- function(f) function(col) {
    if (is.numeric(col)) round(f(col), digits = 2) else "N*N"
  }
  sumtable <- cbind(
    as.matrix(colSums(!is.na(x))),  # number of non-missing values
    sapply(x, stat.k(mean)),
    sapply(x, stat.k(median)),
    sapply(x, stat.k(sd)),
    sapply(x, stat.k(min)),
    sapply(x, stat.k(max))
  )
  sumtable <- as.data.frame(sumtable)
  names(sumtable) <- c("Count", "Mean", "Med", "sd", "min", "max")
  return(sumtable)
}
sumstats(df1)
####        Count   Mean   Med   sd min max
#### gender    30   1.67   2.0 0.48   1   2
#### age       30   3.50   3.5 1.74   1   6
#### height    30 155.50 155.5 8.80 141 170
``````

You can easily adapt it to add more descriptive columns, such as quantiles, number of nulls, range, etc. It returns a data.frame. You may also want to decide in advance how NAs are handled, for example by passing `na.rm = TRUE` to `mean`, `median`, `sd`, `min` and `max`.
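As a rough sketch of such an adaptation, the function below builds one row per column with the extra columns asked for in the question. The name `describe_df` and its internals are hypothetical (they are not part of the function above), and it assumes missing values should simply be dropped before the statistics are computed, which is how pandas' `describe()` treats them. Non-numeric columns get `NA` for the numeric statistics.

```r
# Hypothetical helper (an assumption, not the answer's original function):
# one summary row per column, laid out like pandas' df.describe().T
# plus the "null", "unique", "range" and "dtype" columns.
describe_df <- function(df) {
  rows <- lapply(df, function(col) {
    vals <- col[!is.na(col)]                      # drop missing values first
    num  <- is.numeric(col) && length(vals) > 0   # numeric stats possible?
    stat <- function(f) if (num) f(vals) else NA_real_
    q <- if (num) {
      quantile(vals, c(0.25, 0.50, 0.75), names = FALSE)
    } else {
      rep(NA_real_, 3)
    }
    data.frame(
      count  = length(vals),
      mean   = stat(mean),
      std    = stat(sd),
      min    = stat(min),
      `25%`  = q[1],
      `50%`  = q[2],
      `75%`  = q[3],
      max    = stat(max),
      null   = sum(is.na(col)),                   # number of missing values
      unique = length(unique(vals)),
      range  = stat(function(v) max(v) - min(v)),
      dtype  = class(col)[1],
      check.names = FALSE                         # keep names like "25%"
    )
  })
  # The list is named by column, so rbind uses those names as row names.
  do.call(rbind, rows)
}

# Same data as in the question.
df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170))
names(df1) <- c('gender', 'age', 'height')

out <- describe_df(df1)
print(out)
```

Because each row is built as its own one-row data.frame, the numeric columns stay numeric even when the input contains non-numeric columns; with the `cbind` approach, a single `"N*N"` entry turns the whole table into character values.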

Hope it helps.
