abhiieor abhiieor - 3 months ago 6
R Question

Different spacing while printing to log

I am printing the importance matrix of xgBoost to the log using the write command (write works with a file connection and can direct output to stderr as well). Here is the command I am using:

importance_matrix <- xgb.importance(names, model=bst)
write("The top 30 variables are:",stderr())
write(paste0("Feature",'\t','\t','Gain','\t','Cover','\t','Frequency'),stderr())
write(t(as.matrix(importance_matrix[1:30,])),sep="\t",ncolumns = length(names(importance_matrix)),stderr())


The output looks like this:

Feature Gain Cover Frequency
pctTillDate 0.560359696 0.1314074664 0.024278250
colr_per 0.183149483 0.0962457545 0.049618673
date 0.050528297 0.1143752021 0.066395735
GREG_D 0.025648433 0.0381476142 0.018070143
LNGTD_I 0.020346020 0.0485235001 0.101322109
LATTD_I 0.019241497 0.0421892270 0.093867103


which makes it look a bit clumsy (much clumsier in the log than it appears here on SO). To make it look better, I want to change the last line, t(as.matrix(importance_matrix[1:30,])),sep="\t", so that the first sep is 2 tabs ('\t','\t') and the rest are a single tab ('\t'), instead of the current uniform spacing. It seems simple, but searching gave me no ideas. Any suggestions?

Answer

Consider padding the column names and the first (character) column of the matrix with whitespace so that each is as wide as the longest entry in the first column. The first call below shows the unpadded output for comparison:

write.table(importance_matrix, sep="\t", row.names = FALSE,  quote = FALSE)
# Feature   Gain    Cover   Frequency
# pctTillDate   0.56035970  0.13140747  0.02427825
# colr_per  0.18314948  0.09624575  0.04961867
# date  0.05052830  0.11437520  0.06639573
# GREG_D    0.02564843  0.03814761  0.01807014
# LNGTD_I   0.02034602  0.04852350  0.10132211
# LATTD_I   0.01924150  0.04218923  0.09386710

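# COERCE TO A PLAIN DATA FRAME SO [,1] RETURNS A VECTOR
# (xgb.importance TYPICALLY RETURNS A data.table)
new_matrix <- as.data.frame(importance_matrix)

# FIRST COLUMN LARGEST CHAR LENGTH
charmax <- max(nchar(as.character(new_matrix[,1])))

# PAD COLUMN HEADERS (sapply RETURNS A CHARACTER VECTOR; max(0, ...) GUARDS HEADERS LONGER THAN charmax)
colnames(new_matrix) <- sapply(1:ncol(new_matrix), function(i)
       paste0(colnames(new_matrix)[i],
              paste(rep(" ", max(0, charmax - nchar(colnames(new_matrix)[i]))), collapse=""))
)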

# PAD FIRST COLUMN
new_matrix[,1] <- sapply(1:nrow(new_matrix), function(i)
       paste0(new_matrix[i,1], 
              paste(rep(" ", charmax - nchar(new_matrix[i,1])), collapse=""))
)

write.table(new_matrix, sep="\t", row.names = FALSE,  quote = FALSE)
# Feature       Gain        Cover       Frequency  
# pctTillDate   0.56035970  0.13140747  0.02427825
# colr_per      0.18314948  0.09624575  0.04961867
# date          0.05052830  0.11437520  0.06639573
# GREG_D        0.02564843  0.03814761  0.01807014
# LNGTD_I       0.02034602  0.04852350  0.10132211
# LATTD_I       0.01924150  0.04218923  0.09386710
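
The padding above still relies on tabs, and tab stops vary between log viewers. As an alternative, here is a minimal sketch that pads every column with spaces and writes the result to stderr, which is where the question wants the output. The helper name format_aligned and the toy data frame imp are assumptions made up for illustration (the values are copied from the question's output); only base R functions are used.

```r
# Toy stand-in for the importance table (values copied from the question's output).
imp <- data.frame(
  Feature   = c("pctTillDate", "colr_per", "date"),
  Gain      = c(0.560359696, 0.183149483, 0.050528297),
  Cover     = c(0.1314074664, 0.0962457545, 0.1143752021),
  Frequency = c(0.024278250, 0.049618673, 0.066395735),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one aligned text line per row, header first.
format_aligned <- function(df, digits = 8) {
  df <- as.data.frame(df)  # so a data.table input is indexed like a data frame
  cols <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    # Render numbers with a fixed number of decimals; leave text as is.
    body <- if (is.numeric(x)) {
      formatC(x, format = "f", digits = digits)
    } else {
      as.character(x)
    }
    cells <- c(nm, body)
    # Left-justify the header and cells to the widest entry in this column.
    formatC(cells, width = max(nchar(cells)), flag = "-")
  })
  # Join the padded columns with two spaces instead of tabs.
  do.call(paste, c(cols, sep = "  "))
}

out_lines <- format_aligned(imp)
write(out_lines, stderr())
```

In the question's setting the call would presumably be write(format_aligned(head(importance_matrix, 30)), stderr()). Because each column is padded to its own widest entry, the alignment does not depend on how the log viewer renders tabs.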