gibbz00 gibbz00 - 2 months ago 6x
R Question

Concatenating and pasting two columns using line breaks within data-frame/data.table

This is what my dataframe looks like


dt <- fread('
Batch Score Type Description
A 1 fruit apple
A 2 beverage pepsi
A 3 food rice
B 1 beverage coke
B 2 fruit banana
C 1 food butter
D 1 food bread

Once I identify the row where a particular batch has the highest Score, I want to concatenate
and paste them all the way(all the elements from the first row to the last row with the highest Score) using line breaks and gap between Type and Description. The pasting is done upto the row with the highest score for that batch so that it looks like the following:

Batch Score(Max) Description2
A 3 fruit apple
beverage pepsi
food rice
B 2 beverage coke
fruit banana
C 1 food butter
D 1 food bread

The row with the highest score for the batch also happens to be the last row for that batch in my dataframe. I have tried:

dt[, .(MaxScore = max(Score),
Description2 = cat(paste(Type, Description), sep="\n")),by = .(Batch)]

The goal is to show all that information in one column(Description2) and one row for each batch.Your help is highly appreciated!


Note: This does not fully answer the OP's question.

I do something similar (but not exactly the same):

mx = max(Score)

subdat = .SD[, !"Score", with=FALSE]
newrow0 = lapply(subdat, function(x) as("",class(x)))
newrow  = newrow0
newrow[length(newrow)] = paste0(" Max Score: ", mx)
}, by=Batch], nrows=Inf, row.names=FALSE)

which gives

 Batch     Type   Description
     A    fruit         apple
     A beverage         pepsi
     A     food          rice
     A           Max Score: 3
     B beverage          coke
     B    fruit        banana
     B           Max Score: 2
     C     food        butter
     C           Max Score: 1
     D     food         bread
     D           Max Score: 1

I just use this to review the data in the console. I put the metadata for the group on a row instead of a column because I have enough columns to nearly span the screen already.

Note that the as("", class(x)) is very unreliable (e.g., not working with Date class). It might be necessary to just coerce the entire table to string first.

For printing multiple columns as one, use sprintf:

mx = max(Score)
subdat = .(
  Description = as.character(Description), 
  Type = as.character(Type)
.(mx, sprintf("%10s %15s", subdat$Type, subdat$Description))
}, by=Batch], nrows=Inf, row.names=FALSE)

 Batch V1                         V2
     A  3      fruit           apple
     A  3   beverage           pepsi
     A  3       food            rice
     B  2   beverage            coke
     B  2      fruit          banana
     C  1       food          butter
     D  1       food           bread

This is very manual but it should be obvious how it can be done more programmatically, starting with sapply(dt, function(x) max(nchar(x))).