RNN RNN - 2 months ago 12
R Question

R spread data frame

I have this data set (output of dput):

structure(list(Account.Name = c("CMD", "CMD", "CMD", "CMD", "CMD",
"CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "CMD",
"CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "Colimbra",
"Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra",
"Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra",
"Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra",
"Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra",
"Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra",
"Colimbra"), Date.y = structure(c(47L, 38L, 39L, 46L, 29L, 30L,
31L, 37L, 36L, 34L, 43L, 45L, 41L, 42L, 33L, 40L, 27L, 28L, 32L,
35L, 44L, 26L, 9L, 24L, 17L, 23L, 18L, 6L, 8L, 5L, 12L, 10L,
7L, 11L, 35L, 25L, 19L, 16L, 34L, 27L, 4L, 26L, 20L, 29L, 15L,
33L, 32L, 30L, 14L, 22L, 31L, 13L, 21L, 28L), .Label = c("",
"2012-12-01", "2013-01-01", "2013-02-01", "2013-03-01", "2013-04-01",
"2013-05-01", "2013-06-01", "2013-07-01", "2013-08-01", "2013-09-01",
"2013-10-01", "2013-11-01", "2013-12-01", "2014-01-01", "2014-02-01",
"2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01", "2014-07-01",
"2014-08-01", "2014-09-01", "2014-10-01", "2014-11-01", "2014-12-01",
"2015-01-01", "2015-02-01", "2015-03-01", "2015-04-01", "2015-05-01",
"2015-06-01", "2015-07-01", "2015-08-01", "2015-09-01", "2015-10-01",
"2015-11-01", "2015-12-01", "2016-01-01", "2016-02-01", "2016-03-01",
"2016-04-01", "2016-05-01", "2016-06-01", "2016-07-01", "2016-08-01",
"2016-09-01"), class = "factor"), EI = c(0.172413778757433, 0.283582069077747,
0.304347804744803, 0.278195468486632, 0.675675653544559, 0.965738751378275,
0.79789472055251, 0.571428546702807, 0.364238387240035, 0.333333310925928,
0.333333310925928, 0.267175552791797, 0.30935249644739, 0.30935249644739,
0.547169786306516, 0.342465730716834, 0.25581393431044, 0.593220290504169,
0.529411739555941, 0.538461513372782, 0.333333310925928, 0.119266044513089,
0.00689655157368212, 0.0932835783248028, 0.117967327490881, 0.111415832683409,
0.0864661618980282, 0.0170648454887846, 0.0380999488912474, 0.00803673911715819,
0.0500855092066307, 0.00942675138629104, 0.0201612894472413,
0.0046082948309584, 0.00435339299151454, 0.144554447192982, 0.0830188645366324,
0.0825861213183505, 0.0129474483080438, 0.0240963850193243, 0.00917431152659711,
0.0215175530933231, 0.0953932023013541, 0.00917431172607524,
0.0873239401148782, 0.00892174336861008, 0.018429689070739, 0.0352357312529589,
0.0470588220329153, 0.059847657373831, 0.00588084875970071, 0.0479921625133198,
0.229030327296333, 0.00613496919149197)), .Names = c("Account.Name",
"Date.y", "EI"), row.names = c(69L, 70L, 71L, 72L, 73L, 74L,
75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L,
88L, 89L, 90L, 91L, 95L, 96L, 99L, 101L, 104L, 105L, 107L, 108L,
109L, 110L, 111L, 113L, 114L, 116L, 117L, 118L, 119L, 120L, 121L,
122L, 123L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 133L,
134L), class = "data.frame")


I need to spread (pivot) it so that each row is one Account.Name and each column holds the value for one of that account's Date.y dates. The column names should be relative rather than actual dates: date_0 for the account's last Date.y value, date_1 for the one before it, and so on up to date_i, where i is the index of the account's first date when counting backwards from the last one.
For instance:

Account.Name  date_0       date_1       date_2      ...
CMD           0.333333311  0.333333311  0.309352496


where date_0 corresponds to 2016-06-01,
date_1 corresponds to 2016-05-01,
date_2 corresponds to 2016-04-01,
and so on.

I tried tidyr::spread; however, the column names are taken from the original date values, and I want relative column names (date_0, date_1, and so on, up to the earliest date for each account).
Any ideas appreciated.


Answer

Let x be your data frame:

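library(data.table)
library(lubridate)
dt <- as.data.table(x)
# Date.y is a factor; parse its labels into Date values so they sort chronologically
dt[, Date.y := ymd(as.character(Date.y))]
# newest date first within each account
setorder(dt, Account.Name, -Date.y)
# relative index: 0 for the latest date, 1 for the one before it, ...
dt[, col_index := 0:(.N - 1L), by = Account.Name]
# one row per account, one column per relative index
dt_casted <- dcast(dt, Account.Name ~ col_index, value.var = "EI")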
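
To see what these steps produce, here is a minimal sketch on made-up data. The table toy and its values are hypothetical and not taken from the question; the date conversion uses base as.Date instead of lubridate, which should behave the same for this format. Note that an account with fewer dates than the others ends up with NA in its last columns.

```r
library(data.table)

# Hypothetical toy data: account "A" has 3 months, account "B" has 2
toy <- data.table(
  Account.Name = c("A", "A", "A", "B", "B"),
  Date.y = factor(c("2016-01-01", "2016-03-01", "2016-02-01",
                    "2016-02-01", "2016-01-01")),
  EI = c(0.1, 0.3, 0.2, 0.5, 0.4)
)

# Convert the factor through its labels, not its integer codes
toy[, Date.y := as.Date(as.character(Date.y))]

# Newest date first within each account, then number the rows 0, 1, 2, ...
setorder(toy, Account.Name, -Date.y)
toy[, col_index := seq_len(.N) - 1L, by = Account.Name]

# One row per account, one column per relative index
wide <- dcast(toy, Account.Name ~ col_index, value.var = "EI")
print(wide)
```

Column "0" holds each account's most recent value, whatever the actual calendar date is, so the same column can refer to different months for different accounts.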

Note that I didn't use the "date_0" format because I assume you will want the columns sorted, and "date_10" would sort before "date_2". It is better to keep the index numeric, or to pad it with leading zeros.
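
If you do want "date_" names, the padding option could look like the sketch below. The table long and the column col_name are hypothetical names used only for illustration, and the fixed width of 2 assumes fewer than 100 relative columns.

```r
library(data.table)

# Hypothetical long table with a 0-based relative index already computed
long <- data.table(
  Account.Name = rep("A", 12),
  col_index = 0:11,
  EI = (1:12) / 100
)

# Pad to a fixed width so that alphabetical order matches numeric order
long[, col_name := sprintf("date_%02d", col_index)]

wide <- dcast(long, Account.Name ~ col_name, value.var = "EI")
names(wide)
```

With the padding, "date_02" sorts before "date_10", so the cast columns come out in the intended order.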
