JerryN JerryN - 2 months ago 9
R Question

r data table dcast order of new columns

The csv below is from a much longer data table, call it

temp
. I want to cast it to
temp.wide
with
region_code
as columns and with the vertical order of
region_code
(SAS, SSA, EUR, ...) as the order of the columns. I just noticed that dcast orders the new columns alphabetically.

scenario region_code region_name value
1: 2010 SAS South Asia 61.17716
2: 2010 SSA Africa south of the Sahara 62.08588
3: 2010 EUR Europe 63.76123
4: 2010 LAC Latin America and Caribbean 68.84806
5: 2010 FSU Former Soviet Union 59.04499
6: 2010 EAP East Asia and Pacific 64.00579
7: 2010 NAM North America 66.18235
8: 2010 MEN Middle East and North Africa 58.03167
9: SSP2-NoCC-REF SAS South Asia 57.29973
10: SSP2-NoCC-REF SSA Africa south of the Sahara 65.14987
11: SSP2-NoCC-REF EUR Europe 63.99204
12: SSP2-NoCC-REF LAC Latin America and Caribbean 68.21118
13: SSP2-NoCC-REF FSU Former Soviet Union 60.10807
14: SSP2-NoCC-REF EAP East Asia and Pacific 63.86103
15: SSP2-NoCC-REF NAM North America 65.97859
16: SSP2-NoCC-REF MEN Middle East and North Africa 58.98356

temp = setDT(structure(list(scenario = c("2010", "2010", "2010", "2010", "2010",
"2010", "2010", "2010", "SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF",
"SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF",
"SSP2-NoCC-REF"), region_code = c("SAS", "SSA", "EUR", "LAC",
"FSU", "EAP", "NAM", "MEN", "SAS", "SSA", "EUR", "LAC", "FSU",
"EAP", "NAM", "MEN"), region_name = c("South Asia", "Africa south of the Sahara",
"Europe", "Latin America and Caribbean", "Former Soviet Union",
"East Asia and Pacific", "North America", "Middle East and North Africa",
"South Asia", "Africa south of the Sahara", "Europe", "Latin America and Caribbean",
"Former Soviet Union", "East Asia and Pacific", "North America",
"Middle East and North Africa"), value = c(61.1771623260257,
62.0858809906661, 63.7612306428217, 68.84805628195, 59.0449875464304,
64.0057851485101, 66.182351351389, 58.0316719859857, 57.299725759211,
65.1498720847705, 63.9920412193261, 68.2111842947542, 60.1080745513644,
63.86103368494, 65.9785850777114, 58.9835574681585)), .Names = c("scenario",
"region_code", "region_name", "value"), row.names = c(NA, -16L
), class = "data.frame"))


Here's the code I used.

formula.wide <- "scenario ~ region_code"
temp.wide <- data.table::dcast(
data = temp,
formula = formula.wide,
value.var = "value")


scenario EAP EUR FSU LAC MEN NAM SAS SSA
1: 2010 64.00579 63.76123 59.04499 68.84806 58.03167 66.18235 61.17716 62.08588
2: SSP2-NoCC-REF 63.86103 63.99204 60.10807 68.21118 58.98356 65.97859 57.29973 65.14987


The new column names are
scenario, EAP, EUR, FSU, LAC, MEN, NAM, SAS, SSA
.

I can grab the correct order from
temp
and then use
setcolorder
to give
temp.wide
the correct column order. But I was wondering if there some way to not have the new column order alphabetized.

Also, the help text for dcast says


Names for columns that are being cast are generated in the same order
(separated by an underscore, _) from the (unique) values in each
column mentioned in the formula RHS.


If I am understanding this correctly, I don't think it describes what dcast actually does. But I don't understand what the parenthetical phrase (separated by an underscore, _) means.

Answer

with the vertical order of region_code (SAS, SSA, EUR, ...) as the order of the columns

Just pass a factor with appropriate levels:

dcast(temp, scenario ~ factor(region_code, levels=unique(region_code)))

        scenario      SAS      SSA      EUR      LAC      FSU      EAP      NAM      MEN
1:          2010 61.17716 62.08588 63.76123 68.84806 59.04499 64.00579 66.18235 58.03167
2: SSP2-NoCC-REF 57.29973 65.14987 63.99204 68.21118 60.10807 63.86103 65.97859 58.98356

The documentation quoted in the OP sounds correct to me; in z ~ x + y -- x's unique values come before y's unique values in the order of the resulting column names.

Comments