quickreaction quickreaction - 3 months ago 10
R Question

How to transform a data.frame Long to Wide, but not using factors and not equal length associations

I've tried a few solutions but they don't fit into this use case. I've got 32 GB of RAM and I'm still hitting the ceiling when cast, dcast, or spread are used. I looped to find a list and it took more than 24 hours to do a portion of the 42 million rows I have. I'd like a base solution using the apply family, but I understand if that's not gonna work... I feel like it won't since I have to query the following row to determine if the addressID matches (or if the data.frame is unsorted, it has to search the rest of the column).

Here's an example of a (sorted) data.frame:

addressID MAKE
104554 LINCOLN
104554 LINCOLN
104628 HYUNDAI
104628 TOYOTA
104628 SUBARU
104677 HYUNDAI
104677 HYUNDAI


I want the long format, or a list, such that I've got:

$`104554`
[1] LINCOLN LINCOLN

$`104628`
[1] HYUNDAI TOYOTA SUBARU

$`104677`
[1] HYUNDAI HYUNDAI


I'm gonna have some addresses with 6 cars. I've got at least 45 MAKEs of cars, so I'd prefer not to have a 20 million X 50 column matrix/data.frame the way that dcast, cast, spread, etc. seem to require to make this run (they're not working due to the RAM limits). I feel like this should be an easy solution, but it escapes me.

Answer

Will aggregate(addressID ~ MAKE, data = df, c) work for you?

Alternatively, by(df, df[,"addressID"], c) will give you a list

Comments