view raw
Alex Bădoi Alex Bădoi - 7 months ago 29
R Question

How to filter large data-sets by two attributes and split into subsets? R / Grep

I found myself at the limits of the

function or perhaps there are efficient ways of doing this.

Start off a sample data-frame:

Date <- c( "31-DEC-2014","31-DEC-2014","31-DEC-2014","30-DEC-2014",
"30-DEC-2014","30-DEC-2014", "29-DEC-2014","29-DEC-2014","29-DEC-2014" )

ISIN <- c("LU0168343191", "TW0002418001", "GB00B3FFY088","LU0168343191",
"TW0002418001", "GB00B3FFY088","LU0168343191", "TW0002418001", "GB00B3FFY088")

price <-c(seq(1:9))

df <-, ISIN, price))

And the desired Result is a
containing subsets of the main data file which looks like the below (x3 for the 3 individual Identifiers in

The idea is that the data should first filter by ISIN and then filter by Date. this 2 step process should keep my data intact.

Result_d <- c("31-DEC-2014", "30-DEC-2014","29-DEC-2014")
Result_I <- c("LU0168343191","LU0168343191","LU0168343191")
Result_P <- c(1,4,7)

Result_df <- cbind(Result_d, Result_I, Result_P)

Please keep in mid the above is for demo purposes and the real data-set has 5M rows and 50 columns over a period of 450+ different dates as per
so i am lookign for something that is applicable irrespective of nrow or ncol

What i have so far:

I take all unique dates and store:

Unique_Dates <- unique(df$Date)

The same for the Identifiers:

Unique_ID <- unique(df$ISIN)

Now the grepping issue:

If i wanted all rows containing
i would do something like:

pattern <- paste(Unique_dates, collapse = "|")

result <- as.matrix(df[grep(pattern, df$Date),])

and this will retrieve basically the entire data set. i am wondering if anyone knows an efficient way of doing this.

Thanks in advance.


We can try

setDT(df)[grep("LU", ISIN), .SD, by = Date]
#           Date         ISIN price
#1: 31-DEC-2014 LU0168343191     1
#2: 30-DEC-2014 LU0168343191     4
#3: 29-DEC-2014 LU0168343191     7