Marie-Eve - 1 year ago 86
R Question

# Checking, row by row, whether strings from a vector are present in a column, and expanding the result into indicator columns

I would like to mark, row by row, whether each string from a vector of strings is present in a given column of a data frame. Below are some toy data, followed by the outcome I would like. This works with loops, but I'd prefer to avoid them if possible, since the real data have about 3 million rows.

```r
mydata <- structure(list(X7 = c("00019", "00019", "00019", "00019", "00035", "00035"), X17 = c("A / BG / C / D / E", "E / D", "B / F", "B / C", "A / BE / G / F", "AB / G"), n = c(10L, 4L, 4L, 4L, 8L, 4L)), .Names = c("X7", "X17", "n"), row.names = c(NA, -6L), class = c("data.frame"))
```


```
> mydata
     X7                X17  n
1 00019 A / BG / C / D / E 10
2 00019              E / D  4
3 00019              B / F  4
4 00019              B / C  4
5 00035     A / BE / G / F  8
6 00035             AB / G  4
```

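In the real outcome the indicator columns would run through the last letter of the alphabet; here I print only A to G. Only whole entries count: in row 1, `BG` marks neither B nor G, and in row 6, `AB` marks neither A nor B.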

```
> outcome
     X7                X17   n A B C D E F G
1 00019 A / BG / C / D / E  10 1 0 1 1 1 0 0
2 00019              E / D   4 0 0 0 1 1 0 0
3 00019              B / F   4 0 1 0 0 0 1 0
4 00019              B / C   4 0 1 1 0 0 0 0
5 00035     A / BE / G / F   8 1 0 0 0 0 1 1
6 00035             AB / G   4 0 0 0 0 0 0 1
```

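Here is one method using `sapply` and `grepl`. The word boundaries (`\\b`) around each letter make it match only as a whole entry, so `B` is not found inside `BG` or `AB`:

```r
# For each letter, flag the rows of X17 that contain it as a whole entry
outcome2 <- cbind(mydata, sapply(LETTERS[1:7], function(i)
  as.integer(grepl(paste0("\\b", i, "\\b"), mydata$X17, perl = TRUE))))
```

`sapply` loops through the letters A to G created by `LETTERS[1:7]` (use `LETTERS` for all 26). For each letter, `grepl` checks every row of `mydata$X17` at once, and `as.integer` converts the logical result (TRUE/FALSE) into a binary integer (1/0).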
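To see why whole-entry matching matters for these data, here is a minimal check on rows 1 and 6 of `mydata$X17`. The variable `x` is just an illustrative copy of row 1; `perl = TRUE` is used so that `\\b` is interpreted by PCRE.

```r
x <- "A / BG / C / D / E"                  # row 1 of mydata$X17

grepl("B", x)                              # TRUE: "B" is found inside "BG"
grepl("\\bB\\b", x, perl = TRUE)           # FALSE: no standalone "B" entry
grepl("\\bC\\b", x, perl = TRUE)           # TRUE: "C" is a whole entry
grepl("\\bA\\b", "AB / G", perl = TRUE)    # FALSE: "A" only appears inside "AB"
```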

```r
# test that the outcomes are the same
identical(outcome, outcome2)
[1] TRUE
```
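Since the `sapply` approach scans the whole column once per letter, an alternative worth trying on about 3 million rows is to split each row only once and fill an indicator matrix by index. This is a sketch, not part of the original answer, and it has not been benchmarked at that size. It assumes entries are always separated by exactly `" / "`; the names `wanted`, `entries`, `row_id`, `col_id`, `flags` and `outcome3` are illustrative. The toy data are rebuilt with `data.frame()` so the block runs on its own.

```r
# Toy data from the question, built with data.frame() instead of structure()
mydata <- data.frame(
  X7  = c("00019", "00019", "00019", "00019", "00035", "00035"),
  X17 = c("A / BG / C / D / E", "E / D", "B / F", "B / C",
          "A / BE / G / F", "AB / G"),
  n   = c(10L, 4L, 4L, 4L, 8L, 4L),
  stringsAsFactors = FALSE
)

wanted <- LETTERS[1:7]   # use LETTERS for all 26 columns

# Split every row once into its whole entries: "E / D" -> c("E", "D")
entries <- strsplit(mydata$X17, " / ", fixed = TRUE)

# Row index of every entry, and the column each entry maps to
# (NA for entries such as "BG" or "AB" that are not a single wanted letter)
row_id <- rep(seq_along(entries), lengths(entries))
col_id <- match(unlist(entries), wanted)
keep   <- !is.na(col_id)

# Start from all zeros and set 1 at every (row, letter) pair that occurs
flags <- matrix(0L, nrow = nrow(mydata), ncol = length(wanted),
                dimnames = list(NULL, wanted))
flags[cbind(row_id[keep], col_id[keep])] <- 1L

outcome3 <- cbind(mydata, flags)
```

Indexing a matrix with a two-column `cbind(row, col)` matrix sets all flagged cells in a single assignment, so no per-row or per-letter function calls are needed after the split.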