SAF SAF - 1 month ago 14
R Question

Searching in dataframe and updating if found in column

In R, I have a data frame on which I ran text mining and excluded irrelevant words. I then created two-word and three-word combinations. Now I want to search for these combinations in my main data frame and add a column holding the matching two-word or three-word combination, so that I can get an accurate count instead of a frequency of occurrence.

Here is the sample:

##ID Title
###123 Product - power supply has failed
###124 Product - hard drive has been degraded
###125 Product - hard drive failed
###126 Product - hard drive is failed
###127 Product - hdd failed
###128 Product - power supply is down
###129 Product - hard drive is not working
###130 Product - hard drive not functioning
###131 Product - hard drive is not working
###132 Product - Power supply is not working

Output should be, for example:
##ID Title [Keywords Matched]
###123 Product - power supply has failed `power supply`


I have come up with a function that loops through a set of keywords, searches for them one at a time in the data frame, and marks the rows where a match is found. However, it gives an error when I try to add the new column inside the function, although it works fine if I do it outside the function. Can you please check where the issue is?

# Function to write keywords
AssignKeywords <- function(x){

keyword <<- as.character(freq2.df$word[x])

#print (which(grepl('hard drive',tolower(Working.Data$Case.Title))))
MatchingList <- which(grepl(keyword,tolower(Data.New$Issue)))

for(i in MatchingList)
{

if(is.na(Data.New$keywords[i]))
{
print('keyword is not null')
print(Data.New$keywords[i])

Data.New$keywords[i] <<- as.character(na.omit(keyword))
}
}
#print (x)
#print (MatchingList)
#print ('completed')
}

# Function to Add column for keywords and loop through keywords and update matching ones in data frame
AddKeywords <- function(){

# Add keywords Column and set to NA
Data.New$keywords <- NULL
if("keywords" %in% colnames(Data.New))
{
print('keyword column exists')
} else
{
print('Keywords column does not exist')
Data.New$keywords <- NA

print('keyword column does not exists')

}

# Run counter and loop through all the keywords and add to main data frame
counter <- 0
while(counter != (length(freq2.df$word)))
{
counter <- counter + 1

AssignKeywords(counter)

}
}

Sam Sam
Answer

Here's an example that handles the data in your sample. It will probably need to be extended for your actual data, since I'm not sure how complex that data is or what keywords you're trying to match (for example, can a Title match multiple keywords?). I used grepl to match keywords and str_replace_all to change hdd to hard drive. I imagine you included hdd in the sample because you could easily have multiple aliases that should all map to a single term. Let me know whether this works for you.

### Load data frame

df <- data.frame(ID = c('###123', '###124', '###125', '###126', '###127', '###128', 
                        '###129', '###130', '###131', '###132'),
                 Title = c('Product - power supply has failed',
                           'Product - hard drive has been degraded',
                           'Product - hard drive failed',
                           'Product - hard drive is failed',
                           'Product - hdd failed',
                           'Product - power supply is down',
                           'Product - hard drive is not working',
                           'Product - hard drive not functioning',
                           'Product - hard drive is not working',
                           'Product - Power supply is not working'),
                 stringsAsFactors = FALSE)

### Change hdd to hard drive

library('stringr')
df$Title <- str_replace_all(df$Title, 'hdd', 'hard drive')

### Create keywords column

df$keywordsMatched <- ''
df$keywordsMatched <- ifelse(grepl('power supply', df$Title, ignore.case = TRUE), 
                             'power supply', df$keywordsMatched)
df$keywordsMatched <- ifelse(grepl('hard drive', df$Title, ignore.case = TRUE), 
                             'hard drive', df$keywordsMatched)

df

       ID                                  Title keywordsMatched
1  ###123      Product - power supply has failed    power supply
2  ###124 Product - hard drive has been degraded      hard drive
3  ###125            Product - hard drive failed      hard drive
4  ###126         Product - hard drive is failed      hard drive
5  ###127            Product - hard drive failed      hard drive
6  ###128         Product - power supply is down    power supply
7  ###129    Product - hard drive is not working      hard drive
8  ###130   Product - hard drive not functioning      hard drive
9  ###131    Product - hard drive is not working      hard drive
10 ###132  Product - Power supply is not working    power supply
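
If you have more than a couple of keywords, the repeated ifelse calls above could be folded into a loop over a keyword vector. The sketch below is only one way to do that, under a few assumptions: match_keywords is a hypothetical helper name (it is not from your code or from any package), each title keeps only the first keyword that matches, and keywords are treated as literal text rather than regular expressions. It also returns the modified data frame instead of assigning with <<-. That is likely related to the error in your version: an assignment such as Data.New$keywords <- NA inside a function changes a local copy, so the global Data.New that AssignKeywords reads never gets the column.

```r
# Hypothetical helper (not part of the original answer): tags each title with
# the first keyword it contains, checking keywords in the order given.
match_keywords <- function(data, keywords, column = "Title") {
  data$keywordsMatched <- NA_character_
  for (kw in keywords) {
    # Lowercase both sides for a case-insensitive match;
    # fixed = TRUE treats the keyword as literal text, not a regex.
    hit <- grepl(tolower(kw), tolower(data[[column]]), fixed = TRUE)
    # Only fill rows that have not been tagged by an earlier keyword.
    unset <- is.na(data$keywordsMatched)
    data$keywordsMatched[hit & unset] <- kw
  }
  data  # return the modified copy; the caller reassigns it
}

# Small subset of the sample data, for illustration only
titles <- data.frame(
  ID = c("###123", "###127", "###132"),
  Title = c("Product - power supply has failed",
            "Product - hdd failed",
            "Product - Power supply is not working"),
  stringsAsFactors = FALSE
)

# Map the alias to its full term first, as in the answer above
titles$Title <- gsub("hdd", "hard drive", titles$Title, fixed = TRUE)

titles <- match_keywords(titles, c("power supply", "hard drive"))
titles$keywordsMatched
```

With your own data you would pass something like as.character(freq2.df$word) as the keyword vector and "Issue" as the column, then reassign the result, for example Data.New <- match_keywords(Data.New, as.character(freq2.df$word), "Issue"). Rows with no match stay NA, which makes them easy to count or filter afterwards.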