cexplorer - 10 months ago
R Question

Can't grepl be used in the apply function?

I have a dataframe with values as below:

BrandName Expense
Apple $1.8B
Google $3.2B
GE -
facebook $281M
McDonald $719M

I want to clean these expense values so that they all end up on the same scale (in billions). For example, the final data frame should look like:

BrandName Expense
Apple 1.8
Google 3.2
facebook 0.281
McDonald 0.719

The $ can simply be removed with gsub; that part is fine. The problem comes afterwards.
I am applying a function A that uses grepl to check whether the value contains 'M'. If it does, it should strip the 'M', convert to a numeric value, and divide by 1000;
if it does not, it should strip the 'B' and convert to a numeric value.

A <- function(x){
  if (grepl("M", x)) {
    str_replace(x, "M", "")
    x <- x/1000
  } else if (grepl("B", x)) {
    str_replace(x, "B", "")
  }
}
frame <- data.frame(frame[1], apply(frame[2],2, A))

But all the expense values come out as NA in the final result.
On further analysis, I noticed that every value goes into the else if branch.
Am I misusing grepl inside apply? If so, how can I fix it?

Or is there a better way to solve this particular problem?

Answer

Here is a base R solution which might be more sensible for your problem, depending on your needs:

df$ExpenseScaled <- as.numeric(gsub("[$MB]", "", df$Expense))
m.index          <- substr(df$Expense, nchar(df$Expense), nchar(df$Expense)) == 'M'
df$ExpenseScaled[m.index] <- df$ExpenseScaled[m.index] / 1000

  BrandName Expense ExpenseScaled
1     Apple   $1.8B         1.800
2    Google   $3.2B         3.200
3  Facebook   $281M         0.281
4 McDonalds   $719M         0.719

The first line of code strips the dollar sign and the unit suffix (B or M) and converts what remains to a number. The next two lines find the values that end in M and divide them by 1000, so that every figure is expressed in billions, per your specification.
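
As for the question as asked: grepl itself is fine. The call apply(frame[2], 2, A) hands A the whole Expense column in one go, so grepl("M", x) returns one TRUE/FALSE per row, while if expects a single value. Older versions of R then look only at the first element (here "$1.8B", which has no M) and warn, which would explain why everything lands in the else if branch; R 4.2 and later raise an error instead. On top of that, the result of str_replace is never assigned back to x, and x is never passed through as.numeric. Below is a minimal sketch that keeps the grepl check but vectorises it with ifelse. The name to_billions is made up for illustration, the data frame is retyped from the question, and dropping the GE row is an assumption based on the desired output shown there.

```r
# Rebuild the data frame from the question (values retyped from the post).
df <- data.frame(
  BrandName = c("Apple", "Google", "GE", "facebook", "McDonald"),
  Expense   = c("$1.8B", "$3.2B", "-", "$281M", "$719M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn strings such as "$1.8B" or "$281M" into billions.
to_billions <- function(x) {
  # Strip the dollar sign and unit suffix; entries like "-" become NA.
  amount <- suppressWarnings(as.numeric(gsub("[$MB]", "", x)))
  # grepl() is vectorised: it returns one TRUE/FALSE per element of x.
  is_million <- grepl("M$", x)
  # ifelse() picks element by element, unlike if, which wants a single value.
  ifelse(is_million, amount / 1000, amount)
}

df$Expense <- to_billions(df$Expense)

# Assumption: rows with no usable amount (GE) should be dropped.
df <- df[!is.na(df$Expense), ]
print(df)
```

Because the helper works on the whole column at once, no apply call is needed. If a per-element function is preferred anyway, sapply(df$Expense, f) calls f on one value at a time, which is what the if/else version in the question assumes.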