blast00 - 11 months ago 73

R Question

I am looking for help writing a function that can identify a trend ("positive/negative/mixed", see definition below) in a value for a given customer in a dataset.

I have the following transactional data; all customers have between 3-13 transactions each.

`customer_ID transaction_num sales`

Josh 1 $35

Josh 2 $50

Josh 3 $65

Ray 1 $65

Ray 2 $52

Ray 3 $49

Ray 4 $15

Eric 1 $10

Eric 2 $13

Eric 3 $9

I would like to write a function in R that populates a new dataframe as follows

`Customer_ID Sales_Slope`

Josh Positive

Ray Negative

Eric Mixed

where:

Josh's slope is

Ray's slope is

Eric's slope is

I have tried quite extensively to do this myself but am stuck.. here is some pseudo-code I have been able to put together

`counter = max(transaction_num)`

while counter >= 0

if sales at max transaction_num are greater than sales at max transaction_num - 1)

then counter = counter - 1 ; else "not positive slope trend"

Answer Source

I think I would start with something like this. `data.table`

is usually pretty efficient with bigger datasets.

```
#Make fake data
require("data.table")
data <- data.table(customer_ID=c(rep("Josh",3),rep("Ray",4),rep("Eric",3)),
sales=c(35,50,65,65,52,49,15,10,13,9))
data[,transaction_num:=seq(1,.N),by=c("customer_ID")]
```

Now for the actual code.

```
data <- data.table(data)
#Calculate difference in rolling two time periods
rolled.up <- data[,list(N.Minus.1=.N-1,Change=list(
sales[transaction_num+1]-sales[transaction_num])),
by=c("customer_ID")]
#Sum up positive and negative values
rolled.up[,Pos.Values:=as.numeric(lapply(Change,FUN=function(x) {sum(1*(x>0),na.rm=T)}))]
rolled.up[,Neg.Values:=(N.Minus.1-Pos.Values)]
#Make Sales Slope variable
rolled.up[,Sales_Slope:=ifelse(Pos.Values>0 & Neg.Values==0,"Positive",
ifelse(Pos.Values==0 & Neg.Values>0,"Negative","Mixed"))]
#Make final table
final.table <- rolled.up[,list(customer_ID,Sales_Slope)]
final.table
# customer_ID Sales_Slope
# 1: Josh Positive
# 2: Ray Negative
# 3: Eric Mixed
#You can always merge this result back onto your main dataset if you want
data <- merge(x=data,y=final.table,by=c("customer_ID"),all.x=T)
```