np2000 - 29 days ago
R Question

ranking across a column of a list

I have a list of time series for various symbols. I want to add a column to each list element that ranks one of its columns against the same column in the other elements.

For example, I have a list x which contains the time series of SPY and IWM. For each list item I calculate rsi. I then want to create a new column in each list item which assigns a rank to the lowest rsi value between SPY and IWM.

I always get a rank of 1, which cannot be correct, so something must be wrong in my code. As I said, I need the rank of rsi.

library(quantmod)

stockData <- new.env()

symbols = c("IWM","SPY")
getSymbols(symbols, src='yahoo',from = "2016-10-01",to = Sys.Date())


x <- list()
for (i in 1:length(symbols)) {
  x[[i]] <- get(symbols[i], pos = stockData) # get data from stockData environment
  x[[i]]$rsi <- RSI(Cl(x[[i]]), 14)
  x[[i]]$rank <- NA
  x[[i]]$rank <- apply(-x[[i]]$rsi, 1, rank)
}

Answer
library(quantmod)
stockData <- new.env() 

symbols = c("IWM","SPY")
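# env = stockData loads the series into stockData, where get() looks for them below
getSymbols(symbols, src = 'yahoo', from = "2016-10-01", to = Sys.Date(), env = stockData)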

fulldata <- lapply(symbols, get, pos = stockData)
closedata <- lapply(fulldata, Cl)
rsi <- lapply(closedata, RSI, n = 14) # or e.g. n = 2, if RSI based on two periods  

To use rank() later on, we first convert each element to a data.frame, because RSI() returns an xts object here and rank() does not handle that class well.

rsi <- lapply(rsi, as.data.frame)

With n = 14, RSI() depends on a moving average of 14 periods, so the outcome for the first 14 periods is NA: there is not yet enough history to calculate the moving average for them.

There are a couple of options to deal with the NA-values while doing the ranking. The option you find most suitable will depend on what you are going to use your data for later on:

  • You can choose to replace all NA-values in rsi with zeros for the ranking:

    for (i in seq_along(rsi)) {
        rsi[[i]][is.na(rsi[[i]])] <- 0
    }
    ranks <- lapply(rsi, rank)
    
  • You can ignore all NA-values and simply remove them before ranking:

     ranks <- lapply(rsi, rank, na.last = NA)
    
  • Rank the NA-values as either the lowest or highest ranks.

    # To rank NA-values last, use na.last = TRUE.
    # To rank NA-values first, use na.last = FALSE.
    ranks <- lapply(rsi, rank, na.last = TRUE)  
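
To see how these options differ, here is a small base-R sketch on a made-up vector. The values are purely illustrative and are not real RSI output; the last variant, na.last = "keep", is an additional option from ?rank that is not listed above.

```r
# Made-up values standing in for an RSI series with missing entries
v <- c(NA, 30, 10, NA, 20)

# Option 1: replace NA with 0 before ranking (the zeros tie for the lowest rank)
v0 <- v
v0[is.na(v0)] <- 0
r_zero <- rank(v0)

# Option 2: drop NA values entirely; the result is shorter than the input
r_drop <- rank(v, na.last = NA)

# Option 3: rank NA values last or first
r_last  <- rank(v, na.last = TRUE)
r_first <- rank(v, na.last = FALSE)

# Additional variant from ?rank: leave NA in place and rank only the rest
r_keep <- rank(v, na.last = "keep")
```

Note that option 2 changes the length of the result, so the ranks no longer line up with the original dates unless you remove the same rows from the data.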
    

Rank among symbols for a given day

I would combine the lists into one data frame and then calculate the row-wise rank:

rsiDF <- data.frame(rsi)
rsiDF <- cbind(rsiDF, t(apply(rsiDF, 1, rank)))

Note that here, again, you can decide how to deal with tied values and NA-values in the rank calculation (as described above and in ?rank).
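
The row-wise idea can be tried without downloading anything. The sketch below uses a made-up data frame in place of the real RSI output; the values, the choice of ties.method = "min", and the ".rank" column names are assumptions for illustration. It also shows why the code in the question always returns 1: ranking one column row by row ranks a single value each time.

```r
# Made-up RSI values for two symbols on four days (not real market data)
rsiDF <- data.frame(IWM = c(NA, 55, 40, 62),
                    SPY = c(NA, 48, 45, 62))

# Ranking a single column row by row: each row holds one value, so its rank is 1
single <- apply(rsiDF[-1, "IWM", drop = FALSE], 1, rank)

# Drop the warm-up rows where RSI is still NA, then rank across symbols per day
complete <- rsiDF[complete.cases(rsiDF), ]
dayRank <- t(apply(complete, 1, rank, ties.method = "min"))
colnames(dayRank) <- paste0(colnames(complete), ".rank")

# RSI columns first, rank columns after; rank 1 marks the lowest RSI of the day
result <- cbind(complete, dayRank)
```

Dropping incomplete rows is only one way to handle the leading NA-values; any of the na.last options can be passed through apply() in the same way as ties.method.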

If you wish to turn it back into lists again:

k <- length(symbols)
interranks <- list()
for(i in 1:k){
   interranks[[i]] <- rsiDF[,c(i, i+k)]
}
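
The same split can also be written with lapply(), which makes it easy to name each list element after its symbol. This is only a sketch on a made-up combined data frame; it assumes the RSI columns come first and the rank columns follow in the same symbol order, as in the loop above, and the column names are hypothetical.

```r
symbols <- c("IWM", "SPY")

# Made-up combined frame: RSI columns first, then rank columns in the same order
rsiDF <- data.frame(IWM = c(55, 40), SPY = c(48, 45),
                    IWM.rank = c(2, 1), SPY.rank = c(1, 2))

k <- length(symbols)

# For symbol i, pick its RSI column (i) and its rank column (i + k)
interranks <- lapply(seq_len(k), function(i) rsiDF[, c(i, i + k)])
names(interranks) <- symbols
```

With named elements, interranks$SPY returns the RSI and rank columns for SPY directly.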