Polla A. Fattah Polla A. Fattah - 1 month ago 17
R Question

Normalize and transpose data efficiently

I am trying to normalize and transpose stock market data according to each symbol and I have created this method, but I see the loop inside the function so slow. Is there any way to make it faster or do it in better way?

The function is:

normalize <- function(data, col){
norm <- function(x){
((x - min(x))/ (max(x) - min(x)))
}
symbols <- t(unique(data['Symbol']))
result <- data.frame()

for (s in symbols)
result <- rbind(result, norm(data[which(data['Symbol'] == s) , col]))
result = cbind(t(symbols), result)
colnames(result) <- c('Symbol', paste0('D', t(unique(data['Date']))))
rownames(result) <- NULL
return(result)
}


And console results are:

> r = normalize(allData, 'Close')
> head(allData, 20) # allData is the input data frame for the function

Date Open High Low Close Volume Adj.Close Symbol
1 1 41.18 41.31 40.37 40.56 1529200 40.35932 A
2 2 40.32 40.46 39.70 39.80 2041800 39.60308 A
3 3 39.81 40.02 39.02 39.18 2080600 38.98615 A
4 4 39.52 39.81 39.29 39.70 3359700 39.50358 A
5 5 40.24 40.98 40.18 40.89 2116300 40.68769 A
6 6 41.00 41.00 40.29 40.59 1643900 40.38917 AA
7 7 40.61 40.72 39.95 40.11 2770800 39.91155 AA
8 8 40.47 40.70 39.33 39.55 2013100 39.35432 AA
9 9 39.03 39.10 38.21 39.06 5134000 38.86674 AA
10 10 39.06 39.41 37.99 38.01 2628900 37.82194 AA
11 11 37.83 38.46 37.76 38.25 3004000 38.06075 AAL
12 12 38.43 38.66 37.76 37.93 5033600 37.74233 AAL
13 13 37.75 38.41 37.68 38.16 2721600 37.97120 AAL
14 14 38.51 39.86 38.12 39.65 4856600 39.45382 AAL
15 15 39.60 39.60 38.76 38.81 1519300 38.61798 AAL
16 16 38.79 39.17 38.47 39.15 1510900 38.95630 AAP
17 17 38.70 39.25 38.58 38.75 1703500 38.55828 AAP
18 18 39.01 39.05 37.96 38.00 2033500 37.81199 AAP
19 19 38.00 38.47 37.69 38.46 2330000 38.26971 AAP
20 20 38.01 38.32 37.71 37.77 3054300 37.58313 AAP

> head(r, 4)
A 1.898358 1.138356 0.518357 1.038358 2.228356
AA 1.928357 1.448358 0.888356 0.398358 -0.651645
AAL -0.411643 -0.731643 -0.501643 0.988359 0.148358
AAP 0.488359 0.08835696 -0.661643 -0.201644 -0.891643


A portion of the actual data:

> dput(allData)
structure(list(Date = 2:20, Open = c(40.32, 39.81, 39.52, 40.24,
41, 40.61, 40.47, 39.03, 39.06, 37.83, 38.43, 37.75, 38.51, 39.6,
38.79, 38.7, 39.01, 38, 38.01), High = c(40.46, 40.02, 39.81,
40.98, 41, 40.72, 40.7, 39.1, 39.41, 38.46, 38.66, 38.41, 39.86,
39.6, 39.17, 39.25, 39.05, 38.47, 38.32), Low = c(39.7, 39.02,
39.29, 40.18, 40.29, 39.95, 39.33, 38.21, 37.99, 37.76, 37.76,
37.68, 38.12, 38.76, 38.47, 38.58, 37.96, 37.69, 37.71), Close = c(39.8,
39.18, 39.7, 40.89, 40.59, 40.11, 39.55, 39.06, 38.01, 38.25,
37.93, 38.16, 39.65, 38.81, 39.15, 38.75, 38, 38.46, 37.77),
Volume = c(2041800L, 2080600L, 3359700L, 2116300L, 1643900L,
2770800L, 2013100L, 5134000L, 2628900L, 3004000L, 5033600L,
2721600L, 4856600L, 1519300L, 1510900L, 1703500L, 2033500L,
2330000L, 3054300L), Adj.Close = c(39.60308, 38.98615, 39.50358,
40.68769, 40.38917, 39.91155, 39.35432, 38.86674, 37.82194,
38.06075, 37.74233, 37.9712, 39.45382, 38.61798, 38.9563,
38.55828, 37.81199, 38.26971, 37.58313), Symbol = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L), .Label = c("A", "AA", "AAL", "AAP"), class = "factor")), .Names = c("Date",
"Open", "High", "Low", "Close", "Volume", "Adj.Close", "Symbol"
), class = "data.frame", row.names = c(NA, -19L))

Answer

Change your function with this one:

normalize <- function(data, col){
norm <- function(x){
    ((x - min(x))/ (max(x) - min(x)))
}
symbols <- t(unique(data['Symbol']))

listSymbols<-lapply(symbols,function(s){
    norm(data[as.character(data$Symbol) == s , col])
})
symbolsValue<-as.data.frame(matrix(unlist(listSymbols),nrow=length(symbols)))
symbolsValue$Symbol<-t(symbols)

symbolsValue
}
Comments