heesemonster heesemonster - 3 months ago 9
R Question

How to generate unique combinations of cases to maximize a value while minimizing another?

I am currently trying to work on some code that I hope will be able to accomplish two things for data that looks like this (with about ~100 observations):

Lines <- " key error money
1 7224 0.5500000 2483118
2 7223 0.5200000 2451469
3 7222 1.6600000 2425693
4 7247 0.6400000 2324070
5 7256 0.4400000 1785569
6 7248 0.2541168 1476720"
DF <- read.table(text = Lines)


I want to write a function that will create various combinations of "key" (perhaps with the ability to threshold to n cases) in order to maximize "money" (at perhaps an arbitrary amount, say >=50,000 or <=30,000) while minimizing the "error" amount. A good way to think about it is that I want to create various portfolios of these keys on-the-fly.

I am still somewhat of an R beginner, so I understand that this may be a complicated function - I would mainly want a way to get started, but I am happy with a complete explanation as well. Thank you!

Answer

If the problem is to find the subset of rows whose sum of errors is minimized subject to the sum of money being greater than or equal to some known constant M then it can be expressed as a 0-1 integer linear programming problem:

minimize error'x subject to money'x >= M and x is a vector of 0's and 1's

In terms of R code:

library(lpSolve)
M <- 4000000
res <- lp("min", DF$error, t(DF$money), ">=", M, all.bin = TRUE)

res
## Success: the objective function is 0.96 

DF$key[res$solution == 1]
## [1] 7223 7256

To get next best solutions, this is supposed to work but it seems it's buggy here. ?lp does warn of this. It would still be worthwhile to try it on your problem in case it does work there:

res <- lp("min", DF$error, t(DF$money), ">=", M, all.bin = TRUE, 
          num.bin.solns = 3, use.rw = TRUE)

Another possibility for next best solutions is that for the ith solution cut off the first i-1 solutions by adding a constraint that excludes them and re-run:

res <- list(objval = 0)
N <- 3 # no of solutions desired

for(i in 1:N) {
  res <- lp("min", DF$error, rbind(DF$money, DF$error), ">=", c(M, res$objval * 1.0001), 
           all.bin = TRUE)
  print(res)
  print(DF$key[res$solution == 1])
}

giving:

Success: the objective function is 0.96 
[1] 7223 7256
Success: the objective function is 0.99 
[1] 7224 7256
Success: the objective function is 1.07 
[1] 7224 7223

Note: The input DF in reproducible form is:

Lines <- "   key     error   money
1 7224 0.5500000 2483118
2 7223 0.5200000 2451469
3 7222 1.6600000 2425693
4 7247 0.6400000 2324070
5 7256 0.4400000 1785569
6 7248 0.2541168 1476720"
DF <- read.table(text = Lines)