Thirst for Knowledge - 10 months ago 41

R Question

I have the following data set:

`observation <- c(1:10)`

pop.d.rank <- c(1:10)

cost.1 <- c(101:110)

cost.2 <- c(102:111)

cost.3 <- c(103:112)

all <- data.frame(observation,pop.d.rank,cost)

And I want to allocate the following amount of money over three years:

`annual.investment <- 500`

I can do this for the first year with the following script:

`library(dplyr)`

all <- all %>%

mutate(capital_allocated.5G = diff(c(0, pmin(cumsum(cost), annual.investment)))) %>%

mutate(capital_percentage.5G = capital_allocated.5G / cost * 100) %>%

mutate(year = ifelse(capital_percentage.5G >= 50, "Year.1",0))

But when I try to do this for the second year, taking into account the previous year's investment, the code does not work. Here is my attempt at putting an ifelse statement in the mutate loop so that it does not overwrite the money allocated in the previous year:

`all <- all %>%`

mutate(capital_allocated.5G = ifelse(year == 0, diff(c(0, pmin(cumsum(cost), annual.investment))), 0) %>%

mutate(capital_percentage.5G = capital_allocated.5G / cost * 100) %>%

mutate(year = ifelse(capital_percentage.5G >= 50, "Year.2",0))

I want the data to look like the following, where the amount allocated goes first to any row that hasn't been 100% completed from the previous year.

`capital_allocated.5G <- c(101, 102, 103, 104, 105, 106, 107, 108, 109, 55)`

capital_percentage.5G <- c(100, 100, 100, 100, 100, 100, 100, 100, 100, 50)

year <- c("Year.1", "Year.1","Year.1", "Year.1","Year.1", "Year.2", "Year.2","Year.2", "Year.2","Year.2")

example.output <- data.frame(observation,pop.d.rank,cost, capital_allocated.5G, capital_percentage.5G, year)

Edit: cost.1 is the cost variable for year 1, cost.2 is the variable for year 2 and cost.3 is the cost variable for year 3

Answer Source

The original issue with your code is that `ifelse`

just provide a switch on the **output** based on the condition and not the input `cost`

used within the `TRUE`

branch of the `ifelse`

. Therefore, `cumsum(cost)`

computes the `cumsum`

over all `cost`

and not only on the portion of the `TRUE`

branch of the `ifelse`

. To fix this, we can define the following function that can then be executed for each year in turn.

```
library(dplyr)
alloc.invest <- function(df, ann.invest, y) {
df %>% mutate(not.yet.alloc = ifelse(capital_percentage.5G < 100,cost-capital_allocated.5G,0),
capital_allocated.5G = capital_allocated.5G + ifelse(capital_percentage.5G < 100,diff(c(0, pmin(cumsum(not.yet.alloc), ann.invest))), 0),
capital_percentage.5G = capital_allocated.5G / cost * 100,
year = ifelse(is.na(year) & capital_percentage.5G >= 50, paste0("Year.",y), year)) %>%
select(-not.yet.alloc)
}
```

Note:

- Create a new
*temporary*column`not.yet.alloc`

from which we compute the resulting`cumsum`

for the year's allocation. - Don't need separate
`mutate`

statements. - Need to also check
`is.na(year)`

before setting`year`

. Otherwise, previous`year`

already labelled will be overwritten.

To use this function, we must first augment the input data with some initial values for `capital_allocated.5G`

, `capital_percentage.5G`

, and `year`

:

```
capital_allocated.5G <- rep(0,10) ## initialize to zero
capital_percentage.5G <- rep(0,10) ## initialize to zero
year <- rep(NA,10) ## initialize to NA
all <- data.frame(observation,pop.d.rank,cost,capital_allocated.5G,capital_percentage.5G,year)
```

Then for Year 1:

```
annual.investment <- 500
all <- alloc.invest(all,annual.investment,1)
print(all)
## observation pop.d.rank cost capital_allocated.5G capital_percentage.5G year
##1 1 1 101 101 100.00000 Year.1
##2 2 2 102 102 100.00000 Year.1
##3 3 3 103 103 100.00000 Year.1
##4 4 4 104 104 100.00000 Year.1
##5 5 5 105 90 85.71429 Year.1
##6 6 6 106 0 0.00000 <NA>
##7 7 7 107 0 0.00000 <NA>
##8 8 8 108 0 0.00000 <NA>
##9 9 9 109 0 0.00000 <NA>
##10 10 10 110 0 0.00000 <NA>
```

and for Year 2:

```
all <- alloc.invest(all,annual.investment,2)
print(all)
## observation pop.d.rank cost capital_allocated.5G capital_percentage.5G year
##1 1 1 101 101 100 Year.1
##2 2 2 102 102 100 Year.1
##3 3 3 103 103 100 Year.1
##4 4 4 104 104 100 Year.1
##5 5 5 105 105 100 Year.1
##6 6 6 106 106 100 Year.2
##7 7 7 107 107 100 Year.2
##8 8 8 108 108 100 Year.2
##9 9 9 109 109 100 Year.2
##10 10 10 110 55 50 Year.2
```

If costs are different per year, then the function needs to readjust the `capital_percentage.5G`

and possibly the `year`

columns first:

```
library(dplyr)
alloc.invest <- function(df, ann.invest, y) {
df %>% mutate_(cost=paste0("cost.",y)) %>%
mutate(capital_percentage.5G = capital_allocated.5G / cost * 100,
year = ifelse(capital_percentage.5G < 50, NA, year),
not.yet.alloc = ifelse(capital_percentage.5G < 100,cost-capital_allocated.5G,0),
capital_allocated.5G = capital_allocated.5G + ifelse(capital_percentage.5G < 100,diff(c(0, pmin(cumsum(not.yet.alloc), ann.invest))), 0),
capital_percentage.5G = capital_allocated.5G / cost * 100,
year = ifelse(is.na(year) & capital_percentage.5G >= 50, paste0("Year.",y), year)) %>%
select(-cost,-not.yet.alloc)
}
```

Note that creating another *temporary* column `cost`

using `mutate_`

is only for convenience as the cost column needs to be dynamically selected based on the input `y`

(otherwise, we need to use `mutate_`

for all computations, which will be somewhat messier).

With the updated data similarly augmented with initial values for `capital_allocated.5G`

, `capital_percentage.5G`

, and `year`

, Year 1:

```
annual.investment <- 500
all <- alloc.invest(all,annual.investment,1)
print(all)
## observation pop.d.rank cost.1 cost.2 cost.3 capital_allocated.5G capital_percentage.5G year
##1 1 1 101 102 103 101 100.00000 Year.1
##2 2 2 102 103 104 102 100.00000 Year.1
##3 3 3 103 104 105 103 100.00000 Year.1
##4 4 4 104 105 106 104 100.00000 Year.1
##5 5 5 105 106 107 90 85.71429 Year.1
##6 6 6 106 107 108 0 0.00000 <NA>
##7 7 7 107 108 109 0 0.00000 <NA>
##8 8 8 108 109 110 0 0.00000 <NA>
##9 9 9 109 110 111 0 0.00000 <NA>
##10 10 10 110 111 112 0 0.00000 <NA>
```

Year 2: Note that last asset has less than `50%`

allocated so its `year`

is still `NA`

.

```
all <- alloc.invest(all,annual.investment,2)
print(all)
## observation pop.d.rank cost.1 cost.2 cost.3 capital_allocated.5G capital_percentage.5G year
##1 1 1 101 102 103 102 100.00000 Year.1
##2 2 2 102 103 104 103 100.00000 Year.1
##3 3 3 103 104 105 104 100.00000 Year.1
##4 4 4 104 105 106 105 100.00000 Year.1
##5 5 5 105 106 107 106 100.00000 Year.1
##6 6 6 106 107 108 107 100.00000 Year.2
##7 7 7 107 108 109 108 100.00000 Year.2
##8 8 8 108 109 110 109 100.00000 Year.2
##9 9 9 109 110 111 110 100.00000 Year.2
##10 10 10 110 111 112 46 41.44144 <NA>
```

Year 3:

```
all <- alloc.invest(all,annual.investment,3)
print(all)
## observation pop.d.rank cost.1 cost.2 cost.3 capital_allocated.5G capital_percentage.5G year
##1 1 1 101 102 103 103 100 Year.1
##2 2 2 102 103 104 104 100 Year.1
##3 3 3 103 104 105 105 100 Year.1
##4 4 4 104 105 106 106 100 Year.1
##5 5 5 105 106 107 107 100 Year.1
##6 6 6 106 107 108 108 100 Year.2
##7 7 7 107 108 109 109 100 Year.2
##8 8 8 108 109 110 110 100 Year.2
##9 9 9 109 110 111 111 100 Year.2
##10 10 10 110 111 112 112 100 Year.3
```