isjoy - 4 months ago 10

R Question

I am new to R.

I have a column of repeating codes (numeric or categorical) from an Excel file. I need to add another column of values (they can be random) such that every occurrence of the same code gets the same value, for example:

| Codes | Value |
|-------|-------|
| 1     | 122   |
| 1     | 122   |
| 2     | 155   |
| 2     | 155   |
| 2     | 155   |
| 4     | 101   |
| 4     | 101   |
| 5     | 251   |
| 5     | 251   |

Thank you.

Answer

We can use `match`:

```
n <- length(code0 <- unique(code))
value <- sample(4 * n, n)[match(code, code0)]
```

or `factor`:

```
n <- length(unique(code))
value <- sample(4 * n, n)[factor(code)]
```

The random integers generated are between 1 and `4 * n`. The number `4` is arbitrary; you can also use `100`.

**Example**

```
set.seed(0); code <- rep(1:5, sample(5))
code
# [1] 1 1 1 1 1 2 2 3 3 3 3 4 4 4 5
n <- length(code0 <- unique(code))
sample(4 * n, n)[match(code, code0)]
# [1] 5 5 5 5 5 18 18 19 19 19 19 12 12 12 11
```
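To attach the result to the data as a new column, as the question asks, the `match` approach can be applied to a data frame. This is a minimal sketch, assuming the Excel sheet has already been read into a data frame; the names `df`, `Codes`, `Value`, and `lookup` are illustrative, and the codes are the ones from the question's table:

```r
# Hypothetical data frame standing in for the imported Excel sheet
df <- data.frame(Codes = c(1, 1, 2, 2, 2, 4, 4, 5, 5))

code0 <- unique(df$Codes)          # distinct codes, in order of first appearance
n <- length(code0)

set.seed(1)                        # only to make the random draw reproducible
lookup <- sample(4 * n, n)         # n distinct integers drawn from 1..(4 * n)

# match() gives, for each row, the position of its code in code0
df$Value <- lookup[match(df$Codes, code0)]

# Check: the number of distinct values per code should be 1 everywhere
tapply(df$Value, df$Codes, function(v) length(unique(v)))
```

Because `sample` draws without replacement by default, distinct codes also receive distinct values.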

**Comment**

The above gives **the most general treatment**: it does not assume that `code` is sorted or takes consecutive values.

If `code` is sorted (no matter what values it takes), we can also use `rle`:

```
if (!is.unsorted(code)) {
  n <- length(k <- rle(code)$lengths)
  value <- rep.int(sample(4 * n, n), k)
}
```
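To see what `rle` contributes here, the following sketch prints the pieces for a small sorted vector. The vector `code` is made up for illustration:

```r
code <- c(3, 3, 7, 7, 7, 10)       # sorted, but not consecutive values

k <- rle(code)$lengths             # run lengths: 2 3 1
n <- length(k)                     # number of runs, i.e. distinct codes

set.seed(1)                        # only for reproducibility
value <- rep.int(sample(4 * n, n), k)   # repeat each draw over its run

# Each run of equal codes shares one value
split(value, code)
```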

If `code` takes consecutive values `1, 2, ..., n` (but is not necessarily sorted), we can skip `match` or `factor` and do:

```
n <- max(code)
value <- sample(4 * n, n)[code]
```
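As a small check of this shortcut, the sketch below compares direct indexing with the `match` version on an unsorted vector whose values are exactly `1, 2, 3`. The vector `code` and the name `lookup` are invented for the example:

```r
code <- c(2, 1, 3, 1, 2, 3, 3)     # values 1..3, unsorted
n <- max(code)

set.seed(1)                        # only for reproducibility
lookup <- sample(4 * n, n)

value <- lookup[code]              # code itself serves as the index

# Same result as looking each code up in 1..n with match()
identical(value, lookup[match(code, seq_len(n))])
```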

**Further notice**: If `code` is not numerical but categorical, the `match` and `factor` methods will still work.
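As a quick illustration of the categorical case, here is a sketch with character codes; the vector `code` and the names `lookup`, `value_match`, and `value_factor` are invented for the example:

```r
code <- c("b", "a", "b", "c", "a") # character codes, unsorted

code0 <- unique(code)              # distinct codes in order of first appearance
n <- length(code0)

set.seed(1)                        # only for reproducibility
lookup <- sample(4 * n, n)

value_match  <- lookup[match(code, code0)]   # indexed by order of first appearance
value_factor <- lookup[factor(code)]         # indexed by the factor's integer codes
```

Both vectors assign one value per code. The two methods may pair a given code with a different value, because `factor` sorts its levels while `unique` keeps the order of first appearance.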