user98235 - 8 months ago 36

R Question

Let's assume that we have the following 4 states: (A, B, C, D)

The table I have has the following format

| old | new |
|-----|-----|
| A | B |
| A | A |
| B | C |
| D | B |
| C | D |
| ... | ... |

I would like to calculate the following probabilities from the data in the table:

`P(new=A | old=A)`

`P(new=B | old=A)`

`P(new=C | old=A)`

`P(new=D | old=A)`

`P(new=A | old=B)`

...

`P(new=C | old=D)`

`P(new=D | old=D)`

I can do this manually by counting how often each transition occurs and dividing by the number of rows, but I was wondering whether R has a built-in function that calculates these probabilities, or at least helps speed up the calculation.

Any help/input would be greatly appreciated. If there's no such function, oh well.

Answer

In base R, you could use `prop.table` on a table object:

```
transMat <- prop.table(with(df, table(old, new)), 2)
transMat
   new
old          A          B          C          D
  A 0.26315789 0.27272727 0.18181818 0.22222222
  B 0.31578947 0.36363636 0.09090909 0.22222222
  C 0.21052632 0.27272727 0.45454545 0.33333333
  D 0.21052632 0.09090909 0.27272727 0.22222222
```

Here, the columns sum to 1:

```
colSums(transMat)
A B C D
1 1 1 1
```

**edit**
On further reflection, I think `margin=1` is actually what is wanted: `old` (the conditioning variable) is in the rows, and p(A|A) + p(B|A) + p(C|A) + p(D|A) should equal 1. In this scenario,

```
transMat <- prop.table(with(df, table(old, new)), 1)
transMat
   new
old          A          B          C          D
  A 0.41666667 0.25000000 0.16666667 0.16666667
  B 0.46153846 0.30769231 0.07692308 0.15384615
  C 0.26666667 0.20000000 0.33333333 0.20000000
  D 0.40000000 0.10000000 0.30000000 0.20000000
```

will work. Alternatively, use the transpose: `prop.table(with(df, table(new, old)), 2)`.
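As a quick sanity check on the row-wise version, here is a minimal sketch that uses only the five example rows shown in the question (the data frame name `ex` is made up for illustration). It checks that every row sums to 1 and shows how to look up a single conditional probability by indexing with the old state as the row and the new state as the column.

```r
# The five example rows from the question. `ex` is a hypothetical name.
ex <- data.frame(old = c("A", "A", "B", "D", "C"),
                 new = c("B", "A", "C", "B", "D"))

# Row-wise proportions: each row conditions on one value of `old`.
transMat <- prop.table(with(ex, table(old, new)), 1)

# Every row should sum to 1, because each row is a distribution over `new`.
rowSums(transMat)

# P(new = B | old = A): row is the old state, column is the new state.
# With these five rows, A goes to B once out of two times, so this is 0.5.
transMat["A", "B"]
```

With so few rows the estimates are of course very rough; the point is only to show the indexing and the row-sum check.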

**data**

```
set.seed(1234)
df <- data.frame(old=sample(LETTERS[1:4], 50, replace=TRUE),
                 new=sample(LETTERS[1:4], 50, replace=TRUE))
```
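One thing to watch for with real data: if a state never occurs in `old` or `new`, `table` will drop that row or column unless the variables are factors with all levels declared. The sketch below assumes the four states from the question and a small made-up data frame (`ex` and `states` are illustrative names) in which `D` never appears as an old state. Declaring the levels keeps the matrix 4 x 4; the row for `D` comes out as `NaN` because it is 0 divided by 0.

```r
# All states we expect, taken from the question's (A, B, C, D).
states <- c("A", "B", "C", "D")

# Made-up example in which "D" never occurs in `old`.
ex <- data.frame(old = c("A", "A", "B", "C"),
                 new = c("B", "D", "C", "A"))

# Declare the full set of levels so unused states are kept in the table.
ex$old <- factor(ex$old, levels = states)
ex$new <- factor(ex$new, levels = states)

counts   <- with(ex, table(old, new))   # 4 x 4 count matrix
transMat <- prop.table(counts, 1)       # row-wise proportions

transMat
# Rows with no observations are NaN (0 / 0), not 0.
is.nan(transMat["D", ])
```

How to treat those `NaN` rows (leave them, set them to 0, or something else) depends on the application, so I have left them as is here.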

Source (Stackoverflow)