disposedtrolley - 2 months ago
R Question

R - weighted adjacency matrix from two columns

I have a dataset similar to the following:

AuthorID ThreadID
1 A
2 A
1 A
2 A
2 C
3 B
1 C
4 B
4 C
4 C


where AuthorID identifies a particular author in a ThreadID. Threads can contain posts from many authors and authors can post in many threads.

I'm after a weighted adjacency matrix in R which I can use with igraph, that shows the number of times a particular AuthorID has communicated with another AuthorID within a ThreadID. So for these data the matrix should look like this (AuthorID as column and row headings):

  1 2 3 4
1 . 3 0 1
2 . . 0 1
3 . . . 1
4 . . . .


Thanks in advance!

Answer

Here's a solution using base R functions. First, your sample data in an easily copy/paste-able format:

dd<-read.table(text="AuthorID    ThreadID
   1           A
   2           A
   1           A
   2           A
   2           C
   3           B
   1           C
   4           B
   4           C
   4           C
", header=TRUE)

Then you can do

x <- xtabs(~ThreadID+AuthorID, unique(dd))
mm <- crossprod(x,x)
mm[lower.tri(mm, TRUE)] <- NA

to get

        AuthorID
AuthorID  1  2  3  4
       1 NA  2  0  1
       2 NA NA  0  1
       3 NA NA NA  1
       4 NA NA NA NA

We use xtabs to count occurrences. We make sure to use unique so we don't count an author on a thread twice (to agree with your desired output). Then we use crossprod to get the author-author frequencies from the author-thread table: entry [i, j] of the result is the number of threads in which both author i and author j posted. Finally, we use lower.tri to blank out the lower triangle and the diagonal, as per your desired output.
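Since the question mentions igraph, here is a rough sketch of how the same steps could feed into a graph. The names shared, adj and g are only illustrative, and using zeros rather than NA for the unused cells is an assumption made here so the matrix can be passed on as a numeric adjacency matrix. The igraph part only runs if the package is installed.

```r
# Same sample data as above, built directly as a data frame.
dd <- data.frame(
  AuthorID = c(1, 2, 1, 2, 2, 3, 1, 4, 4, 4),
  ThreadID = c("A", "A", "A", "A", "C", "B", "C", "B", "C", "C")
)

# Thread-by-author incidence table: 1 if the author posted in the thread.
# unique() drops repeated (author, thread) pairs first.
x <- xtabs(~ ThreadID + AuthorID, unique(dd))

# t(x) %*% x: entry [i, j] counts the threads shared by authors i and j.
shared <- crossprod(x)

# Plain numeric matrix; zero out the lower triangle and the diagonal
# (an assumption for this sketch, instead of NA as in the answer above).
adj <- matrix(as.numeric(shared), nrow = nrow(shared),
              dimnames = list(colnames(x), colnames(x)))
adj[lower.tri(adj, diag = TRUE)] <- 0
print(adj)

# Optional: build an undirected weighted graph from the upper triangle.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(adj, mode = "upper",
                                           weighted = TRUE, diag = FALSE)
  print(igraph::E(g)$weight)
}
```

With weighted = TRUE, the non-zero cells should become edges carrying a weight attribute, so pairs of authors who never share a thread get no edge.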
