AGG AGG - 3 months ago 6
R Question

R: How to split comma-separated values stored in a single data frame cell (such as df[1,1])?

I have output data in which each row lists multiple isoforms for one gene. The isoforms are separated by commas (','). When I import the table into R, the data frame looks like this:

Df:
gene isoform sample1_read_number p-value
A 'A1','A2','A3' 0:23,1:12,2:122 0.9,0.01,0.5
B 'B1','B2','B3' 0:3,1:45,2:76 0.43,0.001,0.12
C 'C1','C2','C3','C4' 0:5,1:56,2:166,3:7 0.004,0.002,0.23,0.12
D 'D1','D2' 0:43,1:100 0.1,0.0003


Each gene has multiple isoforms. For each isoform, I have a read number, separated by commas (0:23 for A1 means A1 has 23 reads), and a p-value, also separated by commas (the p-value for A1 is 0.9 and for A2 is 0.01). So within each cell, the comma-separated values are in the same order as the isoforms.

For example when I call,
df[1,2]
the result is
[1] 'A1','A2','A3'


or
df[1,4]
the result is
[1] 0.9,0.01,0.5
as one object. I couldn't figure out how to make R separate the values in df[X,Y].

I want to do this so that I can filter the data by p-value or read number. To do that, I first need to break the data frame up so that each isoform gets its own row, which means finding a way to separate the values in each cell.

The final data frame should look like this (only genes A and B shown here):

Df_I:
gene isoform sample1_read_number p-value
A A1 0:23 0.9
A A2 1:12 0.01
A A3 2:122 0.5
B B1 0:3 0.43
B B2 1:45 0.001
B B3 2:76 0.12


Can anybody give me ideas for building this second data frame?
Any help would be much appreciated!

Cheers!
A

Answer

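This can be done easily with cSplit from the splitstackshape package. In the call below, 2:ncol(Df) selects every column except gene, "," is the separator, and "long" stacks the split pieces into rows; na.omit removes any NA-padded rows left for genes with fewer isoforms.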

library(splitstackshape)
na.omit(cSplit(Df, 2:ncol(Df), ",", "long"))
#    gene isoform sample1_read_number p.value
# 1:    A      A1                0:23  0.9000
# 2:    A      A2                1:12  0.0100
# 3:    A      A3               2:122  0.5000
# 4:    B      B1                 0:3  0.4300
# 5:    B      B2                1:45  0.0010
# 6:    B      B3                2:76  0.1200
# 7:    C      C1                 0:5  0.0040
# 8:    C      C2                1:56  0.0020
# 9:    C      C3               2:166  0.2300
#10:    C      C4                 3:7  0.1200
#11:    D      D1                0:43  0.1000
#12:    D      D2               1:100  0.0003
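
If installing a package is not an option, the same reshaping can be sketched in base R with strsplit. This is only an illustration, not part of the answer above: the Df built here is a hypothetical reconstruction of the question's table (with the p-value column named p.value, as in the output above), and split_cell and the reads column are helper names made up for this example. The last lines show the kind of filtering by p-value and read number that the question is aiming for.

```r
# Hypothetical reconstruction of Df from the question; every column is
# read in as character, with the comma-separated values still in one cell.
Df <- data.frame(
  gene = c("A", "B", "C", "D"),
  isoform = c("'A1','A2','A3'", "'B1','B2','B3'",
              "'C1','C2','C3','C4'", "'D1','D2'"),
  sample1_read_number = c("0:23,1:12,2:122", "0:3,1:45,2:76",
                          "0:5,1:56,2:166,3:7", "0:43,1:100"),
  p.value = c("0.9,0.01,0.5", "0.43,0.001,0.12",
              "0.004,0.002,0.23,0.12", "0.1,0.0003"),
  stringsAsFactors = FALSE
)

# Split a single cell on commas; fixed = TRUE treats "," as a literal string.
split_cell <- function(x) strsplit(as.character(x), ",", fixed = TRUE)[[1]]

# Build one small data frame per gene (one row per isoform), then stack them.
rows <- lapply(seq_len(nrow(Df)), function(i) {
  data.frame(
    gene = Df$gene[i],                                   # recycled per isoform
    isoform = gsub("'", "", split_cell(Df$isoform[i]), fixed = TRUE),
    sample1_read_number = split_cell(Df$sample1_read_number[i]),
    p.value = as.numeric(split_cell(Df$p.value[i])),
    stringsAsFactors = FALSE
  )
})
Df_I <- do.call(rbind, rows)

# The read count is the part after the colon (e.g. "0:23" -> 23).
Df_I$reads <- as.integer(sub("^[^:]*:", "", Df_I$sample1_read_number))

# Example filters: isoforms with p < 0.05, and isoforms with at least 50 reads.
sig <- Df_I[Df_I$p.value < 0.05, ]
high_reads <- Df_I[Df_I$reads >= 50, ]
```

This sketch assumes every cell in a row holds the same number of comma-separated values; if the counts differ, data.frame will either stop with an error or silently recycle the shorter vector, so it is worth checking the lengths first on real data.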