AGG - 3 months ago 6

# R: How to separate comma-separated values in a data frame cell (such as df[1,1])?

I have output data in which each row lists multiple isoforms for a gene. The isoforms are separated by commas (','). When I import the table into R, the data frame looks like this:

``````
Df:
A    'A1','A2','A3'         0:23,1:12,2:122            0.9,0.01,0.5
B    'B1','B2','B3'         0:3,1:45,2:76              0.43,0.001,0.12
C    'C1','C2','C3','C4'    0:5,1:56,2:166,3:7         0.004,0.002,0.23,0.12
D    'D1','D2'              0:43,1:100                 0.1,0.0003
``````

Each gene has multiple isoforms. For each isoform I have a read number, separated by commas (0:23 for A1 means the read count for A1 is 23), and a p-value, also separated by commas (the p-value for A1 is 0.9 and for A2 is 0.01). So within each cell, the comma-separated values are in the same order.

For example, when I call
`df[1,2]`
the result is
`[1] 'A1','A2','A3'`

or
`df[1,4]`
the result is
`[1] 0.9,0.01,0.5`
as one object. I couldn't figure out how to make R separate those values in df[X,Y].

The reason I want to do this is that I want to filter the data based on p-value or read number. To do that, I first need to break the data frame up by isoform, which means finding a way to separate the values in each cell.

The final data frame should look like this (only genes A and B shown here):

``````
Df_I:
A    A1      0:23                 0.9
A    A2      1:12                 0.01
A    A3      2:122                0.5
B    B1      0:3                  0.43
B    B2      1:45                 0.001
B    B3      2:76                 0.12
``````

Can anybody give me ideas on how to make this second data frame?
Any help would be much appreciated!

Cheers!
A

This can be done easily with `cSplit` from the `splitstackshape` package:

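``````
library(splitstackshape)

# Split every column except the first on "," and stack the pieces in long format;
# na.omit() then removes any rows that contain NA
na.omit(cSplit(Df, 2:ncol(Df), ",", "long"))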
# 1:    A      A1                0:23  0.9000
# 2:    A      A2                1:12  0.0100
# 3:    A      A3               2:122  0.5000
# 4:    B      B1                 0:3  0.4300
# 5:    B      B2                1:45  0.0010
# 6:    B      B3                2:76  0.1200
# 7:    C      C1                 0:5  0.0040
# 8:    C      C2                1:56  0.0020
# 9:    C      C3               2:166  0.2300
#10:    C      C4                 3:7  0.1200
#11:    D      D1                0:43  0.1000
#12:    D      D2               1:100  0.0003
``````
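
If installing a package is not an option, the same reshaping can be sketched in base R with `strsplit`, followed by the filtering step the question asks about. This is only a minimal sketch under assumptions: the column names (`gene`, `isoform`, `reads`, `pvalue`), the helper `split_col`, and the filter thresholds are all made up for illustration and do not come from the original data.

```r
# Rebuild the example table; column names are hypothetical, chosen for this sketch
Df <- data.frame(
  gene    = c("A", "B", "C", "D"),
  isoform = c("'A1','A2','A3'", "'B1','B2','B3'", "'C1','C2','C3','C4'", "'D1','D2'"),
  reads   = c("0:23,1:12,2:122", "0:3,1:45,2:76", "0:5,1:56,2:166,3:7", "0:43,1:100"),
  pvalue  = c("0.9,0.01,0.5", "0.43,0.001,0.12", "0.004,0.002,0.23,0.12", "0.1,0.0003"),
  stringsAsFactors = FALSE
)

# Split each cell on commas; each list element holds one gene's values
split_col <- function(x) strsplit(as.character(x), ",", fixed = TRUE)

iso   <- split_col(Df$isoform)
reads <- split_col(Df$reads)
pvals <- split_col(Df$pvalue)

# The three columns must list the same number of isoforms per gene
stopifnot(lengths(iso) == lengths(reads), lengths(iso) == lengths(pvals))

# One row per isoform: repeat each gene name once per isoform
Df_I <- data.frame(
  gene    = rep(Df$gene, lengths(iso)),
  isoform = gsub("'", "", unlist(iso), fixed = TRUE),  # drop the literal quotes
  reads   = unlist(reads),
  pvalue  = as.numeric(unlist(pvals)),
  stringsAsFactors = FALSE
)

# The read count is the part after the colon, e.g. "0:23" -> 23
Df_I$read_count <- as.integer(sub("^[^:]*:", "", Df_I$reads))

# Example filter; both thresholds are arbitrary and for illustration only
kept <- Df_I[Df_I$pvalue < 0.05 & Df_I$read_count >= 10, ]
print(kept)
```

Once the data is in one-row-per-isoform form, filtering is ordinary row subsetting, so the same last two steps should also work on the `cSplit` result after converting the relevant columns to numeric.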