abbas786 - 3 months ago
R Question

strsplit not behaving as expected R

I have a basic problem in R. Everything I'm working with is familiar to me (data, functions), but for some reason I can't get the strsplit or the gsub function to work as expected. I also tried the stringr package. I'm not going to bother putting up code using that package because I know this problem is simple and can be done with the two functions mentioned above. Personally, I feel like putting up a page for this isn't even necessary, but my patience is pretty thin at this point.

I am trying to remove the '.' and the number that follows it from an Ensembl gene ID. Simple, I know.

id <- "ENSG00000223972.5"
gsub(".*", "", id)
strsplit(id, ".")


The asterisk was meant to catch anything after the '.' and remove it, but I don't know for sure if that's what it does. The strsplit call should definitely output a list of two items, the first being everything before the '.' and the second being the one digit after. All it returns is a list containing 17 "" (empty strings), one for each character in the string. I think I'm missing something obvious, but I haven't been able to figure it out. Thanks in advance.

Kou
Answer

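Read the help file (?strsplit): the split argument is a regular expression, and an unescaped "." matches any character, so you cannot use "." on its own. Put it in a character class instead: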

id <- "ENSG00000223972.5"
gsub("[.]", "", id)
strsplit(id, split = "[.]")

Output:

> gsub("[.]", "", id)
[1] "ENSG000002239725"
> strsplit(id, split = "[.]")
[[1]]
[1] "ENSG00000223972" "5"
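
Note that gsub("[.]", "", id) above removes only the dot itself and leaves the trailing 5 in place. If the goal is to drop the whole version suffix, as the question describes, something like the following sketch may be closer. It assumes the suffix is always a dot followed by digits at the very end of the string, and strip_version is a hypothetical helper name, not part of base R.

```r
id <- "ENSG00000223972.5"

# An unescaped "." matches any character, so ".*" matches the entire
# string and gsub() replaces all of it with "".
gsub(".*", "", id)
## [1] ""

# Hypothetical helper: remove a trailing ".<digits>" version suffix.
# "[.]" matches a literal dot, "[0-9]+" one or more digits, "$" the end.
strip_version <- function(ids) sub("[.][0-9]+$", "", ids)

strip_version(id)
## [1] "ENSG00000223972"

# IDs without a suffix pass through unchanged.
strip_version(c("ENSG00000223972.5", "ENSG00000223972"))
## [1] "ENSG00000223972" "ENSG00000223972"
```

Anchoring the pattern with $ keeps it from touching a dot anywhere else in the string, which is why sub() is enough here and gsub() is not needed.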

From the examples in ?strsplit:

unlist(strsplit("a.b.c", "."))
## [1] "" "" "" "" ""
## Note that 'split' is a regexp!
## If you really want to split on '.', use
unlist(strsplit("a.b.c", "[.]"))
## [1] "a" "b" "c"
## or
unlist(strsplit("a.b.c", ".", fixed = TRUE))
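
Applied to the gene IDs from the question, the options from the help file could be used roughly as follows. This is only a sketch: the second ID (with the .12 suffix) is made up for illustration, and "\\." is a third common way to write a literal dot that the help excerpt above does not show.

```r
# The second ID is a made-up example version, used only to show
# that the approach works on a vector.
ids <- c("ENSG00000223972.5", "ENSG00000223972.12")

# Three ways to split on a literal dot.
parts_class   <- strsplit(ids, "[.]")              # character class
parts_escaped <- strsplit(ids, "\\.")              # escaped dot
parts_fixed   <- strsplit(ids, ".", fixed = TRUE)  # no regex at all

# strsplit() returns a list with one character vector per input string,
# so keep the first piece of each element to get the unversioned ID.
base_ids <- vapply(parts_fixed, `[`, character(1), 1)
base_ids
## [1] "ENSG00000223972" "ENSG00000223972"
```

fixed = TRUE is arguably the easiest to read when the separator is a plain string, since nothing in it is interpreted as a regular expression.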