Adam Majdi - 1 month ago 6x
R Question

# Split a string in column and count occurrence of characters

I have a very large file (dimensions: 47,685 x 10,541). In that file, there are no spaces between the characters in the second column of each row, as follows:

File # 1

``````
Row1 01205201207502102102…..

Row2 20101020100210201022…..

Row3 21050210210001120120…..
``````

I want to compute some statistics on that file and maybe delete some columns or rows. So, using R, I want to add one space between every two adjacent characters in the second column to get something like this:

File # 2

``````
Row1 0 1 2 0 5 2 0 1 2 0 7 5 0 2 1 0 2 1 0 2…..

Row2 2 0 1 0 1 0 2 0 1 0 0 2 1 0 2 0 1 0 2 2…..

Row3 2 1 0 0 0 2 1 0 2 1 0 0 0 1 1 2 0 1 2 0…..
``````

Then, after I finish editing, I want to remove the spaces between the characters in the second column, so the final format will be just like `File # 1`.

What is the best and fastest way to do that?

Here is a solution using `tidyr` and `stringr`. Note that it assumes every string in `Column_2` has the same length.

``````
library(stringr)
library(tidyr)

# `data` is the data frame defined under "Sample data" below
count <- str_length(data$Column_2)  # Length of each string in Column_2
# Split positions: after characters 1, 2, ..., n - 1 (one fewer than the number of pieces)
index <- seq_len(count[1] - 1)

# Count the 5s and 7s in each row's string and add the counts as new columns
data$Total_5_Rowcount <- str_count(data$Column_2, "5")
data$Total_7_Rowcount <- str_count(data$Column_2, "7")

# Split Column_2 into one column per character (V1, V2, ...)
new.data <- separate(data, Column_2, into = paste0("V", seq_len(count[1])), sep = index)
new.data
``````

Output

``````
  Column_1 V1 V2 V3 V4 V5 Total_5_Rowcount Total_7_Rowcount
1     Row1  0  1  2  0  5                1                0
2     Row2  2  0  7  0  5                1                1
3     Row3  2  7  0  5  7                1                2
``````
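The question also asks how to get back to the original one-string-per-row format after editing, which the code above does not show. Below is a minimal base-R sketch of that step. The small `new.data` table is typed in by hand to mimic the split result, and the "edit" (dropping position `V3`) and the name `rejoined` are purely illustrative assumptions.

```r
# Hypothetical split table, shaped like the result of separate() above
new.data <- data.frame(
  Column_1 = c("Row1", "Row2", "Row3"),
  V1 = c("0", "2", "2"),
  V2 = c("1", "0", "7"),
  V3 = c("2", "7", "0"),
  V4 = c("0", "0", "5"),
  V5 = c("5", "5", "7"),
  stringsAsFactors = FALSE
)

# Illustrative edit: drop the third character position
edited <- new.data[, setdiff(names(new.data), "V3")]

# Paste the remaining V columns back together, row by row, with no separator
v_cols <- grep("^V[0-9]+$", names(edited), value = TRUE)
rejoined <- data.frame(
  Column_1 = edited$Column_1,
  Column_2 = do.call(paste0, unname(as.list(edited[v_cols]))),
  stringsAsFactors = FALSE
)
rejoined$Column_2
# "0105" "2005" "2757"
```

If `tidyr` is already loaded, `unite()` with `sep = ""` should do the same job in one call.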

Sample data

``````
data <- data.frame(
  Column_1 = c("Row1", "Row2", "Row3"),
  Column_2 = c("01205", "20705", "27057"),
  stringsAsFactors = FALSE
)
``````
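As an alternative, here is a rough base-R sketch of the same idea without extra packages. It holds the characters in a single character matrix, so counting per row or per column and deleting rows or columns become ordinary matrix operations. It also shows the literal "add spaces, then remove them" round trip from the question. It reuses the sample data above and, like the `tidyr` version, assumes all strings have equal length. The variable names are illustrative, and whether a matrix of the full 47,685 x 10,541 file fits in memory has not been checked here.

```r
data <- data.frame(
  Column_1 = c("Row1", "Row2", "Row3"),
  Column_2 = c("01205", "20705", "27057"),
  stringsAsFactors = FALSE
)

# Split every string into single characters
chars <- strsplit(data$Column_2, "")

# Guard the equal-length assumption before binding rows into a matrix
stopifnot(length(unique(lengths(chars))) == 1)

# One row per input row, one column per character position
m <- do.call(rbind, chars)
rownames(m) <- data$Column_1

# Occurrence counts: 5s per row, 7s per character position
fives_per_row <- rowSums(m == "5")   # 1 1 1
sevens_per_col <- colSums(m == "7")  # 0 1 1 0 1

# Illustrative edit: delete the third character position
m_edited <- m[, -3, drop = FALSE]

# Collapse each row back into one string with no separator
back <- apply(m_edited, 1, paste, collapse = "")
# Row1 "0105", Row2 "2005", Row3 "2757"

# Literal text round trip: add single spaces, then strip them again
spaced <- vapply(chars, paste, character(1), collapse = " ")
unspaced <- gsub(" ", "", spaced, fixed = TRUE)
```

Here `spaced` holds strings such as `"0 1 2 0 5"`, and `unspaced` is identical to the original `Column_2`.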