Adam Majdi - 2 months ago
R Question

Split a string in column and count occurrence of characters

I have a very large file with dimensions 47,685 x 10,541. In the second column there are no spaces between the characters of each row, as follows:

File # 1

Row1 01205201207502102102…..

Row2 20101020100210201022…..

Row3 21050210210001120120…..


I want to compute some statistics on that file and maybe delete some columns or rows. So, using R, I want to insert a space between every two adjacent characters in the second column, to get something like this:

File # 2

Row1 0 1 2 0 5 2 0 1 2 0 7 5 0 2 1 0 2 1 0 2…..

Row2 2 0 1 0 1 0 2 0 1 0 0 2 1 0 2 0 1 0 2 2…..

Row3 2 1 0 0 0 2 1 0 2 1 0 0 0 1 1 2 0 1 2 0…..
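A minimal base-R sketch of this step, assuming the second column has already been read into a character vector (called `x` here, a hypothetical name). The lookahead `(?=.)` matches a character only if another character follows it, so a space is inserted between characters without leaving a trailing space.

```r
# Hypothetical stand-in for the second column of File # 1
# (the first 20 characters of Row1 and Row2 from the question)
x <- c("01205201207502102102", "20101020100210201022")

# Insert a space after every character that is followed by another character.
# (?=.) is a lookahead, so the following character is not consumed.
spaced <- gsub("(.)(?=.)", "\\1 ", x, perl = TRUE)
print(spaced)
```
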


Then, after I finish editing, I want to remove the spaces between the characters in the second column again, so that the final format is the same as File # 1.
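Reversing the step is a single substitution. A minimal sketch, assuming the spaced strings are in a character vector (`spaced` is a hypothetical name). `fixed = TRUE` treats `" "` as a literal string rather than a regular expression, which is typically faster on large inputs.

```r
# A spaced string in the File # 2 format (Row1 from the question, without the "…..")
spaced <- "0 1 2 0 5 2 0 1 2 0 7 5 0 2 1 0 2 1 0 2"

# Remove every space to get back the compact File # 1 format
compact <- gsub(" ", "", spaced, fixed = TRUE)
print(compact)
```
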

What is the best and fastest way to do that?

Answer

Here is a solution using tidyr and stringr. Note that it assumes all strings in Column_2 have the same length.

library(stringr)
library(tidyr)

# Length of the strings in Column_2 (assumed to be the same for every row)
count <- str_length(data$Column_2)
# Split after positions 1, 2, ..., n - 1 so that each character gets its own column
index <- seq_len(count[1] - 1)

# Count the number of 5s and 7s in each string by row and add them as new columns
data$Rowcount_5 <- str_count(data$Column_2, "5")
data$Rowcount_7 <- str_count(data$Column_2, "7")

# Split Column_2 into the columns V1, V2, ..., Vn
new.data <- separate(data, Column_2, into = paste0("V", seq_len(count[1])), sep = index)
new.data

Output:

  Column_1 V1 V2 V3 V4 V5 Rowcount_5 Rowcount_7
1     Row1  0  1  2  0  5          1          0
2     Row2  2  0  7  0  5          1          1
3     Row3  2  7  0  5  7          1          2
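If the goal is the split, edit, re-join workflow from the question, a base-R alternative (not the method used above) is to split every string into single characters and stack them into a character matrix, with one row per record and one column per character position. The sketch below assumes "columns" in the question means character positions, and the particular edits (dropping position 2, keeping rows 1 and 3) are arbitrary examples; `chars`, `edited`, `collapsed` and `result` are hypothetical names. At the full 47,685 x 10,541 size such a matrix has roughly 5 x 10^8 cells, so memory may become the limiting factor.

```r
# Sample data from this answer
data <- data.frame(Column_1 = c("Row1", "Row2", "Row3"),
                   Column_2 = c("01205", "20705", "27057"),
                   stringsAsFactors = FALSE)

# Split each string into single characters and stack them row-wise
# (assumes every string has the same length)
chars <- do.call(rbind, strsplit(data$Column_2, ""))

# Example edits: drop character position 2 and keep only rows 1 and 3
keep_rows <- c(1, 3)
edited <- chars[keep_rows, -2, drop = FALSE]

# Collapse each row back into a string without spaces (the File # 1 format)
collapsed <- apply(edited, 1, paste, collapse = "")
result <- data.frame(Column_1 = data$Column_1[keep_rows],
                     Column_2 = collapsed,
                     stringsAsFactors = FALSE)
print(result)
```

Keeping the data as a matrix while editing means column and row deletions are plain indexing operations; the strings are only rebuilt once at the end.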

Sample data

data <- data.frame(Column_1 = c("Row1", "Row2", "Row3"),
                   Column_2 = c("01205", "20705", "27057"),
                   stringsAsFactors = FALSE)
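To count the occurrences of every character rather than only 5 and 7, one base-R option (a sketch, assuming the alphabet is the digits 0 to 9) is to delete each digit in turn and measure how much shorter the string gets. Using the sample data above, this gives one row per record and one column per digit; `digits` and `counts` are hypothetical names.

```r
data <- data.frame(Column_1 = c("Row1", "Row2", "Row3"),
                   Column_2 = c("01205", "20705", "27057"),
                   stringsAsFactors = FALSE)

# The number of times digit d occurs equals the length lost when d is removed
digits <- as.character(0:9)
counts <- vapply(digits,
                 function(d) nchar(data$Column_2) -
                   nchar(gsub(d, "", data$Column_2, fixed = TRUE)),
                 integer(nrow(data)))
rownames(counts) <- data$Column_1
print(counts)
```
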