Venky Venky - 3 months ago 12
R Question

Combining CSV files and splitting the column into 2 columns using R

I have 40 CSV files with only 1 column each. I want to combine all 40 files data into 1 CSV file with 2 columns.

Data format is like this :

Input data structure

I want to

split
this column by
space
and combine all 40 CSV files into 1 file. I want to
preserve the number format
as well.

I tried below code but Number format is not fixed and and extra 3rd column added for
Negative
numbers. Not sure why.

My Code :

filenames <- list.files(path="C://R files", full.names=TRUE)

merged <- data.frame(do.call("rbind", lapply(filenames, read.csv, header = FALSE)))

data <- do.call("rbind", strsplit(as.character(trimws(merged$V1))," ",fixed=FALSE))

write.csv(data, "export1.csv", row.names=FALSE, na="NA")


The output which i got is as shown below. If you observe, the negative numbers are put into extra column. I just want to split by space and put in 2 columns in the exact number format as in the input.

Output file structure

R Output:

enter image description here

Answer

The problem is that the source data is delimited by:

  • one space when the second number is negative, and
  • two spaces when the second number is positive (space for the absent minus sign).

The trick is to split the string on one or more spaces:

 data <- do.call("rbind", strsplit(as.character(trimws(merged$V1))," +",fixed=FALSE))

I'm a bit OCD on charsets, unreliable files, etc, so I tend to use splitters such as "[[:space:]]+" instead, since it'll catch whitespace-variants instead of the space " " or tab "\t".

(In regex-speak, the + says "one or more". Other modifiers include ? as zero or one, and * as zero or more.)