Jash Shah Jash Shah - 3 months ago 9
R Question

Deleting rows that have more than a certain number of columns in a comma delimited file

I have rows/observations in a comma delimited file that ideally should have 55 columns. But there are fields such as addresses that have an an extra comma within them. Such as Manhattan, New York should be one field Manhattan, New York but I get two fields Manhattan and New York when I read the file which increases the number of columns.

Is there anyway I can delete such observations using R or any tool such as Delimit or Excel?

I would eventually like to load this file into R for analysis.

I agree my question is similar to Delete lines or rows in a tab-delimited file, by number of cells in that lines or rows but I am looking for a solution in R.

Input

Name, Address, DOB
John, Manhattan, New York, 2/8/1990
Jacob, Arizona, 9/10/2012
Smith, New Jersey, 8/10/2016


Expected Output

Name, Address, DOB
Jacob, Arizona, 9/10/2012
Smith, New Jersey, 8/10/2016

Answer

In general, I do not advocate doing what you want to do, which is to throw away records. Nonetheless, if this is what you want to do, you could do so as follows.

Assuming your data is stored as a text in a file called foo, you can use the count.fields function to count fields defined by the presence of sep. Then just omit them from the readLines function.

text <-
"Name, Address, DOB
John, Manhattan, New York, 2/8/1990
Jacob, Arizona, 9/10/2012
Smith, New Jersey, 8/10/2016
"
cat(text, file = "foo", sep = ",")
fields <- count.fields("foo", sep = ",")
readLines("foo")[fields == 3]