eddie_cat eddie_cat - 3 months ago 9
C# Question

How can I deal with parsing bad csv data?

I know that the data should be correct. I have no control over the data and my boss is just going to tell me that I need to figure out a way to deal with someone else's mistake. So please don't tell me it's not my problem that the data is bad, because it is.

Anywho, this is what I'm looking at:

"Words","email@email.com","","4253","57574","FirstName","","LastName, MD","","","576JFJD","","1971","","Words","Address","SUITE "A"","City","State","Zip","Phone","",""


Data has been scrubbed for confidentiality reasons.

So as you see, the data contains quotation marks and there are commas inside some of these quoted fields. So I cannot remove them. But the "Suite A""" is throwing off the parser. There are too many quotation marks. >.<

I'm using the TextFieldParser in the Microsoft.VisualBasic.FileIO namespace with these settings:

parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");
parser.TextFieldType = FieldType.Delimited;


The error is


MalformedLineException: Line 9871 cannot be parsed using the current
delimiters.


I would like to scrub the data somehow to account for this but I'm not sure how to do it. Or maybe there's a way to just skip this line? Although I suspect my higher ups will not approve of me just skipping data that we might need.

Pat Pat
Answer

If you are only trying to get rid of the stray " marks in your csv, you can use the following regex to find them and replace them with '

String sourcestring = "source string to match with pattern";
String matchpattern = @"(?<!^|,)""(?!(,|$))";
String replacementpattern = @"$1'";
Console.WriteLine(Regex.Replace(sourcestring,matchpattern,replacementpattern,RegexOptions.Multiline));

Explanation:

@"(?<!^|,)""(?!(,|$))"; will find will find any " that is not preceded by the beginning of the string, or a , and that is not followed by the end of the string or a ,

Comments