Fellow Stranger - 1 year ago
Ruby Question

How to escape both " and ' when importing each row

I import a text file and save each row as a new record:

require 'csv'
CSV.foreach(csv_file_path) do |row|
  # saving each row to a new record
end

How do I escape both the " and ' characters?

Strangely enough, the following escapes double quotes, but I have no clue how to escape different characters:

CSV.foreach(csv_file_path, quote_char: "'") do |row|

Answer Source

Note that you have additional options available to configure the CSV handler. The useful options for specifying character delimiter handling are these:

  • :col_sep - defines the column separator character
  • :row_sep - defines the row separator character
  • :quote_char - defines the character used to quote fields

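Now, for traditional CSV (comma-separated) files, these values default to col_sep: ",", row_sep: :auto (the parser detects the line ending from the data), and quote_char: "\"". These will satisfy many needs, but not necessarily all. You can specify the right set to suit your well-formed CSV needs. Note that :quote_char accepts only a single character, so it cannot cover both " and ' at the same time.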
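As a rough illustration of overriding these options, here is a minimal sketch. The sample data (semicolon-separated, single-quoted fields) is made up for the example and is not taken from the question:

```ruby
require 'csv'

# Hypothetical sample: semicolon-separated fields wrapped in single quotes.
data = "'id';'name'\n'1';'Smith; John'\n"

# Override all three options explicitly instead of relying on the defaults.
rows = CSV.parse(data, col_sep: ';', row_sep: "\n", quote_char: "'")
# rows => [["id", "name"], ["1", "Smith; John"]]
```

The separator inside 'Smith; John' survives because the field is quoted with the configured quote character.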

However, for non-standard CSV input, consider using a two-pass approach to reading your CSV files. I've done a lot of work with CSV files from Real Estate MLS systems, and they're basically all broken in some fundamental way. I've used various pre- and post-processing approaches to fixing the issues, and had quite a lot of success with files that were failing to process with default options.
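One possible shape for the pre-processing pass is sketched below. The helper name normalize_quotes and the sample line are hypothetical, and the regular expression assumes that single-quoted fields contain neither single nor double quotes, and that no double-quoted field contains a single-quoted chunk between commas:

```ruby
require 'csv'

# Hypothetical helper: rewrites fields wrapped in single quotes as
# double-quoted fields so the default parser settings can handle them.
def normalize_quotes(line)
  line.gsub(/(^|,)'([^']*)'(?=,|$)/, '\1"\2"')
end

raw = "'Smith, John',\"Doe, Jane\",plain\n"

# Pass 1: clean each raw line. Pass 2: hand the cleaned text to the parser.
cleaned = raw.each_line.map { |line| normalize_quotes(line) }.join
rows = CSV.parse(cleaned)
# rows => [["Smith, John", "Doe, Jane", "plain"]]
```

This is only a sketch for tidy input; real-world files may need a more careful first pass.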

In the case of handling single quotes as a delimiter, you could possibly strip off leading and trailing single quotes after you've parsed the file using the standard double quotes. Iterating on the values and using a gsub replacement may work just fine if the single quotes were used in the same way as double quotes.
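A minimal sketch of that post-processing idea follows, using made-up input that mixes both quoting styles. It only works if the single-quoted fields contain no commas, since the parser treats ' as an ordinary character and would split such fields first:

```ruby
require 'csv'

# Hypothetical input mixing double-quoted and single-quoted fields.
data = %Q("Alice",'Bob'\n'Carol',"Dave"\n)

rows = CSV.parse(data).map do |row|
  row.map do |value|
    # Leave nil (empty) fields alone; otherwise drop a single quote
    # at the very start and/or very end of the value.
    value.nil? ? value : value.gsub(/\A'|'\z/, '')
  end
end
# rows => [["Alice", "Bob"], ["Carol", "Dave"]]
```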

There are also converters that the CSV parser applies automatically to each field as it reads the values. You can specify them with the :converters option, like so: converters: [:my_converter]

Writing a converter is pretty simple: it's just a small function that checks whether the field value matches the expected format and, if so, returns the re-formatted value. Here's one that should strip leading and trailing single quotes:

CSV::Converters[:strip_surrounding_single_quotes] = lambda do |field|
    return nil if field.nil?

    # String#match returns a MatchData object, or nil when there is no match
    match = field.match(/\A'([^']*)'\z/)
    match.nil? ? field : match[1]
end

CSV.parse(input, converters: [:strip_surrounding_single_quotes])

You can use as many converters as you like, and they're evaluated in the order that you specify. For instance, to use the pre-defined :all along with the custom converter, you can write it like so:

CSV.parse(input, converters: [:all, :strip_surrounding_single_quotes])
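To see the ordering in action, here is a small self-contained sketch. The input string is invented for illustration, and the converter is registered again inside the snippet (with an extra String check) so that it runs on its own:

```ruby
require 'csv'

# Strip one pair of surrounding single quotes; pass everything else through.
CSV::Converters[:strip_surrounding_single_quotes] = lambda do |field|
  match = field.is_a?(String) ? field.match(/\A'([^']*)'\z/) : nil
  match ? match[1] : field
end

# Hypothetical input: an integer, a single-quoted field, a double-quoted field.
input = "42,'hello',\"it's\"\n"

row = CSV.parse(input, converters: [:all, :strip_surrounding_single_quotes]).first
# row => [42, "hello", "it's"]
```

Here :all turns 42 into an Integer first, and the custom converter then strips the quotes from 'hello'. The apostrophe inside "it's" is untouched because the pattern only matches a value that is fully wrapped in single quotes.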

If there's an example of the input data to test against, we can probably find a complete solution.