Why does processing of header converters stop with the first non-
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]
CSV::HeaderConverters[:throw_an_exception] = lambda do |header|
  raise 'Exception triggered.'
end
csv_str = "Numbers\n" +
So I reached out to JEG2.
I had assumed that converters were a chain of steps that every element has to pass through. In fact, that's not the best way to use the CSV library, especially if you have a very large amount of data.
The way it should be used (and this is both the answer to the "why" question and the explanation for why this is better for performance) is to have the converters work like a series of matchers: the first converter that matches returns a non-String, which tells the CSV library that the current value has been converted successfully. The parser can then stop as soon as the value is a non-String and move on to the next header/cell value.
In this way you remove a TON of overhead when parsing CSV data. The larger the file you're processing, the more overhead you eliminate.
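To make the short-circuit concrete, here is a minimal sketch. The lambda names (to_symbol, downcase, never_reached) and the sample data are my own and are not from the original question. The lambdas are passed directly through the header_converters option rather than registered in CSV::HeaderConverters, and a calls array records which ones actually ran.

```ruby
require 'csv'

calls = []  # records which converters were actually invoked

# Returns a Symbol (a non-String), so the chain should stop here.
to_symbol = lambda do |header|
  calls << :to_symbol
  header.to_sym
end

# Returns a String, so the next converter in the chain still gets a turn.
downcase = lambda do |header|
  calls << :downcase
  header.downcase
end

# Hypothetical converter that would blow up if it were ever reached.
never_reached = lambda do |header|
  calls << :never_reached
  raise 'Exception triggered.'
end

csv_str = "Numbers\n1\n2\n"

# Case 1: the first converter yields a non-String, so the raising one is skipped.
table = CSV.parse(csv_str, headers: true,
                  header_converters: [to_symbol, never_reached])
headers_after_symbol = table.headers   # [:Numbers]
calls_after_symbol   = calls.dup       # :never_reached is absent

# Case 2: the first converter yields a String, so the second one runs too.
calls.clear
table = CSV.parse(csv_str, headers: true,
                  header_converters: [downcase, to_symbol])
headers_after_downcase = table.headers # [:numbers]
calls_after_downcase   = calls.dup     # both converters appear
```

If every converter ran on every value, the first parse would raise. It doesn't, because the header is already a Symbol by the time the second converter would be tried.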
Here is the email response I got back:
The converters are basically a pipeline of conversions to try. Let's say you're using two converters, one for dates and one for numbers. Without a linked line, we would try both for every field. However, we know a couple of things:
- An unconverted CSV field is a String, because that's how we read it in
- A field that is now a non-String has been converted, so we can stop searching for a converter that matches.
Given that, the optimization helps our example skip checking the number converter if we already have a