jefflunt - 1 year ago
Ruby Question

Why do CSV::HeaderConverters stop processing when a non-String is returned?

Why does processing of header converters stop with the first non-String that's returned from a header converter?

The specific Ruby version I'm using, in case it's relevant, is:
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]


After the built-in :symbol header converter is triggered, no other converters will be processed. It seems that processing of header converters stops with the first converter that returns anything that's not a String (i.e. the same behavior occurs if you write a custom header converter that returns any other non-String value).

This code works as expected, throwing the exception in the :throw_an_exception converter:

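require 'csv'

# Register a custom header converter that always raises.
CSV::HeaderConverters[:throw_an_exception] = lambda do |header|
  raise 'Exception triggered.'
end

csv_str = "Numbers\n" +
          "1\n" +
          "4\n"

puts CSV.parse(
  csv_str,
  headers: true,
  header_converters: [
    :throw_an_exception,
    :symbol
  ]
)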

However, if you switch the order of the header converters so that the :symbol converter comes first, the :throw_an_exception lambda is never called.


header_converters: [
  :symbol,
  :throw_an_exception
]


Answer

So I reached out to JEG2 (James Edward Gray II, the author of Ruby's CSV library).

I was thinking that converters were intended to be a series of steps in a chain, where every value was supposed to go through every step. In fact, that's not the best way to use the CSV library, especially if you have a very large amount of data.

The way it should be used (and this answers the "why" question and explains why this is better for performance) is to have the converters work like a series of matchers: the first converter that matches returns a non-String, which tells the CSV library that the current value has been converted successfully. The parser can then stop as soon as the value is a non-String and move on to the next header/cell value.

In this way you remove a TON of overhead when parsing CSV data. The larger the file you're processing, the more overhead you eliminate.
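The short-circuit described above can be observed directly. The following is only a minimal sketch: the `to_sym` and `upcase` lambdas and the `calls` array are hypothetical names made up for this illustration, and it assumes the CSV library accepts lambdas directly in `header_converters`, as it does for named converters.

```ruby
require 'csv'

calls = []  # records which converters actually ran, in order

# Hypothetical converter: returns a Symbol, so the chain should stop after it.
to_sym = lambda do |header|
  calls << :to_sym
  header.to_sym
end

# Hypothetical converter: returns a String, so the chain should continue.
upcase = lambda do |header|
  calls << :upcase
  header.upcase
end

csv_str = "Numbers\n1\n4\n"

# Symbol-returning converter first: the second converter is skipped.
CSV.parse(csv_str, headers: true, header_converters: [to_sym, upcase])
symbol_first = calls.dup

# String-returning converter first: both converters run.
calls.clear
table = CSV.parse(csv_str, headers: true, header_converters: [upcase, to_sym])
string_first = calls.dup

p symbol_first
p string_first
p table.headers
```

If every step really must run for every value, one option is to put all of the steps inside a single converter, or to order the converters so that only the last one returns a non-String.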

Here is the email response I got back:


The converters are basically a pipeline of conversions to try. Let's say you're using two converters, one for dates and one for numbers. Without a linked line [the check in the CSV source that stops the pipeline once a field is no longer a String], we would try both for every field. However, we know a couple of things:

  • An unconverted CSV field is a String, because that's how we read it in.
  • A field that is now a non-String has been converted, so we can stop searching for a converter that matches.

Given that, the optimization helps our example skip checking the number converter if we already have a Date object.
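The date-and-number scenario from the email can be sketched roughly as follows. This is an illustration rather than the library's own converters: `date_converter`, `number_converter`, and the `tried` hash are hypothetical names, and the `%Y-%m-%d` date format is an assumption chosen for the example.

```ruby
require 'csv'
require 'date'

# Maps each raw field to the list of converters that were tried on it.
tried = Hash.new { |hash, field| hash[field] = [] }

# Hypothetical date converter: returns a Date on a match, otherwise the
# original String so the next converter gets a chance.
date_converter = lambda do |field|
  tried[field] << :date
  begin
    Date.strptime(field, '%Y-%m-%d')
  rescue ArgumentError
    field
  end
end

# Hypothetical number converter: returns an Integer on a match, otherwise
# the original String.
number_converter = lambda do |field|
  tried[field] << :number
  begin
    Integer(field)
  rescue ArgumentError
    field
  end
end

row = CSV.parse_line(
  "2016-04-26,42,hello",
  converters: [date_converter, number_converter]
)

p row
p tried
```

With this input, the date converter is tried on all three fields, while the number converter only runs for the two fields that are still Strings after the date step.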

