Hardik Hardik - 1 month ago 4x
Ruby Question

flatten and compact an array more efficiently

On many occasions, we need to perform two or more different operations on an array like



My concern here is that it will loop over the array two times. Is there more efficient way of doing this?


I actually think this is a great question. But first off, why is everyone not too concerned about this? Here's the performance of flatten and flatten.compact compared:

flatten performance

Here's the code I used to generate this chart, and one that includes memory.

Hopefully now you see why most folks won't worry: it is just another constant factor you're adding when you compose a flatten with a compact, maybe it's valuable at least theoretically to say: how can we shave off the time and space of this intermediate structure? Again, asymptotically not super valuable, but curious to think about.

As far as I can tell, you can't do this by making use of flatten:

Before looking at the source, I hoped that flatten could take a block like so:

[[3, [3, 3, 3]], [3, [3, 3, 3]], [3, [3, 3, 3]], nil].flatten {|e| e unless e.nil? }

No dice though. We get this as a return:

[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, nil]

This is weird in that it basically tosses the block away as a no-op. But it makes sense with the source. The C method flatten used in Ruby's core isn't parameterized to take a block.

The procedure in the Ruby source code reads kinda weird to me (I am not a C programmer) but it's basically doing something like depth first search. It's using a stack that it adds every new nested array in to process that it encounters. (It terminates when none remain.) I've not calculated this formally, but it leads me to guess the complexity is on par with DFS.

So the source code could've been written such this would work by allowing for extra setup if a block is passed in. But without that, you're stuck with the (small) performance hit!