verbatross verbatross - 21 days ago 6
SQL Question

Unexpected behavior in FIRST_VALUE() with IGNORE NULLS (Vertica)

I'm seeing unexpected behavior in Vertica's FIRST_VALUE() analytic function with the IGNORE NULLS parameter. It appears to return NULL when it shouldn't.

The issue occurs in this very tiny table:

drop table if exists temp;
create table temp (time_ timestamp(6), name varchar(10));
insert into temp (time_) values ('2016-03-18 20:32:16.144');
insert into temp (time_, name) values ('2016-03-18 20:52:09.062', 'abc');


Here are the contents of the table (select * from temp):

time_ | name
------------------------+--------
2016-03-18 20:32:16.144 | <null>
2016-03-18 20:52:09.062 | abc


Here is the query I'm running:

select time_,
first_value(name ignore nulls) over (order by time_) first_name
from temp;


Here are the results this query returns:

time_ | first_name
------------------------+------------
2016-03-18 20:32:16.144 | <null>
2016-03-18 20:52:09.062 | abc


Here are the results I would expect (and desire) from this query:

time_ | first_name
------------------------+------------
2016-03-18 20:32:16.144 | abc
2016-03-18 20:52:09.062 | abc


Does the above query have a very fundamental syntax mistake? This issue occurs on Vertica Community Edition 7.1.1.

Answer

The function works as expected.
over (order by time_) is a shortcut for over (order by time_ range unbounded preceding) which is a shortcut for over (order by time_ range between unbounded preceding and current row), which means every rows sees only the rows that preceded it, including itself.
The first row sees only itself therefore there isn't a non NULL value in its scope.

If you want the first non NULL value of the whole scope, you have to specify the whole scope:

first_value(name ignore nulls) over (order by time_ range between unbounded preceding and unbounded following) first_name