Trylks - 1 month ago 52
SQL Question

How to calculate an exponential moving average on postgres?

I'm trying to implement an exponential moving average (EMA) on postgres, but the more I check the documentation and think about it, the more confused I get.

The formula for EMA(x) is:

EMA(x1) = x1
EMA(xn) = α * xn + (1 - α) * EMA(xn-1)


It seems perfect for an aggregator: keeping the result of the last calculated element is exactly what has to be done here. However, an aggregator produces a single result (like reduce or fold), and here we need a list (a column) of results (like map). I have been checking how procedures and functions work, but AFAIK they produce a single output, not a column. I have seen plenty of procedures and functions, but I can't really figure out how this interacts with relational algebra, especially when doing something like an EMA.

I have not had any luck searching the Internet so far. But the definition of an EMA is quite simple, so I hope it is possible to translate it into something that works in postgres and is simple and efficient, because moving to NoSQL would be excessive in my context.

Thank you.

PS: here you can see an example:

https://docs.google.com/spreadsheet/ccc?key=0AvfclSzBscS6dDJCNWlrT3NYdDJxbkh3cGJ2S2V0cVE

Answer

You can define your own aggregate function and then use it with a window specification to get the aggregate output at each stage rather than a single value.

An aggregate consists of a piece of state, a state transition function that updates that state for each row, and optionally a final function that converts the state to an output value. For a simple case like this, the transition function alone should be sufficient.

create function ema_func(numeric, numeric) returns numeric
  language plpgsql as $$
declare
  alpha numeric := 0.5;
begin
  -- uncomment the following line to see what the parameters mean
  -- raise info 'ema_func: % %', $1, $2;
  return case
              when $1 is null then $2
              else alpha * $2 + (1 - alpha) * $1
         end;
end
$$;
create aggregate ema(basetype = numeric, sfunc = ema_func, stype = numeric);

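which gives me the following (note that this query passes alpha as a second argument, so it uses the parameterised form of ema defined further down, not the hard-coded alpha of 0.5 above):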

steve@steve@[local] =# select x, ema(x, 0.1) over(w), ema(x, 0.2) over(w) from data window w as (order by n asc) limit 5;
     x     |      ema      |      ema      
-----------+---------------+---------------
 44.988564 |     44.988564 |     44.988564
   39.5634 |    44.4460476 |    43.9035312
 38.605724 |   43.86201524 |   42.84396976
 38.209646 |  43.296778316 |  41.917105008
 44.541264 | 43.4212268844 | 42.4419368064

These numbers seem to match up to the spreadsheet you added to the question.
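
As a quick sanity check of the recurrence, the second row of the output can be recomputed by hand from the first two x values shown above. This is only a sketch of the arithmetic, not part of the aggregate itself:

```sql
-- EMA(x2) = alpha * x2 + (1 - alpha) * EMA(x1), with EMA(x1) = x1
-- alpha = 0.1 column: 0.1 * 39.5634 + 0.9 * 44.988564
select 0.1 * 39.5634 + (1 - 0.1) * 44.988564 as ema_alpha_01;  -- 44.4460476

-- alpha = 0.2 column: 0.2 * 39.5634 + 0.8 * 44.988564
select 0.2 * 39.5634 + (1 - 0.2) * 44.988564 as ema_alpha_02;  -- 43.9035312
```

Both values agree with the second row of the table above.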

Also, you can define the function to pass alpha as a parameter from the statement:

create or replace function ema_func(state numeric, inval numeric, alpha numeric)
  returns numeric
  language plpgsql as $$
begin
  return case
         when state is null then inval
         else alpha * inval + (1-alpha) * state
         end;
end
$$;

create aggregate ema(numeric, numeric) (sfunc = ema_func, stype = numeric);

select x, ema(x, 0.5 /* alpha */) over (order by n asc) from data;
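
To try this without the original table, here is a minimal sketch that assumes the three-argument ema_func and the two-argument ema aggregate above have already been created. The temporary table ema_demo is a hypothetical stand-in for data, filled with the five x values from the output shown earlier:

```sql
-- Hypothetical sample table; n gives the ordering, x the values to smooth
create temp table ema_demo (n int, x numeric);
insert into ema_demo values
  (1, 44.988564),
  (2, 39.5634),
  (3, 38.605724),
  (4, 38.209646),
  (5, 44.541264);

-- With a window specification, the aggregate state is reported at every row
select n, x,
       ema(x, 0.1) over w as ema_01,
       ema(x, 0.2) over w as ema_02
from ema_demo
window w as (order by n asc);
```

The two ema columns should reproduce the values in the table shown earlier. The "order by" in the window is what makes each row see only the rows before it, so the state is carried forward one row at a time.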

Also, this function is actually so simple that it doesn't need to be in plpgsql at all and can be a plain sql function, although before PostgreSQL 9.2 you can't refer to parameters by name in one of those, so positional references ($1, $2, $3) are used here:

create or replace function ema_func(state numeric, inval numeric, alpha numeric)
  returns numeric
  language sql as $$
select case
       when $1 is null then $2
       else $3 * $2 + (1-$3) * $1
       end
$$;
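
On PostgreSQL 9.2 and later, SQL-language functions can reference their parameters by name, so the same function could be sketched as follows (assuming such a version is in use; the behaviour is meant to be identical to the positional form):

```sql
create or replace function ema_func(state numeric, inval numeric, alpha numeric)
  returns numeric
  language sql as $$
select case
       -- first row: no previous state yet, so the EMA is the value itself
       when state is null then inval
       -- later rows: blend the new value with the previous EMA
       else alpha * inval + (1 - alpha) * state
       end
$$;
```

Since the signature is unchanged, the existing ema aggregate keeps working without being recreated.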