jjardel jjardel - 9 months ago 47
Python Question

Should I do numerical calculations in-database?

I have a database containing results from many models I've run. I need to calculate each model's chi^2 value. I'm using python and the
psycopg2 package to interface with my PostgreSQL database. I can think of two ways to do this, and I'm not sure which is faster or if there's even a difference.

Method 1: Select the rows that correspond to each model from my database, then do the chi^2 calculations in python. Each model has about 500-1000 rows associated with it that I'd have to read in and there are 20,000 models.

Method 2: Write a SQL function to calculate chi^2 in the database itself, then select this one number for each model.

I'm much more familiar with Python than SQL, so I'm tempted to go with Method 1 on that alone. However, I can see an advantage to letting postgres optimize the relational algebra and then only having to read in to my python script a single number for each model. Is there a general strategy I should follow in choosing one method over the other?

Joe Joe
Answer Source

First, chi^2 is not a complex calculation. Whichever method you use is doesn't matter. But I still recommend you use method 2 because the method 1 may lead your application to the OUT OF MEMORY error.