Miles Abarr Miles Abarr - 3 months ago 12
Python Question

Find parent with certain combination of child rows - SQLite with Python

There are several parts to this question. I am working with sqlite3 in Python 2.7, but I am less concerned with the exact syntax, and more with the methods I need to use. I think the best way to ask this question is to describe my current database design, and what I am trying to accomplish. I am new to databases in general, so I apologize if I don't always use correct nomenclature.

I am modeling refrigeration systems (using Modelica--not really important to know), and I am using the database to manage input data, results data, and models used for that data.

My top parent table is

Model
, which contains the columns:

id, name, version, date_created


My child table under
Model
is called
Design
. It is used to create a unique id for each combination of design input parameters and the model used. the columns it contains are:

id, model_id, date_created


I then have two child tables under
Design
, one called
Input
, and the other called
Result
. We can just look at Input for now, since one example should be enough. The columns for input are:

id, value, design_id, parameter_id, component_id


parameter_id
and
component_id
are foreign keys to their own tables.The
Parameter
table has the following columns:

id, name, units


Some example rows for
Parameter under
name are: length, width, speed, temperature, pressure (there are many dozens more). The Component table has the following columns:

id, name


Some example rows for
Component
under name are: compressor, heat_exchanger, valve.

Ultimately, in my program I want to search the database for a specific design. I want to be able to search a specific design to be able to grab specific results for that design, or to know whether or not a model simulation with that design has already been run previously, to avoid re-running the same data point.

I also want to be able to grab all the parameters for a given design, and insert it into a class I have created in Python, which is then used to provide inputs to my models. In case it helps for solving the problem, the classes I have created are based on the components. So, for example, I have a compressor class, with attributes like compressor.speed, compressor.stroke, compressor.piston_size. Each of these attributes should have their own row in the Parameter table.

So, how would I query this database efficiently to find if there is a design that matches a long list (let's assume 100+) of parameters with specific values? Just as a side note, my friend helped me design this database. He knows databases, but not my application super well. It is possible that I designed it poorly for what I want to accomplish.

Here is a simple picture trying to map a certain combination of parameters with certain values to a design_id, where I have taken out component_id for simplicity:

Picture of simplified tables

Answer

Simply join the necessary tables. Your schema properly reflects normalization (separating tables into logical groupings) and can scale for one-to-many relationships. Specifically, to answer your question --So, how would I query this database efficiently to find if there is a design that matches a long list (let's assume 100+) of parameters with specific values?-- consider below approaches:

Inner Join with Where Clause

For handful of parameters, use an inner join with a WHERE...IN() clause. Below returns design fields joined by input and parameters tables, filtered for specific parameter names where you can have Python pass as parameterized values even iteratively in a loop:

SELECT d.id, d.model_id, d.date_created
FROM design d
INNER JOIN input i ON d.id = i.design_id
INNER JOIN parameters p ON p.id = i.parameter_id
WHERE p.name IN ('param1', 'param2', 'param3', 'param4', 'param5', ...)

Inner Join with Temp Table

Should values be over 100+ in a long list, consider a temp table that filters parameters table to specific parameter values:

# CREATE EMPTY TABLE (SAME STRUCTURE AS parameters)
sql = "CREATE TABLE tempparams AS SELECT id, name, units FROM parameters WHERE 0;"
cur.execute(sql)
db.commit()

# ITERATIVELY APPEND TO TEMP
for i in paramslist:                                  # LIST OF 100+ ITEMS
    sql = "INSERT INTO tempparams (id, name, units) \
           SELECT p.id, p.name, p.units \
           FROM parameters p \
           WHERE p.name = ?;"
    cur.execute(sql, i)                               # CURSOR OBJECT COMMAND PASSING PARAM
    db.commit()                                       # DB OBJECT COMMIT ACTION 

Then, join main design and input tables with new temp table holding specific parameters:

SELECT d.id, d.model_id, d.date_created
FROM design d
INNER JOIN input i ON d.id = i.design_id
INNER JOIN tempparams t ON t.id = i.parameter_id 

Same process can work with components table as well.