W.Zhang W.Zhang - 1 month ago 14
SQL Question

SQL: speed up searching multiple criteria

I am a rookie in SQL, so it may be a simple question for someone. Now I have two tables, let's say A and B.

A has columns of "gender", "age", "academic degree", "years of working experience", "state of residence", "skills level", and "career".

B has columns of "gender", "age", "academic degree", "years of working experience", "state of residence", and "annual income". (first 5 are the same as in A)

Now I want to search by these 5 criteria from B and add a column "annual income" to table A. I tried to used INNER JOIN like:


SELECT A.*, B.annual_income

FROM B INNER JOIN A ON (A.gender = B.gender)

AND (A.age = B.age)

AND (A.academic_degree = B.academic_degree)

AND (A.years_of_working_experience = B.years_of_working_experience)

AND (A.state_of_residence = B.state_of_residence);


The code did work but it took too long. Does anyone know if there is a much faster algorthm to solve the problem? Thanks!

Answer

Query performance tuning can be a tricky topic... I can maybe offer a few tips, though you'll need to consider more information than just what's provided in the question to really nail down a solution.

I gather from your question's tags that you're using MS Access, so that's my working assumption here... I also assume that the data is local (i.e. stored in the .MDB file, rather than linked remote tables).

The biggest impact you can have on query performance is to make sure your query is cooperating with your physical data model. The table structure you've described is likely to make life difficult to be honest.

Are there additional columns, like maybe a surrogate key, in each table? If you can use a simpler join it may help performance. Whether you have a simpler join key or not, making sure that the join key is properly indexed in both tables is probably the biggest single step that would affect performance.

You ask if there's another algorithm... When writing SQL you don't really think about specific algorithms - the database takes care of that - but to the spirit of the question, I'd say no. A join is what you need to do.

If this is something you'll need to do multiple times, then it may be worth improving the physical data model. (If you only need to join the data once, then this may not be worth it because converting the data into the improved model will likely take as long as running the join once.)

So assuming it's the case that each combination of (age,gender,degree,years,state) appears only once in each table, you could create a new table that assigns a unique integer to each combination; then add an indexed integer column to each table to store the assigned value. A join on that should run better than what you have.

Comments