beeker beeker - 2 months ago 6
SQL Question

Find 'Most Similar' Items in Table by Foreign Key

I have a child table with a number of charact/value pairs for a given 'material' (MaterialID). Any material can have a number of charact values and may have several of the same name (see id's 2,3).

The table has a large number of records (8+ million). What I'm trying to do is find the materials that are the most similar to a supplied material. That is, when I supply a MaterialID, I would like an ordered list of the most similar other materials (those with the most matching charact/value pairs).

I've done some research but, I may be missing some key terms or just not conceptualizing the problem correctly.

Any hints as to how to go about this would be very much appreciated.

ID MaterialID Charact Value
1 1 ROT_DIR CCW
2 1 SPECIAL_FEATURE CATALOG_CP
3 1 SPECIAL_FEATURE CHROME
4 1 SCHEDULE 80
5 2 BEARING_TYPE SB
6 2 SCHEDULE 80
7 3 ROT_DIR CCW
8 3 SPECIAL_FEATURE CATALOG_HSB
9 3 BEARING_TYPE SP
10 4 NDE_STYLE W_FAN
11 4 BEARING_TYPE SB
12 4 ROT_DIR CW*

Answer

You can do this with a self join:

select t.materialid, count(*) as nummatches
from t join
     t tmat
     on t.Charact = tmat.Charact and t.value = tmat.value
where tmat.materialid = @MaterialId
group by t.materialid
order by nummatches desc;

Notes:

  • You might want to remove the specified material, by adding where t.MaterialId <> tmat.MaterialId to the where clause.
  • If you want all materials, then make the join a left join and move the where condition to the on clause.
  • If you want only one material with the most matches, use select top 1.
  • If you want all materials with the most matches when there are ties, use `select top (1) with ties.
Comments