I have a child table with a number of charact/value pairs for a given 'material' (MaterialID). Any material can have a number of charact values and may have several of the same name (see id's 2,3).
The table has a large number of records (8+ million). What I'm trying to do is find the materials that are the most similar to a supplied material. That is, when I supply a MaterialID, I would like an ordered list of the most similar other materials (those with the most matching charact/value pairs).
I've done some research but, I may be missing some key terms or just not conceptualizing the problem correctly.
Any hints as to how to go about this would be very much appreciated.
ID MaterialID Charact Value
1 1 ROT_DIR CCW
2 1 SPECIAL_FEATURE CATALOG_CP
3 1 SPECIAL_FEATURE CHROME
4 1 SCHEDULE 80
5 2 BEARING_TYPE SB
6 2 SCHEDULE 80
7 3 ROT_DIR CCW
8 3 SPECIAL_FEATURE CATALOG_HSB
9 3 BEARING_TYPE SP
10 4 NDE_STYLE W_FAN
11 4 BEARING_TYPE SB
12 4 ROT_DIR CW*
You can do this with a self join:
select t.materialid, count(*) as nummatches from t join t tmat on t.Charact = tmat.Charact and t.value = tmat.value where tmat.materialid = @MaterialId group by t.materialid order by nummatches desc;
where t.MaterialId <> tmat.MaterialIdto the
left joinand move the
wherecondition to the
select top 1.