Michele Mariotti Michele Mariotti - 2 months ago 18
SQL Question

SQL find all homologous categories

I have a many-to-many relation between Category and Item, which is implemented with the "category_item" table, having 2 columns (PK):

category_id
and
item_id
.

Two homologous categories have the same items (order is not relevant):

A
1
2


is homologous to

B
1
2


but is not to

C D E
1 1 1
3 2
3


Given a category id, how can I find all homologous categories?






Suppose this data in
item_category
:

A 1
A 2
B 1
B 2
C 1
C 3
D 1
E 1
E 2
E 3


I want to find all categories homologous to
A
(the expected result is just
B
)

I'm currently trying something like:

select r2.category_id
from category_item r1, category_item r2
where r1.category_id = ?
and r2.category_id <> r1.category_id
and r1.item_id = r2.item_id


that builds the table:

A 1 B 1
A 1 C 1
A 1 D 1
A 1 E 1
A 2 B 2
A 2 D 2
A 2 E 2


but I don't know how to continue...




I'm using MySQL 5.7, but I'd like to do it with just generic SQL.

note that this is not a homework (also I don't think any teacher will assign such a complex one), it's just a extra-simplified use case for a real world problem

Answer

You can also do it using subqueries and exists/ not exists

Select Distinct category_id 
from category_item ci
Where not exists   -- this allows only cats that do not have all req items
      (select * from category_item
       Where category_id = ci.category_id 
         and item_id Not in 
            (Select item_id from category_item
             Where category_Id = @catId))
  and not exists   -- this filters out cats that have xtra items
      (Select * from category_item
       Where category_Id = @catId
          and item_id Not in
            (Select item_id from category_item
             Where category_Id = ci.category_Id )) 
  and category_Id <> @catId -- <- categoryId of category you are matching
                            -- this line filters out the category you are 
                            -- matching against. Remove it if you want all
                            -- homologous categories

EXPLANATION:

Select All distinctCategory_Ids from the join table where: First subquery: where there is not another join table row for that same category with anItem_idthat is *not* in the set ofItem_ids for the category you are matching against, and Second subquery: where there is not an item in the category you are matching against that is not also in the category you are selecting (ci table)