fyodorfranz fyodorfranz - 6 months ago 17
SQL Question

Selecting objects that are associated with similar datasets

I'm trying to select all company rows from a [Company] table that share with at least one other company, the same number of employees (from an [Employee] table that has a CompanyId column), where each group of respective employees share the same set of LocationIds (a column in the [Employee] table) and in the same proportion.

So, for instance, two companies with three employees each that have the locationIds 1,2, and 2, would be selected by this query.

[Employee]

EmployeeId | CompanyId | LocationId |
========================================
1 | 1 | 1
2 | 1 | 2
3 | 1 | 2
4 | 2 | 1
5 | 2 | 2
6 | 2 | 2
7 | 3 | 3



[Company]

CompanyId |
============
1 |
2 |
3 |


Returns the CompanyIds:
======================
1
2


CompanyIds 1 and 2 are selected because they share in common with at least one other company: 1. the number of employees (3 employees); and 2. the number/proportion of LocationIds associated with those employees (1 employee has LocationId 1 and 2 employees have LocationId 2).

So far I think I want to use a
HAVING COUNT(?) > 1
statement, but I'm having trouble working out the details. Does anyone have any suggestions?

Answer

This is ugly, but the only way I can think of to do it:

;with CTE as (
    select c.Id,
        (
            select e.Location, count(e.Id) [EmployeeCount]
            from Employee e
            where e.IdCompany=c.Id
            group by e.Location
            order by e.Location
            for xml auto
        ) LocationEmployeeData
    from Company c
)
select c.Id
from Company c
join (
    select x.LocationEmployeeData, count(x.Id) [CompanyCount]
    from CTE x
    group by x.LocationEmployeeData
    having count(x.Id) >= 2
) y on y.LocationEmployeeData = (select LocationEmployeeData from CTE where Id = c.Id)

See fiddle: http://www.sqlfiddle.com/#!6/6bc16/5

It works by encoding the Employee count per Location data (multiple rows) into an xml string for each Company.

The CTE code on its own:

select c.Id,
    (
        select e.Location, count(e.Id) [EmployeeCount]
        from Employee e
        where e.IdCompany=c.Id
        group by e.Location
        order by e.Location
        for xml auto
    ) LocationEmployeeData
from Company c

Produces data like:

Id  LocationEmployeeData
1   <e Location="1" EmployeeCount="2"/><e Location="2" EmployeeCount="1"/>
2   <e Location="1" EmployeeCount="2"/><e Location="2" EmployeeCount="1"/>
3   <e Location="3" EmployeeCount="1"/>

Then it compares companies based on this string (rather than trying to ascertain whether multiple rows match, etc).