KB3 KB3 - 4 years ago 180
SQL Question

How does SQL Group By Clause Divide Up SQL Table

I have this table below:

enter image description here

And when I run this query:

Select Gender, City, SUM(Salary) as TotalSalary, COUNT(ID) as [Total Employees]
from tblEmployee
Group by Gender, City


The output table is:

enter image description here

I am a confused on how from the query is the Total Employees being sorted out. I have
Count(ID) as [Total Employees]
in the Query but how is the Group by clause in the SQL code dividing up the Total Employees? Since I thought
Count(ID) as [Total Employees]
will return 10 since I have 10 records in my table but I am not sure how this group by clause is dividing up the total employees equally? Is it by City or by Gender?

Answer Source

General info about grouping

Taken from Postgres documentation. GROUP BY clause behavior is described below.

The optional GROUP BY clause has the general form:

GROUP BY expression [, ...]

GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions.

[...]

Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group.

Response to your question

Question :

I am a confused on how from the query is the Total Employees being sorted out. I have Count(ID) as [Total Employees] in the Query but how is the Group by clause in the SQL code dividing up the Total Employees? Since I thought Count(ID) as [Total Employees] will return 10 since I have 10 records in my table but I am not sure how this group by clause is dividing up the total employees equally? Is it by City or by Gender?

Answer :

Remember that what name you assign to a column (as alias) doesn't actually impact the behavior and data presented in this column. Your column TotalEmployees counts every occurence of column id for every group that consists of those columns from the GROUP BY clause.

In your particular case each different pair of values from columns (Gender, City) is being condensed into exactly one row. Without your TotalEmployees column this would behave the same as applying DISTINCT to your query, so that both below presented queries would yield the same result:

1. Select Gender, City from tblEmployee Group by Gender, City;

2. Select Distinct Gender, City from tblEmployee;

Going back to your questions, aggregate function COUNT(*) (and any aggregate function for that matter) is being applied on entire grouped expression. This means that both below queries would yield the same number of id column occurence:

1. SELECT Count(ID) as [Total Employees] from tblEmployee;

2. SELECT SUM([Total Employees])
FROM (
  SELECT 
    Gender, City, SUM(Salary) as TotalSalary, COUNT(ID) as [Total Employees] 
  FROM 
    tblEmployee 
  GROUP BY Gender, City
  ) foo
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download