Pirate X Pirate X - 1 month ago 7
SQL Question

JOIN with GROUP BY causing SUM() logic issues

Query -

sel TableName, DatabaseName, sum(CurrentPerm/(1024*1024*1024)) as Size_in_GB
from dbc.tablesize
group by 1,2
order by GB desc


result -

+-----------+--------+------------+
| TableName | DBName | Size_in_GB |
+-----------+--------+------------+
| WRP | A | 28,350.01 |
| CPC | B | 19,999.37 |
| SDF | C | 13,263.67 |
| DB1400 | D | 13,200.26 |
+-----------+--------+------------+


From above simple query I can see that table WRP of database A is near 28350 GB

Now I am trying to join another table
dbc.indices
to use the column
IndexType
for filtering but now the Size_in_GB changes for all tables.

sel a.TableName,a.DatabaseName, sum(CurrentPerm/(1024*1024*1024)) as Size_in_GB from dbc.tablesize a
join dbc.indices b on a.TableName = b.TableName and a.DatabaseName=b.DatabaseName
--where b.indexType='P'
group by 1,2
order by Size_in_GB desc


Result is this -

+-----------+--------+------------+
| TableName | DBName | Size_in_GB |
+-----------+--------+------------+
| WRP | A | 56,700.02 |
| CPC | B | 39,998.74 |
| DB1400 | D | 39,600.78 |
+-----------+--------+------------+


Now the same table is twice the size i.e. WRP is 56700 GB. (Similar for other tables)

I am not sure what's wrong with the logic I'm using for join.

P.S - My aim is to find all the tables which are greater than 100GB in Size and have indexType as 'P'

EDIT - Sharing relevant columns from
DBC.INDICES
table

+--------------+------------+-------------+-----------+------------+---------------+------------+----------------+
| DatabaseName | TableName | IndexNumber | IndexType | UniqueFlag | IndexName | ColumnName | ColumnPosition |
+--------------+------------+-------------+-----------+------------+---------------+------------+----------------+
| Some DB | Some Table | 1 | P | N | IndexNamehere | ColumnA | 1 |
+--------------+------------+-------------+-----------+------------+---------------+------------+----------------+

Answer

Probably your key is duplicated in dbc.indices table. For a single TableName ,dbc.indices table has more then one entry so when you join dbc.tablesize table records got duplicated so the SUM is applied on duplicate records so the mistake in calculation.

Try this way

SELECT a.TableName,
       a.DatabaseName,
       Sum(CurrentPerm / ( 1024 * 1024 * 1024 )) AS Size_in_GB
FROM   dbc.tablesize a
       JOIN (SELECT DISTINCT b.TableName,
                             b.DatabaseName
             FROM   dbc.indices b
             --where b.indexType='P'
             ) b
         ON a.TableName = b.TableName
            AND a.DatabaseName = b.DatabaseName

GROUP  BY a.TableName,
          a.DatabaseName
ORDER  BY Size_in_GB DESC