Ketan Bhokray Ketan Bhokray - 6 months ago 134
MySQL Question

Sequelize GROUP BY aggregating only on main table instead of the complete joined table in a join

I have five tables

USER_ATTRIB
,
QUESTION_MAIN
,
REPLY_MAIN
,
CATEGORY_MAIN
,
QSTN_CATG
related with one another as follows:

m.QUESTION_MAIN.belongsTo(m.USER_ATTRIB, { foreignKey: 'POSTER_I', targetKey: 'USER_I'});
m.QUESTION_MAIN.hasMany(m.REPLY_MAIN, { foreignKey: 'QSTN_I' });
m.QUESTION_MAIN.belongsToMany(m.CATEGORY_MAIN, { through: m.QSTN_CATG, foreignKey: 'QSTN_I' });
m.QUESTION_MAIN.hasMany(m.QSTN_CATG, { foreignKey: 'QSTN_I' });


I want to run a query on
QUESTION_MAIN
to get details about the questions. One of the detail needed is the number of replies to the questions, which can be queried by

SELECT COUNT(REPLY_MAINs.QSTN_I) GROUP BY QSTN_I;


The combined query I want to run is:



SELECT `QUESTION_MAIN`.*
,`USER_ATTRIB`.`USERATTRIB_ID` AS `USER_ATTRIB.USERATTRIB_ID`
,`USER_ATTRIB`.`USER_NAME` AS `USER_ATTRIB.USER_NAME`
,`QSTN_CATGs`.`QSTN_CATG_ID` AS `QSTN_CATGs.QSTN_CATG_ID`,
,`QSTN_CATGs`.`CATG_I` AS `QSTN_CATGs.QSTN_CATG_I`
,`REPLY_MAINs`.`REPLY_ID` AS `REPLY_MAINs.REPLY_ID`
, COUNT(`REPLY_MAINs`.`QSTN_I`) AS `REPLY_MAINs.REPLY_COUNT`
FROM (
SELECT `QUESTION_MAIN`.`QUESTION_ID`
, ( 6371 * acos( cos( radians(13.0508629) ) * cos( radians( QSTN_LOC_LAT ) ) * cos( radians( QSTN_LOC_LONG ) - radians(77.6092108) ) + sin( radians(13.0508629) ) * sin( radians( QSTN_LOC_LAT ) ) ) ) AS `DISTANCE`
FROM `QUESTION_MAIN` AS `QUESTION_MAIN`
WHERE (
SELECT `QSTN_I`
FROM `QSTN_CATG` AS `QSTN_CATG`
WHERE (`QSTN_CATG`.`QSTN_I` = `QUESTION_MAIN`.`QUESTION_ID`) LIMIT 1
) IS NOT NULL
HAVING `DISTANCE` < 5
ORDER BY `QUESTION_MAIN`.`CREATED` DESC LIMIT 3
) AS `QUESTION_MAIN`
LEFT OUTER JOIN `USER_ATTRIB` AS `USER_ATTRIB` ON `QUESTION_MAIN`.`POSTER_I` = `USER_ATTRIB`.`USER_I`
INNER JOIN `QSTN_CATG` AS `QSTN_CATGs` ON `QUESTION_MAIN`.`QUESTION_ID` = `QSTN_CATGs`.`QSTN_I`
LEFT OUTER JOIN `REPLY_MAIN` AS `REPLY_MAINs` ON `QUESTION_MAIN`.`QUESTION_ID` = `REPLY_MAINs`.`QSTN_I`
AND `REPLY_MAINs`.`REPLY_STATUS` = 200
GROUP BY `QUESTION_ID`
ORDER BY `QUESTION_MAIN`.`CREATED` DESC;


This is the Sequelize to make that query:



QUESTION_MAIN.findAll({
attributes:['QUESTION_ID', 'POSTER_I',
['( 6371 * acos( '
+ 'cos( radians('+qstnFeedRequest.qstnLocLat+') ) '
+ '* cos( radians( QSTN_LOC_LAT ) ) '
+ '* cos( radians( QSTN_LOC_LONG ) - radians('+ qstnFeedRequest.qstnLocLong+') ) '
+ '+ sin( radians('+qstnFeedRequest.qstnLocLat+') ) '
+ '* sin( radians( QSTN_LOC_LAT ) ) ) '
+ ')', 'DISTANCE'
]
],
include: [
{ model: USER_ATTRIB,
attributes:['USER_NAME']
},
{ model: QSTN_CATG,
attributes: [['CATG_I', 'QSTN_CATG_I']],
where: qstnCatgWhereClause
},
{ model: REPLY_MAIN,
attributes: [[sequelize.fn('COUNT', sequelize.col('REPLY_MAINs.QSTN_I')), 'REPLY_COUNT']],
where: {REPLY_STATUS: 200},
required: false
}
],
having:{ 'DISTANCE' : {$lt: 5} },
where: whereClause,
group: ['QUESTION_ID'],
limit: qstnFeedRequest.limit
})


The problem is that the
GROUP BY
clause is being applied inside the inner query, not on the whole join:



SELECT `QUESTION_MAIN`.*,
...
FROM (
SELECT `QUESTION_MAIN`.`QUESTION_ID`,
...
HAVING `DISTANCE` < 5
GROUP BY `QUESTION_ID` -- This should go outside
ORDER BY `QUESTION_MAIN`.`CREATED` DESC LIMIT 3
) AS `QUESTION_MAIN`
LEFT OUTER JOIN `USER_ATTRIB` ...
ORDER BY `QUESTION_MAIN`.`CREATED` DESC;


This is causing wrong aggregation on the count. No matter what I try, I am not able to get the
GROUP BY
clause out of the inner query.

How do I make the grouping to the whole join instead of the main table alone?

Answer

After exploring half of the web, I finally found the solution.

As said in the thread whose link is in the comment, it is inefficient to make a 1:M query with a limit outside the join. So, Sequelize does separate queries for 1:1 and 1:M relations given that separate: true property is set in the include statement of the 1:M table.

Even after this, there are a couple of issues:

The code breaks if the joining column for the tables is not included in the attributes.

Sequelize applies the outer having clause on the inner table also. To prevent that, I added an truthy having statement in the includes.

This is my final Sequelize after the modifications:

QUESTION_MAIN.findAll({
attributes:['QUESTION_ID', 'POSTER_I',
  ['( 6371 * acos( ' 
                  + 'cos( radians('+qstnFeedRequest.qstnLocLat+') ) ' 
                  + '* cos( radians( QSTN_LOC_LAT ) ) '
                  + '* cos( radians( QSTN_LOC_LONG ) - radians('+ qstnFeedRequest.qstnLocLong+') ) '
                  + '+ sin( radians('+qstnFeedRequest.qstnLocLat+') ) '
                  + '* sin( radians( QSTN_LOC_LAT ) ) ) '
  + ')', 'DISTANCE'
  ]
],
include: [
  { model: USER_ATTRIB, 
    attributes:['USER_NAME']
  },
  { model: QSTN_CATG, 
    attributes: [['CATG_I', 'QSTN_CATG_I']],
    where: qstnCatgWhereClause
  },
  { model: REPLY_MAIN, //this is the 1:M table
    attributes: ['QSTN_I', [sequelize.fn('COUNT', sequelize.col('REPLY_MAIN.QSTN_I')), 'REPLY_COUNT']], 
    //QSTN_I is the column joining QUESTION_MAIN and REPLY_MAIN. Not including this in the attributes throws an error 
    where: {REPLY_STATUS: 200},
    group: ['QSTN_I'], //grouping it in this query instead of the main query
    separate: true,//the culprit
    having: {'REPLY_COUNT': {$ne: null}}, //this is a dummy having clause which always returns true. This is added to stop the outer having clause being applied to the inner query
    required: false
  }
],
having:{ 'DISTANCE' : {$lt: 5} },
where: whereClause,
limit: qstnFeedRequest.limit
})

Hope this saves someone's 2 days' time