Mayeul sgc Mayeul sgc - 2 months ago 7
SQL Question

MySQL Limit Results Based on Join Table

I have 2 tables,but linked in many to many relations so 3 tables :

Table Author :
idAuthor,
Name

+----------+-------+
| idAuthor | Name |
+----------+-------+
| 1 | Renee |
| 2 | John |
| 3 | Bob |
| 4 | Bryan |
+----------+-------+
Table Publication:
idPublication,
Title,
Type,
Date,
Journal,
Conference

+---------------+--------------+------+-------------+------------+-----------+
| idPublication | Title | Date | Type | Conference | Journal |
+---------------+--------------+------+-------------+------------+-----------+
| 1 | Flower thing | 2008 | book | NULL | NULL |
| 2 | Bees | 2009 | article | NULL | Le Monde |
| 3 | Wasps | 2010 | inproceding | KDD | NULL |
| 4 | Whales | 2010 | inproceding | DPC | NULL |
| 5 | Lyon | 2011 | article | NULL | Le Figaro |
| 6 | Plants | 2012 | book | NULL | NULL |
+---------------+--------------+------+-------------+------------+-----------+
Table author_has_publication :
Author_idAuthor,
Publication_idPublication

+-----------------+---------------------------+
| Author_idAuthor | Publication_idPublication |
+-----------------+---------------------------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 1 | 5 |
| 2 | 5 |
| 3 | 5 |
| 3 | 6 |
+-----------------+---------------------------+


What I want to do is get the top X author having the most publications.

I achieved to get the result avec the idAuthor having the most publications, using this request :
SELECT Author_idAuthor, COUNT(*) as count FROM Author_has_publication GROUP BY Author_idAuthor ORDER BY count DESC;

I get the list of the authors id, ordered by the number of publications :

+-----------------+-------+
| Author_idAuthor | count |
+-----------------+-------+
| 3 | 3 |
| 2 | 2 |
| 1 | 2 |
| 4 | 1 |
+-----------------+-------+


but then when I try to select the author corresponding to the top X of the result set of the previous query I have an error
I am Trying this
SELECT TOP 2 FROM author WHERE (SELECT Author_idAuthor, COUNT(*) as count FROM Author_has_publication GROUP BY Author_idAuthor ORDER BY count DESC)=idAuthor;

I think it might be because my inside query return 2 rows, and I do a simple
SELECT
here or that I need a
JOIN
but i have no ideas how to use it here.

Answer

MySQL has no TOP keyword. It does however have a LIMIT keyword. Your query is invalid anyway.

There are a couple of options here. The following is an example of a correlated subquery: https://en.wikipedia.org/wiki/Correlated_subquery

SELECT 
    a.idAuthor, 
    a.Name , 
    (SELECT COUNT(*) from author_has_publication ahp WHERE   
        ahp.Author_idAuthor = a.idAuthor) AS publication_count
FROM 
    author a
ORDER BY 
    publication_count DESC
LIMIT 2

As the referenced article notes, the above is inefficient as the subqeuery needs to be re-executed for each row of the result. If you do not actually need the count in the resultset then the below would be more efficient as the subquery is non-correlated and executed only once.

SELECT 
    a.idAuthor, 
    a.Name 
FROM 
    author a
INNER JOIN 
    (select ahp.Author_idAuthor AS idAuthor, COUNT(*) as publication_count 
        FROM author_has_publication ahp GROUP BY ahp.Author_idAuthor LIMIT 2) 
        AS TEMP ON TEMP.idAuthor = a.idAuthor