NaturalBornCamper NaturalBornCamper - 2 months ago 10
MySQL Question

Mysql FULLTEXT search with GROUP BY, keep row value that has the highest score

In two FULLTEXT searches, I look for the search terms in the title of the book and the tags and get the following result:


rScore tScore ID
...
1.235689725827653 0 406
0.928482055664062 2.37063312530518 406
0.928482055664062 0 406
0.453363467548853 0 520
...


WHAT I WOULD LIKE TO HAVE, all duplicate ID have been concatenated taking highest scores:


rScore tScore ID
...
1.235689725827653 2.37063312530518 406
0.453363467548853 0 520
...


, but after a GROUP BY, ID 406 was grouped in this row column:


...
MATCH_SCORE_TITLE MATCH_SCORE_TAGS ID
0.928482055664062 0 406
0.453363467548853 0 520
...


How can I group all these results and keeping the max value of each MATCH? I know this has been asked before and can be done with a JOIN, but I didn't find it with a combination of two rows, plus I already have JOINS in my query since TITLE and TAGS are in two different tables.

UPDATE:
I have 3 tables, "registrants" (the left table with the titles to search), the "registrants_tags" (relational table between the left and the right table) and "tags" (the right table with the tags to search. Here is a simplified version of the SQL query:


SELECT
tags.tag, (Also tried (GROUP_CONCAT(`tags`.`tag`) AS tags)
MAX(MATCH(registrants.story_title) AGAINST('bob')) as rScore,
MAX(MATCH(tags.tag) AGAINST('bob')) as tScore,
registrants.id, registrants.story_title
FROM registrants
LEFT JOIN registrants_tags ON registrants.id = registrants_tags.registrant
LEFT JOIN tags ON registrants_tags.tag = tags.id
WHERE MATCH(registrants.story_title) AGAINST('bob')
OR MATCH(tags.tag) AGAINST('bob')
GROUP BY registrants.id
ORDER BY (rScore + tScore) DESC


Which gives me error message: "#1247 - Reference 'tscore' not supported (reference to group function)"

Answer

You could sort by the score, and use a max on a subquery to get the final preferred row.

For example:

SELECT  id, story_title,
    max(match_score_title) as titleScore,
    max(match_score_tags) as as tagScore
FROM (
    SELECT
        MATCH(registrants.story_title) AGAINST('bob') as rScore,
        MATCH(tags.tag) AGAINST('bob') as tScore,
        registrants.id, registrants.story_title
    FROM
    registrants 
    LEFT JOIN registrant_tags on registrant_tags.registrant=registrant.id
    LEFT JOIN tags on tags.id=registrant_tags.tag
    WHERE rScore > 0 or tScore > 0
) AS score_matcher
group by ID
ORDER BY (rScore + tScore) DESC

That ought to work for you. It may not be the quickest query in the universe, since it's relying on subqueries, which in MySQL aren't terribly well optimized in my experience, but it should get you your results.

You could also rework it to a different subquery to take advantage of group_concat like follows:

SELECT
    MATCH(registrants.story_title) AGAINST('bob') as rScore,
    MATCH(tags.tag) AGAINST('bob') as tScore,
    registrants.id, registrants.story_title
FROM
registrants 
LEFT JOIN (
    FROM rtags.registrant, GROUP_CONCACT(DISTINCT tags.tag SEPARATOR ',') as tags
    FROM registrants_tags AS rtags
    INNER JOIN tags on tags.id=registrants_tags.id
    GROUP BY rtags.registrant
) AS grouped_tags ON registrants.id = grouped_tags.registrant 
WHERE rScore > 0 or tScore > 0
ORDER BY (rScore + tScore) DESC

It would help, if in your database, you added a "grouped_tags" field to your registrant table, which could then have a fulltext index added to it - that would eliminate the need for the grouped_tags subquery. Then whenever someone updates the tags for a particular registrant, the grouped_tags field get updated with the current list of correct tags.

And if you did as I recommended about adding a grouped_tags field (that gets populated in the interface), you could replace the whole query with this, which with fulltext indexes would be pretty fast (however fulltext indexes require the use of MyISAM, which isn't exactly great).

If you did that, then this will definitely be the fastest query I've got listed here.

SELECT
    MATCH(story_title) AGAINST('bob') as rScore,
    MATCH(grouped_tags) AGAINST('bob') as tScore,
    id, story_title
FROM
registrants 
WHERE rScore > 0 or tScore > 0
GROUP BY ID
ORDER BY (rScore + tScore) DESC

So there are a pile of recommendations for you for making this query happen, and which solution you go with will depend largely on the size of your data set, and how fast the query need to be. I recommend doing some benchmarking to find which one works best for you