DaiBu DaiBu - 6 months ago 13
SQL Question

Painfully slow self JOIN

I have a relationship table named "match_players" that has about 20000 rows. There is a column named "hero_id" which represents what hero was chosen by a player in a match. All fields are indexed.

I need to build another table consisting of all combinations of two heroes, with a count of how many times they were on the same team, how many times they were enemies, etc.

My current query takes about 90 seconds on an 8 core SSD server. I imagine it has something to do with MySQL building this massive table of all combinations internally before grouping the results.

Is there an alternative method for gathering combinations of row values, maybe in which MySQL simply scans the table, recording new combinations as it finds them? Any input is appreciated.

Table "match_players":

match_id | team | position | player_id | hero_id |

56427859 | 1 | 1 | 546107 | 17 |
56427859 | 1 | 2 | 469333 | 81 |
56427859 | 1 | 3 | 227526 | 60 |
56427859 | 1 | 4 | 193739 | 32 |
56427860 | 0 | 0 | 473923 | 11 |
56427860 | 0 | 1 | 292764 | 93 |
56427860 | 0 | 2 | 138018 | 26 |
56427860 | 0 | 3 | 326510 | 96 |

etc...


The query:

SELECT mp1.hero_id, mp2.hero_id
FROM match_players mp1
INNER JOIN match_players mp2
ON mp1.hero_id < mp2.hero_id
WHERE mp1.team = mp2.team
GROUP BY mp1.hero_id, mp2.hero_id


EXPLAIN:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra

1 | SIMPLE | mp1 | ALL | faction_id,hero_id | NULL | NULL | NULL | 34060 | Using temporary; Using filesort
1 | SIMPLE | mp2 | ref | faction_id,hero_id | faction_id | 1 | beta_dota_2.mp1.faction_id | 3499 | Using where


Update:

Since I only need heroes that were in a match together, I updated my query to the following and it's much, much faster. I think it completes in a couple tenths of a second.

SELECT mp1.hero_id, mp2.hero_id
FROM match_players mp1
INNER JOIN match_players mp2
ON mp1.hero_id < mp2.hero_id
WHERE mp1.team = mp2.team AND mp1.match_id = mp2.match_id
GROUP BY mp1.hero_id, mp2.hero_id


I would still like to know, though, if I had wanted a list of combinations of all heroes, whether in the same match or not, what would be the best way to go about it? It clearly isn't feasible with my original query on any table larger than a few thousand rows.

Answer

Since I only need heroes that were in a match together, I updated my query to the following (joining on match_id) and it's much, much faster. I think it completes in a couple tenths of a second.

SELECT mp1.hero_id, mp2.hero_id
FROM match_players mp1
INNER JOIN match_players mp2
ON mp1.hero_id < mp2.hero_id
WHERE mp1.team = mp2.team AND mp1.match_id = mp2.match_id
GROUP BY mp1.hero_id, mp2.hero_id
Comments