user2827214 user2827214 - 6 months ago 15
SQL Question

Prepared Statements, MyISAM with 100 million records and Caching on MySQL

I have 10 large read-only tables. I make a lot of queries that have the same format with different parameters, for example:

SELECT 'count' FROM table1 WHERE x='SOME VARIABLE';
SELECT 'count' FROM table1 WHERE x='SOME OTHER VARIABLE';
...
SELECT 'count' FROM table2 WHERE x='' AND y='';
SELECT 'count' FROM table2 WHERE x='' AND y='';
...
SELECT 'count' FROM table3 WHERE x='' AND y='' AND z='';


The variables used are different each query so I almost never execute the same query twice. Is it correct that query caching and row caching on the MySQL side would be wasteful and they should be disabled? Table caching seems like it would be a good thing.

On the client side, I am using prepared statements which I assume is good. If I enable Prepared statement caching (via Slick) though won't that hurt my performance since the parameters are so variable? Is there anything else I can do to optimize my performance?

Should auto-commit be off since I'm only doing selects and will never need to rollback?

Answer

Given that you are using the MYISAM engine and have tables which have hundreds of millions of active rows, I would take care less of how I query the cache (due to your low complexity, this is most likely the least problem), but more focus on the proper organization of the data within the database:

  • Prepared Statements are totally ok. It may be helpful to not prepare the statement over and over again. Instead, just reuse the existing prepared statement (some environments even allow to store prepared statements on the client side) with a new set of parameter values. However, this mainly only saves time, which is being used in the query cache. As the complexity of your query is quite low, it can be assumed that this won't be the biggest time consumer.
  • Key Caching (also called Key Buffering), however, is - as the name already suggests - key for your game! Most DB configurations of MySQL suffer greatly from wrong values in that area, as the buffers are way too small. In a nutshell, key caching makes sure that the references to the data (for instance in your indices) can be accessed in main memory. If they are not in memory, they need to be retrieved from the disk, which is slow. To see if your key cache is efficient, you should watch the key hit ratio, when your system is under load. Details about that is greatly explained at http://dba.stackexchange.com/questions/58182/tuning-key-reads-in-mysql.
  • If the caches become large or are being displaced frequently due to the usage of other tables, it may be helpful to create own key caches for your tables. For details, see https://dev.mysql.com/doc/refman/5.5/en/cache-index.html
  • If you always access large portions of your table via the same attributes, it may make sense to change the ordering of the data storage on the disk by using ALTER TABLE ... ORDER BY expr1, expr2, .... For details on this approach see also https://dev.mysql.com/doc/refman/5.5/en/optimizing-queries-myisam.html
  • Avoid using variable-length columns, such as VARCHAR, BLOB or TEXT. They might help to save some space, but especially comparing their values can become time-consuming. Please note, however, that already one single column of such a type will MySQL make switch to Dynamic column mode.
  • Run ANALYZE TABLE after huge data changes to keep the statistics up to date. If you have deleted huge areas, it might help to OPTIMIZE TABLE, helping to make sure that there are no large gaps around which need to be skipped when reading.
  • Use INSERT DELAYED to write changes asynchronously, if you do not need the reply. This will greatly improve your performance, if there are other SELECT statements around at the same point in time.
  • Alternatively, if you need the reply, you may use INSERT LOW_PRIORITY. Then the execution of the concurrent SELECTs are preferred compared to your INSERT. This may help to ease the pain of the fact a little, that MyISAM only supports table-level locking.
  • You may try to provide Index Hints to your queries, especially if there are multiple indices on your table which are overlapping each other. You should try to use that index which has the smallest width, but still covers the most attributes. However, please note that in your case the impact must be quite small: You are not ordering/grouping or joining, so the query optimizer should already be very good at finding the best one. Simply check by using EXPLAIN on your SELECT statement to see, if the choice of the index used is reasonable.

In short, Prepared Statements are totally ok. Key Caching is key - and there are some other things you can do to help MySQL getting along with the whole bulk of data.

Comments