Adding a new column or adding a new index can take hours and days for large innodb tables in MySQL with more than 10 million rows. What is the best way to increase the performance on large innodb tables in these two cases? More memory, tweaking the configuration (for example increasing the sort_buffer_size or innodb_buffer_pool_size), or some kind of trick? Instead of altering a table directly, one could create a new one, change it, and copy the old data the new, like this which is useful for ISAM tables and multiple changes:
CREATE TABLE tablename_tmp LIKE tablename;
ALTER TABLE tablename_tmp ADD fieldname fieldtype;
INSERT INTO tablename_tmp SELECT * FROM tablename;
ALTER TABLE tablename RENAME tablename_old;
ALTER TABLE tablename_tmp RENAME tablename;
Edit 2016: we've recently (August 2016) released
gh-ost, modifying my answer to reflect it.
Today there are several tools which allow you to do online alter table for MySQL. These are:
A large table will take long time to
innodb_buffer_pool_size is important, and so are other variables, but on very large table they are all negligible. It just takes time.
What MySQL does to
ALTER a table is to create a new table with new format, copy all rows, then switch over. During this time the table is completely locked.
It will most probably perform worst of all options. Why is that? Because you're using an InnoDB table, the
INSERT INTO tablename_tmp SELECT * FROM tablename makes for a transaction. a huge transaction. It will create even more load than the normal
Moreover, you will have to shut down your application at that time so that it does not write (
UPDATE) to your table. If it does - your whole transaction is pointless.
The tools do not all work alike. However, the basics are shared:
The openark-kit tool has been in use for 3.5 years now. The Percona tool is a few months old, but possibly more tested then the former. Facebook's tool is said to work well for Facebook, but does not provide with a general solution to the average user. I haven't used it myself.
gh-ost is a triggerless solution, which significantly reduces master write-load on the master, decoupling the migration write load from the normal load. It is auditable, controllable, testable. We've developed it internally at GitHub and released it as open source; we're doing all our production migrations via
gh-ost today. See more here.
Each tool has its own limitations, look closely at documentation.
The conservative way is to use an Active-Passive Master-Master replication, do the
ALTER on the standby (passive) server, then switch roles and do the
ALTER again on what used to be the active server, now turned passive. This is also a good option, but requires an additional server, and deeper knowledge of replication.