Does splitting a database table into the most requested data and the rest give a considerable performance boost? I've spent the last week or so watching very technical tutorials and conference talks, but it's still not clear to me how big a part column count, column position, and row count play in performance (if the whole table fits into memory/RAM).
I've sketched 4 different options. What would be the pros and cons of each approach?
Here is one test case:
Create test data with 1M rows:
    drop table if exists posts;
    create table posts (
      id int not null primary key,
      summary varchar(255),
      post text
    ) as
    select seq as id,
           repeat(rand(1), 10) as summary,
           repeat(rand(1), 100) as post
    from seq_1_to_1000000;

    drop table if exists small_data;
    create table small_data (
      id int not null primary key,
      summary varchar(255)
    ) as
    select p.id, p.summary
    from posts p;

    drop table if exists big_data;
    create table big_data (
      id int not null primary key,
      post text
    ) as
    select p.id, p.post
    from posts p;

    show table status where Name in ('posts', 'small_data', 'big_data');

    Name       | Engine | Row_format | Rows   | Avg_row_length | Data_length
    big_data   | InnoDB | Compact    | 870341 | 2361           | 2055208960
    posts      | InnoDB | Compact    | 838832 | 2627           | 2204106752
    small_data | InnoDB | Compact    | 985832 | 229            | 226197504
So there are three tables.
posts (~2.1 GB): contains all data (id, summary ~ 200 bytes, post ~ 2000 bytes).
small_data (~215 MB): contains (id, summary).
big_data (~1.9 GB): contains (id, post).
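One cost of the split layout is that reading a complete record now needs a join. With the tables above, fetching the full post for a single item is a primary-key join (a usage sketch; the id value is arbitrary):

```sql
-- Reassemble one full record after the vertical split.
-- The join is cheap because id is the primary key in both tables.
select s.id, s.summary, b.post
from small_data s
join big_data b on b.id = s.id
where s.id = 500000;
```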
    select p.id, p.summary from posts p order by id asc limit 10 offset 500000;
First run: 16.552 sec. Second run: 16.723 sec.
    select p.id, p.summary from small_data p order by id asc limit 10 offset 500000;
First run: 0.702 sec. Second run: 0.093 sec.
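A likely explanation for the gap: InnoDB stores rows in the clustered (primary-key) index, so the large offset scan in posts walks ~500k full rows of ~2.3 KB each, while in small_data it walks ~230-byte rows. A covering secondary index on the queried columns can get a similar effect without splitting the table. This is an assumption worth benchmarking, not something measured above:

```sql
-- Hypothetical covering index: the (id, summary) query below can then
-- be answered from the secondary index alone, without reading the
-- wide clustered-index rows that also carry the post column.
alter table posts add index idx_id_summary (id, summary);

select id, summary from posts order by id asc limit 10 offset 500000;
```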
As you can see, there can be a huge difference. But it depends on your data and your queries, so you should run your own benchmarks.
seq_1_to_1000000 is a table with 1M sequence numbers. You will need to create it first or use MariaDB's Sequence storage engine, which provides seq_1_to_N tables automatically.
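If you are not on MariaDB, one way to build an equivalent helper table is a recursive CTE (MySQL 8.0+). A sketch, using the same table name as above; cte_max_recursion_depth defaults to 1000, so it must be raised for this session:

```sql
-- Build a 1M-row sequence table on MySQL 8.0+ with a recursive CTE.
set session cte_max_recursion_depth = 1000001;

drop table if exists seq_1_to_1000000;
create table seq_1_to_1000000 (seq int not null primary key) as
with recursive seq_cte (seq) as (
  select 1
  union all
  select seq + 1 from seq_cte where seq < 1000000
)
select seq from seq_cte;
```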