Solo Solo - 2 years ago 75
SQL Question

Splitting database table to frequently queried data + other data

Does splitting database table up to most requested data and the rest of the data gives considerable performance boost? I've spent last week or so watching very technical tutorials and conferences but it's still not clear to me how big part does columns count, columns position and rows count play in performance (if whole table fits to memory/RAM).

I've scetched 4 different options. What would be the pros and cons for each approach?

Some details:

  • Mostly reads: at least 80% - emphasis is on performance

  • 10 different categories - each with ~100K entries/rows

  • 30 total values in each category

  • 5 frequently queried values (used in both lists and single post views)

  • 25 less frequently queried values (only in single post view)

  • To specify: if I say value, I mean entity/database column

enter image description here

Answer Source

Here is one test case:

Create test data with 1M rows:

drop table if exists posts;
create table posts (
    id int not null primary key,
    summary varchar(255),
    post text
) as
    select seq as id
    , repeat(rand(1), 10) as summary
    , repeat(rand(1), 100) as post
    from seq_1_to_1000000

drop table if exists small_data;
create table small_data (
    id int not null primary key,
    summary varchar(255)
) as select, p.summary from posts p
drop table if exists big_data;
create table big_data (
    id int not null primary key,
    post text
) as select, from posts p;

show table status where Name in('posts', 'small_data', 'big_data');

Name       | Engine | Row_format | Rows   | Avg_row_length | Data_length
big_data   | InnoDB | Compact    | 870341 |           2361 |  2055208960
posts      | InnoDB | Compact    | 838832 |           2627 |  2204106752
small_data | InnoDB | Compact    | 985832 |            229 |   226197504

So there are three tables.

  • posts ~ 2.1 GB. contains all data (id, summary ~ 200 Bytes, post ~ 2000 Bytes).
  • small_data ~ 215 MB. contains (id, summary)
  • big_data ~ 1.9 GB. contains (id, post)
select, p.summary
from posts p
order by id asc
limit 10
offset 500000

First run: 16.552 sec. Second run: 16.723 sec.

select, p.summary
from small_data p
order by id asc
limit 10
offset 500000

First run: 0.702 sec. Second run: 0.093 sec.

You can see, there may be a huge difference. But it depends on your data and your queries. So you should do your own benchmarks.


  • Tested on 10.0.19-MariaDB.
  • seq_1_to_1000000 is a table with 1M sequence numbers. You will need to create it first or use MariaDBs Sequence plugin.
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download