Piwwoli - 6 months ago
SQL Question

MySQL: Should I prefer a lot of columns or a lot of rows

My situation is kind of hard to explain, but I will try.

I have, say, 50 meters that each output a value every minute. Currently, every minute 50 rows are added to a table, one per meter, like so:

50 rows like this every minute:

id, datetime, meternumber, metervalue


Then I have another table that has:

id, meternumber, metername


So that I can join a meter name to each meter number. Basic stuff.

However, this is quickly becoming a problem, because 50 rows every minute adds up to a huge number of rows. I also have to do some averaging and summing of the meter values, and that is neither easy nor efficient: SUM and AVG do not seem to work well here, because I effectively have to sum every 50th row in order to aggregate each meter's values separately.

So I was thinking, should I instead make 50 columns like so:

id, datetime, meter_1_value, meter_2_value, ..., meter_n_value


Which would basically divide the number of rows by 50 and make summing a lot easier, because now I can just sum every row to get:

id, datetime, meter_1_value_summed, meter_2_value_summed, ..., meter_n_value_summed


So, is it a good idea to basically turn those 50 rows into columns of the table? I also tried doing this at query time by first building a temporary table that turns the rows into columns and then querying that table, but as there are already around 15 million rows, it is getting slower and slower.

Indexing, partitioning and increasing the server specs did not do much, so I am starting to run out of ideas.

I know you guys want to see actual table data for some reason, so here is an example:

DateTime TagName Value
-------- ------- -----
2016-06-02 16:24:51 meter_1_name 66.232818603515625
2016-06-02 16:24:51 meter_2_name 42.3612060546875
2016-06-02 16:24:51 meter_3_name 25.111988067626953
2016-06-02 16:24:51 meter_4_name 4.296875
2016-06-02 16:24:51 meter_5_name NULL
2016-06-02 16:24:51 meter_6_name 3.5083911418914795
2016-06-02 16:24:51 meter_7_name 46.137149810791016
2016-06-02 16:24:51 meter_8_name 71.419265747070312
2016-06-02 16:24:51 meter_9_name 68.337669372558594
2016-06-02 16:24:51 meter_10_name 3.1090855598449707
2016-06-02 16:24:51 meter_11_name 3.0222799777984619
2016-06-02 16:24:51 meter_12_name 2.3900461196899414
2016-06-02 16:24:51 meter_13_name 44.856769561767578
2016-06-02 16:24:51 meter_14_name 64.431419372558594
2016-06-02 16:24:51 meter_15_name 34.657115936279297
2016-06-02 16:24:52 meter_1_name 66.232818603515625
2016-06-02 16:24:52 meter_2_name 42.3612060546875
2016-06-02 16:24:52 meter_3_name 25.111988067626953
2016-06-02 16:24:52 meter_4_name 4.296875
2016-06-02 16:24:52 meter_5_name NULL
2016-06-02 16:24:52 meter_6_name 3.5083911418914795
2016-06-02 16:24:52 meter_7_name 46.137149810791016
2016-06-02 16:24:52 meter_8_name 71.419265747070312
2016-06-02 16:24:52 meter_9_name 68.337669372558594
2016-06-02 16:24:52 meter_10_name 3.1090855598449707
2016-06-02 16:24:52 meter_11_name 3.0222799777984619
2016-06-02 16:24:52 meter_12_name 2.3900461196899414
2016-06-02 16:24:52 meter_13_name 44.856769561767578
2016-06-02 16:24:52 meter_14_name 64.431419372558594
2016-06-02 16:24:52 meter_15_name 34.657115936279297
2016-06-02 16:24:53 meter_1_name 66.232818603515625
2016-06-02 16:24:53 meter_2_name 42.3612060546875
2016-06-02 16:24:53 meter_3_name 25.111988067626953
2016-06-02 16:24:53 meter_4_name 4.296875
2016-06-02 16:24:53 meter_5_name NULL
2016-06-02 16:24:53 meter_6_name 3.5083911418914795
2016-06-02 16:24:53 meter_7_name 46.137149810791016
2016-06-02 16:24:53 meter_8_name 71.419265747070312
2016-06-02 16:24:53 meter_9_name 68.337669372558594
2016-06-02 16:24:53 meter_10_name 3.1090855598449707
2016-06-02 16:24:53 meter_11_name 3.0222799777984619
2016-06-02 16:24:53 meter_12_name 2.3900461196899414
2016-06-02 16:24:53 meter_13_name 44.856769561767578
2016-06-02 16:24:53 meter_14_name 64.431419372558594
2016-06-02 16:24:53 meter_15_name 34.657115936279297


And I was thinking of turning that into this:

DateTime meter_1_value meter_2_value meter_3_value meter_4_value
-------- ------------- ------------- ------------- -------------
2016-06-02 16:24:51 66.2328186035 42.36146875 21.111986762693 5.29687
2016-06-02 16:24:52 70.2328186035 43.36146875 22.111988062695 2.29685
2016-06-02 16:24:53 80.2328186035 40.36120465 23.111988762653 8.29675
2016-06-02 16:24:54 90.2328186035 49.36120685 24.111986762693 5.29875


So as you can see, there would be A LOT fewer rows, and sum/average can be done much more easily this way. Figuring out which value belongs to which meter will not be a problem in this case.

Edit: The row->column query is hacky and looks like this:

DROP VIEW IF EXISTS v_temp;

CREATE OR REPLACE VIEW v_temp AS
(
SELECT m.`datetime`,

MAX(IF(meternumber = 1, metervalue, NULL)) AS meter1,
MAX(IF(meternumber = 2, metervalue, NULL)) AS meter2,
MAX(IF(meternumber = 3, metervalue, NULL)) AS meter3

FROM meters m

WHERE m.`datetime` >= CAST('2016-05-09 00:00:00' AS DATETIME)
AND m.`datetime` <= CAST('2016-05-11 23:59:00' AS DATETIME)

GROUP BY `datetime`
);

SELECT `datetime`,

ROUND(AVG(meter1), 0) AS meter1_avg,
ROUND(AVG(meter2), 0) AS meter2_avg,
ROUND(AVG(meter3), 0) AS meter3_avg

FROM v_temp

GROUP BY DATE(`datetime`), HOUR(`datetime`), MINUTE(`datetime`)
ORDER BY `datetime` ASC;

Answer

So is this a good idea to basically turn those 50 rows into columns of the table?

In your position I would keep the existing structure and add a summary table that maintains, for each meter, the number of records and their sum. I would not bother storing an average, because that can be calculated quite easily from the count and the sum.
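A minimal sketch of what I mean (table and column names here are illustrative, not taken from your schema):

```sql
-- One row per meter; updated as readings arrive.
CREATE TABLE meter_summary (
    meter_number INT PRIMARY KEY,
    num_records  BIGINT NOT NULL DEFAULT 0,
    summation    DOUBLE NOT NULL DEFAULT 0
);

-- The average is derived on the fly, so it never needs storing:
SELECT meter_number,
       summation / num_records AS average
FROM meter_summary;
```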

Which would basically divide the amount of rows by 50 and make summing a lot easier, because now I can just sum every row to get:

id, datetime, meter_1_value_summed, meter_2_value_summed, ..., meter_n_value_summed

So is this a good idea to basically turn those 50 rows into columns of the table?

I believe this would not give you a definite advantage, because you would still be doing a full table scan and calculating across all 50 columns. That could turn out to be much slower than the current summation.

Using an index, as suggested by Gordon, will help you get the sum and average for a single meter, but if you need to sum and average all the meters you would still be reading the full table. Slow.
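For the single-meter case, something like the following composite index would let MySQL answer the aggregate from the index alone, without touching the table rows (again assuming your column names):

```sql
-- Covering index for per-meter aggregates over a time range.
CREATE INDEX idx_meter_datetime
    ON meters (meternumber, `datetime`, metervalue);

-- This query can then be satisfied entirely from the index:
SELECT SUM(metervalue) AS total,
       AVG(metervalue) AS average
FROM meters
WHERE meternumber = 7
  AND `datetime` >= '2016-05-09 00:00:00'
  AND `datetime` <= '2016-05-11 23:59:00';
```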

The summary table.

My proposed summary table would be something like this:

meter_number, num_records, summation

You would update this table with the aid of a trigger, so each update is a trivial addition. Retrieving the sum and average is then a trivial query: you only need to read 50 records, and the only calculation is summation/num_records for the average.
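The trigger could look something like this in MySQL (a sketch only; it assumes the `meter_summary` table above and your `meters` table with `meternumber`/`metervalue` columns, and it skips NULL readings so they do not distort the average):

```sql
DELIMITER //

CREATE TRIGGER trg_meters_after_insert
AFTER INSERT ON meters
FOR EACH ROW
BEGIN
    IF NEW.metervalue IS NOT NULL THEN
        -- Upsert: create the meter's row on first sight,
        -- otherwise bump the count and running sum.
        INSERT INTO meter_summary (meter_number, num_records, summation)
        VALUES (NEW.meternumber, 1, NEW.metervalue)
        ON DUPLICATE KEY UPDATE
            num_records = num_records + 1,
            summation   = summation + NEW.metervalue;
    END IF;
END//

DELIMITER ;
```

Note that this maintains an all-time count and sum; if you also need per-hour or per-day averages, you would add the time bucket (e.g. `DATE(datetime)` plus the hour) to the summary table's primary key.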