Stephen H. Anderson Stephen H. Anderson - 3 months ago 5x
MySQL Question

How to store a data whose type can be numeric, date or string in mysql

We're developing a monitoring system. In our system values are reported by agents running on different servers. This observations reported can be values like:

  • A numeric value. e.g. "CPU USAGE" = 55. Meaning 55% of the CPU is in

  • Certain event was fired. e.g. "Backup completed".

  • Status: e.g. SQL Server is offline.

We want to store this observations (which are not know in advance and will be added dynamically to the system without recompiling).

We are considering adding different columns to the observations table like this:

IntMeasure -> INTEGER
FloatMeasure -> FLOAT
Status -> varchar(255)

So if the value we whish to store is a number we can use IntMeasure or FloatMeasure according to the type. If the value is a status we can store the status literal string (or a status id if we decide to add a Statuses(id, name) table).

We suppose it's possible to have a more correct design but would probably become to slow and dark due to joins and dynamic table names depending on types? How would a join work if we can't specify the tables in advance in the query?


I haven't done a formal study, but from my own experience I would guess that more than 80% of database design flaws are generated from designing with performance as the most important (if not only) consideration.

If a good design calls for multiple tables, create multiple tables. Don't automatically assume that joins are something to be avoided. They are rarely the true cause of performance problems.

The primary consideration, first and foremost in all stages of database design, is data integrity. "The answer may not always be correct, but we can get it to you very quickly" is not a goal any shop should be working toward. Once data integrity has been locked down, if performance ever becomes an issue, it can be addressed. Don't sacrifice data integrity, especially to solve problems that may not exist.

With that in mind, look at what you need. You have observations you need to store. These observations can vary in the number and types of attributes and can be things like the value of a measurement, the notification of an event and the change of a status, among others and with the possibility of future observations being added.

This would appear to fit into a standard "type/subtype" pattern, with the "Observation" entry being the type and each type or kind of observation being the subtype, and suggests some form of type indicator field such as:

create table Observations(
   ObservationKind  char( 1 ) check( ObservationKind in( 'M', 'E', 'S' )),

But hardcoding a list like this in a check constraint has a very low maintainability level. It becomes part of the schema and can be altered only with DDL statements. Not something your DBA is going to look forward to.

So have the kinds of observations in their own lookup table:

ID  Name         Meaning
==  ===========  =======
M   Measurement  The value of some system metric (CPU_Usage).
E   Event        An event has been detected.
S   Status       A change in a status has been detected.

(The char field could just as well be int or smallint. I use char here for illustration.)

Then fill out the Observations table with a PK and the attributes that would be common to all observations.

create table Observations(
   ID               int identity primary key,
   ObservationKind  char( 1 ) not null,
   DateEntered      date not null,
   constraint FK_ObservationKind foreign key( ObservationKind )
      references ObservationKinds( ID ),
   constraint UQ_ObservationIDKind( ID, ObservationKind )

It may seem strange to create a unique index on the combination of Kind field and the PK, which is unique all by itself, but bear with me a moment.

Now each kind or subtype gets its own table. Note that each kind of observation gets a table, not the data type.

create table Measurements(
    ID                   int not null,
    ObservationKind      char( 1 ) check( ObservationKind = 'M' ),
    Name                 varchar( 32 ) not null, -- Such as "CPU Usage"
    Value                double not null, -- such as 55.00
    ...,  -- other attributes of Measurement observations
    constraint PK_Measurements primary key( ID, ObservationKind ),
    constraint FK_Measurements_Observations foreign key( ID, ObservationKind )
        references Observations( ID, ObservationKind )

The first two fields will be the same for the other kinds of observations except the check constraint will force the value to the appropriate kind. The other fields may differ in number, name and data type.

Let's examine an example tuple that may exist in the Measurements table:

ID    ObservationKind  Name       Value  ...
====  ===============  =========  =====
1001  M                CPU Usage  55.0   ...

In order for this tuple to exist in this table, a matching entry must first exist in the Observations table with an ID value of 1001 and an observation kind value of 'M'. No other entry with an ID value of 1001 can exist in either the Observations table or the Measurements table and cannot exist at all in any other of the "kind" tables (Events, Status). This works the same way for all the kind tables.

I would further recommend creating a view for each kind of observation which will provide a join of each kind with the main observation table:

create view MeasurementObservations as
    select ...
    from   Observations o
    join   Measurements m
        on m.ID = o.ID;

Any code that works solely with measurements would need to only hit this view instead of the underlying tables. Using views to create a wall of abstraction between the application code and the raw data greatly enhances the maintainability of the database.

Now the creation of another kind of observation, such as "Error", involves a simple Insert statement to the ObservationKinds table:

F   Fault        A fault or error has been detected.

Of course, you need to create a new table and view for these error observations, but doing so will have no impact on existing tables, views or application code (except, of course, to write the new code to work with the new observations).