Paul Paul - 1 year ago 62
SQL Question

Determine dependencies beetween colums in table

I have a table in a denormalized form like below:

Col1 Col2 Col3 Col4 Col5
Paris France Europe 1 4
Paris France Europe 2 5
Paris France Europe 3 6
Washington D.C. USA North America 8 9
Washington D.C. USA North America 7 7
many more rows

In order to normalize it I need to understand the structure of data.

Presumably there is a logical dependency from Col3 to Col2 and from Col2 to Col1. Paris isthe capital of France and France is a country in europe.

How can I prove this with a SQL query? Basically I need to prove that there are combinations like "Paris - France - Europe", "Washington D.C. - USA - North America" and so on, but never "Paris - USA - Europe" or "Washington D.C. - USA - Europe" for example. Actually the query should also prove right if I find something like "Berlin - Germany - Africa" in my DB, as long as I do not find "Berlin - Germany - Europe".

Answer Source

You can use aggregation:

select col3, count(*), count(distinct col2);
from t
group by col3;

The expectation is that the second column would have a value of "1". You could get all examples with multiple values in col2 by using having count(distinct col2) > 1.

Of course, cities do have the same name. Paris, for instance, is a rather well-known city in Texas.