user6789594 user6789594 - 3 months ago 6
SQL Question

PostgreSQL: Foreign key in new tables from existing table

I'm very new to Postgresql and Python and I'm having some issues understanding foreign keys (which I think is what I use here). I've had a look at an example from here but I don't think that is exactly what I need

As a simple example, I have some information in an existing table:

[ID REFERENCE REF_AGE DATA1 DATA 2]
[1 JOHN 50 50 60 ]
[2 JOHN 50 55 30 ]
[3 TOM 60 60 10 ]
[4 MATT 30 76 57 ]
[5 MATT 30 45 47 ]


I want to make two new tables from this. One including the ID and data with a reference_id that links to the other new table - a reference table, where I can store the other information about each reference (e.g. the age above).

Table 1:

[ID REF_ID DATA1 DATA 2]
[1 1 50 60 ]
[2 1 55 30 ]
[3 2 60 10 ]
[4 3 76 57 ]
[5 3 45 47 ]


Table 2:

[REF_ID NAME AGE ]
[1 JOHN 50 ]
[2 TOM 60 ]
[3 MATT 30 ]


Can anyone show me how to split existing data like this? Separate the unique values from the original tables reference column into the new reference table, and insert corresponding ref_id into the other new table?

Answer

There is the recipe. But you have a problem if the person names are not unique.

create table not_normalized (
    id int, reference text, ref_age int, data1 int, data2 int
);

insert into not_normalized (id, reference, ref_age, data1, data2) values
(1,'JOHN',50,50, 60    ),
(2,'JOHN',50,55, 30    ),
(3,'TOM',60,60, 10    ),
(4,'MATT',30,76, 57    ),
(5,'MATT',30,45, 47    );

create table referenced (
    ref_id serial primary key,
    name text,
    age int
);

Selecting the distinct pair (name, age) minimizes the name collision problem:

insert into referenced (name, age)
select distinct reference, ref_age
from not_normalized
;
table referenced;
 ref_id | name | age 
--------+------+-----
      1 | JOHN |  50
      2 | TOM  |  60
      3 | MATT |  30

create table referencer (
    id serial primary key,
    ref_id int references referenced (ref_id),
    data1 int, data2 int
);

Again use the age to minimize collisions:

insert into referencer (ref_id, data1, data2)
select r.ref_id, data1, data2
from
    not_normalized nn
    inner join
    referenced r on r.name = nn.reference and r.age = nn.ref_age
;
table referencer;
 id | ref_id | data1 | data2 
----+--------+-------+-------
  1 |      1 |    50 |    60
  2 |      1 |    55 |    30
  3 |      3 |    76 |    57
  4 |      3 |    45 |    47
  5 |      2 |    60 |    10
Comments