Michael Andrew Bentley Michael Andrew Bentley - 8 months ago 12
SQL Question

Join rows of table A with columns of table B in MySQL

I have two tables, A and B.

Table A contains a list of samples, which are numbered, but also have character id. It has the form

sample_num sample_id
1 id_1
2 id_2
. .
. .
. .
n id_n

Table B contains gene expression data for each sample. It has the form

probe_num 1 2 . . . n

Ideally I would have samples as rows, and the genes as columns, however, there are too many genes for MySQL to store as columns so I cannot really change this.

Now, I am given a subset of sample id's and asked to return the associated gene expression data. But I can't figure out how to perform the required join. I need to join on the column identifiers from table B with the sample_num column from table A.

This would be easy if I could, for instance, transpose table B, but I don't know if this is possible or not given the limitation on the number of columns.


Regarding what you say in a comment:

The structure of table B isn't fixed actually, which I guess is a bad thing. If a new sample is added, a new column is added to table B, and a new row is added to table A.

This is a clear bad relational design.

Instead, I suggest you to use a different aproach (the one used for n:m relationships)

Table B
probe_num gene_num gene_value
1         1        value_for_gene_1_of_probe_1
1         2        value_for_gene_2_of_probe_1
1         3        value_for_gene_3_of_probe_1
1         4        value_for_gene_4_of_probe_1
...and so on for probe_1
2         1        value_for_gene_1_of_probe_2
2         2        value_for_gene_2_of_probe_2
2         3        value_for_gene_3_of_probe_2
2         4        value_for_gene_4_of_probe_2

Now you can store information of different genes for differents probes. If new data related to new genes appears, there's no need to modify data structure. Just add new lines to the table. Such as

probe_num gene_num gene_value
15        2714     value_for_gene_2714_of_probe_15

Your table will probably have lots of lines (not a problem!)

For getting all gene information related to the probe 15:

WHERE probe_num = 15;

Finally, you can relate it with table A as follows:

JOIN TABLE_A ON TABLE_A.sample_num = TABLE_B.sample_num
WHERE TABLE_A.sample_id LIKE 'id_2';