I have several m x n matrices of gene expression data that I want to store in MySQL.
m is approx 30,000 genes (uniquely identifiable)
n is approx 3,000 samples (mostly uniquely identifiable)
I'm not sure what the best way is to store these data. I initially read the matrices directly into MySQL tables, but I have since been told that this is not a great way to do things, since the number of columns (samples) is a variable quantity. I cannot transpose the matrices and store them that way because there are more genes than MySQL allows for when creating columns.
I've since been told that 'junction tables' might represent a better way to do this. After watching several YouTube videos on these, however, I'm none the wiser. I've also searched Google and there doesn't seem to be a tutorial on storing gene expression data in MySQL using junction tables. So, does anyone have any advice on how best to store these data? I honestly expected that there would be a huge literature on this, so if you have useful links that would also be much appreciated.
You need just a few tables for this, I am using mysql syntax:
CREATE TABLE genes ( `gene_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY, `gene_name` varchar(99) not null )ENGINE=InnoDB; CREATE TABLE samples ( `sample_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY, `sample_name` varchar(99) not null )ENGINE=InnoDB; CREATE TABLE gene_sample ( `gene_id` INT NOT NULL, `sample_id` INT NOT NULL, FOREIGN KEY(`gene_id`) REFERENCES genes (`gene_id`), FOREIGN KEY(`sample_id`) REFERENCES sample (`sample_id`), )ENGINE=InnoDB;
For every gene that occurs in a sample, insert the pair of
sample_id into the
Use two JOIN expressions in a SELECT to reconstruct the full data:
SELECT genes.*, samples.* FROM gene_sample LEFT JOIN genes USING (gene_id) LEFT JOIN samples USING (sample_id);