N1h1l1sT N1h1l1sT - 1 month ago 9x
SQL Question

How to Combine Multiple Nested SQL tables into one?

First of all, I should preface this by letting you know that I'm a SQL novice - I've never really used SQL Server before and what I'd like to do must be quite rare or challenging because I've been unable to find any relevant answers on StackOverflow or Google.

I'd really, really appreciate your help on this. In the meantime, I myself am currently trying to improve my SQL knowledge and unearth a way to tackle this - but let's get straight to the point

I'm currently in possession of a SQL Server (which I browse through SQL Server Management Studio) with 4 tables. Everything's in Greek so no point in writing the real names. The point is that each row in Table 1 is associated with multiple rows in Table 2, which in turn is associated with multiple rows in Table 3, which in turn is associated with multiple rows in Table 4

My task is to perform AI/Machine Learning on this multi-instance multi-label problem, but to do that, I have to make it so there is only 1 table containing all the information of all tables.

SQL Server database structure:

  • 4 Tables

  • 3.75 GB

Table 1:

  • Holds information about tasks

  • 100 columns

  • 400,000 rows

  • ID
    is connected to table 2's

Table 2:

  • Each task has multiple sub-tasks (which is what this table holds)

  • 11 columns

  • 2,500,000 rows

  • ID
    is connected to table 3's

Table 3:

  • Each sub-task requires things to be bought or changed or thrown away (held in this table)

  • 8 columns

  • 17,000,000 rows

  • Material_ID
    connected to table 4's

Table 4:

  • Each material has a certain cost and stuff (held in this table)

  • 12 columns

  • 3,700 rows

The way I see it, maybe it needs to happen in stages starting from the bottom to top.

For each row in table 3, there are many associated rows in table 4; hence, each row in table 3 is inserted in a new table as many times as the number of rows associated with it in table 4.

This means that a lot of the information will be duplicated and the 3.75GB will become much bigger, but that's normal and is what the problem needs.

After this happens for table 3 and 4, same thing needs to happen for table 2, and then for table 1. Note that a couple of columns from each table have to not be included in the final table. As I understand it, the only thing this changes is the use of each column's name in the "Select" instead of the asterisk (*). Lastly, remember that I need to actually create a new table because it needs to occur only once and stay available for months to be read by machine learning programs (WEKA, R, etc) and programming libraries (Accord.NET, etc)

The thing is.. how can I combine all these tables into one table that persists?

If I've neglected to share any needed information, please inform me and I shall do so as soon as I see the message.


You use joins to get the information. Tehcnically, you can do something like

JOIN Table2 ON Table1.Table2Id = Table1.ID
JOIN Table3 ON Table2.Table3Id = Table3.ID

Etc. But, you end up with repeats that can mess things up, so you are better to only select the columns you require. The joins here are one way, and will exclude nulls, so you might need other types of joins. The most information comes from a cross join, but it makes a Cartesian product of all of the tables involved, so you have the potential of getting more back than what you require.

Here is a link that explains joins in T-SQL: http://www.techonthenet.com/sql_server/joins.php

It is a good place to get started and may answer your question with a little bit of experimentation on your part.