I am in the process of designing a database for forum functionality for my website.After doing some searching on SO and google, I have come up with the following design:
Username : varchar(256)
Password : varchar(256)
ThreadId : int
UserId : int, related to Users table
Title : varchar(255)
Date : timestamp, when a thread was created
PostId : int
ThreadId : int, related to Threads table
UserId : int, related to Users table
Date : timestamp, when post was made
Title : varchar(255) - post title (optional)
Body : text - the actual body of a post
Your design looks basically correct.
This is a classic "normalised" data structure - exactly the kind of shape that relational databases are built for. If you don't know about normal forms, but came up with this structure, then you clearly have a natural understanding of how relational databases work.
In order for PHP to avoid traversing the whole table, you should ensure that you issue a SQL statement that selects only the records you're looking for. E.g.
SELECT * FROM posts WHERE ThreadId = ? ORDER BY Date
Your worry about the database having to traverse the whole table is fair and reasonable, though you can avoid this - this is a classic relational database problem that was solved when they first appeared as commercial products more than 30 years ago.
You can create an INDEX on posts that supports the SQL you are running. In this case something along the lines of:
CREATE INDEX postThreadsIndex ON posts ( ThreadId, Date )
This index allows your database engine to find the records you're selecting very quickly, without having to read the whole table. If you want to know how, read up on b-tree indexes.
As I said earlier in the answer, this is exactly the kind of thing that relational databases were built for, and your design is solid and appropriate.
Do not consider any alternatives - you got it right first time!
But, for completion's sake - let's look at your suggested alternative.
You suggest splitting the Post table by User - this would mean the following:
In order to retrieve the full thread you now need to look in posts_UserA, posts_UserB and posts_UserC.
If these were the only three users, then you would need to look through all the data in these threee tables in order to find all the posts, and this would be equal to the number of records that would have been in the table posts in your original design.
You gain nothing.
If you had 1000 other users, you would have to also look in these other 1000 tables in order to find that they didn't have any records.
You still gain nothing.
You could add added another table to store which users had commented on which posts - and therefore which tables to look in, but now the solution starts to get complicated.
You could split the Post table by Thread - this would mean that all the posts where in a table based on the thread that they were made on. This would probably be pretty good for selecting the posts on a single thread, but it would be awful for: - selecting all the posts made by a given user. - finding the most recent post regardless of thread. - finding all the posts made on a particular date. - anything else that did not involve a particular thread.
Basically, the alternative that you suggest may be more efficient for a very particular query, but it will almost always be extremely complicated for any other query.
The original design is simpler for all queries and can be made to perform well by adding indexes.
If you ever get to the point where your SQL performance is too slow because of the volume of data, then you can look at table partitioning, which does what you describe in an invisible way. But to be honest you're unlikely to ever need it unless your site is extremely popular - and if that's the case then you'll probably have the cash to invest in a fundamentals of relational databases course at that point anyway...