just_a_coder just_a_coder - 1 year ago 67
MySQL Question

An efficient database design for a simple forum using php and mysql

I am in the process of designing a database for forum functionality for my website.After doing some searching on SO and google, I have come up with the following design:
User table

Username : varchar(256)
Password : varchar(256)

Threads table

ThreadId : int
UserId : int, related to Users table
Title : varchar(255)
Date : timestamp, when a thread was created

Posts table

PostId : int
ThreadId : int, related to Threads table
UserId : int, related to Users table
Date : timestamp, when post was made
Title : varchar(255) - post title (optional)
Body : text - the actual body of a post

Even though this serves my purpose, i can't help but think this is not very efficient, especially for selecting all the posts for a particular thread for which it would be necessary to traverse the entire table.

From the top of my head, i can think of a design where the Users table and Threads table remain as they are, but instead of having a single table for the Posts table,I create a Posts table for every user having the same name as the user.This way retrieving all the posts belonging to a thread is a lot more efficient,since all I need is the userid of the person who started the thread.With this info I search the table with the same name to retrieve all posts for the specific thread.But is it a good idea to make the number of tables I create directly dependant on the number of registered users?What i also want to know is, which one of these designs will scale better,will be easier to manage? Is there a better database design for my requirement?

Answer Source

Your design looks basically correct.

This is a classic "normalised" data structure - exactly the kind of shape that relational databases are built for. If you don't know about normal forms, but came up with this structure, then you clearly have a natural understanding of how relational databases work.


In order for PHP to avoid traversing the whole table, you should ensure that you issue a SQL statement that selects only the records you're looking for. E.g.

SELECT * FROM posts WHERE ThreadId = ? ORDER BY Date

Your worry about the database having to traverse the whole table is fair and reasonable, though you can avoid this - this is a classic relational database problem that was solved when they first appeared as commercial products more than 30 years ago.

You can create an INDEX on posts that supports the SQL you are running. In this case something along the lines of:

CREATE INDEX postThreadsIndex ON posts ( ThreadId, Date )

This index allows your database engine to find the records you're selecting very quickly, without having to read the whole table. If you want to know how, read up on b-tree indexes.


As I said earlier in the answer, this is exactly the kind of thing that relational databases were built for, and your design is solid and appropriate.

Do not consider any alternatives - you got it right first time!

But, for completion's sake - let's look at your suggested alternative.

You suggest splitting the Post table by User - this would mean the following:

  • User "UserA" creates a thread - his initial post is stored in posts_UserA
  • User "UserB" responds to the post - his post is storied in posts_UserB
  • User "UserC" responds to the post - her post is storied in posts_UserC

In order to retrieve the full thread you now need to look in posts_UserA, posts_UserB and posts_UserC.

If these were the only three users, then you would need to look through all the data in these threee tables in order to find all the posts, and this would be equal to the number of records that would have been in the table posts in your original design.

You gain nothing.

If you had 1000 other users, you would have to also look in these other 1000 tables in order to find that they didn't have any records.

You still gain nothing.

You could add added another table to store which users had commented on which posts - and therefore which tables to look in, but now the solution starts to get complicated.

You could split the Post table by Thread - this would mean that all the posts where in a table based on the thread that they were made on. This would probably be pretty good for selecting the posts on a single thread, but it would be awful for: - selecting all the posts made by a given user. - finding the most recent post regardless of thread. - finding all the posts made on a particular date. - anything else that did not involve a particular thread.

Basically, the alternative that you suggest may be more efficient for a very particular query, but it will almost always be extremely complicated for any other query.

The original design is simpler for all queries and can be made to perform well by adding indexes.

If you ever get to the point where your SQL performance is too slow because of the volume of data, then you can look at table partitioning, which does what you describe in an invisible way. But to be honest you're unlikely to ever need it unless your site is extremely popular - and if that's the case then you'll probably have the cash to invest in a fundamentals of relational databases course at that point anyway...

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download