Benjamin W Benjamin W - 16 days ago 5
MySQL Question

mysql create post and store images & paragraphs in single table or multiple table

I try to build a post article page contain images and paragraphs.

My question is should I create 2 tables for images and paragraphs?

or create a single table and store images & paragraphs in one column?

*(images & paragraphs will have order from top to bottom)

1. multiple table



//article table

id title
1 first post


//image table (easy to output, but db will become very big)

id article_id src
1 1 abc.jpg


//paragraph table

id article_id text
1 1 this is first paragraph...


2. single table (store tag inside db, output will have xss problem)



//article table

id content
1 <p>feff.....</p><img... /><p>efefef</p><img.../>

Answer

The problem with your first approach is that it has no concept of order. Unless you only have one image and one paragraph in a specific order there's no way to guarantee the correct order. A better approach to such a task would be to have something like articles and article_contents.

// articles

id, title

// article_contents

id, article_id, content_type, body, order

I suppose you could also not have the order field and rely on auto increment ID but I think it's better to be more explicit with these things.

So when rendering the the template you can do something like. (this is all pseudo code)

<h1>
    <?= $article->title; ?>
</h1>

<?php
foreach($article->content as $content) {
    // this would be better as a class constant
    if ($content->content_type === 'image') {
        // have a class to handle rendering of images
        echo \Content\Image::render($content->body);
    }

    if ($content->content_type === 'paragraph') {
        // have a class to handle rendering of content, this output will likely need to be purified (see below)
        echo \Content\Paragraph::render($content->body);
    }
}
?>

There are lots of ways you could do it.

The other option you mentioned does have issues but you can handle the XSS issue one of two ways.

  1. htmlentities() but then you lose all formatting ().
  2. HTML Purifier, this will allow you to whitelist certain tags like bold, italic and paragraph. You will also likely need to do this for the first solution as well because articles are pretty awful to read without any formatting.

Updates based on comments:

In regards to your first comment, you can't rely on the IDs if the IDs are in two separate tables, it might work but it's a very fragile solution that doesn't offer any benefit I can think of.

As for purifying src tags, you just have to configure HTMLpurifier to validate src tags, from the looks of it you can do that with the directive Core.RemoveInvalidImg. HTMLPuirifier has a LOT of these options built in, it's very powerful. See the links bellow for more options/documentation.

https://github.com/GordonLesti/Lesti_Blog/tree/master/lib/HTMLPurifier/ConfigSchema/schema

http://htmlpurifier.org/docs/enduser-customize.html

Further updates:

In regards to the performance. That's what is known as "a good problem to have".

When selecting articles you can cache the resulting HTML and serve that, there's no need to bother the database when the data doesn't change frequently.

When inserting new articles, you just need to wait and build around it. You can do things like update the UI and update the database asynchronously. Keep the user informed, that's the most important thing.

When updating articles don't forget to invalidate the cache.

The main thing to take away though is, that's not really a problem you need to think about right now. Throw a caching layer in now and when you get to the size of SO then you can employ people who are specialists to handle that kind of crazyness. See the link below, it's very interesting.

http://nickcraver.com/blog/2013/11/22/what-it-takes-to-run-stack-overflow/