Jeff Jeff - 1 year ago 84
SQL Question

data grabbing/restructuring speed

This question might be flagged as too broad or opinion-based, but I take the risk...

I have a REST-API in php that gets all data from a mysql table, that also includes 'hasMany' fields. Let's call them a 'post' hasMany 'comments'.

Right now I'm doing ONE select with a LEFT JOIN on comments, then walk through the results to restructure the output to

{ "posts": [
{"id": 1,
"comments": [1,2,3]

All is fine until I have more than one hasMany-field, because then the refacturing gets complicated (now produces double entries) and I would need to loop through the result (not manually, but still with built in functions) several times.

So I thought about refacturing my code to:

1. select the actual item ('post')

2. select all hasMany fields ('comments', 'anythingelse',...) and add the results.

which of course produces loads of action on my db.

So my question is if anybody has a simple answer like 'better grab all the data in one go from a database and do the work in php' or the opposite.

Yes, I could do benchmarks myself. But fist - to be honest I would like to avoid all the reprogramming just to find out it's slower - second i don't know if my benchmark would remain the same on an optimized (and linux) production machine (right now I'm developing on easyPhp on windows).

Some info:
The 'post' table could result in some hundred records, same as the hasMany each. But combined with some hasMany fields it could result in a recordset (with the first aproach) of several thousands.

Answer Source

Use the IN (…) operator.

First, get the relevant posts on their own:

SELECT […stuff…] FROM posts WHERE […conditions…]

Then take the list of post IDs from the results you get there and substitute the whole list into a set of queries of the form:

SELECT […stuff…] FROM comments WHERE post_id IN (1, 2, 3 […etc…])
SELECT […stuff…] FROM anythingelse WHERE post_id IN (1, 2, 3 […etc…])

Running one query per dependent table is fine. It's not significantly more expensive than running a single JOINed query; in fact, it may be less expensive, as there's no duplication of the fields from the parent table.

Make sure the post_id column is indexed on the subtables, of course.