Dennis Dennis - 2 months ago 24
MySQL Question

Dapper batch queries instead of a single query executed many times

I'm trying to optimize some queries, and I have this crazy one. The basic idea is I get a bunch of rooms which has some corresponding meetings. I currently run a query to get all the rooms, then foreach room I need to get the meetings, where I do a query for each room. This opens up for a lot of database connections (i.e. 1000 rooms each having to open a connection to pull the meetings), and I'd like to do it as a batch instead. I am using dapper to map my queries to models and I'm trying to use the list parameters described here

SELECT
mm.id,
mm.organizer_name as Organizer,
mm.subject as Subject,
mm.start_time as StartTime,
mm.end_time as EndTime,
(mm.deleted_at IS NOT NULL) as WasCancelled,
(am.interactive = 0 AND am.cancelled_at IS NOT NULL) as WasNoShow,
c.name as name
FROM master_meeting mm
LEFT JOIN master_meeting__exchange mme ON mme.id=mm.id
LEFT JOIN master_meeting__forwarded_exchange mmfe ON mmfe.id=mm.id
LEFT JOIN meeting_instance__exchange mie ON mie.meeting_id=mm.id
LEFT JOIN meeting_instance__forwarded_exchange mife ON mife.meeting_id=mm.id
LEFT JOIN appointment_meta__exchange ame ON mie.item_id=ame.item_id
LEFT JOIN appointment_meta__exchange ame2 ON mife.item_id=ame2.item_id
LEFT JOIN appointment_meta am ON am.id=ame.id
LEFT JOIN appointment_meta am2 ON am2.id=ame2.id
LEFT JOIN calendar c on mie.calendar_id=c.id
WHERE mie.calendar_id = @Id OR mife.calendar_id=@Id
AND mm.start_time BETWEEN @StartTime AND @EndTime


Without going into details of the crazy long join sequence, I currently have to do this query, a lot. It has been written up initially as:

List<Result> resultSet = new List<Result>();

foreach(int id in idList){
resultSet.AddRange(
_queryHandler.Handle(
new MeetingQuery(id, "FixedStartTime", "FixedEndTime")
)
);
}


Which in turn calls this a bunch of times and runs the query:

_connection.Query<Meeting>(sql,
new {
Id = query.id,
StartTime = query.StartTime,
EndTime = query.EndTime
}
);


This obviously requires quite a few database connections, and I'd like to avoid this by having dapper doing multiple queries, but I get the following error if I try to add the parameters as a list which looks like this:

class Parameters {
int Id;
string StartTime;
string EndTime;
}
List<Parameters> parameters = new List<Parameters>();
foreach(int id in idList)
parameters.Add(new Parameters(id, "SameStartTime", "SameEndTime");


Then I would use the list of parameters as this:

_connection.Query<Meeting>(sql,parameters);


The error I get is:

dapper Additional information: An enumerable sequence of parameters (arrays, lists, etc) is not allowed in this context

Answer

Firstly, it's possible to reuse a single connection for multiple queries, so you could retrieve all of your data with multiple Dapper "Query" calls using the same connection.

Something like the following (which isn't the exact same query as you showed since I was testing this on my own computer with a local database; it should be easy enough to see how it could be altered to work with your query, though) -

private static IEnumerable<Record> UnbatchedRetrieval(IEnumerable<Parameters> parameters)
{
    var allResults = new List<Record>();
    using (var conn = GetConnection())
    {
        foreach (var parameter in parameters)
        {
            allResults.AddRange(
                conn.Query<Record>(
                    "SELECT Id, Title FROM Posts WHERE Id = @id",
                    parameter
                )
            );
        }
    }
    return allResults;
}

public class Parameters
{
    public int Id { get; set; }
}

However, if it really is the number of queries that you want to reduce through batching then there isn't anything in Dapper that makes it very easy to do since each parameter must be uniquely named, which won't be the case if you provide multiple instances of a type as the "parameters" value (since there will be "n" Id values that are all called "Id", for example).

You could do something a bit hacky to produce a single query string that will return results from multiple parameter sets, such as the following -

private static IEnumerable<Record> BatchedRetrieval(IEnumerable<Parameters> parameters)
{
    using (var conn = GetConnection)
    {
        var select = "SELECT Id, Title FROM Posts";
        var where = "Id = {0}";

        var sqlParameters = new DynamicParameters();
        var combinedWheres =
            "(" +
            string.Join(
                ") OR (",
                parameters.Select((parameter, index) =>
                {
                    sqlParameters.Add("id" + index, parameter.Id);
                    return string.Format(where, "@id" + index);
                })
            ) +
            ")";

        return conn.Query<Record>(
            select + " WHERE " + combinedWheres,
            sqlParameters
        );
    }
}

public class Parameters
{
    public int Id { get; set; }
}

.. but this feels a bit dirty. It might be an option to explore, though, if you are absolutely sure that performing those queries one-by-one is a performance bottleneck.

Another thing to consider - when you need the data for 1000 different ids, are the start and end times always the same for each of the 1000 queries? If so, then you could possibly change your query to the following:

private static IEnumerable<Record> EfficientBatchedRetrieval(
    IEnumerable<int> ids,
    DateTime startTime,
    DateTime endTime)
{
    using (var conn = GetConnection())
    {
        return conn.Query<Record>(
            @"SELECT 
                    mm.id,
                    mm.organizer_name as Organizer,
                    mm.subject as Subject,
                    mm.start_time as StartTime,
                    mm.end_time as EndTime,
                    (mm.deleted_at IS NOT NULL) as WasCancelled,
                    (am.interactive = 0 AND am.cancelled_at IS NOT NULL) as WasNoShow,
                    c.name as name
            FROM master_meeting mm
            LEFT JOIN master_meeting__exchange mme ON mme.id=mm.id
            LEFT JOIN master_meeting__forwarded_exchange mmfe ON mmfe.id=mm.id
            LEFT JOIN meeting_instance__exchange mie ON mie.meeting_id=mm.id
            LEFT JOIN meeting_instance__forwarded_exchange mife ON mife.meeting_id=mm.id
            LEFT JOIN appointment_meta__exchange ame ON mie.item_id=ame.item_id
            LEFT JOIN appointment_meta__exchange ame2 ON mife.item_id=ame2.item_id
            LEFT JOIN appointment_meta am ON am.id=ame.id
            LEFT JOIN appointment_meta am2 ON am2.id=ame2.id
            LEFT JOIN calendar c on mie.calendar_id=c.id
            WHERE mie.calendar_id IN @Ids OR mife.calendar_id IN @Ids
            AND mm.start_time BETWEEN @StartTime AND @EndTime",
            new { Ids = ids, StartTime = startTime, EndTime = endTime }
        );
    }
}

There may be a problem with this if you call it with large numbers of ids, though, due to the way that Dapper converts the IN clause - as described in http://stackoverflow.com/a/19938414/3813189 (where someone warns against using it with large sets of values).

If that approach fails then it might be possible to do something similar to the temporary table bulk load suggested here: http://stackoverflow.com/a/9947259/3813189, where you get all of the keys that you want data for into a temporary table and then perform a query that joins on to that table for the keys (and then deletes it again after you have the data).

Comments