Grief Coder - 3 months ago
C# Question

How to limit the amount of concurrent async I/O operations?

// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", ... };

// now let's send HTTP requests to each of these URLs in parallel
urls.AsParallel().ForAll(async (url) => {
    var client = new HttpClient();
    var html = await client.GetStringAsync(url);
});


Here is the problem: it starts 1000+ simultaneous web requests. Is there an easy way to limit the number of concurrent async HTTP requests, so that no more than 20 web pages are downloaded at any given time? How can this be done in the most efficient manner?

Answer

You can definitely do this in the latest version of async for .NET, using the .NET 4.5 Beta. The previous post from 'usr' points to a good article written by Stephen Toub, but the less-publicized news is that the async semaphore actually made it into the Beta release of .NET 4.5.

If you look at our beloved SemaphoreSlim class (which you should be using since it's more performant than the original Semaphore), it now boasts the WaitAsync(...) series of overloads, with all of the expected arguments - timeout intervals, cancellation tokens, all of your usual scheduling friends :)
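As a rough illustration of those overloads, here is a minimal sketch that waits for a slot with both a timeout and a cancellation token. The class WaitAsyncSketch and the helper TryRunThrottledAsync are hypothetical names made up for this example; only SemaphoreSlim.WaitAsync(TimeSpan, CancellationToken) and Release() come from the framework.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class WaitAsyncSketch
{
    // Hypothetical helper: tries to take a slot within the timeout and
    // reports whether the work actually ran.
    public static async Task<bool> TryRunThrottledAsync(
        SemaphoreSlim throttler, Func<Task> work,
        TimeSpan timeout, CancellationToken token)
    {
        // WaitAsync(TimeSpan, CancellationToken) completes with false if the
        // timeout elapses first, and throws OperationCanceledException if
        // the token is cancelled while waiting.
        if (!await throttler.WaitAsync(timeout, token))
            return false;

        try
        {
            await work();
            return true;
        }
        finally
        {
            // Release only after a successful wait, so the count stays balanced.
            throttler.Release();
        }
    }
}
```

Checking the returned bool matters here: calling Release() after a wait that timed out would hand out a slot that was never taken.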

Stephen's also written a more recent blog post about the new .NET 4.5 goodies that came out with Beta: http://blogs.msdn.com/b/pfxteam/archive/2012/02/29/10274035.aspx

Here's some sample code showing how to use SemaphoreSlim for async method throttling:

async Task MyOuterMethod() {

    // let's say there is a list of 1000+ URLs
    string[] urls = { "http://google.com", "http://yahoo.com", ... };

    // now let's send HTTP requests to each of these URLs in parallel
    List<Task> allTasks = new List<Task>();
    SemaphoreSlim throttler = new SemaphoreSlim(initialCount: 20);
    // one HttpClient shared by all requests; it is safe for concurrent use
    HttpClient client = new HttpClient();
    foreach (var url in urls) {

        // do an async wait until we can schedule again
        await throttler.WaitAsync();

        // using Task.Run(...) to run the lambda in its own parallel
        // flow on the threadpool
        allTasks.Add(Task.Run(async () => {
            try {
                var html = await client.GetStringAsync(url);
            } finally {
                // free the slot even if the request failed
                throttler.Release();
            }
        }));
    }
    }

    // won't get here until all urls have been put into tasks
    await Task.WhenAll(allTasks);

    // won't get here until all tasks have completed in some way
    // (either success or exception)
}
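The same pattern can be pulled out into a reusable helper. The following is a sketch only: Throttle.ForEachAsync and Demo.MeasurePeakAsync are hypothetical names, Task.Delay stands in for the HTTP request, and the helper calls the body directly instead of going through Task.Run, which is a simplification that assumes the body is genuinely asynchronous.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class Throttle
{
    // Hypothetical helper: runs body for every item with at most
    // maxConcurrency invocations in flight at any time.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> body)
    {
        var throttler = new SemaphoreSlim(maxConcurrency);
        var tasks = new List<Task>();
        foreach (var item in items)
        {
            // Wait asynchronously for a free slot before starting more work.
            await throttler.WaitAsync();
            tasks.Add(RunOneAsync(item, body, throttler));
        }
        // Completes (or throws) once every started task has finished.
        await Task.WhenAll(tasks);
    }

    private static async Task RunOneAsync<T>(
        T item, Func<T, Task> body, SemaphoreSlim throttler)
    {
        try { await body(item); }
        finally { throttler.Release(); } // free the slot on success or failure
    }
}

static class Demo
{
    // Returns the highest number of bodies observed running at once.
    public static async Task<int> MeasurePeakAsync(int itemCount, int maxConcurrency)
    {
        int current = 0, peak = 0;
        var gate = new object();
        await Throttle.ForEachAsync(Enumerable.Range(0, itemCount), maxConcurrency, async _ =>
        {
            lock (gate) { current++; if (current > peak) peak = current; }
            await Task.Delay(20); // stands in for the HTTP request
            lock (gate) { current--; }
        });
        return peak;
    }
}
```

Because a body only starts after WaitAsync succeeds and its slot is released only after it finishes, the peak measured by the demo should never exceed maxConcurrency.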

Also worth mentioning is a solution that uses TPL-based scheduling. You can create delegate-bound tasks on the TPL that have not yet been started, and let a custom task scheduler limit the concurrency. In fact, there's an MSDN sample for it here:

http://msdn.microsoft.com/en-us/library/ee789351
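To give a feel for the scheduler-based approach without writing a custom scheduler class, here is a minimal sketch that swaps in the built-in ConcurrentExclusiveSchedulerPair from .NET 4.5, which accepts a maximum concurrency level. This is not the code from the linked sample; the class name SchedulerSketch and the method RunAndMeasurePeak are hypothetical, and Thread.Sleep stands in for blocking work.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class SchedulerSketch
{
    // Runs itemCount synchronous work items on a scheduler capped at
    // maxConcurrency and returns the peak number observed running at once.
    public static int RunAndMeasurePeak(int itemCount, int maxConcurrency)
    {
        // The pair's ConcurrentScheduler runs at most maxConcurrency
        // delegates at a time on top of the default thread pool scheduler.
        var pair = new ConcurrentExclusiveSchedulerPair(
            TaskScheduler.Default, maxConcurrency);
        var factory = new TaskFactory(pair.ConcurrentScheduler);

        int current = 0, peak = 0;
        var gate = new object();

        Task[] tasks = Enumerable.Range(0, itemCount)
            .Select(i => factory.StartNew(() =>
            {
                lock (gate) { current++; if (current > peak) peak = current; }
                Thread.Sleep(20); // stands in for blocking work
                lock (gate) { current--; }
            }))
            .ToArray();

        Task.WaitAll(tasks);
        return peak;
    }
}
```

One caveat with this design: a task scheduler limits how many delegates execute at once, not how many asynchronous operations are in flight. An async lambda gives its slot back at the first await, so for async I/O such as GetStringAsync the SemaphoreSlim approach is likely the better fit.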
