Toby Mellor - 4 months ago
PHP Question

Identify which promise failed and dynamically change promise queue in Guzzle 6

I need to download a large number of large files, stored across multiple identical servers. A file, like '5.doc', that is stored on server 3, is also stored on server 55.

To speed this up, instead of using just one server to download all the files one after another, I'm using all servers at the same time. The problem is that one of the servers may be much slower than the others, or may even be down. When using Guzzle to batch download files, all of the files in that batch must be downloaded before another batch starts.

Is there a way to immediately start downloading another file alongside others so that all of the servers are constantly downloading a file?

If a server is down, I've set a timeout of 300 seconds, and when this is reached Guzzle will throw its ConnectException, which I catch.

How do I identify which of the promises (downloads) have failed so I can cancel them? Can I get information about which file/server failed?

Below is a simplified example of the code I'm using to illustrate the point. Thanks for the help!

$filesToDownload = [['5.doc', '8.doc', '10.doc'], ['1.doc', '9.doc']]; // The file names that we need to download
$availableServers = [3, 55, 88]; // Server IDs that are available

foreach ($filesToDownload as $index => $fileBatchToDownload) {
    $promises = [];

    foreach ($availableServers as $key => $availableServer) {
        array_push(
            $promises, $client->requestAsync('GET', 'http://domain.com/' . $fileBatchToDownload[$index][$key], [
                'timeout' => 300,
                'sink' => '/assets/' . $fileBatchToDownload[$index][$key]
            ])
        );

        $database->updateRecord($fileBatchToDownload[$index][$key], ['is_cached' => 1]);
    }

    try {
        $results = Promise\unwrap($promises);
        $results = Promise\settle($promises)->wait();
    } catch (\GuzzleHttp\Exception\ConnectException $e) {
        // When we can't connect to the server or the download didn't finish within the timeout
        foreach ($e->failed() as $failedPromise) {
            // Reset the record in the database to is_cached = 0
            // Delete the file from the server
            // Remove this server from the $availableServers list as it may be down or too slow
            // Re-add this file to the next batch to download in $filesToDownload
        }
    }
}

Answer

I'm not sure how you are downloading one file from multiple servers asynchronously with Guzzle, but you can identify which request failed by using the promise's then() method:

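array_push(
    $promises,
    $client->requestAsync('GET', "http://localhost/file/{$id}", [
        'timeout' => 10,
        'sink' => "/assets/{$id}",
    ])->then(
        // Called when the request succeeds.
        function () {
            echo 'Success';
        },
        // Called when the request fails; $id identifies the failed download.
        function () use ($id) {
            echo "Failed: $id";
        }
    )
);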

then() accepts two callbacks: the first is called on success and the second on failure. The source calls them $onFulfilled and $onRejected. Other usages are described in the Guzzle documentation. This way you can start downloading a file again immediately after it fails.
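To make the two-callback pattern concrete, here is a minimal sketch that records which downloads were rejected. It assumes Guzzle is installed through Composer, and it uses Guzzle's MockHandler to simulate one successful response and one connection failure instead of contacting real servers. The file names, the localhost URL, and the $failed array are illustrative only.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumption: Guzzle installed via Composer

use GuzzleHttp\Client;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Simulated transport: the first request succeeds, the second cannot connect.
$mock = new MockHandler([
    new Response(200),
    new ConnectException('Connection timed out', new Request('GET', 'http://localhost/file/8.doc')),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$failed = [];   // IDs of downloads whose promise was rejected
$promises = [];

foreach (['5.doc', '8.doc'] as $id) {
    // Key each promise by its file ID and capture the ID in both callbacks.
    $promises[$id] = $client->requestAsync('GET', "http://localhost/file/{$id}")->then(
        function () use ($id) {
            return $id; // success: pass the ID along
        },
        function () use ($id, &$failed) {
            $failed[] = $id; // failure: remember which download failed
        }
    );
}

// The rejection callback handled the error, so wait() does not throw here.
foreach ($promises as $promise) {
    $promise->wait();
}

print_r($failed);
```

Because the rejection callback returns normally, the derived promise counts as handled; rethrow the exception inside the callback if the failure should still propagate to wait().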

Can I get information about which file/server failed?

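When a promise is rejected, the request was not fulfilled. In that case you can get the host and the requested path by type-hinting the second then() callback's parameter as RequestException (in Guzzle 6, ConnectException extends RequestException, so connection failures and timeouts match this type hint too):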

use GuzzleHttp\Exception\RequestException;

// ...

array_push(
    $promises,
    $client->requestAsync('GET', "http://localhost/file/{$id}", [
        'timeout' => 10,
        'sink' => "/assets/{$id}",
    ])->then(
        function () {
            echo 'Success';
        },
        // The exception carries the request that failed.
        function (RequestException $e) {
            echo "Host: " . $e->getRequest()->getUri()->getHost(), "\n";
            echo "Path: " . $e->getRequest()->getRequestTarget(), "\n";
        }
    )
);

This gives you full information about the failing host and the file name. If you need more, $e->getRequest() returns an instance of GuzzleHttp\Psr7\Request, and every method of that class is available here (see "Guzzle and PSR-7" in the Guzzle documentation).
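As a small illustration of what the request object exposes, the sketch below builds a PSR-7 request by hand as a stand-in for the value returned by $e->getRequest(). It assumes Guzzle is installed through Composer; the host name server3.example.com and the path are made up for the example.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumption: Guzzle installed via Composer

use GuzzleHttp\Psr7\Request;

// Stand-in for the object returned by $e->getRequest() in the rejection callback.
$request = new Request('GET', 'http://server3.example.com/files/5.doc');

$uri  = $request->getUri();
$host = $uri->getHost();            // which server the request went to
$file = basename($uri->getPath());  // which file was requested

echo $request->getMethod(), ' ', $host, ' ', $file, "\n"; // prints "GET server3.example.com 5.doc"
```

With the host and file name in hand, the rejection callback has what it needs to mark the server as unavailable and to queue the file again.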

When an item is successfully downloaded, can we then immediately start a new file download on this free server, whilst the other files are still downloading?

I think you should decide which files to download only when you create the promises at the very beginning, and retry failed requests within the second callback. Creating new promises whenever one succeeds may lead to an endless process that downloads duplicate files, which is not simple to handle.
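Retrying inside the second callback could look like the sketch below. This is one possible shape, not a definitive implementation: it assumes Guzzle is installed through Composer, the two host names are hypothetical, and MockHandler stands in for the real servers so that the first attempt fails and the retry succeeds. Only a single retry is attempted.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumption: Guzzle installed via Composer

use GuzzleHttp\Client;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Simulated transport: the first request fails to connect, the retry succeeds.
$mock = new MockHandler([
    new ConnectException('Connection timed out', new Request('GET', 'http://server3.example.com/5.doc')),
    new Response(200, [], 'file contents'),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$servers = ['server3.example.com', 'server55.example.com']; // hypothetical hosts
$file = '5.doc';
$log = [];

$promise = $client->requestAsync('GET', "http://{$servers[0]}/{$file}")->then(
    null, // nothing extra to do on success; the response passes through
    function () use ($client, $servers, $file, &$log) {
        $log[] = "retrying {$file} on {$servers[1]}";
        // Returning a promise from the callback chains it: the outer promise
        // now settles with the outcome of the retry on the second server.
        return $client->requestAsync('GET', "http://{$servers[1]}/{$file}");
    }
);

$response = $promise->wait();
echo $response->getStatusCode(), "\n";
```

Because the set of files is fixed when the promises are created and a retry only replaces a request that already failed, no file is requested twice at the same time.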