TFAtTheMoon TFAtTheMoon - 1 month ago 5
HTTP Question

NginX issues HTTP 499 error after 60 seconds despite config. (PHP and AWS)

At the end of last week I noticed a problem on one of my medium AWS instances where Nginx always returns a HTTP 499 response if a request takes more than 60 seconds. The page being requested is a PHP script

I've spent several days trying to find answers and have tried everything that I can find on the internet including several entries here on Stack Overflow, nothing works.

I've tried modifying the PHP settings, PHP-FPM settings and Nginx settings. You can see a question I raised on the NginX forums on Friday (http://forum.nginx.org/read.php?9,237692) though that has received no response so I am hoping that I might be able to find an answer here before I am forced to moved back to Apache which I know just works.

This is not the same problem as the HTTP 500 errors reported in other entries.

I've been able to replicate the problem with a fresh micro AWS instance of NginX using PHP 5.4.11.

To help anyone who wishes to see the problem in action I'm going to take you through the set-up I ran for the latest Micro test server.

You'll need to launch a new AWS Micro instance (so it's free) using the AMI ami-c1aaabb5

This PasteBin entry has the complete set-up to run to mirror my test environment. You'll just need to change example.com within the NginX config at the end

http://pastebin.com/WQX4AqEU

Once that's set-up you just need to create the sample PHP file which I am testing with which is

<?php
sleep(70);
die( 'Hello World' );
?>


Save that into the webroot and then test. If you run the script from the command line using php or php-cgi, it will work. If you access the script via a webpage and tail the access log /var/log/nginx/example.access.log, you will notice that you receive the HTTP 1.1 499 response after 60 seconds.

Now that you can see the timeout, I'll go through some of the config changes I've made to both PHP and NginX to try to get around this. For PHP I'll create several config files so that they can be easily disabled

Update the PHP FPM Config to include external config files

sudo echo '
include=/usr/local/php/php-fpm.d/*.conf
' >> /usr/local/php/etc/php-fpm.conf


Create a new PHP-FPM config to override the request timeout

sudo echo '[www]
request_terminate_timeout = 120s
request_slowlog_timeout = 60s
slowlog = /var/log/php-fpm-slow.log ' >
/usr/local/php/php-fpm.d/timeouts.conf


Change some of the global settings to ensure the emergency restart interval is 2 minutes

# Create a global tweaks
sudo echo '[global]
error_log = /var/log/php-fpm.log
emergency_restart_threshold = 10
emergency_restart_interval = 2m
process_control_timeout = 10s
' > /usr/local/php/php-fpm.d/global-tweaks.conf


Next, we will change some of the PHP.INI settings, again using separate files

# Log PHP Errors
sudo echo '[PHP]
log_errors = on
error_log = /var/log/php.log
' > /usr/local/php/conf.d/errors.ini

sudo echo '[PHP]
post_max_size=32M
upload_max_filesize=32M
max_execution_time = 360
default_socket_timeout = 360
mysql.connect_timeout = 360
max_input_time = 360
' > /usr/local/php/conf.d/filesize.ini


As you can see, this is increasing the socket timeout to 3 minutes and will help log errors.

Finally, I'll edit some of the NginX settings to increase the timeout's that side

First I edit the file /etc/nginx/nginx.conf and add this to the http directive
fastcgi_read_timeout 300;

Next, I edit the file /etc/nginx/sites-enabled/example which we created earlier (See the pastebin entry) and add the following settings into the server directive

client_max_body_size 200;
client_header_timeout 360;
client_body_timeout 360;
fastcgi_read_timeout 360;
keepalive_timeout 360;
proxy_ignore_client_abort on;
send_timeout 360;
lingering_timeout 360;


Finally I add the following into the location ~ .php$ section of the server dir

fastcgi_read_timeout 360;
fastcgi_send_timeout 360;
fastcgi_connect_timeout 1200;


Before retrying the script, start both nginx and php-fpm to ensure that the new settings have been picked up. I then try accessing the page and still receive the HTTP/1.1 499 entry within the NginX example.error.log.

So, where am I going wrong? This just works on apache when I set PHP's max execution time to 2 minutes.

I can see that the PHP settings have been picked up by running phpinfo() from a web-accessible page. I just don't get, I actually think that too much has been increased as it should just need PHP's max_execution_time, default_socket_timeout changed as well as NginX's fastcgi_read_timeout within just the server->location directive.

Update 1



Having performed some further test to show that the problem is not that the client is dying I have modified the test file to be

<?php
file_put_contents('/www/log.log', 'My first data');
sleep(70);
file_put_contents('/www/log.log','The sleep has passed');
die('Hello World after sleep');
?>


If I run the script from a web page then I can see the content of the file be set to the first string. 60 seconds later the error appears in the NginX log. 10 seconds later the contents of the file changes to the 2nd string, proving that PHP is completing the process.

Update 2



Setting fastcgi_ignore_client_abort on; does change the response from a HTTP 499 to a HTTP 200 though nothing is still returned to the end client.

Update 3



Having installed Apache and PHP (5.3.10) onto the box straight (using apt) and then increasing the execution time the problem does appear to also happen on Apache as well. The symptoms are the same as NginX now, a HTTP200 response but the actual client connection times out before hand.

I've also started to notice, in the NginX logs, that if I test using Firefox, it makes a double request (like this PHP script executes twice when longer than 60 seconds). Though that does appear to be the client requesting upon the script failing

Answer

The cause of the problem is the Elastic Load Balancers on AWS. They, by default, timeout after 60 seconds of inactivity which is what was causing the problem.

So it wasn't NginX, PHP-FPM or PHP but the load balancer.

To fix this, simply go into the ELB "Description" tab, scroll to the bottom, and click the "(Edit)" link beside the value that says "Idle Timeout: 60 seconds"

Comments