kenorb - 2 months ago
Linux Question

How to fix issues with 'The SSH connection was unexpectedly closed by the remote end'?

My configuration:


  • dedicated server (Ubuntu 16.04 LTS) used for Jenkins (2.7.1) only,

  • 100+ Jenkins jobs, each launching Vagrant instances on AWS (Vagrantfile),

  • each job (provision script) may take 1-2h to run,

  • most of the server config files (such as SSH) have default system configuration.



When I run multiple Jenkins jobs at the same time, they're more likely to fail with this error:

00:00:00.774 + vagrant up --no-provision --destroy-on-error --provider=aws
00:00:09.635 Bringing machine 'MT-aws' up with 'aws' provider...
...
00:01:16.498 MT-aws: Running: inline script
...
00:01:26.415 ==> MT-aws: + echo
00:01:26.415 ==> MT-aws: + sleep 20
00:01:26.427 The SSH connection was unexpectedly closed by the remote end. This
00:01:26.427 usually indicates that SSH within the guest machine was unable to
00:01:26.427 properly start up. Please boot the VM in GUI mode to check whether
00:01:26.427 it is booting properly.
00:01:26.625 Build step 'Execute shell' marked build as failure


Facts:


  • the provisioning script fails at random places (no specific code right before the failure),

  • server is not overloaded and has plenty of free RAM and access to Gbit network,

  • the more jobs I run in parallel, the more likely they are to fail,

  • re-running the same job individually usually works fine,

  • default settings in /etc/ssh/ssh_config, no ~/.ssh/config for Jenkins.






How can I fix the above issue with SSH being unexpectedly closed?

Do I need to increase some SSH timeout settings or something else?

Answer

Open your /etc/ssh/sshd_config file:

# vi /etc/ssh/sshd_config

Modify the settings as follows:

ClientAliveInterval 30
ClientAliveCountMax 5

Where,

ClientAliveInterval: Sets a timeout interval in seconds (30) after which if no data has been received from the client, sshd will send a message through the encrypted channel to request a response from the client. The default is 0, indicating that these messages will not be sent to the client. This option applies to protocol version 2 only.

ClientAliveCountMax: Sets the number of client alive messages (5) which may be sent without sshd receiving any messages back from the client. If this threshold is reached while client alive messages are being sent, sshd will disconnect the client, terminating the session.
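As a rough check of the values above: sshd gives up once ClientAliveCountMax consecutive probes go unanswered, so the silence it tolerates is approximately the interval multiplied by the count. A minimal sketch of that arithmetic (the function name keepalive_timeout is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: approximate seconds of silence before disconnect.
# $1 = alive interval in seconds, $2 = alive count max
keepalive_timeout() {
    echo $(( $1 * $2 ))
}

# Values from the sshd_config example above.
keepalive_timeout 30 5   # prints 150
```

So with these settings an unresponsive client would be dropped after roughly two and a half minutes.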

Save and close the file, then restart sshd, e.g.:

# /etc/init.d/ssh restart

or:

# service sshd restart
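To confirm that the file really contains the values you meant to set, something like the following may help. It is only a sketch under stated assumptions, not part of OpenSSH: get_ssh_option is a made-up name, it only understands the simple "Keyword value" form, and it ignores Match blocks. It prints the first uncommented value, since sshd uses the first value it obtains for a keyword.

```shell
#!/bin/sh
# Hypothetical helper: print the first uncommented value of a keyword
# in an sshd_config-style file. Keywords are matched case-insensitively.
# $1 = path to config file, $2 = keyword
get_ssh_option() {
    awk -v key="$2" 'tolower($1) == tolower(key) { print $2; exit }' "$1"
}

# Demo on a throwaway file so nothing on the system is touched.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
#ClientAliveInterval 0
ClientAliveInterval 30
ClientAliveCountMax 5
EOF

get_ssh_option "$demo_conf" ClientAliveInterval   # prints 30
get_ssh_option "$demo_conf" clientalivecountmax   # prints 5
rm -f "$demo_conf"
```

On a real system the call would look like: get_ssh_option /etc/ssh/sshd_config ClientAliveInterval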

Another option is to enable ServerAliveInterval in the client’s ssh_config file (the client is the machine that initiates the connection, e.g. your workstation):

# vi /etc/ssh/ssh_config

Then append/modify values as follows:

ServerAliveInterval 15
ServerAliveCountMax 3

Where,

ServerAliveInterval: Sets a timeout interval in seconds after which if no data has been received from the server, ssh will send a message through the encrypted channel to request a response from the server.

In the above example, ServerAliveInterval is set to 15 and ServerAliveCountMax is left at its default of 3, so if the server becomes unresponsive, ssh will disconnect after approximately 45 seconds. Again, this option applies to protocol version 2 only.
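If you would rather not change the system-wide file, the same two options can go in a per-user ssh config instead. A minimal sketch, assuming the connections are made by the OpenSSH ssh client running as the user in question; add_keepalive_block is a hypothetical helper name, and tools that ship their own SSH implementation may not read this file at all.

```shell
#!/bin/sh
# Hypothetical helper: append a keep-alive block to an ssh client config
# unless the file already mentions ServerAliveInterval.
# $1 = path to the client config, e.g. "$HOME/.ssh/config"
add_keepalive_block() {
    if grep -qi '^[[:space:]]*ServerAliveInterval' "$1" 2>/dev/null; then
        return 0    # already configured, leave the file alone
    fi
    cat >> "$1" <<'EOF'
Host *
    ServerAliveInterval 15
    ServerAliveCountMax 3
EOF
}

# Demo on a throwaway file rather than a real ~/.ssh/config.
demo_conf=$(mktemp)
add_keepalive_block "$demo_conf"
add_keepalive_block "$demo_conf"   # second call changes nothing
cat "$demo_conf"
rm -f "$demo_conf"
```

The guard keeps the helper safe to run repeatedly, which matters if it ends up in a provisioning or job-setup script.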
