jay_t jay_t - 2 months ago 19
Python Question

RabbitMQ, Pika and reconnection strategy

I'm using Pika to process data from RabbitMQ.
As I seemed to run into different kind of problems I decided to write a small test application to see how I can handle disconnects.

I wrote this test app which does following:


  1. Connect to Broker, retry until successful

  2. When connected create a queue.

  3. Consume this queue and put result into a python Queue.Queue(0)

  4. Get item from Queue.Queue(0) and produce it back into the broker queue.



What I noticed were 2 issues:


  1. When I run my script from one host connecting to rabbitmq on another host (inside a vm) then this scripts exits on random moments without producing an error.

  2. When I run my script on the same host on which RabbitMQ is installed it runs fine and keeps running.



This might be explained because of network issues, packets dropped although I find the connection not really robust.

When the script runs locally on the RabbitMQ server and I kill the RabbitMQ then the script exits with error: "ERROR pika SelectConnection: Socket Error on 3: 104"

So it looks like I can't get the reconnection strategy working as it should be. Could someone have a look at the code so see what I'm doing wrong?

Thanks,

Jay

#!/bin/python
import logging
import threading
import Queue
import pika
from pika.reconnection_strategies import SimpleReconnectionStrategy
from pika.adapters import SelectConnection
import time
from threading import Lock

class Broker(threading.Thread):
def __init__(self):
threading.Thread.__init__(self)
self.logging = logging.getLogger(__name__)
self.to_broker = Queue.Queue(0)
self.from_broker = Queue.Queue(0)
self.parameters = pika.ConnectionParameters(host='sandbox',heartbeat=True)
self.srs = SimpleReconnectionStrategy()
self.properties = pika.BasicProperties(delivery_mode=2)

self.connection = None
while True:
try:
self.connection = SelectConnection(self.parameters, self.on_connected, reconnection_strategy=self.srs)
break
except Exception as err:
self.logging.warning('Cant connect. Reason: %s' % err)
time.sleep(1)

self.daemon=True
def run(self):
while True:
self.submitData(self.from_broker.get(block=True))
pass
def on_connected(self,connection):
connection.channel(self.on_channel_open)
def on_channel_open(self,new_channel):
self.channel = new_channel
self.channel.queue_declare(queue='sandbox', durable=True)
self.channel.basic_consume(self.processData, queue='sandbox')
def processData(self, ch, method, properties, body):
self.logging.info('Received data from broker')
self.channel.basic_ack(delivery_tag=method.delivery_tag)
self.from_broker.put(body)
def submitData(self,data):
self.logging.info('Submitting data to broker.')
self.channel.basic_publish(exchange='',
routing_key='sandbox',
body=data,
properties=self.properties)
if __name__ == '__main__':
format=('%(asctime)s %(levelname)s %(name)s %(message)s')
logging.basicConfig(level=logging.DEBUG, format=format)
broker=Broker()
broker.start()
try:
broker.connection.ioloop.start()
except Exception as err:
print err

Answer

The main problem with your script is that it is interacting with a single channel from both your main thread (where the ioloop is running) and the "Broker" thread (calls submitData in a loop). This is not safe.

Also, SimpleReconnectionStrategy does not seem to do anything useful. It does not cause a reconnect if the connection is interrupted. I believe this is a bug in Pika: https://github.com/pika/pika/issues/120

I attempted to refactor your code to make it work as I think you wanted it to, but ran into another problem. Pika does not appear to have a way to detect delivery failure, which means that data may be lost if the connection drops. This seems like such an obvious requirement! How can there be no way to detect that basic_publish failed? I tried all kinds of stuff including transactions and add_on_return_callback (all of which seemed clunky and overly complicated), but came up with nothing. If there truly is no way then pika only seems to be useful in situations that can tolerate loss of data sent to RabbitMQ, or in programs that only need to consume from RabbitMQ.

This is not reliable, but for reference, here's some code that solves your multi-thread problem:

import logging
import pika
import Queue
import sys
import threading
import time
from functools import partial
from pika.adapters import SelectConnection, BlockingConnection
from pika.exceptions import AMQPConnectionError
from pika.reconnection_strategies import SimpleReconnectionStrategy

log = logging.getLogger(__name__)

DEFAULT_PROPERTIES = pika.BasicProperties(delivery_mode=2)


class Broker(object):

    def __init__(self, parameters, on_channel_open, name='broker'):
        self.parameters = parameters
        self.on_channel_open = on_channel_open
        self.name = name

    def connect(self, forever=False):
        name = self.name
        while True:
            try:
                connection = SelectConnection(
                    self.parameters, self.on_connected)
                log.debug('%s connected', name)
            except Exception:
                if not forever:
                    raise
                log.warning('%s cannot connect', name, exc_info=True)
                time.sleep(10)
                continue

            try:
                connection.ioloop.start()
            finally:
                try:
                    connection.close()
                    connection.ioloop.start() # allow connection to close
                except Exception:
                    pass

            if not forever:
                break

    def on_connected(self, connection):
        connection.channel(self.on_channel_open)


def setup_submitter(channel, data_queue, properties=DEFAULT_PROPERTIES):
    def on_queue_declared(frame):
        # PROBLEM pika does not appear to have a way to detect delivery
        # failure, which means that data could be lost if the connection
        # drops...
        channel.confirm_delivery(on_delivered)
        submit_data()

    def on_delivered(frame):
        if frame.method.NAME in ['Confirm.SelectOk', 'Basic.Ack']:
            log.info('submission confirmed %r', frame)
            # increasing this value seems to cause a higher failure rate
            time.sleep(0)
            submit_data()
        else:
            log.warn('submission failed: %r', frame)
            #data_queue.put(...)

    def submit_data():
        log.info('waiting on data queue')
        data = data_queue.get()
        log.info('got data to submit')
        channel.basic_publish(exchange='',
                    routing_key='sandbox',
                    body=data,
                    properties=properties,
                    mandatory=True)
        log.info('submitted data to broker')

    channel.queue_declare(
        queue='sandbox', durable=True, callback=on_queue_declared)


def blocking_submitter(parameters, data_queue,
        properties=DEFAULT_PROPERTIES):
    while True:
        try:
            connection = BlockingConnection(parameters)
            channel = connection.channel()
            channel.queue_declare(queue='sandbox', durable=True)
        except Exception:
            log.error('connection failure', exc_info=True)
            time.sleep(1)
            continue
        while True:
            log.info('waiting on data queue')
            try:
                data = data_queue.get(timeout=1)
            except Queue.Empty:
                try:
                    connection.process_data_events()
                except AMQPConnectionError:
                    break
                continue
            log.info('got data to submit')
            try:
                channel.basic_publish(exchange='',
                            routing_key='sandbox',
                            body=data,
                            properties=properties,
                            mandatory=True)
            except Exception:
                log.error('submission failed', exc_info=True)
                data_queue.put(data)
                break
            log.info('submitted data to broker')


def setup_receiver(channel, data_queue):
    def process_data(channel, method, properties, body):
        log.info('received data from broker')
        data_queue.put(body)
        channel.basic_ack(delivery_tag=method.delivery_tag)

    def on_queue_declared(frame):
        channel.basic_consume(process_data, queue='sandbox')

    channel.queue_declare(
        queue='sandbox', durable=True, callback=on_queue_declared)


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print 'usage: %s RABBITMQ_HOST' % sys.argv[0]
        sys.exit()

    format=('%(asctime)s %(levelname)s %(name)s %(message)s')
    logging.basicConfig(level=logging.DEBUG, format=format)

    host = sys.argv[1]
    log.info('connecting to host: %s', host)
    parameters = pika.ConnectionParameters(host=host, heartbeat=True)
    data_queue = Queue.Queue(0)
    data_queue.put('message') # prime the pump

    # run submitter in a thread

    setup = partial(setup_submitter, data_queue=data_queue)
    broker = Broker(parameters, setup, 'submitter')
    thread = threading.Thread(target=
         partial(broker.connect, forever=True))

    # uncomment these lines to use the blocking variant of the submitter
    #thread = threading.Thread(target=
    #    partial(blocking_submitter, parameters, data_queue))

    thread.daemon = True
    thread.start()

    # run receiver in main thread
    setup = partial(setup_receiver, data_queue=data_queue)
    broker = Broker(parameters, setup, 'receiver')
    broker.connect(forever=True)