Terraform RabbitMQ Autocluster

At mytaxi we handle a lot of MQTT traffic back and forth with the taxi driver app. Thousands of connections must be kept open for all online drivers, and the system behind them has to be fast and reliable; otherwise customers might not be able to book a taxi. Our former RabbitMQ setup ran on Amazon Linux. While planning a RabbitMQ update we found that Amazon Linux only provides Erlang R14B04 in its repositories, so Amazon Linux was a dead end. We turned to Docker instead and slightly modified the alpine-rabbitmq-autocluster image for our needs. Together with an AWS Autoscaling Group, this gave us a setup that is easy to provision and scale.

This post will describe how to launch a RabbitMQ Cluster in AWS with the help of Terraform and the RabbitMQ Autocluster Plugin. The cluster only provides MQTT access, but can easily be modified for AMQP. It runs RabbitMQ 3.6.9 on Erlang OTP 19.1 in an Alpine Linux 3.5 container.

Check out the repository to get started.

RabbitMQ

Docker

The docker folder contains all necessary assets for the RabbitMQ Autocluster container. The Autocluster and AWS plugins are placed directly in the docker/plugins folder.

The config enables the following plugins (sketched as an enabled_plugins file after the list):

  • AWS
  • Autocluster
  • Management
  • MQTT
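
In the image this corresponds to an enabled_plugins file along these lines (a sketch; the plugin names assume the rabbitmq-autocluster plugin, which ships autocluster together with its rabbitmq_aws dependency):

    [autocluster, rabbitmq_aws, rabbitmq_management, rabbitmq_mqtt].
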
Instance Userdata

The userdata script is a piece of bash that AWS runs during initial instance startup. It runs as root, so we can install Docker and launch the container. If you work in an AWS region other than eu-west-1, be sure to change the region-specific settings in the userdata file.
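
Stripped of the region-specific parts, the userdata flow looks roughly like this (a sketch, assuming an Amazon Linux base AMI with yum; paths and package names may differ in the actual script):

    #!/bin/bash
    # Install and start Docker (Amazon Linux)
    yum install -y docker
    service docker start
    # Host directory that will back the RabbitMQ data volume
    mkdir -p /mnt/storage
    # ...followed by the docker run command shown below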

Here is the commented Docker run command that starts the container:

docker run -d \
    --name rabbitmq \
    --net=host \ <- Host networking for performance
    --dns-search=eu-west-1.compute.internal \ <- Region specific
    --ulimit nofile=65536:65536 \ <- Increase open file limits
    --restart on-failure:5 \ <- Restart on failure, at most 5 times
    -p 1883:1883 \ <- MQTT
    -p 4369:4369 \ <- epmd (peer discovery)
    -p 5672:5672 \ <- AMQP
    -p 15672:15672 \ <- Management UI
    -p 25672:25672 \ <- Inter-node communication
    -e AUTOCLUSTER_TYPE=aws \ <- Use the AWS peer discovery backend
    -e AWS_AUTOSCALING=true \ <- Discover peers via the Autoscaling Group
    -e AUTOCLUSTER_CLEANUP=true \ <- Remove nodes that left the group
    -e CLEANUP_WARN_ONLY=false \ <- Actually remove them instead of only warning
    -e AWS_DEFAULT_REGION=eu-west-1 \ <- Fill in your region
    -v /mnt/storage:/var/lib/rabbitmq/mnesia \ <- Volume on the host for RabbitMQ data
    hrzbrg/rabbitmq-autocluster

Config Learnings

As of RabbitMQ 3.6.7, background garbage collection is disabled by default. We found that this leads to out-of-memory issues in our setup, so we enabled background GC explicitly in the config:

      {background_gc_enabled, true},
      {background_gc_target_interval, 60000},
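
These settings live inside the rabbit section of rabbitmq.config; a minimal sketch of the surrounding structure, with all other settings omitted:

    [
      {rabbit, [
        {background_gc_enabled, true},
        {background_gc_target_interval, 60000}
      ]}
    ].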

Terraform

A lot of our cluster setups are maintained with Terraform. It gives us the certainty of reproducible infrastructure and instant documentation through infrastructure-as-code.

Open the main.tf to see the building blocks that Terraform will set up; a trimmed sketch of these resources follows the list.

  1. The Autoscaling Group initially consists of 3 servers. The instance type can be configured in the variables.tf as instance_type. In production we went for c3.xlarge.
  2. The Launch Configuration feeds the userdata.sh and SSH key to the instances.
  3. An Elastic Load Balancer sits in front of the cluster and handles SSL termination for MQTT on port 8883.
  4. Two Security Groups are created: one for the ELB that only allows port 8883, and one for the cluster instances that allows traffic from the ELB as well as SSH and RabbitMQ Management access from a manually defined IP range.
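
A trimmed sketch of these resources in HCL (written in current Terraform syntax; resource names, variable names, and most arguments are illustrative rather than copied from the repository, and the two security groups are assumed to be defined elsewhere):

    resource "aws_launch_configuration" "rabbitmq" {
      image_id        = var.ami_id
      instance_type   = var.instance_type # c3.xlarge in our production setup
      key_name        = var.key_name
      security_groups = [aws_security_group.rabbitmq.id]
      user_data       = file("userdata.sh")
    }

    resource "aws_autoscaling_group" "rabbitmq" {
      min_size             = 3
      max_size             = 3
      desired_capacity     = 3
      launch_configuration = aws_launch_configuration.rabbitmq.name
      vpc_zone_identifier  = var.subnet_ids
      load_balancers       = [aws_elb.mqtt.name]
    }

    resource "aws_elb" "mqtt" {
      subnets         = var.subnet_ids
      security_groups = [aws_security_group.elb.id]

      # SSL termination for MQTT: 8883 outside, plain 1883 towards the instances
      listener {
        lb_port            = 8883
        lb_protocol        = "ssl"
        instance_port      = 1883
        instance_protocol  = "tcp"
        ssl_certificate_id = var.ssl_certificate_arn
      }
    }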

Once you have filled in all necessary variables in variables.tf, you are ready to run Terraform to launch the Autoscaling Group.
After a successful Terraform run the RabbitMQ Management Interface will be available on any node via http://$instance_ip:15672.
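
To check that the nodes actually formed a cluster, you can query the cluster status from one of the instances (assuming SSH access and the container name from the run command above):

    docker exec rabbitmq rabbitmqctl cluster_status

The output should list all three nodes under running_nodes.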

Sebastian Herzberg
