Azure Load Balancer health probes and the four-way handshake

It's always the fun little things that cause you pain.

We've got an Azure Load Balancer running over a RabbitMQ cluster with a health probe set to check port 5672 every 60 seconds.

The RabbitMQ logs were filling up with "handshake_timeout" errors every 60 seconds. Very odd.

Time for a packet capture, where we find the following:

1. Load balancer SYN
2. RabbitMQ SYN-ACK
3. Load Balancer ACK
4. 10 seconds later RabbitMQ RST
5. Another 50 seconds later Load Balancer FIN
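
A capture like the one above can be narrowed to just the connection setup and teardown packets. This is a minimal sketch assuming tcpdump is the capture tool (the original capture may have been taken with something else); the function name is made up for illustration. Note that the filter keeps only packets with SYN, FIN or RST set, so the bare ACK in step 3 would not appear.

```shell
# Hypothetical helper: build a capture filter that keeps only
# SYN, FIN and RST packets for a single TCP port.
probe_capture_filter() {
    printf 'tcp port %d and tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0' "$1"
}
```

It would be used along the lines of `sudo tcpdump -nn -i any "$(probe_capture_filter 5672)"` on one of the RabbitMQ nodes.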


The Azure Load Balancer documentation states that it does a four-way handshake to terminate a probe. What it fails to tell you is that the FIN isn't sent until the start of the next probe.

This leads to RabbitMQ sitting there waiting for data, not getting any in its default 10-second handshake period, terminating the connection and logging it as an error.

So the horrible workaround/compromise was to set the handshake_timeout config in RabbitMQ to 30 seconds and the load balancer interval to 25 seconds.
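
The arithmetic behind that workaround can be sketched as follows. Both function names are hypothetical. The sketch assumes the newer `rabbitmq.conf` format, where `handshake_timeout` is given in milliseconds, and it assumes the FIN arrives one probe interval after the SYN, as seen in the capture.

```shell
# Hypothetical helper: render the rabbitmq.conf line for a timeout
# given in seconds (the setting itself is in milliseconds).
handshake_timeout_line() {
    printf 'handshake_timeout = %d\n' "$(( $1 * 1000 ))"
}

# Hypothetical helper: succeed when the probe interval (arg 1, seconds)
# is shorter than the handshake timeout (arg 2, seconds), i.e. the load
# balancer closes the connection before RabbitMQ gives up on it.
probe_closes_first() {
    [ "$1" -lt "$2" ]
}

handshake_timeout_line 30
probe_closes_first 25 30 && echo "probe closes before the timeout"
probe_closes_first 60 10 || echo "RabbitMQ resets first and logs an error"
```

With the original settings (60 second interval, 10 second timeout) the second check fails, which matches the errors in the log.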

Why on…

Set up a Logstash server on Ubuntu

Pretty self-explanatory and mainly for my own benefit, but easier to follow than the Elastic documentation.

How to set up certificates in Apache

Brief notes on setting up certificates in Apache. More a personal note than a blog post :)

sudo mkdir /etc/apache2/ssl
sudo mkdir /etc/apache2/ssl/private
sudo chmod 755 /etc/apache2/ssl
sudo chmod 710 /etc/apache2/ssl/private
sudo chown -R root:root /etc/apache2/ssl/
sudo chown -R root:ssl-cert /etc/apache2/ssl/private/

Copy cert to /etc/apache2/ssl
Copy key to /etc/apache2/ssl/private

sudo chmod 644 /etc/apache2/ssl/*.crt

sudo -s
sudo chmod 640 /etc/apache2/ssl/private/*.key

sudo a2enmod ssl

Edit config files

sudo nano /etc/apache2/sites-available/000-default.conf

DocumentRoot /var/www/html2
SSLEngine on
SSLCertificateFile /path/to/your_domain_name.crt
SSLCertificateKeyFile /path/to/your_private.key
SSLCertificateChainFile /path/to/DigiCertCA.crt

sudo a2ensite 000-default
sudo apachectl configtest
sudo systemctl restart apache2.service
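
Before restarting, it can be worth checking that the certificate and key actually belong together. This is a minimal sketch, not part of the original notes: the function name is hypothetical, and it assumes an OpenSSL build that provides the `x509` and `pkey` subcommands.

```shell
# Hypothetical helper: succeed when the certificate (arg 1) and the
# private key (arg 2) carry the same public key. Returns 2 if either
# file cannot be read or parsed.
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
    key_pub=$(openssl pkey -in "$2" -pubout) || return 2
    [ "$cert_pub" = "$key_pub" ]
}
```

Run it as a user that can read the key, passing the same paths used for SSLCertificateFile and SSLCertificateKeyFile.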

Filebeat Windows MSI

Elastic don't provide an MSI for Filebeat, which makes deploying via DSC, Puppet, etc. a bit of a pain. To ease this, I created an MSI generator for it and here it is on GitHub. Enjoy! Filebeat MSI generator

Automating a RabbitMQ Cluster Deployment on Windows with PowerShell DSC

This one has been annoying me for a while, but I finally got to the bottom of it!
The solution is really very simple, but with not a lot of love for Rabbit on Windows and nothing I could find on using DSC, it took longer than it really should have.

Hope this helps someone

SCUP 2011 and File Digest errors

This one has been bugging me for quite some time...

Sometimes, when you've published an update from SCUP to WSUS as metadata only, it fails to publish the full content later and gives an error of "incorrect file digest".
What's going on here?
This is where the publisher has created an updated binary for the same release, but not updated the Origin File Digest in the metadata.
When SCUP imports an update, it stores a SHA-1 Base64 hash of the file, which is held in the SCUP database. When you go to publish "full content", it compares the hash of the downloaded file with that in the database. When they don't match, you get the file digest error.
Ultimately, this is a mistake on the part of the vendor, but they're pretty inconsistent at fixing them (Dell being a particular offender).
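
The comparison described above can be reproduced by hand. This is a rough sketch using OpenSSL from a shell, which is not necessarily the tool used originally. Both function names are hypothetical, and it assumes the stored value is the Base64 encoding of the raw 20-byte SHA-1 digest.

```shell
# Hypothetical helper: print the Base64-encoded SHA-1 digest of a file.
sha1_base64() {
    openssl dgst -sha1 -binary "$1" | openssl base64
}

# Hypothetical helper: succeed when the file (arg 1) hashes to the
# digest recorded in the metadata (arg 2).
digest_matches() {
    [ "$(sha1_base64 "$1")" = "$2" ]
}
```

Calling `digest_matches` with the downloaded binary and the digest from the metadata fails when the vendor has replaced the binary without updating the digest.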

So how do we fix it to get that critical update out?

1. Download the binary directly from the vendor
2. Get the Base64-encoded SHA-1 hash of the file (I used a cop…

Repairing non-booting Windows 2012 R2 and others

If you're stuck in a "Preparing Automatic Repair" boot loop that always takes you back to the blue screen of unhelpful menu options:

Disable Automatic repair

Get to the recovery command prompt

bcdedit /enum

Get the name of the Windows entry (likely to be {default})

bcdedit /set {default} recoveryenabled No


Now instead of getting into the loop of a failing repair it'll show you the real problem that it's failing to fix. This is likely to be a corrupt file. In my case it was c:\windows\system32\drivers\cng.sys which I copied from a working server that was at the same patch level.