Tuesday 31 January 2017

Announcing Zipkin Collector for Azure EventHub

If you are reading this, you have probably heard of Zipkin. If not, please take my word to leave this post to spend 10 minutes reading up on it - a very worthwhile 10 minutes which will introduce to you one of the best, yet simplest distributed tracing systems. It one word, it tells you where the time to serve requests been most spent helping you to optimise your Microservice architecture.

Zipkin, used by the likes of Twitter and Netflix, has already created a complete storm in the Java/JVM ecosystem, but many of us in the .NET community have not heard of it - and that is frankly a real pity. And if you have heard it and want to use it, yes of course we can try to port the whole system over to .NET but that would be a huge amount of work and frankly a waste since Zipkin is designed to work across different stacks as long as you can somehow get your data over to it. The data is normally pushed to Kafka, and Zipkin consume messages from Kafka by a component called Collector. Data then gets stored in a storage (currently available for MySQL, Cassandra or Elasticsearch) and then served by the UI.

Of course nothing stops you to run Kafka in your cloud or on-premise environment, but if you have never done it, to say the least, ZooKeeper (a consensus required for running Kafka) is not the easiest service to operate. And frankly if you are on Azure it makes a lot of sense to use EventHub, an Azure PaaS service with functionality very similar to Kafka. Sadly there were no collector for it.

I have been very keen to bring Zipkin to ASOS, but could not quite justify running ZK and Kafka, even for myself. Hence I felt something has to be done about it. The only problem: had never done a Java/Maven project before.

*     *     *

I have been doing what I have been doing - being a professional developer - for some time now. And I have had my ups and downs, both moments that I am proud of and moments of embarrassment because I have messed up. But never, have I just picked up a complete different stack, and built something like what I am going to share, within a couple of weeks. [Yeah I am talking about Zipkin Collector for Azure EventHub]



This really has been a testament to how pluggable and nicely designed-Zipkin is, and above all it has a truly amazing community - championed by Adrian Cole. Help was always around the corner, be it on hardcore stuff such as how to modularise collector or my noob problems with Maven.

Not to forget too, that Azure EventHub SDK basically made it completely trivial to implement a working solution. All the heavy lifting has been done by the EventProcessorHost so all is left is a bit of plumbing to get the configuration over to these components.

*     *     *

How to use EventHub Collector

So the idea is that you would run zipkin-server (which hosts the Zipkin UI) and in the same process you run your collector. Zipkin uses Spring Boot's auto configuration mechanism to load the right collector based on the configurations provided. The project is host on github. [UPDATE: Project has moved to OpenZipkin organisation here]

EventHub Collector gets triggered by the existence of "zipkin.collector.eventhub.eventHubConnectionString" configuration via command line. Rest of the configurations necessary can be passed by an application.properties or application.yaml file.

So to run the EventHub collector you need:

1- zipkin.jar (zipkin-server)
2- application.properties file
3- zipkin-collector-eventhub-autoconfig module jar (which contains transitive dependencies too). This jar is not on maven yet

So in order to run:

1- Clone the source and build

mkdir zipkin-collector-eventhub
cd zipkin-collector-eventhub
git clone git@github.com:aliostad/zipkin-collector-eventhub.git
mvn package

If you do not have maven, get maven here.

2- Unpackage MODULE jar into an empty folder

copy zipkin-collector-eventhub-autoconfig-x.x.x-SNAPSHOT-module.jar (that has been package in the target folder) into an empty folder and unpackage

jar xf zipkin-collector-eventhub-autoconfig-0.1.0-SNAPSHOT-module.jar

You may then delete the jar itself.

3- Download zipkin-server jar


Download the latest zipkin-server jar (which is named zipkin.jar) from here. For more information visit zipkin-server homepage.

4- create an application.properties file for configuration next to the zipkin.jar file

Populate the configuration - make sure the resources (Azure Storage, EventHub, etc) exist. Only storageConnectionString is mandatory the rest are optional and must be used only to override the defaults:

zipkin.collector.eventhub.storageConnectionString=<azure storage connection string>
zipkin.collector.eventhub.eventHubName=<name of the eventhub, default is zipkin>
zipkin.collector.eventhub.consumerGroupName=<name of the consumer group, default is $Default>
zipkin.collector.eventhub.storageContainerName=<name of the storage container, default is zipkin>
zipkin.collector.eventhub.processorHostName=<name of the processor host, default is a randomly generated GUID>
zipkin.collector.eventhub.storageBlobPrefix=<the path within container where blobs are created for partition lease, processorHostName>


5- Run the server along with the collector

Assuming zipkin.jar and application.properties are in the current working directory, run this from the command line (note that the connection string to the eventhub itself is passed in the command line):

java -Dloader.path=/where/jar/was/unpackaged -cp zipkin.jar org.springframework.boot.loader.PropertiesLauncher --spring.config.location=application.properties --zipkin.collector.eventhub.eventHubConnectionString="<eventhub connection string, make sure quoted otherwise won't work>"


After running, spring boot and the rest of the stack gets loaded and then you should be able to see some INFO output from the collector outputting the configuration you have passed.


You should be up and running and can start pushing spans to your EventHub.

Span serialisation guideline

EventHub Collector expects spans serialised as JSON array of spans. The payload gets read as a UTF-8 string and gets deserialised by the zipkin-server components.

Roadmap

Next step is to get the jar on to maven central. Also I will start working on a .NET library to make building spans easier.