Brujo has been a programmer since the age of 10 and has more than 15 years of experience in the industry. He was a VB, .NET, Java and Haskell programmer until he found Erlang 8 years ago. He's now Inaka's CTO and Erlang Solutions' Tech Lead and Trainer. He's an active member of the open-source community, and his blog (Erlang Battleground on Medium) was the most active Erlang blog of 2016.
24 October 2016
Iterative Performance Benchmarking of Apache Kafka – Part 2
In Part 1 of this series of posts on performance benchmarking Apache Kafka for the purpose of hotspot analysis, the profiling of the codebase was confined solely to the server-side element. In this follow-up a similar benchmark script and approach is used, but instead of adaptively instrumenting and profiling the broker we now turn attention to the producer of the benchmark load – the client-side Java library. Both the server (broker) and producer scripts look up the KAFKA_OPTS environment variable, so to switch from server-side to client-side profiling all that is needed is to set the variable for the producer script rather than the broker script. Everything else stays pretty much the same: the script still loops over the benchmark a number of times, and with each iteration it redefines the coverage of the instrumentation.
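As a sketch of that switch (the agent jar path, topic name, and broker address below are placeholders, not the actual Satoris distribution layout), setting KAFKA_OPTS for the producer perf-test script instead of the broker might look like:

```shell
# Placeholder agent path - substitute the real Satoris javaagent jar.
export KAFKA_OPTS="-javaagent:/opt/satoris/satoris-javaagent.jar"

# With the broker already running *without* KAFKA_OPTS set, the agent now
# attaches to the client-side JVM driving the benchmark load:
bin/kafka-producer-perf-test.sh \
  --topic bench \
  --num-records 10000000 \
  --record-size 100 \
  --throughput 100000 \
  --producer-props bootstrap.servers=localhost:9092
```

To profile the broker instead, export KAFKA_OPTS in the shell that starts the server and leave it unset for the producer.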
After 10 iterations of the benchmark with hotspot thresholds of 5 and 1 microseconds, for inclusive and exclusive times respectively, the instrumentation set is reduced from thousands of methods to fewer than 10.
Increasing the benchmark parameters tenfold (--num-records from 10,000,000 to 100,000,000 and --throughput from 100,000 to 1,000,000) results in a slightly different performance model following a complete re-execution of the benchmark iterations. Strangely enough, we've lost both poll methods in this run; they were eliminated during the second iteration. This can happen when probes (methods) are on the borderline of a threshold, or when the execution cost has multi-phase characteristics. There are sophisticated ways to deal with such cases, but that's for another post.
The above hotspot list is a bit on the small side; a more typical list falls in the range of 25-50 probes. Please note that the smaller the list, the greater the overstatement of the exclusive (inherent/self) time, as both direct and indirect callee method invocations are removed from the instrumentation set and measurement collection, with their timing moving upwards back through the caller chain. The reality is very simple: you can't have all (coverage) the cake (instrumentation) and eat (measurement) it all (collection). You, or more specifically the machine rather than the man, need to be very selective. In doing so you benefit, as the final list of hotspots revealed are those points at which it is most suitable to inject some form of flow control or an actual code change. Conversely, the smaller the list the more accurate the inclusive time is.
In the metering table columns, "(I)" stands for inherent, not inclusive. Inherent (self) time is the time exclusive to the probe, at least in terms of what was instrumented and measured.
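The relationship between inclusive and inherent time, and why pruning callees inflates a caller's inherent time, can be sketched with a toy metering stack (this is illustrative only, not Satoris code; all names are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy probe metering: inclusive time is wall time between enter and exit;
// inherent (exclusive/self) time is inclusive time minus the inclusive time
// of instrumented callees.
public class ProbeTimes {
  static final class Frame {
    final String name;
    final long start;
    long childTime; // summed inclusive time of instrumented callees
    Frame(String name, long start) { this.name = name; this.start = start; }
  }

  private final Deque<Frame> stack = new ArrayDeque<>();
  final Map<String, long[]> totals = new HashMap<>(); // name -> {inclusive, inherent}

  void enter(String probe, long nowNanos) {
    stack.push(new Frame(probe, nowNanos));
  }

  void exit(long nowNanos) {
    Frame f = stack.pop();
    long inclusive = nowNanos - f.start;
    long inherent = inclusive - f.childTime;
    long[] t = totals.computeIfAbsent(f.name, k -> new long[2]);
    t[0] += inclusive;
    t[1] += inherent;
    Frame caller = stack.peek();
    if (caller != null) caller.childTime += inclusive;
    // If a callee is removed from the instrumentation set, its time is no
    // longer subtracted from the caller's inherent time - which is exactly
    // the overstatement described above.
  }
}
```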
Let's rerun the benchmark again, this time with the inclusive hotspot threshold set to 2. The inclusive threshold tends to be much bigger than the exclusive threshold in order to prune away leaf nodes in the call tree, so as to expose the most suitable points for code changes, which are not necessarily the actual bottleneck points. An example would be a loop making a relatively inexpensive call N times. This can be seen in the table below, with KafkaProducer.doSend now appearing as a hotspot.
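The loop case mentioned above can be illustrated with a hypothetical pair of methods (not Kafka code): each individual call is well under any exclusive threshold, yet the caller accumulates a large inclusive time, making the caller the suitable point for a code change or flow control.

```java
// Illustrative only: cheapCall is trivially cheap per invocation, so it never
// crosses an exclusive hotspot threshold; the cost only becomes visible in
// the inclusive time of caller(), which loops over it n times.
public class LoopHotspot {
  static long cheapCall(long x) {
    return x * 31 + 7; // negligible cost per call
  }

  static long caller(int n) {
    long acc = 0;
    for (int i = 0; i < n; i++) {
      acc = cheapCall(acc); // n cheap calls add up to one expensive caller
    }
    return acc;
  }
}
```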
The list of hotspots can be increased by further reducing the inclusive hotspot threshold from 2 microseconds to 1, making the exclusive and inclusive thresholds the same. Normally this is not recommended when doing an ad hoc, one-off profile of a system, as it takes the adaptive metering strategy much longer to filter out instrumentation and measurement noise within the Java runtime. But with the iterative refinement of the code coverage across multiple benchmarks there is no need to be overly concerned about any possible overhead residue caused by the instrumentation and measurement of non-hotspots. That is not to say some of the timings will not be slightly overstated, which is why it is far better to focus on relative rankings rather than actual numbers. Start at the top and work down through the list of hotspot probes, inspecting the source code for each in turn.
Looking over the source code for the hotspots identified by Satoris, there are a number of common cost concerns: synchronization, allocation and logging.
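The kinds of remedies these concerns suggest can be sketched as follows (this is illustrative, not Kafka's actual code): guard log statements so message strings are not allocated when the level is disabled, and prefer a lock-free CAS-based counter over a synchronized block where possible.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical append path showing two of the cost concerns and their usual
// mitigations: #logging/#allocation (guarded log call) and #synchronized
// replaced by #cas (AtomicLong).
public class CostConcerns {
  private static final Logger log = Logger.getLogger("demo");
  private final AtomicLong appended = new AtomicLong();

  long append(byte[] record) {
    // Only build the message string when FINE logging is actually enabled,
    // avoiding the concatenation allocation on the hot path.
    if (log.isLoggable(Level.FINE)) {
      log.fine("appending record of " + record.length + " bytes");
    }
    // Lock-free counter update instead of a synchronized block.
    return appended.incrementAndGet();
  }
}
```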
APPENDIX A – HOTSPOT CODE LISTINGS
org.apache.kafka.clients.producer.internals.RecordAccumulator.append #synchronized #logging #allocation #cas
org.apache.kafka.clients.producer.KafkaProducer.doSend #waiting #logging #allocation
org.apache.kafka.clients.producer.internals.RecordBatch.done #logging #looping #allocation #readrepeat
org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches #allocation #looping #synchronized #logging
org.apache.kafka.clients.producer.internals.RecordAccumulator.ready #allocation #looping #synchronized
org.apache.kafka.common.requests.RequestSend.serialize #allocation #nio
APPENDIX B – PERFORMANCE MODEL REFINEMENT
The following visualizations are useful for those curious about how the performance model is refined over the 10 iterations of the last benchmark run via the online and offline adaptive mechanisms, and at what point in the process the instrumentation and measurement remain relatively constant.