There is a lot of buzz about Kafka among developers these days, and we’re asked how Solace and Kafka compare often enough that we wanted to document the differences between the two, specifically SolOS 7.1.1 and Kafka 0.9.0.
Kafka was originally created by LinkedIn as a way to get logs of their site’s user activity into Hadoop in real time. They wanted real-time ingestion, but the messaging tools that existed at the time (e.g. ActiveMQ, RabbitMQ, Qpid) couldn’t handle the volumes they were dealing with. They built a great tool in Kafka, and open sourced it through the Apache Software Foundation in 2011.
Kafka has become popular as a way to address big data use cases that require massive scale and near real-time delivery. Many developers have come to love Kafka and believe it’s appropriate for any application with those requirements. But Kafka’s designers made some fundamental (and perfectly logical) design decisions that optimize for log aggregation – tradeoffs that restrict its applicability to other use cases.
Using Kafka in those use cases forces developers to code message handling into their applications, build proxies or APIs, or add tooling around Kafka. Given all the time and energy it takes to create and support all that extraneous code, they’d usually be better off choosing a technology that better delivers on the requirements at hand.
Kafka’s Key Disadvantages and Drawbacks
This comparison focuses on Kafka relative to the kinds of use cases we are most frequently asked about, i.e. applications that require an Event Driven Architecture (EDA), microservices, enterprise messaging capabilities, and industry-specific use cases like post-trade distribution in capital markets, eCommerce and payment event distribution. For these use cases, the key drawbacks to Kafka are:
- APIs for Integration: The Apache Kafka project only provides its own Java client API, with APIs for other languages created independently (and inconsistently) outside the Apache project, making it hard or prohibitively expensive to integrate diverse applications. Solace supports a wide range of APIs and open wireline protocols.
- Topic Filtering: Kafka only supports exact-match subscriptions on flat topics, which makes it impossible to move data in flexible ways to a variety of consuming applications. Solace supports topic hierarchies, filters and wildcard subscriptions.
- Messaging Features: Kafka lacks critical messaging features like request/reply, non-persistent QoS, point-to-point queues, replication for disaster recovery (DR) and simple inter-datacenter routing. Solace supports all of these features and more.
- Distributed Architecture: Kafka’s distributed architecture requires several components in addition to the Kafka brokers: Zookeeper for configuration and state coordination, and MirrorMaker for inter-cluster communication. Solace integrates these functions into a turnkey broker without external dependencies.
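To make the filtering difference concrete, here is a minimal Python sketch of hierarchical topic matching next to flat exact matching. The wildcard rules are an assumption modeled loosely on Solace’s syntax ('/' separates levels, '*' matches exactly one level, a trailing '>' matches one or more levels); the function names are hypothetical, and this is not Solace’s actual matching engine.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """Return True if `topic` matches a hierarchical `subscription`.

    Assumed syntax (modeled on, but not guaranteed identical to, Solace's):
      - levels are separated by '/'
      - '*' as a whole level matches exactly one level
      - '>' as the final level matches one or more remaining levels
    """
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, sub in enumerate(sub_levels):
        if sub == ">" and i == len(sub_levels) - 1:
            # A trailing '>' needs at least one more level in the topic
            return len(topic_levels) > i
        if i >= len(topic_levels):
            # Subscription is longer than the topic
            return False
        if sub != "*" and sub != topic_levels[i]:
            return False
    # Every subscription level matched; the topic must not have extra levels
    return len(sub_levels) == len(topic_levels)


def exact_match(subscription: str, topic: str) -> bool:
    """Flat, exact-match subscription: the topic name must be identical."""
    return subscription == topic
```

With rules like these, a single subscription such as `orders/*/new` picks up new orders from every region, while an exact-match subscriber has to know and subscribe to each topic name individually.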
To overcome these shortcomings, you’ll need to add code, tooling and complexity around Kafka to do these things yourself. And as features are added to Kafka itself, they tend to negate some of the advantages that made Kafka attractive in the first place. For example, using TLS for consumer connections decreases Kafka’s maximum consumer message rate by 90%.
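As an example of the code an application has to add when the broker only offers publish/subscribe, the sketch below layers request/reply on top of a topic-based bus using a reply-to topic and a correlation ID. `InMemoryBus`, `Requestor`, `run_price_service`, the `pricing/requests` topic and the message fields are all hypothetical stand-ins, not Kafka or Solace APIs; a real implementation would also need timeouts, reply-topic cleanup and error handling.

```python
import uuid
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBus:
    """Hypothetical in-memory pub/sub bus standing in for a topic-based broker."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver synchronously to every handler subscribed to this exact topic
        for handler in list(self._subs[topic]):
            handler(message)


class Requestor:
    """Request/reply built on pub/sub with a private reply topic and correlation IDs."""

    def __init__(self, bus: InMemoryBus, reply_topic: str) -> None:
        self.bus = bus
        self.reply_topic = reply_topic
        self.replies: Dict[str, dict] = {}
        bus.subscribe(reply_topic, self._on_reply)

    def _on_reply(self, message: dict) -> None:
        # Match each reply to its request by correlation ID
        self.replies[message["correlation_id"]] = message

    def request(self, topic: str, body: dict) -> dict:
        correlation_id = str(uuid.uuid4())
        self.bus.publish(topic, {"correlation_id": correlation_id,
                                 "reply_to": self.reply_topic,
                                 "body": body})
        # A real client would wait with a timeout; this in-memory bus
        # delivers synchronously, so the reply is already stored here.
        return self.replies.pop(correlation_id)


def run_price_service(bus: InMemoryBus) -> None:
    """Hypothetical responder that answers on each request's reply_to topic."""
    def handle(request: dict) -> None:
        bus.publish(request["reply_to"],
                    {"correlation_id": request["correlation_id"],
                     "body": {"price": 100 * request["body"]["qty"]}})
    bus.subscribe("pricing/requests", handle)
```

For example, after `run_price_service(bus)` registers the responder, `Requestor(bus, "replies/client-1").request("pricing/requests", {"qty": 3})` returns a reply whose body is `{"price": 300}`.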
If you just want to cut to the chase, the feature comparison matrix at the end of this post summarizes the differences.
Kafka was designed for high-performance log aggregation, where publishers push logs to Kafka brokers and consumers “tail” subsets of these log files to consume them. The programmatic interface is a publish/subscribe model: a publisher tags each log entry (or message) with a topic, and consumers receive the entries they want by subscribing to a topic, which causes them to tail the log files associated with that topic. So for applications that want a high-performance pub/sub infrastructure with few other messaging features, Kafka is a viable alternative.
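The log-tailing model can be sketched as a toy in-memory structure: each topic is an append-only list, and each consumer remembers its own read position. This is a simplification under stated assumptions (no partitions, replication or retention, and `TopicLog` is a hypothetical name), not Kafka’s implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TopicLog:
    """Toy model of log-based pub/sub: each topic is an append-only list."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[bytes]] = defaultdict(list)

    def publish(self, topic: str, message: bytes) -> int:
        """Append a message and return its offset within the topic's log."""
        log = self._logs[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic: str, offset: int,
             max_messages: int = 10) -> Tuple[List[bytes], int]:
        """Return messages starting at `offset` and the offset to read next.

        The log does not track who has read what; each consumer keeps
        its own offset and passes it back on the next call.
        """
        batch = self._logs[topic][offset:offset + max_messages]
        return batch, offset + len(batch)
```

Because the log never records who has read what, two consumers can tail the same topic at different offsets without interfering with each other, which is part of what keeps this design fast and simple.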
However, most distributed applications need more than basic pub/sub capability. In some cases, features can be added to Kafka, but the design decisions made to optimize Kafka for log aggregation mean that many of these features are either very difficult or prohibitively expensive to add given Kafka’s distributed log store, or would destroy the performance that makes Kafka attractive in the first place.
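As one illustration of a per-message feature that is natural in a broker-managed queue but awkward in an append-only log, here is a toy queue with message Time To Live (TTL) and a dead message queue (DMQ), both of which appear in the feature matrix. The class and field names are hypothetical and this is only a sketch of the idea, not how Solace implements it: expired messages are removed individually and moved to the DMQ instead of being delivered.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Message:
    body: str
    ttl_seconds: Optional[float] = None   # None means the message never expires
    enqueued_at: float = field(default_factory=time.monotonic)

    def expired(self, now: float) -> bool:
        return self.ttl_seconds is not None and now - self.enqueued_at > self.ttl_seconds


class QueueWithTTL:
    """Toy broker-side queue: expired messages go to a dead message queue."""

    def __init__(self) -> None:
        self._queue: Deque[Message] = deque()
        self.dead_messages: List[Message] = []

    def enqueue(self, message: Message) -> None:
        self._queue.append(message)

    def next_message(self, now: Optional[float] = None) -> Optional[Message]:
        """Return the next live message, moving any expired ones to the DMQ."""
        now = time.monotonic() if now is None else now
        while self._queue:
            message = self._queue.popleft()
            if message.expired(now):
                self.dead_messages.append(message)
            else:
                return message
        return None
```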
We designed Solace messaging to provide a single messaging fabric for all your application messaging needs, not for one specific purpose. We designed it with three guiding principles in mind:
- Provide the features application developers need so they can focus on their application tasks rather than on the plumbing and give them a rich set of APIs so any application can easily participate in messaging.
- Use networking concepts and technologies to provide a high performance, robust, scalable real time data movement fabric.
- Deliver messaging in an integrated simple-to-use-and-deploy form factor with rich management visibility. We integrate features like high availability, disaster recovery, multi-site distribution, WAN optimization, DB backup/restore, simple upgrade and downgrade procedures and rich monitoring of application health in a turnkey broker without the need for external components and proxies.
The table below compares the two across architecture, topic filtering and addressing, messaging features, message delivery, security, and management.
| Aspect | Solace | Kafka |
|---|---|---|
| Architecture | The Solace architecture uses a sophisticated message router with simple APIs to provide comprehensive messaging and routing features while keeping applications simple. The Solace message router implements open wireline protocols and integrates resiliency, robustness, security and management features, all without any external components. These features are implemented in hardware in Solace appliances for maximum performance and scale, and in the cloud-native software Virtual Message Router (VMR) for total deployment flexibility. | Kafka is architected as a simple broker with a sophisticated API. That means the broker provides limited functionality from a messaging feature point of view: it just replays logs as they were received, leaving the complexity of message handling to API libraries and the applications themselves. The simplicity of the Kafka broker also necessitates components such as Zookeeper and MirrorMaker, which add to deployment and engineering complexity. |
| Topic filtering and addressing | Solace provides fine-grained hierarchical topics with wildcard subscriptions as well as filtering within a topic. Messages can be richly annotated so consumers can filter and receive only what they want according to varying criteria, and all messages are delivered in publish order regardless of topic. Solace also supports queue-based addressing. | Kafka topics are coarse, stateful constructs, and only exact topic match subscriptions are supported. Kafka does not support wildcard topic matching, filtering of messages within a topic, or queue-based addressing. |
| Messaging features | Solace supports rich messaging features for enterprise applications, price distribution, trading and gaming platforms, microservices, EDAs and many others. These range from standard features like request/reply, non-persistent messaging, load-balanced delivery, XA and session-based transactions and message TTL to sophisticated rate limiting and congestion control features built and proven over many years. | Kafka’s simple datapath is optimized for high-performance sequential log storage and playback in a basic publish/subscribe manner. It does not support messaging features like request/reply, queues, non-persistent messaging or many others that most distributed systems expect. |
| Message delivery | The Solace message router routes and enqueues messages for consumers, tracks message delivery state for each consumer, and delivers messages over the transport each consumer requested (Solace wireline, MQTT, REST, WebSockets). Congestion can be managed, messages can be rate controlled, queue depths can be monitored and threshold alerts can be emitted. | The Kafka broker uses the sendfile() system call to maximize the performance of message delivery to consumers, but this only works when the published message can be delivered to the consumer completely unchanged; otherwise, Kafka’s performance degrades significantly. For example, sendfile() cannot be used with TLS, so the maximum message rate for TLS consumers decreases by 90%. The Kafka broker does not manage consumer state, which increases the complexity of client APIs, prevents implementation of certain messaging features and results in several operational shortcomings. |
| Security | Solace enables messaging architectures with consistent multi-protocol client authentication and authorization across the enterprise, deep integration into enterprise authentication services, and a minimal set of components. | Kafka implements the industry-standard SASL interface for simple authentication integration and provides the ability to implement a distributed set of authorization policies. |
| Management | Both Solace and Kafka provide a standard programmatic interface to monitor their systems, but Solace provides a deeper view into the behavior of the overall messaging system. Solace upgrades have been a simple task with minimal service disruption, whereas Kafka upgrades have often required application changes as its heavy APIs evolved. | Kafka was built as a point solution for log ingestion into big data solutions. As such, there was no requirement to provide a comprehensive set of monitoring and management tools, since much of this functionality would fall under the control of the big data management platform (e.g. Ambari, Cloudera Manager). This is generally a good idea, as redundant functionality is not required. But as you move to more use cases, the message broker needs to provide enough detailed monitoring to support basic troubleshooting of system-wide problems. |
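When the broker stores only a single committed position per consumer rather than per-message delivery state, an application that wants to acknowledge messages in any order has to do its own bookkeeping. The sketch below shows that client-side logic under a simplifying assumption (one log, no partitions, and a committed offset meaning “the next offset to resume from”); `OffsetTracker` is a hypothetical name, not part of any Kafka client library.

```python
from typing import Set


class OffsetTracker:
    """Client-side bookkeeping for out-of-order acks against a broker that
    only stores one committed offset per consumer (simplified: no partitions).
    """

    def __init__(self, committed: int = 0) -> None:
        self.committed = committed      # next offset the consumer would resume from
        self._acked: Set[int] = set()   # acked offsets beyond the committed point

    def ack(self, offset: int) -> int:
        """Record an ack and return the offset that can safely be committed."""
        if offset >= self.committed:
            self._acked.add(offset)
        # Advance only across a contiguous run of acknowledged offsets
        while self.committed in self._acked:
            self._acked.remove(self.committed)
            self.committed += 1
        return self.committed
```

If messages 1 and 2 finish processing before message 0, nothing can be committed until 0 is acknowledged, so a crash in between causes 1 and 2 to be redelivered. A broker that tracks per-message delivery state avoids pushing this logic into every client.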
Feature Comparison Matrix
|Publish/Subscribe (topic based) Messaging|
|Hierarchical, extensible topics|
|Filtering within a topic (e.g. Selectors)|
|Queue-based, point-to-point messaging|
|Filtering within a queue (e.g. Selectors)|
|Many software APIs, open wireline protocols, web streaming, 3rd party product integrations|
|Synchronous message consumption|
|Asynchronous message consumption (to support Reactor/dispatcher application paradigm)|
|Selectively acknowledge messages in any order|
|Replication for disaster recovery|
|Message Time To Live (TTL)|
|Dead message queue|
|“Sticky routing” to consumer group|
|Load balanced message delivery to consumer group (shared queue)|
|Exclusive queue (for stateful HA consumers)|
|Static inter-broker routing|
|Dynamic inter-broker routing|
Real-Time Delivery Controls
|Consumer rate controlled delivery (e.g. conflation)|
|Consumer congestion control|
|Consistent security architecture for all APIs|
|Integrated with Enterprise security infrastructure|
|Role-based management accounts|
|Programmatic management interface|
|Easy in-service upgrades|
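Conflation, listed above under consumer rate controlled delivery, means that when a consumer falls behind, an undelivered update on a topic is replaced by the newer one instead of queuing both. Here is a minimal sketch of a per-consumer conflating buffer; `ConflatingQueue` is a hypothetical name, and this only illustrates the idea rather than Solace’s implementation.

```python
from typing import Dict, List, Tuple


class ConflatingQueue:
    """Per-consumer buffer that keeps only the latest message per topic.

    A slow consumer draining this buffer sees the most recent value for
    each topic instead of a growing backlog of stale updates.
    """

    def __init__(self) -> None:
        # Python dicts keep insertion order, and overwriting an existing
        # key keeps that key's original position.
        self._latest: Dict[str, object] = {}

    def offer(self, topic: str, message: object) -> None:
        # Replace any undelivered message on the same topic
        self._latest[topic] = message

    def drain(self) -> List[Tuple[str, object]]:
        """Hand all pending (topic, message) pairs to the consumer and clear."""
        items = list(self._latest.items())
        self._latest.clear()
        return items
```

For example, if a price on `px/IBM` updates twice before the consumer drains the buffer, the consumer receives only the latest price, which is usually what a price-display or trading front end wants.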