In the last decade, organisations have become reliant on multiple systems and applications to fulfil their business needs. To work effectively, these systems and applications must be able to communicate with each other in a secure and efficient way. Messaging frameworks have become a critical part of the big data stack for these data-driven organisations, although it is difficult to choose which platform will suit their needs.
There are currently three types of messaging frameworks:
Messaging Queue Frameworks – The traditional message queue paradigm, recommended only when there is a fixed end-to-end messaging system to support it.
Distributed Messaging Pub-Sub Frameworks – Publish–subscribe is a sibling of the message queue paradigm. This pattern provides greater network scalability and a more dynamic network topology, at the cost of reduced flexibility to modify the publisher and the structure of the published data.
Distributed Stream Processing Frameworks – Stream processing frameworks are runtime libraries that help developers write code to process streaming data without dealing with lower-level streaming mechanics.
In this blog we give an in-depth overview of these three types of messaging frameworks and a comparison of the specific platforms available in today’s market.
Messaging Queue Frameworks
ActiveMQ / RabbitMQ / ZeroMQ / RocketMQ
These are earlier, traditional messaging systems that emphasise queuing rather than streaming.
They are built on point-to-point messaging models.
They are recommended only when there is a fixed end-to-end communication system.
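To make the point-to-point model concrete, here is a minimal sketch in Python. It uses only the standard library's queue module as a stand-in and does not call any of the brokers named above; the function name run_point_to_point and the worker names are hypothetical. The idea it shows is that each message placed on the queue is taken by exactly one of the competing consumers.

```python
import queue
import threading


def run_point_to_point(messages, worker_names):
    """Deliver each message to exactly one of several competing consumers."""
    q = queue.Queue()
    received = {name: [] for name in worker_names}

    def worker(name):
        while True:
            msg = q.get()
            if msg is None:          # sentinel: no more work for this worker
                break
            received[name].append(msg)

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for m in messages:
        q.put(m)
    for _ in threads:
        q.put(None)                  # one sentinel per worker
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    # Which worker receives which message varies from run to run.
    print(run_point_to_point(["order-1", "order-2", "order-3"],
                             ["worker-a", "worker-b"]))
```

Once a worker has taken a message, no other worker can see it. That is the main difference from the publish-subscribe model, where every subscriber receives its own copy.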
Distributed Messaging Pub-Sub Frameworks
Apache Kafka
Apache Kafka is a mature and stable, distributed and scalable publish-subscribe data streaming platform. Its model is built on simple producers and consumers, distributed brokers, message topics, append-only logs and distributed partitions.
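The topic, partition and append-only log ideas can be sketched in a few lines of Python. This is a toy in-memory model for illustration only, not Kafka's client API: the class names Topic and ConsumerGroup are hypothetical, and the CRC32 hash is an assumed stand-in for a real partitioner.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy in-memory topic: a fixed number of append-only partition logs."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Records with the same key always land in the same partition,
        # which preserves per-key ordering. CRC32 is an assumed stand-in hash.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1    # (partition, offset)


class ConsumerGroup:
    """Tracks its own read offset per partition; never removes records."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        records = []
        for p, log in enumerate(self.topic.partitions):
            records.extend(log[self.offsets[p]:])  # read from last offset
            self.offsets[p] = len(log)             # advance this group's offset
        return records


if __name__ == "__main__":
    topic = Topic("orders")
    topic.append("customer-1", "created")
    topic.append("customer-1", "paid")
    billing, analytics = ConsumerGroup(topic), ConsumerGroup(topic)
    print(billing.poll())      # both groups see every record,
    print(analytics.poll())    # because reading does not delete from the log
```

Because each group only advances its own offsets, any number of independent readers can consume the same log, and a new reader can start from offset zero to replay history.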
Apache Pulsar
Like Kafka, Apache Pulsar is an open-source, distributed and scalable pub-sub messaging system. It was originally created at Yahoo and is now part of the Apache Software Foundation.
Distributed Stream Processing Frameworks
Apache Samza
Apache Samza is a distributed and scalable real-time stream processing framework. Samza allows you to build stateful applications that process data in real time from multiple sources, including Apache Kafka.
Apache Flink
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
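As a rough illustration of what a stateful computation over bounded and unbounded streams means, here is a small Python sketch. It is not Flink's API; the function running_counts is hypothetical and keeps its state in an ordinary in-process dictionary, whereas a real engine would distribute and checkpoint that state.

```python
from collections import defaultdict
from itertools import cycle, islice


def running_counts(events):
    """Yield (key, count_so_far) for each event as it arrives.

    The `counts` dictionary is the operator's state: it persists across
    events, so each result depends on everything seen so far.
    """
    counts = defaultdict(int)
    for key in events:
        counts[key] += 1
        yield key, counts[key]


if __name__ == "__main__":
    # Bounded stream: a finite list of events.
    print(list(running_counts(["click", "view", "click"])))

    # Unbounded stream: an endless source, from which we take the first five results.
    endless = cycle(["click", "view"])
    print(list(islice(running_counts(endless), 5)))
```

The same function handles both cases because it processes one event at a time and never needs to see the end of its input.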
Apache Spark
Apache Spark is a unified analytics engine for large-scale data processing. It achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
Apache Storm
Apache Storm is an open-source, distributed real-time computation system. Apache Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing.
Of these platforms, the distributed messaging broker Kafka continues to evolve actively and increasingly acts as the central nervous system connecting data platforms and data engines of every type.
If you would like to find out how to bring best practice in your Kafka deployment and optimise the performance and scalability of your Kafka clusters, then give us a call on +44 (0)203 475 7980 or email us at Salesforce@coforge.com