Kafka consumer groups and partitions


I'm having trouble grasping the relationship between partitions and consumer groups.
The ideas by themselves are pretty clear: each message that's pushed to a topic gets replicated to all of its partitions, right?
That way, if two different clients connect to two different partitions of the same topic then they should consume and commit the same messages without interrupting each other.
Consumer groups, as I understand them, are an abstraction over the idea of partitions and essentially promise the same thing: two different clients that connect to two different consumer groups of the same topic should consume and commit the same messages without interrupting each other.
So as I see it, it should follow that two clients that connect to the same consumer group would consume messages from the same partition, and two clients that connect to two different consumer groups would consume from two different partitions (given that there are at least two partitions for that topic), because otherwise the idea of consumer groups doesn't comply with the idea of partitions.
However, when I run a simple consumer client in C#:
    string group = Console.ReadLine();
    var config = new Dictionary<string, object>()
    {
        { "group.id", group },
        { "bootstrap.servers", "10.0.0.3:9092" },
        { "enable.auto.commit", true },
        { "auto.commit.interval.ms", 1000 }
    };
    using (var consumer = new Consumer<Null, string>(config, null, new StringDeserializer(Encoding.UTF8)))
    {
        consumer.Subscribe(new List<string>() { { "myFirstTopic" } });
        while (true)
        {
            Message<Null, string> msg;
            if (!consumer.Consume(out msg, TimeSpan.FromMilliseconds(100)))
            {
                continue;
            }
            Console.WriteLine($"Topic: {msg.Topic} Partition: {msg.Partition} Offset: {msg.Offset} {msg.Value}");
        }
    }
I get this result:
The same consumer group consumes from 2 different partitions.
When I run two clients that consume from different consumer groups (a and b), I get this:
Two different consumer groups consume from the same partitions.
I don't understand how this happens. Doesn't it mean that the idea of consumer groups and the idea of partitions contradict one another?
If the same message appears in two different consumer groups under the same partition, doesn't it mean that the same message was inserted twice into the same partition?
Please help me understand.
Your understanding of consumer groups is correct, but the details about partitions need a bit of clarification.
> The ideas by themselves are pretty clear: each message that's pushed to a topic gets replicated to all of its partitions, right?
Not exactly. A message will be written to a single partition (and its replicas). All of the messages written to the topic will be split between the topic's partitions. Thus, each partition will only contain a subset of all the messages written to the topic.
Note that replicas are just a way to ensure availability of your data in a Kafka cluster in case a Kafka node goes down. They do not affect the message processing semantics.
> So as I see it, it should follow that two clients that connect to the same consumer group would consume messages from the same partition...
Within a consumer group, Kafka allows only one client to consume from a given partition at a time. Therefore, no two clients in the same consumer group will consume data from the same partition. However, a single client can consume from more than one partition at a time. Also, if a group has more clients than the topic has partitions, some of the clients will not get any data at all because there is no partition left for them to consume from.
Since a partition only has a subset of the data and is only assigned to a single client in the group at a time, each client will consume a unique subset of the data written to the topic. Thus, you could say that the multi-partition, single-consumer-group arrangement works similarly to the worker pattern.
Partitions in Kafka drive the parallelization factor for your message processing. The more partitions your topic has, the more clients you can have working in parallel.
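To illustrate the one-owner-per-partition rule within a group, here is a minimal sketch in plain C# that does not talk to Kafka at all. The class GroupAssignmentSketch and its Assign method are hypothetical names made up for this illustration, and the round-robin order is an assumption; real Kafka assignors differ in detail but keep the same invariant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupAssignmentSketch
{
    // Hypothetical helper: spreads partition ids 0..partitionCount-1 over the
    // consumers of ONE group in round-robin order. Consumer names are assumed
    // to be unique. Each partition ends up with exactly one owner.
    public static Dictionary<string, List<int>> Assign(
        IReadOnlyList<string> consumers, int partitionCount)
    {
        var assignment = consumers.ToDictionary(c => c, c => new List<int>());
        if (consumers.Count == 0)
        {
            return assignment;
        }

        for (int p = 0; p < partitionCount; p++)
        {
            // Partition p goes to exactly one consumer of the group.
            assignment[consumers[p % consumers.Count]].Add(p);
        }
        return assignment;
    }
}
```

With two consumers and three partitions, Assign(new[] { "c1", "c2" }, 3) gives c1 partitions 0 and 2 and c2 partition 1, which matches what you saw: one group, several partitions per client. With three consumers and two partitions, the third consumer gets an empty list and stays idle.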
> ...and two clients that connect to two different consumer groups would consume from two different partitions (given that there are at least two partitions for that topic) because otherwise the idea of consumer groups doesn't comply with the idea of partitions.
If you have clients in different consumer groups, they can consume from the same partitions. Therefore, every consumer group will receive the same set of data, and a message that shows up in two groups under the same partition was still written to that partition only once. The multiple-consumer-group arrangement is similar to the fan-out pattern.
> Kafka guarantees order of messages, right? How does it work with multiple partitions for the same topic? In fact, I've seen for myself that it's not always true. Is it true only for a single partition?
Your observations are correct. Message ordering can only be guaranteed per partition. Luckily, messages with the same key will end up in the same partition (with the default partitioner, and as long as the number of partitions does not change), so you can guarantee ordering by key.
For example, let's assume that you have a topic for all forum post comments. If you only care about ordering of comments within a single forum post, you can select the forum post identifier as the message key for all the comments.
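The key-to-partition idea can be sketched in plain C# as follows. This is only an illustration: KeyPartitionSketch and PartitionFor are hypothetical names, and the FNV-1a hash is an assumption chosen because it is short and deterministic. Kafka clients use their own hash functions, so the partition numbers here will not match a real cluster.

```csharp
using System;
using System.Text;

public static class KeyPartitionSketch
{
    // Hypothetical helper: maps a message key to a partition index by hashing
    // the key's UTF-8 bytes (FNV-1a, 32-bit) and taking it modulo the
    // partition count. The same key always yields the same partition as long
    // as partitionCount stays the same.
    public static int PartitionFor(string key, int partitionCount)
    {
        if (partitionCount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(partitionCount));
        }

        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes(key))
            {
                hash ^= b;
                hash *= 16777619;
            }
            return (int)(hash % (uint)partitionCount);
        }
    }
}
```

Calling PartitionFor("post-42", 3) repeatedly always returns the same index, so every comment keyed by that forum post lands in one partition and keeps its order there. Comments for a different post may land in another partition, and no ordering is implied between the two.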
> I read that when I commit an offset, it is committed as part of the partition and not the consumer group, so if I commit an offset in one group, will it affect the offset of another group if it pulls from the same partition?
The offsets are stored per partition AND consumer group, i.e. a consumer group can have its own offset for a partition. This way the offsets will not overlap between groups.
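As a rough mental model, committed offsets can be pictured as a map keyed by consumer group, topic and partition together. The sketch below is an in-memory stand-in written for this answer, not Kafka's actual storage; OffsetStoreSketch and its members are hypothetical names.

```csharp
using System.Collections.Generic;

public sealed class OffsetStoreSketch
{
    // The key combines group, topic and partition, so two groups reading the
    // same partition each keep their own committed offset.
    private readonly Dictionary<(string Group, string Topic, int Partition), long> _committed =
        new Dictionary<(string Group, string Topic, int Partition), long>();

    public void Commit(string group, string topic, int partition, long offset)
    {
        _committed[(group, topic, partition)] = offset;
    }

    // Returns null when the group has never committed for that partition.
    public long? Committed(string group, string topic, int partition)
    {
        return _committed.TryGetValue((group, topic, partition), out long offset)
            ? offset
            : (long?)null;
    }
}
```

If group "a" commits offset 5 and group "b" commits offset 2 for partition 0 of myFirstTopic, each group reads back its own value, and a third group that has committed nothing gets null.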
