Add assignment and watermark offsets APIs #34
@@ -508,6 +589,22 @@ static PyMethodDef Consumer_methods[] = {
" :raises: KafkaException\n"
"\n"
},
{ "get_watermark_offsets", (PyCFunction)Consumer_get_watermark_offsets,
Assignment and watermark offsets seem quite different re: the PR. The assignment seems pretty uncontroversial -- we know what we want exposed here.
Is the watermark the same? If we're consistent everywhere, that's fine. I'm asking because I think this is something we want to be consistent about across languages, and I'm not sure we've fully specified its semantics. Even the cached versions seem like they may need clarification (e.g., what is the cache duration?). The high/low offsets seem clearer, but I think in the Java client there may be some desire to make any request for offsets a bit more on-demand compared to the high offset provided with each fetch request.
/cc @hachikuji since I think we may one day (hopefully in the near future!) want to provide this for the Java clients, as KafkaBasedLog and similar classes could greatly benefit from a real LEO API...
I needed assignment() to test get_watermark_offsets() :)
re watermark consistency:
I briefly discussed such an API with @hachikuji about six months ago, but we didn't really decide on anything.
Prior to librdkafka 0.9.0 I added two APIs, query_watermark_offsets() (asks the broker) and get_watermark_offsets() (cached), which the Python client abstracts in one call.
In that very limited sense it makes the API consistent.
But I agree with you that a more generic API that allows querying for any offset (logical, time based, absolute) makes more sense.
re caching: the main use of the cached API is to allow an application to check the currently known high watermark (and a possibly somewhat outdated low watermark, if stats are enabled) in its fast path at low cost. A typical use is keeping a consumer lag figure up to date (even though the stats already expose that).
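To make the cached fast-path use case concrete, here is a minimal sketch of a lag calculation. It assumes the consumer exposes `get_watermark_offsets(partition, timeout, cached)` returning a `(low, high)` pair, or `None` on timeout, which is the single-call shape described above; the exact signature and return type were still under review in this PR. `consumer_lag` and `FakeConsumer` are hypothetical names introduced only for illustration, and the stub exists so the sketch runs without a broker.

```python
from typing import Any, Optional


def consumer_lag(consumer: Any, partition: Any, position: int,
                 cached: bool = True, timeout: float = 1.0) -> Optional[int]:
    """Return high watermark minus position, or None if either is unknown."""
    # Assumed call shape: returns (low, high), or None on timeout.
    # cached=True reads the locally known values instead of asking the broker.
    offsets = consumer.get_watermark_offsets(partition, timeout=timeout, cached=cached)
    if offsets is None:
        return None
    _low, high = offsets
    if high < 0 or position < 0:
        # This sketch treats negative values as "not known yet".
        return None
    return max(high - position, 0)


class FakeConsumer:
    """Hypothetical stand-in for a real consumer; records the 'cached' flag."""

    def __init__(self, low: int, high: int) -> None:
        self._offsets = (low, high)
        self.calls = []

    def get_watermark_offsets(self, partition, timeout=None, cached=False):
        self.calls.append(cached)
        return self._offsets


if __name__ == "__main__":
    fake = FakeConsumer(low=0, high=120)
    print(consumer_lag(fake, ("mytopic", 0), position=100))
```

With a real consumer, the same function would be called with a topic-partition object and the application's current position; passing `cached=False` would trade the low-cost path for a broker round trip.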
Oh, for consistency I even just meant consistency across all the clients we're building based on librdkafka, but agreed that there could be others.
I'm fine with exposing it, just wanted to make sure we're thinking about compatibility and designing good, user friendly APIs along the way.
This is a very desirable feature, but it seems it has not been merged yet. Maybe it could become available in the next release?
@ewencp with the dust settled, should we merge this as-is or rename the function get_watermark_offsets?
@edenhill I'm fine to merge it as is (or close to as is, needs rebase and I think there's still that comment about tuple instead of list) as long as we stay consistent with our other clients that wrap librdkafka.
4944479 to 8ea2fd4
* Add Seek API to consumer. Using rd_kafka_seek implemented in the rdkafka library we just expose this on the consumer interface.
* Update Seek description
For issue #31