Open
Labels
bug (Reporting an unexpected or problematic behavior of the codebase)
component:librdkafka (For issues tied to the librdkafka elements)
investigate further (It's unclear what the issue is at this time but there is enough interest to look into it)
Description
The Kafka consumer fails to terminate. After the consumer has been consuming messages for a number of hours, it fails to fully terminate and hangs.
consumer.close() is called and shutdown begins, but it never completes and hangs forever. Even signal.alarm does not terminate the script.
Python script running in Alpine container.
librdkafka version 2.2.0
python:3.9-alpine
## python script
import logging
import signal

from confluent_kafka import Consumer, KafkaException

consumer_config = {'bootstrap.servers': brokers, 'group.id': group_id, 'auto.offset.reset': 'latest',
                   'sasl.username': user, 'sasl.password': password, 'security.protocol': secprotocol,
                   'sasl.mechanism': saslmech, 'ssl.ca.location': caCertificate,
                   'debug': 'generic, consumer, cgrp', 'internal.termination.signal': 50}
logging.info('Connect to kafka brokers {} With group-id {}'.format(brokers, group_id))
consumer = Consumer(consumer_config)
logging.info('Subscribe to topics: {}'.format(topics))
consumer.subscribe([topics])
#start of main script
logging.info('Start polling for messages')
while run:
    try:
        # read single message at a time
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            raise KafkaException(msg.error())
        else: #process message
            logging.info('MSG:: %% %s [%d] at offset %d with key %s' %
                         (msg.topic(), msg.partition(), msg.offset(),
                          str(msg.key())))
            logging.debug('Received kafka message: {}'.format(str(msg.value())[:300]))
            if msg.value() is None: # Check for Null message
                logging.info('Null message Ignore {}'.format(str(msg)))
            else: # Process message
                logging.debug('Received message: {}'.format(str(msg.value().decode('utf-8'))[:400]))
                telemetry_msg = msg.value()
                processMessage(telemetry_msg,configuration,rulesJSON,devices_databases)
        if not run: # Break with SIGHUP
            logging.critical('Received SIGHUP breaking from kafka messaging loop')
            break
    except Exception as e:
        logging.critical('Error in main loop: ' + str(e))
        run = False # Break running loop
## Final part of python script with termination
def timeout_handler(signum, frame):
    logging.critical("Received SIGALRM")
    raise Exception

signal.signal(signal.SIGALRM, timeout_handler) #signal timeout call for function
signal.alarm(20) #20 seconds to terminate program
logging.critical('Stopping all writebatch objects')
for deviceName in devices_databases:
    devices_databases[deviceName].stop()
logging.critical('Close kafka consumer')
consumer.close()
logging.critical('Program terminated for restart')
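A possible workaround sketch, not a fix for the underlying hang. Python-level signal handlers only run once the main thread returns to the interpreter, so a SIGALRM handler cannot fire while the main thread is blocked inside a native call. The sketch below runs the blocking call in a daemon thread and waits on it with a deadline. `run_with_deadline` and the two fake close functions are hypothetical names used for illustration, and the approach assumes the blocking call releases the GIL, which has not been verified here for `Consumer.close()`.

```python
import threading
import time


def run_with_deadline(fn, timeout_s):
    """Run fn() in a daemon thread.

    Returns True if fn() finished (normally or by raising) within
    timeout_s seconds, False if it was still running at the deadline.
    """
    done = threading.Event()

    def _target():
        try:
            fn()
        finally:
            # Signal completion even if fn() raised.
            done.set()

    # Daemon thread: a stuck call does not keep the interpreter alive.
    worker = threading.Thread(target=_target, daemon=True)
    worker.start()
    return done.wait(timeout_s)


if __name__ == "__main__":
    # Hypothetical stand-ins for a close() that returns and one that hangs.
    def fast_close():
        pass

    def stuck_close():
        time.sleep(2)

    print(run_with_deadline(fast_close, 1.0))
    print(run_with_deadline(stuck_close, 0.1))
```

In the script above this might be used as `if not run_with_deadline(consumer.close, 20): os._exit(1)`. `os._exit` skips interpreter cleanup, so a stuck native thread cannot block the process from exiting, at the cost of skipping any remaining cleanup handlers.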
### Final logs when the script starts to terminate but never completes
%7|1698828670.537|TERMINATE|rdkafka#consumer-1| [thrd:app]: Sending TERMINATE to internal main thread
%7|1698828670.537|TERMINATE|rdkafka#consumer-1| [thrd:app]: Sending thread kill signal 50
%7|1698828670.537|TERMINATE|rdkafka#consumer-1| [thrd:app]: Joining internal main thread
%7|1698828670.537|BROADCAST|rdkafka#consumer-1| [thrd:GroupCoordinator]: Broadcasting state change
%7|1698828670.537|TERMINATE|rdkafka#consumer-1| [thrd:main]: Internal main thread terminating
%7|1698828670.537|DESTROY|rdkafka#consumer-1| [thrd:main]: Destroy internal
%7|1698828670.537|BROADCAST|rdkafka#consumer-1| [thrd:main]: Broadcasting state change
%7|1698828670.537|DESTROY|rdkafka#consumer-1| [thrd:main]: Removing all topics
%7|1698828670.538|TERMINATE|rdkafka#consumer-1| [thrd:main]: Purging reply queue
%7|1698828670.538|BROADCAST|rdkafka#consumer-1| [thrd:GroupCoordinator]: Broadcasting state change
%7|1698828670.538|BROADCAST|rdkafka#consumer-1| [thrd:sasl_plaintext://10.2.80.139:9094/1]: Broadcasting state change
%7|1698828670.538|BROADCAST|rdkafka#consumer-1| [thrd:sasl_plaintext://10.2.80.139:9094/1]: Broadcasting state change
%7|1698828670.538|TERMINATE|rdkafka#consumer-1| [thrd:main]: Decommissioning internal broker
%7|1698828670.538|BROADCAST|rdkafka#consumer-1| [thrd:sasl_plaintext://10.2.80.138:9094/0]: Broadcasting state change
%7|1698828670.538|BROADCAST|rdkafka#consumer-1| [thrd:sasl_plaintext://10.2.80.138:9094/0]: Broadcasting state change
%7|1698828670.538|BROADCAST|rdkafka#consumer-1| [thrd:sasl_plaintext://10.2.80.138:9094/0]: Broadcasting state change
%7|1698828670.538|TERMINATE|rdkafka#consumer-1| [thrd:main]: Join 6 broker thread(s)
%7|1698828670.538|BROADCAST|rdkafka#consumer-1| [thrd:sasl_plaintext://10.2.80.140:9094/2]: Broadcasting state change
%7|1698828670.538|BROADCAST|rdkafka#consumer-1| [thrd::0/internal]: Broadcasting state change
%7|1698828670.538|BROADCAST|rdkafka#consumer-1| [thrd:sasl_plaintext://10.2.80.140:9094/2]: Broadcasting state change
%7|1698828670.540|BROADCAST|rdkafka#consumer-1| [thrd:sasl_plaintext://10.2.80.136:9094/bootstrap]: Broadcasting state change
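To see which threads were still logging during shutdown, the debug lines above can be split into fields. This is a minimal sketch: the regular expression is inferred only from the lines shown in this report and is an assumption, not librdkafka's documented log format, and `parse_line` and `threads_seen` are hypothetical helper names.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this report:
# %<level>|<timestamp>|<tag>|<client>| [thrd:<thread>]: <message>
LOG_RE = re.compile(
    r"^%(?P<level>\d)\|(?P<ts>\d+\.\d+)\|(?P<tag>[A-Z0-9_]+)\|"
    r"(?P<client>[^|]+)\| \[thrd:(?P<thread>.*?)\]: (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one debug line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["level"] = int(fields["level"])
    fields["ts"] = float(fields["ts"])
    return fields


def threads_seen(lines):
    """Count how many matching log lines each thread emitted."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed["thread"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "%7|1698828670.537|TERMINATE|rdkafka#consumer-1| [thrd:app]: Joining internal main thread",
        "%7|1698828670.538|TERMINATE|rdkafka#consumer-1| [thrd:main]: Join 6 broker thread(s)",
        "%7|1698828670.538|BROADCAST|rdkafka#consumer-1| [thrd::0/internal]: Broadcasting state change",
    ]
    for thread, count in threads_seen(sample).items():
        print(thread, count)
```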
How to reproduce
================
Checklist
=========
Please provide the following information:
- [ ] confluent-kafka-python and librdkafka version (`confluent_kafka.version()` and `confluent_kafka.libversion()`):
- [ ] Apache Kafka broker version:
- [ ] Client configuration: `{...}`
- [ ] Operating system:
- [ ] Provide client logs (with `'debug': '..'` as necessary)
- [ ] Provide broker log excerpts
- [ ] Critical issue