Stopping Kafka via its init.d script sometimes fails.
When the user runs "/etc/init.d/kafka stop", start-stop-daemon sends SIGTERM to Kafka, which then performs a graceful shutdown that can take a while. If the process has not exited within 5 seconds, start-stop-daemon gives up and reports that the process refused to stop. The service is then marked as "crashed", even though Kafka does shut down cleanly.
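To make the timing problem concrete, here is a rough POSIX sh model of a "send SIGTERM, then wait N seconds" stop. `stop_with_timeout` is a hypothetical helper for illustration only, not start-stop-daemon's actual implementation. With a 5-second timeout it gives up on any process whose graceful shutdown takes longer, which is the failure described above.

```sh
#!/bin/sh
# Hypothetical helper: a simplified model of a "send SIGTERM, then wait up to
# N seconds" stop, similar in spirit to start-stop-daemon's retry schedule.
# Usage: stop_with_timeout <pid> <timeout-seconds>
stop_with_timeout() {
	pid=$1
	timeout=$2

	# Ask the process to shut down gracefully; if kill fails,
	# assume the process has already exited.
	kill -s TERM "$pid" 2>/dev/null || return 0

	# Poll once per second until the PID disappears or the timeout expires.
	waited=0
	while kill -0 "$pid" 2>/dev/null; do
		if [ "$waited" -ge "$timeout" ]; then
			echo "process $pid refused to stop" >&2
			return 1
		fi
		sleep 1
		waited=$((waited + 1))
	done
	return 0
}
```

Raising the timeout argument (for example from 5 to 60) is the whole fix in this model: the signal is the same, the process just gets more time to finish. Note that `kill -0` only checks whether the PID still exists, so an unreaped child would still count as running.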
The fact that the service "crashes" is just an annoyance, but things get worse when configuration management tools like Ansible are involved.
I suggest that the init.d script explicitly implement stop() and retry for a longer period. The fail2ban init script was adjusted in the same way because of the same behavior:
Any other suggestions on how to approach this?
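A minimal sketch of the proposed stop(), assuming an OpenRC-style init script (the output format in the report suggests OpenRC). The pidfile path `/run/kafka.pid` and the 60-second grace period are assumptions, not values taken from the shipped script. The `--retry SIGTERM/60` schedule asks start-stop-daemon to send SIGTERM and then wait up to 60 seconds for the process to exit.

```sh
# Hypothetical excerpt from /etc/init.d/kafka (an OpenRC script, #!/sbin/openrc-run).
# The pidfile path and the 60-second grace period are assumptions,
# not values from the shipped script.

pidfile="/run/kafka.pid"

# Retry schedule for start-stop-daemon: send SIGTERM, then wait up to
# 60 seconds for Kafka to finish its graceful shutdown.
retry="SIGTERM/60"

stop() {
	ebegin "Stopping kafka"
	start-stop-daemon --stop --pidfile "${pidfile}" --retry "${retry}"
	eend $? "Failed to stop kafka"
}
```

OpenRC's default stop() also passes a `retry` variable to start-stop-daemon, so on OpenRC setting `retry="SIGTERM/60"` alone may be enough. Appending `/SIGKILL/5` to the schedule would force-kill Kafka if it still hangs, at the cost of an unclean shutdown.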
Steps to Reproduce:
1. /etc/init.d/kafka stop

Actual Results:
* Stopping kafka ...
* start-stop-daemon: 1 process refused to stop
* Failed to stop kafka
* ERROR: kafka failed to stop

Expected Results:
Service stops successfully.