Kafka 使用完全写入的存储代理

Kafka brokers down with fully written storages

Kafka 使用完全写入的存储进行代理

我已尝试生成代理可以处理的尽可能多的消息。使用完全写入的存储 (8GB) 代理全部停止并且他们无法再次出现此错误

代理尝试重启的日志

[2020-04-28 04:34:05,774] INFO [ThrottledChannelReaper-Request]: Stopped (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:34:05,774] INFO [ThrottledChannelReaper-Request]: Shutdown completed (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:34:05,842] INFO [KafkaServer id=1] shut down completed (kafka.server.KafkaServer)
[2020-04-28 04:34:05,844] INFO Shutting down SupportedServerStartable (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:33:58,847] INFO [ThrottledChannelReaper-Produce]: Stopped (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:34:05,844] INFO Closing BaseMetricsReporter (io.confluent.support.metrics.BaseMetricsReporter)
[2020-04-28 04:34:05,844] INFO Waiting for metrics thread to exit (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:34:05,844] INFO Shutting down KafkaServer (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:34:05,845] INFO [KafkaServer id=1] shutting down (kafka.server.KafkaServer)
[2020-04-28 04:33:58,847] INFO [ThrottledChannelReaper-Request]: Shutting down (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:33:59,847] INFO [ThrottledChannelReaper-Request]: Shutdown completed (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:33:59,847] INFO [ThrottledChannelReaper-Request]: Stopped (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-04-28 04:33:59,854] INFO [KafkaServer id=0] shut down completed (kafka.server.KafkaServer)
[2020-04-28 04:33:59,942] INFO Shutting down SupportedServerStartable (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:33:59,942] INFO Closing BaseMetricsReporter (io.confluent.support.metrics.BaseMetricsReporter)
[2020-04-28 04:33:59,942] INFO Waiting for metrics thread to exit (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:33:59,942] INFO Shutting down KafkaServer (io.confluent.support.metrics.SupportedServerStartable)
[2020-04-28 04:33:59,942] INFO [KafkaServer id=0] shutting down (kafka.server.KafkaServer)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2020-04-28 04:37:18,938] INFO [ReplicaAlterLogDirsManager on broker 2] Removed fetcher for partitions Set(__consumer_offsets-22, kafka-connect-offset-15, kafka-connect-offset-16, kafka-connect-offset-2, __consumer_offsets-30, kafka-connect-offset-7, kafka-connect-offset-13, __consumer_offsets-8, __consumer_offsets-21, kafka-connect-offset-8, kafka-connect-status-2, kafka-connect-offset-5, __consumer_offsets-4, __consumer_offsets-27, __consumer_offsets-7, __consumer_offsets-9, __consumer_offsets-46, kafka-connect-offset-19, __consumer_offsets-25, __consumer_offsets-35, __consumer_offsets-41, __consumer_offsets-33, __consumer_offsets-23, __consumer_offsets-49, kafka-connect-offset-20, kafka-connect-offset-3, __consumer_offsets-47, __consumer_offsets-16, __consumer_offsets-28, kafka-connect-config-0, kafka-connect-offset-9, kafka-connect-offset-17, __consumer_offsets-31, __consumer_offsets-36, kafka-connect-status-1, __consumer_offsets-42, __consumer_offsets-3, __consumer_offsets-18, __consumer_offsets-37, __consumer_offsets-15, __consumer_offsets-24, kafka-connect-offset-10, kafka-connect-offset-24, kafka-connect-status-4, __consumer_offsets-38, __consumer_offsets-17, __consumer_offsets-48, kafka-connect-offset-23, kafka-connect-offset-21, kafka-connect-offset-0, __consumer_offsets-19, __consumer_offsets-11, kafka-connect-status-0, __consumer_offsets-13, kafka-connect-offset-18, __consumer_offsets-2, __consumer_offsets-43, __consumer_offsets-6, __consumer_offsets-14, kafka-connect-offset-14, kafka-connect-offset-22, kafka-connect-offset-6, perf-test4-0, kafka-connect-status-3, kafka-connect-offset-11, kafka-connect-offset-12, __consumer_offsets-20, __consumer_offsets-0, kafka-connect-offset-4, __consumer_offsets-44, __consumer_offsets-39, kafka-connect-offset-1, __consumer_offsets-12, __consumer_offsets-45, __consumer_offsets-1, __consumer_offsets-5, __consumer_offsets-26, __consumer_offsets-29, __consumer_offsets-34, __consumer_offsets-10, __consumer_offsets-32, __consumer_offsets-40) (kafka.server.ReplicaAlterLogDirsManager)
[2020-04-28 04:37:18,990] INFO [ReplicaManager broker=2] Broker 2 stopped fetcher for partitions __consumer_offsets-22,kafka-connect-offset-15,kafka-connect-offset-16,kafka-connect-offset-2,__consumer_offsets-30,kafka-connect-offset-7,kafka-connect-offset-13,__consumer_offsets-8,__consumer_offsets-21,kafka-connect-offset-8,kafka-connect-status-2,kafka-connect-offset-5,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,kafka-connect-offset-19,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,kafka-connect-offset-20,kafka-connect-offset-3,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-28,kafka-connect-config-0,kafka-connect-offset-9,kafka-connect-offset-17,__consumer_offsets-31,__consumer_offsets-36,kafka-connect-status-1,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,kafka-connect-offset-10,kafka-connect-offset-24,kafka-connect-status-4,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,kafka-connect-offset-23,kafka-connect-offset-21,kafka-connect-offset-0,__consumer_offsets-19,__consumer_offsets-11,kafka-connect-status-0,__consumer_offsets-13,kafka-connect-offset-18,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,kafka-connect-offset-14,kafka-connect-offset-22,kafka-connect-offset-6,perf-test4-0,kafka-connect-status-3,kafka-connect-offset-11,kafka-connect-offset-12,__consumer_offsets-20,__consumer_offsets-0,kafka-connect-offset-4,__consumer_offsets-44,__consumer_offsets-39,kafka-connect-offset-1,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 and stopped moving logs for partitions  because they are in the failed log directory /opt/kafka/data/logs. (kafka.server.ReplicaManager)
[2020-04-28 04:37:18,990] INFO Stopping serving logs in dir /opt/kafka/data/logs (kafka.log.LogManager)
[2020-04-28 04:37:18,992] WARN [Producer clientId=producer-1] 1 partitions have leader brokers without a matching listener, including [__confluent.support.metrics-0] (org.apache.kafka.clients.NetworkClient)
[2020-04-28 04:37:18,996] ERROR Shutdown broker because all log dirs in /opt/kafka/data/logs have failed (kafka.log.LogManager)

Prometheus 监控快照

我希望在代理失败之前通过解决方案阻止这种情况,例如删除以前的消息足够 space 之类的。对此有什么推荐的最佳做法吗?

你需要在kafka上配置两个参数:log.retention.hourslog.retention.bytes。这两个参数是“在旋转之前我将存储多少数据以及在旋转之前我将存储数据多少次?如果您想存储更多数据或减少保留时间,请尝试提高这些值。

如果问题出在存储上,您可以查看的另一件事是您的日志。生成的日志可以使用很多space。在这一点上,您需要使用一些旋转技术。查看 logs 文件夹的大小。