Version: Doris 2.1.2
Environment: 2 Doris FE nodes, 4 Doris BE nodes
Detailed tutorial for setting up the Doris cluster: "Apache Doris 2.x 版本【保姆级】安装+使用教程" (CSDN blog)
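After confirming that server resources were all fine, the following occurred:
1: Reproducing the problem:
In April 2024 the Doris database was upgraded to version 2.1.2. The upgrade appears to have triggered a problem that affected dynamic partition creation. Normally the system creates partitions for the next two months automatically every month, according to my settings, but by July the partitions had not been created and data could not be written. In short: in April my dynamically partitioned tables had already created the May and June partitions according to my setting "dynamic_partition.end" = "2", and then I upgraded to 2.1.2. During May and June no further dynamic partitions were created, so in July (the current month) there was no partition for new data and the writes failed.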
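2: Cause of the problem:
Background:
This problem involves two key properties used when creating partitioned tables: dynamic_partition.start and dynamic_partition.history_partition_num. Last year I had set dynamic_partition.start to a value too close to the current time, which could cause partitions to expire. For example, I had set dynamic_partition.start = -10, which means partitions whose range lies before this offset are deleted; in other words, -10 keeps only the last 10 historical partitions. After noticing this, I changed start to -656521 to try to work around it.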
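Root cause:
In Doris 2.1.2, the problem is that "dynamic_partition.start" was set to "-656521", which is too small. In this version the value of "dynamic_partition.start" must not be too small: if the current time plus this offset falls before 1970-01-01, the system may run into problems. The same applies to "dynamic_partition.history_partition_num". In principle start and history_partition_num serve similar purposes, and the recommendation is to keep only one of them.
Because the start value was too small, the dynamic partition polling thread terminated abnormally and stopped processing the remaining tables. To fix this, set these values to a reasonable range and avoid making them too small; normal behavior then resumes.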
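To make the offset arithmetic concrete, here is a minimal sketch of where the lower bound of the retention window lands for a given start value. It assumes the time unit is MONTH (my tables are partitioned monthly) and that the supported year range is 0000 to 9999; the helper names shift_month and in_supported_range are hypothetical and are not part of Doris.

```python
def shift_month(year, month, offset):
    """Shift a (year, month) pair by `offset` months; month is 1..12."""
    index = year * 12 + (month - 1) + offset
    # Floor division keeps the result correct for negative years too.
    return index // 12, index % 12 + 1


def in_supported_range(year):
    """Assumption for this sketch: usable years are 0000 through 9999."""
    return 0 <= year <= 9999


# Evaluated for July 2024, the month in which the errors were logged.
for start in (-10, -240, -656521):
    year, month = shift_month(2024, 7, start)
    print(start, (year, month), in_supported_range(year))
# -10     -> (2023, 9)    in range
# -240    -> (2004, 7)    in range
# -656521 -> (-52686, 6)  far outside any representable date
```

With start = -656521 the lower bound falls roughly 54,710 years in the past. That appears consistent with the invalid literal +52687-06-01 reported in the stack trace, if that literal is rendered as a year-of-era value, although I have not confirmed this against the source code.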
As for the data that failed to be written, the detailed errors are in fe.warn.log on the Doris FE:
2024-07-04 11:13:37,845 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: gxy_history, table: gxy_job
2024-07-04 11:13:37,857 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_educations_tmp
2024-07-04 11:13:37,864 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_20240701
2024-07-04 11:13:37,866 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: gxy_history, table: gxy_students
2024-07-04 11:13:37,874 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_educations_20240703
2024-07-04 11:13:37,883 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_demo
2024-07-04 11:13:37,889 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_20240628
2024-07-04 11:13:37,896 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_jobs
2024-07-04 11:13:37,934 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: gxy_history, table: gxy_students_old
2024-07-04 11:13:37,940 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_jobs_old
2024-07-04 11:13:37,947 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_educations_20240628
2024-07-04 11:13:37,966 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: enterprise_bak
2024-07-04 11:13:37,980 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_jobs_20240701
2024-07-04 11:13:37,994 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_educations
2024-07-04 11:13:37,998 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_2
2024-07-04 11:13:38,005 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: gxy_history, table: gxy_plan_teacher_student
2024-07-04 11:13:38,017 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_20240703
2024-07-04 11:13:38,019 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_jobs_20240703
2024-07-04 11:13:38,024 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons
2024-07-04 11:13:38,033 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_candidate
2024-07-04 11:13:38,036 ERROR (DynamicPartitionScheduler|40) [Daemon.run():118] daemon thread got exception. name: DynamicPartitionScheduler
org.apache.doris.nereids.exceptions.AnalysisException: date/datetime literal [+52687-06-01 00:00:00] is invalid
at org.apache.doris.nereids.trees.expressions.literal.DateLiteral.normalize(DateLiteral.java:202) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.nereids.trees.expressions.literal.DateTimeLiteral.determineScale(DateTimeLiteral.java:107) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.nereids.types.DateTimeV2Type.forTypeFromString(DateTimeV2Type.java:90) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.nereids.trees.expressions.literal.DateTimeV2Literal.<init>(DateTimeV2Literal.java:38) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.catalog.PartitionKey.getDateTimeLiteral(PartitionKey.java:121) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.catalog.PartitionKey.createPartitionKey(PartitionKey.java:99) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.clone.DynamicPartitionScheduler.getDropPartitionClause(DynamicPartitionScheduler.java:431) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.clone.DynamicPartitionScheduler.executeDynamicPartition(DynamicPartitionScheduler.java:555) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.clone.DynamicPartitionScheduler.runAfterCatalogReady(DynamicPartitionScheduler.java:641) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
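The log shows that, because the "dynamic_partition.start" value was too small, the dynamic partition polling thread failed (daemon thread got exception. name: DynamicPartitionScheduler) and stopped processing the remaining tables. To resolve this, adjust these property values so that they are not too small; this avoids similar exceptions and interrupted operations.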
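The following command shows that, at this point, most of my tables had dynamic_partition.start set to -656521.
-- Show the dynamic partition status of the tables in the current database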
SHOW DYNAMIC PARTITION TABLES;
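The essence of the problem:
At its core, the start value of -656521 on one particular table (call it table A) triggers a chain reaction in which partition creation and deletion for another table (table B) are never executed, even if table B's own start value is not -656521. Judging from the source code, the flow is as follows:
- Assume table A is scheduled before table B;
- Table A successfully creates its new partitions;
- Table A fails to delete its historical partitions;
- Impact on table B's operations: because table A failed while deleting historical partitions in the third step, the scheduler does not go on to create or delete partitions for the tables that follow (such as B);
- Table B's partition operations are skipped: as a result of that failure, both the creation of new partitions and the deletion of old partitions that table B should have performed are skipped by the system.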
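The chain reaction above can be modeled with a toy scheduling loop. This is only an illustrative sketch and not the actual DynamicPartitionScheduler code: the exception class, the failure threshold, and the function names are all hypothetical stand-ins.

```python
class InvalidDateLiteral(Exception):
    """Stand-in for the AnalysisException seen in fe.warn.log."""


# Arbitrary cut-off used only in this sketch to decide when "drop" fails.
TOO_SMALL_START = -120000


def process_table(name, start, log):
    """Hypothetical per-table step: create new partitions, then drop old ones."""
    log.append(f"{name}: created future partitions")
    if start < TOO_SMALL_START:
        raise InvalidDateLiteral(f"{name}: start={start} yields an invalid date")
    log.append(f"{name}: dropped expired partitions")


def run_one_cycle(tables):
    """Process tables in order; an uncaught per-table error ends the whole cycle."""
    log = []
    try:
        for name, start in tables:
            process_table(name, start, log)
    except InvalidDateLiteral as exc:
        log.append(f"cycle aborted: {exc}")
    return log


# Table A is scheduled before table B, as in the scenario described above.
for entry in run_one_cycle([("A", -656521), ("B", -240)]):
    print(entry)
```

In this model table B never appears in the output, even though its own start value is fine, because the error raised for table A ends the cycle before B is reached.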
Fix:
For every table in every Doris database whose start value is too small, reset "dynamic_partition.start" with the following command:
ALTER TABLE table_name SET ("dynamic_partition.start" = "-240");
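When many tables are affected, the statements can be generated rather than typed by hand. The sketch below assumes you have already collected (database, table, start) rows yourself, for example by reading the output of SHOW DYNAMIC PARTITION TABLES in each database. The helper build_fix_statements is hypothetical, and the sample rows are illustrative: the first two table names come from the log above, the third is made up.

```python
def build_fix_statements(rows, new_start=-240):
    """Return ALTER TABLE statements for rows whose start is below new_start."""
    statements = []
    for db, table, start in rows:
        if start < new_start:
            statements.append(
                f'ALTER TABLE `{db}`.`{table}` '
                f'SET ("dynamic_partition.start" = "{new_start}");'
            )
    return statements


# Illustrative input only; replace with the rows collected from your cluster.
rows = [
    ("gxy_history", "gxy_job", -656521),
    ("core_db", "persons", -656521),
    ("core_db", "already_fixed_table", -240),  # hypothetical table name
]

for statement in build_fix_statements(rows):
    print(statement)
```

Review the generated statements before running them, since -240 is simply the value I chose for my monthly tables and may not suit other retention requirements.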
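Checking the result:
After the fix, no tables with an excessively small start value remain.
Inspecting the table definitions shows that partitions for the next two months have been created automatically again. The problem is fixed.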