Author: bert. Original source: https://tidb.net/blog/60c87e38
Trying out the Resource Control feature in TiDB v7.1
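1. Use cases
Definition: with TiDB's Resource Control feature, once a user is bound to a resource group, the TiDB layer applies flow control to that user's read and write requests according to the quota set for the group, and the TiKV layer schedules requests according to the priority mapped from that quota. Together, these two layers (flow control and scheduling) isolate the resources of different applications and help meet quality of service (QoS) requirements.
Personal take: as I see it, the biggest benefit of a resource manager is resource consolidation, which lowers the cost of running databases. The gain in cost-effectiveness is largest once an enterprise's DB layer has been turned into a PaaS platform. For multi-tenant applications, or when consolidating several business systems, it provides concurrent access control: each tenant gets its own isolated group so that one tenant's resource usage does not interfere with another's, which helps optimize utilization and reduce cost. It also fits these scenarios: read/write splitting control; co-locating test and development workloads; and separating nightly batch or maintenance jobs from weekday workloads.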
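To make the flow-control half of the definition more concrete: the TiDB documentation describes the TiDB-layer flow control as based on the token bucket algorithm. The sketch below is a toy model of that idea, not TiDB's implementation. The class `TokenBucket`, the function `simulate`, the fixed cost of 10 per request, and the one-second refill step are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Toy token bucket: `fill_rate` tokens are added per second, up to `capacity`."""
    fill_rate: float    # tokens (think: RU) added per second
    capacity: float     # maximum number of tokens the bucket can hold
    tokens: float = 0.0

    def refill(self, seconds: float) -> None:
        """Add tokens for the elapsed time, never exceeding the capacity."""
        self.tokens = min(self.capacity, self.tokens + self.fill_rate * seconds)

    def try_consume(self, cost: float) -> bool:
        """Spend `cost` tokens if available; otherwise the request has to wait."""
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False


def simulate(fill_rate: float, request_cost: float,
             requests_per_sec: int, seconds: int) -> float:
    """Return the average number of tokens consumed per simulated second."""
    bucket = TokenBucket(fill_rate=fill_rate, capacity=fill_rate)
    consumed = 0.0
    for _ in range(seconds):
        bucket.refill(1.0)
        for _ in range(requests_per_sec):
            if bucket.try_consume(request_cost):
                consumed += request_cost
    return consumed / seconds


# Hypothetical client offering 10,000 units/s of work against a 5,000/s bucket.
print(simulate(fill_rate=5000, request_cost=10, requests_per_sec=1000, seconds=20))
```

In this model, demand at twice the fill rate is held to the fill rate, which is roughly the behavior one would expect from a resource group's RU_PER_SEC limit when the group is not marked BURSTABLE.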
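2. Test plan for this trial
I originally planned to test with sysbench, but tiup bench feels closer to real business scenarios. The main point is that tiup already ships the bench tool, which saves effort (●'◡'●)
- Deploy TiDB v7.1.0 with TiUP (omitted)
- Enable TiDB resource control (on by default) and, before doing any resource planning, use CALIBRATE RESOURCE to estimate the cluster's capacity.
- Use the default test database, oltp_user, and the oltp_rg resource group; generate test data with tiup bench tpcc to simulate a day-to-day OLTP transaction workload.
- Create the olap database, olap_user, and the olap_rg resource group; generate test data with tiup bench tpch to simulate a day-to-day OLAP reporting workload.
- Start the tpcc and tpch tests separately: tpcc for 20 minutes, tpch for 20 rounds.
- Observe the resource isolation effect in the Dashboard and briefly summarize the test.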
3. Test environment
3.1 Hardware
[root@vlan99 ~]# mpstat -P ALL|head -1
Linux 3.10.0-957.el7.x86_64 (vlan99) 06/09/2023 x86_64 (8 CPU)
[root@vlan99 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          15884       12050         290         793        3544        2710
<Screenshot: host panel in the Dashboard>
3.2 Confirm that resource control is enabled and estimate capacity
`
MySQL [(none)]> show variables like '%tidb_enable_resource_control%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| tidb_enable_resource_control | ON    |
+------------------------------+-------+
1 row in set (0.00 sec)

MySQL [(none)]> calibrate resource workload oltp_read_write;
+-------+
| QUOTA |
+-------+
| 14886 |
+-------+
1 row in set (0.01 sec)
`
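4. Create oltp_user with the oltp_rg resource group and prepare the TPC-C data for OLTP; create olap_user with the olap_rg resource group and prepare the TPC-H data for OLAP
4.1 Prepare the test users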
MySQL [(none)]> create resource group if not exists oltp_rg ru_per_sec=5000;
Query OK, 0 rows affected (0.15 sec)
MySQL [(none)]> create resource group if not exists olap_rg ru_per_sec=2000;
Query OK, 0 rows affected (0.15 sec)
MySQL [(none)]> create user oltp_user identified by 'oltp_user' resource group oltp_rg;
Query OK, 0 rows affected (0.18 sec)
MySQL [(none)]> create user olap_user identified by 'olap_user' resource group olap_rg;
Query OK, 0 rows affected (0.03 sec)
MySQL [(none)]> grant all on *.* to oltp_user;
Query OK, 0 rows affected (0.03 sec)
MySQL [(none)]> grant all on *.* to olap_user;
Query OK, 0 rows affected (0.03 sec)
MySQL [(none)]> select user,host,user_attributes from mysql.user;
+-----------+-----------+-------------------------------+
| user      | host      | user_attributes               |
+-----------+-----------+-------------------------------+
| root      | %         | NULL                          |
| root      | localhost | NULL                          |
| oltp_user | %         | {"resource_group": "oltp_rg"} |
| olap_user | %         | {"resource_group": "olap_rg"} |
+-----------+-----------+-------------------------------+
4 rows in set (0.00 sec)
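As a quick sanity check on the plan, the two RU_PER_SEC values can be compared with the capacity estimate returned by CALIBRATE RESOURCE. The snippet below is only a worked calculation on the numbers shown in the sessions above; the helper name `quota_shares` is made up for this example. The estimate was produced for the oltp_read_write workload model, so treating it as the capacity for a TPC-H style workload is a rough assumption.

```python
# Figures taken from the session output above.
calibrated_quota = 14886                      # CALIBRATE RESOURCE WORKLOAD oltp_read_write
groups = {"oltp_rg": 5000, "olap_rg": 2000}   # RU_PER_SEC of each resource group


def quota_shares(quota: int, allocations: dict[str, int]) -> dict[str, float]:
    """Return each group's RU_PER_SEC as a percentage of the quota (one decimal)."""
    return {name: round(ru / quota * 100, 1) for name, ru in allocations.items()}


shares = quota_shares(calibrated_quota, groups)
unallocated = calibrated_quota - sum(groups.values())

for name, pct in shares.items():
    print(f"{name}: {groups[name]} RU/s = {pct}% of the estimated capacity")
print(f"not assigned to either group: {unallocated} RU/s")
```

Under these assumptions the two groups together claim a bit less than half of the estimated capacity, so neither limit should be constrained by the hardware itself.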
4.2 Prepare the TPC-C test data
[root@vlan99 ~]# tiup bench tpcc --warehouses 1 --parts 1 prepare
tiup is checking updates for component bench ...timeout(2s)!
Starting component bench: /root/.tiup/components/bench/v1.12.0/tiup-bench tpcc --warehouses 1 --parts 1 prepare
creating table warehouse
creating table district
creating table customer
(omitted)
begin to check warehouse 1 at condition 3.3.2.9
begin to check warehouse 1 at condition 3.3.2.12
Finished
4.3 Prepare the TPC-H test data
[root@vlan99 ~]# time tiup bench tpch --sf=1 prepare
tiup is checking updates for component bench ...timeout(2s)!
Starting component bench: /root/.tiup/components/bench/v1.12.0/tiup-bench tpch --sf=1 prepare
creating nation
creating region
(omitted)
generate orders/lineitem tables done
Finished
5. Run the tpcc and tpch tests separately
tpcc runs for 20 minutes, tpch for 20 rounds.
5.1 tpcc results
tiup bench tpcc --warehouses 1 --time 20m --user=oltp_user -poltp_user run
TPC-C uses tpmC (transactions per minute) to measure a system's maximum qualified throughput (MQTh). Only New-Order transactions are counted, so the final unit is the number of new orders processed per minute.
[Summary] DELIVERY - Takes(s): 599.2, Count: 2376, TPM: 237.9, Sum(ms): 100035.8, Avg(ms): 42.1, 50th(ms): 41.9, 90th(ms): 50.3, 95th(ms): 52.4, 99th(ms): 62.9, 99.9th(ms): 104.9, Max(ms): 125.8
[Summary] NEW_ORDER - Takes(s): 599.7, Count: 26849, TPM: 2686.3, Sum(ms): 311837.6, Avg(ms): 11.6, 50th(ms): 11.5, 90th(ms): 14.2, 95th(ms): 15.2, 99th(ms): 19.9, 99.9th(ms): 44.0, Max(ms): 369.1
[Summary] ORDER_STATUS - Takes(s): 599.5, Count: 2436, TPM: 243.8, Sum(ms): 11000.2, Avg(ms): 4.5, 50th(ms): 4.7, 90th(ms): 5.8, 95th(ms): 6.8, 99th(ms): 9.4, 99.9th(ms): 29.4, Max(ms): 62.9
[Summary] PAYMENT - Takes(s): 599.9, Count: 25509, TPM: 2551.4, Sum(ms): 152591.7, Avg(ms): 6.0, 50th(ms): 5.8, 90th(ms): 7.3, 95th(ms): 8.4, 99th(ms): 11.5, 99.9th(ms): 32.5, Max(ms): 125.8
[Summary] PAYMENT_ERR - Takes(s): 599.9, Count: 1, TPM: 0.1, Sum(ms): 2.2, Avg(ms): 2.4, 50th(ms): 2.6, 90th(ms): 2.6, 95th(ms): 2.6, 99th(ms): 2.6, 99.9th(ms): 2.6, Max(ms): 2.6
[Summary] STOCK_LEVEL - Takes(s): 599.5, Count: 2343, TPM: 234.5, Sum(ms): 16923.3, Avg(ms): 7.2, 50th(ms): 7.3, 90th(ms): 8.9, 95th(ms): 9.4, 99th(ms): 12.1, 99.9th(ms): 17.8, Max(ms): 21.0
tpmC: 2686.3, tpmTotal: 5953.9, efficiency: 20888.9%
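The last line of the summary can be cross-checked against the per-transaction rows. The sketch below assumes, based only on the numbers above, that tpmC is the NEW_ORDER TPM, that tpmTotal is the sum of the TPM values without the `_ERR` row, and that efficiency is tpmC divided by 12.86 per warehouse (the TPC-C ceiling). The function names and the trimmed `SUMMARY` text are made up for this example.

```python
import re

# [Summary] rows from the run above, trimmed to the fields used here.
SUMMARY = """\
[Summary] DELIVERY - Takes(s): 599.2, Count: 2376, TPM: 237.9
[Summary] NEW_ORDER - Takes(s): 599.7, Count: 26849, TPM: 2686.3
[Summary] ORDER_STATUS - Takes(s): 599.5, Count: 2436, TPM: 243.8
[Summary] PAYMENT - Takes(s): 599.9, Count: 25509, TPM: 2551.4
[Summary] PAYMENT_ERR - Takes(s): 599.9, Count: 1, TPM: 0.1
[Summary] STOCK_LEVEL - Takes(s): 599.5, Count: 2343, TPM: 234.5
"""

LINE = re.compile(r"\[Summary\] (\w+) - .*?TPM: ([\d.]+)")


def parse_tpm(text: str) -> dict[str, float]:
    """Map each transaction type to its reported TPM."""
    return {m.group(1): float(m.group(2)) for m in LINE.finditer(text)}


def tpcc_metrics(tpm: dict[str, float], warehouses: int) -> tuple[float, float, float]:
    """Return (tpmC, tpmTotal, efficiency in percent) from per-transaction TPM."""
    tpmc = tpm["NEW_ORDER"]
    # Assumption: rows ending in _ERR are left out of the total.
    tpm_total = sum(v for k, v in tpm.items() if not k.endswith("_ERR"))
    # Assumption: 12.86 tpmC per warehouse is used as the 100% mark.
    efficiency = tpmc / (12.86 * warehouses) * 100
    return tpmc, round(tpm_total, 1), round(efficiency, 1)


print(tpcc_metrics(parse_tpm(SUMMARY), warehouses=1))
```

The recomputed total matches the reported tpmTotal, and the recomputed efficiency lands within 0.1 of the reported value, a gap that rounding of the printed TPM would explain. An efficiency far above 100% is expected when a benchmark run does not apply the keying and think times of the TPC-C specification, which is presumably the case here with a single warehouse.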
5.2 tpch results
`
[root@vlan99 ~]# tiup bench tpch --count=20 --sf=1 --user=olap_user -polap_user --db=olap run
tiup is checking updates for component bench ...timeout(2s)!
Starting component bench: /root/.tiup/components/bench/v1.12.0/tiup-bench tpch --count=20 --sf=1 --user=olap_user -polap_user --db=olap run
[Current] Q1: 5.27s
[Current] Q2: 1.71s
(omitted)
`
[Summary] Q7: 1.78s
[Summary] Q8: 1.71s
[Summary] Q9: 3.12s
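6. Test results
Resource isolation as observed in the Dashboard
6.1 tpcc (oltp user) result: RU consumption is held to the configured limit of 5000 RU per second
6.2 tpch (olap user) result: RU consumption is held to the configured limit of 2000 RU per second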
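7. Analysis of the results
- Strengths:
- Resource control in v7.1 abstracts the physical resources (CPU, memory, I/O) into the single concept of an RU, which simplifies usage. In this test the feature did cap RU consumption as configured and achieved the intended resource control.
- Areas for improvement:
- Limits are not applied in real time; the user has to log in again.
- Integrate the resource statements into CREATE USER.
- Suggestion: ideally, RU could be controlled as a percentage.
- Fold session/SQL concurrency limits, execution time limits, idle time, and similar settings into resource control.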
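Until percentages are supported natively, the percentage suggestion above can be approximated on the client side by converting a share of the calibrated estimate into an absolute RU_PER_SEC value. This is a minimal sketch under that assumption: the helper names and the 40% / 15% split are hypothetical, and the generated ALTER RESOURCE GROUP statements would still need to be reviewed and run by hand.

```python
def ru_for_share(total_ru: int, percent: float) -> int:
    """Convert a percentage of an estimated capacity into a whole RU_PER_SEC value."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return int(total_ru * percent / 100)


def alter_statement(group: str, total_ru: int, percent: float) -> str:
    """Render an ALTER RESOURCE GROUP statement for the given share."""
    return f"ALTER RESOURCE GROUP {group} RU_PER_SEC = {ru_for_share(total_ru, percent)};"


# 14886 is the CALIBRATE RESOURCE estimate from section 3.2.
# The 40% / 15% split is a made-up example, not a recommendation.
for group, pct in {"oltp_rg": 40, "olap_rg": 15}.items():
    print(alter_statement(group, 14886, pct))
```

The shares have to be recomputed whenever the cluster is resized or recalibrated, which is the manual step that a built-in percentage option would remove.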