References
- https://aws.amazon.com/cn/blogs/china/microservices-on-amazon-ecs-1/
- https://aws.amazon.com/cn/blogs/china/microservices-on-amazon-ecs-2/
- https://zhuanlan.zhihu.com/p/355383555
- https://docs.amazonaws.cn/en_us/AmazonECS/latest/developerguide/service-auto-scaling.html
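ECS supports auto scaling at two levels: the instance level and the service level.
- Instance-level scaling is set up by attaching a capacity provider to the ECS cluster. Under the hood this is an EC2 Auto Scaling group, and CloudWatch alarms and events drive the scaling of the cluster instances. This document explains the mechanism clearly.
- Service-level scaling is handled by the Application Auto Scaling service, which integrates with many other AWS services. The console does not show this information, but it can be viewed through the describe-scalable-targets API.
ECS can use auto scaling to adjust the desired number of tasks in a service automatically. It is configured in the console when the service is created or updated, and the China regions lack some of the options.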
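Since the console does not expose the scalable target, a short script is one way to inspect it. The sketch below assumes a client object that behaves like `boto3.client("application-autoscaling")`; the helper name `list_ecs_scalable_targets` is hypothetical, and the cluster and service names in the usage comment are the ones that appear in the logs later in this note.

```python
def list_ecs_scalable_targets(client, cluster, service):
    """Return the scalable targets registered for one ECS service.

    `client` is assumed to behave like boto3.client("application-autoscaling").
    Pagination (NextToken) is ignored, which is enough for a single resource ID.
    """
    response = client.describe_scalable_targets(
        ServiceNamespace="ecs",
        ResourceIds=[f"service/{cluster}/{service}"],
    )
    return [
        {
            "resource_id": target["ResourceId"],
            "dimension": target["ScalableDimension"],
            "min": target["MinCapacity"],
            "max": target["MaxCapacity"],
        }
        for target in response["ScalableTargets"]
    ]


# Usage against a real account (needs boto3 and credentials):
#   import boto3
#   client = boto3.client("application-autoscaling", region_name="cn-north-1")
#   print(list_ecs_scalable_targets(client, "workfargate", "nginx-svc"))
```

Passing the client in keeps the helper easy to exercise with a stub before pointing it at a real account.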
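- The CloudWatch metrics are published as averages only.
- Only a limited set of scaling policy types is supported; Target tracking is the recommended one.
- The role used for auto scaling is AWSServiceRoleForApplicationAutoScaling_ECSService.
- Scaling starts from the actual number of running tasks, not from the configured desired count.
- If the desired count is above the maximum, the count is first brought to the maximum and then scaled according to the policy. The same applies when the desired count is below the minimum.
- Scaling out from 0 is supported.
- Scale-in is paused while a service deployment is in progress, but scale-out continues unless it is suspended explicitly.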
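The settings in the list above map onto two Application Auto Scaling requests: registering the service as a scalable target, and attaching a target tracking policy. The sketch below only builds the request parameters so that it can be read and checked without an AWS account. The helper names are hypothetical; the resource ID and policy name come from the logs in this note, and the values 1, 5, and 50 are the minimum, maximum, and CPU target used in the test that follows.

```python
# Cluster and service names as they appear in the logs of this note.
RESOURCE_ID = "service/workfargate/nginx-svc"
DIMENSION = "ecs:service:DesiredCount"


def scalable_target_params(min_tasks, max_tasks, suspend_scale_out=False):
    """Parameters for register_scalable_target (min/max task bounds)."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": DIMENSION,
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
        # Explicit suspension, e.g. to stop scale-out during a deployment.
        "SuspendedState": {
            "DynamicScalingInSuspended": False,
            "DynamicScalingOutSuspended": suspend_scale_out,
            "ScheduledScalingSuspended": False,
        },
    }


def target_tracking_policy_params(policy_name, cpu_target):
    """Parameters for put_scaling_policy using the average CPU metric."""
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": "ecs",
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(cpu_target),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
        },
    }


# Usage with boto3 (needs credentials):
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**scalable_target_params(1, 5))
#   client.put_scaling_policy(
#       **target_tracking_policy_params("testnginx-autoscale", 50))
```

Calling `scalable_target_params(1, 5, suspend_scale_out=True)` and registering the target again is one way to pause scale-out for the duration of a deployment.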
Scale-out
Apply load to the application:
$ webbench -c 100 -t 600 http://172.31.20.73/
CPU utilization rises above the target value of 50, so the CloudWatch alarm fires and a scale-out action is executed.
2023-02-27 09:18:47 State update Alarm updated from In alarm to OK.
2023-02-27 09:06:47 Action Successfully executed action arn:aws-cn:autoscaling:cn-north-1:xxxxxxxx:scalingPolicy:20be5a77-1b23-494c-a9d5-1d21aec6ed61:resource/ecs/service/workfargate/nginx-svc:policyName/testnginx-autoscale:createdBy/63002f92-e64f-484d-a9f5-1035324563ba
After the scale-out, the service events show the following message:
Message: Successfully set desired count to 3. Change successfully fulfilled by ecs. Cause: monitor alarm TargetTracking-service/workfargate/nginx-svc-AlarmHigh-75d51541-0ab8-4db4-9606-6b204c846c6e in state ALARM triggered policy testnginx-autoscale
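When many of these events need to be reviewed, the message can be split into its parts with a regular expression. This is a minimal sketch based only on the message format seen in this note; the function name is hypothetical, and reading `-AlarmHigh-` as scale-out and `-AlarmLow-` as scale-in is an assumption drawn from the two sample messages here, not from a documented contract.

```python
import re

# Pattern derived from the sample event messages in this note.
EVENT_PATTERN = re.compile(
    r"Successfully set desired count to (?P<count>\d+)\."
    r".*?monitor alarm (?P<alarm>\S+) in state ALARM"
    r" triggered policy (?P<policy>\S+)"
)


def parse_scaling_event(message):
    """Extract desired count, alarm, policy, and direction from an event.

    Returns None when the message does not match the expected format.
    """
    match = EVENT_PATTERN.search(message)
    if match is None:
        return None
    alarm = match.group("alarm")
    if "-AlarmHigh-" in alarm:
        direction = "out"  # assumption: the high alarm drives scale-out
    elif "-AlarmLow-" in alarm:
        direction = "in"  # assumption: the low alarm drives scale-in
    else:
        direction = "unknown"
    return {
        "desired_count": int(match.group("count")),
        "alarm": alarm,
        "policy": match.group("policy"),
        "direction": direction,
    }
```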
Because the maximum is 5, the service cannot scale out beyond 5 tasks.
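The effect of the minimum and maximum bounds can be pictured as a simple clamp. The function below is an illustration of the observed behaviour, not the service's actual implementation, and its name is hypothetical.

```python
def clamp_task_count(requested, min_tasks, max_tasks):
    """Keep a requested task count inside the [min_tasks, max_tasks] range."""
    if min_tasks > max_tasks:
        raise ValueError("min_tasks must not exceed max_tasks")
    return max(min_tasks, min(requested, max_tasks))
```

With the bounds used in this test (minimum 1, maximum 5), a request for 8 tasks is held at 5 and a request for 0 tasks is held at 1.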
Scale-in
CPU utilization stays too low, so the CloudWatch alarm fires and a scale-in action is executed.
2023-02-27 09:06:47 Action Successfully executed action arn:aws-cn:autoscaling:cn-north-1:xxxxx:scalingPolicy:20be5a77-1b23-494c-a9d5-1d21aec6ed61:resource/ecs/service/workfargate/nginx-svc:policyName/testnginx-autoscale:createdBy/63002f92-e64f-484d-a9f5-1035324563ba
2023-02-27 09:06:47 State update Alarm updated from Insufficient data to In alarm.
After the scale-in, the last status of the stopped task shows the following:
STOPPED (Scaling activity initiated by (deployment ecs-svc/3120824563093745023))
The service events contain the following record:
Message: Successfully set desired count to 1. Change successfully fulfilled by ecs. Cause: monitor alarm TargetTracking-service/workfargate/nginx-svc-AlarmLow-99c588d3-d004-42f6-a10d-67d20c48a4c1 in state ALARM triggered policy testnginx-autoscale
Because the minimum is 1, the service scales in to no fewer than 1 task.
Event for scaling in to 0:
Message: Successfully set desired count to 0. Change successfully fulfilled by ecs. Cause: monitor alarm TargetTracking-service/workfargate/nginx-svc-AlarmLow-99c588d3-d004-42f6-a10d-67d20c48a4c1 in state ALARM triggered policy testnginx-autoscale
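Other clues
The task count is changed by scaling operations that Application Auto Scaling triggers, so CloudTrail can be used to see the exact API call involved: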
{
  "eventVersion": "1.08",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "AROAQRIBWRJKAGUKQLRHZ:AutoScaling-UpdateDesiredCapacity",
    "arn": "arn:aws-cn:sts::xxxxxxx:assumed-role/AWSServiceRoleForApplicationAutoScaling_ECSService/AutoScaling-UpdateDesiredCapacity",
    "invokedBy": "ecs.application-autoscaling.amazonaws.com"
  },
  "eventSource": "ecs.amazonaws.com",
  "eventName": "UpdateService",
  "awsRegion": "cn-north-1",
  "sourceIPAddress": "ecs.application-autoscaling.amazonaws.com",
  "userAgent": "ecs.application-autoscaling.amazonaws.com",
  "requestParameters": {
    "cluster": "workfargate",
    "desiredCount": 3,
    "forceNewDeployment": false,
    "service": "nginx-svc"
  }
}
The trail record above shows that the task count is still updated through UpdateService, and that the call is triggered by application-autoscaling.
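To pick such records out of a larger set of CloudTrail events, the same three fields can be checked in code. This is a minimal sketch that works on an already parsed record; the function name is hypothetical and the sample is a trimmed copy of the record shown above.

```python
import json

# Service principal seen in the invokedBy field of the record above.
AAS_PRINCIPAL = "ecs.application-autoscaling.amazonaws.com"


def summarize_update_service(record):
    """Summarize an UpdateService call made by Application Auto Scaling.

    Returns None for any other kind of CloudTrail record.
    """
    if record.get("eventSource") != "ecs.amazonaws.com":
        return None
    if record.get("eventName") != "UpdateService":
        return None
    if record.get("userIdentity", {}).get("invokedBy") != AAS_PRINCIPAL:
        return None
    params = record.get("requestParameters") or {}
    return {
        "cluster": params.get("cluster"),
        "service": params.get("service"),
        "desired_count": params.get("desiredCount"),
    }


# Trimmed copy of the trail record from this note.
sample = json.loads("""
{
  "userIdentity": {"invokedBy": "ecs.application-autoscaling.amazonaws.com"},
  "eventSource": "ecs.amazonaws.com",
  "eventName": "UpdateService",
  "requestParameters": {
    "cluster": "workfargate",
    "desiredCount": 3,
    "forceNewDeployment": false,
    "service": "nginx-svc"
  }
}
""")
summary = summarize_update_service(sample)
```

Records could be fetched with the CloudTrail `lookup_events` call filtered on the event name `UpdateService`; each returned event carries the full record as a JSON string in its `CloudTrailEvent` field, which would need `json.loads` before being passed to this function.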