Documentation
- Getting started with Amazon Batch on Amazon EKS
- Amazon EKS jobs
- Memory and vCPU considerations for Amazon Batch on Amazon EKS
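Batch does not manage the cluster itself; it only manages nodes (scaling them up and down automatically) and runs jobs. Within EKS, Batch manages its own resources separately and does not touch other pods, nodes, or ASGs. The best practice is to create a dedicated namespace for it.
A compute environment is created in Batch and associated with the EKS cluster. From then on the compute environment is decoupled from EKS: users submit jobs to the abstract compute environment, and Batch handles the conversion from job to pod.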
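EKS and Batch jobs
A job is the smallest unit of work in Batch, and a Batch job on EKS maps to a pod. When a job is submitted, the eksProperties in the job definition carry the parameters for a job that can run on EKS.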
The podProperties of a running job have podName and nodeName parameters set for the current job attempt.
aws batch describe-jobs --jobs 2d044787-c663-4ce6-a6fe-f2baf7e51b04
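As a rough sketch of reading those two fields, the Python below pulls podName and nodeName out of a parsed describe-jobs response. The sample response is hand-written and heavily trimmed for illustration (the node name is a placeholder, and pairing this pod name with this job ID is an assumption); the field path jobs[].eksProperties.podProperties is inferred from the sentence above.

```python
import json


def pod_location(describe_jobs_output: str):
    """Return (podName, nodeName) for the first job in a describe-jobs response.

    Either value is None when the job has not been placed on a pod yet.
    """
    response = json.loads(describe_jobs_output)
    jobs = response.get("jobs", [])
    if not jobs:
        return (None, None)
    pod_props = jobs[0].get("eksProperties", {}).get("podProperties", {})
    return (pod_props.get("podName"), pod_props.get("nodeName"))


# Hand-written, trimmed sample; real output carries many more fields.
sample = json.dumps({
    "jobs": [{
        "jobId": "2d044787-c663-4ce6-a6fe-f2baf7e51b04",
        "eksProperties": {
            "podProperties": {
                "podName": "aws-batch.14638eb9-d218-372d-ba5c-1c9ab9c7f2a1",
                "nodeName": "ip-192-168-0-1.cn-north-1.compute.internal",  # placeholder
            }
        },
    }]
})

pod_name, node_name = pod_location(sample)
print(pod_name, node_name)
```

In practice the input would be the JSON printed by the describe-jobs command above.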
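When a job is submitted to EKS, Batch converts it into a pod definition. Labels and taints ensure that the job runs only on nodes managed by Batch. Job pods on EKS have the following defaults: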
hostNetwork = true
dnsPolicy = ClusterFirstWithHostNet
Use CloudWatch Logs to monitor Batch jobs running on EKS: https://docs.amazonaws.cn/en_us/batch/latest/userguide/batch-eks-cloudwatch-logs.html
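Labels on the pod identify the Batch job's jobId and the compute environment's uuid. Batch also injects environment variables into the pod so that the job runtime can read the job's details.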
kubectl describe pod aws-batch.14638eb9-d218-372d-ba5c-1c9ab9c7f2a1 -n my-aws-batch-namespace
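Because the pod name is generated by Batch, looking the pod up by label can be easier than by name. A minimal sketch that builds such a kubectl command; the label key batch.amazonaws.com/job-id and the job ID are taken from the pod manifest later in these notes, and the helper name is hypothetical.

```python
import shlex

# Label key as it appears on the job pod dumped later in these notes.
JOB_ID_LABEL = "batch.amazonaws.com/job-id"


def find_pod_command(job_id: str, namespace: str) -> list:
    """Build a kubectl command that selects the pod of one Batch job by label."""
    return [
        "kubectl", "get", "pods",
        "-n", namespace,
        "-l", f"{JOB_ID_LABEL}={job_id}",
        "-o", "wide",
    ]


cmd = find_pod_command("a5b695fc-8847-4a32-bfb5-99c6cf66c1df", "my-aws-batch-namespace")
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run; shlex.join is only used here to print a copy-pasteable command line.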
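To run GPU-based jobs on EKS, see https://docs.amazonaws.cn/en_us/batch/latest/userguide/run-eks-gpu-workload.html
EKS reserves memory and CPU on each node differently from GKE, especially memory. Batch jobs can be affected by these reserved resources, because a job that requests more than the node's allocatable capacity cannot be placed.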
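A rough worked example of the memory reservation, under two assumptions that should be verified against the AMI in use: that the node follows the formula in the EKS-optimized Amazon Linux 2 bootstrap script (255 MiB plus 11 MiB per pod of max-pods), and that the hard-eviction threshold is the default 100 MiB. The max-pods value of 58 for m5.xlarge is likewise an assumption.

```python
def kube_reserved_memory_mib(max_pods: int) -> int:
    """Memory reserved for Kubernetes daemons (assumed formula: 255 + 11 * max_pods)."""
    return 255 + 11 * max_pods


def allocatable_memory_mib(total_mib: int, max_pods: int, eviction_mib: int = 100) -> int:
    """Approximate memory left for pods after the reservation and eviction threshold."""
    return total_mib - kube_reserved_memory_mib(max_pods) - eviction_mib


# m5.xlarge: nominally 16 GiB; the capacity the kubelet reports is somewhat lower.
nominal_mib = 16 * 1024
max_pods = 58  # assumed max-pods for m5.xlarge

print("reserved MiB:", kube_reserved_memory_mib(max_pods))
print("allocatable MiB (upper bound):", allocatable_memory_mib(nominal_mib, max_pods))
```

The practical consequence is that a job should request noticeably less memory than the instance type's nominal size.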
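EKS and Batch compute environments
Set up the tools (AWS CLI, kubectl), configure permissions (access to EKS), and create the cluster.
Note: Batch only supports EKS clusters with public endpoint access.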
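The resources created in EKS are:
- A dedicated namespace
- A ClusterRoleBinding, so that Batch can monitor nodes and pods
- A Role, created in the namespace and bound to the user aws-batch
- An iamidentitymapping that maps to AWSServiceRoleForBatch (there is an unresolved bug here: the path in this role's ARN must be removed, so the ARN cannot be copied as-is)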
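The path removal mentioned in the last item can be sketched as follows. The helper name is hypothetical, the account ID is a placeholder, and the sample ARN assumes the usual service-linked-role path; the function simply drops whatever path sits between role/ and the role name.

```python
def strip_role_path(role_arn: str) -> str:
    """Return the role ARN with any path between 'role/' and the role name removed."""
    prefix, separator, resource = role_arn.partition(":role/")
    if not separator:
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    role_name = resource.rsplit("/", 1)[-1]
    return f"{prefix}:role/{role_name}"


# Placeholder account ID; the path shown is an assumption about the service-linked role.
copied_arn = (
    "arn:aws-cn:iam::123456789012:role/"
    "aws-service-role/batch.amazonaws.com/AWSServiceRoleForBatch"
)
print(strip_role_path(copied_arn))
```

The resulting ARN is the one to use when creating the identity mapping.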
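Notes:
- EKS compute environments support only the BEST_FIT_PROGRESSIVE and SPOT_CAPACITY_OPTIMIZED allocation strategies
- The AWS CLI supports creating EKS compute environments only in version 2.8.6 or later
- The instance role corresponds to the EKS node role, and the instance security group corresponds to the cluster's security group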
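Running the command produced the error below. A template generated with --generate-cli-skeleton confirmed that the eksConfiguration field was indeed missing, so check the AWS CLI version; upgrading to 2.8.6 or later is recommended.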
Parameter validation failed:
Unknown parameter in input: "eksConfiguration", must be one of: computeEnvironmentName, type, state, unmanagedvCpus, computeResources, serviceRole, tags
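A small sketch of the version check, assuming the first token printed by aws --version has the form aws-cli/X.Y.Z and that 2.8.6 itself already includes eksConfiguration (the notes only say "2.8.6 or later"). The sample strings are illustrative, not captured output.

```python
import re

MIN_VERSION = (2, 8, 6)  # assumed to be the first version with eksConfiguration


def cli_version(version_output: str) -> tuple:
    """Parse (major, minor, patch) from the output of `aws --version`."""
    match = re.match(r"aws-cli/(\d+)\.(\d+)\.(\d+)", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version_output!r}")
    return tuple(int(part) for part in match.groups())


def supports_eks_configuration(version_output: str) -> bool:
    """True when the CLI is new enough to accept eksConfiguration."""
    return cli_version(version_output) >= MIN_VERSION


for line in (
    "aws-cli/2.7.35 Python/3.9.11 Linux/5.10.0 exe/x86_64",
    "aws-cli/2.8.6 Python/3.9.11 Linux/5.10.0 exe/x86_64",
):
    print(line.split()[0], supports_eks_configuration(line))
```

Comparing tuples of integers avoids the string-comparison trap where "2.10.0" would sort before "2.8.6".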
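Running the command again succeeds. It also creates an ASG and fills the network configuration into the corresponding launch template.
In --compute-resources, ec2Configuration.imageType can be set to select the AMI type for GPU instances.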
The image type to match with the instance type to select an AMI. The supported values are different for ECS and EKS resources.
ECS: ECS_AL2, ECS_AL2_NVIDIA, ECS_AL1
EKS: EKS_AL2, EKS_AL2_NVIDIA (for example, P4 and G4)
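The notes do not show the contents of batch-eks-compute-environment.json. A sketch of what such a file might contain, generated from Python so the constraints listed above can be checked first. The cluster ARN, account ID, subnet, security group, instance profile name, and vCPU limits are all placeholders; the helper name is hypothetical.

```python
import json

# Constraints taken from the notes above.
EKS_ALLOCATION_STRATEGIES = {"BEST_FIT_PROGRESSIVE", "SPOT_CAPACITY_OPTIMIZED"}
EKS_IMAGE_TYPES = {"EKS_AL2", "EKS_AL2_NVIDIA"}


def build_compute_environment(name, cluster_arn, namespace, subnets, security_group_ids,
                              instance_role, allocation_strategy="BEST_FIT_PROGRESSIVE",
                              image_type="EKS_AL2", max_vcpus=16):
    """Build a create-compute-environment request for an EKS compute environment."""
    if allocation_strategy not in EKS_ALLOCATION_STRATEGIES:
        raise ValueError(f"unsupported allocation strategy for EKS: {allocation_strategy}")
    if image_type not in EKS_IMAGE_TYPES:
        raise ValueError(f"unsupported image type for EKS: {image_type}")
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "eksConfiguration": {
            "eksClusterArn": cluster_arn,
            "kubernetesNamespace": namespace,
        },
        "computeResources": {
            "type": "EC2",
            "allocationStrategy": allocation_strategy,
            "minvCpus": 0,
            "maxvCpus": max_vcpus,
            "instanceTypes": ["m5"],
            "subnets": list(subnets),
            "securityGroupIds": list(security_group_ids),
            "instanceRole": instance_role,
            "ec2Configuration": [{"imageType": image_type}],
        },
    }


request = build_compute_environment(
    name="My-eks-CE1",
    cluster_arn="arn:aws-cn:eks:cn-north-1:123456789012:cluster/my-cluster",  # placeholder
    namespace="my-aws-batch-namespace",
    subnets=["subnet-EXAMPLE"],            # placeholder
    security_group_ids=["sg-EXAMPLE"],     # placeholder
    instance_role="my-eks-node-instance-profile",  # placeholder
)
print(json.dumps(request, indent=2))
```

Redirecting the printed JSON to batch-eks-compute-environment.json gives a file in the shape that --cli-input-json expects; for GPU instances, image_type would be EKS_AL2_NVIDIA.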
aws batch create-compute-environment --cli-input-json file://./batch-eks-compute-environment.json
{
"computeEnvironmentName": "My-eks-CE1",
"computeEnvironmentArn": "arn:aws-cn:batch:cn-north-1:xxxxxxxx:compute-environment/My-eks-CE1"
}
aws batch describe-compute-environments
Create the job queue
aws batch create-job-queue --cli-input-json file://./batch-eks-job-queue.json
{
"jobQueueName": "My-eks-JQ1",
"jobQueueArn": "arn:aws-cn:batch:cn-north-1:xxxxxx:job-queue/My-eks-JQ1"
}
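The input file batch-eks-job-queue.json is not shown in these notes. A minimal sketch of a request that would connect the queue to the compute environment created earlier; the priority value of 10 is an arbitrary assumption and the helper name is hypothetical.

```python
import json


def build_job_queue(name, compute_environments, priority=10, scheduling_policy_arn=None):
    """Build a create-job-queue request; compute environments are tried in list order."""
    request = {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": priority,
        "computeEnvironmentOrder": [
            {"order": index, "computeEnvironment": ce}
            for index, ce in enumerate(compute_environments, start=1)
        ],
    }
    # Without a scheduling policy the queue is plain FIFO.
    if scheduling_policy_arn is not None:
        request["schedulingPolicyArn"] = scheduling_policy_arn
    return request


job_queue = build_job_queue("My-eks-JQ1", ["My-eks-CE1"])
print(json.dumps(job_queue, indent=2))
```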
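Create the job definition. It resembles an ECS job definition; its eksProperties field configures the pod parameters, and pod parameters such as command can be overridden.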
aws batch register-job-definition --cli-input-json file://./batch-eks-job-definition.json
{
"jobDefinitionName": "MyJobOnEks_Sleep",
"jobDefinitionArn": "arn:aws-cn:batch:cn-north-1:xxxxxxx:job-definition/MyJobOnEks_Sleep:2",
"revision": 2
}
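The file batch-eks-job-definition.json is also not shown. A sketch of a definition that would produce a pod like the one dumped later in these notes; the image, command, and resource values are read off that pod, while the exact layout of the author's file is an assumption.

```python
import json


def build_job_definition(name, image, command, cpu="1", memory="1024Mi"):
    """Build a register-job-definition request for a single-container EKS job."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "eksProperties": {
            "podProperties": {
                "hostNetwork": True,
                "containers": [
                    {
                        "image": image,
                        "command": list(command),
                        # Limits only; the pod in these notes shows requests equal to limits.
                        "resources": {"limits": {"cpu": cpu, "memory": memory}},
                    }
                ],
            }
        },
    }


job_definition = build_job_definition(
    name="MyJobOnEks_Sleep",
    image="public.ecr.aws/amazonlinux/amazonlinux:2",
    command=["sleep", "60"],
)
print(json.dumps(job_definition, indent=2))
```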
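Create a simple job and submit it to the job queue. Job scheduling can be controlled in the following ways:
- Priority. Set the priority of the job queue, and a scheduling priority for the job
- Scheduling policy. If no scheduling policy is specified when the job queue is created, the job scheduler defaults to a first-in, first-out (FIFO) strategy
- Fair-share scheduling. Tag jobs with a share identifier; the scheduler favors jobs whose share identifier has the lowest usage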
aws batch submit-job --job-queue My-eks-JQ1 \
--job-definition MyJobOnEks_Sleep --job-name My-eks-Job1
{
"jobArn": "arn:aws-cn:batch:cn-north-1:xxxxxxxxxxxx:job/fe10768a-a3b5-4596-93f1-b48083332e73",
"jobName": "My-eks-Job1",
"jobId": "fe10768a-a3b5-4596-93f1-b48083332e73"
}
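To illustrate the command override and the scheduling options described earlier, here is a sketch of a submit-job request built in Python. The share identifier and priority only make sense on a queue that has a fair-share scheduling policy, which the queue in these notes does not; the values in the second example are made up and the helper name is hypothetical.

```python
import json


def build_submit_job(job_name, job_queue, job_definition, command=None,
                     share_identifier=None, scheduling_priority=None):
    """Build a submit-job request, optionally overriding the container command."""
    request = {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
    }
    if command is not None:
        request["eksPropertiesOverride"] = {
            "podProperties": {"containers": [{"command": list(command)}]}
        }
    # Only meaningful when the queue has a fair-share scheduling policy.
    if share_identifier is not None:
        request["shareIdentifier"] = share_identifier
    if scheduling_priority is not None:
        request["schedulingPriorityOverride"] = scheduling_priority
    return request


# Same job as above, but sleeping 120 seconds instead of 60.
plain = build_submit_job("My-eks-Job1", "My-eks-JQ1", "MyJobOnEks_Sleep",
                         command=["sleep", "120"])
# Hypothetical fair-share submission.
shared = build_submit_job("My-eks-Job2", "My-eks-JQ1", "MyJobOnEks_Sleep",
                          share_identifier="teamA", scheduling_priority=5)
print(json.dumps(plain, indent=2))
print(json.dumps(shared, indent=2))
```

Either request can be saved to a file and passed to aws batch submit-job with --cli-input-json.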
aws batch describe-jobs --jobs fe10768a-a3b5-4596-93f1-b48083332e73
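The JSON of the submitted job can be viewed in the console.
After submission a new m5.large instance starts, using the EKS-optimized AMI, with the following user data added to its configuration: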
#!/bin/bash
set -ex
if [ -f /etc/aws-batch/batch.config ]; then
while read line; do
[ $(expr "$line" : "^[A-Za-z_][0-9A-Za-z_]*=.*") -gt 0 ] && eval export $line
done < /etc/aws-batch/batch.config
fi
[ -z "$AWS_BATCH_KUBELET_EXTRA_ARGS" ] && AWS_BATCH_KUBELET_EXTRA_ARGS=""
/etc/eks/bootstrap.sh worklearn \
--kubelet-extra-args ' '"$AWS_BATCH_KUBELET_EXTRA_ARGS"' ... '
The node failed to join the cluster, with the error "Failed to contact API server when waiting for CSINode publishing: Unauthorized":
kubelet_node_status.go:70] "Attempting to register node" node="ip-192-168-30-56.cn-north-1.compute.internal"
kubelet.go:2469] "Error getting node" err="node \"ip-192-168-30-56.cn-north-1.compute.internal\" not found"
kubelet_node_status.go:92] "Unable to register node with API server" err="Unauthorized" node="ip-192-168-30-56.cn-north-1.compute.internal"
kubelet.go:2469] "Error getting node" err="node \"ip-192-168-30-56.cn-north-1.compute.internal\" not found"
csi_plugin.go:1063] Failed to contact API server when waiting for CSINode publishing: Unauthorized
The cause turned out to be that the node role had not been added to the aws-auth ConfigMap of the EKS cluster. Add it as follows:
- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws-cn:iam::xxxxxx:role/myEKSNodeRole
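To summarize how Batch launches nodes: Batch first uses a dry run in the configured subnets to confirm that an instance can launch, then changes the desired count of the ASG to start the nodes. A misconfiguration shows up in CloudTrail as an error like the following, which is what the dry run is expected to surface: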
An error occurred (InvalidParameter) when calling the RunInstances operation: Security group sg-0b1e6f21a1a04d078 and subnet subnet-027025e9d9760acdd belong to different networks.
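After the fix, the compute environment started and the job ran successfully.
The node configuration is shown below. The node uses taints to repel other pods.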
apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 192.168.15.116
    csi.volume.kubernetes.io/nodeid: '{"efs.csi.aws.com":"i-xxxxxxxx"}'
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  labels:
    batch.amazonaws.com/compute-environment-revision: "4"
    batch.amazonaws.com/compute-environment-uuid: 6c63cab8-8b00-3021-bb3d-fb990cef9c60
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: m5.xlarge
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: cn-north-1
    failure-domain.beta.kubernetes.io/zone: cn-north-1a
    k8s.io/cloud-provider-aws: f48c3b996b9bce33df562d04d847dfaf
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: ip-192-168-15-116.cn-north-1.compute.internal
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: m5.xlarge
    topology.kubernetes.io/region: cn-north-1
    topology.kubernetes.io/zone: cn-north-1a
  name: ip-192-168-15-116.cn-north-1.compute.internal
  resourceVersion: "34308242"
  uid: ffc8beb4-f326-4135-a471-e0b1d9511012
spec:
  providerID: aws:///cn-north-1a/i-xxxxxxx
  taints:
  - effect: NoSchedule
    key: batch.amazonaws.com/batch-node
  - effect: NoExecute
    key: batch.amazonaws.com/batch-node
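The pod configuration is shown below. Labels identify the compute environment and the job ID, and the job pod carries taint tolerations and Batch environment variables. The default network mode is hostNetwork=true with dnsPolicy=ClusterFirstWithHostNet, although the pod captured here reports dnsPolicy: ClusterFirst.
The pod is removed as soon as the job finishes. A CloudWatch agent can be configured to collect the logs (when using the Fluent Bit component, the taint tolerations must be added to it).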
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  labels:
    batch.amazonaws.com/compute-environment-uuid: 6c63cab8-8b00-3021-bb3d-fb990cef9c60
    batch.amazonaws.com/job-id: a5b695fc-8847-4a32-bfb5-99c6cf66c1df
    batch.amazonaws.com/node-uid: ffc8beb4-f326-4135-a471-e0b1d9511012
  name: aws-batch.b08aaab0-59e6-39b7-ada4-bbae690412b2
  namespace: my-aws-batch-namespace
spec:
  containers:
  - command:
    - sleep
    - "60"
    env:
    - name: AWS_BATCH_JOB_KUBERNETES_NODE_UID
      value: ffc8beb4-f326-4135-a471-e0b1d9511012
    - name: AWS_BATCH_JOB_ID
      value: a5b695fc-8847-4a32-bfb5-99c6cf66c1df
    - name: AWS_BATCH_JQ_NAME
      value: My-eks-JQ1
    - name: AWS_BATCH_JOB_ATTEMPT
      value: "1"
    - name: AWS_BATCH_CE_NAME
      value: My-eks-CE1
    image: public.ecr.aws/amazonlinux/amazonlinux:2
    imagePullPolicy: IfNotPresent
    name: default
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1"
        memory: 1Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xddn2
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  nodeName: ip-192-168-15-116.cn-north-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: batch.amazonaws.com/batch-node
    operator: Exists
  - effect: NoExecute
    key: batch.amazonaws.com/batch-node
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-xddn2
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
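The interplay between the node taints and the pod tolerations above can be sketched in a few lines of Python. This is a simplified model of the Kubernetes matching rules (it ignores tolerationSeconds and treats only NoSchedule and NoExecute taints as blocking), not the scheduler's actual code; it shows why the Batch job pod can land on the node while a pod with only the default tolerations, such as an unmodified Fluent Bit DaemonSet pod, cannot.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified check of whether one toleration matches one taint."""
    # An empty toleration effect matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with Exists matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint.get("key")
    return (toleration.get("key") == taint.get("key")
            and toleration.get("value", "") == taint.get("value", ""))


def can_run_on(tolerations: list, taints: list) -> bool:
    """True when every blocking taint is matched by at least one toleration."""
    blocking = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)


BATCH_KEY = "batch.amazonaws.com/batch-node"
node_taints = [
    {"effect": "NoSchedule", "key": BATCH_KEY},
    {"effect": "NoExecute", "key": BATCH_KEY},
]
default_tolerations = [
    {"effect": "NoExecute", "key": "node.kubernetes.io/not-ready", "operator": "Exists"},
    {"effect": "NoExecute", "key": "node.kubernetes.io/unreachable", "operator": "Exists"},
]
batch_pod_tolerations = [
    {"effect": "NoSchedule", "key": BATCH_KEY, "operator": "Exists"},
    {"effect": "NoExecute", "key": BATCH_KEY, "operator": "Exists"},
] + default_tolerations

print("batch job pod:", can_run_on(batch_pod_tolerations, node_taints))
print("ordinary pod:", can_run_on(default_tolerations, node_taints))
```

A log-collecting DaemonSet therefore needs the two batch.amazonaws.com/batch-node tolerations added before it can run on Batch-managed nodes.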