References
- Getting started with AWS App Mesh and Amazon EC2
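An earlier article introduced App Mesh, the AWS service mesh product, and walked through deploying it on EKS with some basic functional tests. Because the EKS environment is fairly complex, this article configures an App Mesh environment by hand on EC2.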
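Requirements:
- Two services, A and B, are registered in the mesh namespace
- A communicates with B over port 80
- B is updated to version Bv2; A sends 75% of its traffic to B and 25% to Bv2
- No changes to application code and no re-registration through service discovery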
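Prerequisites:
- The mesh is only an abstraction over the network; real services must exist behind it
- The real services are named serviceA, serviceB, and serviceBv2, and all of them are discoverable in a namespace named apps.local
- For an EC2 instance to join the mesh, the Envoy proxy must run on the instance
- The instance itself needs permission to communicate with the App Mesh service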
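Configure App Mesh
The creation order in the official documentation is not very logical, so I reorganized it. The final structure is shown in the figure below.
Create the mesh
Create the mesh with the default settings: egress traffic to destinations outside the mesh is denied, and the default IP preference is used.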
Proxies will not forward traffic to external services that are not defined in the mesh.
Default IP version preference
- Envoy’s DNS resolver will prefer IPv6 and fall back to IPv4
- The IPv4 address returned by Cloud Map will be used if available and fall back to using the IPv6 address
- The endpoint created for the local app will use an IPv4 address
- Envoy will bind to all IPv4 addresses
$ aws appmesh create-mesh --mesh-name apps
Create the backend virtual service B
Create the virtual service
$ aws appmesh create-virtual-service --mesh-name apps --virtual-service-name serviceb.apps.local --spec {}
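Create the virtual node with the DNS name serviceB.apps.local, listening for HTTP on port 8000 (the official guide uses 80; the reason for the change is given later), with no backends configured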
$ aws appmesh create-virtual-node --cli-input-json file://create-virtual-node-serviceb.json
// create-virtual-node-serviceb.json
{
"meshName": "apps",
"spec": {
"listeners": [
{
"portMapping": {
"port" : 8000,
"protocol": "http"
}
}
],
"serviceDiscovery": {
"dns": {
"hostname": "serviceB.apps.local"
}
}
},
"virtualNodeName": "serviceB"
}
Create the virtual router
$ aws appmesh create-virtual-router --cli-input-json file://create-virtual-router.json
// create-virtual-router.json
{
"meshName": "apps",
"spec": {
"listeners": [
{
"portMapping": {
"port" : 8000,
"protocol": "http"
}
}
]
},
"virtualRouterName": "serviceB"
}
Create a route on the serviceB virtual router
$ aws appmesh create-route --cli-input-json file://create-route.json
// create-route.json
{
"meshName" : "apps",
"routeName" : "serviceB",
"spec" : {
"httpRoute" : {
"action" : {
"weightedTargets" : [
{
"virtualNode" : "serviceB",
"weight" : 100
}
]
},
"match" : {
"prefix" : "/"
}
}
},
"virtualRouterName" : "serviceB"
}
Update virtual service B so that the virtual router becomes its provider
$ aws appmesh update-virtual-service --cli-input-json file://update-virtual-service.json
// update-virtual-service.json
{
"meshName" : "apps",
"spec" : {
"provider" : {
"virtualRouter" : {
"virtualRouterName" : "serviceB"
}
}
},
"virtualServiceName" : "serviceb.apps.local"
}
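After the binding, the virtual service page in the console shows that the virtual service points to the virtual router.
Create the frontend virtual service A
In the same way, create the virtual node for serviceA and specify virtual service B as its backend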
$ aws appmesh create-virtual-node --cli-input-json file://create-virtual-node-servicea.json
// create-virtual-node-servicea.json
{
"meshName" : "apps",
"spec" : {
"backends" : [
{
"virtualService" : {
"virtualServiceName" : "serviceb.apps.local"
}
}
],
"listeners" : [
{
"portMapping" : {
"port" : 8000,
"protocol" : "http"
}
}
],
"serviceDiscovery" : {
"dns" : {
"hostname" : "servicea.apps.local"
}
}
},
"virtualNodeName" : "serviceA"
}
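The virtual node input files in this walkthrough differ only in the node name, the DNS hostname, and the optional backends. As a rough illustration, the Python sketch below builds the same document programmatically so the three files stay consistent. The helper name `virtual_node_input` is hypothetical and not part of the AWS CLI or any SDK; the field names are copied from the JSON files in this article.

```python
import json


def virtual_node_input(mesh, name, hostname, port=8000, backends=()):
    """Build a document shaped like the create-virtual-node JSON files above."""
    spec = {
        "listeners": [{"portMapping": {"port": port, "protocol": "http"}}],
        "serviceDiscovery": {"dns": {"hostname": hostname}},
    }
    # Only frontends such as serviceA declare backends; serviceB has none.
    if backends:
        spec["backends"] = [
            {"virtualService": {"virtualServiceName": b}} for b in backends
        ]
    return {"meshName": mesh, "spec": spec, "virtualNodeName": name}


if __name__ == "__main__":
    doc = virtual_node_input(
        "apps", "serviceA", "servicea.apps.local",
        backends=["serviceb.apps.local"],
    )
    # Write this output to create-virtual-node-servicea.json if desired.
    print(json.dumps(doc, indent=2))
```

Generating the files this way also guarantees syntactically valid JSON, since `json.dumps` cannot emit a malformed number or a missing comma.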
Create virtual service A
$ aws appmesh create-virtual-service --cli-input-json file://create-virtual-servicea.json
// create-virtual-servicea.json
{
"meshName" : "apps",
"spec" : {
"provider" : {
"virtualNode" : {
"virtualNodeName" : "serviceA"
}
}
},
"virtualServiceName" : "servicea.apps.local"
}
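In the console, virtual service A points to the virtual node.
The backend of virtual node A is virtual service B.
Configure the EC2 instances
The goal is for the mesh to take over the instance's network traffic, which requires some configuration on the EC2 instance.
Configure permissions
The instance role needs the permissions shown below. Alternatively, attach the managed policies arn:aws-cn:iam::aws:policy/AWSAppMeshEnvoyAccess and arn:aws-cn:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly.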
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"appmesh:StreamAggregatedResources"
],
"Resource": "*"
}
]
}
Install Docker and the AWS CLI
Start the Envoy container
Only version v1.9.0.0-prod or later is supported for use with App Mesh.
Look up the image in the ECR Public Gallery: https://gallery.ecr.aws/appmesh/aws-appmesh-envoy
sudo docker run --rm --env APPMESH_RESOURCE_ARN=mesh/apps/virtualNode/serviceB -u 1337 --network host public.ecr.aws/appmesh/aws-appmesh-envoy:v1.23.1.0-prod
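The same command is run again later for serviceA and serviceBv2, with only the virtual node name (and in one case the UID) changed. A minimal Python sketch that assembles the argument list is shown here; `envoy_command` is a hypothetical helper for illustration, and the image tag and flags are simply those used in this article.

```python
import shlex

# Image tag used throughout this article.
ENVOY_IMAGE = "public.ecr.aws/appmesh/aws-appmesh-envoy:v1.23.1.0-prod"


def envoy_command(mesh, virtual_node, uid=1337):
    """Return the docker run argument list for one virtual node's Envoy."""
    resource = f"mesh/{mesh}/virtualNode/{virtual_node}"
    return [
        "sudo", "docker", "run", "--rm",
        "--env", f"APPMESH_RESOURCE_ARN={resource}",
        "-u", str(uid),          # must agree with APPMESH_IGNORE_UID in the script
        "--network", "host",
        ENVOY_IMAGE,
    ]


if __name__ == "__main__":
    print(shlex.join(envoy_command("apps", "serviceB")))
```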
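Configure iptables to route the application's traffic to the Envoy proxy, using the script from
https://docs.amazonaws.cn/app-mesh/latest/userguide/getting-started-ec2.html#update-services
You can change the value of APPMESH_IGNORE_UID, but it must be the same as the UID specified in the previous step (the -u argument to docker run).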
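Configure the proxy to take over application traffic
As we learned from configuring EKS, some setup is needed to hook Envoy into the application's traffic; on EKS this is done through sidecar injection.
The official documentation provides a script that performs this logic
https://docs.aws.amazon.com/zh_cn/app-mesh/latest/userguide/getting-started-ec2.html#update-services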
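- Make sure APPMESH_IGNORE_UID matches the UID used to start Envoy earlier
- The application runs on port 8000
- Envoy's ingress and egress ports are 15000 and 15001 respectively
- Traffic to the EC2 instance metadata address is excluded from redirection
- Under the hood, the traffic control is implemented by modifying iptables rules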
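We change the application's port to 8000 and start a simple web service.
Note: once the proxy script is enabled, the instance can no longer reach the Docker registry to pull images, so the proxy has to be disabled temporarily before pulling.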
docker run -d -p 8000:80 public.ecr.aws/nginx/nginx:1-alpine-perl
# docker run -d -p 8000:80 public.ecr.aws/amazonlinux/amazonlinux:2023 sleep infinity
Run the script and choose enable
sudo ./envoy-networking.sh
These are the newly added iptables rules; inbound access to TCP port 8000 is redirected to Envoy's port 15000
:APPMESH_EGRESS - [0:0]
:APPMESH_INGRESS - [0:0]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A PREROUTING -p tcp -m addrtype ! --src-type LOCAL -j APPMESH_INGRESS
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -p tcp -m addrtype ! --dst-type LOCAL -j APPMESH_EGRESS
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A APPMESH_EGRESS -m owner --uid-owner 1337 -j RETURN
-A APPMESH_EGRESS -p tcp -m multiport --dports 22 -j RETURN
-A APPMESH_EGRESS -d 169.254.169.254/32 -p tcp -j RETURN
-A APPMESH_EGRESS -d 169.254.170.2/32 -p tcp -j RETURN
-A APPMESH_EGRESS -p tcp -j REDIRECT --to-ports 15001
-A APPMESH_INGRESS -p tcp -m multiport --dports 8000 -j REDIRECT --to-ports 15000
-A DOCKER -i docker0 -j RETURN
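The two APPMESH chains in the listing can be read as a short, ordered decision procedure. The Python sketch below models only those two chains, using the values that appear in the rules; it does not reproduce the kernel's full NAT traversal or the DOCKER chain, and the function names are hypothetical.

```python
import ipaddress

ENVOY_UID = 1337                    # --uid-owner in APPMESH_EGRESS
EGRESS_PORT = 15001                 # Envoy egress listener
INGRESS_PORT = 15000                # Envoy ingress listener
APP_PORTS = {8000}                  # --dports in APPMESH_INGRESS
IGNORED_EGRESS_PORTS = {22}
IGNORED_EGRESS_IPS = {
    ipaddress.ip_address("169.254.169.254"),   # EC2 instance metadata
    ipaddress.ip_address("169.254.170.2"),
}


def egress_decision(uid, dst_ip, dport):
    """Walk APPMESH_EGRESS in rule order for a TCP packet to a non-local address."""
    if uid == ENVOY_UID:
        return "RETURN"             # Envoy's own traffic is never redirected
    if dport in IGNORED_EGRESS_PORTS:
        return "RETURN"
    if ipaddress.ip_address(dst_ip) in IGNORED_EGRESS_IPS:
        return "RETURN"
    return f"REDIRECT:{EGRESS_PORT}"


def ingress_decision(dport):
    """Walk APPMESH_INGRESS for a TCP packet arriving from a non-local source."""
    if dport in APP_PORTS:
        return f"REDIRECT:{INGRESS_PORT}"
    return "RETURN"                 # no rule matched; back to the calling chain


if __name__ == "__main__":
    print(egress_decision(1000, "172.31.21.190", 8000))
    print(egress_decision(1337, "172.31.21.190", 8000))
    print(ingress_decision(8000))
```

In this model a process running under any UID other than 1337 is redirected to port 15001, which illustrates why the UID passed to docker run and APPMESH_IGNORE_UID need to agree.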
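If an error occurs when quitting, run the cleanup logic manually and delete the iptables rules.
Once the script is enabled, the Envoy proxy takes over the traffic; a request to nginx gets no response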
$ curl -v 127.0.0.1:8000
* Trying 127.0.0.1:8000...
* Connected to 127.0.0.1 (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: 127.0.0.1:8000
> User-Agent: curl/7.87.0
> Accept: */*
>
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server
Because the containers use host network mode, two Envoy containers cannot run on the same instance at the same time
[2023-02-25 10:48:16.058][17][debug][init] [source/common/init/watcher_impl.cc:31] init manager Server destroyed
unable to bind domain socket with base_id=0, id=0, errno=98 (see --base-id option)
[2023-02-25 10:48:16.086][1][warning] [AppNet Agent] [Envoy process 17] Exited with code [1]
[2023-02-25 10:48:16.086][1][warning] [AppNet Agent] [Envoy process 17] Additional Exit data: [Core Dump: false][Normal Exit: true][Process Signalled: false]
We use this nginx as backend service B, then follow the same procedure to launch another instance and create service A
sudo docker run --rm --env APPMESH_RESOURCE_ARN=mesh/apps/virtualNode/serviceA -u 1338 --network host public.ecr.aws/appmesh/aws-appmesh-envoy:v1.23.1.0-prod
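Because the backend has to be reached by its service name, we use Cloud Map to create a private DNS namespace. (I suspect EKS resolves services the same way, since the service account is granted Cloud Map permissions, yet no records actually showed up there, which is odd.)
Route 53 hosted zones created by Cloud Map can only be managed through Cloud Map.
Resolve serviceb.apps.local on the instance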
$ nslookup serviceb.apps.local
Server: 172.31.0.2
Address: 172.31.0.2#53
Non-authoritative answer:
Name: serviceb.apps.local
Address: 172.31.21.190
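From the serviceA container, send a request to serviceB. Note: the script here is set up for port 8000. Because port 80 is subject to ICP filing requirements in China, every port in the earlier mesh configuration had to be changed to 8000. On a private network you can instead change the script's port to 80 and leave the mesh configuration alone.
If the port is wrong, the connection is reset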
curl -Iv serviceb.apps.local:8000
* Trying 172.31.21.190:8000...
* Connected to serviceb.apps.local (172.31.21.190) port 8000 (#0)
> HEAD / HTTP/1.1
> Host: serviceb.apps.local:8000
> User-Agent: curl/7.87.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
After the change, the request returns 503, but the server header is now envoy
curl -I serviceb.apps.local:8000
HTTP/1.1 503 Service Unavailable
content-length: 95
content-type: text/plain
date: Sat, 25 Feb 2023 12:36:17 GMT
server: envoy
Strangely, after the iptables rules on serviceA are deleted, that is, once the serviceA node is no longer behind the proxy, the request succeeds
$ curl -I serviceb.apps.local:8000
HTTP/1.1 200 OK
date: Sat, 25 Feb 2023 12:45:14 GMT
content-length: 11
content-type: text/plain; charset=utf-8
x-envoy-upstream-service-time: 0
server: envoy
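It also works from any arbitrary machine, which is counterintuitive, since traffic should be confined to the mesh by default. I am stuck at this point and will skip it for now.
Configure App Mesh to shift traffic
Create the backend virtual service Bv2
Create the virtual node for serviceBv2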
$ aws appmesh create-virtual-node --cli-input-json file://create-virtual-node-servicebv2.json
// create-virtual-node-servicebv2.json
{
"meshName": "apps",
"spec": {
"listeners": [
{
"portMapping": {
"port" : 8000,
"protocol": "http"
}
}
],
"serviceDiscovery": {
"dns": {
"hostname": "serviceBv2.apps.local"
}
}
},
"virtualNodeName": "serviceBv2"
}
Update the route
$ aws appmesh update-route --cli-input-json file://update-route.json
// update-route.json
{
"meshName" : "apps",
"routeName" : "serviceB",
"spec" : {
"httpRoute" : {
"action" : {
"weightedTargets" : [
{
"virtualNode" : "serviceB",
"weight" : 75
},
{
"virtualNode" : "serviceBv2",
"weight" : 25
}
]
},
"match" : {
"prefix" : "/"
}
}
},
"virtualRouterName" : "serviceB"
}
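The weights in update-route.json are relative: each target's share of traffic is its weight divided by the sum of all weights. The Python sketch below computes those shares and runs a small random simulation to show what a 75/25 split looks like over many requests. It is only an illustration of proportional weighting, not a reproduction of Envoy's actual selection logic, and the function names are hypothetical.

```python
import random
from collections import Counter

# Weighted targets copied from update-route.json above.
WEIGHTED_TARGETS = [
    {"virtualNode": "serviceB", "weight": 75},
    {"virtualNode": "serviceBv2", "weight": 25},
]


def expected_shares(targets):
    """Return each virtual node's share of traffic as weight / total weight."""
    total = sum(t["weight"] for t in targets)
    return {t["virtualNode"]: t["weight"] / total for t in targets}


def simulate(targets, requests, seed=0):
    """Pick one target per request with probability proportional to its weight."""
    rng = random.Random(seed)
    names = [t["virtualNode"] for t in targets]
    weights = [t["weight"] for t in targets]
    return Counter(rng.choices(names, weights=weights, k=requests))


if __name__ == "__main__":
    print(expected_shares(WEIGHTED_TARGETS))
    print(simulate(WEIGHTED_TARGETS, 10_000))
```

Over a small number of requests the observed split can deviate noticeably from 75/25, so a handful of curl calls is not enough to judge whether the route is working.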
Register a new instance under serviceb in Cloud Map
Launch an instance and set up the Envoy proxy and the application
sudo docker run --rm --env APPMESH_RESOURCE_ARN=mesh/apps/virtualNode/serviceBv2 -u 1337 --network host public.ecr.aws/appmesh/aws-appmesh-envoy:v1.23.1.0-prod
docker run -d -p 8000:80 public.ecr.aws/nginx/nginx:1-alpine-perl
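In summary, App Mesh is more practical on Kubernetes; there is hardly any need for it in an EC2 environment.
Troubleshooting clues
When mesh configuration is changed in the console, the change is pushed to Envoy, and log lines similar to the following appear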
[2023-02-25 12:28:52.381][17][info][upstream] [source/common/upstream/cds_api_helper.cc:35] cds: add 2 cluster(s), remove 1 cluster(s)
[2023-02-25 12:28:52.410][17][info][upstream] [source/common/upstream/cds_api_helper.cc:72] cds: added/updated 1 cluster(s), skipped 1 unmodified cluster(s)
[2023-02-25 12:28:52.419][17][info][upstream] [source/server/lds_api.cc:82] lds: add/update listener 'lds_ingress_0.0.0.0_15000'
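When scanning a long Envoy log for these configuration pushes, it can help to filter for the cds and lds messages. The Python sketch below parses lines in the format shown above; the regular expression is inferred from these sample lines only, so treat it as an assumption rather than a description of Envoy's log format, and the helper names are hypothetical.

```python
import re

# Pattern inferred from the sample lines above:
# [time][thread][level][component] [source:line] message
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\[(?P<thread>\d+)\]\[(?P<level>\w+)\]"
    r"\[(?P<component>\w+)\] \[(?P<source>[^\]]+)\] (?P<message>.*)$"
)


def parse(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def is_config_push(entry):
    """True for cluster (cds) or listener (lds) updates pushed from App Mesh."""
    return entry is not None and entry["message"].startswith(("cds:", "lds:"))


if __name__ == "__main__":
    sample = (
        "[2023-02-25 12:28:52.419][17][info][upstream] "
        "[source/server/lds_api.cc:82] lds: add/update listener "
        "'lds_ingress_0.0.0.0_15000'"
    )
    entry = parse(sample)
    print(is_config_push(entry), entry["message"])
```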