Building AI Large-Model Training Infrastructure and Developing AI Application Services Securely and Compliantly on Amazon Web Services

2024/9/23 19:14:31

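Project overview:

Xiao Li Ge (小李哥) continues this series of daily introductions to cutting-edge AI solutions built on Amazon Web Services (AWS), to help readers quickly learn AWS AI best practices and apply them in their everyday work.

This installment shows how to use AWS Service Catalog to create and manage application products that include large AI models, and how to restrict the cloud resources each employee can access according to their identity and role. It then creates the Amazon SageMaker managed machine learning service, trains and deploys a large model on it, and loads model files and model container images privately and securely through VPC endpoints. The design is built entirely on a cloud-native serverless architecture to provide a scalable and secure AI solution. The solution architecture diagram is shown below: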

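Background knowledge for this solution

What is Amazon SageMaker?

Amazon SageMaker is the end-to-end machine learning service from AWS, designed to help developers and data scientists build, train, and deploy machine learning models. SageMaker provides tooling for the whole workflow, from data preparation through model training to model deployment, so that machine learning projects can be run efficiently in the cloud.

What is AWS Service Catalog?

AWS Service Catalog is a service that helps organizations create, manage, and distribute collections of approved cloud services. With Service Catalog, an organization can centrally manage approved resources and configurations, so that development teams follow the organization's best practices and compliance requirements when they use cloud services. Users pick what they need from a predefined product catalog, which simplifies resource deployment and reduces the risk caused by misconfiguration.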

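Security and compliance benefits of building AI services with SageMaker

Meeting enterprise compliance requirements

When building AI services with SageMaker, you can use Service Catalog to define and manage configuration templates in advance that meet your company's compliance standards, so that every AI model and resource deployment follows the organization's security policies and industry regulations such as GDPR or HIPAA.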
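
One way to enforce such a standard is a Service Catalog template constraint, which limits the parameter values a user may choose at launch time. The sketch below is only an illustration: it assumes the product's template exposes a parameter named `NotebookInstanceType`, and the rule name `AllowedNotebookInstanceTypes` and the helper names are made up for this example. `client` is expected to be a boto3 `servicecatalog` client.

```python
import json


def build_instance_type_constraint(allowed_types, parameter_name="NotebookInstanceType"):
    """Return the JSON rules string for a Service Catalog TEMPLATE constraint.

    The rule asserts that the template parameter is one of the allowed values.
    """
    rules = {
        "Rules": {
            # Hypothetical rule name chosen for this example
            "AllowedNotebookInstanceTypes": {
                "Assertions": [
                    {
                        "Assert": {
                            "Fn::Contains": [list(allowed_types), {"Ref": parameter_name}]
                        },
                        "AssertDescription": "Instance type must be one of: "
                        + ", ".join(allowed_types),
                    }
                ]
            }
        }
    }
    return json.dumps(rules)


def apply_constraint(client, portfolio_id, product_id, allowed_types):
    """Attach the constraint to a product inside a portfolio.

    `client` is assumed to be boto3.client("servicecatalog").
    """
    return client.create_constraint(
        PortfolioId=portfolio_id,
        ProductId=product_id,
        Type="TEMPLATE",
        Parameters=build_instance_type_constraint(allowed_types),
    )
```

Keeping the rule builder separate from the API call makes it easy to review the generated rules before anything is applied to a real portfolio.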

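Data security

SageMaker provides end-to-end data encryption options, covering data at rest and in transit, to protect sensitive data throughout the AI model lifecycle. VPC endpoints can also be used to access data in S3 privately and securely, and to load the AI model container images stored in ECR.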
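
As a rough sketch of the private S3 access described above, the following builds the arguments for an S3 gateway VPC endpoint and passes them to the EC2 `create_vpc_endpoint` API. The function names, the example bucket, and the endpoint policy are assumptions made for illustration. The policy only covers the model bucket; pulling ECR images privately would also need ECR interface endpoints and additional S3 permissions, so treat the policy as a placeholder.

```python
import json


def build_s3_endpoint_request(vpc_id, route_table_ids, region, bucket_name):
    """Build keyword arguments for ec2.create_vpc_endpoint (S3 gateway endpoint)."""
    # Illustrative endpoint policy: read-only access to a single model bucket
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
        "PolicyDocument": json.dumps(policy),
    }


def create_s3_endpoint(request):
    """Create the endpoint; needs AWS credentials and the boto3 package."""
    import boto3  # imported lazily so the builder works without AWS access

    ec2 = boto3.client("ec2")
    return ec2.create_vpc_endpoint(**request)
```

With the gateway endpoint attached to the subnet's route table, a notebook that has direct internet access disabled can still reach the bucket over the AWS network.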

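Access control and monitoring

Through integration with AWS Identity and Access Management (IAM), you can control at a fine-grained level who may access and operate resources in SageMaker. Combined with monitoring tools such as CloudTrail and CloudWatch, an organization can track and audit operations, which supports transparency and security.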
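
To make the auditing side concrete, here is a small sketch that asks CloudTrail for recent SageMaker management events and reduces them to (event name, user name) pairs. The helper names are invented for this example, and it assumes the caller is allowed to call `cloudtrail:LookupEvents`.

```python
from datetime import datetime, timedelta, timezone


def build_lookup_request(hours_back, now=None):
    """Build keyword arguments for cloudtrail.lookup_events for SageMaker events."""
    now = now or datetime.now(timezone.utc)
    return {
        # lookup_events accepts a single lookup attribute per call
        "LookupAttributes": [
            {"AttributeKey": "EventSource", "AttributeValue": "sagemaker.amazonaws.com"}
        ],
        "StartTime": now - timedelta(hours=hours_back),
        "EndTime": now,
        "MaxResults": 50,
    }


def summarize_events(response):
    """Reduce a lookup_events response to (event name, user name) pairs."""
    return [
        (event.get("EventName"), event.get("Username"))
        for event in response.get("Events", [])
    ]


def recent_sagemaker_activity(hours_back=24):
    """Query CloudTrail; needs AWS credentials and the boto3 package."""
    import boto3  # imported lazily so the helpers above work offline

    cloudtrail = boto3.client("cloudtrail")
    return summarize_events(cloudtrail.lookup_events(**build_lookup_request(hours_back)))
```

Calling `recent_sagemaker_activity(24)` in an account with SageMaker usage would list who created, stopped, or deleted notebook instances during the last day.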

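What this solution covers

1. Privately access the model files in S3 through a VPC endpoint.

2. Create an AWS Service Catalog portfolio to create and manage users' cloud service products in one place.

3. As a Service Catalog end user, create a SageMaker machine learning training compute instance.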
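
The Service Catalog part of this list can also be scripted. The sketch below is not the procedure this article follows in the console; it only shows the same idea with boto3. The portfolio, product, and provider names are placeholders, `template_url` is assumed to point at a CloudFormation template in S3, and `client` is assumed to be a boto3 `servicecatalog` client.

```python
def publish_notebook_product(client, template_url, principal_arn):
    """Create a portfolio and a product, link them, and grant a principal access.

    `client` is assumed to be boto3.client("servicecatalog").
    Returns (portfolio_id, product_id).
    """
    portfolio = client.create_portfolio(
        DisplayName="sagemaker-portfolio",  # placeholder name
        ProviderName="ml-platform-team",    # placeholder provider
    )
    portfolio_id = portfolio["PortfolioDetail"]["Id"]

    product = client.create_product(
        Name="sagemaker-notebook",          # placeholder name
        Owner="ml-platform-team",
        ProductType="CLOUD_FORMATION_TEMPLATE",
        ProvisioningArtifactParameters={
            "Name": "v1",
            "Info": {"LoadTemplateFromURL": template_url},
            "Type": "CLOUD_FORMATION_TEMPLATE",
        },
    )
    product_id = product["ProductViewDetail"]["ProductViewSummary"]["ProductId"]

    # The product only becomes launchable once it belongs to a portfolio
    client.associate_product_with_portfolio(
        ProductId=product_id, PortfolioId=portfolio_id
    )
    # Only principals associated with the portfolio can see and launch it
    client.associate_principal_with_portfolio(
        PortfolioId=portfolio_id, PrincipalARN=principal_arn, PrincipalType="IAM"
    )
    return portfolio_id, product_id
```

Passing the client in as an argument keeps the function easy to exercise with a stub before it is pointed at a real account.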

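Step-by-step setup:

1. Sign in to the AWS console, open the serverless compute service Lambda, and create a Lambda function named "SageMakerBuild". Copy the code below into it; the function creates the SageMaker Jupyter Notebook used to train large AI models.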

import json
import time

import boto3
# `requests` is not included in the AWS Lambda Python runtime; package it with
# the function or attach it as a Lambda layer.
import requests


def CFTFailedResponse(event, status, message):
    """Send a failure response to the CloudFormation pre-signed ResponseURL."""
    print("Inside CFTFailedResponse")
    responseBody = {
        'Status': status,
        'Reason': message,
        'PhysicalResourceId': event['ServiceToken'],
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId']
    }
    # The pre-signed URL expects an empty content-type header
    headers = {
        'content-type': '',
        'content-length': str(len(json.dumps(responseBody)))
    }
    print('Response = ' + json.dumps(responseBody))
    try:
        req = requests.put(event['ResponseURL'], data=json.dumps(responseBody), headers=headers)
        print("CloudFormation response status " + str(req))
    except Exception as e:
        print("Failed to send cf response {}".format(e))


def CFTSuccessResponse(event, status, data=None):
    """Send a success response (with optional data) to the CloudFormation ResponseURL."""
    responseBody = {
        'Status': status,
        'Reason': 'See the details in CloudWatch Log Stream',
        'PhysicalResourceId': event['ServiceToken'],
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
        'Data': data
    }
    headers = {
        'content-type': '',
        'content-length': str(len(json.dumps(responseBody)))
    }
    print('Response = ' + json.dumps(responseBody))
    try:
        req = requests.put(event['ResponseURL'], data=json.dumps(responseBody), headers=headers)
    except Exception as e:
        print("Failed to send cf response {}".format(e))


def lambda_handler(event, context):
    ReqStatus = "SUCCESS"
    print("Event:")
    print(event)
    client = boto3.client('sagemaker')
    ec2client = boto3.client('ec2')
    data = {}

    if event['RequestType'] == 'Create':
        try:
            ## Value Intialization from CFT ##
            project_name = event['ResourceProperties']['ProjectName']
            kmsKeyId = event['ResourceProperties']['KmsKeyId']
            Tags = event['ResourceProperties']['Tags']
            env_name = event['ResourceProperties']['ENVName']
            subnet_name = event['ResourceProperties']['Subnet']
            security_group_name = event['ResourceProperties']['SecurityGroupName']

            input_dict = {}
            input_dict['NotebookInstanceName'] = event['ResourceProperties']['NotebookInstanceName']
            input_dict['InstanceType'] = event['ResourceProperties']['NotebookInstanceType']
            input_dict['Tags'] = event['ResourceProperties']['Tags']
            input_dict['DirectInternetAccess'] = event['ResourceProperties']['DirectInternetAccess']
            input_dict['RootAccess'] = event['ResourceProperties']['RootAccess']
            input_dict['VolumeSizeInGB'] = int(event['ResourceProperties']['VolumeSizeInGB'])
            input_dict['RoleArn'] = event['ResourceProperties']['RoleArn']
            input_dict['LifecycleConfigName'] = event['ResourceProperties']['LifecycleConfigName']

        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Parameter Error: "+str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None
        print("Validating Environment name: "+env_name)
        print("Subnet Id Fetching.....")
        try:
            ## Sagemaker Subnet: look up the subnet by its Name tag ##
            print(subnet_name)
            response = ec2client.describe_subnets(
                Filters=[
                    {
                        'Name': 'tag:Name',
                        'Values': [
                            subnet_name
                        ]
                    },
                ]
            )
            # Raises IndexError (handled below) if no subnet carries this Name tag
            subnetId = response['Subnets'][0]['SubnetId']
            input_dict['SubnetId'] = subnetId
            print("Desc subnet done!!")
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Subnet Error: "+str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None
        ## Sagemaker Security group: look up the security group by its Name tag ##
        print("Security GroupId Fetching.....")
        try:
            response = ec2client.describe_security_groups(
                Filters=[
                    {
                        'Name': 'tag:Name',
                        'Values': [
                            security_group_name
                        ]
                    },
                ]
            )
            # Raises IndexError (handled below) if no security group carries this Name tag
            sgId = response['SecurityGroups'][0]['GroupId']
            input_dict['SecurityGroupIds'] = [sgId]
            print("Desc sg done!!")
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Security Group ID Error: "+str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None
        try:
            if kmsKeyId:
                input_dict['KmsKeyId'] = kmsKeyId
            else:
                print("in else")
                
            print(input_dict)
            instance = client.create_notebook_instance(**input_dict)
            print('SageMaker API response')
            print(str(instance))
            responseData = {'NotebookInstanceArn': instance['NotebookInstanceArn']}
            
            NotebookStatus = 'Pending'
            response = client.describe_notebook_instance(
                NotebookInstanceName=event['ResourceProperties']['NotebookInstanceName']
            )
            NotebookStatus = response['NotebookInstanceStatus']
            print("NotebookStatus:"+NotebookStatus)
            
            ## Notebook Failure ##
            if NotebookStatus == 'Failed':
                message = NotebookStatus+": "+response.get('FailureReason', 'unknown')+" :Notebook is not coming InService"
                CFTFailedResponse(event, "FAILED", message)
            else:
                # Poll until the notebook leaves 'Pending'. The function timeout must
                # cover this wait (a Lambda invocation can run for at most 15 minutes).
                while NotebookStatus == 'Pending':
                    time.sleep(200)
                    response = client.describe_notebook_instance(
                        NotebookInstanceName=event['ResourceProperties']['NotebookInstanceName']
                    )
                    NotebookStatus = response['NotebookInstanceStatus']
                    print("NotebookStatus in loop:"+NotebookStatus)
                
                ## Notebook Success ##
                if NotebookStatus == 'InService':
                    data['Message'] = "SageMaker Notebook name - "+event['ResourceProperties']['NotebookInstanceName']+" created successfully"
                    print("message InService :",data['Message'])
                    CFTSuccessResponse(event, "SUCCESS", data)
                else:
                    # FailureReason is only present when the notebook actually failed
                    message = NotebookStatus+": "+response.get('FailureReason', 'unknown')+" :Notebook is not coming InService"
                    print("message :",message)
                    CFTFailedResponse(event, "FAILED", message)
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            CFTFailedResponse(event, "FAILED", str(e))
    if event['RequestType'] == 'Delete':
        NotebookStatus = None
        lifecycle_config = event['ResourceProperties']['LifecycleConfigName']
        NotebookName = event['ResourceProperties']['NotebookInstanceName']

        try:
            response = client.describe_notebook_instance(
                NotebookInstanceName=NotebookName
            )
            NotebookStatus = response['NotebookInstanceStatus']
            print("Notebook Status - "+NotebookStatus)
        except Exception as e:
            print(e)
            NotebookStatus = "Invalid"
            #CFTFailedResponse(event, "FAILED", str(e))
        while NotebookStatus == 'Pending':
            time.sleep(30)
            response = client.describe_notebook_instance(
                NotebookInstanceName=NotebookName
            )
            NotebookStatus = response['NotebookInstanceStatus']
            print("NotebookStatus:"+NotebookStatus)
        if NotebookStatus != 'Failed' and NotebookStatus != 'Invalid' :
            print("Delete request for Notebook name: "+NotebookName)
            print("Stopping the Notebook.....")
            if NotebookStatus != 'Stopped':
                try:
                    response = client.stop_notebook_instance(
                        NotebookInstanceName=NotebookName
                    )
                    NotebookStatus = 'Stopping'
                    print("Notebook Status - "+NotebookStatus)
                    while NotebookStatus == 'Stopping':
                        time.sleep(30)
                        response = client.describe_notebook_instance(
                            NotebookInstanceName=NotebookName
                        )
                        NotebookStatus = response['NotebookInstanceStatus']
                    print("NotebookStatus:"+NotebookStatus)
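                except Exception as e:
                    print(e)
                    # Report the failure once and stop here; falling through would
                    # send a second (SUCCESS) response for the same request.
                    CFTFailedResponse(event, "FAILED", str(e))
                    return None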
                
            else:
                NotebookStatus = 'Stopped'
                print("NotebookStatus:"+NotebookStatus)
        
        if NotebookStatus != 'Invalid':
            print("Deleting The Notebook......")
            time.sleep(5)
            try:
                response = client.delete_notebook_instance(
                    NotebookInstanceName=NotebookName
                )
                print("Notebook Deleted")
                data["Message"] = "Notebook Deleted"
                CFTSuccessResponse(event, "SUCCESS", data)
            except Exception as e:
                print(e)
                CFTFailedResponse(event, "FAILED", str(e))
            
        else:
            print("Notebook Invalid status")
            data["Message"] = "Notebook is not available"
            CFTSuccessResponse(event, "SUCCESS", data)
    
    if event['RequestType'] == 'Update':
        print("Update operation for Sagemaker Notebook is not recommended")
        data["Message"] = "Update operation for Sagemaker Notebook is not recommended"
        CFTSuccessResponse(event, "SUCCESS", data)
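

The function above relies on the third-party `requests` package, which has to be shipped with the Lambda function. As an alternative sketch (not the code used in this article), the same custom resource response can be sent with Python's standard library only. The helper names below are invented for this example; the response fields mirror the ones used by the function above.

```python
import json
import urllib.request


def build_cfn_response(event, status, reason, data=None):
    """Serialize a CloudFormation custom resource response body to bytes."""
    body = {
        "Status": status,
        "Reason": reason,
        "PhysicalResourceId": event["ServiceToken"],
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }
    return json.dumps(body).encode("utf-8")


def send_cfn_response(event, status, reason, data=None, timeout=10):
    """PUT the response to the pre-signed URL that CloudFormation supplied."""
    body = build_cfn_response(event, status, reason, data)
    request = urllib.request.Request(
        event["ResponseURL"],
        data=body,
        method="PUT",
        # An empty content type, as in the requests-based version above
        headers={"Content-Type": "", "Content-Length": str(len(body))},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A handler would call `send_cfn_response(event, "SUCCESS", "Notebook created", {"Message": "done"})` exactly once per request, since CloudFormation waits for a single response per resource operation.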
        
    
        
		    

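2. Next, create a YAML template, copy the code below into it, and upload it to an S3 bucket. CloudFormation uses this template to create the SageMaker Jupyter Notebook as infrastructure as code (IaC).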

AWSTemplateFormatVersion: 2010-09-09
Description: Template to create a SageMaker notebook
Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: Environment detail
        Parameters:
          - ENVName
      - Label:
          default: SageMaker Notebook configuration
        Parameters:
          - NotebookInstanceName
          - NotebookInstanceType
          - DirectInternetAccess
          - RootAccess
          - VolumeSizeInGB
      - Label:
          default: Load S3 Bucket to SageMaker
        Parameters:
          - S3CodePusher
          - CodeBucketName
      - Label:
          default: Project detail
        Parameters:
          - ProjectName
          - ProjectID
    ParameterLabels:
      DirectInternetAccess:
        default: Default Internet Access
      NotebookInstanceName:
        default: Notebook Instance Name
      NotebookInstanceType:
        default: Notebook Instance Type
      ENVName:
        default: Environment Name
      ProjectName:
        default: Project Suffix
      RootAccess:
        default: Root access
      VolumeSizeInGB:
        default: Volume size for the SageMaker Notebook
      ProjectID:
        default: SageMaker ProjectID
      CodeBucketName:
        default: Code Bucket Name        
      S3CodePusher:
        default: Copy code from S3 to SageMaker
Parameters:
  SubnetName:
    Default: ProSM-ResourceSubnet
    Description: Name tag of the subnet used by the notebook instance
    Type: String
  SecurityGroupName:
    Default: ProSM-ResourceSG
    Description: Security Group Name
    Type: String
  SageMakerBuildFunctionARN:
    Description: Service Token Value passed from Lambda Stack
    Type: String
  NotebookInstanceName:
    AllowedPattern: '[A-Za-z0-9-]{1,63}'
    ConstraintDescription: >-
      Maximum of 63 alphanumeric characters. Can include hyphens (-), but not
      spaces. Must be unique within your account in an AWS Region.
    Description: SageMaker Notebook instance name
    MaxLength: '63'
    MinLength: '1'
    Type: String
  NotebookInstanceType:
    ConstraintDescription: Must select a valid notebook instance type.
    Default: ml.t3.medium
    Description: Select Instance type for the SageMaker Notebook
    Type: String
  ENVName:
    Description: SageMaker infrastructure naming convention
    Type: String
  ProjectName:
    Description: >-
      The suffix appended to all resources in the stack.  This will allow
      multiple copies of the same stack to be created in the same account.
    Type: String
  RootAccess:
    Description: Root access for the SageMaker Notebook user
    AllowedValues:
      - Enabled
      - Disabled
    Default: Enabled
    Type: String
  VolumeSizeInGB:
    Description: >-
      The size, in GB, of the ML storage volume to attach to the notebook
      instance. This template defaults to 20 GB.
    Type: Number
    Default: '20'
  DirectInternetAccess:
    Description: >-
      If you set this to Disabled this notebook instance will be able to access
      resources only in your VPC. As per the Project requirement, we have
      Disabled it.
    Type: String
    Default: Disabled
    AllowedValues:
      - Disabled
    ConstraintDescription: Must be Disabled.
  ProjectID:
    Type: String
    Description: Enter a valid ProjectID.
    Default: QuickStart007
  S3CodePusher:
    Description: Do you want to load the code from S3 to SageMaker Notebook
    Default: 'NO'
    AllowedValues:
      - 'YES'
      - 'NO'
    Type: String
  CodeBucketName:
    Description: S3 Bucket name from which you want to copy the code to SageMaker.
    Default: lab-materials-bucket-1234
    Type: String    
Conditions:
  BucketCondition: !Equals 
    - 'YES'
    - !Ref S3CodePusher
Resources:
  SagemakerKMSKey:
    Type: 'AWS::KMS::Key'
    Properties:
      EnableKeyRotation: true
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
      KeyPolicy:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
          Action: 
            - 'kms:Encrypt'
            - 'kms:PutKeyPolicy' 
            - 'kms:CreateKey' 
            - 'kms:GetKeyRotationStatus' 
            - 'kms:DeleteImportedKeyMaterial' 
            - 'kms:GetKeyPolicy' 
            - 'kms:UpdateCustomKeyStore' 
            - 'kms:GenerateRandom' 
            - 'kms:UpdateAlias'
            - 'kms:ImportKeyMaterial'
            - 'kms:ListRetirableGrants' 
            - 'kms:CreateGrant' 
            - 'kms:DeleteAlias'
            - 'kms:RetireGrant'
            - 'kms:ScheduleKeyDeletion' 
            - 'kms:DisableKeyRotation' 
            - 'kms:TagResource' 
            - 'kms:CreateAlias' 
            - 'kms:EnableKeyRotation' 
            - 'kms:DisableKey'
            - 'kms:ListResourceTags'
            - 'kms:Verify' 
            - 'kms:DeleteCustomKeyStore'
            - 'kms:Sign' 
            - 'kms:ListKeys'
            - 'kms:ListGrants'
            - 'kms:ListAliases' 
            - 'kms:ReEncryptTo' 
            - 'kms:UntagResource' 
            - 'kms:GetParametersForImport'
            - 'kms:ListKeyPolicies'
            - 'kms:GenerateDataKeyPair'
            - 'kms:GenerateDataKeyPairWithoutPlaintext' 
            - 'kms:GetPublicKey' 
            - 'kms:Decrypt' 
            - 'kms:ReEncryptFrom'
            - 'kms:DisconnectCustomKeyStore' 
            - 'kms:DescribeKey'
            - 'kms:GenerateDataKeyWithoutPlaintext'
            - 'kms:DescribeCustomKeyStores' 
            - 'kms:CreateCustomKeyStore'
            - 'kms:EnableKey'
            - 'kms:RevokeGrant'
            - 'kms:UpdateKeyDescription' 
            - 'kms:ConnectCustomKeyStore' 
            - 'kms:CancelKeyDeletion' 
            - 'kms:GenerateDataKey'
          Resource:
            - !Join 
              - ''
              - - 'arn:aws:kms:'
                - !Ref 'AWS::Region'
                - ':'
                - !Ref 'AWS::AccountId'
                - ':key/*'
        - Sid: Allow access for Key Administrators
          Effect: Allow
          Principal:
            AWS: 
              - !GetAtt SageMakerExecutionRole.Arn
          Action:
            - 'kms:CreateAlias'
            - 'kms:CreateKey'
            - 'kms:CreateGrant' 
            - 'kms:CreateCustomKeyStore'
            - 'kms:DescribeKey'
            - 'kms:DescribeCustomKeyStores'
            - 'kms:EnableKey'
            - 'kms:EnableKeyRotation'
            - 'kms:ListKeys'
            - 'kms:ListAliases'
            - 'kms:ListKeyPolicies'
            - 'kms:ListGrants'
            - 'kms:ListRetirableGrants'
            - 'kms:ListResourceTags'
            - 'kms:PutKeyPolicy'
            - 'kms:UpdateAlias'
            - 'kms:UpdateKeyDescription'
            - 'kms:UpdateCustomKeyStore'
            - 'kms:RevokeGrant'
            - 'kms:DisableKey'
            - 'kms:DisableKeyRotation'
            - 'kms:GetPublicKey'
            - 'kms:GetKeyRotationStatus'
            - 'kms:GetKeyPolicy'
            - 'kms:GetParametersForImport'
            - 'kms:DeleteCustomKeyStore'
            - 'kms:DeleteImportedKeyMaterial'
            - 'kms:DeleteAlias'
            - 'kms:TagResource'
            - 'kms:UntagResource'
            - 'kms:ScheduleKeyDeletion'
            - 'kms:CancelKeyDeletion'
          Resource:
            - !Join 
              - ''
              - - 'arn:aws:kms:'
                - !Ref 'AWS::Region'
                - ':'
                - !Ref 'AWS::AccountId'
                - ':key/*'
        - Sid: Allow use of the key
          Effect: Allow
          Principal:
            AWS: 
              - !GetAtt SageMakerExecutionRole.Arn

          Action:
            - kms:Encrypt
            - kms:Decrypt
            - kms:ReEncryptTo
            - kms:ReEncryptFrom
            - kms:GenerateDataKeyPair
            - kms:GenerateDataKeyPairWithoutPlaintext
            - kms:GenerateDataKeyWithoutPlaintext
            - kms:GenerateDataKey
            - kms:DescribeKey
          Resource:
            - !Join 
              - ''
              - - 'arn:aws:kms:'
                - !Ref 'AWS::Region'
                - ':'
                - !Ref 'AWS::AccountId'
                - ':key/*'
        - Sid: Allow attachment of persistent resources
          Effect: Allow
          Principal:
            AWS: 
              - !GetAtt SageMakerExecutionRole.Arn

          Action:
            - kms:CreateGrant
            - kms:ListGrants
            - kms:RevokeGrant
          Resource:
            - !Join 
              - ''
              - - 'arn:aws:kms:'
                - !Ref 'AWS::Region'
                - ':'
                - !Ref 'AWS::AccountId'
                - ':key/*'
          Condition:
            Bool:
              kms:GrantIsForAWSResource: 'true'
  KeyAlias:
    Type: AWS::KMS::Alias
    Properties:
      AliasName: 'alias/SageMaker-CMK-DS'
      TargetKeyId:
        Ref: SagemakerKMSKey
  SageMakerExecutionRole:
    Type: 'AWS::IAM::Role'
    Properties:
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - sagemaker.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: !Join 
            - ''
            - - !Ref ProjectName
              - SageMakerExecutionPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - 'iam:ListRoles'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:iam::'
                      - !Ref 'AWS::AccountId'
                      - ':role/*'
              - Sid: CloudArnResource
                Effect: Allow
                Action:
                  - 'application-autoscaling:DeleteScalingPolicy'
                  - 'application-autoscaling:DeleteScheduledAction'
                  - 'application-autoscaling:DeregisterScalableTarget'
                  - 'application-autoscaling:DescribeScalableTargets'
                  - 'application-autoscaling:DescribeScalingActivities'
                  - 'application-autoscaling:DescribeScalingPolicies'
                  - 'application-autoscaling:DescribeScheduledActions'
                  - 'application-autoscaling:PutScalingPolicy'
                  - 'application-autoscaling:PutScheduledAction'
                  - 'application-autoscaling:RegisterScalableTarget'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:autoscaling:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':*'
              - Sid: ElasticArnResource
                Effect: Allow
                Action:
                  - 'elastic-inference:Connect'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:elastic-inference:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':elastic-inference-accelerator/*'  
              - Sid: SNSArnResource
                Effect: Allow
                Action:
                  - 'sns:ListTopics'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:sns:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':*'
              - Sid: logsArnResource
                Effect: Allow
                Action:
                  - 'cloudwatch:DeleteAlarms'
                  - 'cloudwatch:DescribeAlarms'
                  - 'cloudwatch:GetMetricData'
                  - 'cloudwatch:GetMetricStatistics'
                  - 'cloudwatch:ListMetrics'
                  - 'cloudwatch:PutMetricAlarm'
                  - 'cloudwatch:PutMetricData'
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:DescribeLogStreams'
                  - 'logs:GetLogEvents'
                  - 'logs:PutLogEvents'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:logs:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':log-group:/aws/lambda/*'
              - Sid: KmsArnResource
                Effect: Allow
                Action:
                  - 'kms:DescribeKey'
                  - 'kms:ListAliases'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:kms:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':key/*'
              - Sid: ECRArnResource
                Effect: Allow
                Action:
                  - 'ecr:BatchCheckLayerAvailability'
                  - 'ecr:BatchGetImage'
                  - 'ecr:CreateRepository'
                  - 'ecr:GetAuthorizationToken'
                  - 'ecr:GetDownloadUrlForLayer'
                  - 'ecr:DescribeRepositories'
                  - 'ecr:DescribeImageScanFindings'
                  - 'ecr:DescribeRegistry'
                  - 'ecr:DescribeImages'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:ecr:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':repository/*'
              - Sid: EC2ArnResource
                Effect: Allow
                Action:        
                  - 'ec2:CreateNetworkInterface'
                  - 'ec2:CreateNetworkInterfacePermission'
                  - 'ec2:DeleteNetworkInterface'
                  - 'ec2:DeleteNetworkInterfacePermission'
                  - 'ec2:DescribeDhcpOptions'
                  - 'ec2:DescribeNetworkInterfaces'
                  - 'ec2:DescribeRouteTables'
                  - 'ec2:DescribeSecurityGroups'
                  - 'ec2:DescribeSubnets'
                  - 'ec2:DescribeVpcEndpoints'
                  - 'ec2:DescribeVpcs'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:ec2:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':instance/*'
              - Sid: S3ArnResource
                Effect: Allow
                Action: 
                  - 's3:CreateBucket'
                  - 's3:GetBucketLocation'
                  - 's3:ListBucket'       
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:s3::'
                      - ':*sagemaker*'                  
              - Sid: LambdaInvokePermission
                Effect: Allow
                Action:
                  - 'lambda:ListFunctions'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:lambda:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':function'
                      - ':*'
              - Effect: Allow
                Action: 'sagemaker:InvokeEndpoint'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'sagemaker:CreateTrainingJob'
                  - 'sagemaker:CreateEndpoint'
                  - 'sagemaker:CreateModel'
                  - 'sagemaker:CreateEndpointConfig'
                  - 'sagemaker:CreateHyperParameterTuningJob'
                  - 'sagemaker:CreateTransformJob'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
                  'ForAllValues:StringEquals':
                    'aws:TagKeys':
                      - Username
              - Effect: Allow
                Action:
                  - 'sagemaker:DescribeTrainingJob'
                  - 'sagemaker:DescribeEndpoint'
                  - 'sagemaker:DescribeEndpointConfig'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'sagemaker:DeleteTags'
                  - 'sagemaker:ListTags'
                  - 'sagemaker:DescribeNotebookInstance'
                  - 'sagemaker:ListNotebookInstanceLifecycleConfigs'
                  - 'sagemaker:DescribeModel'
                  - 'sagemaker:ListTrainingJobs'
                  - 'sagemaker:DescribeHyperParameterTuningJob'
                  - 'sagemaker:UpdateEndpointWeightsAndCapacities'
                  - 'sagemaker:ListHyperParameterTuningJobs'
                  - 'sagemaker:ListEndpointConfigs'
                  - 'sagemaker:DescribeNotebookInstanceLifecycleConfig'
                  - 'sagemaker:ListTrainingJobsForHyperParameterTuningJob'
                  - 'sagemaker:StopHyperParameterTuningJob'
                  - 'sagemaker:DescribeEndpointConfig'
                  - 'sagemaker:ListModels'
                  - 'sagemaker:AddTags'
                  - 'sagemaker:ListNotebookInstances'
                  - 'sagemaker:StopTrainingJob'
                  - 'sagemaker:ListEndpoints'
                  - 'sagemaker:DeleteEndpoint'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'ecr:SetRepositoryPolicy'
                  - 'ecr:CompleteLayerUpload'
                  - 'ecr:BatchDeleteImage'
                  - 'ecr:UploadLayerPart'
                  - 'ecr:DeleteRepositoryPolicy'
                  - 'ecr:InitiateLayerUpload'
                  - 'ecr:DeleteRepository'
                  - 'ecr:PutImage'
                Resource: 
                  - !Join 
                    - ''
                    - - 'arn:aws:ecr:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':repository/*sagemaker*'
              - Effect: Allow
                Action:
                  - 's3:GetObject'
                  - 's3:ListBucket'
                  - 's3:PutObject'
                  - 's3:DeleteObject'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:s3:::'
                      - !Ref SagemakerS3Bucket
                  - !Join 
                    - ''
                    - - 'arn:aws:s3:::'
                      - !Ref SagemakerS3Bucket
                      - '/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action: 'iam:PassRole'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:iam::'
                      - !Ref 'AWS::AccountId'
                      - ':role/*'
                Condition:
                  StringEquals:
                    'iam:PassedToService': sagemaker.amazonaws.com
  CodeBucketPolicy:
    Type: 'AWS::IAM::Policy'
    Condition: BucketCondition
    Properties:
      PolicyName: !Join 
        - ''
        - - !Ref ProjectName
          - CodeBucketPolicy
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action:
              - 's3:GetObject'
              # ListBucket is required by the recursive "aws s3 cp" in the lifecycle script
              - 's3:ListBucket'
            Resource:
              - !Join 
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref CodeBucketName
              - !Join 
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref CodeBucketName
                  - '/*'
      Roles:
        - !Ref SageMakerExecutionRole
  SagemakerS3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
  S3Policy:
    Type: 'AWS::S3::BucketPolicy'
    Properties:
      Bucket: !Ref SagemakerS3Bucket
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: AllowAccessFromVPCEndpoint
            Effect: Allow
            Principal: "*"
            Action:
              - 's3:Get*'
              - 's3:Put*'
              - 's3:List*'
              - 's3:DeleteObject'
            Resource:
              - !Join 
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref SagemakerS3Bucket
              - !Join 
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref SagemakerS3Bucket
                  - '/*'
            Condition:
              StringEquals:
                "aws:sourceVpce": "<PASTE S3 VPC ENDPOINT ID>"
  EFSLifecycleConfig:
    Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig'
    Properties:
      NotebookInstanceLifecycleConfigName: 'Provisioned-LC'
      OnCreate:
        - Content: !Base64 
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash 
                - |
                  aws configure set sts_regional_endpoints regional 
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
      OnStart:
        - Content: !Base64 
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash  
                - |
                  aws configure set sts_regional_endpoints regional 
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config  
  EFSLifecycleConfigForS3:
    Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig'
    Properties:
      NotebookInstanceLifecycleConfigName: 'Provisioned-LC-S3'
      OnCreate:
        - Content: !Base64 
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash 
                - |
                  # Copy Content
                - !Sub >
                  aws s3 cp s3://${CodeBucketName} /home/ec2-user/SageMaker/ --recursive 
                - |
                  # Set sts endpoint
                - >
                  aws configure set sts_regional_endpoints regional 
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
      OnStart:
        - Content: !Base64 
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash  
                - |
                  aws configure set sts_regional_endpoints regional 
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config  
  SageMakerCustomResource:
    Type: 'Custom::SageMakerCustomResource'
    DependsOn: S3Policy
    Properties:
      ServiceToken: !Ref SageMakerBuildFunctionARN
      NotebookInstanceName: !Ref NotebookInstanceName
      NotebookInstanceType: !Ref NotebookInstanceType
      KmsKeyId: !Ref SagemakerKMSKey
      ENVName: !Join 
        - ''
        - - !Ref ENVName
          - Subnet1Id
      Subnet: !Ref SubnetName
      SecurityGroupName: !Ref SecurityGroupName
      ProjectName: !Ref ProjectName
      RootAccess: !Ref RootAccess
      VolumeSizeInGB: !Ref VolumeSizeInGB
      LifecycleConfigName: !If [BucketCondition, !GetAtt EFSLifecycleConfigForS3.NotebookInstanceLifecycleConfigName, !GetAtt EFSLifecycleConfig.NotebookInstanceLifecycleConfigName]  
      DirectInternetAccess: !Ref DirectInternetAccess
      RoleArn: !GetAtt 
        - SageMakerExecutionRole
        - Arn
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
Outputs:
  Message:
    Description: Execution Status
    Value: !GetAtt 
      - SageMakerCustomResource
      - Message
  SagemakerKMSKey:
    Description: KMS Key for encrypting Sagemaker resource
    Value: !Ref KeyAlias
  ExecutionRoleArn:
    Description: ARN of the Sagemaker Execution Role
    Value: !GetAtt SageMakerExecutionRole.Arn
  S3BucketName:
    Description: S3 bucket for SageMaker Notebook operation
    Value: !Ref SagemakerS3Bucket
  NotebookInstanceName:
    Description: Name of the Sagemaker Notebook instance created
    Value: !Ref NotebookInstanceName
  ProjectName:
    Description: Project name used for SageMaker deployment
    Value: !Ref ProjectName
  ProjectID:
    Description: Project ID used for SageMaker deployment
    Value: !Ref ProjectID
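
The lifecycle configurations in the template build their scripts with Fn::Join and then Base64-encode the result. As a rough illustration of what CloudFormation ends up passing to SageMaker, the sketch below reproduces that assembly for the OnCreate script of EFSLifecycleConfig. The fragment strings are copied from the template with trailing spaces trimmed, and the helper names are our own, not part of any AWS API.

```python
import base64

# Fragments mirroring the Fn::Join list under EFSLifecycleConfig.OnCreate.
# YAML block scalars ("|") end with a newline; the last plain scalar does not.
FRAGMENTS = [
    "#!/bin/bash\n",
    "aws configure set sts_regional_endpoints regional\n",
    "yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config",
]


def render_lifecycle_content(fragments):
    """Join the fragments with an empty delimiter and Base64-encode the script."""
    script = "".join(fragments)
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def decode_lifecycle_content(content):
    """Reverse the encoding, e.g. to inspect a lifecycle config read back from the API."""
    return base64.b64decode(content).decode("utf-8")


if __name__ == "__main__":
    encoded = render_lifecycle_content(FRAGMENTS)
    print(decode_lifecycle_content(encoded))
```

Decoding the content this way is a quick check that the joined script still starts with the shebang line and that no fragment lost its line break.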

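3. Next, open the VPC console, go to Endpoints, and click Create endpoint to create a VPC endpoint. SageMaker uses this endpoint to reach the large-model files in the S3 bucket privately and securely.

4. Name the endpoint "s3-endpoint", choose AWS services as the service category, and select S3 as the service.

5. Select the VPC that the endpoint belongs to, configure the route tables, and click Create.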
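
Steps 3 to 5 can also be scripted. The sketch below is a minimal boto3 version, assuming a Gateway-type endpoint (the route table step suggests this, but the article does not say so explicitly). The VPC ID, route table ID, and region are hypothetical placeholders. The endpoint ID it returns is the value to paste into the `aws:sourceVpce` condition of the S3Policy resource in the template.

```python
def build_s3_endpoint_request(vpc_id, route_table_ids, region, name="s3-endpoint"):
    """Build the arguments for ec2.create_vpc_endpoint (Gateway endpoint for S3)."""
    return {
        "VpcEndpointType": "Gateway",  # assumption: Gateway, since route tables are configured
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
        "TagSpecifications": [
            {
                "ResourceType": "vpc-endpoint",
                "Tags": [{"Key": "Name", "Value": name}],
            }
        ],
    }


def create_s3_endpoint(vpc_id, route_table_ids, region):
    """Create the endpoint and return its ID (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    request = build_s3_endpoint_request(vpc_id, route_table_ids, region)
    response = ec2.create_vpc_endpoint(**request)
    return response["VpcEndpoint"]["VpcEndpointId"]


if __name__ == "__main__":
    # Hypothetical IDs; replace them with values from your own account.
    print(build_s3_endpoint_request("vpc-0123456789abcdef0", ["rtb-0123456789abcdef0"], "us-east-1"))
```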

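6. Next, open the AWS Service Catalog console, go to Portfolios, and click Create to create a new portfolio. A portfolio centrally manages a complete service made up of different cloud resources.

7. Name the portfolio "SageMakerPortfolio" and set the owner to CQ.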

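8. Next, add cloud resources to the portfolio by clicking "Create product".

9. Choose to define the product with a CloudFormation IaC template, name the product "SageMakerProduct", and set the owner to CQ.

10. Add the CloudFormation template to the product: enter the URL of the template that we uploaded to S3 in step 2, set the version to 1, and click Create to create the product.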
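
For readers who prefer the API over the console, steps 6 to 10 can be sketched with boto3 as follows. The names "SageMakerPortfolio", "SageMakerProduct", and "CQ" come from the steps above. We assume the console's owner field for a portfolio corresponds to `ProviderName` in the API, and the template URL is a hypothetical placeholder.

```python
import uuid


def build_portfolio_request(display_name, provider_name):
    """Arguments for servicecatalog.create_portfolio."""
    return {
        "DisplayName": display_name,
        "ProviderName": provider_name,
        "IdempotencyToken": str(uuid.uuid4()),
    }


def build_product_request(name, owner, template_url, version_name="1"):
    """Arguments for servicecatalog.create_product backed by a CloudFormation template."""
    return {
        "Name": name,
        "Owner": owner,
        "ProductType": "CLOUD_FORMATION_TEMPLATE",
        "ProvisioningArtifactParameters": {
            "Name": version_name,
            "Type": "CLOUD_FORMATION_TEMPLATE",
            "Info": {"LoadTemplateFromURL": template_url},
        },
        "IdempotencyToken": str(uuid.uuid4()),
    }


def create_portfolio_with_product(template_url):
    """Create the portfolio and product, then link them (needs AWS credentials)."""
    import boto3  # imported lazily so the builders above work without boto3

    sc = boto3.client("servicecatalog")
    portfolio = sc.create_portfolio(**build_portfolio_request("SageMakerPortfolio", "CQ"))
    product = sc.create_product(**build_product_request("SageMakerProduct", "CQ", template_url))
    portfolio_id = portfolio["PortfolioDetail"]["Id"]
    product_id = product["ProductViewDetail"]["ProductViewSummary"]["ProductId"]
    sc.associate_product_with_portfolio(ProductId=product_id, PortfolioId=portfolio_id)
    return portfolio_id, product_id


if __name__ == "__main__":
    # Hypothetical template location; use the URL of your own uploaded template.
    print(build_product_request("SageMakerProduct", "CQ", "https://example-bucket.s3.amazonaws.com/sagemaker.yaml"))
```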

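11. Next, open the Constraints page and click Create to create a constraint. Constraints use permissions to limit what can be done to cloud resources through the Service Catalog product.

12. Select the product we just created, "SageMakerProduct", and choose Launch as the constraint type.

13. Add the IAM role for the constraint. This role carries the permission rules that govern the product. Then click Create.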
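
A launch constraint like the one in steps 11 to 13 can be expressed through the API as shown below. This is only a sketch: the portfolio ID, product ID, and role ARN are hypothetical placeholders, and the article does not name the launch role it uses.

```python
import json


def build_launch_constraint_request(portfolio_id, product_id, role_arn):
    """Arguments for servicecatalog.create_constraint with a LAUNCH constraint."""
    return {
        "PortfolioId": portfolio_id,
        "ProductId": product_id,
        "Type": "LAUNCH",
        # For LAUNCH constraints, Parameters is a JSON string naming the role
        # that Service Catalog assumes when it provisions the product.
        "Parameters": json.dumps({"RoleArn": role_arn}),
        "Description": "Launch role for SageMakerProduct",
    }


def create_launch_constraint(portfolio_id, product_id, role_arn):
    """Create the constraint and return its ID (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    sc = boto3.client("servicecatalog")
    request = build_launch_constraint_request(portfolio_id, product_id, role_arn)
    response = sc.create_constraint(**request)
    return response["ConstraintDetail"]["ConstraintId"]


if __name__ == "__main__":
    # Hypothetical identifiers for illustration only.
    print(build_launch_constraint_request(
        "port-examplexample", "prod-examplexample",
        "arn:aws:iam::123456789012:role/ExampleLaunchRole"))
```

With a launch constraint in place, end users do not need direct permissions on SageMaker, S3, or IAM. Service Catalog provisions the resources with the launch role instead.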

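14. Next, click Access and grant access to restrict which users can use the product.

15. We add the role "SCEndUserRole", which is used on behalf of users to access the product and create cloud resources.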
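
Granting access in steps 14 and 15 corresponds to associating an IAM principal with the portfolio. A minimal sketch, assuming "SCEndUserRole" already exists in the account; the account ID and portfolio ID are hypothetical:

```python
def build_role_arn(account_id, role_name, partition="aws"):
    """Compose an IAM role ARN; use partition "aws-cn" in the China regions."""
    return f"arn:{partition}:iam::{account_id}:role/{role_name}"


def build_access_request(portfolio_id, principal_arn):
    """Arguments for servicecatalog.associate_principal_with_portfolio."""
    return {
        "PortfolioId": portfolio_id,
        "PrincipalARN": principal_arn,
        "PrincipalType": "IAM",
    }


def grant_portfolio_access(portfolio_id, account_id, role_name="SCEndUserRole"):
    """Give the role access to the portfolio (needs AWS credentials)."""
    import boto3  # imported lazily so the builders above work without boto3

    sc = boto3.client("servicecatalog")
    arn = build_role_arn(account_id, role_name)
    sc.associate_principal_with_portfolio(**build_access_request(portfolio_id, arn))
    return arn


if __name__ == "__main__":
    # Hypothetical account ID and portfolio ID.
    print(build_access_request("port-examplexample", build_role_arn("123456789012", "SCEndUserRole")))
```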

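16. Next, we use the Service Catalog product to create a series of cloud resources. Select the product we just created and click Launch.

17. Name the provisioned product "DataScientistProduct" and select version 1, which we created earlier.

18. Configure the parameters for the SageMaker resources that the product will create: the environment name and the instance name.

19. Enter the ARN of the Lambda function we created at the beginning, then click Launch to start provisioning.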
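
The launch in steps 16 to 19 can be sketched with `provision_product`. The parameter keys `ENVName`, `NotebookInstanceName`, and `SageMakerBuildFunctionARN` are taken from the `!Ref` names in the template, on the assumption that they are template parameters. All values below are hypothetical, and a real launch may need further parameters that the template declares.

```python
import uuid


def build_provision_request(product_name, artifact_name, provisioned_name, parameters):
    """Arguments for servicecatalog.provision_product."""
    return {
        "ProductName": product_name,
        "ProvisioningArtifactName": artifact_name,
        "ProvisionedProductName": provisioned_name,
        "ProvisioningParameters": [
            {"Key": key, "Value": str(value)} for key, value in parameters.items()
        ],
        "ProvisionToken": str(uuid.uuid4()),  # idempotency token
    }


def launch_product(parameters):
    """Launch the product and return the record ID (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    sc = boto3.client("servicecatalog")
    request = build_provision_request("SageMakerProduct", "1", "DataScientistProduct", parameters)
    response = sc.provision_product(**request)
    return response["RecordDetail"]["RecordId"]


if __name__ == "__main__":
    # Hypothetical parameter values for illustration only.
    example = {
        "ENVName": "dev",
        "NotebookInstanceName": "example-notebook",
        "SageMakerBuildFunctionARN": "arn:aws:lambda:us-east-1:123456789012:function:SageMakerBuild",
    }
    print(build_provision_request("SageMakerProduct", "1", "DataScientistProduct", example))
```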

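20. Finally, return to the SageMaker console, where we can see that a new Jupyter Notebook instance has been created through the Service Catalog product. With this instance, we can develop our AI service applications.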
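
Besides looking at the console, the result can be checked programmatically. The sketch below waits for the instance to reach InService and then inspects a few security-related fields of the `describe_notebook_instance` response. Treating `DirectInternetAccess` as expected to be "Disabled" is our assumption, since the template leaves it as a parameter. The instance name is hypothetical.

```python
def notebook_issues(description, expect_direct_internet="Disabled"):
    """Return a list of findings for a describe_notebook_instance response."""
    issues = []
    if description.get("NotebookInstanceStatus") != "InService":
        issues.append("instance is not InService")
    if description.get("DirectInternetAccess") != expect_direct_internet:
        issues.append("unexpected DirectInternetAccess setting")
    if not description.get("KmsKeyId"):
        issues.append("no KMS key attached to the notebook volume")
    if not description.get("SubnetId"):
        issues.append("instance is not attached to a VPC subnet")
    return issues


def check_notebook(name):
    """Wait for the notebook, then report findings (needs AWS credentials)."""
    import boto3  # imported lazily so notebook_issues works without boto3

    sm = boto3.client("sagemaker")
    sm.get_waiter("notebook_instance_in_service").wait(NotebookInstanceName=name)
    return notebook_issues(sm.describe_notebook_instance(NotebookInstanceName=name))


if __name__ == "__main__":
    # Hypothetical response fragment for illustration only.
    sample = {
        "NotebookInstanceStatus": "InService",
        "DirectInternetAccess": "Disabled",
        "KmsKeyId": "example-key-id",
        "SubnetId": "subnet-0123456789abcdef0",
    }
    print(notebook_issues(sample))
```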

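That covers all the steps for training large AI models and developing AI applications securely and compliantly on AWS. Join me in future posts for more cutting-edge generative AI development solutions.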
