Building an LLM Application Service and Designing Prompts on Amazon Web Services (AWS)

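Project overview:

In this series, Xiao Li Ge introduces one cutting-edge AI solution built on the AWS cloud platform each day, to help you quickly learn AI best practices on AWS and apply them in your daily work.

This post shows how to use Amazon Bedrock, the managed foundation-model service on AWS, to build a generative-AI-powered web application. The design is entirely cloud-native and serverless, which keeps the solution scalable and secure: Amazon API Gateway and AWS Lambda connect the application to the AI model. The solution architecture diagram is shown below:

The solution relies mainly on Amazon Bedrock, so we start with a short introduction to the service.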

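What is Amazon Bedrock?

Amazon Bedrock is a managed service for large models (LLMs and other foundation models) from Amazon Web Services (AWS), designed for generative AI development. It gives developers convenient API access to foundation models, so they can integrate them into their own applications without deep knowledge of the underlying machine learning. Through Bedrock, developers can call a range of leading generative AI models, for example for text generation and image generation, which shortens the development cycle and lowers the barrier to building AI applications.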

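The advantages of Amazon Bedrock include:

A wide choice of foundation models:

Bedrock offers models from providers such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon, covering different generative AI use cases and a variety of developer needs.

Straightforward integration:

With Bedrock, you can add AI capabilities to applications that already run on AWS without working through the implementation details of machine learning algorithms.

Scalability and elasticity:

Because it runs on AWS infrastructure, Bedrock can handle data volumes and traffic of different sizes while keeping the application stable and responsive.

Security and compliance:

Bedrock is in scope for widely used compliance programs, including ISO, SOC, and CSA STAR Level 2, and is HIPAA eligible. Developers building on Bedrock inherit the security and compliance controls of the AWS platform, which helps protect data privacy and security.

Amazon Bedrock is a practical way to build and deploy AI applications quickly on AWS, letting developers focus on the application itself rather than on the complexity of the underlying technology.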

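What this solution covers:

Deploying code to an AWS Lambda function (serverless compute) that calls an Amazon Bedrock model.

Configuring Amazon API Gateway (a managed, cloud-native API gateway service) to integrate with the Lambda function and expose a publicly accessible API endpoint.

Deploying and testing, from S3 object storage, a single-page application that interacts with the Amazon Bedrock model.

Using Amazon SageMaker (a managed service for training and hosting machine learning models) to experiment with and refine LLM prompts.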

Step-by-step setup:

Follow along with Xiao Li Ge to build a generative AI application on AWS, then experiment with the model and design high-quality prompts.

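1. Open the AWS Management Console and go to the Amazon Bedrock service.

2. In Bedrock, click "Model Access" to enable access to models.

3. We will use Amazon's own model, "Titan Text G1 - Premier". Select it and enable access.

4. Once access is enabled, open the Text page from the left-hand menu. This is where we will generate content with the Titan model on Amazon Bedrock.

5. On the Text generation page, select the Titan model you just enabled.

6. Enter "What is prompt engineering?" to test the model's reply.

7. Next, go to SageMaker, the model training and hosting service.

8. Select Studio in the left-hand menu and click Open Studio.

9. Open a SageMaker Notebook, where we will experiment with prompts for the model.

10. Create a Lambda function named "Invoke_bedrock" to run the application code, and put API Gateway in front of the Lambda function as the public API.

The Lambda code is as follows: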

#!/usr/bin/env python3
import boto3
import json
from datetime import datetime


def lambda_handler(event, context):
    # Log the incoming request so it can be inspected in CloudWatch Logs.
    print(event, context)

    path = event['path']
    method = event['httpMethod']

    if path == "/invokemodel" and method == "POST":
        # Parse the body only for the matching route; other requests may carry no body.
        prompt = json.loads(event['body'])['prompt']
        model_id = 'amazon.titan-text-premier-v1:0'
        model_response = call_bedrock(model_id, prompt)
        return {
            'statusCode': 200,
            'headers': {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*'
            },
            'body': json.dumps(model_response)
        }
    else:
        return {
            'statusCode': 404,
            'headers': {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*'
            },
            'body': json.dumps({'message': 'Not Found'})
        }


def call_bedrock(model_id, prompt_data):
    bedrock_runtime = boto3.client('bedrock-runtime')

    body = json.dumps({
        "inputText": prompt_data,
        "textGenerationConfig":
        {
            "maxTokenCount":1000,
            "stopSequences":[],
            "temperature":0.7,
            "topP":0.9
        }
    })
    print("bedrock-input:", body)

    accept = 'application/json'
    content_type = 'application/json'

    before = datetime.now()
    response = bedrock_runtime.invoke_model(body=body, modelId=model_id, accept=accept, contentType=content_type)
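    # total_seconds() keeps the fractional part; .seconds would truncate sub-second calls to 0.
    latency = (datetime.now() - before).total_seconds()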
    response_body = json.loads(response.get('body').read())
    response = response_body.get('results')[0].get('outputText')

    return {
        'latency': str(latency),
        'response': response
    }
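Before wiring up API Gateway, the routing in the handler can be exercised locally. The following is a minimal sketch, not the deployed function: `route` copies the handler's routing logic but takes the Bedrock call as a parameter (`model_caller`, a hypothetical name) so a fake can be supplied, and `make_event` builds only the three fields of the proxy event that the handler reads.

```python
import json


def make_event(path, method, prompt=None):
    """Build a minimal proxy-style event with only the fields the handler reads."""
    body = json.dumps({"prompt": prompt}) if prompt is not None else None
    return {"path": path, "httpMethod": method, "body": body}


def route(event, model_caller):
    """Same routing as lambda_handler, with the model call injected for testing."""
    headers = {"Content-Type": "application/json", "Access-Control-Allow-Origin": "*"}
    if event["path"] == "/invokemodel" and event["httpMethod"] == "POST":
        prompt = json.loads(event["body"])["prompt"]
        result = model_caller("amazon.titan-text-premier-v1:0", prompt)
        return {"statusCode": 200, "headers": headers, "body": json.dumps(result)}
    return {"statusCode": 404, "headers": headers, "body": json.dumps({"message": "Not Found"})}


def fake_bedrock(model_id, prompt):
    """Stand-in for call_bedrock so that no AWS credentials are needed."""
    return {"latency": "0", "response": f"echo: {prompt}"}


ok = route(make_event("/invokemodel", "POST", "What is prompt engineering?"), fake_bedrock)
missing = route(make_event("/other", "GET"), fake_bedrock)
print(ok["statusCode"], json.loads(ok["body"])["response"])
print(missing["statusCode"])
```

Injecting the model call keeps the test free of network access; in the real function the same check can be done by passing `call_bedrock` itself.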

11. Next, write the front end. The HTML code is as follows:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Prompt Engineering with Amazon Bedrock</title>
  <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css" rel="stylesheet">
  <style>
    body, html {
      height: 100%;
      display: flex;
      justify-content: center;
      align-items: center;
      background-color: #f8f9fa;
      text-align: center;
      flex-direction: column;
    }
    #chat-container {
      width: 100%;
      max-width: 1600px;
    }
    #messages {
      height: 60vh;
      overflow-y: auto;
      background-color: #ffffff;
      padding: 20px;
      border-radius: 10px;
      box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
      margin-bottom: 20px;
    }
    .message {
      margin-bottom: 10px;
    }
    .message.user {
      text-align: right;
    }
    .message.bot {
      text-align: left;
    }
    .message p {
      display: inline-block;
      padding: 10px;
      border-radius: 10px;
    }
    .message.user p {
      background-color: #007bff;
      color: white;
    }
    .message.bot p {
      background-color: #f1f1f1;
    }
    #input-container {
      display: flex;
    }
    #prompt {
      flex: 1;
      padding: 10px;
      border-radius: 10px 0 0 10px;
    }
    #send-btn {
      padding: 10px 20px;
      border-radius: 0 10px 10px 0;
    }
    #spinner {
      display: none;
      margin-left: 10px;
    }
    #disclaimer {
      margin-top: 20px;
      font-size: 0.9em;
      color: #555;
    }
  </style>
</head>
<body>
  <div id="chat-container" class="container">
    <h1>Prompt Engineering with Amazon Bedrock</h1>
    <div id="messages">
      <!-- Messages will be appended here -->
    </div>
    <div id="input-container">
      <input id="prompt" type="text" class="form-control" placeholder="Type your message here...">
      <button id="send-btn" class="btn btn-primary">Send</button>
      <div id="spinner" class="spinner-border text-primary" role="status">
        <span class="sr-only">Loading...</span>
      </div>
    </div>
    <div id="disclaimer">
      Please note: As with all AI-powered applications, outputs should be reviewed for accuracy and appropriateness.
    </div>
  </div>

  <script>
    document.addEventListener('DOMContentLoaded', function() {
      const promptInput = document.getElementById('prompt');
      const sendBtn = document.getElementById('send-btn');
      const messagesContainer = document.getElementById('messages');
      const spinner = document.getElementById('spinner');

      const appendMessage = (text, isUser = true) => {
        const messageDiv = document.createElement('div');
        messageDiv.classList.add('message', isUser ? 'user' : 'bot');
        const messageText = document.createElement('p');
        messageText.innerText = text;
        messageDiv.appendChild(messageText);
        messagesContainer.appendChild(messageDiv);
        messagesContainer.scrollTop = messagesContainer.scrollHeight;
      };

      const sendMessage = () => {
        const prompt = promptInput.value;
        if (!prompt) return;
        appendMessage(prompt);
        promptInput.value = '';
        spinner.style.display = 'block';

        fetch('https://6y4hknvnub.execute-api.us-east-1.amazonaws.com/prod/invokemodel', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({ prompt: prompt })
        })
        .then(response => response.json())
        .then(data => {
          console.log(data); // Debugging: log the entire response data
          spinner.style.display = 'none';
          if (data && data.response) {
            appendMessage(data.response, false);
          } else {
            appendMessage('No response from server.', false);
          }
        })
        .catch(error => {
          console.error('Error:', error);
          spinner.style.display = 'none';
          appendMessage('An error occurred. Please try again.', false);
        });
      };

      sendBtn.addEventListener('click', sendMessage);
      promptInput.addEventListener('keydown', function(event) {
        if (event.key === 'Enter') {
          sendMessage();
        }
      });
    });
  </script>
</body>
</html>
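It can help to smoke-test the API from a script before opening the page in a browser. The sketch below mirrors the `fetch` call in the page using only the Python standard library. The URL is the one hard-coded in the page above and should be replaced with your own endpoint; `build_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint copied from the page above; replace it with your own API Gateway URL.
API_URL = "https://6y4hknvnub.execute-api.us-east-1.amazonaws.com/prod/invokemodel"


def build_request(prompt, url=API_URL):
    """Create the same POST request that the page sends with fetch()."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, url=API_URL, timeout=60):
    """Send the request and return the decoded JSON reply (performs a real network call)."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request("What is prompt engineering?")
print(req.get_method(), req.full_url)
# Calling ask("What is prompt engineering?") would send the request to the live endpoint.
```

If the Lambda function behaves as listed earlier, the reply from `ask` should be a JSON object with `latency` and `response` keys.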

12. Use the following command to upload the HTML file to the S3 bucket that serves the front-end page:

aws s3 cp index.html s3://$WEB_BUCKET/
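The same upload can be scripted with boto3. One difference worth knowing is that `upload_file` does not guess the content type the way the CLI does, so the sketch below sets it explicitly through `ExtraArgs`. `upload_args` and `upload` are hypothetical helper names, and the bucket name is a placeholder.

```python
import mimetypes
import os


def upload_args(local_path, bucket):
    """Build the keyword arguments for S3 upload_file, including an explicit Content-Type."""
    content_type, _ = mimetypes.guess_type(local_path)
    return {
        "Filename": local_path,
        "Bucket": bucket,
        "Key": os.path.basename(local_path),
        "ExtraArgs": {"ContentType": content_type or "binary/octet-stream"},
    }


def upload(local_path, bucket):
    """Upload the file (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so upload_args can be used without the SDK installed

    boto3.client("s3").upload_file(**upload_args(local_path, bucket))


args = upload_args("index.html", "my-web-bucket")  # placeholder bucket name
print(args["Key"], args["ExtraArgs"])
```

Without the explicit content type, a browser may download the page instead of rendering it.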

13. Create a new Notebook in SageMaker and run the following code.

Import the required dependencies and initialize the clients used to call Bedrock:

## Code Cell 1 ##

import boto3
import json
import csv
from datetime import datetime


bedrock = boto3.client('bedrock')
bedrock_runtime = boto3.client('bedrock-runtime')
bedrock.list_foundation_models()
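The raw output of `list_foundation_models()` is long. A small filter makes it easier to find the model IDs to pass to later cells. This is a sketch that assumes the response carries a `modelSummaries` list whose entries include `modelId`, `providerName`, and `outputModalities`. The `sample` dictionary is hand-written for illustration and is not real API output.

```python
def text_models_by_provider(listing, provider):
    """Return the IDs of text-output models from one provider in a model listing."""
    return [
        m["modelId"]
        for m in listing.get("modelSummaries", [])
        if m.get("providerName") == provider and "TEXT" in m.get("outputModalities", [])
    ]


# Hand-written sample in the assumed response shape (illustration only).
sample = {
    "modelSummaries": [
        {"modelId": "amazon.titan-text-premier-v1:0", "providerName": "Amazon", "outputModalities": ["TEXT"]},
        {"modelId": "amazon.titan-image-generator-v1", "providerName": "Amazon", "outputModalities": ["IMAGE"]},
        {"modelId": "example.other-model-v1", "providerName": "ExampleProvider", "outputModalities": ["TEXT"]},
    ]
}

print(text_models_by_provider(sample, "Amazon"))
```

In the notebook, the call would be `text_models_by_provider(bedrock.list_foundation_models(), "Amazon")`.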

14. Define the function that calls a model and returns its reply:

## Code Cell 2 ##

def call_bedrock(modelId, prompt_data):
    # Each model family on Bedrock expects its own request schema.
    if 'amazon' in modelId:
        body = json.dumps({
            "inputText": prompt_data,
            "textGenerationConfig":
            {
                "maxTokenCount": 1024,
                "stopSequences":[],
                "temperature":0.7,
                "topP":0.9
            }
        })
    elif 'meta' in modelId:
        # Llama models use max_gen_len for the output length.
        body = json.dumps({
            "prompt": prompt_data,
            "max_gen_len": 2048,
            "temperature":0,
            "top_p":0.9
        })
    elif 'mistral' in modelId:
        # Mistral and Mixtral models use max_tokens and stop.
        body = json.dumps({
            "prompt": prompt_data,
            "max_tokens": 4096,
            "stop":[],
            "temperature":0,
            "top_p":0.9
        })
    else:
        # Fail loudly: callers unpack (response, latency), so returning None would break them.
        raise ValueError('Parameter modelId must be a Titan, Llama, or Mistral/Mixtral model')
    accept = 'application/json'
    contentType = 'application/json'

    # Time the call so prompts and models can be compared on latency.
    before = datetime.now()
    response = bedrock_runtime.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
    latency = (datetime.now() - before)
    response_body = json.loads(response.get('body').read())

    # Each model family also returns the generated text under a different key.
    if 'amazon' in modelId:
        response = response_body.get('results')[0].get('outputText')
    elif 'meta' in modelId:
        response = response_body.get('generation')
    elif 'mistral' in modelId:
        response = response_body.get('outputs')[0].get('text')

    return response, latency

15. Define prompts for different scenarios.

Recommend shows to the customers of an entertainment company:

## Code Cell 3 ##

prompt_data ="""
Human:
Generate a list of 10 recommended TV shows to watch, considering the information in the <metadata></metadata> XML tags. Include a very brief description of each recommendation.

<metadata>
Country is UK
Age range between 20-30
Shows must be about sports
</metadata>

Assistant:
"""


response, latency = call_bedrock('amazon.titan-text-premier-v1:0', prompt_data)
print(response, "\n\n", "Inference time:", latency)
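When the same prompt is reused with different audience constraints, it is convenient to generate the `<metadata>` block from a list rather than edit the string by hand. The helper below is a sketch that reproduces the wording of the prompt in Code Cell 3; `build_recommendation_prompt` is a hypothetical name.

```python
def build_recommendation_prompt(constraints, count=10):
    """Assemble the recommendation prompt with one constraint per line inside <metadata>."""
    metadata = "\n".join(constraints)
    return (
        "\nHuman:\n"
        f"Generate a list of {count} recommended TV shows to watch, considering the information "
        "in the <metadata></metadata> XML tags. Include a very brief description of each "
        "recommendation.\n\n"
        f"<metadata>\n{metadata}\n</metadata>\n\n"
        "Assistant:\n"
    )


prompt_data = build_recommendation_prompt(
    ["Country is UK", "Age range between 20-30", "Shows must be about sports"]
)
print(prompt_data)
```

The resulting `prompt_data` can be passed to `call_bedrock` exactly as in the cell above.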

Forecast future TV viewership from historical data:

## Code Cell 4 ##

prompt_data ="""
Human: Last week, three television channels had the following viewership data:
- Monday: SportsTV 6500, NewsTV 3200, EntertainmentTV 4150
- Tuesday: SportsTV 6400, NewsTV 3300, EntertainmentTV 4100
- Wednesday: SportsTV 6300, NewsTV 3400, EntertainmentTV 4250

Question: How many viewers can we expect next Friday on SportsTV?
Answer: According to the numbers given, and without having more information, there is a daily decrease of 100 viewers for SportsTV.
If we assume that this trend will continue for the next few days, we can expect 6200 viewers for the next day, which is Thursday, and
therefore 6100 viewers for Friday.

Question: How many viewers can we expect on Saturday for each of the three channels? Think step-by-step, and provide recommendations for increasing the viewers.
Assistant:
Answer:
"""

response, latency = call_bedrock('amazon.titan-text-premier-v1:0', prompt_data)
print(response, "\n\n", "Inference time:", latency)
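To judge the model's step-by-step answer, it helps to have a simple baseline to compare against. The sketch below applies the same reasoning as the worked example in the prompt (a constant daily change) to all three channels. It is only a naive linear extrapolation, not a forecasting method, and `extrapolate` is a hypothetical helper.

```python
# Monday to Wednesday viewership, copied from the prompt above.
viewers = {
    "SportsTV": [6500, 6400, 6300],
    "NewsTV": [3200, 3300, 3400],
    "EntertainmentTV": [4150, 4100, 4250],
}


def extrapolate(series, days_ahead):
    """Project forward using the average daily change between the first and last observation."""
    daily_change = (series[-1] - series[0]) / (len(series) - 1)
    return series[-1] + daily_change * days_ahead


# Wednesday is the last data point, so Friday is 2 days ahead and Saturday is 3.
friday_sports = extrapolate(viewers["SportsTV"], 2)
saturday = {channel: extrapolate(series, 3) for channel, series in viewers.items()}
print("SportsTV Friday:", friday_sports)
print("Saturday:", saturday)
```

Under this assumption the Friday figure for SportsTV matches the 6100 given in the prompt, and the Saturday baseline is 6000 for SportsTV, 3700 for NewsTV, and 4400 for EntertainmentTV. The EntertainmentTV series is not monotonic, so its baseline is the least reliable of the three.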

Build a customer-service chatbot for the entertainment company:

## Code Cell 5 ##

prompt_data ="""
Context: The shows available are as follows
1. Circus, showing at the Plaza venue, assigned seating, live at 8pm on weekends
2. Concert, showing at the Main Theater, assigned seating, live at 10pm everyday
3. Basketball tricks, showing at the Sports venue, standing seating, live at 5pm on weekdays

Instruction: Answer any questions about the available shows. If you don't know the answer, say 'Apologies, I don't have the answer for that. Please contact our team by phone.'

Assistant: Welcome to Entertainment Tonight, how can I help you?
Human: Hi, I would like to know what shows are available please.
Assistant: Of course. Right now, we have the Circus, the Concert, and the Basketball tricks shows.
Human: Thank you. I would like to know when and where those shows are available please.
Assistant:
"""

response, latency = call_bedrock('amazon.titan-text-premier-v1:0', prompt_data)
print(response, "\n\n", "Inference time:", latency)
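The model call is stateless, so a multi-turn chat means resending the whole conversation each time. The following sketch keeps the turns in a list and rebuilds the prompt in the same Context / Instruction / dialogue layout used in Code Cell 5. `build_chat_prompt` is a hypothetical helper, and the appended assistant reply is a placeholder rather than real model output.

```python
def build_chat_prompt(context, instruction, turns):
    """Render the conversation so far and end with an open Assistant turn for the model to fill."""
    history = "\n".join(f"{speaker}: {text}" for speaker, text in turns)
    return (
        f"\nContext: {context}\n\n"
        f"Instruction: {instruction}\n\n"
        f"{history}\n"
        "Assistant:\n"
    )


context = (
    "The shows available are as follows\n"
    "1. Circus, showing at the Plaza venue, assigned seating, live at 8pm on weekends"
)
instruction = "Answer any questions about the available shows."
turns = [
    ("Assistant", "Welcome to Entertainment Tonight, how can I help you?"),
    ("Human", "Hi, I would like to know what shows are available please."),
]
prompt_data = build_chat_prompt(context, instruction, turns)

# Placeholder reply (an assumption); in the notebook this would be the text from call_bedrock.
turns.append(("Assistant", "Right now, we have the Circus show."))
turns.append(("Human", "When and where is it available?"))
next_prompt = build_chat_prompt(context, instruction, turns)
print(next_prompt)
```

Because the prompt grows with every turn, long conversations will eventually need to be trimmed or summarized to stay within the model's input limit.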

Use the model to generate web page code:

## Code Cell 6 ##

prompt_data ="""
An upcoming music concert is presented by the company, Music Promotions.
The event targets a young audience, age range between 18 and 40.
The event will occur in the Royal Music Theater.
Seating is assigned and tickets can be purchased through the Music Promotions website.
The event is a music concert performed by the band, Super Rockers.
The event will occur on June 30, 2023, and doors will open at 20:00.

Based on the information provided above, generate the HTML code for an attractive splash page to promote the event.
"""

response, latency = call_bedrock('amazon.titan-text-premier-v1:0', prompt_data)
print(response, "\n\n", "Inference time:", latency)
from IPython.display import display, HTML
display(HTML(response))
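Model replies often wrap the generated page in a Markdown code fence or add a sentence before and after it, which then shows up as stray text when rendered. The sketch below is one hedged way to pull out just the HTML before calling `display(HTML(...))`. `extract_html` is a hypothetical helper, and its heuristics are assumptions about typical replies rather than a guaranteed output format.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to avoid writing it literally


def extract_html(text):
    """Return the HTML portion of a model reply, dropping code fences or surrounding prose."""
    fenced = re.search(FENCE + r"(?:html)?\s*(.*?)" + FENCE, text, flags=re.DOTALL | re.IGNORECASE)
    if fenced:
        return fenced.group(1).strip()
    lowered = text.lower()
    start = lowered.find("<!doctype")
    if start == -1:
        start = lowered.find("<html")
    end = lowered.rfind("</html>")
    if start != -1 and end != -1:
        return text[start:end + len("</html>")]
    return text.strip()  # nothing recognizable: fall back to the raw reply


fenced_reply = "Here is the page:\n" + FENCE + "html\n<html><body>Hi</body></html>\n" + FENCE
prose_reply = "Sure! <html><body>Hi</body></html> Enjoy."
print(extract_html(fenced_reply))
print(extract_html(prose_reply))
```

In the notebook this would be used as `display(HTML(extract_html(response)))`. Generated markup should still be reviewed before it is published anywhere.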

16. To update the prompt, copy the version you tested in SageMaker into the Lambda function. The application will then respond using the new prompt.
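One possible way to make this update step lighter is to keep the prompt wrapper outside the code, for example in a Lambda environment variable, so that changing the prompt does not require editing the function body. This is a design suggestion and not part of the original setup. `PROMPT_TEMPLATE`, the `{user_input}` placeholder, and `render_prompt` are all hypothetical names.

```python
import os

DEFAULT_TEMPLATE = "{user_input}"  # pass the user's text through unchanged


def render_prompt(user_input, env=os.environ):
    """Wrap the user's text in the template stored under PROMPT_TEMPLATE, if one is set."""
    template = env.get("PROMPT_TEMPLATE", DEFAULT_TEMPLATE)
    # str.replace is used instead of str.format so braces in the template or input are harmless.
    return template.replace("{user_input}", user_input)


example_env = {
    "PROMPT_TEMPLATE": "Instruction: Answer any questions about the available shows.\n\n"
                       "Human: {user_input}\nAssistant:\n"
}
print(render_prompt("What shows are available?", env=example_env))
print(render_prompt("What shows are available?", env={}))
```

In the handler, the value passed to `call_bedrock` would then be `render_prompt(prompt)`. Environment variables are limited in size, so very long prompts may be better stored elsewhere.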

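Those are all the steps for building a web application on AWS with a managed foundation-model service and designing high-quality prompts for it. Follow Xiao Li Ge for more cutting-edge generative AI development solutions.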