9. OpenAI Image Generation

news2025/7/13 22:11:01

Learn how to generate and manipulate images with the DALL·E API.

1 Introduction

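The Images API provides three methods for interacting with images:

Creating images from scratch based on a text prompt (DALL·E 3 and DALL·E 2)
Creating an edited version of an image by having the model replace some areas of an existing image, based on a new prompt (DALL·E 2 only)
Creating several variations of an existing image (DALL·E 2 only)

This guide covers the basics of using these three API endpoints and provides useful code samples. To try DALL·E 3, head to ChatGPT.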

2 Usage

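The image generations endpoint lets you create an original image from a text prompt. With DALL·E 3, images can be 1024x1024, 1024x1792, or 1792x1024 pixels.
By default, images are generated at standard quality, but with DALL·E 3 you can set quality: "hd" for enhanced detail. Standard-quality images are the fastest to generate.
With DALL·E 3 you can request one image per request; with DALL·E 2 you can request up to 10 images at a time by setting the n parameter.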

from openai import OpenAI
client = OpenAI()

response = client.images.generate(
  model="dall-e-3",
  prompt="a white siamese cat",
  size="1024x1024",
  quality="standard",
  n=1,
)

image_url = response.data[0].url
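
The size, quality, and n limits described above can be checked before a request is sent. The following is a minimal sketch, not part of the openai package: build_generate_params is a hypothetical helper, and the DALL·E 2 size list (256x256, 512x512, 1024x1024) is an assumption that the text above does not state.

```python
# Hypothetical pre-flight check; not part of the openai SDK.
DALLE3_SIZES = {"1024x1024", "1024x1792", "1792x1024"}
# Assumption: DALL-E 2 sizes, which the guide text does not list.
DALLE2_SIZES = {"256x256", "512x512", "1024x1024"}


def build_generate_params(model, prompt, size="1024x1024", quality="standard", n=1):
    """Return keyword arguments for client.images.generate, or raise ValueError."""
    if model == "dall-e-3":
        if size not in DALLE3_SIZES:
            raise ValueError(f"unsupported size for dall-e-3: {size}")
        if quality not in ("standard", "hd"):
            raise ValueError(f"unsupported quality: {quality}")
        if n != 1:
            raise ValueError("dall-e-3 generates one image per request")
        return {"model": model, "prompt": prompt, "size": size,
                "quality": quality, "n": n}
    if model == "dall-e-2":
        if size not in DALLE2_SIZES:
            raise ValueError(f"unsupported size for dall-e-2: {size}")
        if quality != "standard":
            raise ValueError("quality='hd' is a dall-e-3 option")
        if not 1 <= n <= 10:
            raise ValueError("dall-e-2 accepts n between 1 and 10")
        # quality is left out because it only applies to dall-e-3
        return {"model": model, "prompt": prompt, "size": size, "n": n}
    raise ValueError(f"unknown model: {model}")
```

A call would then look like client.images.generate(**build_generate_params("dall-e-3", "a white siamese cat", quality="hd")).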

3 Prompting

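With the release of DALL·E 3, the model takes the prompt you provide and automatically rewrites it, both for safety reasons and to add detail (more detailed prompts generally produce higher-quality images).
This feature cannot currently be disabled, but you can get output closer to the image you asked for by adding the following to your prompt: I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS. The rewritten prompt is available in the revised_prompt field of the returned data object.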
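
As a rough sketch of that workflow, the helper below prepends the instruction to a prompt and returns the rewritten prompt alongside the image URL. generate_as_is and AS_IS_PREFIX are hypothetical names, and the trailing colon used as a separator is an assumption; the client argument is expected to be an OpenAI() instance as in the example in section 2.

```python
# Instruction quoted in the text above; the ": " separator is an assumption.
AS_IS_PREFIX = (
    "I NEED to test how the tool works with extremely simple prompts. "
    "DO NOT add any detail, just use it AS-IS: "
)


def generate_as_is(client, prompt):
    """Request one DALL-E 3 image and return (url, revised_prompt)."""
    response = client.images.generate(
        model="dall-e-3",
        prompt=AS_IS_PREFIX + prompt,
        size="1024x1024",
        n=1,
    )
    image = response.data[0]
    # revised_prompt holds the prompt the model actually used
    return image.url, image.revised_prompt
```

Comparing the returned revised_prompt with the original prompt shows how much the model still changed it.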

4 Example DALL·E 3 generations

Using the response_format parameter, each image can be returned either as a URL or as Base64 data. URLs expire after one hour.
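
Because URLs expire, it can be convenient to request Base64 data and write it to disk right away. The sketch below assumes the request was made with response_format="b64_json" and that the string is read from response.data[0].b64_json; save_b64_image is a hypothetical helper name.

```python
import base64


def save_b64_image(b64_json, path):
    """Decode a Base64 image string, write it to path, and return the byte count."""
    data = base64.b64decode(b64_json)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

For example: save_b64_image(response.data[0].b64_json, "cat.png").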

5 Edits (DALL·E 2)

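Also known as "inpainting", the image edits endpoint lets you edit or extend an image by uploading the image together with a mask that indicates which areas should be replaced. The transparent areas of the mask mark where the image should be edited, and the prompt should describe the full new image, not just the erased area. This endpoint can enable experiences like DALL·E image editing in ChatGPT Plus.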

from openai import OpenAI
client = OpenAI()

response = client.images.edit(
  model="dall-e-2",
  image=open("sunlit_lounge.png", "rb"),
  mask=open("mask.png", "rb"),
  prompt="A sunlit indoor lounge area with a pool containing a flamingo",
  n=1,
  size="1024x1024"
)
image_url = response.data[0].url

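The uploaded image and mask must both be square PNG images smaller than 4MB, and they must have the same dimensions. The non-transparent areas of the mask are not used when generating the output, so they do not necessarily need to match the original image as they do in the example above.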
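
These upload rules can be checked locally before calling the endpoint. The sketch below uses only the standard library and reads the width and height from the PNG header; png_size and check_edit_inputs are hypothetical helpers, "4MB" is assumed to mean 4 * 1024 * 1024 bytes, and the check does not verify that the mask actually contains transparent pixels.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 4 * 1024 * 1024  # assumption: "4MB" read as 4 MiB


def png_size(data):
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers at bytes 16..23
    return struct.unpack(">II", data[16:24])


def check_edit_inputs(image, mask):
    """Raise ValueError if the image/mask bytes break the stated upload rules."""
    for name, data in (("image", image), ("mask", mask)):
        if len(data) >= MAX_BYTES:
            raise ValueError(f"{name} must be smaller than 4MB")
        width, height = png_size(data)
        if width != height:
            raise ValueError(f"{name} must be square, got {width}x{height}")
    if png_size(image) != png_size(mask):
        raise ValueError("image and mask must have the same dimensions")
```

A typical use is to read both files with open(path, "rb").read() and call check_edit_inputs(image_bytes, mask_bytes) before client.images.edit.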

6 Variations (DALL·E 2)

The image variations endpoint lets you generate a variation of a given image.

from openai import OpenAI
client = OpenAI()

response = client.images.create_variation(
  model="dall-e-2",
  image=open("corgi_and_cat_paw.png", "rb"),
  n=1,
  size="1024x1024"
)

image_url = response.data[0].url

As with the edits endpoint, the input image must be a square PNG image smaller than 4MB.

7 Content moderation

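Prompts and images are filtered according to OpenAI's content policy, and an error is returned when a prompt or image is flagged.
Language-specific tips: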

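Using in-memory image data
The Python examples above use open to read image data from disk. In some cases you may have the image data in memory instead. Here is an example API call that uses image data stored in a BytesIO object: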

from io import BytesIO
from openai import OpenAI
client = OpenAI()

# This is the BytesIO object that contains your image data
byte_stream: BytesIO = [your image data]
byte_array = byte_stream.getvalue()
response = client.images.create_variation(
  image=byte_array,
  n=1,
  model="dall-e-2",
  size="1024x1024"
)
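
Raw bytes carry no file name or content type. The sketch below wraps in-memory PNG data as a (filename, content, content_type) tuple; it assumes that the file parameters of the v1 Python SDK accept such a tuple, which should be verified against the installed SDK version. as_png_upload is a hypothetical helper name.

```python
from io import BytesIO


def as_png_upload(byte_stream, filename="image.png"):
    """Wrap an in-memory PNG stream as a (filename, content, content_type) tuple."""
    return (filename, byte_stream.getvalue(), "image/png")
```

Under that assumption, the tuple would be passed as image=as_png_upload(byte_stream) in place of the raw byte_array.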

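Operating on image data
You may need to process an image before passing it to the API. Here is an example that uses the PIL library to resize an image: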

from io import BytesIO
from PIL import Image
from openai import OpenAI
client = OpenAI()

# Read the image file from disk and resize it
image = Image.open("image.png")
width, height = 256, 256
image = image.resize((width, height))

# Convert the image to a BytesIO object
byte_stream = BytesIO()
image.save(byte_stream, format='PNG')
byte_array = byte_stream.getvalue()

response = client.images.create_variation(
  image=byte_array,
  n=1,
  model="dall-e-2",
  size="1024x1024"
)

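Error handling
API requests can return errors because of invalid input, rate limits, or other issues. These errors can be handled with a try...except statement, and the error details are available on the caught exception: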

import openai
from openai import OpenAI
client = OpenAI()

try:
  response = client.images.create_variation(
    image=open("image_edit_mask.png", "rb"),
    n=1,
    model="dall-e-2",
    size="1024x1024"
  )
  print(response.data[0].url)
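except openai.APIStatusError as e:
  # openai>=1.0: errors for HTTP 4xx/5xx responses expose status_code and message
  print(e.status_code)
  print(e.message)
except openai.OpenAIError as e:
  # Anything else raised by the SDK, such as a connection failure
  print(e)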
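
Since rate limits are one of the listed error causes, a failed request is sometimes worth retrying after a pause. The following is a minimal sketch of exponential backoff, not something the guide prescribes: call_with_retries is a hypothetical helper, and which exception to retry on (for example openai.RateLimitError in the v1 SDK) is left to the caller.

```python
import time


def call_with_retries(fn, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retry_on exception, wait and retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            # Give up and re-raise once the last attempt has failed
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... between attempts
            sleep(base_delay * 2 ** attempt)
```

For example: call_with_retries(lambda: client.images.generate(model="dall-e-3", prompt="a white siamese cat"), retry_on=openai.RateLimitError).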