Safety best practices 安全最佳实践
- 前言
- Use our free Moderation API 使用我们的免费审核API
- Adversarial testing 对抗性测试
- Human in the loop (HITL) 人在回路
- Prompt engineering 快速工程
- “Know your customer” (KYC) “了解你的客户”
- Constrain user input and limit output tokens 约束用户输入并限制输出标记
- Allow users to report issues 允许用户报告问题
- Understand and communicate limitations 了解和沟通局限性
- End-user IDs 最终用户ID
- 其它资料下载
前言
在确保使用ChatGPT安全方面,作为开发人员,不仅要采取适当的技术措施来保护相关系统,比如加密、防火墙、Red-teaming,还需要采用预防措施,来引导用户如何识别和避免潜在的安全威胁,比如相关的审核机制。当然还可以采用适当的Human in the loop措施,在一些高风险领域中进行人工审查,以确保系统正常运行。
Use our free Moderation API 使用我们的免费审核API
OpenAI’s Moderation API is free-to-use and can help reduce the frequency of unsafe content in your completions. Alternatively, you may wish to develop your own content filtration system tailored to your use case.
OpenAI的审核API是免费使用的,可以帮助减少不安全内容在完成中的频率。或者,您可能希望开发适合您的用例的内容过滤系统。
Adversarial testing 对抗性测试
We recommend “red-teaming” your application to ensure it’s robust to adversarial input. Test your product over a wide range of inputs and user behaviors, both a representative set and those reflective of someone trying to ‘break’ your application. Does it wander off topic? Can someone easily redirect the feature via prompt injections, e.g. “ignore the previous instructions and do this instead”?
我们建议对你的应用程序进行“红队”测试,以确保它对对抗性输入的鲁棒性。在广泛的输入和用户行为上测试您的产品,包括代表性的集合和反映某人试图“破坏”您的应用程序的集合。是否偏离了主题?是否有人可以通过提示注入轻松地重定向该功能,例如“忽略先前的说明并执行此操作”?
Human in the loop (HITL) 人在回路
Wherever possible, we recommend having a human review outputs before they are used in practice. This is especially critical in high-stakes domains, and for code generation. Humans should be aware of the limitations of the system, and have access to any information needed to verify the outputs (for example, if the application summarizes notes, a human should have easy access to the original notes to refer back).
只要有可能,我们建议在实际使用输出之前进行人工审查。这在高风险领域和代码生成中尤其重要。人类应该意识到系统的局限性,并且可以访问验证输出所需的任何信息(例如,如果应用程序总结了注释,则人类应该可以轻松访问原始注释以进行引用)。
Prompt engineering 快速工程
“Prompt engineering” can help constrain the topic and tone of output text. This reduces the chance of producing undesired content, even if a user tries to produce it. Providing additional context to the model (such as by giving a few high-quality examples of desired behavior prior to the new input) can make it easier to steer model outputs in desired directions.
“提示工程”可以帮助约束输出文本的主题和语气。这减少了产生不期望的内容的机会,即使用户试图产生它。向模型提供额外的上下文(例如通过在新输入之前给出期望行为的一些高质量示例)可以使模型输出更容易地转向期望的方向。
“Know your customer” (KYC) “了解你的客户”
Users should generally need to register and log-in to access your service. Linking this service to an existing account, such as a Gmail, LinkedIn, or Facebook log-in, may help, though may not be appropriate for all use-cases. Requiring a credit card or ID card reduces risk further.
用户通常需要注册和登录才能访问您的服务。将此服务链接到现有帐户(例如Gmail,LinkedIn或Facebook登录)可能会有所帮助,但可能不适合所有用例。要求信用卡或身份证可以进一步降低风险。
Constrain user input and limit output tokens 约束用户输入并限制输出标记
Limiting the amount of text a user can input into the prompt helps avoid prompt injection. Limiting the number of output tokens helps reduce the chance of misuse.
限制用户可以输入到提示中的文本量有助于避免提示注入。限制输出标记的数量有助于减少误用的机会。
Narrowing the ranges of inputs or outputs, especially drawn from trusted sources, reduces the extent of misuse possible within an application.
缩小输入或输出的范围,特别是从可信来源提取的输入或输出的范围,减少了应用程序内可能的误用程度。
Allowing user inputs through validated dropdown fields (e.g., a list of movies on Wikipedia) can be more secure than allowing open-ended text inputs.
允许用户通过验证的下拉字段输入(例如,维基百科上的电影列表)可以比允许开放式文本输入更安全。
Returning outputs from a validated set of materials on the backend, where possible, can be safer than returning novel generated content (for instance, routing a customer query to the best-matching existing customer support article, rather than attempting to answer the query from-scratch).
如果可能,在后端返回一组经过验证的材料的输出比返回新生成的内容更安全(例如,将客户查询路由到最匹配的现有客户支持文章,而不是尝试从头开始回答查询)。
Allow users to report issues 允许用户报告问题
Users should generally have an easily-available method for reporting improper functionality or other concerns about application behavior (listed email address, ticket submission method, etc). This method should be monitored by a human and responded to as appropriate.
用户通常应该有一个容易获得的方法来报告不正确的功能或其他有关应用程序行为的问题(列出的电子邮件地址,票证提交方法等)。该方法应由人监测,并在适当时作出响应。
Understand and communicate limitations 了解和沟通局限性
From hallucinating inaccurate information, to offensive outputs, to bias, and much more, language models may not be suitable for every use case without significant modifications. Consider whether the model is fit for your purpose, and evaluate the performance of the API on a wide range of potential inputs in order to identify cases where the API’s performance might drop. Consider your customer base and the range of inputs that they will be using, and ensure their expectations are calibrated appropriately.
从产生不准确信息的幻觉,到攻击性输出,再到偏见等等,语言模型在没有重大修改的情况下可能不适合每一个用例。考虑模型是否适合您的目的,并在广泛的潜在输入上评估API的性能,以确定API性能可能下降的情况。考虑你的客户群和他们将使用的输入范围,并确保他们的期望得到适当的校准。
Safety and security are very important to us at OpenAI.
在OpenAI,安全和安保对我们非常重要。
If in the course of your development you do notice any safety or security issues with the API or anything else related to OpenAI, please submit these through our Coordinated Vulnerability Disclosure Program.
如果在开发过程中,您确实注意到API的任何安全问题或与OpenAI相关的任何其他问题,请通过我们的协调漏洞披露计划提交这些问题。
End-user IDs 最终用户ID
Sending end-user IDs in your requests can be a useful tool to help OpenAI monitor and detect abuse. This allows OpenAI to provide your team with more actionable feedback in the event that we detect any policy violations in your application.
在您的请求中发送最终用户ID可以是帮助OpenAI监控和检测滥用的有用工具。这允许OpenAI在我们检测到您的应用程序中存在任何违反策略的情况下,为您的团队提供更多可操作的反馈。
The IDs should be a string that uniquely identifies each user. We recommend hashing their username or email address, in order to avoid sending us any identifying information. If you offer a preview of your product to non-logged in users, you can send a session ID instead.
ID应该是唯一标识每个用户的字符串。我们建议对他们的用户名或电子邮件地址进行哈希处理,以避免向我们发送任何识别信息。如果您向未登录的用户提供产品预览,则可以改为发送会话ID。
You can include end-user IDs in your API requests via the user
parameter as follows:
您可以通过 user
参数在API请求中包含最终用户ID,如下所示:
Python代码示例:
response = openai.Completion.create(
model="text-davinci-003",
prompt="This is a test",
max_tokens=5,
user="user123456"
)
cURL代码示例:
curl https://api.openai.com/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "text-davinci-003",
"prompt": "This is a test",
"max_tokens": 5,
"user": "user123456"
}'
其它资料下载
如果大家想继续了解人工智能相关学习路线和知识体系,欢迎大家翻阅我的另外一篇博客《重磅 | 完备的人工智能AI 学习——基础知识学习路线,所有资料免关注免套路直接网盘下载》
这篇博客参考了Github知名开源平台,AI技术平台以及相关领域专家:Datawhale,ApacheCN,AI有道和黄海广博士等约有近100G相关资料,希望能帮助到所有小伙伴们。