Learning English with TED Talks: A new way to build AI, openly by Percy Liang

A new way to build AI, openly

Link: https://www.ted.com/talks/percy_liang_a_new_way_to_build_ai_openly?

Speaker: Percy Liang

Date: October 2023

Contents

  • A new way to build AI, openly
    • Introduction
    • Vocabulary
    • Transcript
    • Summary
    • Postscript

Introduction

Today’s AI is trained on the work of artists and writers without attribution, its core values decided by a privileged few. What if the future of AI was more open and democratic? Researcher Percy Liang offers a vision of a transparent, participatory future for emerging technology, one that credits contributors and gives everyone a voice.

Vocabulary

participatory: US [pɑːrˈtɪsəpətɔːri] involving or allowing participation

core value: a fundamental guiding principle

intrigue: US [ɪnˈtriːɡ] to arouse someone’s curiosity; to plot or scheme

I was intrigued, I wanted to understand it, I wanted to see how far we could go with this.

enter the mainstream: to become mainstream; to gain wide acceptance

Language models and more generally, foundation models, have taken off and entered the mainstream.

ensemble: US [ɑːnˈsɑːmbl] a group of musicians or performers, as in "jazz ensemble"; note the pronunciation

It was like a jazz ensemble where everyone was riffing off of each other, developing the technology that we have today.

not released openly: not open-sourced

recipe: US [ˈresəpi] a set of cooking instructions; note the pronunciation

And then today, the most advanced foundation models in the world are not released openly. They are instead guarded closely behind black box APIs with little to no information about how they’re built. So it’s like we have these castles which house the world’s most advanced AIs and the secret recipes for creating them.

asymmetry: lack of symmetry; imbalance

stark: plainly obvious

but the resource and information asymmetry is stark.

opacity: US [oʊˈpæsədi] lack of transparency; obscurity

This opacity and centralization of power is concerning.

tenet: US [ˈtenɪt] a principle or doctrine

The most basic tenet of machine learning is that the training data and the test data have to be independent for evaluation to be meaningful. So if we don’t know what’s in the training data, then that 95 percent number is meaningless.

we are flying blind.

accountability: the state of being held responsible for one’s actions

And with all the enthusiasm for deploying these models in the real world without meaningful evaluation, we are flying blind. And transparency isn’t just about the training data or evaluation. It’s also about environmental impact, labor practices, release processes, risk mitigation strategies. Without transparency, we lose accountability.

affirmative action

Affirmative action (also sometimes called reservations, alternative access, positive discrimination or positive action in various countries’ laws and policies) refers to a set of policies and practices within a government or organization seeking to benefit marginalized groups. Historically and internationally, support for affirmative action has been justified by the idea that it may help with bridging inequalities in employment and pay, increasing access to education, and promoting diversity, social equity and redressing alleged wrongs, harms, or hindrances, also called substantive equality.

subjective, controversial, contested questions

These are highly subjective, controversial, contested questions, and any decision on how to answer them is necessarily value-laden.

without attribution or consent: without crediting the source or obtaining permission

The data here is a result of human labor, and currently this data is being scraped, often without attribution or consent.

status quo: US [ˌsteɪtəs ˈkwoʊ] the existing state of affairs

So how can we change the status quo?

bleak: US [bliːk] dreary; offering little hope

situation seems pretty bleak: things look rather hopeless

With these castles, the situation might seem pretty bleak. But let me try to give you some hope.

encyclopedia: US [ɪnˌsaɪkləˈpiːdiə] a comprehensive reference work; note the pronunciation

against all odds: despite great difficulty

But against all odds, Wikipedia prevailed.

hobbyist: US [ˈhɑbiɪst] an amateur enthusiast

peer production

Peer production (also known as mass collaboration) is a way of producing goods and services that relies on self-organizing communities of individuals. In such communities, the labor of many people is coordinated towards a shared outcome.

embark on: to begin or set out on (an undertaking)

I feel the same excitement about this vision as I did 19 years ago as that master’s student, embarking on his first NLP research project.
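A side note on the "tenet" example above: the claim that training and test data must be independent can be shown with a toy sketch. Everything here is hypothetical and chosen only for illustration (the question/answer pairs, the function names, and the "model" itself, which is just a lookup table that memorizes its training data); it is not how the models in the talk are built or evaluated.

```python
def train_lookup_model(training_pairs):
    """'Train' a toy model that only memorizes question -> answer pairs."""
    return dict(training_pairs)


def accuracy(model, test_pairs):
    """Fraction of test questions the model answers correctly."""
    correct = sum(1 for question, answer in test_pairs
                  if model.get(question) == answer)
    return correct / len(test_pairs)


def overlap_fraction(training_pairs, test_pairs):
    """Fraction of test questions that also appear in the training data."""
    train_questions = {question for question, _ in training_pairs}
    leaked = sum(1 for question, _ in test_pairs
                 if question in train_questions)
    return leaked / len(test_pairs)


# Hypothetical data: placeholders standing in for real questions and answers.
training = [("q1", "a1"), ("q2", "a2"), ("q3", "a3"), ("q4", "a4")]
contaminated_test = [("q1", "a1"), ("q2", "a2"), ("q3", "a3"), ("q5", "a5")]
independent_test = [("q5", "a5"), ("q6", "a6"), ("q7", "a7"), ("q8", "a8")]

model = train_lookup_model(training)

# High score, but only because three of four test items were seen in training.
contaminated_accuracy = accuracy(model, contaminated_test)
# The same model on questions it has never seen.
independent_accuracy = accuracy(model, independent_test)

print(contaminated_accuracy, overlap_fraction(training, contaminated_test))
print(independent_accuracy, overlap_fraction(training, independent_test))
```

The memorizing model looks strong on the contaminated test set and fails on the independent one. Computing the overlap requires access to the training data, which is why a score reported through a closed API is hard to interpret.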

Transcript

I was a young master’s student

about to start my first
NLP research project,

and my task was to train a language model.

Now that language model was a little bit
smaller than the ones we have today.

It was trained on millions
rather than trillions of words.

I used a hidden Markov model
as opposed to a transformer,

but that little language model I trained

did something I thought was amazing.

It took all this raw text

and somehow it organized it into concepts.

A concept for months,

male first names,

words related to the law,

countries and continents and so on.

But no one taught
these concepts to this model.

It discovered them all by itself,
just by analyzing the raw text.

But how?

I was intrigued,
I wanted to understand it,

I wanted to see how far
we could go with this.

So I became an AI researcher.

In the last 19 years,

we have come a long way
as a research community.

Language models and more generally,
foundation models, have taken off

and entered the mainstream.

But, it is important to realize
that all of these achievements

are based on decades of research.

Research on model architectures,

research on optimization algorithms,
training objectives, data sets.

For a while,

we had an incredible free culture,

a culture of open innovation,

a culture where researchers published,

researchers released data sets, code,

so that others can go further.

It was like a jazz ensemble where everyone
was riffing off of each other,

developing the technology
that we have today.

But then in 2020,

things started changing.

Innovation became less open.

And then today, the most advanced
foundation models in the world

are not released openly.

They are instead guarded closely
behind black box APIs

with little to no information
about how they’re built.

So it’s like we have these castles

which house the world’s most advanced AIs

and the secret recipes for creating them.

Meanwhile, the open community
still continues to innovate,

but the resource and information
asymmetry is stark.

This opacity and centralization
of power is concerning.

Let me give you three reasons why.

First, transparency.

With closed foundation models,
we lose the ability to see,

to evaluate, to audit these models

which are going to impact
billions of people.

Say we evaluate a model through an API
on medical question answering

and it gets 95 percent accuracy.

What does that 95 percent mean?

The most basic tenet of machine learning

is that the training data
and the test data

have to be independent
for evaluation to be meaningful.

So if we don’t know
what’s in the training data,

then that 95 percent
number is meaningless.

And with all the enthusiasm
for deploying these models

in the real world
without meaningful evaluation,

we are flying blind.

And transparency isn’t just
about the training data or evaluation.

It’s also about environmental impact,

labor practices, release processes,

risk mitigation strategies.

Without transparency,
we lose accountability.

It’s like not having nutrition labels
on the food you eat,

or not having safety ratings
on the cars you drive.

Fortunately, the food and auto industries
have matured over time,

but AI still has a long way to go.

Second, values.

So model developers like to talk
about aligning foundation models

to human values,
which sounds wonderful.

But whose values
are we talking about here?

If we were just building a model
to answer math questions,

maybe we wouldn’t care,

because as long as the model
produces the right answer,

we would be happy,
just as we’re happy with calculators.

But these models are not calculators.

These models will attempt to answer
any question you throw at them.

Who is the best basketball
player of all time?

Should we build nuclear reactors?

What do you think of affirmative action?

These are highly subjective,
controversial, contested questions,

and any decision on how to answer them
is necessarily value-laden.

And currently, these values
are unilaterally decided

by the rulers of the castles.

So can we imagine
a more democratic process

for determining these values
based on the input from everybody?

So foundation models will be the primary
way that we interact with information.

And so determining these values
and how we set them

will have a sweeping impact

on how we see the world and how we think.

Third, attribution.

So why are these foundation
models so powerful?

It’s because they’re trained
on massive amounts of data.

See, what machine-learning
researchers call data

is what artists call art

or writers call books

or programmers call software.

The data here is a result of human labor,

and currently this data is being scraped,

often without attribution or consent.

So understandably, some people are upset,

filing lawsuits, going on strike.

But this is just an indication
that the incentive system is broken.

And in order to fix it,
we need to center the creators.

We need to figure out
how to compensate them

for the value of the content
they produced,

and how to incentivize them
to continue innovating.

Figuring this out
will be critical to sustaining

the long-term development of AI.

So here we are.

We don’t have transparency
about how the models are being built.

We have to live with fixed values
set by the rulers of the castles,

and we have no means of attributing

the creators who make
foundation models possible.

So how can we change the status quo?

With these castles,

the situation might seem pretty bleak.

But let me try to give you some hope.

In 2001,

Encyclopedia Britannica was a castle.

Wikipedia was an open experiment.

It was a website
that anyone could edit,

and all the resulting knowledge
would be made freely available

to everyone on the planet.

It was a radical idea.

In fact, it was a ridiculous idea.

But against all odds, Wikipedia prevailed.

In the '90s, Microsoft
Windows was a castle.

Linux was an open experiment.

Anyone could read its source code,
anyone could contribute.

And over the last two decades,

Linux went from being a hobbyist toy

to the dominant operating system
on mobile and in the data center.

So let us not underestimate
the power of open source

and peer production.

These examples show us a different way
that the world could work.

A world in which everyone can participate

and development is transparent.

So how can we do the same for AI?

Let me end with a picture.

The world is filled
with incredible people:

artists, musicians, writers, scientists.

Each person has unique skills,
knowledge and values.

Collectively, this defines
the culture of our civilization.

And the purpose of AI, as I see it,

should be to organize
and augment this culture.

So we need to enable people to create,
to invent, to discover.

And we want everyone to have a voice.

The research community has focused
so much on the technical progress

that is necessary to build these models,

because for so long,
that was the bottleneck.

But now we need to consider
the social context

in which these models are built.

Instead of castles,

let us imagine a more transparent
and participatory process for building AI.

I feel the same excitement
about this vision

as I did 19 years ago
as that master’s student,

embarking on his first
NLP research project.

But realizing this vision will be hard.

It will require innovation.

It will require participation
of researchers, companies, policymakers,

and all of you

to not accept the status quo as inevitable

and demand a more participatory
and transparent future for AI.

Thank you.

(Applause)

Summary

The speaker’s talk outlines his journey from a young master’s student working on his first NLP research project in 2004 to becoming an AI researcher. He highlights the significant advancements made by the research community over the last 19 years, particularly in language and foundation models. However, he expresses concerns about the recent trend towards less open innovation, with advanced models now hidden behind closed APIs. This shift raises issues of transparency, values, and attribution in AI development.

The speaker emphasizes the importance of transparency in evaluating and auditing models, as well as the need to consider whose values are embedded in these models. He also discusses the lack of attribution and consent in the data used to train these models, calling attention to the broken incentive system in AI development.

To address these challenges, the speaker advocates for a more open and participatory approach to AI development, citing the success of projects like Wikipedia and Linux. He believes that by embracing open source and peer production principles, the AI community can create a more transparent and inclusive future for AI development.

Postscript

Written in Shanghai at 19:17 on April 10, 2024.
