跟TED演讲学英文:The next grand challenge for AI by Jim Fan

news2025/1/26 15:48:01

The next grand challenge for AI

在这里插入图片描述

Link: https://www.ted.com/talks/jim_fan_the_next_grand_challenge_for_ai?

Speaker: Jim Fan

Date: October 2023

文章目录

  • The next grand challenge for AI
    • Introduction
    • Vocabulary
    • Transcript
    • Summary
    • 后记

Introduction

Researcher Jim Fan presents the next grand challenge in the quest for AI: the “foundation agent,” which would seamlessly operate across both the virtual and physical worlds. He explains how this technology could fundamentally change our lives — permeating everything from video games and metaverses to drones and humanoid robots — and explores how a single model could master skills across these different realities.

研究员Jim Fan介绍了人工智能探索中的下一个重大挑战:“基础agent”,它将在虚拟和现实世界中无缝运行。他解释了这项技术如何从根本上改变我们的生活——渗透到从视频游戏和元宇宙到无人机和人形机器人的所有东西——并探索了一个模型如何掌握这些不同现实的技能。

Vocabulary

grand challenge:重大挑战

permeate:美 [ˈpɜːrmieɪt] 渗透,弥漫

in the quest for:在寻求

It is still early days in the quest for global power of the Chinese carmaker.

中国汽车制造商对全球影响力的争夺目前仍处于初期阶段.

www.24en.com

In the quest for self-expression, composers were imbued with a concern for detail.

为追求自我表现,本时期的作曲家极为注重细节。

www.tdict.com

Perhaps his greatest joy is in the chance encounters during the quest for the perfect image.

也许,在现实中寻找令他耿耿于心的心像并终于与之邂逅的过程,才是他最大的喜悦。

dict.youdao.com

She does gymnastic exercises four times a week in the quest for achieving the perfect figure.

为练成完美的体型,她每周做四次健身操.

wk.baidu.com

humanoid:英 [ˈhjuːmənɔɪd] 类人的,人形的

humanoid robot:人形机器人

metaverse:元宇宙

adrenaline:美 [əˈdrenəlɪn] 肾上腺素

I still remember the adrenaline of seeing history unfold that day. 我仍然记得那天看到历史展开时的肾上腺素。

embodiment:美 [ɪmˈbɑːdimənt] 具体形象;化身;具体表现;体现

It was in Germany alone that his hope seemed capable of embodiment.

似乎只有在德国他的希望才能得到体现。

牛津词典

Kofi is the embodiment of possibility.

Kofi身上体现了一种可能性。

www.kekenet.com

But it is perfect embodiment of modern styles.

是现代时尚风格的完美体现。

bbs.chinadaily.com.cn

A circle was the embodiment of his concept of life.

圈子是他生活理念的具体体现.

dict.engbus.cn

terrain:美 [təˈreɪn] 地形,地带

tree of skills: 技能树

It can explore the terrains, mine all kinds of materials, fight monsters, craft hundreds of recipes, and unlock an ever-expanding tree of skills. 它可以探索地形,开采各种材料,与怪物战斗,制作数百种食谱,并解锁不断扩展的技能树。

indefinitely: 美 [ɪnˈdefɪnətli] 无限期地

how does Voyager keep exploring indefinitely? 旅行者号是如何无限期地继续探索的?

kinematic:美 [ˌkɪnə’mætɪk] 运动学的;运动学上的

Metamorph is able to handle extremely varied kinematic characteristics from different robot bodies. Metamorph能够处理来自不同机器人主体的极其不同的运动学特征。

envision:美 [ɪnˈvɪʒn] 想象,设想

take a big stride:迈出一大步

The speaker envisions that MetaMorph 2.0 will be able to generalize to robot hands, humanoids, dogs, drones, and even beyond. Compared to Voyager, MetaMorph takes a big stride towards multi-body control. 演讲者设想MetaMorph 2.0将能够推广到机器人手,人形机器人,狗,无人机,甚至更远。与旅行者号相比,MetaMorph向多体控制迈出了一大步。

uncanny:奇怪的;神秘的;怪异的

And this car racing scene is where simulation has crossed the uncanny valley. 这个赛车场景是模拟穿越鬼谷的地方。

hardware accelerated ray tracing:硬件加速光线追踪

render extremely complex scenes: 渲染极其复杂的场景

photorealism: 美 [ˌfoʊdoʊˈri(ə)lɪzəm] 摄影写实主义;照相现实主义;超级现实主义

Thanks to hardware accelerated ray tracing, we’re able to render extremely complex scenes with breathtaking levels of details. And this photorealism you see here will help us train computer vision models that will become the eyes of every AI agent. 由于硬件加速光线跟踪,我们能够以惊人的细节水平渲染极其复杂的场景。你在这里看到的照片真实感将帮助我们训练计算机视觉模型,这些模型将成为每个AI智能体的眼睛。

be it xxx, or xxx:无论xxx还是xxx

All language tasks can be expressed as text in and text out. Be it writing poetry, translating English to Spanish, or coding Python, it’s all the same. 所有的语言任务都可以表示为文本输入和文本输出。无论是写诗、将英语翻译成西班牙语,还是编写Python代码,都是一样的。

Transcript

In spring of 2016,

I was sitting in a classroom
at Columbia University

but wasn’t paying attention
to the lecture.

Instead, I was watching a board game
tournament on my laptop.

And it wasn’t just any tournament,
but a very, very special one.

The match was between AlphaGo
and Lee Sedol.

The AI had just won three
out of five games

and became the first ever to beat
a human champion at a game of Go.

I still remember the adrenaline
of seeing history unfold that day.

The [glorious] moment when AI agents
finally entered the mainstream.

But when the excitement fades,

I realized that as mighty as AlphaGo was,

it could only do one thing
and one thing alone.

It isn’t able to play any other games,
like Super Mario or Minecraft,

and it certainly cannot do dirty laundry

or cook a nice dinner for you tonight.

But what we truly want
are AI agents as versatile as Wall-E,

as diverse as all the robot body forms

or embodiments in Star Wars

and works across infinite realities,

virtual or physical,
as in Ready Player One.

在这里插入图片描述

So how can we achieve
these science fictions

in possibly the near future?

This is a practitioner’s guide
towards generally capable AI agents.

Most of the ongoing research efforts
can be laid out nicely across three axes:

the number of skills an agent can do;

the body forms or embodiments
it can control;

and the realities it can master.

AlphaGo is somewhere here,

but the upper right corner
is where we need to go.

So let’s take it one axis at a time.

Earlier this year,
I led the Voyager project,

which is an agent that scales up massively
on a number of skills.

And there’s no game better than Minecraft

for the infinite creative
things it supports.

And here’s a fun fact for all of you.

Minecraft has 140 million active players.

And just to put that number
in perspective,

it’s more than twice
the population of the UK.

And Minecraft is so insanely popular
because it’s open-ended:

it does not have a fixed storyline
for you to follow,

and you can do whatever
your heart desires in the game.

And when we set Voyager free in Minecraft,

we see that it’s able to play
the game for hours on end

without any human intervention.

The video here shows snippets

from a single episode of Voyager
where it just keeps going.

It can explore the terrains,

mine all kinds of materials,
fight monsters,

craft hundreds of recipes

and unlock an ever-expanding
tree of skills.

So what’s the magic?

The core insight is coding as action.

First, we convert the 3D world
into a textual representation

using a Minecraft JavaScript API
made by the enthusiastic community.

Voyager invokes GPT4 to write
code snippets in JavaScript

that become executable skills in the game.

Yet, just like human engineers,
Voyager makes mistakes.

It isn’t always able to get a program
correct on the first try.

So we add a self-reflection
mechanism for it to improve.

There are three sources of feedback
for the self-reflection:

the JavaScript code execution error;

the agent state, like health and hunger;

and a world state, like terrains
and enemies nearby.

So Voyager takes an action,

observes the consequences of its action
on the world and on itself,

reflects on how it can possibly do better,

[tries] out some new action plans
and rinse and repeat.

And once the skill becomes mature,

Voyager saves it to a skill library
as a persistent memory.

You can think of the skill library
as a code repository

written entirely by a language model.

And in this way,

Voyager is able to bootstrap
its own capabilities recursively

as it explores
and experiments in Minecraft.

So let’s work through an example together.

Voyager finds itself hungry

and needs to get food as soon as possible.

It senses four entities nearby:

a cat, a villager, a pig
and some wheat seeds.

Voyager starts an inner monologue.

"Do I kill the cat or villager for food?

Horrible idea.

How about a wheat seed?

I can grow a farm out of the seeds,

but that’s going to take a long time.

So sorry, piggy, you are the chosen one."

(Laughter)

And Voyager finds a piece
of iron in its inventory.

So it recalls an old skill
from the library to craft an iron sword

and starts to learn
a new skill called “hunt pig.”

And now we also know that, unfortunately,
Voyager isn’t vegetarian.

(Laughter)

One question still remains:

how does Voyager keep
exploring indefinitely?

We only give it a high-level directive,

that is, to obtain as many
unique items as possible.

And Voyager implements a curriculum
to find progressively harder

and more novel challenges
to solve all by itself.

And putting all of these together,

Voyager is able to not only master

but also discover new skills
along the way.

在这里插入图片描述

And we did not pre-program any of this.

It’s all Voyager’s idea.

And this, what you see here,
is what we call lifelong learning.

When an agent is forever curious
and forever pursuing new adventures.

Compared to AlphaGo,

Voyager scales up massively
on a number of things he can do,

but still controls only one
body in Minecraft.

So the question is:
can we have an algorithm

that works across many different bodies?

Enter MetaMorph.

It is an initiative
I co-developed at Stanford.

We created a foundation model
that can control not just one

but thousands of robots

with very different
arm and leg configurations.

Metamorph is able to handle extremely
varied kinematic characteristics

from different robot bodies.

And this is the intuition
on how we create a MetaMorph.

First, we design a special vocabulary

to describe the body parts

so that every robot body
is basically a sentence

written in the language
of this vocabulary.

And then we just apply
a transformer to it,

much like ChatGPT,

but instead of writing out text,
MetaMorph writes out motor controls.

We show that MetaMorph is able to control
thousands of robots to go upstairs,

cross difficult terrains
and avoid obstacles.

Extrapolating into the future,

if we can greatly expand
this robot vocabulary,

I envision MetaMorph 2.0 will be able
to generalize to robot hands, humanoids,

dogs, drones and even beyond.

Compared to Voyager,

MetaMorph takes a big stride
towards multi-body control.

在这里插入图片描述

And now, let’s take everything
one level further

and transfer the skills
and embodiments across realities.

Enter IsaacSim,
Nvidia’s simulation effort.

The biggest strength of IsaacSim
is to accelerate physics simulation

to 1,000x faster than real time.

For example,

this character here learns
some impressive martial arts

by going through ten years
of intense training

in only three days of simulation time.

So it’s very much like the virtual
sparring dojo in the movie “Matrix.”

And this car racing scene

is where simulation has crossed
the uncanny valley.

Thanks to hardware
accelerated ray tracing,

we’re able to render
extremely complex scenes

with breathtaking levels of details.

And this photorealism you see here
will help us train computer vision models

that will become the eyes
of every AI agent.

And what’s more, IsaacSim
can procedurally generate worlds

with infinite variations
so that no two look the same.

So here’s an interesting idea.

If an agent is able to master
10,000 simulations,

then it may very well just generalize
to our real physical world,

which is simply the 10,001st reality.

And let that sink in.

As we progress through this map,

we will eventually get
to the upper right corner,

which is a single agent that generalizes
across all three axes,

and that is the “Foundation Agent.”

在这里插入图片描述

I believe training Foundation Agent
will be very similar to ChatGPT.

All language tasks can be expressed
as text in and text out.

Be it writing poetry,

translating English to Spanish
or coding Python,

it’s all the same.

And ChatGPT simply scales this up
massively across lots and lots of data.

It’s the same principle.

The Foundation Agent takes as input
an embodiment prompt and a task prompt

and output actions,

and we train it by simply
scaling it up massively

across lots and lots of realities.

I believe in a future where everything
that moves will eventually be autonomous.

And one day we will realize
that all the AI agents,

across Wall-E, Star Wars,
Ready Player One,

no matter if they are
in the physical or virtual spaces,

will all just be different prompts
to the same Foundation Agent.

And that, my friends,

will be the next grand challenge
in our quest for AI.

(Applause)

Summary

In spring of 2016, the speaker was sitting in a classroom at Columbia University but wasn’t paying attention to the lecture. Instead, he was watching a board game tournament on his laptop, a very special one between AlphaGo and Lee Sedol. AlphaGo had just made history by beating a human champion at the game of Go, winning three out of five games. The adrenaline of witnessing this historic moment marked the entry of AI agents into the mainstream.

After the excitement faded, the speaker realized that as mighty as AlphaGo was, it could only play Go and nothing else. The vision for AI agents that we truly want is to be as versatile as Wall-E, capable of diverse actions across infinite realities. To achieve this, research efforts are focused on three axes: the number of skills an agent can perform, the body forms or embodiments it can control, and the realities it can master.

Taking one axis at a time, progress is being made. The Voyager project, which he led, demonstrated an agent’s ability to scale up massively in the number of skills it can perform, using Minecraft as a platform for its diverse actions. Voyager’s core insight is “coding as action,” where it converts the 3D world into a textual representation, uses GPT-4 to write executable skills in JavaScript, and employs self-reflection mechanisms for improvement.

MetaMorph, another initiative, aims to control thousands of robots with varied configurations. By designing a special vocabulary to describe robot body parts and applying a transformer model to generate motor controls, MetaMorph demonstrates the potential for multi-body control. IsaacSim, Nvidia’s simulation effort, accelerates physics simulations to enable rapid skill acquisition in virtual environments, bridging the gap between virtual and physical realities.

The ultimate goal is to develop a Foundation Agent that can generalize across all three axes, mastering diverse skills, controlling various bodies, and understanding multiple realities. This agent, trained on massive amounts of data and across numerous realities, represents the next grand challenge in the quest for AI.

后记

2024年4月11日20点19分于上海。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1586603.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

MySQL事务、主从、分库分表常见面试题

文章目录 1.事务的特性2.并发事务问题,如何解决,默认隔离级别3.undo log和redo log的区别4.事务中的隔离性是如何保证的(解释一下MVCC)5.主从同步原理6.分库分表 1.事务的特性 2.并发事务问题,如何解决,默认…

2022年蓝桥杯省赛软件类C/C++B组----积木画

想借着这一个题回顾一下动态规划问题的基本解法,让解题方法清晰有条理,希望更多的人可以更轻松的理解动态规划! 目录 【题目】 【本题解题思路】 【DP模版】 总体方针: 具体解题时的套路: 【题目】 【本题解题思…

压缩列表ziplist

目录 1压缩列表的结构 2.压缩列表节点的构成 previous_entry_length encodeing conent 3.压缩列表API 1.创建列表 ziplistNew 2.正向遍历 ziplistNext 3.反向遍历 ziplistPrev 4.插入元素 ziplistInsert 5.级联更新 __ziplistCascadeUpdate 6.删除节点 7.查找entr…

L2-2 巴音布鲁克永远的土(二分+并查集)

思路:我们可以二分答案,然后判断当前答案合不合理。 对于判断答案合理,可以用并查集,看mid能否把所有检查点连进一个集合中,枚举每个结点,如何当前结点周围的四个方向可以连的话,就加进同一个集…

host文件nginx代理 综合理解

之前一直理解得很糙 导致遇到问题很迷糊 今天经过大佬两句话讲明白了 特此记录 host文件 首先host做了代理 也就是对浏览器的DNS寻址做了拦截 具体原理可以参照当我们在地址栏输入URL的时候浏览器发生了什么 例如127.0.0.1 www.baidu.com 将 127.0.0.1 www.baidu.com 链接自…

DHCP抓包分析

DHCP动态路由配置协议,是C/S架构,由DHCP服务器为客户端动态分配IP信息。 DHCP客户端首次接入网络数据交互过程: 如何解决IP地址的冲突: ▫ DHCP服务器端:收到DHCP DISCOVER报文时,给客户端分配IP地址前会发…

有关NVIDIA 4090的pytorch版本安装心得

目录 1.查看版本2.安装对应版本的pytorch3.检测是否安装成功4.问题 1.查看版本 输入 nvidia-smi 查看GPU支持的cuda最高版本 如上图所示,你的GPU支持的最高CUDA版本便为12.2,所以你要安装CUDA版本不能高于该版本。安装版本不能太低,建议就小…

高阶流程图(SIPOC)

SIPOC高阶流程图是一种流程映射和改进方法,它使用可视化的方式描述一个或多个流程的输入和输出。SIPOC是五个单词的首字母缩写,分别代表供应商(Suppliers)、输入(Inputs)、过程(Processes&#…

python使用概率算法模拟寻找伴侣

对于模拟寻找伴侣的问题,人在特定时期内能够遇见的异性是有限的,假设在某段时期内能够遇见的异性人数是n,那么问题在于在不能全面对n位候选人进行评估的情况下,以最大概率选中最好的那位。 假设n位候选人的水平用n个不同的数值表示,数值越大表示其水平则越高,问题在于,…

深入剖析Tomcat(一) 实现一个简单的Web容器

前言 在Java生态中,Tomcat的应用可谓经久不衰,众多Java Web应用都依赖它来进行部署。 虽然我们经常使用它,但是如果不深入了解的话,它对我们来说就一直是一个黑盒。就单纯的作为一个使用者来说,肯定也知道它内部应用…

Win10下安装Anaconda

Anaconda是可以便捷获取包且对包能够进行管理,同时对环境可以统一管理的发行版本,它包含了conda、Python在内的超过180个科学包及其依赖项。 安装Anaconda Anaconda官方下载网址:https://www.anaconda.com/download 官网页面中&#xff0c…

探究 ChatGPT 的心脏--Transformer(基础知识第一篇)

Transformer 是 ChatGPT 的核心部分,如果将 AI 看做一辆高速运转的汽车,那么 Transformer 就是最重要的引擎。它是谷歌于 2017 年发表的《Attention is All You Need》中提出的 Sequence-to-sequence 的模型,诞生之后便一统江湖,在…

【考研数学】《660》+《880》高分搭配方法

📝《660题》和《880题》高效刷题方法 1️⃣做题要有针对性,不要为了做题而做题 💪660和880题虽然多,但是你不用全都做完,你可以把它当成是题源,里面的每一道题都很经典,如果搞懂一道&#xff…

51蓝桥杯之DS18B20

DS18B20 基础知识 代码流程实现 将官方提供例程文件添加到工程中 添加onewire.c文件到keil4里面 一些代码补充知识 代码 #include "reg52.h" #include "onewire.h" #include "absacc.h" unsigned char num[10]{0xc0,0xf9,0xa4,0xb0,0x99,…

最近一些前端面试问题整理

最近一些前端面试问题整理 4月8号1. TS 中的 类型别名 和接口的区别是什么?2. 什么是深拷贝和浅拷贝?深浅拷贝的方法有哪些?浅拷贝(Shallow Copy)深拷贝(Deep Copy)区别总结 3. 使用 JSON.strin…

如何在淘~宝接单和解决别人问题-java开发

如下这是一个连接:https://s.tb.cn/c.0vDtL3https://s.tb.cn/c.0vDtL3 解决各种问题。可付费咨询

炒股自动化:交易接口API才是重点,券商官方散户可用的接口

上一篇我们用get_full_tick取到了数据,也讲了变量和字典的基本概念,这次我们向交易所发送订单试试。前面文章的链接放在文末了,需要的可以看一下 这些内容是给新手看的,找接口的大佬们直接拉到文末即可 取市场数据的方法很多&am…

查询优化-ANY类型子连接提升

瀚高数据库 目录 文档用途 详细信息 文档用途 已知查询树基本结构的基础上,在一定条件下对SQL中的子连接提升, 详细信息 子连接提升流程 SQL: SELECT sname FROM student WHERE sno > ANY (SELECT sno FROM score);图1.1是该SQL对应的查询树&#…

代码随想录算法训练营第三十一天| 455.分发饼干、376.摆动序列、53.最大子序和

系列文章目录 目录 系列文章目录455.分发饼干贪心算法大饼干喂胃口大的(先遍历胃口)胃口大的先吃大饼干(先遍历饼干)小饼干先喂胃口小的(先遍历胃口)胃口小的先吃小饼干(先遍历饼干) 376. 摆动序…