2023 BAAI Conference Agenda Released | Vision and Multimodal Large Models Forum

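On June 9, the 2023 BAAI Conference in Beijing will bring together explorers and practitioners in AI, along with everyone who cares about the science of intelligence, to raise the curtain on the stage of the future. Are you ready? Well-known speakers include Turing Award winner Yann LeCun, OpenAI founder Sam Altman, Turing Award winner Geoffrey Hinton, Turing Award winner Joseph Sifakis, Nobel laureate Arieh Warshel, Future of Life Institute founder Max Tegmark, 2021 Breakthrough Prize winner David Baker, 2022 Wu Wenjun Highest Achievement Award winner Academician Zheng Nanning (郑南宁), and Chinese Academy of Sciences Academician Zhang Bo (张钹). Online registration for the conference is now open. The conference will also be livestreamed worldwide.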

Countdown to the BAAI Conference: 5 days

Vision and Multimodal Large Models | Afternoon of June 9

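In recent years, large language models and large multimodal models have emerged one after another, opening a broad new stage for researchers and profoundly affecting human society. In 2023, a series of "large vision models" represented by SAM and SegGPT were released in quick succession, and follow-up work built on these models has grown explosively. Large vision models can be expected to remain an unavoidable topic in computer vision for some time to come. This forum has invited distinguished scholars from well-known universities, companies, and research institutions, including NVIDIA, Nanyang Technological University, Beijing Jiaotong University, BAAI, and Moore Threads. They will discuss the theory, technology, and applications of large vision models from the perspectives of 3D vision, AIGC, generative models, and general-purpose large vision models, with the aim of spreading knowledge, sharing views, building an ecosystem around large vision models, and contributing to the development of the field.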

Forum Agenda

Forum Chair

Yan Shuicheng (颜水成), Visiting Chief Scientist, BAAI

Prof. Yan is currently Visiting Chief Scientist at the Beijing Academy of Artificial Intelligence (BAAI, a non-profit organization) and the former Group Chief Scientist of Sea Group.

Prof. Yan Shuicheng is a Fellow of Singapore's Academy of Engineering, AAAI, ACM, IEEE, and IAPR. His research areas include computer vision, machine learning, and multimedia analysis. To date, Prof. Yan has published over 600 papers in top international journals and conferences, with an H-index of 130+. He has also been named among the annual World's Highly Cited Researchers eight times.

Prof. Yan's team has won first place or an honorable mention ten times at two core competitions, Pascal VOC and ImageNet (ILSVRC), regarded as the “World Cup” of the computer vision community. In addition, his team has won more than ten best paper and best student paper awards, including a grand slam at ACM Multimedia, the top-tier conference in multimedia: the Best Paper Award three times, the Best Student Paper Award twice, and the Best Demo Award once.

Moderator

Wei Yunchao (魏云超), Professor and Doctoral Supervisor, Beijing Jiaotong University

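He previously conducted research at the National University of Singapore, the University of Illinois Urbana-Champaign, and the University of Technology Sydney. He has been named to MIT TR35 China, Baidu's list of global high-potential young Chinese scholars, and The Australian's Top 40 Rising Stars, and he leads a Young Scientist project under the National Key R&D Program. His awards include the First Prize of the Ministry of Education Natural Science Award for Higher Education Institutions, the First Prize of the China Society of Image and Graphics Science and Technology Award, first place in ImageNet object detection (the "World Cup" of computer vision), and first place in several CVPR competitions. He has published more than 100 papers in top journals and conferences such as TPAMI and CVPR, with more than 15,000 citations on Google Scholar. His main research interests include visual perception with imperfect data and multimodal data analysis.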

Talk Topics and Speaker Introductions

1. Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Abstract: Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality. In this talk, we will introduce a powerful yet much less explored way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner. To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative GAN features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc. As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even for challenging scenarios such as hallucinating occluded content and deforming shapes that consistently follow the object's rigidity.

Pan Xingang (潘新钢), Assistant Professor, School of Computer Science and Engineering, Nanyang Technological University

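He is affiliated with MMLab-NTU and S-Lab. His research focuses on generative AI and neural rendering, and his main works include DragGAN, Deep Generative Prior, and GAN2Shape. Before joining Nanyang Technological University, he was a postdoctoral researcher in Prof. Christian Theobalt's group at the Max Planck Institute for Informatics. He received his Ph.D. from MMLab at The Chinese University of Hong Kong, where he was advised by Prof. Tang Xiaoou (汤晓鸥), and his bachelor's degree from Tsinghua University.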

2. Machine Learning for 3D Content Creation

Abstract: With the increasing demand for creating large-scale 3D virtual worlds in many industries, there is an immense need for diverse and high-quality 3D content. Machine learning is existentially enabling this quest. In this talk, I will discuss how looking from the perspective of combining differentiable iso-surfacing with differentiable rendering could enable 3D content creation at scale and make real-world impact. Towards this end, we first introduce a differentiable 3D representation based on a tetrahedral grid to enable high-quality recovery of 3D mesh with arbitrary topology. By incorporating differentiable rendering, we further design a generative model capable of producing 3D shapes with complex textures and materials for mesh generation. Our framework further paves the way for innovative high-quality 3D mesh creation from text prompts leveraging 2D diffusion models, which democratizes 3D content creation for novice users.

Gao Jun (高俊), Research Scientist, NVIDIA

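Gao Jun is a University of Toronto Ph.D. and a research scientist at NVIDIA. His research focuses on 3D computer vision and graphics, with an emphasis on applying machine learning to large-scale 3D content generation. His representative works include GET3D, Magic3D, and DefTet, many of which have been integrated into NVIDIA products, including NVIDIA Picasso, GANVerse3D, Neural DriveSim, and Toronto Annotation Suite. He will serve as an area chair for NeurIPS 2023.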

3. A Preliminary Exploration of General-Purpose Vision Models

Wang Xinlong (王鑫龙), Researcher, BAAI

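Wang Xinlong is a researcher at BAAI's Vision Model Research Center. He received his Ph.D. from the University of Adelaide, Australia, and his research areas are computer vision and foundation models. His recent work includes SOLO, SOLOv2, DenseCL, EVA, Painter, and SegGPT. His awards include the Google PhD Fellowship, the Chinese Government Award for Outstanding Self-Financed Students Abroad, and the University of Adelaide Doctoral Research Medal.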

4. Image, Video and 3D Content Creation with Diffusion Models

Abstract: Denoising diffusion-based generative models have led to multiple breakthroughs in deep generative learning. In this talk, we will provide an overview of recent works by NVIDIA on diffusion models and their applications for image, video, and 3D content creation. We will start with a short introduction to diffusion models and then discuss large-scale text-to-image generation. Next, we will highlight different efforts on 3D generative modeling. This includes both object-centric 3D synthesis as well as full scene-level generation. Finally, we will discuss our recent work on high-resolution video generation with video latent diffusion models. We turn the state-of-the-art text-to-image model Stable Diffusion into a high-resolution text-to-video generator and we also demonstrate the simulation of real in-the-wild driving scene videos.

Karsten Kreis, Research Scientist, NVIDIA

Karsten Kreis is a senior research scientist at NVIDIA’s Toronto AI Lab. Prior to joining NVIDIA, he worked on deep generative modeling at D-Wave Systems and co-founded Variational AI, a startup utilizing generative models for drug discovery. Before switching to deep learning, Karsten did his M.Sc. in quantum information theory at the Max Planck Institute for the Science of Light and his Ph.D. in computational and statistical physics at the Max Planck Institute for Polymer Research. Currently, Karsten’s research focuses on developing novel generative learning methods, primarily diffusion models, and on applying deep generative models to problems in areas such as computer vision, graphics and digital artistry, as well as in the natural sciences.

Ling Huan (凌欢), Research Scientist, NVIDIA

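Ling Huan is an AI scientist at NVIDIA's Toronto AI Lab, a University of Toronto Ph.D., and a member of the Vector Institute in Toronto. During his doctoral studies he was advised by Prof. Sanja Fidler, published more than ten papers at top conferences, and obtained several related patents. His research focuses on large-scale image and video generative models and on applications of generative models in computer vision. His representative works include PolyRNN++, DatasetGAN, EditGAN, and the recent Align Your Latents: VideoLDM.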

5. Roundtable Discussion

Roundtable panelists:

Wei Yunchao: Professor, Beijing Jiaotong University

Pan Xingang: Assistant Professor, School of Computer Science and Engineering, Nanyang Technological University

Gao Jun: Research Scientist, NVIDIA

Wang Xinlong: Researcher, BAAI

Xia Wei: Vice President of AI, Moore Threads

Xia Wei (夏威), Vice President of R&D, Moore Threads

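He holds a Ph.D. from the National University of Singapore and was a visiting researcher at Panasonic's research institute in Singapore and at Lund University in Europe. He has published more than 30 papers in international journals and conferences, holds more than 30 US patents, and has repeatedly placed first or second in the Pascal VOC and ImageNet challenges. In Silicon Valley he helped found the AI company Orbeus, which launched the Rekognition recognition platform and PhotoTime, the first smart photo album in the US market. After the company was acquired by Amazon, he served as Principal Scientist at AWS AI, where he was responsible for developing AWS AI cloud services such as Rekognition and Textract. While at AWS, he and his team pioneered the new research area of machine learning model compatibility.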
