【Reinforcement Learning Paper Collection】AAAI-2022 Reinforcement Learning Papers (with Paper Links)

Reinforcement learning (RL) is a paradigm and methodology of machine learning used to describe and solve problems in which an agent learns a policy through interaction with its environment in order to maximize cumulative reward or achieve a specific goal.
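
To make this agent-environment interaction loop concrete, here is a minimal, self-contained sketch of tabular Q-learning on a made-up 5-state chain MDP. The environment, reward values, and hyperparameters are illustrative assumptions only and are not drawn from any of the papers listed below.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a 5-state chain MDP. The agent starts at state 0,
# can move left (0) or right (1), and receives reward 1 only upon reaching state 4.
N_STATES = 5
ACTIONS = [0, 1]  # 0 = left, 1 = right

def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

q = defaultdict(float)                 # Q-table: (state, action) -> estimated return
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

def act(state):
    """Epsilon-greedy policy with random tie-breaking."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

for episode in range(500):
    state, done = 0, False
    while not done:
        action = act(state)
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge Q(s, a) toward reward + gamma * max_a' Q(s', a')
        target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state

# After training, the greedy policy should choose "right" (1) in every non-terminal state.
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)})
```

Many of the papers below build on exactly this loop, replacing the Q-table with deep networks and adding constraints, multi-agent coordination, or offline data.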
This column collects papers on reinforcement learning (RL) published in recent years at top international conferences, including but not limited to ICML, AAAI, IJCAI, NIPS, ICLR, AAMAS, CVPR, and ICRA.

Today we share the papers on the topic of reinforcement learning from the 2022 AAAI Conference on Artificial Intelligence (AAAI). AAAI aims to promote research in, and the responsible use of, artificial intelligence; it also seeks to increase public understanding of AI, improve the teaching and training of AI practitioners, and provide guidance to research planners and funders on the importance, potential, and future directions of current AI developments.

  • [1]. Backprop-Free Reinforcement Learning with Active Neural Generative Coding.
  • [2]. Multi-Scale Dynamic Coding Improved Spiking Actor Network for Reinforcement Learning.
  • [3]. CADRE: A Cascade Deep Reinforcement Learning Framework for Vision-Based Autonomous Urban Driving.
  • [4]. Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach.
  • [5]. OAM: An Option-Action Reinforcement Learning Framework for Universal Multi-Intersection Control.
  • [6]. EMVLight: A Decentralized Reinforcement Learning Framework for Efficient Passage of Emergency Vehicles.
  • [7]. DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning.
  • [8]. AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning.
  • [9]. Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning.
  • [10]. Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint.
  • [11]. Enforcement Heuristics for Argumentation with Deep Reinforcement Learning.
  • [12]. Programmatic Modeling and Generation of Real-Time Strategic Soccer Environments for Reinforcement Learning.
  • [13]. Learning by Competition of Self-Interested Reinforcement Learning Agents.
  • [14]. Reinforcement Learning with Stochastic Reward Machines.
  • [15]. Reinforcement Learning Based Dynamic Model Combination for Time Series Forecasting.
  • [16]. Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods.
  • [17]. Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks.
  • [18]. Wasserstein Unsupervised Reinforcement Learning.
  • [19]. Reinforcement Learning of Causal Variables Using Mediation Analysis.
  • [20]. Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes.
  • [21]. Creativity of AI: Automatic Symbolic Option Discovery for Facilitating Deep Reinforcement Learning.
  • [22]. Same State, Different Task: Continual Reinforcement Learning without Interference.
  • [23]. Introducing Symmetries to Black Box Meta Reinforcement Learning.
  • [24]. Deep Reinforcement Learning Policies Learn Shared Adversarial Features across MDPs.
  • [25]. Conjugated Discrete Distributions for Distributional Reinforcement Learning.
  • [26]. Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning.
  • [27]. Fast and Data Efficient Reinforcement Learning from Pixels via Non-parametric Value Approximation.
  • [28]. Recursive Reasoning Graph for Multi-Agent Reinforcement Learning.
  • [29]. Exploring Safer Behaviors for Deep Reinforcement Learning.
  • [30]. Constraint Sampling Reinforcement Learning: Incorporating Expertise for Faster Learning.
  • [31]. Unsupervised Reinforcement Learning in Multiple Environments.
  • [32]. Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation.
  • [33]. Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning.
  • [34]. Offline Reinforcement Learning as Anti-exploration.
  • [35]. Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability.
  • [36]. Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic.
  • [37]. Controlling Underestimation Bias in Reinforcement Learning via Quasi-median Operation.
  • [38]. Structure Learning-Based Task Decomposition for Reinforcement Learning in Non-stationary Environments.
  • [39]. Generalizing Reinforcement Learning through Fusing Self-Supervised Learning into Intrinsic Motivation.
  • [40]. Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits.
  • [41]. Constraints Penalized Q-learning for Safe Offline Reinforcement Learning.
  • [42]. Q-Ball: Modeling Basketball Games Using Deep Reinforcement Learning.
  • [43]. Natural Black-Box Adversarial Examples against Deep Reinforcement Learning.
  • [44]. SimSR: Simple Distance-Based State Representations for Deep Reinforcement Learning.
  • [45]. State Deviation Correction for Offline Reinforcement Learning.
  • [46]. Multi-Agent Reinforcement Learning with General Utilities via Decentralized Shadow Reward Actor-Critic.
  • [47]. A Multi-Agent Reinforcement Learning Approach for Efficient Client Selection in Federated Learning.
  • [48]. Batch Active Learning with Graph Neural Networks via Multi-Agent Deep Reinforcement Learning.
  • [49]. Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms.
  • [50]. Invariant Action Effect Model for Reinforcement Learning.
  • [51]. Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning.
  • [52]. Concentration Network for Reinforcement Learning of Large-Scale Multi-Agent Systems.
  • [53]. A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning.
  • [54]. Goal Recognition as Reinforcement Learning.
  • [55]. NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming.
  • [56]. MAPDP: Cooperative Multi-Agent Reinforcement Learning to Solve Pickup and Delivery Problems.
  • [57]. Eye of the Beholder: Improved Relation Generalization for Text-Based Reinforcement Learning Agents.
  • [58]. Text-Based Interactive Recommendation via Offline Reinforcement Learning.
  • [59]. Multi-Agent Reinforcement Learning Controller to Maximize Energy Efficiency for Multi-Generator Industrial Wave Energy Converter.
  • [60]. Bayesian Model-Based Offline Reinforcement Learning for Product Allocation.
  • [61]. Reinforcement Learning for Datacenter Congestion Control.
  • [62]. Creating Interactive Crowds with Reinforcement Learning.
  • [63]. Using Graph-Aware Reinforcement Learning to Identify Winning Strategies in Diplomacy Games (Student Abstract).
  • [64]. Reinforcement Learning Explainability via Model Transforms (Student Abstract).
  • [65]. Using Reinforcement Learning for Operating Educational Campuses Safely during a Pandemic (Student Abstract).
  • [66]. Criticality-Based Advice in Reinforcement Learning (Student Abstract).
  • [67]. VeNAS: Versatile Negotiating Agent Strategy via Deep Reinforcement Learning (Student Abstract).
