[Reinforcement Learning Paper Collection] ICRA 2022 Reinforcement Learning Papers | 2022 Collection (Part 6)


Reinforcement learning (RL) is one of the paradigms and methodologies of machine learning. It describes and solves the problem of an agent learning a policy through interaction with its environment so as to maximize cumulative return or achieve a specific goal.
This column collects recent papers on reinforcement learning (RL) from top international conferences, including but not limited to ICML, AAAI, IJCAI, NIPS, ICLR, AAMAS, CVPR, and ICRA.
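Before diving into the paper list, here is a minimal sketch of the agent-environment loop that the definition above describes: tabular Q-learning on a toy 5-state chain. Everything in this snippet (the chain environment, the hyperparameters) is an illustrative assumption for this post, not taken from any paper listed below.

```python
import random

# Toy setup (illustrative assumption): a 5-state chain where the agent
# starts at state 0 and earns reward 1.0 for reaching the rightmost state.
N_STATES = 5
ACTIONS = (0, 1)                 # 0 = move left, 1 = move right
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1  # discount, learning rate, exploration rate

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular action-value estimates

def step(state, action):
    """Environment dynamics: move left/right, reward only at the goal."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy policy: explore occasionally, otherwise act greedily.
        if random.random() < EPS:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        nxt, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        target = reward + (0.0 if done else GAMMA * max(Q[nxt]))
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = nxt

print("Learned Q-table:", Q)
```

Real work would of course use a full RL library and richer environments; this sketch is only meant to make the "maximize expected return through interaction" idea concrete.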


Today we share the papers on the topic of reinforcement learning from the 2022 IEEE International Conference on Robotics and Automation (IEEE ICRA).

The IEEE Robotics and Automation Society hosts the IEEE International Conference on Robotics and Automation (IEEE ICRA) every year. ICRA is the top international conference in robotics, ranked first in both scale (over a thousand attendees) and influence, and it is the premier international forum where leading robotics researchers present their results.

  • [1]. Backprop-Free Reinforcement Learning with Active Neural Generative Coding.
  • [2]. Multi-Scale Dynamic Coding Improved Spiking Actor Network for Reinforcement Learning.
  • [3]. CADRE: A Cascade Deep Reinforcement Learning Framework for Vision-Based Autonomous Urban Driving.
  • [4]. Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach.
  • [5]. OAM: An Option-Action Reinforcement Learning Framework for Universal Multi-Intersection Control.
  • [6]. EMVLight: A Decentralized Reinforcement Learning Framework for Efficient Passage of Emergency Vehicles.
  • [7]. DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning.
  • [8]. AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning.
  • [9]. Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning.
  • [10]. Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint.
  • [11]. Enforcement Heuristics for Argumentation with Deep Reinforcement Learning.
  • [12]. Programmatic Modeling and Generation of Real-Time Strategic Soccer Environments for Reinforcement Learning.
  • [13]. Learning by Competition of Self-Interested Reinforcement Learning Agents.
  • [14]. Reinforcement Learning with Stochastic Reward Machines.
  • [15]. Reinforcement Learning Based Dynamic Model Combination for Time Series Forecasting.
  • [16]. Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods.
  • [17]. Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks.
  • [18]. Wasserstein Unsupervised Reinforcement Learning.
  • [19]. Reinforcement Learning of Causal Variables Using Mediation Analysis.
  • [20]. Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes.
  • [21]. Creativity of AI: Automatic Symbolic Option Discovery for Facilitating Deep Reinforcement Learning.
  • [22]. Same State, Different Task: Continual Reinforcement Learning without Interference.
  • [23]. Introducing Symmetries to Black Box Meta Reinforcement Learning.
  • [24]. Deep Reinforcement Learning Policies Learn Shared Adversarial Features across MDPs.
  • [25]. Conjugated Discrete Distributions for Distributional Reinforcement Learning.
  • [26]. Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning.
  • [27]. Fast and Data Efficient Reinforcement Learning from Pixels via Non-parametric Value Approximation.
  • [28]. Recursive Reasoning Graph for Multi-Agent Reinforcement Learning.
  • [29]. Exploring Safer Behaviors for Deep Reinforcement Learning.
  • [30]. Constraint Sampling Reinforcement Learning: Incorporating Expertise for Faster Learning.
  • [31]. Unsupervised Reinforcement Learning in Multiple Environments.
  • [32]. Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation.
  • [33]. Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning.
  • [34]. Offline Reinforcement Learning as Anti-exploration.
  • [35]. Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability.
  • [36]. Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic.
  • [37]. Controlling Underestimation Bias in Reinforcement Learning via Quasi-median Operation.
  • [38]. Structure Learning-Based Task Decomposition for Reinforcement Learning in Non-stationary Environments.
  • [39]. Generalizing Reinforcement Learning through Fusing Self-Supervised Learning into Intrinsic Motivation.
  • [40]. Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits.
  • [41]. Constraints Penalized Q-learning for Safe Offline Reinforcement Learning.
  • [42]. Q-Ball: Modeling Basketball Games Using Deep Reinforcement Learning.
  • [43]. Natural Black-Box Adversarial Examples against Deep Reinforcement Learning.
  • [44]. SimSR: Simple Distance-Based State Representations for Deep Reinforcement Learning.
  • [45]. State Deviation Correction for Offline Reinforcement Learning.
  • [46]. Multi-Agent Reinforcement Learning with General Utilities via Decentralized Shadow Reward Actor-Critic.
  • [47]. A Multi-Agent Reinforcement Learning Approach for Efficient Client Selection in Federated Learning.
  • [48]. Batch Active Learning with Graph Neural Networks via Multi-Agent Deep Reinforcement Learning.
  • [49]. Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms.
  • [50]. Invariant Action Effect Model for Reinforcement Learning.
  • [51]. Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning.
  • [52]. Concentration Network for Reinforcement Learning of Large-Scale Multi-Agent Systems.
  • [53]. A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning.
  • [54]. Goal Recognition as Reinforcement Learning.
  • [55]. NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming.
  • [56]. MAPDP: Cooperative Multi-Agent Reinforcement Learning to Solve Pickup and Delivery Problems.
  • [57]. Eye of the Beholder: Improved Relation Generalization for Text-Based Reinforcement Learning Agents.
  • [58]. Text-Based Interactive Recommendation via Offline Reinforcement Learning.
  • [59]. Multi-Agent Reinforcement Learning Controller to Maximize Energy Efficiency for Multi-Generator Industrial Wave Energy Converter.
  • [60]. Bayesian Model-Based Offline Reinforcement Learning for Product Allocation.
  • [61]. Reinforcement Learning for Datacenter Congestion Control.
  • [62]. Creating Interactive Crowds with Reinforcement Learning.
  • [63]. Using Graph-Aware Reinforcement Learning to Identify Winning Strategies in Diplomacy Games (Student Abstract).
  • [64]. Reinforcement Learning Explainability via Model Transforms (Student Abstract).
  • [65]. Using Reinforcement Learning for Operating Educational Campuses Safely during a Pandemic (Student Abstract).
  • [66]. Criticality-Based Advice in Reinforcement Learning (Student Abstract).
  • [67]. VeNAS: Versatile Negotiating Agent Strategy via Deep Reinforcement Learning (Student Abstract).
