[Reinforcement Learning Paper Collection] NeurIPS 2021 Reinforcement Learning Papers

news2025/1/19 2:31:36

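Reinforcement learning (RL), also rendered in Chinese as 再励学习, 评价学习, or 增强学习, is one of the paradigms and methodologies of machine learning. It describes and solves problems in which an agent, while interacting with an environment, learns a policy that maximizes return or achieves a specific goal.
This column collects reinforcement learning papers from top international conferences of recent years, including but not limited to ICML, AAAI, IJCAI, NIPS, ICLR, AAMAS, CVPR, and ICRA.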


Today's post covers the papers on the topic of reinforcement learning from the 2021 Conference and Workshop on Neural Information Processing Systems.

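NIPS (NeurIPS), in full the Conference and Workshop on Neural Information Processing Systems, is an international conference on machine learning and computational neuroscience. It is held every December and is organized by the NIPS Foundation. NIPS is a top conference in machine learning; in the China Computer Federation's ranking of international academic conferences, it is a Class A conference in artificial intelligence.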

  • [1]. Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning.
  • [2]. Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization.
  • [3]. Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee.
  • [4]. Risk-Averse Bayes-Adaptive Reinforcement Learning.
  • [5]. Offline Reinforcement Learning as One Big Sequence Modeling Problem.
  • [6]. Distributional Reinforcement Learning for Multi-Dimensional Reward Functions.
  • [7]. A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning.
  • [8]. Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation.
  • [9]. There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning.
  • [10]. Reinforcement Learning in Reward-Mixing MDPs.
  • [11]. Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning.
  • [12]. On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning.
  • [13]. Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks.
  • [14]. On the Theory of Reinforcement Learning with Once-per-Episode Feedback.
  • [15]. On Effective Scheduling of Model-based Reinforcement Learning.
  • [16]. Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization.
  • [17]. Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration.
  • [18]. Information Directed Reward Learning for Reinforcement Learning.
  • [19]. Celebrating Diversity in Shared Multi-Agent Reinforcement Learning.
  • [20]. Towards Instance-Optimal Offline Reinforcement Learning with Pessimism.
  • [21]. Environment Generation for Zero-Shot Compositional Reinforcement Learning.
  • [22]. Offline Meta Reinforcement Learning - Identifiability Challenges and Effective Data Collection Strategies.
  • [23]. PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning.
  • [24]. Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation.
  • [25]. Automatic Data Augmentation for Generalization in Reinforcement Learning.
  • [26]. RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem.
  • [27]. Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning.
  • [28]. Bellman-consistent Pessimism for Offline Reinforcement Learning.
  • [29]. Teachable Reinforcement Learning via Advice Distillation.
  • [30]. Inverse Reinforcement Learning in a Continuous State Space with Formal Guarantees.
  • [31]. Online Robust Reinforcement Learning with Model Uncertainty.
  • [32]. Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble.
  • [33]. A Provably Efficient Sample Collection Strategy for Reinforcement Learning.
  • [34]. Near-Optimal Offline Reinforcement Learning via Double Variance Reduction.
  • [35]. Multi-Agent Reinforcement Learning in Stochastic Networked Systems.
  • [36]. When Is Generalizable Reinforcement Learning Tractable?
  • [37]. Learning Markov State Abstractions for Deep Reinforcement Learning.
  • [38]. Towards Deeper Deep Reinforcement Learning with Spectral Normalization.
  • [39]. Adversarial Intrinsic Motivation for Reinforcement Learning.
  • [40]. Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning.
  • [41]. TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning.
  • [42]. Model-Based Reinforcement Learning via Imagination with Derived Memory.
  • [43]. Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning.
  • [44]. Compositional Reinforcement Learning from Logical Specifications.
  • [45]. Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning.
  • [46]. Local Differential Privacy for Regret Minimization in Reinforcement Learning.
  • [47]. Continuous Doubly Constrained Batch Reinforcement Learning.
  • [48]. Conservative Data Sharing for Multi-Task Offline Reinforcement Learning.
  • [49]. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism.
  • [50]. A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning.
  • [51]. Optimization-Based Algebraic Multigrid Coarsening Using Reinforcement Learning.
  • [52]. EDGE: Explaining Deep Reinforcement Learning Policies.
  • [53]. Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning.
  • [54]. Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning.
  • [55]. Pretraining Representations for Data-Efficient Reinforcement Learning.
  • [56]. Tactical Optimism and Pessimism for Deep Reinforcement Learning.
  • [57]. Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning.
  • [58]. Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings.
  • [59]. Outcome-Driven Reinforcement Learning via Variational Inference.
  • [60]. Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning.
  • [61]. Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints.
  • [62]. Heuristic-Guided Reinforcement Learning.
  • [63]. Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning.
  • [64]. Safe Reinforcement Learning with Natural Language Constraints.
  • [65]. Safe Reinforcement Learning by Imagining the Near Future.
  • [66]. Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation.
  • [67]. MAP Propagation Algorithm: Faster Learning with a Team of Reinforcement Learning Agents.
  • [68]. PettingZoo: Gym for Multi-Agent Reinforcement Learning.
  • [69]. Decision Transformer: Reinforcement Learning via Sequence Modeling.
  • [70]. Nearly Horizon-Free Offline Reinforcement Learning.
  • [71]. Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes.
  • [72]. Contrastive Reinforcement Learning of Symbolic Reasoning Domains.
  • [73]. Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection.
  • [74]. Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting.
  • [75]. Scalable Online Planning via Reinforcement Learning Fine-Tuning.
  • [76]. An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning.
  • [77]. Risk-Aware Transfer in Reinforcement Learning using Successor Features.
  • [78]. Regret Minimization Experience Replay in Off-Policy Reinforcement Learning.
  • [79]. Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning.
  • [80]. A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning.
  • [81]. Autonomous Reinforcement Learning via Subgoal Curricula.
  • [82]. PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators.
  • [83]. Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning.
  • [84]. Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations.
  • [85]. Functional Regularization for Reinforcement Learning via Learned Fourier Features.
  • [86]. Agent Modelling under Partial Observability for Deep Reinforcement Learning.
  • [87]. Conservative Offline Distributional Reinforcement Learning.
  • [88]. Learning Tree Interpretation from Object Representation for Deep Reinforcement Learning.
  • [89]. Explicable Reward Design for Reinforcement Learning Agents.
  • [90]. A Minimalist Approach to Offline Reinforcement Learning.
  • [91]. BCORLE(λ): An Offline Reinforcement Learning and Evaluation Framework for Coupons Allocation in E-commerce Market.
  • [92]. Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning.
  • [93]. Reinforcement Learning based Disease Progression Model for Alzheimer’s Disease.
  • [94]. Accelerating Quadratic Optimization with Reinforcement Learning.
  • [95]. Provably Efficient Causal Reinforcement Learning with Confounded Observational Data.
  • [96]. Hierarchical Reinforcement Learning with Timed Subgoals.
  • [97]. Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives.
  • [98]. Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation.
  • [99]. Reinforcement Learning in Newcomblike Environments.
  • [100]. Reinforcement Learning with Latent Flow.
  • [101]. Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs.
  • [102]. Reinforcement Learning Enhanced Explainer for Graph Neural Networks.
  • [103]. The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning.
  • [104]. Causal Influence Detection for Improving Efficiency in Reinforcement Learning.
  • [105]. Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model.
  • [106]. RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents.
  • [107]. The Difficulty of Passive Learning in Deep Reinforcement Learning.
  • [108]. A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems.
  • [109]. Symbolic Regression via Deep Reinforcement Learning Enhanced Genetic Programming Seeding.
  • [110]. Machine versus Human Attention in Deep Reinforcement Learning Tasks.
  • [111]. Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration.
  • [112]. Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations.
  • [113]. A Max-Min Entropy Framework for Reinforcement Learning.
  • [114]. Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch.
  • [115]. Robust Deep Reinforcement Learning through Adversarial Loss.
  • [116]. Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature.
  • [117]. Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings.
  • [118]. Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning.
  • [119]. Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning.
  • [120]. Online and Offline Reinforcement Learning by Planning with a Learned Model.
  • [121]. Variational Bayesian Reinforcement Learning with Regret Bounds.
  • [122]. Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning.
  • [123]. Parametrized Quantum Policies for Reinforcement Learning.
  • [124]. On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations.
  • [125]. Continual World: A Robotic Benchmark For Continual Reinforcement Learning.
  • [126]. Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning.
  • [127]. Deep Reinforcement Learning at the Edge of the Statistical Precipice.
  • [128]. Offline Reinforcement Learning with Reverse Model-based Imagination.
  • [129]. Program Synthesis Guided Reinforcement Learning for Partially Observed Environments.
  • [130]. Structural Credit Assignment in Neural Networks using Reinforcement Learning.
