Editor's Note:
In this series, we review the basic information of nine articles on OR/OM and related applications published in May 2024 in the leading operations research journal Management Science, to help readers quickly keep track of new developments in the field.
Recommended Article 1
- Title: Sensitivity Analysis of the Cost Coefficients in Multiobjective Integer Linear Optimization
- Journal: Management Science
- Link: https://doi.org/10.1287/mnsc.2021.01406
- Published: 2024/05/02
- Authors: Kim Allan Andersen, Trine Krogh Boomsma, Britta Efkes, Nicolas Forget
- Abstract:
- This paper considers sensitivity analysis of the cost coefficients in multiobjective integer linear programming problems. We define the sensitivity region as the set of simultaneous changes to the coefficients for which the efficient set and its structure remain the same. In particular, we require that the component-wise relation between efficient solutions is preserved and that inefficient solutions remain inefficient, and we show that this is sufficient for the efficient set to be the same upon changes. For a single coefficient, we show that a subset of the inefficient solutions can be excluded from consideration. More specifically, we prove that it suffices to inspect the inefficient solutions of a q-objective problem that are efficient in one of two related q + 1-objective problems. Finally, we show that the sensitivity region is a convex set (an interval). Our approach generalizes to simultaneous changes in multiple coefficients. For illustration, we consider mean-variance capital budgeting and determine the region for which the set of efficient portfolios remains the same, despite a misspecification or a change in the net present values of the projects. Further computational experience with multiobjective binary and integer knapsack problems demonstrates the general applicability of our technique. For instance, we obtain all sensitivity intervals for changes to single coefficients of biobjective problems with 500 binary variables in less than half an hour of CPU time by excluding a large number of inefficient solutions. In fact, the number of excluded solutions is above 100 orders of magnitude larger than the number of remaining solutions.
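The single-coefficient analysis can be mimicked by brute force on a tiny instance. The sketch below (a toy illustration with invented names such as `sensitivity_grid`; the authors' actual method avoids full enumeration) lists the feasible solutions of a small biobjective knapsack, computes the efficient set, and scans perturbations of one cost coefficient to see which leave the efficient set unchanged:

```python
from itertools import product

def efficient_set(profits, solutions):
    """Nondominated solutions under componentwise maximization of two objectives."""
    eff = set()
    for x in solutions:
        dominated = any(
            all(profits[y][k] >= profits[x][k] for k in range(2))
            and any(profits[y][k] > profits[x][k] for k in range(2))
            for y in solutions if y != x
        )
        if not dominated:
            eff.add(x)
    return eff

def sensitivity_grid(c1, c2, weights, capacity, i, deltas):
    """Return the perturbations delta to c1[i] (from a candidate grid) for which
    the efficient set of the biobjective knapsack stays exactly the same."""
    n = len(c1)
    feasible = [x for x in product((0, 1), repeat=n)
                if sum(w * xi for w, xi in zip(weights, x)) <= capacity]
    def eff_for(delta):
        c1d = list(c1)
        c1d[i] += delta
        profits = {x: (sum(c * xi for c, xi in zip(c1d, x)),
                       sum(c * xi for c, xi in zip(c2, x))) for x in feasible}
        return efficient_set(profits, feasible)
    base = eff_for(0.0)
    return [d for d in deltas if eff_for(d) == base]
```

On a small three-item instance, the surviving perturbations form an interval containing zero, consistent with the convexity (interval) result stated in the abstract.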
Recommended Article 2
- Title: Learning to Optimize Contextually Constrained Problems for Real-Time Decision Generation
- Journal: Management Science
- Link: https://doi.org/10.1287/mnsc.2020.03565
- Published: 2024/05/03
- Authors: Aaron Babier, Timothy C. Y. Chan, Adam Diamant, Rafid Mahmood
- Abstract:
- The topic of learning to solve optimization problems has received interest from both the operations research and machine learning communities. In this paper, we combine ideas from both fields to address the problem of learning to generate decisions to instances of optimization problems with potentially nonlinear or nonconvex constraints where the feasible set varies with contextual features. We propose a novel framework for training a generative model to produce provably optimal decisions by combining interior point methods and adversarial learning, which we further embed within an iterative data generation algorithm. To this end, we first train a classifier to learn feasibility and then train the generative model to produce optimal decisions to an optimization problem using the classifier as a regularizer. We prove that decisions generated by our model satisfy in-sample and out-of-sample optimality guarantees. Furthermore, the learning models are embedded in an active learning loop in which synthetic instances are iteratively added to the training data; this allows us to progressively generate provably tighter optimal decisions. We investigate case studies in portfolio optimization and personalized treatment design, demonstrating that our approach yields advantages over predict-then-optimize and supervised deep learning techniques, respectively. In particular, our framework is more robust to parameter estimation error compared with the predict-then-optimize paradigm and can better adapt to domain shift as compared with supervised learning models.
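The two training steps described in the abstract, learn feasibility first and then optimize against the learned classifier, can be caricatured in a few lines. The sketch below is a heavily simplified stand-in (logistic regression on a known unit-disk feasible set plus a plain penalized gradient descent; the paper instead combines interior point methods, adversarial learning, and active data generation; all names here are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: learn feasibility. Toy ground truth: the unit disk in R^2.
X = rng.uniform(-2, 2, size=(2000, 2))
y = (np.sum(X**2, axis=1) <= 1.0).astype(float)                 # 1 = feasible
phi = np.column_stack([np.ones(len(X)), np.sum(X**2, axis=1)])  # feature: squared norm
w = np.zeros(2)
for _ in range(2000):  # logistic regression by gradient ascent on the log-likelihood
    p = 1.0 / (1.0 + np.exp(-phi @ w))
    w += 0.1 * phi.T @ (y - p) / len(X)

def feasibility_logit(x):
    """Classifier score: positive means 'predicted feasible'."""
    return w[0] + w[1] * np.dot(x, x)

def solve(c, lam=5.0, steps=3000, lr=0.01):
    """Minimize c.x plus a softplus penalty on predicted infeasibility."""
    x = np.zeros(2)
    for _ in range(steps):
        z = feasibility_logit(x)
        # gradient of softplus(-z) wrt x is -sigmoid(-z) * dz/dx, with dz/dx = 2*w[1]*x
        grad = c - lam * (1.0 / (1.0 + np.exp(z))) * 2.0 * w[1] * x
        x = x - lr * grad
    return x

x_star = solve(np.array([-1.0, 0.0]))  # cost rewards pushing x[0] outward
```

The learned classifier acts exactly as the regularizer role in the abstract: the cost term pushes the decision toward the boundary of the feasible region, and the penalty from the classifier holds it (approximately) inside.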
Recommended Article 3
- Title: The (Surprising) Sample Optimality of Greedy Procedures for Large-Scale Ranking and Selection
- Journal: Management Science
- Link: https://doi.org/10.1287/mnsc.2023.00694
- Published: 2024/05/07
- Authors: Zaile Li, Weiwei Fan, L. Jeff Hong
- Abstract:
- Ranking and selection (R&S) aims to select the best alternative with the largest mean performance from a finite set of alternatives. Recently, considerable attention has turned toward the large-scale R&S problem which involves a large number of alternatives. Ideal large-scale R&S procedures should be sample optimal; that is, the total sample size required to deliver an asymptotically nonzero probability of correct selection (PCS) grows at the minimal order (linear order) in the number of alternatives, k. Surprisingly, we discover that the naïve greedy procedure, which keeps sampling the alternative with the largest running average, performs strikingly well and appears sample optimal. To understand this discovery, we develop a new boundary-crossing perspective and prove that the greedy procedure is sample optimal for the scenarios where the best mean maintains at least a positive constant away from all other means as k increases. We further show that the derived PCS lower bound is asymptotically tight for the slippage configuration of means with a common variance. For other scenarios, we consider the probability of good selection and find that the result depends on the growth behavior of the number of good alternatives: if it remains bounded as k increases, the sample optimality still holds; otherwise, the result may change. Moreover, we propose the explore-first greedy procedures by adding an exploration phase to the greedy procedure. The procedures are proven to be sample optimal and consistent under the same assumptions. Last, we numerically investigate the performance of our greedy procedures in solving large-scale R&S problems.
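The greedy procedure itself is only a few lines. Below is a minimal sketch (the name `greedy_rs` and the warm-up size `n0` are our own toy choices, loosely corresponding to the explore-first variants mentioned in the abstract): keep sampling the alternative with the largest running average, then report the final leader.

```python
import random

def greedy_rs(means, n_total, n0=1, sigma=1.0, seed=0):
    """Greedy R&S sketch: after n0 warm-up samples per alternative, always
    sample the alternative with the largest running average; return the index
    selected at the end of the budget."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k

    def draw(i):
        counts[i] += 1
        sums[i] += rng.gauss(means[i], sigma)

    for i in range(k):                     # warm-up phase
        for _ in range(n0):
            draw(i)
    for _ in range(n_total - n0 * k):      # greedy phase
        leader = max(range(k), key=lambda i: sums[i] / counts[i])
        draw(leader)
    return max(range(k), key=lambda i: sums[i] / counts[i])
```

With a constant gap between the best mean and the rest, a linear-in-k budget already yields a high empirical probability of correct selection, which is the sample-optimality phenomenon the paper analyzes via its boundary-crossing argument.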
Recommended Article 4
- Title: Model-Free Nonstationary Reinforcement Learning: Near-Optimal Regret and Applications in Multiagent Reinforcement Learning and Inventory Control
- Journal: Management Science
- Link: https://doi.org/10.1287/mnsc.2022.02533
- Published: 2024/05/14
- Authors: Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar
- Abstract:
- We consider model-free reinforcement learning (RL) in nonstationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain variation budgets. We propose Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), the first model-free algorithm for nonstationary RL, and show that it outperforms existing solutions in terms of dynamic regret. Specifically, RestartQ-UCB with Freedman-type bonus terms achieves a dynamic regret bound of $\tilde{O}(S^{1/3} A^{1/3} \Delta^{1/3} H T^{2/3})$, where S and A are the numbers of states and actions, respectively, $\Delta > 0$ is the variation budget, H is the number of time steps per episode, and T is the total number of time steps. We further present a parameter-free algorithm named Double-Restart Q-UCB that does not require prior knowledge of the variation budget. We show that our algorithms are nearly optimal by establishing an information-theoretical lower bound of $\Omega(S^{1/3} A^{1/3} \Delta^{1/3} H^{2/3} T^{2/3})$, the first lower bound in nonstationary RL. Numerical experiments validate the advantages of RestartQ-UCB in terms of both cumulative rewards and computational efficiency. We demonstrate the power of our results in examples of multiagent RL and inventory control across related products.
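RestartQ-UCB is episodic and state-based, but the restart idea is easy to see in a bandit caricature: periodically wipe all statistics so that stale confidence bounds cannot pin the learner to a formerly good arm. The sketch below (a UCB1-style bandit analogue with invented names; not the paper's algorithm and without its Freedman-type bonuses) illustrates this:

```python
import math, random

def restart_ucb(mean_fns, horizon, restart_every, sigma=0.1, seed=3):
    """UCB1 that discards all statistics every `restart_every` steps.
    mean_fns[i](t) gives arm i's (possibly time-varying) mean at step t."""
    rng = random.Random(seed)
    k = len(mean_fns)
    counts, sums, t0 = [0] * k, [0.0] * k, 0
    total = 0.0
    for t in range(horizon):
        if t - t0 >= restart_every:            # restart: forget the past
            counts, sums, t0 = [0] * k, [0.0] * k, t
        if 0 in counts:
            arm = counts.index(0)              # pull each arm once after a restart
        else:
            n = t - t0
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(n + 1) / counts[i]))
        r = mean_fns[arm](t) + rng.gauss(0.0, sigma)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total
```

With the two arms' means swapping every few hundred steps and restarts roughly aligned to the changes, the learner re-explores after each shift instead of trusting stale averages; plain UCB without restarts can take far longer to react.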
Recommended Article 5
- Title: Predictive Three-Dimensional Printing of Spare Parts with Internet of Things
- Journal: Management Science
- Link: https://doi.org/10.1287/mnsc.2023.00978
- Published: 2024/05/21
- Authors: Jing-Sheng Song, Yue Zhang
- Abstract:
- Industry 4.0 integrates digital and physical technologies to transform work management, where two core enablers are the internet of things (IoT) and three-dimensional printing (3DP). IoT monitors complex systems in real time, whereas 3DP enables agile manufacturing that can respond to real-time information. However, the details of how these two can be integrated are not yet clear. To gain insights, we consider a scenario where a three-dimensional (3D) printer supplies a critical part to multiple machines that are embedded with sensors and connected through IoT. Although the public perception indicates that this integration would enable on-demand printing, our research suggests that this is not necessarily the case. Instead, the true benefit is the ability to print predictively. In particular, it is typically more effective for the 3D printer to predictively print to stock based on a threshold that depends on the system’s status. We also identify a printing mode called predictive print on demand that allows for minimal inventory, and we find the speed of 3DP to be the primary factor that influences its optimality. Furthermore, we assess the value of IoT in cost reductions by separately analyzing the impact of advance information from embedded sensors and the real-time information fusion through IoT. We find that IoT provides significant value in general. However, the conventional wisdom that IoT’s value scales up for larger systems is suitable only when the expansion is paired with appropriate 3DP capacity. Our framework can help inform investment decisions regarding IoT/embedded sensors and support the development of scheduling tools for predictive 3DP.
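The gap between on-demand and predictive printing can be seen in a deliberately tiny model. The sketch below (invented names; a single machine, a single printer, and a noisy sensor lead time; nothing here matches the paper's multi-machine threshold policy) compares total downtime when printing starts at failure versus at the IoT warning:

```python
import random

def total_downtime(policy, n_failures=200, warn_lead=3, print_time=5, seed=1):
    """'on_demand': start printing the replacement part when the machine fails.
    'predictive': start at the sensor warning, roughly `warn_lead` periods earlier.
    Returns the total number of periods the machine waits for a part."""
    rng = random.Random(seed)
    down = 0
    for _ in range(n_failures):
        lead = warn_lead + rng.randint(-1, 1)   # noisy advance warning
        if policy == "on_demand":
            down += print_time
        else:
            down += max(0, print_time - max(0, lead))
    return down
```

Predictive printing helps exactly when the warning lead eats into the print time; if printing were much faster than the lead time, downtime would vanish and on-demand operation would suffice, echoing the abstract's point that 3DP speed drives which printing mode is optimal.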
Recommended Article 6
- Title: Self-Adapting Network Relaxations for Weakly Coupled Markov Decision Processes
- Journal: Management Science
- Link: https://doi.org/10.1287/mnsc.2022.01108
- Published: 2024/05/21
- Authors: Selvaprabu Nadarajah, Andre A. Cire
- Abstract:
- High-dimensional weakly coupled Markov decision processes (WDPs) arise in dynamic decision making and reinforcement learning, decomposing into smaller Markov decision processes (MDPs) when linking constraints are relaxed. The Lagrangian relaxation of WDPs (LAG) exploits this property to compute policies and (optimistic) bounds efficiently; however, dualizing linking constraints averages away combinatorial information. We introduce feasibility network relaxations (FNRs), a new class of linear programming relaxations that exactly represents the linking constraints. We develop a procedure to obtain the unique minimally sized relaxation, which we refer to as self-adapting FNR, as its size automatically adjusts to the structure of the linking constraints. Our analysis informs model selection: (i) the self-adapting FNR provides (weakly) stronger bounds than LAG, is polynomially sized when linking constraints admit a tractable network representation, and can even be smaller than LAG, and (ii) self-adapting FNR provides bounds and policies that match the approximate linear programming (ALP) approach but is substantially smaller in size than the ALP formulation and a recent alternative Lagrangian that is equivalent to ALP. We perform numerical experiments on constrained dynamic assortment and preemptive maintenance applications. Our results show that self-adapting FNR significantly improves upon LAG in terms of policy performance and/or bounds, while being an order of magnitude faster than an alternative Lagrangian and ALP, which are unsolvable in several instances.
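The Lagrangian relaxation (LAG) that the paper improves upon is simple to write down in a one-shot caricature: dualize the single linking budget constraint, and the problem splits into independent per-component maximizations. The sketch below (invented names; a static toy rather than an MDP) computes the LAG bound on a grid of multipliers and compares it with the exact coupled optimum:

```python
from itertools import product

def lagrangian_bound(rewards, costs, budget, lambdas):
    """Optimistic bound from dualizing the linking budget constraint: component i
    picks one action a with reward rewards[i][a] and resource use costs[i][a],
    subject (in the original problem) to total use <= budget."""
    best = float("inf")
    for lam in lambdas:
        val = lam * budget + sum(
            max(r - lam * c for r, c in zip(rs, cs))
            for rs, cs in zip(rewards, costs)
        )
        best = min(best, val)
    return best

def exact_optimum(rewards, costs, budget):
    """Brute-force coupled optimum for comparison."""
    best = float("-inf")
    for choice in product(*(range(len(r)) for r in rewards)):
        use = sum(costs[i][a] for i, a in enumerate(choice))
        if use <= budget:
            best = max(best, sum(rewards[i][a] for i, a in enumerate(choice)))
    return best
```

Any multiplier lam >= 0 yields a valid optimistic bound, and minimizing over the grid tightens it; but dualization averages away the combinatorial structure of the linking constraint, which is exactly what the paper's feasibility network relaxations keep represented and why they can (weakly) dominate this bound.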
Recommended Article 7
- Title: Thompson Sampling with Information Relaxation Penalties
- Journal: Management Science
- Link: https://doi.org/10.1287/mnsc.2020.01396
- Published: 2024/05/22
- Authors: Seungki Min, Costis Maglaras, Ciamac C. Moallemi
- Abstract:
- We consider a finite-horizon multiarmed bandit (MAB) problem in a Bayesian setting, for which we propose an information relaxation sampling framework. With this framework, we define an intuitive family of control policies that include Thompson sampling (TS) and the Bayesian optimal policy as endpoints. Analogous to TS, which at each decision epoch pulls an arm that is best with respect to the randomly sampled parameters, our algorithms sample entire future reward realizations and take the corresponding best action. However, this is done in the presence of “penalties” that seek to compensate for the availability of future information. We develop several novel policies and performance bounds for MAB problems that vary in terms of improving performance and increasing computational complexity between the two endpoints. Our policies can be viewed as natural generalizations of TS that simultaneously incorporate knowledge of the time horizon and explicitly consider the exploration-exploitation trade-off. We prove associated structural results on performance bounds and suboptimality gaps. Numerical experiments suggest that this new class of policies perform well, in particular, in settings where the finite time horizon introduces significant exploration-exploitation tension into the problem. Finally, inspired by the finite-horizon Gittins index, we propose an index policy that builds on our framework that particularly outperforms the state-of-the-art algorithms in our numerical experiments.
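Plain Thompson sampling, one endpoint of the family described above, takes a few lines for Bernoulli rewards. The sketch below (invented names; Beta(1,1) priors; none of the paper's information relaxation penalties) is the baseline that the proposed policies generalize by sampling entire future reward realizations and charging penalties for that foresight:

```python
import random

def thompson(arms_p, horizon, seed=0):
    """Plain Thompson sampling for Bernoulli bandits with Beta(1,1) priors.
    Returns the total reward collected over the horizon."""
    rng = random.Random(seed)
    k = len(arms_p)
    a, b = [1] * k, [1] * k          # Beta posterior parameters per arm
    total = 0
    for _ in range(horizon):
        theta = [rng.betavariate(a[i], b[i]) for i in range(k)]   # posterior sample
        arm = max(range(k), key=lambda i: theta[i])               # act greedily on it
        r = 1 if rng.random() < arms_p[arm] else 0
        total += r
        a[arm] += r
        b[arm] += 1 - r
    return total
```

Each round samples one parameter per arm from its posterior and pulls the argmax; note that nothing here uses the horizon, which is precisely the knowledge the paper's penalized policies exploit.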
Recommended Article 8
- Title: Pigeonhole Design: Balancing Sequential Experiments from an Online Matching Perspective
- Journal: Management Science
- Link: https://doi.org/10.1287/mnsc.2023.02184
- Published: 2024/05/24
- Authors: Jinglong Zhao, Zijie Zhou
- Abstract:
- Practitioners and academics have long appreciated the benefits of covariate balancing when they conduct randomized experiments. For web-facing firms running online A/B tests, however, it still remains challenging in balancing covariate information when experimental subjects arrive sequentially. In this paper, we study an online experimental design problem, which we refer to as the online blocking problem. In this problem, experimental subjects with heterogeneous covariate information arrive sequentially and must be immediately assigned into either the control or the treated group. The objective is to minimize the total discrepancy, which is defined as the minimum weight perfect matching between the two groups. To solve this problem, we propose a randomized design of experiment, which we refer to as the pigeonhole design. The pigeonhole design first partitions the covariate space into smaller spaces, which we refer to as pigeonholes, and then, when the experimental subjects arrive at each pigeonhole, balances the number of control and treated subjects for each pigeonhole. We analyze the theoretical performance of the pigeonhole design and show its effectiveness by comparing against two well-known benchmark designs: the matched-pair design and the completely randomized design. We identify scenarios when the pigeonhole design demonstrates more benefits over the benchmark design. To conclude, we conduct extensive simulations using Yahoo! data to show a 10.2% reduction in variance if we use the pigeonhole design to estimate the average treatment effect.
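The pigeonhole mechanics are easy to sketch: discretize the covariate space into cells, and inside each cell keep the treated and control counts as equal as possible. The following toy version (invented names; a uniform grid over [0,1]^d stands in for the paper's partition) assigns sequentially arriving subjects:

```python
import random

def pigeonhole_assign(subjects, cuts, seed=0):
    """Assign sequentially arriving subjects (covariate vectors in [0,1]^d) to
    control (0) or treatment (1), balancing within each pigeonhole: a grid cell
    obtained by cutting each covariate axis into `cuts` intervals."""
    rng = random.Random(seed)
    balance = {}          # pigeonhole -> (#treated - #control) so far
    assignments = []
    for x in subjects:
        hole = tuple(min(int(v * cuts), cuts - 1) for v in x)
        diff = balance.get(hole, 0)
        if diff > 0:
            arm = 0                    # treated is ahead: assign control
        elif diff < 0:
            arm = 1                    # control is ahead: assign treatment
        else:
            arm = rng.randint(0, 1)    # tie: randomize
        balance[hole] = diff + (1 if arm == 1 else -1)
        assignments.append(arm)
    return assignments, balance
```

By construction the within-pigeonhole imbalance never exceeds one subject, which is the source of the covariate balance the design delivers; the grid granularity `cuts` plays the role of the partition choice analyzed in the paper.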
Recommended Article 9
- Title: Prediction-Driven Surge Planning with Application to Emergency Department Nurse Staffing
- Journal: Management Science
- Link: https://doi.org/10.1287/mnsc.2021.0278
- Published: 2024/05/24
- Authors: Yue Hu, Carri W. Chan, Jing Dong
- Abstract:
- Determining emergency department (ED) nurse staffing decisions to balance quality of service and staffing costs can be extremely challenging, especially when there is a high level of uncertainty in patient demand. Increasing data availability and continuing advancements in predictive analytics provide an opportunity to mitigate demand uncertainty by using demand forecasts. In this work, we study a two-stage prediction-driven staffing framework where the prediction models are integrated with the base (made weeks in advance) and surge (made nearly real-time) nurse staffing decisions in the ED. We quantify the benefit of having the ability to use the more expensive surge staffing and identify the importance of balancing demand uncertainty versus system stochasticity. We also propose a near-optimal two-stage staffing policy that is straightforward to interpret and implement. Last, we develop a unified framework that combines parameter estimation, real-time demand forecasts, and nurse staffing in the ED. High-fidelity simulation experiments for the ED demonstrate that the proposed framework has the potential to reduce annual staffing costs by 10%–16% ($2 M–$3 M) while guaranteeing timely access to care.
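The two-stage logic, cheap base staff committed early and expensive surge staff added once better demand information arrives, can be captured in a toy cost model (invented names and numbers; for simplicity the "forecast" below is the realized demand itself, and queueing dynamics are ignored):

```python
def two_stage_cost(demands, base, c_base=1.0, c_surge=1.5, per_nurse=5):
    """Average cost of committing `base` nurses in advance at cost c_base each,
    then, after demand is revealed, hiring just enough surge nurses at the
    premium rate c_surge so that every patient is covered.
    Each nurse covers `per_nurse` patients per shift."""
    total = 0.0
    for d in demands:
        need = -(-d // per_nurse)          # ceil(d / per_nurse): nurses required
        surge = max(0, need - base)
        total += c_base * base + c_surge * surge
    return total / len(demands)
```

Sizing the base level for typical demand and topping up with surge beats sizing the base for the worst case (here, 20 nurses at unit cost), even though each surge nurse costs 50% more. That trade-off between the surge premium and the value of resolving demand uncertainty late is, in stylized form, what the paper's two-stage framework optimizes.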