HYPRE: BoomerAMG Options and Optimization

news2024/12/21 22:03:37


  • BoomerAMG Options and Optimization
    • Overview
    • AMG Algorithm
    • Options
    • Turning on BoomerAMG
    • Strong Threshold
    • Going Deeper
    • Timing
    • More Options
      • Max Levels
      • Coarsen Type
      • Aggressive Coarsening
      • Interpolation Truncation
      • Interpolation Type
      • P Max
      • Putting it All Together
    • Full List of Options


Hypre / BoomerAMG


Hypre is a set of solvers/preconditioners from Lawrence Livermore National Laboratory. The main Hypre website can be found here. For MOOSE we mainly use Hypre’s algebraic multigrid (AMG) package: BoomerAMG.

AMG is a scalable, efficient algorithm for solution of PDEs that are fairly elliptic. Many different sets of PDEs fall into that category including heat conduction, solid mechanics, porous flow, species diffusion, etc.

AMG Algorithm

I hope to fill this out with some details about how AMG works - but I don’t have time right now.

Options


BoomerAMG has an incredible number of options, many of which can have a large impact on solve speed and convergence rate. The defaults, as set by PETSc, are ok for small two-dimensional problems. However, if solving in 3D or on over 32 processors you should take some time to familiarize yourself with these options. It can be daunting, so if you start to get too deep always turn to moose-users for help!

We specify options for Hypre using PETSc command-line option syntax. On the command-line these take the form of -option value. However, we also supply a way of setting these in the input file. In both the Executioner block and Preconditioner blocks you can set petsc_options_iname and petsc_options_value. These two hold the parameter names and values, respectively, that you would like to set.

Turning on BoomerAMG

To turn on Hypre-BoomerAMG preconditioning you would use this in your input file:

  petsc_options_iname = '-pc_type -pc_hypre_type'
  petsc_options_value = 'hypre    boomeramg'

This is equivalent to setting -pc_type hypre -pc_hypre_type boomeramg on the command-line.

Notice that it takes two options to turn on BoomerAMG: one to select Hypre - and one to select BoomerAMG from the Hypre package. Hypre technically contains many solvers and preconditioners, but many of them overlap with what PETSc already provides.

Strong Threshold

By far the most important option is -pc_hypre_boomeramg_strong_threshold. This option controls the primary coarsening mechanism: removal of entries from the matrix by simply deciding they’re unimportant. What you’re setting is a threshold: the (scaled) value, between 0 and 1, that a matrix entry must exceed to be kept. Everything below the threshold is discarded, so a higher setting discards more of the matrix. Discarding more entries is generally good for iteration speed (i.e. how fast each trip through BoomerAMG is) but can dramatically reduce the quality of the preconditioner, so going too far leads to worse overall performance by requiring more linear iterations.

By default this is set to 0.25. That generally works fine in 2D… but is nowhere close for 3D.

If MOOSE detects that you’re using Hypre BoomerAMG and running in 3D it will automatically assign -pc_hypre_boomeramg_strong_threshold to be 0.7. This was chosen by reading a lot of literature and doing some small-scale optimization tests by the MOOSE team. HOWEVER: 0.7 is NOT a golden number. Depending on your problem you may need more coarsening (0.8) or less (0.6 or 0.5 to help convergence). Be warned though: I highly recommend that you never set this below 0.5 for any 3D problem. The problem will explode with a huge amount of time and memory taken up by the preconditioner.
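
As a minimal sketch of overriding the threshold yourself, the fragment below adds it to the same two parameters used to turn on BoomerAMG. It assumes the options live in the [Executioner] block; the value 0.6 is only an illustration of "less coarsening" for a 3D problem, not a recommendation.

  [Executioner]
    # ... your other Executioner parameters ...
    # Names and values are matched by position: 3 names, 3 values.
    petsc_options_iname = '-pc_type -pc_hypre_type -pc_hypre_boomeramg_strong_threshold'
    petsc_options_value = 'hypre    boomeramg      0.6'
  []

If the number of linear iterations drops but each iteration becomes much slower, move the value back toward 0.7.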

Going Deeper

If you’re reading this far, then you’ve probably run into a real problem. Either you’re not getting the speed/scalability you want, or you’re not getting convergence. I’ll try to put these in order of importance (in my opinion) and give you some guidance for each one.

In general, speeding up BoomerAMG or improving scalability comes from doing more coarsening. As a reminder: the first thing to do is make sure you have -pc_hypre_boomeramg_strong_threshold set appropriately for your problem (see above). Even if you have it set to 0.25 (for 2D) or 0.7 (for 3D) you might try increasing it a little to find the sweet spot between efficiency and effectiveness.

Timing


Before venturing further, you will definitely want to turn on the performance log (“perf log”). You do that by putting print_perf_log = true in the [Outputs] block in your input file. At the end of the solve it will print out a table showing times.
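
In input-file form, the setting described above would look roughly like this (a sketch; any other output settings you already have stay in the same block):

  [Outputs]
    # ... your existing output settings ...
    print_perf_log = true
  []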

For preconditioning what you want to pay attention to is the Total Time With Sub column. The total time during the nonlinear solve is in the solve() row. Your objective should be to reduce that. solve() is mainly a combination of three things: compute_residual(), compute_jacobian() and the preconditioner (with a little going to the linear/nonlinear solver in PETSc).

The first thing to do is look at how much of the total time compute_jacobian() and compute_residual() are taking. When pushing BoomerAMG far (trying to scale a problem out to many cores) what will happen is that compute_jacobian() and compute_residual() will take smaller and smaller portions of the total solve() time. At any point you want to keep compute_residual() + compute_jacobian() to around 60%-70% of the total solve() time. The remainder of the solve() time is solver / preconditioning time - with the majority of that going into BoomerAMG. If BoomerAMG is taking more than 50% of the solve time then either you’ve scaled your problem too far (try to keep at least 5000 DoFs per processor - 10k is even better) or you need to start adjusting BoomerAMG options.

More Options

Let’s dive into some of the more advanced options.

Max Levels

The first option that I want to draw your attention to is also one you should probably leave alone for now, but I point it out because everyone wants to mess with it. -pc_hypre_boomeramg_max_levels controls the number of “levels” in the multigrid solve: i.e. the number of coarser problems that are produced. The default for this option is 25. You might think that you could save time by making this smaller or that you could make the algorithm more accurate by making this larger: actually neither of those is true!

One thing to understand about multigrid is that it’s trying to generate a really small “coarsest” problem that it will actually solve (usually using a direct solver such as Gaussian elimination). If you artificially limit the number of levels, you prevent the algorithm from reaching that coarsest level, which means that you’re doing an expensive direct solve on a larger problem… which is slower. Yes, by reducing the number of levels you can actually slow down your solve!

What about increasing the levels? Well, the problem can only get so coarse. So increasing the number of levels typically has no effect. Even a problem with millions of DoFs will typically only need ~15 levels or so. You can see how many levels BoomerAMG is using by turning on -pc_hypre_boomeramg_print_statistics.

Coarsen Type

As mentioned before, speeding up Hypre is usually done by doing more coarsening. The main option that controls coarsening is -pc_hypre_boomeramg_coarsen_type. By default this is set to Falgout which is a good mix of efficiency and accuracy. However, there is typically some performance to be gained by using more aggressive options. In particular, when solving a 3D problem you should try using HMIS or PMIS (in that order). Both of these use very little parallel communication but do an excellent job at removing matrix entries to get to coarser problems.

There are a lot more options other than Falgout, HMIS and PMIS - but I’m not going to list them here because those are really the ones you will want to use.
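
A minimal sketch of trying HMIS on a 3D problem, again assuming the options go in the [Executioner] block and that the strong threshold is set explicitly to the 3D value discussed earlier:

  [Executioner]
    # ... your other Executioner parameters ...
    # Swap HMIS for PMIS to try the second suggestion.
    petsc_options_iname = '-pc_type -pc_hypre_type -pc_hypre_boomeramg_strong_threshold -pc_hypre_boomeramg_coarsen_type'
    petsc_options_value = 'hypre    boomeramg      0.7                                  HMIS'
  []

Compare the solve() time and the linear iteration count against the Falgout default before keeping the change.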

Aggressive Coarsening

Another option that can do a lot of coarsening is “Aggressive Coarsening”. BoomerAMG actually has many parameters surrounding this - but currently only 2 are available to us as PETSc options: -pc_hypre_boomeramg_agg_nl and -pc_hypre_boomeramg_agg_num_paths.

-pc_hypre_boomeramg_agg_nl is the number of coarsening levels to apply “aggressive coarsening” to. Aggressive coarsening does just what you think it does: it tries even harder to remove matrix entries. The way it does this is looking at “second-order” connections: does there exist a path from one important entry to another important entry through several other entries. By looking at these pathways the algorithm will decide whether or not to keep an entry. Doing more aggressive coarsening will result in less time spent in BoomerAMG (and a lot less communication done) but will also impact the effectiveness of the preconditioner by quite a lot - so it’s a balance.

-pc_hypre_boomeramg_agg_num_paths is the number of pathways to consider to find a connection and keep something. That means increasing this value will reduce the amount of aggressive coarsening happening in each aggressive coarsening level. What this means is that a higher -pc_hypre_boomeramg_agg_num_paths will improve accuracy/effectiveness but slow things down. So it’s a balance.

By default aggressive coarsening is off (-pc_hypre_boomeramg_agg_nl 0), so to turn it on set -pc_hypre_boomeramg_agg_nl to something higher than zero. I recommend 2 or 3 to start with, but even 4 can be ok in 3D. -pc_hypre_boomeramg_agg_num_paths defaults to 1, which is the most aggressive setting. If the aggressive coarsening levels are causing too many linear iterations, try increasing the number of paths first. Go up to about 4, 5 or 6 and see if it helps reduce the number of linear iterations. If it doesn’t, then you may need to back off on the number of aggressive coarsening levels you are doing. All a balancing act…
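
The tuning order described above might start from something like the following sketch. The block placement is an assumption, and the specific values are simply the starting points named in the text (2 levels, default of 1 path):

  [Executioner]
    # ... your other Executioner parameters ...
    # Step 1: turn on 2 levels of aggressive coarsening, paths left at 1.
    # Step 2: if linear iterations climb, raise agg_num_paths to 4, 5 or 6.
    # Step 3: if that does not help, lower agg_nl again.
    petsc_options_iname = '-pc_type -pc_hypre_type -pc_hypre_boomeramg_agg_nl -pc_hypre_boomeramg_agg_num_paths'
    petsc_options_value = 'hypre    boomeramg      2                          1'
  []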

Interpolation Truncation

You can also coarsen during the interpolation operation. One way to do that is to set -pc_hypre_boomeramg_truncfactor. This value should be between 0 and 1 and works similarly to the strong threshold: the higher you set it the more entries are ignored. I recommend a value around 0.3 to start with. You can adjust this up (maybe 0.4 or 0.5) for some speed or adjust it down (0.2, etc.) for more accuracy. Balance.

Interpolation Type

Speaking of interpolation - it’s expensive and how it’s done can greatly affect the accuracy and efficiency of BoomerAMG. Ideally, you should choose an interpolation operation that matches your physics (check the Hypre manual and various Hypre papers for discussion about which interpolation operators are better for which physics) but there are some good rules of thumb.

To change it set -pc_hypre_boomeramg_interp_type. The default is classical. This tends to be really slow and unnecessary - especially for 3D problems. I recommend starting with ext+i (yes, that’s what the value of the option is). It stands for “extended+i”. This is a good all around option that has low communication overhead.
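
As a sketch, the interpolation settings suggested so far (ext+i interpolation with a truncation factor of 0.3) could be written like this, assuming the [Executioner] block is where your PETSc options live:

  [Executioner]
    # ... your other Executioner parameters ...
    # The value really is the literal string ext+i.
    petsc_options_iname = '-pc_type -pc_hypre_type -pc_hypre_boomeramg_interp_type -pc_hypre_boomeramg_truncfactor'
    petsc_options_value = 'hypre    boomeramg      ext+i                           0.3'
  []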

There are many more options here, but I’m not going to enumerate them for now.

P Max

I’m going to be honest: I don’t quite understand what -pc_hypre_boomeramg_P_max does exactly. I’ve read about it - but I still can’t quite get it. The description from PETSc is: “Max elements per row for interpolation operator”. Setting this low (~2) seems to do a good job. Setting it higher seems to make the solve less accurate. However: that goes against my intuition - which is why I don’t quite understand what’s going on. If someone knows please email moose-users with a good explanation!

Putting it All Together

So - what does an “evolved” BoomerAMG options line look like? Here’s one I’m currently using for a 3D Laplacian solve with ~6M elements on ~500 cores:

  -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_strong_threshold 0.7 -pc_hypre_boomeramg_agg_nl 4 -pc_hypre_boomeramg_agg_num_paths 5 -pc_hypre_boomeramg_max_levels 25 -pc_hypre_boomeramg_coarsen_type HMIS -pc_hypre_boomeramg_interp_type ext+i -pc_hypre_boomeramg_P_max 2 -pc_hypre_boomeramg_truncfactor 0.3

This is a good place to start if you’re looking for advanced usage. If it’s not “strong” enough for you (takes too many linear iterations) then reduce the number of aggressive coarsening levels (-pc_hypre_boomeramg_agg_nl). If instead it’s too slow, reduce the number of aggressive coarsening paths, try a different coarsening type (like PMIS) or add a bit more truncation (maybe go to 0.4 or 0.5). It’s all about balance.
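
For reference, the same set of options can be expressed in input-file form. This is a sketch that assumes the [Executioner] block; the ten names and ten values are paired by position, so keep them in the same order when editing:

  [Executioner]
    # ... your other Executioner parameters ...
    petsc_options_iname = '-pc_type -pc_hypre_type -pc_hypre_boomeramg_strong_threshold -pc_hypre_boomeramg_agg_nl -pc_hypre_boomeramg_agg_num_paths -pc_hypre_boomeramg_max_levels -pc_hypre_boomeramg_coarsen_type -pc_hypre_boomeramg_interp_type -pc_hypre_boomeramg_P_max -pc_hypre_boomeramg_truncfactor'
    petsc_options_value = 'hypre boomeramg 0.7 4 5 25 HMIS ext+i 2 0.3'
  []

A mismatch between the number of names and the number of values is an easy mistake to make with a list this long, so count both after every change.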

Full List of Options

Here is the full list of Hypre options present in PETSc 3.7.6 (found using -help | grep -C 5 -i hypre):

HYPRE preconditioner options
  -pc_hypre_type <boomeramg> (choose one of) pilut parasails boomeramg ams (PCHYPRESetType)
HYPRE BoomerAMG Options
  -pc_hypre_boomeramg_cycle_type <V> (choose one of) V W (None)
  -pc_hypre_boomeramg_max_levels <25>: Number of levels (of grids) allowed (None)
  -pc_hypre_boomeramg_max_iter <1>: Maximum iterations used PER hypre call (None)
  -pc_hypre_boomeramg_tol <0.>: Convergence tolerance PER hypre call (0.0 = use a fixed number of iterations) (None)
  -pc_hypre_boomeramg_truncfactor <0.>: Truncation factor for interpolation (0=no truncation) (None)
  -pc_hypre_boomeramg_P_max <0>: Max elements per row for interpolation operator (0=unlimited) (None)
  -pc_hypre_boomeramg_agg_nl <0>: Number of levels of aggressive coarsening (None)
  -pc_hypre_boomeramg_agg_num_paths <1>: Number of paths for aggressive coarsening (None)
  -pc_hypre_boomeramg_strong_threshold <0.25>: Threshold for being strongly connected (None)
  -pc_hypre_boomeramg_max_row_sum <0.9>: Maximum row sum (None)
  -pc_hypre_boomeramg_grid_sweeps_all <1>: Number of sweeps for the up and down grid levels (None)
  -pc_hypre_boomeramg_nodal_coarsen <0>: Use a nodal based coarsening 1-6 (HYPRE_BoomerAMGSetNodal)
  -pc_hypre_boomeramg_vec_interp_variant <0>: Variant of algorithm 1-3 (HYPRE_BoomerAMGSetInterpVecVariant)
  -pc_hypre_boomeramg_grid_sweeps_down <1>: Number of sweeps for the down cycles (None)
  -pc_hypre_boomeramg_grid_sweeps_up <1>: Number of sweeps for the up cycles (None)
  -pc_hypre_boomeramg_grid_sweeps_coarse <1>: Number of sweeps for the coarse level (None)
  -pc_hypre_boomeramg_smooth_type <Schwarz-smoothers> (choose one of) Schwarz-smoothers Pilut ParaSails Euclid (None)
  -pc_hypre_boomeramg_smooth_num_levels <25>: Number of levels on which more complex smoothers are used (None)
  -pc_hypre_boomeramg_eu_level <0>: Number of levels for ILU(k) in Euclid smoother (None)
  -pc_hypre_boomeramg_eu_droptolerance <0.>: Drop tolerance for ILU(k) in Euclid smoother (None)
  -pc_hypre_boomeramg_eu_bj: <FALSE> Use Block Jacobi for ILU in Euclid smoother? (None)
  -pc_hypre_boomeramg_relax_type_all <symmetric-SOR/Jacobi> (choose one of) Jacobi sequential-Gauss-Seidel seqboundary-Gauss-Seidel SOR/Jacobi backward-SOR/Jacobi  symmetric-SOR/Jacobi  l1scaled-SOR/Jacobi Gaussian-elimination      CG Chebyshev FCF-Jacobi l1scaled-Jacobi (None)
  -pc_hypre_boomeramg_relax_type_down <symmetric-SOR/Jacobi> (choose one of) Jacobi sequential-Gauss-Seidel seqboundary-Gauss-Seidel SOR/Jacobi backward-SOR/Jacobi  symmetric-SOR/Jacobi  l1scaled-SOR/Jacobi Gaussian-elimination      CG Chebyshev FCF-Jacobi l1scaled-Jacobi (None)
  -pc_hypre_boomeramg_relax_type_up <symmetric-SOR/Jacobi> (choose one of) Jacobi sequential-Gauss-Seidel seqboundary-Gauss-Seidel SOR/Jacobi backward-SOR/Jacobi  symmetric-SOR/Jacobi  l1scaled-SOR/Jacobi Gaussian-elimination      CG Chebyshev FCF-Jacobi l1scaled-Jacobi (None)
  -pc_hypre_boomeramg_relax_type_coarse <Gaussian-elimination> (choose one of) Jacobi sequential-Gauss-Seidel seqboundary-Gauss-Seidel SOR/Jacobi backward-SOR/Jacobi  symmetric-SOR/Jacobi  l1scaled-SOR/Jacobi Gaussian-elimination      CG Chebyshev FCF-Jacobi l1scaled-Jacobi (None)
  -pc_hypre_boomeramg_relax_weight_all <1.>: Relaxation weight for all levels (0 = hypre estimates, -k = determined with k CG steps) (None)
  -pc_hypre_boomeramg_relax_weight_level <1.>: Set the relaxation weight for a particular level (weight,level) (None)
  -pc_hypre_boomeramg_outer_relax_weight_all <1.>: Outer relaxation weight for all levels (-k = determined with k CG steps) (None)
  -pc_hypre_boomeramg_outer_relax_weight_level <1.>: Set the outer relaxation weight for a particular level (weight,level) (None)
  -pc_hypre_boomeramg_no_CF: <FALSE> Do not use CF-relaxation (None)
  -pc_hypre_boomeramg_measure_type <local> (choose one of) local global (None)
  -pc_hypre_boomeramg_coarsen_type <Falgout> (choose one of) CLJP Ruge-Stueben  modifiedRuge-Stueben   Falgout  PMIS  HMIS (None)
  -pc_hypre_boomeramg_interp_type <classical> (choose one of) classical   direct multipass multipass-wts ext+i ext+i-cc standard standard-wts   FF FF1 (None)
  -pc_hypre_boomeramg_print_statistics: Print statistics (None)
  -pc_hypre_boomeramg_print_statistics <3>: Print statistics (None)
  -pc_hypre_boomeramg_print_debug: Print debug information (None)
  -pc_hypre_boomeramg_nodal_relaxation: <FALSE> Nodal relaxation via Schwarz (None)




