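In "Paper Reading: ThinLTO: Scalable and Incremental LTO" we covered the main ideas of the ThinLTO paper. Here we look at how LLVM actually implements ThinLTO. This post is organized into the following parts:
- What does an LLVM ThinLTO object contain?
- How does LLVM ThinLTO perform its optimizations?
- Which optimizations can LLVM ThinLTO enable?
What do LLVM ThinLTO objects contain?
We keep using the example from Example of link time optimization. In "LLVM full LTO Study Notes" we used the magic number as an entry point for a brief analysis of the full LTO process, and we follow the same approach here.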
$ clang -flto=thin -c a.c -o a_lto.o
$ clang -flto=thin -c main.c -o main_lto.o
$ hexdump a_lto.o | head
0000000 4342 dec0 1435 0000 0005 0000 0c62 2430
0000010 594d 66be fb8d 4fb4 c81b 4424 3201 0005
0000020 0c21 0000 0266 0000 020b 0021 0002 0000
0000030 0016 0000 8107 9123 c841 4904 1006 3932
0000040 0192 0c84 0525 1908 041e 628b 1080 0245
0000050 9242 420b 1084 1432 0838 4b18 320a 8842
0000060 7048 21c4 4423 8712 108c 9241 6402 08c8
0000070 14b1 4320 8846 c920 3201 8442 2a18 2a28
0000080 3190 b07c 915c c420 00c8 0000 2089 0000
0000090 000e 0000 2232 0908 6220 0046 2b21 9824
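The magic number is 4342 dec0. hexdump prints 16-bit little-endian words, so the actual bytes are 42 43 c0 de, i.e. 'B' 'C' 0xC0 0xDE. This tells us that ThinLTO objects are still bitcode files. The ThinLTO documentation in fact already describes this in detail.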
In ThinLTO mode, as with regular LTO, clang emits LLVM bitcode after the compile phase. The ThinLTO bitcode is augmented with a compact summary of the module. During the link step, only the summaries are read and merged into a combined summary index, which includes an index of function locations for later cross-module function importing. Fast and efficient whole-program analysis is then performed on the combined summary index.
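As a quick check of the magic number read off the hexdump above, here is a minimal sketch in plain C++ (no LLVM libraries). The names `kBitcodeMagic`, `startsWithBitcodeMagic`, `hexdumpWord` and `fileIsBitcode` are made up for this post, and the sketch only handles raw bitcode files like the ones produced here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// The four bytes a raw LLVM bitcode file starts with: 'B', 'C', 0xC0, 0xDE.
constexpr std::array<std::uint8_t, 4> kBitcodeMagic = {0x42, 0x43, 0xC0, 0xDE};

// Returns true if the buffer begins with the bitcode magic bytes.
bool startsWithBitcodeMagic(const std::uint8_t *data, std::size_t size) {
  if (size < kBitcodeMagic.size())
    return false;
  for (std::size_t i = 0; i < kBitcodeMagic.size(); ++i)
    if (data[i] != kBitcodeMagic[i])
      return false;
  return true;
}

// Mimics how plain `hexdump` groups two consecutive bytes into one
// 16-bit little-endian word: the first byte becomes the low half.
std::uint16_t hexdumpWord(std::uint8_t first, std::uint8_t second) {
  return static_cast<std::uint16_t>(first | (second << 8));
}

// Reads the first four bytes of a file and checks them against the magic.
bool fileIsBitcode(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  std::array<char, 4> head{};
  if (!in.read(head.data(), static_cast<std::streamsize>(head.size())))
    return false;
  return startsWithBitcodeMagic(
      reinterpret_cast<const std::uint8_t *>(head.data()), head.size());
}
```

Calling `hexdumpWord(0x42, 0x43)` and `hexdumpWord(0xC0, 0xDE)` gives the two words 4342 and dec0 seen at offset 0 of the dump, and `fileIsBitcode("a_lto.o")` should hold for both the ThinLTO and the full LTO object.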
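Running llvm-dis a_lto.o gives us readable IR. Comparing it with the IR produced under full LTO shows that the two differ very little, mainly in the summary section at the end. The ThinLTO and full LTO output for a_lto.o compare as follows.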
// ---------------- Thin LTO ----------------//
!llvm.module.flags = !{!0, !1, !2, !3}
!llvm.ident = !{!4}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"uwtable", i32 1}
!2 = !{i32 7, !"frame-pointer", i32 2}
!3 = !{i32 1, !"EnableSplitLTOUnit", i32 0}
!4 = !{!"clang version 14.0.0 (https://github.com/llvm/llvm-project.git 58e7bf78a3ef724b70304912fb3bb66af8c4a10c)"}
^0 = module: (path: "a_lto.o", hash: (3489747275, 1762444854, 1461358598, 2667786215, 1835806708))
^1 = gv: (name: "foo2", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), refs: (writeonly ^2)))) ; guid = 2494702099028631698
^2 = gv: (name: "i", summaries: (variable: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), varFlags: (readonly: 1, writeonly: 1, constant: 0)))) ; guid = 2708120569957007488
^3 = gv: (name: "foo1", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 13, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^5)), refs: (readonly ^2)))) ; guid = 7682762345278052905
^4 = gv: (name: "foo4") ; guid = 11564431941544006930
^5 = gv: (name: "foo3", summaries: (function: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^4))))) ; guid = 17367728344439303071
^6 = blockcount: 5
// ---------------- Full LTO ----------------//
!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"uwtable", i32 1}
!2 = !{i32 7, !"frame-pointer", i32 2}
!3 = !{i32 1, !"ThinLTO", i32 0}
!4 = !{i32 1, !"EnableSplitLTOUnit", i32 1}
!5 = !{!"clang version 14.0.0 (https://github.com/llvm/llvm-project.git 58e7bf78a3ef724b70304912fb3bb66af8c4a10c)"}
^0 = module: (path: "a_lto.o", hash: (0, 0, 0, 0, 0))
^1 = gv: (name: "foo2", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), refs: (^2)))) ; guid = 2494702099028631698
^2 = gv: (name: "i", summaries: (variable: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), varFlags: (readonly: 1, writeonly: 1, constant: 0)))) ; guid = 2708120569957007488
^3 = gv: (name: "foo1", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 13, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^5)), refs: (^2)))) ; guid = 7682762345278052905
^4 = gv: (name: "foo4") ; guid = 11564431941544006930
^5 = gv: (name: "foo3", summaries: (function: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^4))))) ; guid = 17367728344439303071
^6 = flags: 8
^7 = blockcount: 5
The key differences are highlighted below.
Difference | Thin LTO | Full LTO |
---|---|---|
Module Flags | (no ThinLTO flag) | !3 = !{i32 1, !"ThinLTO", i32 0} |
Global Value Summary module ^0 | ^0 = module: (path: "a_lto.o", hash: (3489747275, 1762444854, 1461358598, 2667786215, 1835806708)) | ^0 = module: (path: "a_lto.o", hash: (0, 0, 0, 0, 0)) |
Global Value Summary foo2 ^1 | - notEligibleToImport: 0 - refs: (writeonly ^2) | - notEligibleToImport: 1 - refs: (^2) |
Global Value Summary i ^2 | - notEligibleToImport: 0 | notEligibleToImport: 1 |
Global Value Summary foo1 ^3 | - notEligibleToImport: 0 - refs: (readonly ^2) | - notEligibleToImport: 1 - refs: (^2) |
Global Value Summary foo3 ^5 | notEligibleToImport: 0 | notEligibleToImport: 1 |
From the Metadata documentation we know that ! introduces metadata and ^ introduces a global value summary entry.
All metadata are identified in syntax by an exclamation point (‘!’).
Compiling with ThinLTO causes the building of a compact summary of the module that is emitted into the bitcode. The summary is emitted into the LLVM assembly and identified in syntax by a caret (‘^’).
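To make the ^N entries more concrete, the summary of a_lto.o shown earlier can be modelled as a small graph of call and reference edges. The sketch below is only an illustration: `SummaryEntry`, `buildExampleIndex`, `reachableFrom` and `namesWithoutSummary` are hypothetical names, not LLVM types, and the edges are copied by hand from the ThinLTO dump (foo1 calls foo3, foo3 calls foo4, foo1 and foo2 reference i).

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for one `^N = gv: (...)` entry; not LLVM's real types.
struct SummaryEntry {
  std::string name;
  bool hasSummary;         // false when the entry has a name but no summary
  std::vector<int> calls;  // slot numbers of callees
  std::vector<int> refs;   // slot numbers of referenced globals
};

// Hand-copied from the ThinLTO dump of a_lto.o (slots ^1 to ^5).
std::map<int, SummaryEntry> buildExampleIndex() {
  return {
      {1, SummaryEntry{"foo2", true, {}, {2}}},
      {2, SummaryEntry{"i", true, {}, {}}},
      {3, SummaryEntry{"foo1", true, {5}, {2}}},
      {4, SummaryEntry{"foo4", false, {}, {}}},
      {5, SummaryEntry{"foo3", true, {4}, {}}},
  };
}

// Collects every slot reachable from `root` through call and ref edges.
std::set<int> reachableFrom(const std::map<int, SummaryEntry> &index,
                            int root) {
  std::set<int> seen;
  std::vector<int> worklist{root};
  while (!worklist.empty()) {
    int slot = worklist.back();
    worklist.pop_back();
    if (!seen.insert(slot).second)
      continue;  // already visited
    auto it = index.find(slot);
    if (it == index.end())
      continue;  // unknown slot: nothing to follow
    for (int callee : it->second.calls)
      worklist.push_back(callee);
    for (int ref : it->second.refs)
      worklist.push_back(ref);
  }
  return seen;
}

// Lists the names that appear in the index without a summary, in slot order.
std::vector<std::string>
namesWithoutSummary(const std::map<int, SummaryEntry> &index) {
  std::vector<std::string> names;
  for (const auto &kv : index)
    if (!kv.second.hasSummary)
      names.push_back(kv.second.name);
  return names;
}
```

Walking from foo1 (slot 3) reaches i, foo3 and foo4 without touching any function body, which is the kind of analysis the summary is meant to make cheap. foo4 is the only entry without a summary because a.c only declares it.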
The Module Flags Metadata documentation explains !3 = !{i32 1, !"ThinLTO", i32 0}. Module flags metadata is a set of triplets:
- The first element is a behavior flag, which specifies the behavior when two (or more) modules are merged together.
- The second element is a metadata string that is a unique ID for the metadata.
- The third element is the value of the flag.
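The triplet layout described above can be sketched in a few lines of standalone C++. This is a simplified model, not LLVM's linker code: `ModuleFlag` and `mergeFlag` are hypothetical names, and only the two behaviors that appear in our dumps are handled, 1 (Error: the values must agree) and 7 (Max: keep the larger value).

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Two of the behaviors defined for module flags; the sketch ignores the rest.
enum class Behavior : std::uint32_t { Error = 1, Max = 7 };

// One module flag triplet: (behavior, ID, value).
struct ModuleFlag {
  Behavior behavior;
  std::string id;
  std::uint32_t value;
};

// Merges the same flag coming from two modules.
// Returns false when this simplified model rejects the merge.
bool mergeFlag(const ModuleFlag &a, const ModuleFlag &b, ModuleFlag &out) {
  // Assumption of this sketch: both sides must use the same ID and behavior.
  if (a.id != b.id || a.behavior != b.behavior)
    return false;
  switch (a.behavior) {
  case Behavior::Error:
    if (a.value != b.value)
      return false;  // conflicting values are an error
    out = a;
    return true;
  case Behavior::Max:
    out = ModuleFlag{a.behavior, a.id, std::max(a.value, b.value)};
    return true;
  }
  return false;
}
```

Under this model, merging !{i32 1, !"EnableSplitLTOUnit", i32 0} with !{i32 1, !"EnableSplitLTOUnit", i32 1} is rejected, while two !{i32 7, !"frame-pointer", ...} flags merge to the larger value.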
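In !3 = !{i32 1, !"ThinLTO", i32 0} the value of ThinLTO is 0, which marks the module as not ThinLTO. The other indicator of ThinLTO versus full LTO is the ID of the summary block: GLOBALVAL_SUMMARY_BLOCK is the default and means ThinLTO, while a full LTO object uses FULL_LTO_GLOBALVAL_SUMMARY_BLOCK.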
$ llvm-bcanalyzer -dump a_full_lto.o
Block ID #24 (FULL_LTO_GLOBALVAL_SUMMARY_BLOCK):
Num Instances: 1
Total Size: 789b/98.62B/24W
Percent of file: 3.4924%
Num SubBlocks: 0
Num Abbrevs: 6
Num Records: 7
Percent Abbrevs: 57.1429%
Record Histogram:
Count # Bits b/Rec % Abv Record Kind
3 218 72.7 100.00 PERMODULE
1 22 BLOCK_COUNT
1 22 FLAGS
1 22 VERSION
1 38 100.00 PERMODULE_GLOBALVAR_INIT_REFS
$ llvm-bcanalyzer -dump a_thin_lto.o
Block ID #20 (GLOBALVAL_SUMMARY_BLOCK):
Num Instances: 1
Total Size: 789b/98.62B/24W
Percent of file: 3.4727%
Num SubBlocks: 0
Num Abbrevs: 6
Num Records: 7
Percent Abbrevs: 57.1429%
Record Histogram:
Count # Bits b/Rec % Abv Record Kind
3 218 72.7 100.00 PERMODULE
1 22 BLOCK_COUNT
1 22 FLAGS
1 22 VERSION
1 38 100.00 PERMODULE_GLOBALVAR_INIT_REFS
When a global value summary is present, the default is ThinLTO, unless the ThinLTO module flag is set to 0.
/// Emit the per-module summary section alongside the rest of
/// the module's bitcode.
void ModuleBitcodeWriterBase::writePerModuleGlobalValueSummary() {
  // By default we compile with ThinLTO if the module has a summary, but the
  // client can request full LTO with a module flag.
  bool IsThinLTO = true;
  if (auto *MD =
          mdconst::extract_or_null<ConstantInt>(M.getModuleFlag("ThinLTO")))
    IsThinLTO = MD->getZExtValue();
  Stream.EnterSubblock(IsThinLTO ? bitc::GLOBALVAL_SUMMARY_BLOCK_ID
                                 : bitc::FULL_LTO_GLOBALVAL_SUMMARY_BLOCK_ID,
                       4);
  // ...
}
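The decision made in the excerpt above can be reproduced without any LLVM headers. In the sketch below the module flags are modelled as a plain map from flag name to value, which is an assumption made for illustration (`ModuleFlags` and `pickSummaryBlock` are hypothetical names). The block IDs 20 and 24 are the ones reported by the llvm-bcanalyzer dumps.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Block IDs as reported by llvm-bcanalyzer for the two objects.
enum SummaryBlockId : unsigned {
  GLOBALVAL_SUMMARY_BLOCK_ID = 20,
  FULL_LTO_GLOBALVAL_SUMMARY_BLOCK_ID = 24,
};

// Hypothetical stand-in for a module's flags: flag name -> integer value.
using ModuleFlags = std::map<std::string, std::uint64_t>;

// Mirrors the logic of the excerpt: ThinLTO is the default, and only an
// explicit "ThinLTO" flag with value 0 selects the full LTO summary block.
SummaryBlockId pickSummaryBlock(const ModuleFlags &flags) {
  bool isThinLTO = true;
  auto it = flags.find("ThinLTO");
  if (it != flags.end())
    isThinLTO = it->second != 0;
  return isThinLTO ? GLOBALVAL_SUMMARY_BLOCK_ID
                   : FULL_LTO_GLOBALVAL_SUMMARY_BLOCK_ID;
}
```

Feeding it the flags of our two objects gives block 20 for the ThinLTO object, which has no ThinLTO flag at all, and block 24 for the full LTO object, which carries ThinLTO = 0.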
RFC
https://lists.llvm.org/pipermail/llvm-dev/2015-May/085526.html
https://sites.google.com/site/llvmthinlto/
Patches
https://reviews.llvm.org/D13107?id=35761
Function Importer
https://reviews.llvm.org/D14914
https://reviews.llvm.org/D18343
Related to llvm-opt2/llvm-opt
Discussion of SyntheticCount
- https://lists.llvm.org/pipermail/llvm-dev/2017-December/119701.html
- https://reviews.llvm.org/D43521?id=135117#inline-388028
/// Compute synthetic function entry counts.
void computeSyntheticCounts(ModuleSummaryIndex &Index);
Terminology
- BFI: block frequency information
- BPI: branch probability information
- CGSCC: call graph SCC (strongly connected component) analysis, https://lists.llvm.org/pipermail/llvm-dev/2016-June/100792.html