Note: machine translation, not proofread.
How I cut GTA Online loading times by 70%
t0st
2021-02-28
GTA Online. Infamous for its slow loading times. Having picked up the game again to finish some of the newer heists I was shocked (/s) to discover that it still loads just as slow as the day it was released 7 years ago.
It was time. Time to get to the bottom of this.
Recon
First I wanted to check if someone had already solved this problem. Most of the results I found pointed towards anecdata about how the game is so sophisticated that it needs to load so long, stories on how the p2p network architecture is rubbish (not saying that it isn’t), some elaborate ways of loading into story mode and a solo session after that and a couple of mods that allowed skipping the startup R* logo video. Some more reading told me we could save a whopping 10-30 seconds with these combined!
Meanwhile on my PC…
Benchmark
Story mode load time: ~1m 10s
Online mode load time: ~6m flat
Startup menu disabled, time from R* logo until in-game (social club login time isn't counted).
Old but decent CPU: AMD FX-8350
Cheap-o SSD: KINGSTON SA400S37120G
We have to have RAM: 2x Kingston 8192 MB (DDR3-1337) 99U5471
Good-ish GPU: NVIDIA GeForce GTX 1070
I know my setup is dated but what on earth could take 6x longer to load into online mode? I couldn’t measure any difference using the story-to-online loading technique as others have found before me. Even if it did work the results would be down in the noise.
I Am (Not) Alone
If this poll is to be trusted then the issue is widespread enough to mildly annoy more than 80% of the player base. It’s been 7 years R*!
Looking around a bit to find who are the lucky ~20% that get sub 3 minute load times I came across a few benchmarks with high-end gaming PCs and an online mode load time of about 2 minutes. I would kill, er, hack for a 2 minute load time! It does seem to be hardware-dependent but something doesn’t add up here…
How come their story mode still takes near a minute to load? (The M.2 one didn’t count the startup logos btw.) Also, loading story to online takes them only a minute more while I’m getting about five more. I know that their hardware specs are a lot better but surely not 5x better.
Highly accurate measurements
Armed with such powerful tools as the Task Manager I began to investigate what resources could be the bottleneck.
After taking a minute to load the common resources used for both story and online modes (which is near on par with high-end PCs) GTA decides to max out a single core on my machine for four minutes and do nothing else.
Disk usage? None! Network usage? There’s a bit, but it drops basically to zero after a few seconds (apart from loading the rotating info banners). GPU usage? Zero. Memory usage? Completely flat…
What, is it mining crypto or something? I smell code. Really bad code.
Single thread-bound
While my old AMD CPU has 8 cores and it does pack a punch, it was made in the olden days. Back when AMD’s single-thread performance was way behind Intel’s. This might not explain all of the load time differences but it should explain most of it.
What’s odd is that it’s using up just the CPU. I was expecting vast amounts of disk reads loading up resources or loads of network requests trying to negotiate a session in the p2p network. But this? This is probably a bug.
Profiling
Profilers are a great way of finding CPU bottlenecks. There’s only one problem - most of them rely on instrumenting the source code to get a perfect picture of what’s happening in the process. And I don’t have the source code. Nor do I need microsecond-perfect readings - I have 4 minutes’ worth of a bottleneck.
Enter stack sampling: for closed source applications there’s only one option. Dump the running process’ stack and current instruction pointer’s location to build a calling tree in set intervals. Then add them up to get statistics on what’s going on. There’s only one profiler that I know of (might be ignorant here) that can do this on Windows. And it hasn’t been updated in over 10 years. It’s Luke Stackwalker! Someone, please give this project some love 😃
Normally Luke would group the same functions together but since I don’t have debugging symbols I had to eyeball nearby addresses to guess if it’s the same place. And what do we see? Not one bottleneck but two of them!
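The "add them up" step can be sketched in a few lines of C. This is only an illustration of the idea, not how Luke Stackwalker works internally: the names `hotspot_t` and `cluster_samples` and the `window` threshold are made up here, and the input is assumed to be a plain array of sampled instruction pointers. Without symbols, addresses that sit close together are treated as "probably the same function", which is the eyeballing described above done mechanically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One hot spot: a run of nearby sampled addresses and how often they were hit. */
typedef struct {
    uint64_t lowest;   /* lowest address in the cluster */
    size_t   hits;     /* number of samples that landed in it */
} hotspot_t;

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Sorts the sampled instruction pointers in place, then merges neighbours
 * that are closer than `window` bytes into one cluster (a stand-in for
 * "probably the same function" when there are no symbols).
 * Writes at most `max_out` clusters and returns how many were written.
 */
size_t cluster_samples(uint64_t *samples, size_t n, uint64_t window,
                       hotspot_t *out, size_t max_out)
{
    assert(samples != NULL && out != NULL);
    if (n == 0 || max_out == 0)
        return 0;

    qsort(samples, n, sizeof *samples, cmp_u64);

    size_t count = 0;
    out[0].lowest = samples[0];
    out[0].hits = 1;
    for (size_t i = 1; i < n; i++) {
        if (samples[i] - samples[i - 1] < window) {
            out[count].hits++;              /* same cluster as the previous sample */
        } else {
            if (count + 1 == max_out)
                break;                      /* output buffer is full */
            count++;
            out[count].lowest = samples[i];
            out[count].hits = 1;
        }
    }
    return count + 1;
}
```

With a few minutes of samples, the clusters with the most hits are the places worth opening in a disassembler first.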
Down the rabbit hole
Having borrowed my friend’s completely legitimate copy of the industry-standard disassembler (no, I really can’t afford the thing… gonna learn to ghidra one of these days) I went to take GTA apart.
That doesn’t look right at all. Most high-profile games come with built-in protection against reverse engineering to keep away pirates, cheaters, and modders. Not that it has ever stopped them.
There seems to be some sort of an obfuscation/encryption at play here that has replaced most instructions with gibberish. Not to worry, we simply need to dump the game’s memory while it’s executing the part we want to look at. The instructions have to be de-obfuscated before running one way or another. I had Process Dump lying around, so I used that, but there are plenty of other tools available to do this sort of thing.
Problem one: It’s… strlen?!
Disassembling the now-less-obfuscated dump reveals that one of the addresses has a label pulled out of somewhere! It’s strlen? Going down the call stack the next one is labeled vscan_fn and after that the labels end, tho I’m fairly confident it’s sscanf.
It’s parsing something. Parsing what? Untangling the disassembly would take forever so I decided to dump some samples from the running process using x64dbg. Some debug-stepping later it turns out it’s… JSON! They’re parsing JSON. A whopping 10 megabytes worth of JSON with some 63k item entries.
...,
{
    "key": "WP_WCT_TINT_21_t2_v9_n2",
    "price": 45000,
    "statName": "CHAR_KIT_FM_PURCHASE20",
    "storageType": "BITFIELD",
    "bitShift": 7,
    "bitSize": 1,
    "category": ["CATEGORY_WEAPON_MOD"]
},
...
What is it? It appears to be data for a “net shop catalog” according to some references. I assume it contains a list of all the possible items and upgrades you can buy in GTA Online.
But 10 megs? That’s nothing! And using sscanf may not be optimal but surely it’s not that bad? Well…
Yeah, that’s gonna take a while… To be fair I had no idea most sscanf implementations called strlen so I can’t blame the developer who wrote this. I would assume it just scanned byte by byte and could stop on a NULL.
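To make the pattern concrete, here is a minimal sketch, assuming a buffer of whitespace-separated integers rather than the game's JSON. Both function names are made up for illustration. The first walks the buffer token by token with sscanf; on a C library whose sscanf starts by measuring the input with strlen, every call rescans everything that is left, so the total work grows roughly with the square of the buffer size. The second gets the same values with strtol, which only looks at the characters of the number itself. Whether a given C library actually behaves this way is implementation-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Token-by-token parsing with sscanf. Each call receives a pointer into the
 * SAME big buffer; a C library that starts sscanf with strlen() walks to the
 * end of the buffer every time, so total work grows roughly quadratically.
 */
size_t parse_with_sscanf(const char *buf, long *out, size_t max_out)
{
    assert(buf != NULL && out != NULL);
    size_t count = 0;
    int used = 0;
    while (count < max_out && sscanf(buf, "%ld%n", &out[count], &used) == 1) {
        buf += used;    /* advance past the number just read */
        count++;
    }
    return count;
}

/*
 * Same result with strtol, which stops at the first character that is not
 * part of the number and does not need the length of the rest of the buffer.
 */
size_t parse_with_strtol(const char *buf, long *out, size_t max_out)
{
    assert(buf != NULL && out != NULL);
    size_t count = 0;
    while (count < max_out) {
        char *end;
        long value = strtol(buf, &end, 10);
        if (end == buf)
            break;      /* no digits consumed: end of input or bad token */
        out[count++] = value;
        buf = end;
    }
    return count;
}
```

On a short string the two are indistinguishable; the difference only shows up when the buffer is megabytes long and is scanned tens of thousands of times.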
Problem two: Let’s use a Hash- … Array?
Turns out the second offender is called right next to the first one. They’re both even called in the same if statement as seen in this ugly decompilation:
All labels are mine, no idea what the functions/parameters are actually called.
The second problem? Right after parsing an item, it’s stored in an array (or an inlined C++ list? not sure). Each entry looks something like this:
struct {
    uint64_t *hash;
    item_t *item;
} entry;
But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless. You have unique hashes why not use a hash map.
I named it hashmap while reversing but it’s clearly not_a_hashmap. And it gets even better. The hash-array-list-thing is empty before loading the JSON. And all of the items in the JSON are unique! They don’t even need to check if it’s in the list or not! They even have a function to directly insert the items! Just use that! Srsly, WAT!?
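A rough way to see the gap is to count comparisons. The sketch below is not the game's code: all names are invented, keys are plain nonzero integers, and the hash table is a bare-bones open-addressing one chosen only to keep the example short. Inserting n unique keys with a scan-before-insert array costs on the order of n²/2 comparisons (about two billion for ~63k entries), while the hashed version needs only a handful of probes per key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Insert-if-absent into a plain array; returns the number of comparisons made. */
uint64_t insert_linear(uint64_t *items, size_t *len, uint64_t key)
{
    uint64_t compares = 0;
    for (size_t i = 0; i < *len; i++) {
        compares++;
        if (items[i] == key)
            return compares;        /* duplicate: nothing to store */
    }
    items[(*len)++] = key;
    return compares;
}

/*
 * Insert-if-absent into an open-addressing table with linear probing.
 * `cap` must be a power of two, larger than the number of keys, and
 * 0 is reserved as the "empty slot" marker (so keys must be nonzero).
 */
uint64_t insert_hashed(uint64_t *slots, size_t cap, uint64_t key)
{
    assert(key != 0 && (cap & (cap - 1)) == 0);
    uint64_t compares = 0;
    size_t i = (size_t)((key * 0x9E3779B97F4A7C15ull) >> 32) & (cap - 1);
    while (slots[i] != 0) {
        compares++;
        if (slots[i] == key)
            return compares;        /* duplicate */
        i = (i + 1) & (cap - 1);    /* probe the next slot */
    }
    slots[i] = key;
    return compares;
}

/* Total comparisons for inserting the unique keys 1..n into an array. */
uint64_t total_linear(size_t n)
{
    uint64_t *items = malloc(n * sizeof *items);
    assert(items != NULL);
    size_t len = 0;
    uint64_t total = 0;
    for (size_t k = 1; k <= n; k++)
        total += insert_linear(items, &len, (uint64_t)k);
    free(items);
    return total;
}

/* Total comparisons for inserting the unique keys 1..n into the hash table. */
uint64_t total_hashed(size_t n, size_t cap)
{
    uint64_t *slots = calloc(cap, sizeof *slots);
    assert(slots != NULL && cap > n);
    uint64_t total = 0;
    for (size_t k = 1; k <= n; k++)
        total += insert_hashed(slots, cap, (uint64_t)k);
    free(slots);
    return total;
}
```

And when the container starts empty and the keys are known to be unique, as here, even the hashed lookup is more than is needed: a direct append does the job.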
PoC
Now that’s nice and all, but no one is going to take me seriously unless I test this so I can write a clickbait title for the post.
The plan? Write a .dll, inject it in GTA, hook some functions, ???, profit.
The JSON problem is hairy, I can’t realistically replace their parser. Replacing sscanf with one that doesn’t depend on strlen would be more realistic. But there’s an even easier way.
- hook strlen
- wait for a long string
- “cache” the start and length of it
- if it’s called again within the string’s range, return cached value
Something like:
size_t strlen_cacher(char* str)
{
    static char* start;
    static char* end;
    size_t len;
    const size_t cap = 20000;

    // if we have a "cached" string and current pointer is within it
    if (start && str >= start && str <= end) {
        // calculate the new strlen
        len = end - str;

        // if we're near the end, unload self
        // we don't want to mess something else up
        if (len < cap / 2)
            MH_DisableHook((LPVOID)strlen_addr);

        // super-fast return!
        return len;
    }

    // count the actual length
    // we need at least one measurement of the large JSON
    // or normal strlen for other strings
    len = builtin_strlen(str);

    // if it was the really long string
    // save its start and end addresses
    if (len > cap) {
        start = str;
        end = str + len;
    }

    // slow, boring return
    return len;
}
And as for the hash-array problem, it’s more straightforward - just skip the duplicate checks entirely and insert the items directly since we know the values are unique.
char __fastcall netcat_insert_dedupe_hooked(uint64_t catalog, uint64_t* key, uint64_t* item)
{
    // didn't bother reversing the structure
    uint64_t not_a_hashmap = catalog + 88;

    // no idea what this does, but repeat what the original did
    if (!(*(uint8_t(__fastcall**)(uint64_t*))(*item + 48))(item))
        return 0;

    // insert directly
    netcat_insert_direct(not_a_hashmap, key, &item);

    // remove hooks when the last item's hash is hit
    // and unload the .dll, we are done here :)
    if (*key == 0x7FFFD6BE) {
        MH_DisableHook((LPVOID)netcat_insert_dedupe_addr);
        unload();
    }

    return 1;
}
Full source of PoC here.
Results
Well, did it work then?
Original online mode load time: ~6m flat
Time with only duplication check patch: 4m 30s
Time with only JSON parser patch: 2m 50s
Time with both issues patched: 1m 50s
(6*60 - (1*60+50)) / (6*60) = 69.4% load time improvement (nice!)
Hell yes, it did! 😃)
Most likely, this won’t solve everyone’s load times - there might be other bottlenecks on different systems, but it’s such a gaping hole that I have no idea how R* has missed it all these years.
tl;dr
- There’s a single thread CPU bottleneck while starting up GTA Online
- It turns out GTA struggles to parse a 10MB JSON file
- The JSON parser itself is poorly built / naive and
- After parsing there’s a slow item de-duplication routine
R* please fix
If this somehow reaches Rockstar: the problems shouldn’t take more than a day for a single dev to solve. Please do something about it :<
You could either switch to a hashmap for the de-duplication or completely skip it on startup as a faster fix. For the JSON parser - just swap out the library for a more performant one. I don’t think there’s any easier way out.
ty ❤️
Small update
I was expecting to get some attention but nowhere near this much! After reaching the top of HN this post has spread like wildfire! Thank you for the overwhelming response 😃
I’ll do more writing if something interesting comes along, but don’t expect anything of this scale soon - there was a lot of luck involved.
A few people suggested spamming this post to Rockstar’s support - please don’t! I’m sure they’ve seen this by now. Continuing would only bog down support tickets for everyone else. Social media is fair game in my book tho.
Several HN comments suggested I add a donate button, as they would like to buy me a beer (thank you!) so I’m placing a link in the footer.
Thank you for reading and all the support 😃
Update 2021-03-15
- Got confirmation from R* that this is getting a fix soon
- Just got awarded $10k through their H1 in-game bounty as an exception 😃) (usually only for security issues)
- Trying to figure out what’s a W8 and how to fill it (lol)
- I did try asking for more technical details but they couldn’t say anything
- Will do another benchmark on my same old setup as soon as the update is out, I’m sure their engineers won’t disappoint 😃
Update 2021-03-16
R* released the update! Downloaded it and got my first run results - same hardware, same measurement - from R* logo to fully online.
Fully fixed! t0st approves!
Thanks again for all the coffees, and thanks to R* for taking the time to look into this and the generous bounty!
Copyright © 2021-2023 t0st
via:
-
How I cut GTA Online loading times by 70%
https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times-by-70/
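The real experts are among the people! The slow Online mode loading problem is solved! GTA 6 is never going to arrive now!
Original article, Keylol community (其乐社区), 2021-03-16
Most of you probably have a love-hate relationship with GTA5: it keeps showing up in the charts because it is fun, and it is hated for its slow loading. Now the loading problem has finally been solved!
And the person who came up with the fix isn't even from Rockstar!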
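Key point 01: Six years after release, someone finally discovered that GTA5 Online's long loading times come from elementary flaws in how the game handles its data
One day an author, who had apparently not played GTA Online in a long time, wanted to get back into the game to see what had been added, only to find that loading this mode was still as painfully slow as it had been six years earlier. He read through some online discussions, which all put the problem down to "the game's content being too large" or "the P2P network", but he felt it was not that simple and decided to get to the bottom of it.
He first used Task Manager to take a quick look at the whole Online loading process. Oddly, once the single-player content had loaded, loading the multiplayer part only exercised a single CPU core, with almost no disk reads, no memory writes, and no network activity. For an online game that is very strange.
Without access to the source code, the author then used Luke Stackwalker, an old tool from ten years ago, to do stack sampling and analyze the call tree. He found not one but two performance bottlenecks (though at that point he did not yet know what exactly caused them).
Then, with the help of an industry-grade reverse-engineering tool plus x64dbg and Process Dump, the author found that the game had elementary data-structure flaws: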
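Elementary flaw #1
- While loading, Online mode parses every purchasable in-game item out of roughly 10MB of JSON data, about 63,000 items in total.
- The genius engineer who wrote this parsing code used sscanf() and strlen(), and what makes it worse is that the way values are read out apparently ends up walking through this 10MB string character by character.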
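Elementary flaw #2
- After an item is parsed out of the JSON, the game stores it in an array or list. (In fact this step is completely unnecessary, see below.)
- Before storing it, however, this (or another) genius engineer walks the entire array and checks for duplicates by the item's hash. But if every item has a different hash, why not use a hash map?
- In fact, since the so-called hash array is empty before the JSON is loaded and every item's hash is different, there is no need to check whether an item is already in the array at all! Even more absurdly, this (or another) genius engineer at Rockstar even wrote a dedicated function that handles inserting items.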
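The fix
Based on the analysis above, the original author proposed a fix for both problems:
- hook strlen()
- wait for a long string
- "cache" the start and end of the string
- if it is called again within the string's range, return the cached value
For flaw #2 discussed above, simply skip the duplicate check.
The full code is available by sending a private message to the official account with the reply “GTA代码”.
With this patch applied, Online mode loading time dropped by 70%.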
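Key point 02: Rockstar shipped an update fixing GTA5's slow startup loading and awarded $10,000 to tostercx, who found the problem
The good news is that Rockstar has now shipped this fix, and players can log in to Steam to update GTAV!
A forum member ran a loading test after updating the game:
- Loading straight into Online mode: 2 min 10 s
- Entering story mode first and then loading Online mode: 26 s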
via:
-
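The real experts are among the people! The slow Online mode loading problem is solved! GTA 6 is never going to arrive now! Original article, Keylol community (其乐社区), 2021-03-16 23:39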
https://mp.weixin.qq.com/s/tNxLZkCi4pxBnFZ5Gs5caw