Table of Contents
- 1. Purpose
- 2. Correct Usage Example
- 3. Fixing Incorrect Usage
- 3.1 Incorrect usage
- 3.2 Let AddressSanitizer report the error
- 3.3 Explanation
- 4. Summary
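1. Purpose
While reading the meta information of a PGM image, I used the format string %2s. I was not familiar with it before, and after experimenting I found that careless use easily causes an out-of-bounds memory write. An out-of-bounds write does not always crash the running program immediately, which raises the cost of tracking down the bug. My notes follow.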
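2. Correct Usage Example
When reading the meta information of a .pgm image file, the first line holds the two-character string P5. The code to read it is: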
char magic[3];
fscanf(fp, "%2s", magic);
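This skips any leading whitespace, reads at most 2 characters from the file handle fp, and stores them in the memory buffer magic.
The same works for input from the console (stdin):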
char buf[3];
scanf("%2s", buf);
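Q: We read at most 2 characters, so why allocate 3? Is that necessary?
A: Yes, it is necessary.
Q: What if I insist on allocating only 2? The program runs without crashing.
A: An out-of-bounds write does not always crash immediately. It crashes only when the access faults, for example by landing on an unmapped page. Turn on AddressSanitizer and it will tell you that you went out of bounds.
Q: I don't get it. Why do fscanf and scanf "meddle"? What does that extra character have to do with them?
A: Print the resulting string and you will see.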
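The read described in this section can be wrapped in a small function that also checks the return value of fscanf. This is only a minimal sketch: the helper name read_pgm_magic is hypothetical (it is not part of the original code), and it assumes the stream is positioned at the very start of the file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Reads the magic number of a binary PGM file into `magic`, which must
 * have room for 3 bytes: up to 2 characters plus the terminating '\0'
 * that fscanf appends.
 * Returns 1 if the stream starts with "P5", 0 otherwise. */
int read_pgm_magic(FILE *fp, char magic[3])
{
    /* fscanf returns the number of successful conversions (or EOF);
     * anything other than 1 means no string was stored in `magic`. */
    if (fscanf(fp, "%2s", magic) != 1) {
        return 0;
    }
    return strcmp(magic, "P5") == 0;
}
```

A caller would declare char magic[3]; and pass an already opened FILE *. Checking the return value matters because on an empty or unreadable file the buffer is left untouched.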
3. Fixing Incorrect Usage
3.1 Incorrect usage
// test.c
#include <stdio.h>

int ex1()
{
    char buf[2];                 // only 2 bytes allocated: no room for '\0'
    scanf("%2s", buf);           // read at most 2 characters
    printf("buf is %s\n", buf);  // print the result
    return 0;
}

int main()
{
    ex1();
    return 0;
}
Compile and run:
zz@Legion-R7000P% gcc test.c
zz@Legion-R7000P% ./a.out
he
buf is he
At first glance the program runs fine, and it feels like time to call it a day and go home for dinner.
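3.2 Let AddressSanitizer report the error
AddressSanitizer is usually enabled with the compile options -fsanitize=address -fno-omit-frame-pointer -g. At run time the program terminates at the first out-of-bounds access and prints the error type and its cause.
GCC is used here; Visual Studio or Xcode also work. Visual Studio needs VS2019 version 16.7 or later, or VS2022, for AddressSanitizer support.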
zz@Legion-R7000P% gcc test1.c -fsanitize=address -fno-omit-frame-pointer -g
zz@Legion-R7000P% ./a.out
he
=================================================================
==162004==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc96f1fd32 at pc 0x7fe14c1d786c bp 0x7ffc96f1fbb0 sp 0x7ffc96f1f338
WRITE of size 3 at 0x7ffc96f1fd32 thread T0
#0 0x7fe14c1d786b in scanf_common ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors_format.inc:342
#1 0x7fe14c1d84d3 in __interceptor___isoc99_vscanf ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1530
#2 0x7fe14c1d85e6 in __interceptor___isoc99_scanf ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1551
#3 0x55bca1d712ca in ex1 /home/zz/work/lenet_c/test1.c:7
#4 0x55bca1d71354 in main /home/zz/work/lenet_c/test1.c:14
#5 0x7fe14bf7cd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#6 0x7fe14bf7ce3f in __libc_start_main_impl ../csu/libc-start.c:392
#7 0x55bca1d71164 in _start (/home/zz/work/lenet_c/a.out+0x1164)
Address 0x7ffc96f1fd32 is located in stack of thread T0 at offset 34 in frame
#0 0x55bca1d71238 in ex1 /home/zz/work/lenet_c/test1.c:5
This frame has 1 object(s):
[32, 34) 'buf' (line 6) <== Memory access at offset 34 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors_format.inc:342 in scanf_common
Shadow bytes around the buggy address:
0x100012ddbf50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100012ddbf60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100012ddbf70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100012ddbf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100012ddbf90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100012ddbfa0: 00 00 f1 f1 f1 f1[02]f3 f3 f3 00 00 00 00 00 00
0x100012ddbfb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100012ddbfc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100012ddbfd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100012ddbfe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100012ddbff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==162004==ABORTING
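3.3 Explanation
scanf/fscanf are not "meddling". A C string always needs one extra byte for the terminating \0. If scanf/fscanf wrote only the n characters and nothing else, later calls such as strlen() could not find the end of the string, because there would be no \0 to stop at.
Printing the contents of buf before and after the scanf()/fscanf() call shows that once 2 characters have been read, the 3rd byte is automatically set to 0.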
// test.c
#include <stdio.h>

int ex1()
{
    char buf[2];                 // bug: no room for the terminating '\0'
    scanf("%2s", buf);
    printf("buf is %s\n", buf);
    return 0;
}

// Print each byte of buf as an integer.
void print_buf(char* buf, int len)
{
    for (int i = 0; i < len; i++)
    {
        printf("buf[%d] = %d\n", i, buf[i]);
    }
}

int ex2()
{
    char buf[3] = {1, 1, 1};     // pre-fill so the written '\0' is visible
    print_buf(buf, 3);
    int n = scanf("%2s", buf);   // n is the number of successful conversions
    printf("n = %d\n", n);
    printf("-----\n");
    print_buf(buf, 3);
    return 0;
}

int main()
{
    //ex1();
    ex2();
    return 0;
}
Running it gives:
zz@Legion-R7000P% gcc test.c -fsanitize=address -fno-omit-frame-pointer -g
zz@Legion-R7000P% ./a.out
buf[0] = 1
buf[1] = 1
buf[2] = 1
he
n = 1
-----
buf[0] = 104
buf[1] = 101
buf[2] = 0
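The same observation can be reproduced without typing input by hand, by scanning from a fixed string with sscanf. The following is only a sketch: scan_two_chars is a hypothetical helper introduced for illustration. It mirrors ex2 by pre-filling a 3-byte buffer with 1s and exposing the raw bytes afterwards.

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only.
 * Scans at most 2 non-whitespace characters from `input` into a 3-byte
 * buffer pre-filled with 1s, then copies the three raw bytes to `out`
 * so the caller can inspect exactly what sscanf wrote.
 * Returns whatever sscanf returned. */
int scan_two_chars(const char *input, char out[3])
{
    char buf[3] = {1, 1, 1};
    int n = sscanf(input, "%2s", buf);
    for (int i = 0; i < 3; i++) {
        out[i] = buf[i];
    }
    return n;
}
```

For the input "hello" the bytes come back as 104, 101, 0, matching the run above. For the shorter input "h" they are 104, 0, 1: the \0 is written right after the last character stored, so it lands at index 2 only when two characters are actually read.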
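4. Summary
Both scanf() and fscanf() support reading at most n characters. Usually the goal is to read exactly n characters, and this saves the trouble of writing a loop by hand.
A string holding up to n characters needs n+1 bytes of memory, with the last byte reserved for \0:
- If the buffer is only n bytes, the program may not crash, but nothing guarantees that it never will. Enable AddressSanitizer during development to catch this kind of overflow early.
- If the buffer is at least n+1 bytes, scanf()/fscanf() writes \0 right after the last character stored, which is index n when n characters are read.
- Correct usage, once more: to read (at most) 2 characters, the code is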
char buf[3];
int n = scanf("%2s", buf);
printf("buf = %s\n", buf);
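One way to keep the field width and the buffer size from drifting apart is to derive both from a single constant. The sketch below uses the preprocessor's stringizing operator. The names MAGIC_LEN, STR_, STR and read_token are all hypothetical and chosen only for illustration, and sscanf stands in for scanf so the example needs no console input.

```c
#include <stdio.h>

#define MAGIC_LEN 2          /* maximum number of characters to read */
#define STR_(x) #x           /* turns its argument into a string literal */
#define STR(x) STR_(x)       /* expands MAGIC_LEN to 2 before stringizing */

/* Hypothetical helper, for illustration only.
 * Reads at most MAGIC_LEN characters from `input` into `buf`, which must
 * hold MAGIC_LEN + 1 bytes (the extra byte is for the terminating '\0').
 * Returns whatever sscanf returned. */
int read_token(const char *input, char buf[MAGIC_LEN + 1])
{
    /* Adjacent string literals are concatenated at compile time,
     * so the format below is exactly "%2s". */
    return sscanf(input, "%" STR(MAGIC_LEN) "s", buf);
}
```

A caller declares char buf[MAGIC_LEN + 1]; so changing MAGIC_LEN updates the width and the buffer together. The two-level macro is needed because #x applied directly to MAGIC_LEN would produce the text "MAGIC_LEN" rather than "2".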