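Observation data sources
As I currently understand it, the commonly used trace-related data sources, i.e. probes, fall roughly into four categories.
Kernel
  tracepoint
  kprobe
User program
  USDT
  uprobe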
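In user programs, USDT (User Statically-Defined Tracing) is the static tracepoint, analogous to a tracepoint in kernel code. You instrument the code at development time using the USDT library and header files, and the tracepoints can then be enabled at run time. uprobe, by contrast, follows on from kprobe and is a dynamic probe. It depends on kernel support: when execution reaches the probed instruction (or, for a return probe, when the function returns), the kernel runs the injected probe handler, and that is what produces the trace.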
Kernel support
Quoting from the references:
Uprobe-tracer: Uprobe-based Event Tracing
=========================================
Documentation written by Srikar Dronamraju
Overview
--------
Uprobe based trace events are similar to kprobe based trace events.
To enable this feature, build your kernel with CONFIG_UPROBE_EVENTS=y.
Similar to the kprobe-event tracer, this doesn't need to be activated via
current_tracer. Instead of that, add probe points via
/sys/kernel/debug/tracing/uprobe_events, and enable it via
/sys/kernel/debug/tracing/events/uprobes/<EVENT>/enabled.
However unlike kprobe-event tracer, the uprobe event interface expects the
user to calculate the offset of the probepoint in the object.
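As a rough illustration of "calculating the offset", the sketch below builds a definition line in the `p:GROUP/EVENT PATH:OFFSET` form used by uprobe_events. The address 0x4005ec is the one perf script reports for print_curr_state_one later in this article. The load base 0x400000 and the event name state_one are my assumptions: the subtraction only holds for a non-PIE binary whose text segment is mapped at 0x400000 from file offset 0, which may not match your build.

```shell
#!/bin/sh
# Build a uprobe_events definition line: p:GROUP/EVENT PATH:OFFSET
uprobe_def() {
    printf 'p:%s/%s %s:%s\n' "$1" "$2" "$3" "$4"
}

# ASSUMPTION: non-PIE binary, text mapped at 0x400000 from file offset 0,
# so file offset = symbol address - 0x400000.
offset=$(printf '0x%x' $((0x4005ec - 0x400000)))

# "state_one" is a hypothetical event name chosen for this sketch.
uprobe_def uprobes state_one /home/root/test_uprobe/a.out "$offset"

# To register and enable the probe (needs root and a mounted tracefs):
#   uprobe_def uprobes state_one /home/root/test_uprobe/a.out "$offset" \
#       >> /sys/kernel/debug/tracing/uprobe_events
#   echo 1 > /sys/kernel/debug/tracing/events/uprobes/state_one/enable
```

The script only prints the definition line. The two commented commands show where it would be written; note that the per-event control file in tracefs is named enable.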
6.6. Dynamic Tracing
For kernel analysis, I'm using CONFIG_KPROBES=y and CONFIG_KPROBE_EVENTS=y, to enable kernel dynamic tracing, and CONFIG_FRAME_POINTER=y, for frame pointer-based kernel stacks. For user-level analysis, CONFIG_UPROBES=y and CONFIG_UPROBE_EVENTS=y, for user-level dynamic tracing.
Kernel Config: 3.8.6
Here are some kernel CONFIG options for perf_events functionality:
# for perf_events:
CONFIG_PERF_EVENTS=y
# for stack traces:
CONFIG_FRAME_POINTER=y
# kernel symbols:
CONFIG_KALLSYMS=y
# tracepoints:
CONFIG_TRACEPOINTS=y
# kernel function trace:
CONFIG_FTRACE=y
# kernel-level dynamic tracing:
CONFIG_KPROBES=y
CONFIG_KPROBE_EVENTS=y
# user-level dynamic tracing:
CONFIG_UPROBES=y
CONFIG_UPROBE_EVENTS=y
# full kernel debug info:
CONFIG_DEBUG_INFO=y
# kernel lock tracing:
CONFIG_LOCKDEP=y
# kernel lock tracing:
CONFIG_LOCK_STAT=y
# kernel dynamic tracepoint variables:
CONFIG_DEBUG_INFO=y
You may need to build your own kernel to enable these. The exact set you need depends on your needs and kernel version, and list is likely to grow as new features are added to perf_events.
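Before rebuilding anything, it may be worth checking which of these options the running kernel already has. The following is a minimal sketch: the helper name check_kconfig is made up for this example, and it assumes the kernel config is exposed either as /proc/config.gz or as /boot/config-$(uname -r), which is not the case on every system.

```shell
#!/bin/sh
# Read a kernel config from stdin and report, for each option named on the
# command line, whether it is set to "y" (hypothetical helper for this sketch).
check_kconfig() {
    cfg=$(cat)
    for opt in "$@"; do
        if printf '%s\n' "$cfg" | grep -qx "${opt}=y"; then
            echo "$opt: enabled"
        else
            echo "$opt: not set"
        fi
    done
}

# Options quoted above for kernel-level and user-level dynamic tracing.
wanted="CONFIG_KPROBES CONFIG_KPROBE_EVENTS CONFIG_UPROBES CONFIG_UPROBE_EVENTS"

# ASSUMPTION: one of these two config sources exists on the target.
if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | check_kconfig $wanted
elif [ -r "/boot/config-$(uname -r)" ]; then
    check_kconfig $wanted < "/boot/config-$(uname -r)"
else
    echo "no kernel config found on this system" >&2
fi
```

If CONFIG_UPROBES or CONFIG_UPROBE_EVENTS shows up as not set, the perf probe -x steps later in this article should not be expected to work.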
Test code
#include <stdio.h>
#include <unistd.h>

static void
print_curr_state_one(void)
{
    printf("This is the print current state one function\n");
}

static void
print_curr_state_two(void)
{
    printf("This is the print current state two function\n");
}

int main(void) {
    while (1) {
        print_curr_state_one();
        sleep(1);
        print_curr_state_two();
    }
}
Using uprobe through perf
As a data source, uprobe can be used through several tools, and the functionality each tool offers may differ.
# perf probe
Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
or: perf probe [<options>] --del '[GROUP:]EVENT' ...
or: perf probe --list [GROUP:]EVENT ...
or: perf probe [<options>] --funcs
-a, --add <[EVENT=]FUNC[+OFF|%return] [[NAME=]ARG ...]>
probe point definition, where
GROUP: Group name (optional)
EVENT: Event name
FUNC: Function name
OFF: Offset from function entry (in byte)
%return: Put the probe at function return
ARG: Probe argument (kprobe-tracer argument format.)
-D, --definition <[EVENT=]FUNC[+OFF|%return] [[NAME=]ARG ...]>
Show trace event definition of given traceevent for k/uprobe_events.
-d, --del <[GROUP:]EVENT>
delete a probe event.
-f, --force forcibly add events with existing name
-F, --funcs <[FILTER]>
Show potential probe-able functions.
-k, --vmlinux <file> vmlinux pathname
(not built-in because NO_DWARF=1)
-L, --line <FUNC[:RLN[+NUM|-RLN2]]|SRC:ALN[+NUM|-ALN2]>
Show source code lines.
(not built-in because NO_DWARF=1)
-l, --list <[GROUP:]EVENT>
list up probe events
-m, --module <modname|path>
target module name (for online) or path (for offline)
-n, --dry-run dry run
-q, --quiet be quiet (do not show any messages)
-s, --source <directory>
path to kernel source
(not built-in because NO_DWARF=1)
-V, --vars <FUNC[@SRC][+OFF|%return|:RL|;PT]|SRC:AL|SRC;PT>
Show accessible variables on PROBEDEF
(not built-in because NO_DWARF=1)
-v, --verbose be more verbose (show parsed arguments, etc)
-x, --exec <executable|path>
target executable name or path
--cache Manipulate probe cache
--demangle Enable symbol demangling
--demangle-kernel
Enable kernel symbol demangling
--externs Show external variables too (with --vars only)
(not built-in because NO_DWARF=1)
--filter <[!]FILTER>
Set a filter (with --vars/funcs only)
(default: "!__k???tab_* & !__crc_*" for --vars,
"!_*" for --funcs)
--max-probes <n> Set how many probe points can be found for a probe.
--no-inlines Don't search inlined functions
(not built-in because NO_DWARF=1)
--range Show variables location range in scope (with --vars only)
(not built-in because NO_DWARF=1)
--symfs <directory>
Look for files with symbols relative to this directory
--target-ns <pid>
target pid for namespace contexts
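The note "(not built-in because NO_DWARF=1)" is interesting. Does it mean that local variables cannot be fetched through debug info, and that probes cannot be added by line number?
Searching the perf source suggests that DWARF support was simply not enabled when this perf binary was built.
So do I need to build perf myself and then port it to the target?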
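A quick way to tell whether a given perf binary has this limitation is to look for the same marker in its usage text. The sketch below is only a heuristic built on the output shown above: the helper name perf_probe_has_dwarf is made up, and it assumes that a DWARF-enabled perf prints no NO_DWARF=1 line in its perf probe usage.

```shell
#!/bin/sh
# Print "no" when the perf probe usage text on stdin carries the NO_DWARF=1
# marker, "yes" otherwise (hypothetical helper; a heuristic, not an official check).
perf_probe_has_dwarf() {
    if grep -q 'NO_DWARF=1'; then
        echo no
    else
        echo yes
    fi
}

# Merge stderr into stdout so the usage text is caught on either stream.
if command -v perf >/dev/null 2>&1; then
    echo "DWARF-based probing built in: $(perf probe 2>&1 | perf_probe_has_dwarf)"
else
    echo "perf not found in PATH"
fi
```

If the answer is no, options such as -L and -V stay unavailable until perf is rebuilt with DWARF support. As far as I know, that usually means building perf with the elfutils (libdw) development files installed.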
Listing the available probes and recording them
List the functions that can be probed
# perf probe -x a.out -F
abort@plt
call_weak_fn
completed.8444
data_start
deregister_tm_clones
frame_dummy
main
print_curr_state_one
print_curr_state_two
puts@plt
register_tm_clones
sleep@plt
-x specifies the executable, and -F shows the functions that can potentially be probed.
Add a probe
# perf probe -x a.out -a print_curr_state_one
Added new event:
probe_a:print_curr_state_one (on print_curr_state_one in /home/root/test_uprobe/a.out)
You can now use it in all perf tools, such as:
perf record -e probe_a:print_curr_state_one -aR sleep 1
List the added probes (a probe on print_curr_state_two was added in the same way)
# perf probe -l
probe_a:print_curr_state_one (on print_curr_state_one in /home/root/test_uprobe/a.out)
probe_a:print_curr_state_two (on print_curr_state_two in /home/root/test_uprobe/a.out)
Start recording the newly added probes
# perf record -e probe_a:* -a
Couldn't synthesize bpf events.
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.159 MB perf.data (24 samples) ]
View the recorded results
# perf script
a.out 3198 [000] 15137.303918: probe_a:print_curr_state_two: (40060c)
a.out 3198 [000] 15137.304015: probe_a:print_curr_state_one: (4005ec)
a.out 3198 [000] 15138.304117: probe_a:print_curr_state_two: (40060c)
a.out 3198 [000] 15138.304153: probe_a:print_curr_state_one: (4005ec)
a.out 3198 [000] 15139.304244: probe_a:print_curr_state_two: (40060c)
a.out 3198 [000] 15139.304278: probe_a:print_curr_state_one: (4005ec)
a.out 3198 [000] 15140.304378: probe_a:print_curr_state_two: (40060c)
a.out 3198 [000] 15140.304415: probe_a:print_curr_state_one: (4005ec)
a.out 3198 [001] 15141.304575: probe_a:print_curr_state_two: (40060c)
a.out 3198 [001] 15141.304614: probe_a:print_curr_state_one: (4005ec)
a.out 3198 [001] 15142.304696: probe_a:print_curr_state_two: (40060c)
a.out 3198 [001] 15142.304729: probe_a:print_curr_state_one: (4005ec)
a.out 3198 [001] 15143.304829: probe_a:print_curr_state_two: (40060c)
a.out 3198 [001] 15143.304866: probe_a:print_curr_state_one: (4005ec)
a.out 3198 [001] 15144.304969: probe_a:print_curr_state_two: (40060c)
a.out 3198 [001] 15144.305004: probe_a:print_curr_state_one: (4005ec)
a.out 3198 [001] 15145.305104: probe_a:print_curr_state_two: (40060c)
a.out 3198 [001] 15145.305137: probe_a:print_curr_state_one: (4005ec)
a.out 3198 [000] 15146.305243: probe_a:print_curr_state_two: (40060c)
a.out 3198 [000] 15146.305279: probe_a:print_curr_state_one: (4005ec)
a.out 3198 [000] 15147.305373: probe_a:print_curr_state_two: (40060c)
a.out 3198 [000] 15147.305406: probe_a:print_curr_state_one: (4005ec)
a.out 3198 [000] 15148.305499: probe_a:print_curr_state_two: (40060c)
a.out 3198 [000] 15148.305533: probe_a:print_curr_state_one: (4005ec)
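With longer recordings, the raw listing becomes hard to read, so a per-probe hit count can be a useful first summary. This is a small sketch that assumes perf script output in the layout shown above, where the event name is a whitespace-separated field of the form probe_GROUP:EVENT: and the helper name count_probe_hits is made up for the example.

```shell
#!/bin/sh
# Count how many times each probe_* event appears in perf script output
# read from stdin (hypothetical helper for this sketch).
count_probe_hits() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^probe_[^:]*:.*:$/) {
                sub(/:$/, "", $i)   # drop the trailing colon from the event name
                hits[$i]++
            }
        }
    }
    END { for (name in hits) print name, hits[name] }' | sort
}

# Only run against a real recording when perf and perf.data are both present.
if command -v perf >/dev/null 2>&1 && [ -r perf.data ]; then
    perf script | count_probe_hits
fi
```

Each output line is an event name followed by its hit count, sorted by name.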
Delete probes that are no longer needed
# perf probe -d probe_a:*
Using uprobe in LTTng
Create or enable a recording event rule to match Linux kernel events created from a dynamic instrumentation point:
lttng [GENERAL OPTIONS] enable-event --kernel
(--probe=LOC | --function=LOC | --userspace-probe=LOC) RECORDNAME
[--session=SESSION] [--channel=CHANNEL]
# lttng enable-event --kernel --userspace-probe=./a.out:print_curr_state_one FUNC_A
kernel event FUNC_A created in channel channel0
# lttng enable-event --kernel --userspace-probe=./a.out:print_curr_state_two FUNC_B
kernel event FUNC_B created in channel channel0
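By design, uprobe is still provided by the kernel, so these events are still kernel trace events.
I could not find anything about uretprobe in LTTng, nor anything comparable to probing by line number or capturing local variables. LTTng may simply not implement them.
After that, enable all kernel trace events as usual, record the trace, and analyze it.
A visualization tool can then show process scheduling and the details of how the process runs.
The LTTng trace carries no further detail. Not even the function name is kept, only the custom name given when the probe was registered.
Perhaps they would rather people use the USDT-style instrumentation of LTTng-UST.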
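For reference, the LTTng steps described above can be strung together roughly as follows. This is a sketch, not a tested recipe: the session name uprobe_demo, the 5-second recording window and the dry-run wrapper are my own choices, and the script only prints the commands unless RUN=1 is set (kernel tracing normally requires root or membership in the tracing group).

```shell
#!/bin/sh
# Print each command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Record one userspace-probe event together with all kernel events.
# Arguments: session name, binary path, function symbol, event name.
lttng_uprobe_session() {
    session=$1; binary=$2; func=$3; event=$4
    run lttng create "$session"
    run lttng enable-event --kernel --userspace-probe="$binary:$func" "$event"
    run lttng enable-event --kernel --all
    run lttng start
    run sleep 5                      # arbitrary recording window
    run lttng stop
    run lttng view                   # dump the recorded events as text
    run lttng destroy "$session"
}

# "uprobe_demo" is a hypothetical session name.
lttng_uprobe_session uprobe_demo ./a.out print_curr_state_one FUNC_A
```

Running it as-is lists the command sequence for review, and RUN=1 executes it on a machine with LTTng installed.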
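Next steps
So far I have not actually achieved any finer-grained observation, such as what perf probe -V / -L would provide.
I may need to rebuild and port perf, or find another data collection and analysis tool. I am even tempted to try bpf, but that would mean rebuilding the kernel with bpf support enabled.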
References
- Linux K/Uprobe 使用指南 · GitBook (t-head.cn)
- Linux uprobe: User-Level Dynamic Tracing (brendangregg.com)
- https://www.kernel.org/doc/Documentation/trace/uprobetracer.txt
- Linux perf Examples (brendangregg.com)
- lttng-enable-event(1) [v2.13] — LTTng