Source code base: Android U
0. Introduction
The previous article, "A Brief Analysis of Android statsd Instrumentation" (《Android statsd 埋点简析》), briefly dissected the framework Android uses to collect and upload instrumentation data. Building on it, this article parses the instrumentation contents to see which memory metrics Android actually collects.
1. Dissecting Google's instrumentation through the code
1.1 PROCESS_MEMORY_STATE
frameworks/base/services/core/java/com/android/server/stats/pull/StatsPullAtomService.java
```java
int pullProcessMemoryStateLocked(int atomTag, List<StatsEvent> pulledData) {
    List<ProcessMemoryState> processMemoryStates =
            LocalServices.getService(ActivityManagerInternal.class)
                    .getMemoryStateForProcesses();
    for (ProcessMemoryState processMemoryState : processMemoryStates) {
        final MemoryStat memoryStat = readMemoryStatFromFilesystem(processMemoryState.uid,
                processMemoryState.pid);
        if (memoryStat == null) {
            continue;
        }
        pulledData.add(FrameworkStatsLog.buildStatsEvent(atomTag, processMemoryState.uid,
                processMemoryState.processName, processMemoryState.oomScore, memoryStat.pgfault,
                memoryStat.pgmajfault, memoryStat.rssInBytes, memoryStat.cacheInBytes,
                memoryStat.swapInBytes, -1 /*unused*/, -1 /*unused*/, -1 /*unused*/));
    }
    return StatsManager.PULL_SUCCESS;
}
```
- getMemoryStateForProcesses: returns the memory state of every app process;
- readMemoryStatFromFilesystem: reads the /proc/<pid>/stat node and parses the pgfault (field 9), pgmajfault (field 11) and rssInBytes (field 23) values.

Collected fields:
- uid
- processName
- oomScore
- pgfault
- pgmajfault
- rss
- cache (memcg)
- swap (memcg)
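The /proc/<pid>/stat parsing described above can be sketched roughly as follows. This is an illustration, not the AOSP implementation: `ProcStatSketch` and `parseStatLine` are hypothetical names, and the field offsets follow the proc(5) layout (field 9 = minflt, 11 = majflt, 23 = rss in pages).

```java
// Hypothetical sketch of pulling pgfault / pgmajfault / rss out of a
// /proc/<pid>/stat line; not the AOSP readMemoryStatFromFilesystem code.
public class ProcStatSketch {
    // /proc/<pid>/stat fields (0-indexed): 9 = minflt (pgfault),
    // 11 = majflt (pgmajfault), 23 = rss (in pages).
    public static long[] parseStatLine(String line, long pageSizeBytes) {
        // Field 1 (comm) may contain spaces, so split only after its closing ')'.
        int close = line.lastIndexOf(')');
        String[] rest = line.substring(close + 2).split(" ");
        // rest[0] is field 2 (state), so field N lives at rest[N - 2].
        long pgfault = Long.parseLong(rest[9 - 2]);
        long pgmajfault = Long.parseLong(rest[11 - 2]);
        long rssInBytes = Long.parseLong(rest[23 - 2]) * pageSizeBytes;
        return new long[] {pgfault, pgmajfault, rssInBytes};
    }
}
```

Note that rss in /proc/<pid>/stat is a page count, which is why the sketch multiplies by the page size to get rssInBytes.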
1.2 PROCESS_MEMORY_HIGH_WATER_MARK
```java
int pullProcessMemoryHighWaterMarkLocked(int atomTag, List<StatsEvent> pulledData) {
    List<ProcessMemoryState> managedProcessList =
            LocalServices.getService(ActivityManagerInternal.class)
                    .getMemoryStateForProcesses();
    for (ProcessMemoryState managedProcess : managedProcessList) {
        final MemorySnapshot snapshot = readMemorySnapshotFromProcfs(managedProcess.pid);
        if (snapshot == null) {
            continue;
        }
        pulledData.add(FrameworkStatsLog.buildStatsEvent(atomTag, managedProcess.uid,
                managedProcess.processName,
                // RSS high-water mark in bytes.
                snapshot.rssHighWaterMarkInKilobytes * 1024L,
                snapshot.rssHighWaterMarkInKilobytes));
    }
    // Complement the data with native system processes
    SparseArray<String> processCmdlines = getProcessCmdlines();
    managedProcessList.forEach(managedProcess -> processCmdlines.delete(managedProcess.pid));
    int size = processCmdlines.size();
    for (int i = 0; i < size; ++i) {
        ...
    }
    // Invoke rss_hwm_reset binary to reset RSS HWM counters for all processes.
    SystemProperties.set("sys.rss_hwm_reset.on", "1");
    return StatsManager.PULL_SUCCESS;
}
```
This function queries the memory info of all app processes and native processes.

- getMemoryStateForProcesses: returns the memory state of every app process;
- readMemorySnapshotFromProcfs: reads the /proc/<pid>/status node and parses the Uid, VmHWM, VmRSS, RssAnon, RssShmem and VmSwap values. Kernel processes are filtered out by checking whether /proc/<pid>/status contains the RssAnon, RssShmem and VmSwap fields.

Finally, a property is set to wake up the rss_hwm_reset binary, which clears VmHWM.

Collected fields:
- uid
- processName / cmdline (cmdline for native processes)
- VmHWM
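The /proc/<pid>/status parsing and the kernel-thread filter can be sketched like this (illustrative only; `ProcStatusSketch`, `parseStatus` and `isKernelThread` are hypothetical names, not the AOSP code — the field names come straight from procfs):

```java
// Hypothetical sketch of /proc/<pid>/status parsing; field names are the real
// procfs ones, everything else is illustrative.
import java.util.HashMap;
import java.util.Map;

public class ProcStatusSketch {
    /** Returns the "N kB" fields of /proc/<pid>/status keyed by name, e.g. "VmHWM". */
    public static Map<String, Integer> parseStatus(String content) {
        Map<String, Integer> out = new HashMap<>();
        for (String line : content.split("\n")) {
            String[] kv = line.split(":", 2);
            if (kv.length < 2) {
                continue;
            }
            String value = kv[1].trim();
            if (value.endsWith(" kB")) {
                out.put(kv[0], Integer.parseInt(
                        value.substring(0, value.length() - 3).trim()));
            }
        }
        return out;
    }

    /** Kernel threads have no userspace VM, so the RssAnon/RssShmem/VmSwap lines are absent. */
    public static boolean isKernelThread(Map<String, Integer> status) {
        return !status.containsKey("RssAnon")
                && !status.containsKey("RssShmem")
                && !status.containsKey("VmSwap");
    }
}
```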
1.3 PROCESS_MEMORY_SNAPSHOT
Like the HWM atom above, this collects a memory snapshot of every app process and native process; the difference is that it additionally collects each process's GPU usage from /sys/fs/bpf/map_gpuMem_gpu_mem_total_map.
Collected fields:
- uid
- processName / cmdline (cmdline for native processes)
- pid
- oomScore
- rss
- rss_anon
- swap
- rss_anon + swap
- gpu memory
- hasForegroundServices (false for native processes)
- rss_shmem
1.4 SYSTEM_ION_HEAP_SIZE
```java
int pullSystemIonHeapSizeLocked(int atomTag, List<StatsEvent> pulledData) {
    final long systemIonHeapSizeInBytes = readSystemIonHeapSizeFromDebugfs();
    pulledData.add(FrameworkStatsLog.buildStatsEvent(atomTag, systemIonHeapSizeInBytes));
    return StatsManager.PULL_SUCCESS;
}
```
Parses the total figure in the /sys/kernel/debug/ion/heaps/system node.
1.5 ION_HEAP_SIZE
```java
int pullIonHeapSizeLocked(int atomTag, List<StatsEvent> pulledData) {
    int ionHeapSizeInKilobytes = (int) getIonHeapsSizeKb();
    pulledData.add(FrameworkStatsLog.buildStatsEvent(atomTag, ionHeapSizeInKilobytes));
    return StatsManager.PULL_SUCCESS;
}
```
Calls Debug.getIonHeapsSizeKb (see android_os_Debug.cpp for details), which parses /sys/kernel/ion/total_heaps_kb.
1.6 PROCESS_SYSTEM_ION_HEAP_SIZE
```java
int pullProcessSystemIonHeapSizeLocked(int atomTag, List<StatsEvent> pulledData) {
    List<IonAllocations> result = readProcessSystemIonHeapSizesFromDebugfs();
    for (IonAllocations allocations : result) {
        pulledData.add(FrameworkStatsLog.buildStatsEvent(atomTag, getUidForPid(allocations.pid),
                readCmdlineFromProcfs(allocations.pid),
                (int) (allocations.totalSizeInBytes / 1024), allocations.count,
                (int) (allocations.maxSizeInBytes / 1024)));
    }
    return StatsManager.PULL_SUCCESS;
}
```
readProcessSystemIonHeapSizesFromDebugfs parses the per-process portion of the /sys/kernel/debug/ion/heaps/system node.
1.7 PROCESS_DMABUF_MEMORY
```java
int pullProcessDmabufMemory(int atomTag, List<StatsEvent> pulledData) {
    KernelAllocationStats.ProcessDmabuf[] procBufs =
            KernelAllocationStats.getDmabufAllocations();
    if (procBufs == null) {
        return StatsManager.PULL_SKIP;
    }
    for (KernelAllocationStats.ProcessDmabuf procBuf : procBufs) {
        pulledData.add(FrameworkStatsLog.buildStatsEvent(
                atomTag,
                procBuf.uid,
                procBuf.processName,
                procBuf.oomScore,
                procBuf.retainedSizeKb,
                procBuf.retainedBuffersCount,
                0, /* mapped_dmabuf_kb - deprecated */
                0, /* mapped_dmabuf_count - deprecated */
                procBuf.surfaceFlingerSizeKb,
                procBuf.surfaceFlingerCount
        ));
    }
    return StatsManager.PULL_SUCCESS;
}
```
getDmabufAllocations mainly calls the ReadProcfsDmaBufs function in dmabufinfo.cpp to obtain each process's dmabuf info.
Collected fields:
- uid
- cmdline
- oomScore
- total (KB)
- inode count
- surfaceflinger size (KB)
- surfaceflinger inode count
1.8 SYSTEM_MEMORY
```java
int pullSystemMemory(int atomTag, List<StatsEvent> pulledData) {
    SystemMemoryUtil.Metrics metrics = SystemMemoryUtil.getMetrics();
    pulledData.add(
            FrameworkStatsLog.buildStatsEvent(
                    atomTag,
                    metrics.unreclaimableSlabKb,     // meminfo.SUnreclaim
                    metrics.vmallocUsedKb,           // meminfo.VmallocUsed
                    metrics.pageTablesKb,            // meminfo.PageTables
                    metrics.kernelStackKb,           // meminfo.KernelStack
                    metrics.totalIonKb,
                    metrics.unaccountedKb,
                    metrics.gpuTotalUsageKb,
                    metrics.gpuPrivateAllocationsKb,
                    metrics.dmaBufTotalExportedKb,
                    metrics.shmemKb,                 // meminfo.Shmem
                    metrics.totalKb,                 // meminfo.MemTotal
                    metrics.freeKb,                  // meminfo.MemFree
                    metrics.availableKb,             // meminfo.MemAvailable
                    metrics.activeKb,                // meminfo.Active
                    metrics.inactiveKb,              // meminfo.Inactive
                    metrics.activeAnonKb,            // meminfo.Active(anon)
                    metrics.inactiveAnonKb,          // meminfo.Inactive(anon)
                    metrics.activeFileKb,            // meminfo.Active(file)
                    metrics.inactiveFileKb,          // meminfo.Inactive(file)
                    metrics.swapTotalKb,             // meminfo.SwapTotal
                    metrics.swapFreeKb,              // meminfo.SwapFree
                    metrics.cmaTotalKb,              // meminfo.CmaTotal
                    metrics.cmaFreeKb));             // meminfo.CmaFree
    return StatsManager.PULL_SUCCESS;
}
```
- totalIonKb: sums, under /sys/kernel/dmabuf/buffers, all buffers whose exporter is one of the heaps defined in /dev/dma_heap; if dmabuf is not supported, it falls back to the /sys/kernel/ion/total_heaps_kb node;
- gpuTotalUsageKb: parses the /sys/fs/bpf/map_gpuMem_gpu_mem_total_map node;
- gpuPrivateAllocationsKb: GPU private allocations;
- dmaBufTotalExportedKb: the total of all dmabuf under /sys/kernel/dmabuf/buffers;
- unaccountedKb: meminfo.MemTotal - accountedKb, where accountedKb = meminfo.MemFree + zram + meminfo.Buffers + meminfo.Active + meminfo.Inactive + meminfo.Unevictable + meminfo.SUnreclaim + meminfo.KReclaimable + meminfo.VmallocUsed + meminfo.PageTables + meminfo.KernelStack + dmaBufTotalExportedKb + gpuPrivateAllocationsKb
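The accountedKb/unaccountedKb bookkeeping is plain arithmetic over those fields. As a minimal sketch (hypothetical helper, not AOSP code; each parameter mirrors one of the meminfo-derived fields named above, all in kB):

```java
// Hypothetical helper mirroring the accountedKb / unaccountedKb arithmetic.
public class UnaccountedSketch {
    public static long unaccountedKb(long memTotal, long memFree, long zram,
            long buffers, long active, long inactive, long unevictable,
            long sUnreclaim, long kReclaimable, long vmallocUsed,
            long pageTables, long kernelStack, long dmaBufTotalExported,
            long gpuPrivateAllocations) {
        long accountedKb = memFree + zram + buffers + active + inactive
                + unevictable + sUnreclaim + kReclaimable + vmallocUsed
                + pageTables + kernelStack + dmaBufTotalExported
                + gpuPrivateAllocations;
        // Whatever MemTotal the known buckets cannot explain.
        return memTotal - accountedKb;
    }
}
```

A persistently large unaccountedKb is a useful signal: it points at kernel or driver memory that none of the meminfo buckets, dmabuf exports, or GPU private allocations can explain.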
1.9 VMSTAT
```java
int pullVmStat(int atomTag, List<StatsEvent> pulledData) {
    ProcfsMemoryUtil.VmStat vmStat = ProcfsMemoryUtil.readVmStat();
    if (vmStat != null) {
        pulledData.add(
                FrameworkStatsLog.buildStatsEvent(
                        atomTag,
                        vmStat.oomKillCount));
    }
    return StatsManager.PULL_SUCCESS;
}
```
Only the oom_kill count is collected.
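/proc/vmstat is a flat list of "name value" lines, so pulling the counter is a one-field scan. A sketch (`VmStatSketch` is a hypothetical name, not the AOSP ProcfsMemoryUtil code):

```java
// Hypothetical sketch: extract the oom_kill counter from /proc/vmstat content.
public class VmStatSketch {
    public static long readOomKillCount(String vmstatContent) {
        for (String line : vmstatContent.split("\n")) {
            // Match "oom_kill <value>"; the trailing space avoids other oom_* names.
            if (line.startsWith("oom_kill ")) {
                return Long.parseLong(line.substring("oom_kill ".length()).trim());
            }
        }
        return -1; // counter absent (e.g. kernels that predate it)
    }
}
```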
2. Dissecting Google's instrumentation through the dashboard
2.1 RSS HWM
Combined with the code in section 1.2, this should be the per-process HWM, displayed in both ascending and descending order; the displayed value appears to be the mean, with ± reflecting the maximum and minimum.
Metric details may show additional percentile information.
From the dashboard data, third-party apps have a large memory footprint, e.g. com.tencent.ig and com.roblox.client. Future memory-health optimization should consider the pressure third-party apps put on the system, and the memory footprint of apps while in the background also needs to be determined. As a first step, check these processes' anon RSS + swap to determine whether a memory leak exists.
2.2 P95 anon RSS + swap
Combined with the code in section 1.3, this should be the distribution of per-process anon RSS + swap above P95.
Metric details may show the distribution across more percentiles.
From the dashboard data, third-party apps occupy a large amount of anonymous-page memory and may be leaking; the details are worth checking.
anon RSS + swap includes leaked and unused memory, all of which gets swapped out to zram, so this threshold needs to be capped.
2.3 ION Heap Size
Combined with the code in section 1.8, what is being counted here should actually be dmabuf.
Distribution details may show the per-process dmabuf distribution.
From the dashboard data, 1% of processes still use more than 910 MB of DMA-BUF; the details need to be examined to pin down per-process usage.