一、前言
在Service组件StartService()方式启动流程分析文章中,针对Context#startService()启动Service流程分析了源码,其实关于Service启动还有一个比较重要的点是Service启动的ANR,因为因为线上出现了上百例的"executing service " + service.shortName的异常。
二、Service-ANR原理
2.1 Service启动ANR原理简述
Service的ANR触发原理,是在启动Service前使用Handler发送一个延时的Message(埋炸弹过程),然后在Service启动完成后remove掉这个Message(拆炸弹过程)。如果在指定的延迟时间内没有remove掉这个Message,那么就会触发ANR(没有在炸弹爆炸前拆掉就会爆炸),弹出AppNotResponding的弹窗。
其实这个机制跟Windows/MacOS的应用程序无响应,是类似的交互设计。
2.2 前台Service VS 后台Service的区别
2.2.1 前台Service
前台Service是一种在通知栏中显示持续通知的服务,它通常用于执行用户明确知晓的任务,比如音乐播放器、定位服务等。前台Service在系统内部被视为用户正在主动使用的组件,因此它具有更高的优先级和较低的系统资源限制。在使用前台Service时,必须在通知栏中显示一个通知,以告知用户有一个正在运行的Service,并且通常还应该提供一些与该Service相关的有用信息。
2.2.3 后台Service
后台Service是一种不会在通知栏中显示通知的服务。它用于执行一些不需要用户直接交互或注意的任务,例如数据同步、网络请求等。后台Service具有较低的系统优先级,系统可能会在资源紧张的情况下终止这些服务,以释放资源。
2.3 Service启动ANR源码执行过程
ps: 还是基于Android SDK28源码分析
基于文章:Service组件StartService()方式启动流程分析的总结,我们已经很清楚,通过startService的方式启动Service的源码过程。因此,本文直接从com.android.server.am.ActiveServices#bringUpServiceLocked方法的源码开始分析,如有不清楚前置的启动流程的同学,可以参考我之前的文章,然后打开AS对照看下这部分的代码。
2.3.1 ActiveServices#bringUpServiceLocked
private String bringUpServiceLocked(ServiceRecord r, int intentFlags, boolean execInFg,
boolean whileRestarting, boolean permissionsReviewRequired)
throws TransactionTooLargeException {
if (r.app != null && r.app.thread != null) {
sendServiceArgsLocked(r, execInFg, false);
return null;
}
realStartServiceLocked(r, app, execInFg);
}
2.3.2 ActiveServices#realStartServiceLocked
private final void realStartServiceLocked(ServiceRecord r,
ProcessRecord app, boolean execInFg) throws RemoteException {
// ...
// 会转调到scheduleServiceTimeoutLocked(r.app)方法进行埋炸弹操作
bumpServiceExecutingLocked(r, execInFg, "create");
try {
app.thread.scheduleCreateService(r, r.serviceInfo,
mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
app.repProcState);
} catch (DeadObjectException e) {
mAm.appDiedLocked(app);
throw e;
} finally {
// serviceDoneExecutingLocked方法内会拆除炸弹
serviceDoneExecutingLocked(r, inDestroying, inDestroying);
// ...
}
}
2.3.3 埋炸弹过程:ActiveServices#bumpServiceExecutingLocked
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
// ...
scheduleServiceTimeoutLocked(r.app);
// ...
}
- com.android.server.am.ActiveServices#scheduleServiceTimeoutLocked
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
if (proc.executingServices.size() == 0 || proc.thread == null) {
return;
}
Message msg = mAm.mHandler.obtainMessage(
ActivityManagerService.SERVICE_TIMEOUT_MSG);
msg.obj = proc;
mAm.mHandler.sendMessageDelayed(msg,
proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}
- 这里需要知道,炸弹的爆炸时间在ActiveServices中定义了三个:
// 前台Service的超时时间是20s
static final int SERVICE_TIMEOUT = 20*1000;
// 后台Service的超时时间是200s,是前台超时时间的10倍
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
// How long the startForegroundService() grace period is to get around to
// calling startForeground() before we ANR + stop it.
static final int SERVICE_START_FOREGROUND_TIMEOUT = 10*1000;
2.3.4 拆炸弹过程:ActiveServices#serviceDoneExecutingLocked
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
boolean finishing) {
// ...
mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
// ...
}
2.3.5 炸弹爆炸出发ANR弹窗过程
如上文分析的,ANR弹窗其实就是一个sendMessageDelayed()方式发送的一个Message,想要了解ANR炸弹这么爆炸的,其实检索这个what值为ActivityManagerService.SERVICE_TIMEOUT_MSG的消息处理过程即可。
这个消息的Handler对应的handleMessage方法实现代码在AMS.java中。
- com.android.server.am.ActivityManagerService.MainHandler#handleMessage
final class MainHandler extends Handler {
public MainHandler(Looper looper) {
super(looper, null, true);
}
@Override
public void handleMessage(Message msg) {
switch (msg.what) {
// ...
case SERVICE_TIMEOUT_MSG: {
mServices.serviceTimeout((ProcessRecord)msg.obj);
} break;
case SERVICE_FOREGROUND_TIMEOUT_MSG: {
mServices.serviceForegroundTimeout((ServiceRecord)msg.obj);
} break;
case SERVICE_FOREGROUND_CRASH_MSG: {
mServices.serviceForegroundCrash(
(ProcessRecord) msg.obj, msg.getData().getCharSequence(SERVICE_RECORD_KEY));
} break;
// ...
}
}
- com.android.server.am.ActiveServices#serviceTimeout
void serviceTimeout(ProcessRecord proc) {
String anrMessage = null;
synchronized(mAm) {
if (proc.executingServices.size() == 0 || proc.thread == null) {
return;
}
final long now = SystemClock.uptimeMillis();
final long maxTime = now -
(proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
ServiceRecord timeout = null;
long nextTime = 0;
for (int i=proc.executingServices.size()-1; i>=0; i--) {
ServiceRecord sr = proc.executingServices.valueAt(i);
if (sr.executingStart < maxTime) {
timeout = sr;
break;
}
if (sr.executingStart > nextTime) {
nextTime = sr.executingStart;
}
}
if (timeout != null && mAm.mLruProcesses.contains(proc)) {
Slog.w(TAG, "Timeout executing service: " + timeout);
StringWriter sw = new StringWriter();
PrintWriter pw = new FastPrintWriter(sw, false, 1024);
pw.println(timeout);
timeout.dump(pw, " ");
pw.close();
mLastAnrDump = sw.toString();
mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
// 这一句对于我们分析问题比较关键,如果有Service的ANR,
// 就会在log中有这样的前缀打印:executing service Service.shortName
anrMessage = "executing service " + timeout.shortName;
} else {
Message msg = mAm.mHandler.obtainMessage(
ActivityManagerService.SERVICE_TIMEOUT_MSG);
msg.obj = proc;
mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg
? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
}
}
// 真正触发ANR弹窗的位置
if (anrMessage != null) {
mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
}
}
这里记住anrMessage的格式是:executing service Service.shortName,代表的是Service的启动超时。
- com.android.server.am.AppErrors#appNotResponding
final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) {
if (mService.mController != null) {
try {
// 0 == continue, -1 = kill process immediately
// !!关键:mService.mController的实现类是:
// com.android.server.am.ActivityManagerShellCommand.MyActivityController
int res = mService.mController.appEarlyNotResponding(
app.processName, app.pid, annotation);
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
}
} catch (RemoteException e) {
mService.mController = null;
Watchdog.getInstance().setActivityController(null);
}
long anrTime = SystemClock.uptimeMillis();
if (ActivityManagerService.MONITOR_CPU_USAGE) {
mService.updateCpuStatsNow();
}
// Unless configured otherwise, swallow ANRs in background processes
// & kill the process.
// 读取开发者选项中的“显示后台ANR”开关
boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
boolean isSilentANR;
// Don't dump other PIDs if it's a background ANR
isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
// Log the ANR to the main log.
StringBuilder info = new StringBuilder();
info.setLength(0);
// 这里可以看出写入到日志文件中的格式是:ANR in processName,
// 比如:我们应用包名是com.techmix.myapp,
// 那可在搜索时,直接输入ANR in com.techminx.myapp搜索,可更高效定位到ANR的trace位置
info.append("ANR in ").append(app.processName);
if (activity != null && activity.shortComponentName != null) {
info.append(" (").append(activity.shortComponentName).append(")");
}
info.append("\n");
info.append("PID: ").append(app.pid).append("\n");
if (annotation != null) {
// 这里的reason还是serviceTimeout中定义的Service启动的anrMessage字符串:
// "executing service Service.shortName",没有多余的更明细的分类了,具体是哪一步ANR了。
info.append("Reason: ").append(annotation).append("\n");
}
if (parent != null && parent != activity) {
info.append("Parent: ").append(parent.shortComponentName).append("\n");
}
ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
// For background ANRs, don't pass the ProcessCpuTracker to
// avoid spending 1/2 second collecting stats to rank lastPids.
File tracesFile = ActivityManagerService.dumpStackTraces(true, firstPids,
(isSilentANR) ? null : processCpuTracker,
(isSilentANR) ? null : lastPids, nativePids);
// 写入cpu占用信息到anr log中
String cpuInfo = null;
if (ActivityManagerService.MONITOR_CPU_USAGE) {
mService.updateCpuStatsNow();
synchronized (mService.mProcessCpuTracker) {
cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
}
info.append(processCpuTracker.printCurrentLoad());
info.append(cpuInfo);
}
info.append(processCpuTracker.printCurrentState(anrTime));
// ANR log写入到dropbox文件夹中,annotation变量就是
// com.android.server.am.ActiveServices#serviceTimeout中传入的anrMessage变量
mService.addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
cpuInfo, tracesFile, null);
synchronized (mService) {
// 静默ANR的定义?
if (isSilentANR) {
app.kill("bg anr", true);
return;
}
// 通过Handler发送ANR弹窗的dialog,这里直接跟进
// ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG这个消息的handleMessage处理逻辑即可
Message msg = Message.obtain();
msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
msg.obj = new AppNotRespondingDialog.Data(app, activity, aboveSystem);
// 注意这里的mService是AMS
mService.mUiHandler.sendMessage(msg);
}
}
- 静默ANR的定义?
- 弹出ANR弹窗的逻辑代码:com.android.server.am.ActivityManagerService.UiHandler#handleMessage
case SHOW_NOT_RESPONDING_UI_MSG: {
// 还是转调到AppErrors中的方法去实现了,所以这里只是用AMS中的UiHandler切换了一下线程而已
// 最开始的startService方法,从应用主线程
// ContextImpl#startService->AMS#startService(),后者其实是执行在binder线程池的线程里
// 面的,是子线程。所以这里通过消息的方式切到主线程
mAppErrors.handleShowAnrUi(msg);
ensureBootCompleted();
} break;
- com.android.server.am.AppErrors#handleShowAnrUi
// 跟ANR相关的变量直接存储在了ProcessRecord.java类中,每个进程单独维护一个
boolean notResponding; // does the app have a not responding dialog?
Dialog anrDialog; // dialog being displayed due to app not resp.
void handleShowAnrUi(Message msg) {
Dialog dialogToShow = null;
synchronized (mService) {
AppNotRespondingDialog.Data data = (AppNotRespondingDialog.Data) msg.obj;
final ProcessRecord proc = data.proc;
if (proc == null) {
Slog.e(TAG, "handleShowAnrUi: proc is null");
return;
}
if (proc.anrDialog != null) {
Slog.e(TAG, "App already has anr dialog: " + proc);
MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_ANR,
AppNotRespondingDialog.ALREADY_SHOWING);
return;
}
// 这个ANR的广播,应用进程如果注册了能接收到吗?
Intent intent = new Intent("android.intent.action.ANR");
if (!mService.mProcessesReady) {
intent.addFlags(Intent.FLAG_RECEIVER_REGISTERED_ONLY
| Intent.FLAG_RECEIVER_FOREGROUND);
}
mService.broadcastIntentLocked(null, null, intent,
null, null, 0, null, null, null, AppOpsManager.OP_NONE,
null, false, false, MY_PID, Process.SYSTEM_UID, 0 /* TODO: Verify */);
boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
if (mService.canShowErrorDialogs() || showBackground) {
dialogToShow = new AppNotRespondingDialog(mService, mContext, data);
proc.anrDialog = dialogToShow;
} else {
MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_ANR,
AppNotRespondingDialog.CANT_SHOW);
// 如果ANR弹窗是关闭状态下,直接kill当前应用进程了
// 跟ANR弹窗中点关闭应用一样,多是调用AMS#killAppAtUsersRequest方法
// 关闭当前进程
mService.killAppAtUsersRequest(proc, null);
}
}
// If we've created a crash dialog, show it without the lock held
if (dialogToShow != null) {
dialogToShow.show();
}
}
- com.android.server.am.AppErrors#killAppAtUserRequestLocked
void killAppAtUserRequestLocked(ProcessRecord app, Dialog fromDialog) {
app.crashing = false;
app.crashingReport = null;
app.notResponding = false;
app.notRespondingReport = null;
if (app.anrDialog == fromDialog) {
app.anrDialog = null;
}
if (app.waitDialog == fromDialog) {
app.waitDialog = null;
}
// 这里的MY_PID是定义在AMS中的:static final int MY_PID = myPid();
// 所以这里只要是有效的应用pid,都是能进入if逻辑分支中的
if (app.pid > 0 && app.pid != MY_PID) {
handleAppCrashLocked(app, "user-terminated" /*reason*/,
null /*shortMsg*/, null /*longMsg*/, null /*stackTrace*/, null /*data*/);
app.kill("user request after error", true);
}
}
app.kill()调用的是ProcessRecord#kill(),最终转调到AMS#killProcessGroup()方法了。这里AMS方式kill掉进程的,在Android的logcat中其实都能搜索到,Activity Manager killing的字样的。