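Case description:
As shown in the figure below: KingbaseES server process structure
KingbaseES uses a client/server model. For each client connection, the KingbaseES main process accepts the connection and creates a new server process for it. This server process handles the client's database requests and exits when the connection is closed. When a client connects to the database, a corresponding kingbase server process serves it, as in the following client query session:
As shown below, the corresponding server process (backend process) at the operating system level:
When a client finishes and disconnects normally, the corresponding kingbase server process also ends. When a client exits abnormally, however, the kingbase server process on the database side may not end properly and keeps holding database resources. This case describes in detail how to terminate such a server process (backend process) manually. A backend process can be ended manually with database tools or with the operating system's kill command, but the different methods affect the database differently.
Applicable versions: KingbaseES V8R3/R6
System architecture:
I. Client access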
[kingbase@node1 bin]$ ./ksql -h 192.168.8.201 -U system -W prod
Password:
ksql (V8.0)
Type "help" for help.
prod=# select count(*) from t1;
count
--------
100000
(1 row)
II. Ways to terminate a backend process
1. pg_terminate_backend(pid)
Tip: the function pg_terminate_backend() actually sends a SIGTERM signal to the process.
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 17100 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57476 | 2022-11-29 15:04:57.131987+08 | | 2022-11-29 15
:05:05.526379+08 | 2022-11-29 15:05:05.539018+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
# Terminate the backend process by its pid
prod=# select pg_terminate_backend(17100);
pg_terminate_backend
----------------------
t
(1 row)
# The process has been terminated
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
# As shown below, the database processes are running normally and the corresponding backend process has been safely terminated
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 16474 13089 0 14:59 ? 00:00:00 kingbase: checkpointer
kingbase 16475 13089 0 14:59 ? 00:00:00 kingbase: background writer
kingbase 16476 13089 0 14:59 ? 00:00:00 kingbase: walwriter
kingbase 16477 13089 0 14:59 ? 00:00:00 kingbase: autovacuum launcher
kingbase 16478 13089 0 14:59 ? 00:00:00 kingbase: stats collector
kingbase 16479 13089 0 14:59 ? 00:00:00 kingbase: ksh writer
kingbase 16480 13089 0 14:59 ? 00:00:00 kingbase: ksh collector
kingbase 16481 13089 0 14:59 ? 00:00:00 kingbase: kwr collector
kingbase 16482 13089 0 14:59 ? 00:00:00 kingbase: logical replication launcher
kingbase 17100 13089 0 15:04 ? 00:00:00 kingbase: system prod 192.168.8.200(57476) idle
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 17214 13089 0 15:05 ? 00:00:00 kingbase: system prod [local] idle
2. Operating system kill pid
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 18424 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57480 | 2022-11-29 15:15:26.813035+08 | | 2022-11-29 15
:15:28.912910+08 | 2022-11-29 15:15:28.922719+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
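# Run kill pid at the operating system level to end the process
[root@node2 sys_log]# kill 18424
# As shown below, the database processes are running normally and the corresponding backend process has been safely killed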
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 16474 13089 0 14:59 ? 00:00:00 kingbase: checkpointer
kingbase 16475 13089 0 14:59 ? 00:00:00 kingbase: background writer
kingbase 16476 13089 0 14:59 ? 00:00:00 kingbase: walwriter
kingbase 16477 13089 0 14:59 ? 00:00:00 kingbase: autovacuum launcher
kingbase 16478 13089 0 14:59 ? 00:00:00 kingbase: stats collector
kingbase 16479 13089 0 14:59 ? 00:00:00 kingbase: ksh writer
kingbase 16480 13089 0 14:59 ? 00:00:00 kingbase: ksh collector
kingbase 16481 13089 0 14:59 ? 00:00:00 kingbase: kwr collector
kingbase 16482 13089 0 14:59 ? 00:00:00 kingbase: logical replication launcher
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 17214 13089 0 15:05 ? 00:00:00 kingbase: system prod [local] idle
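The ps check above can also be scripted. The following is a minimal Python sketch, not part of the original case. It assumes the process title format seen in these transcripts (`kingbase: <user> <database> <client> <state>`), and the names `client_backends` and `SAMPLE_PS` are hypothetical. It extracts client backend processes from `ps -ef` text so that a PID can be confirmed gone after termination.

```python
import re

# Assumed title format, taken from the transcripts in this case:
#   kingbase: <user> <database> <addr(port) or [local]> <state>
BACKEND_RE = re.compile(
    r"^\S+\s+(?P<pid>\d+)\s+\d+\s.*\skingbase:\s+"
    r"(?P<user>\S+)\s+(?P<db>\S+)\s+(?P<client>\[local\]|\S+\(\d+\))\s+(?P<state>.+)$"
)

def client_backends(ps_text):
    """Return {pid: (user, db, client, state)} for client backends in `ps -ef` text."""
    found = {}
    for line in ps_text.splitlines():
        m = BACKEND_RE.match(line.strip())
        if m:
            found[int(m["pid"])] = (m["user"], m["db"], m["client"], m["state"])
    return found

# Sample lines in the format shown in this case:
# an auxiliary process, a ksql client, and two client backends.
SAMPLE_PS = """\
kingbase 16482 13089 0 14:59 ? 00:00:00 kingbase: logical replication launcher
kingbase 17100 13089 0 15:04 ? 00:00:00 kingbase: system prod 192.168.8.200(57476) idle
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 17214 13089 0 15:05 ? 00:00:00 kingbase: system prod [local] idle
"""

backends = client_backends(SAMPLE_PS)
print(sorted(backends))      # PIDs of client backends only
print(18424 in backends)     # whether the PID killed in this step is still listed
```

In practice the text would come from `ps -ef` on the database host; auxiliary processes such as the checkpointer do not match the pattern and are skipped.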
# The database view no longer contains a record for this backend process
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
3. Operating system kill -15 pid and database sys_ctl kill TERM PID
1) Operating system kill -15 pid
# Query the backend process status
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 22955 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57498 | 2022-11-29 15:44:29.993305+08 | | 2022-11-29 15
:44:32.090913+08 | 2022-11-29 15:44:32.100617+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
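# Run kill -15 pid at the operating system level to end the process
[kingbase@node2 bin]$ kill -15 22955
# As shown below, the database processes are running normally and the corresponding backend process has been safely killed
[kingbase@node2 bin]$ ps -ef |grep kingbase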
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 22197 13089 0 15:38 ? 00:00:00 kingbase: checkpointer
kingbase 22198 13089 0 15:38 ? 00:00:00 kingbase: background writer
kingbase 22199 13089 0 15:38 ? 00:00:00 kingbase: walwriter
kingbase 22200 13089 0 15:38 ? 00:00:00 kingbase: autovacuum launcher
kingbase 22201 13089 0 15:38 ? 00:00:00 kingbase: stats collector
kingbase 22202 13089 0 15:38 ? 00:00:00 kingbase: ksh writer
kingbase 22203 13089 0 15:38 ? 00:00:00 kingbase: ksh collector
kingbase 22204 13089 0 15:38 ? 00:00:00 kingbase: kwr collector
kingbase 22205 13089 0 15:38 ? 00:00:00 kingbase: logical replication launcher
kingbase 22444 12416 0 15:40 pts/0 00:00:00 ./ksql -U system test
kingbase 22445 13089 0 15:40 ? 00:00:00 kingbase: system test [local] idle
# The database view no longer contains a record for this backend process
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
2) Database sys_ctl kill TERM pid
# Query the backend process status
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 22443 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57494 | 2022-11-29 15:40:42.804868+08 | | 2022-11-29 15
:40:44.972533+08 | 2022-11-29 15:40:44.985340+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
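# Run the database command to kill the process
[kingbase@node2 bin]$ ./sys_ctl kill TERM 22443
# As shown below, the database processes are running normally and the corresponding backend process has been safely killed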
[kingbase@node2 bin]$ ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 22197 13089 0 15:38 ? 00:00:00 kingbase: checkpointer
kingbase 22198 13089 0 15:38 ? 00:00:00 kingbase: background writer
kingbase 22199 13089 0 15:38 ? 00:00:00 kingbase: walwriter
kingbase 22200 13089 0 15:38 ? 00:00:00 kingbase: autovacuum launcher
kingbase 22201 13089 0 15:38 ? 00:00:00 kingbase: stats collector
kingbase 22202 13089 0 15:38 ? 00:00:00 kingbase: ksh writer
kingbase 22203 13089 0 15:38 ? 00:00:00 kingbase: ksh collector
kingbase 22204 13089 0 15:38 ? 00:00:00 kingbase: kwr collector
kingbase 22205 13089 0 15:38 ? 00:00:00 kingbase: logical replication launcher
kingbase 22444 12416 0 15:40 pts/0 00:00:00 ./ksql -U system test
kingbase 22445 13089 0 15:40 ? 00:00:00 kingbase: system test [local] idle
# The database view no longer contains a record for this backend process
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait
_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----
------------+------------+-------+-------------+--------------+-------+--------------
(0 rows)
4. Operating system kill -3 pid and database sys_ctl kill QUIT PID
1) Operating system kill -3 pid
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query
| backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+-------------------------------------------------------
--------------+----------------
16385 | prod | 18666 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57486 | 2022-11-29 15:17:34.726155+08 | | 2022-11-29 15
:17:34.736202+08 | 2022-11-29 15:17:34.740584+08 | Client | ClientRead | idle | | | select setting from pg_settings where name = 'enable_u
pper_colname' | client backend
(1 row)
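# Run kill -3 pid at the operating system level to end the process
[root@node2 sys_log]# kill -3 18666
# As shown below, besides the target backend process, the other background auxiliary processes were also terminated and restarted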
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 18795 13089 0 15:18 ? 00:00:00 kingbase: startup
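# Client session output, followed by the corresponding sys_log entries (terminating the backend process caused the database's auxiliary processes to restart as well)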
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
WARNING: terminating connection because of crash of another server process
DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2022-11-29 15:18:06.741 CST [18666] WARNING: terminating connection because of crash of another server process
2022-11-29 15:18:06.741 CST [18666] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:18:06.741 CST [18666] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:18:06.742 CST [13089] LOG: server process (PID 18666) exited with exit code 2
2022-11-29 15:18:06.742 CST [13089] DETAIL: Failed process was running: select setting from pg_settings where name = 'enable_upper_colname'
2022-11-29 15:18:06.742 CST [13089] LOG: terminating any other active server processes
2022-11-29 15:18:06.743 CST [17214] WARNING: terminating connection because of crash of another server process
2022-11-29 15:18:06.743 CST [17214] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:18:06.743 CST [17214] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:18:06.744 CST [16477] WARNING: terminating connection because of crash of another server process
2022-11-29 15:18:06.744 CST [16477] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:18:06.744 CST [16477] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:18:06.745 CST [13089] LOG: all server processes terminated; reinitializing
2022-11-29 15:18:06.823 CST [18795] LOG: database system was interrupted; last known up at 2022-11-29 14:59:17 CST
2022-11-29 15:19:29.897 CST [18795] LOG: database system was not properly shut down; automatic recovery in progress
2022-11-29 15:19:30.063 CST [18795] LOG: redo starts at 0/22935D8
2022-11-29 15:19:30.063 CST [18795] LOG: redo wal segment count 2
2022-11-29 15:19:30.063 CST [18795] LOG: invalid record length at 0/2293608: wanted 24, got 0
2022-11-29 15:19:30.063 CST [18795] LOG: redo done at 0/22935D8
2022-11-29 15:19:30.739 CST [13089] LOG: database system is ready to accept connections
# After the database service has restarted
[root@node2 sys_log]# ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 17212 12416 0 15:05 pts/0 00:00:00 ./ksql -U system test
kingbase 18953 13089 0 15:19 ? 00:00:00 kingbase: checkpointer
kingbase 18954 13089 0 15:19 ? 00:00:00 kingbase: background writer
kingbase 18955 13089 0 15:19 ? 00:00:00 kingbase: walwriter
kingbase 18957 13089 0 15:19 ? 00:00:00 kingbase: stats collector
kingbase 18958 13089 0 15:19 ? 00:00:00 kingbase: ksh writer
kingbase 18959 13089 0 15:19 ? 00:00:00 kingbase: ksh collector
kingbase 18960 13089 0 15:19 ? 00:00:00 kingbase: kwr collector
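--- As shown above, using kill -3 pid to terminate a backend process puts the database at great risk: all server processes are restarted and the instance goes through crash recovery.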
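To put a number on the impact, the outage window can be read off the sys_log excerpt above. The following is a minimal Python sketch, not part of the original case. It assumes the log line prefix shown here (a timestamp with milliseconds at the start of each line) and that both lines use the same time zone; `outage_seconds` is a hypothetical helper name.

```python
from datetime import datetime

CRASH_MARK = "terminating any other active server processes"
READY_MARK = "database system is ready to accept connections"

def _timestamp(line):
    # The first 23 characters look like "2022-11-29 15:18:06.742" (assumed prefix format).
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")

def outage_seconds(log_text):
    """Seconds between the first crash marker and the next 'ready' message, or None."""
    crash_at = None
    for line in log_text.splitlines():
        if CRASH_MARK in line and crash_at is None:
            crash_at = _timestamp(line)
        elif READY_MARK in line and crash_at is not None:
            return (_timestamp(line) - crash_at).total_seconds()
    return None

# Two lines copied from the sys_log excerpt above.
LOG = """\
2022-11-29 15:18:06.742 CST [13089] LOG: terminating any other active server processes
2022-11-29 15:19:30.739 CST [13089] LOG: database system is ready to accept connections
"""

print(outage_seconds(LOG))
```

For this excerpt the window is about 84 seconds, during which no client could connect, even though only a single idle backend was signalled.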
2) Database sys_ctl kill QUIT PID
# Query the backend process status
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | quer
y_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+---------+-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+------------+--------------
-----------------+-------------------------------+-----------------+------------+-------+-------------+--------------+--------------------------+----------------
16385 | prod | 21894 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57490 | 2022-11-29 15:36:10.034020+08 | | 2022-11-29 15
:36:13.902728+08 | 2022-11-29 15:36:13.917841+08 | Client | ClientRead | idle | | | select count(*) from t1; | client backend
(1 row)
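# Run the database command sys_ctl kill QUIT to terminate the backend process
[kingbase@node2 bin]$ ./sys_ctl kill QUIT 21894
# As shown below, besides the target backend process, the other background auxiliary processes were also terminated and restarted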
[kingbase@node2 bin]$ ps -ef |grep kingbase
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 21895 12416 0 15:36 pts/0 00:00:00 ./ksql -U system test
kingbase 22071 13089 0 15:37 ? 00:00:00 kingbase: startup
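# Client session output, followed by the corresponding sys_log entries (terminating the backend process caused the database's auxiliary processes to restart as well)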
test=# select * from sys_stat_activity where client_addr='192.168.8.200';
WARNING: terminating connection because of crash of another server process
DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2022-11-29 15:37:13.828 CST [21894] WARNING: terminating connection because of crash of another server process
2022-11-29 15:37:13.828 CST [21894] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:37:13.828 CST [21894] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:37:13.829 CST [13089] LOG: server process (PID 21894) exited with exit code 2
2022-11-29 15:37:13.829 CST [13089] DETAIL: Failed process was running: select count(*) from t1;
2022-11-29 15:37:13.829 CST [13089] LOG: terminating any other active server processes
2022-11-29 15:37:13.830 CST [21896] WARNING: terminating connection because of crash of another server process
2022-11-29 15:37:13.830 CST [21896] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:37:13.830 CST [21896] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:37:13.831 CST [18956] WARNING: terminating connection because of crash of another server process
2022-11-29 15:37:13.831 CST [18956] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 15:37:13.831 CST [18956] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 15:37:13.833 CST [13089] LOG: all server processes terminated; reinitializing
2022-11-29 15:37:13.933 CST [22071] LOG: database system was interrupted; last known up at 2022-11-29 15:19:30 CST
2022-11-29 15:38:29.154 CST [22071] LOG: database system was not properly shut down; automatic recovery in progress
2022-11-29 15:38:29.232 CST [22071] LOG: redo starts at 0/2293680
2022-11-29 15:38:29.232 CST [22071] LOG: redo wal segment count 2
2022-11-29 15:38:29.232 CST [22071] LOG: invalid record length at 0/22936B0: wanted 24, got 0
2022-11-29 15:38:29.232 CST [22071] LOG: redo done at 0/2293680
2022-11-29 15:38:29.767 CST [13089] LOG: database system is ready to accept connections
--- As shown above, using sys_ctl kill QUIT pid to terminate a backend process puts the database at great risk.
5. Operating system kill -9 pid
# Query the backend process status
prod=# select * from sys_stat_activity where client_addr='192.168.8.200';
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start
| query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query
| backend_type
-------+---------+-------+----------+---------+-------------------+---------------+-----------------+-------------+-------------------------------+--------------------------
-----+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+-------------------------
---------+------------------------------
16385 | prod | 14114 | 10 | system | kingbase_*&+_ | 192.168.8.200 | | 57472 | 2022-11-29 14:48:06.372451+08 |
| 2022-11-29 14:50:16.618969+08 | 2022-11-29 14:50:16.627572+08 | Client | ClientRead | idle | | | select count(*) from t1;
| client backend
(1 row)
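# Run kill -9 pid at the operating system level to end the process
[root@node2 ~]# kill -9 14114
# As shown below, besides the target backend process, the other background auxiliary processes were also terminated and restarted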
[root@node2 ~]# ps -ef |grep kingbase
.......
kingbase 13089 1 0 14:39 ? 00:00:00 /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/kingbase -D /db/kingbase/v8r6_054/data
kingbase 13090 13089 0 14:39 ? 00:00:00 kingbase: logger
kingbase 14010 12416 0 14:47 pts/0 00:00:00 ./ksql -U system test
kingbase 15651 13089 0 14:57 ? 00:00:00 kingbase: startup
# Check the corresponding sys_log (terminating the backend process caused the database's auxiliary processes to restart as well)
2022-11-29 14:58:00.093 CST [13089] LOG: server process (PID 14114) was terminated by signal 9: Killed
2022-11-29 14:58:00.093 CST [13089] DETAIL: Failed process was running: select count(*) from t1;
2022-11-29 14:58:00.093 CST [13089] LOG: terminating any other active server processes
2022-11-29 14:58:00.093 CST [14602] WARNING: terminating connection because of crash of another server process
2022-11-29 14:58:00.093 CST [14602] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 14:58:00.093 CST [14602] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 14:58:00.095 CST [13095] WARNING: terminating connection because of crash of another server process
2022-11-29 14:58:00.095 CST [13095] DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-11-29 14:58:00.095 CST [13095] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-11-29 14:58:00.384 CST [13089] LOG: all server processes terminated; reinitializing
2022-11-29 14:58:00.464 CST [15651] LOG: database system was interrupted; last known up at 2022-11-29 14:53:42 CST
2022-11-29 14:59:16.706 CST [15651] LOG: database system was not properly shut down; automatic recovery in progress
2022-11-29 14:59:16.806 CST [15651] LOG: redo starts at 0/2293488
2022-11-29 14:59:16.806 CST [15651] LOG: redo wal segment count 2
2022-11-29 14:59:16.806 CST [15651] LOG: invalid record length at 0/2293560: wanted 24, got 0
2022-11-29 14:59:16.806 CST [15651] LOG: redo done at 0/2293530
2022-11-29 14:59:17.563 CST [13089] LOG: database system is ready to accept connections
--- As shown above, using kill -9 pid to terminate a backend process puts the database at great risk.
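The difference between a signal the process can handle and one it cannot is visible from the parent's side, which is what the main process reports in sys_log ("exited with exit code ..." versus "was terminated by signal 9"). The following is a minimal, POSIX-only Python sketch of that general mechanism, not KingbaseES code. The child program and the helper `stop_child` are hypothetical and stand in for a backend and its parent.

```python
import signal
import subprocess
import sys

# Child program: installs a SIGTERM handler that exits cleanly, then idles.
CHILD = r"""
import signal, sys, time

def on_term(signum, frame):
    sys.exit(0)              # orderly exit, the child gets a chance to clean up

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)   # tell the parent the handler is installed
while True:
    time.sleep(0.1)
"""

def stop_child(sig):
    """Start the child, send it `sig`, and return the status the parent observes."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()   # wait until the child has installed its handler
    proc.send_signal(sig)
    status = proc.wait(timeout=10)
    proc.stdout.close()
    return status

term_status = stop_child(signal.SIGTERM)   # handler runs, child exits by itself
kill_status = stop_child(signal.SIGKILL)   # cannot be caught, child is killed by the OS

print("SIGTERM ->", term_status)
print("SIGKILL ->", kill_status)
```

With `subprocess`, a negative status means the child was ended by that signal rather than exiting on its own, which is the situation a supervising parent has to treat as abnormal.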
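III. Summary
An abnormal backend process in a KingbaseES database can be terminated manually, but the method must be chosen with its risk to the database in mind:
1) Relatively safe: pg_terminate_backend(pid); kill pid; kill -15 pid; sys_ctl kill TERM pid.
2) Unsafe: kill -3 pid; sys_ctl kill QUIT pid; kill -9 pid.
Note: never use kill -9. SIGKILL cannot be caught by a signal handler, so the OS stops the process immediately. When the Kingbase parent process detects that a child process exited abnormally, it stops all processes, releases shared memory,
allocates shared memory again, and starts all processes back up. The effect is equivalent to a crash restart: startup needs time for redo, which may interrupt service for several minutes. (Do not use kill -9 unless the consequences are acceptable.)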
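The classification above can be captured as a small lookup, for example as a guard in an operations script. The following is a minimal Python sketch, not part of the original case. It assumes Linux signal numbering, and the names `parse_signal` and `assess` are hypothetical. Signals not tested in this case are reported as not covered rather than guessed.

```python
import signal

# Classification taken from the summary above; anything else is left unclassified.
RELATIVELY_SAFE = {signal.SIGTERM}            # pg_terminate_backend, kill, kill -15, sys_ctl kill TERM
UNSAFE = {signal.SIGQUIT, signal.SIGKILL}     # kill -3, sys_ctl kill QUIT, kill -9

def parse_signal(arg=""):
    """Map kill-style arguments ('', '-15', 'TERM', 'SIGQUIT', 9) to a signal."""
    text = str(arg).strip().lstrip("-").upper()
    if not text:
        return signal.SIGTERM                 # plain `kill pid` sends SIGTERM by default
    if text.isdigit():
        return signal.Signals(int(text))
    if not text.startswith("SIG"):
        text = "SIG" + text
    return signal.Signals[text]

def assess(arg=""):
    """Return the risk label from this case for a given signal argument."""
    sig = parse_signal(arg)
    if sig in RELATIVELY_SAFE:
        return "relatively safe"
    if sig in UNSAFE:
        return "unsafe"
    return "not covered by this case"

for arg in ("", "-15", "TERM", "-3", "QUIT", "-9"):
    print(repr(arg), "->", assess(arg))
```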