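I actually finished writing this last year, but on December 31 I "made it into the final circle", and then came, in order, being laid up in bed, the New Year, vacation, and overtime, so it was three months before I had a chance to publish it... A year only has four quarters, after all; time really does slip away.
I had also meant to rename this series. It started as notes on learning CentOS, but as the writing went on it drifted entirely away from CentOS itself. Renaming is a hassle, though, so I will stay true to the original intent and keep CentOS in the title.
ClickHouse is currently a popular database choice. In this installment we set up a working ClickHouse environment and build a programming environment for it on VSCode. ClickHouse is a distributed column-oriented database developed by a Russian company (originally Yandex); it is independent of the Hadoop ecosystem and provides high-performance database services in its own distinctive way. The official ClickHouse site has detailed Chinese documentation; see "什么是ClickHouse? | ClickHouse Docs".
I. System Requirements
According to the official description, ClickHouse supports the x86_64, AArch64 and PowerPC64LE CPU architectures, and its operating systems include Linux, FreeBSD and Mac OS X. CPU architecture is worth mentioning because ClickHouse's optimizations use specific CPU instruction sets. This is also a common strategy in the low-level platforms that underpin network security tools, DPDK and HyperScan being examples. On Intel's x86_64 architecture specifically, ClickHouse relies mainly on the SSE 4.2 instruction set for optimization.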
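SSE extends the earlier MMX instruction set. MMX, introduced with the Pentium MMX, targeted the heavy integer workloads of multimedia processing and offered register-level parallelism by borrowing the floating-point registers, so that one instruction could operate on several data items at once (SIMD: Single Instruction, Multiple Data). As demand for this kind of acceleration kept growing, Intel introduced the dedicated SSE instruction set with the Pentium III; it brought its own registers, so SIMD work no longer has to compete with floating-point computation for resources. SSE stands for Streaming SIMD Extensions (early Intel material also called it Internet SSE).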
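To check whether the system supports the instruction set, once the system is up you can read the runtime CPU description from /proc/cpuinfo and search it for the string "sse4_2" (lowercase, as it appears in the flags line):
You can also use the check given on the official site. In the command, && runs the following command when the previous one succeeds and || runs it when the previous one fails; -q makes grep run quietly, otherwise the matching lines from /proc/cpuinfo would be printed as well.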
[root@pig sysconfig]# grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
SSE 4.2 supported
[root@pig sysconfig]#
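The same check can be wrapped in a small function. This is only a sketch: has_cpu_flag is a name made up for this post, and the optional second argument (a path to a cpuinfo-style file) exists only so that the function can be tried on a saved copy. It matches the flag as a whole word, so sse4_2 is not confused with sse4_1.

```shell
# Hypothetical helper (not part of ClickHouse): succeeds if FLAG appears as a
# whole word on the first "flags" line of a cpuinfo-style file.
# Usage: has_cpu_flag FLAG [FILE]   (FILE defaults to /proc/cpuinfo)
has_cpu_flag() {
    grep -m1 -E '^flags[[:space:]]*:' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

# Example:
#   has_cpu_flag sse4_2 && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
```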
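For the AArch64 and PowerPC64LE architectures, which do not have the SSE instruction set, the official documentation says ClickHouse should be installed by compiling it from source.
II. Single-Node ClickHouse Deployment on VMware
Following the same route we took with Hadoop, we first deploy ClickHouse on a single node and hook VSCode up to it as a programming environment, and only then consider cluster deployment.
(I) Adding the package repository
Official repositories are provided for both the Redhat/CentOS and the Debian/Ubuntu families of Linux.
1. Adding the repository on Redhat/CentOS
A repository can be added under /etc/yum.repos.d/ entirely by hand (done often in earlier posts, so not repeated here), or in a more "proper" way with the yum-utils tools: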
[root@pig yum.repos.d]# yum install yum-utils
上次元数据过期检查:1:24:34 前,执行于 2022年12月19日 星期一 21时12分06秒。
软件包 yum-utils-4.0.21-11.el8.noarch 已安装。
依赖关系解决。
============================================================================================================================================
软件包 架构 版本 仓库 大小
============================================================================================================================================
升级:
dnf-plugins-core noarch 4.0.21-16.el8 baseos 75 k
python3-dnf-plugins-core noarch 4.0.21-16.el8 baseos 258 k
yum-utils noarch 4.0.21-16.el8 baseos 74 k
事务概要
============================================================================================================================================
升级 3 软件包
总下载:407 k
确定吗?[y/N]: y
下载软件包:
(1/3): yum-utils-4.0.21-16.el8.noarch.rpm 54 kB/s | 74 kB 00:01
(2/3): dnf-plugins-core-4.0.21-16.el8.noarch.rpm 44 kB/s | 75 kB 00:01
(3/3): python3-dnf-plugins-core-4.0.21-16.el8.noarch.rpm 60 kB/s | 258 kB 00:04
--------------------------------------------------------------------------------------------------------------------------------------------
总计 84 kB/s | 407 kB 00:04
CentOS Stream 8 - BaseOS 1.6 MB/s | 1.6 kB 00:00
导入 GPG 公钥 0x8483C65D:
Userid: "CentOS (CentOS Official Signing Key) <security@centos.org>"
指纹: 99DB 70FA E1D7 CE22 7FB6 4882 05B5 55B3 8483 C65D
来自: /etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
确定吗?[y/N]: y
导入公钥成功
运行事务检查
事务检查成功。
运行事务测试
事务测试成功。
运行事务
准备中 : 1/1
升级 : python3-dnf-plugins-core-4.0.21-16.el8.noarch 1/6
……………………
运行脚本: python3-dnf-plugins-core-4.0.21-11.el8.noarch 6/6
验证 : dnf-plugins-core-4.0.21-16.el8.noarch 1/6
………………
验证 : yum-utils-4.0.21-11.el8.noarch 6/6
已升级:
dnf-plugins-core-4.0.21-16.el8.noarch python3-dnf-plugins-core-4.0.21-16.el8.noarch yum-utils-4.0.21-16.el8.noarch
完毕!
[root@pig yum.repos.d]#
Then add the repo with the yum-config-manager command:
[root@pig yum.repos.d]# ls
CentOS-Stream-AppStream.repo CentOS-Stream-HighAvailability.repo CentOS-Stream-ResilientStorage.repo epel-testing-modular.repo
CentOS-Stream-BaseOS.repo CentOS-Stream-Media.repo CentOS-Stream-Sources.repo epel-testing.repo
CentOS-Stream-Debuginfo.repo CentOS-Stream-NFV.repo epel-modular.repo old
CentOS-Stream-Extras-common.repo CentOS-Stream-PowerTools.repo epel-playground.repo
CentOS-Stream-Extras.repo CentOS-Stream-RealTime.repo epel.repo
[root@pig yum.repos.d]# yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
添加仓库自:https://packages.clickhouse.com/rpm/clickhouse.repo
[root@pig yum.repos.d]# ls
CentOS-Stream-AppStream.repo CentOS-Stream-HighAvailability.repo CentOS-Stream-ResilientStorage.repo epel.repo
CentOS-Stream-BaseOS.repo CentOS-Stream-Media.repo CentOS-Stream-Sources.repo epel-testing-modular.repo
CentOS-Stream-Debuginfo.repo CentOS-Stream-NFV.repo clickhouse.repo epel-testing.repo
CentOS-Stream-Extras-common.repo CentOS-Stream-PowerTools.repo epel-modular.repo old
CentOS-Stream-Extras.repo CentOS-Stream-RealTime.repo epel-playground.repo
[root@pig yum.repos.d]#
As shown, after the command finishes a clickhouse.repo file appears in the yum.repos.d directory, recording the repo's baseurl and key URL.
[root@pig yum.repos.d]# cat clickhouse.repo
[clickhouse-stable]
name=ClickHouse - Stable Repository
baseurl=https://packages.clickhouse.com/rpm/stable/
gpgkey=https://packages.clickhouse.com/rpm/stable/repodata/repomd.xml.key
gpgcheck=0
repo_gpgcheck=1
enabled=1
[clickhouse-lts]
name=ClickHouse - LTS Repository
baseurl=https://packages.clickhouse.com/rpm/lts/
gpgkey=https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key
gpgcheck=0
repo_gpgcheck=1
enabled=0
[root@pig yum.repos.d]#
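The file above defines two channels, with stable enabled and LTS disabled. A quick way to confirm which sections are active is sketched below; list_enabled_repos is a hypothetical helper of my own, and it assumes the simple key=value layout shown above, with no spaces around the equals sign.

```shell
# Hypothetical helper: print the section names of a yum/dnf .repo file
# whose "enabled" key is set to 1.
# Usage: list_enabled_repos /etc/yum.repos.d/clickhouse.repo
list_enabled_repos() {
    awk -F= '
        /^\[.*\]$/ { section = substr($0, 2, length($0) - 2); next }
        $1 == "enabled" && $2 == "1" { print section }
    ' "$1"
}
```

Switching to the LTS channel would then be a matter of flipping the two enabled values, either by hand or with the --enable and --disable options of yum-config-manager.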
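2. Adding the repository on Debian/Ubuntu
(1) Preparing to add the repository
As mentioned in "CENTOS上的网络安全工具(十六)容器特色的Linux操作" (lhyzws's blog on CSDN), it is best to run apt update before running the commands given officially. Otherwise apt may fail to find the apt-transport-https and ca-certificates packages.
Of these, apt-transport-https and ca-certificates are already familiar: they help apt reach HTTPS package sources such as the domestic mirrors. dirmngr manages and downloads OpenPGP and X.509 certificates (in the earlier post this role seemed to be played by gnupg). In essence, they all serve to manage and use package sources.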
root@5ebe7112320a:/# apt update
Ign:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:2 http://security.ubuntu.com/ubuntu jammy-security InRelease
Get:2 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
………………
Get:18 http://archive.ubuntu.com/ubuntu jammy/multiverse amd64 Packages [266 kB]
Get:16 http://archive.ubuntu.com/ubuntu jammy/universe amd64 Packages [17.5 MB]
Fetched 24.9 MB in 7min 0s (59.4 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
2 packages can be upgraded. Run 'apt list --upgradable' to see them.
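If you plan to switch mirrors, these supporting packages still need to be installed before the switch, since some domestic mirrors depend on them to load correctly.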
root@5ebe7112320a:/# apt install -y apt-transport-https ca-certificates dirmngr
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
gnupg gnupg-l10n gnupg-utils gpg gpg-agent gpg-wks-client gpg-wks-server gpgconf gpgsm libassuan0 libksba8
libldap-2.5-0 libldap-common libnpth0 libreadline8 libsasl2-2 libsasl2-modules libsasl2-modules-db
libsqlite3-0 openssl pinentry-curses readline-common
Suggested packages:
dbus-user-session libpam-systemd pinentry-gnome3 tor parcimonie xloadimage scdaemon
libsasl2-modules-gssapi-mit | libsasl2-modules-gssapi-heimdal libsasl2-modules-ldap libsasl2-modules-otp
libsasl2-modules-sql pinentry-doc readline-doc
The following NEW packages will be installed:
apt-transport-https ca-certificates dirmngr gnupg gnupg-l10n gnupg-utils gpg gpg-agent gpg-wks-client
gpg-wks-server gpgconf gpgsm libassuan0 libksba8 libldap-2.5-0 libldap-common libnpth0 libreadline8
libsasl2-2 libsasl2-modules libsasl2-modules-db libsqlite3-0 openssl pinentry-curses readline-common
0 upgraded, 25 newly installed, 0 to remove and 2 not upgraded.
Need to get 4817 kB of archives.
After this operation, 11.9 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 openssl amd64 3.0.2-0ubuntu1.7 [1183 kB]
……………………
Get:8 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 gpgconf amd64 2.2.27-3ubuntu2.1 [94.2 kB]
debconf: unable to initialize frontend: Readline
……………………
Updating certificates in /etc/ssl/certs...
124 added, 0 removed; done.
Setting up gpgconf (2.2.27-3ubuntu2.1) ...
Setting up gpg (2.2.27-3ubuntu2.1) ...
……………………
Setting up gpg-wks-client (2.2.27-3ubuntu2.1) ...
Setting up gnupg (2.2.27-3ubuntu2.1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
Processing triggers for ca-certificates (20211016ubuntu0.22.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
root@5ebe7112320a:/#
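For changing the mirror, see "CENTOS上的网络安全工具(十六)容器特色的Linux操作" (lhyzws's blog on CSDN); it is not repeated here.
(2) Adding the repository
First, add a source list file under /etc/apt/sources.list.d. Generally the system sources are described in /etc/apt/sources.list, while application sources go in the sources.list.d directory.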
root@5ebe7112320a:/etc/apt/sources.list.d# echo "deb https://packages.clickhouse.com/deb stable main" > clickhouse.list
root@5ebe7112320a:/etc/apt/sources.list.d# cat clickhouse.list
deb https://packages.clickhouse.com/deb stable main
root@5ebe7112320a:/etc/apt/sources.list.d#
Next, install the key for this repository:
root@5ebe7112320a:/etc/apt/sources.list.d# apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
Executing: /tmp/apt-key-gpghome.io9GX0It0w/gpg.1.sh --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
gpg: key 8919F6BD2B48D754: public key "ClickHouse Inc. Repositories Key <packages@clickhouse.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
root@5ebe7112320a:/etc/apt/sources.list.d#
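The warning in the output above notes that apt-key is deprecated. One alternative is to keep the key in a dedicated keyring file and point the source entry at it with signed-by. The sketch below assumes a keyring path of my own choosing (/usr/share/keyrings/clickhouse-keyring.gpg), and both function names are hypothetical. The import function needs root and network access, so it is only defined here, not run.

```shell
# Assumed keyring location; any root-owned path readable by apt would do.
CLICKHOUSE_KEYRING="${CLICKHOUSE_KEYRING:-/usr/share/keyrings/clickhouse-keyring.gpg}"

# Print a sources.list entry that ties the repository to one keyring file.
clickhouse_apt_source_line() {
    printf 'deb [signed-by=%s] https://packages.clickhouse.com/deb stable main\n' "$1"
}

# Fetch the repository key into the dedicated keyring instead of the shared
# apt-key one. Runs in a subshell so the temporary GNUPGHOME does not leak.
import_clickhouse_key() (
    GNUPGHOME=$(mktemp -d) || exit 1
    export GNUPGHOME
    gpg --no-default-keyring --keyring "$CLICKHOUSE_KEYRING" \
        --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 8919F6BD2B48D754
    status=$?
    rm -rf "$GNUPGHOME"
    exit "$status"
)

# Example (as root):
#   import_clickhouse_key &&
#   clickhouse_apt_source_line "$CLICKHOUSE_KEYRING" > /etc/apt/sources.list.d/clickhouse.list
```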
(II) Installing the ClickHouse server and client
1. Redhat/CentOS
[root@pig yum.repos.d]# yum install -y clickhouse-server clickhouse-client
ClickHouse - Stable Repository 18 B/s | 833 B 00:45
ClickHouse - Stable Repository 1.2 kB/s | 5.7 kB 00:04
导入 GPG 公钥 0x2B48D754:
Userid: "ClickHouse Inc. Repositories Key <packages@clickhouse.com>"
指纹: 3A9E A119 3A97 B548 BE14 57D4 8919 F6BD 2B48 D754
来自: https://packages.clickhouse.com/rpm/stable/repodata/repomd.xml.key
ClickHouse - Stable Repository 4.2 kB/s | 86 kB 00:20
依赖关系解决。
============================================================================================================================================
软件包 架构 版本 仓库 大小
============================================================================================================================================
安装:
clickhouse-client x86_64 22.12.1.1752-1 clickhouse-stable 119 k
clickhouse-server x86_64 22.12.1.1752-1 clickhouse-stable 145 k
安装依赖关系:
clickhouse-common-static x86_64 22.12.1.1752-1 clickhouse-stable 252 M
事务概要
============================================================================================================================================
安装 3 软件包
总下载:253 M
安装大小:721 M
下载软件包:
(1/3): clickhouse-client-22.12.1.1752.x86_64.rpm 51 kB/s | 119 kB 00:02
(2/3): clickhouse-server-22.12.1.1752.x86_64.rpm 58 kB/s | 145 kB 00:02
[MIRROR] clickhouse-common-static-22.12.1.1752.x86_64.rpm: Curl error (28): Timeout was reached for https://packages.clickhouse.com/rpm/stable/clickhouse-common-static-22.12.1.1752.x86_64.rpm [Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds]
………………
chown -R clickhouse-bridge:clickhouse-bridge '/usr/bin/clickhouse-odbc-bridge'
chown -R clickhouse-bridge:clickhouse-bridge '/usr/bin/clickhouse-library-bridge'
Password for default user is empty string. See /etc/clickhouse-server/users.xml and /etc/clickhouse-server/users.d to change it.
Setting capabilities for clickhouse binary. This is optional.
chown -R clickhouse:clickhouse '/etc/clickhouse-server'
ClickHouse has been successfully installed.
Start clickhouse-server with:
sudo clickhouse start
Start clickhouse-client with:
clickhouse-client
Here the default user's password is an empty string; to set a password, edit the configuration files afterwards.
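As a rough illustration of what such a change could look like, the sketch below prints a users.d override that replaces the empty password with a SHA-256 hash. The function names are made up for this post and sha256sum from GNU coreutils is assumed to be available; the printed XML would be saved under /etc/clickhouse-server/users.d/ with a file name of your choice.

```shell
# SHA-256 hex digest of a password (printf avoids a trailing newline).
password_sha256_hex() {
    printf '%s' "$1" | sha256sum | awk '{ print $1 }'
}

# Print a users.d override that removes the default user's plain (empty)
# password and sets a hashed one instead.
default_password_override() {
    cat <<EOF
<clickhouse>
    <users>
        <default>
            <password remove="1" />
            <password_sha256_hex>$(password_sha256_hex "$1")</password_sha256_hex>
        </default>
    </users>
</clickhouse>
EOF
}

# Example (as root; the file name is only an example):
#   default_password_override 'my-secret' > /etc/clickhouse-server/users.d/default-password.xml
```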
2. Debian/Ubuntu
Unlike with yum, after changing the sources you still need to run apt update by hand.
root@5ebe7112320a:/etc/apt/sources.list.d# apt update
Hit:1 https://mirrors.tuna.tsinghua.edu.cn/ubuntu jammy InRelease
Hit:2 https://mirrors.tuna.tsinghua.edu.cn/ubuntu jammy-updates InRelease
Hit:3 https://mirrors.tuna.tsinghua.edu.cn/ubuntu jammy-backports InRelease
Hit:4 https://mirrors.tuna.tsinghua.edu.cn/ubuntu jammy-security InRelease
Ign:5 https://packages.clickhouse.com/deb stable InRelease
Get:5 https://packages.clickhouse.com/deb stable InRelease [2484 B]
Get:6 https://packages.clickhouse.com/deb stable/main amd64 Packages [39.5 kB]
Fetched 42.0 kB in 17s (2405 B/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
2 packages can be upgraded. Run 'apt list --upgradable' to see them.
W: https://packages.clickhouse.com/deb/dists/stable/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
Only then will the installation be able to find the packages:
root@5ebe7112320a:/etc/apt/sources.list.d# apt install -y clickhouse-server clickhouse-client
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
clickhouse-common-static libcap2-bin libpam-cap
Suggested packages:
clickhouse-common-static-dbg
The following NEW packages will be installed:
clickhouse-client clickhouse-common-static clickhouse-server libcap2-bin libpam-cap
0 upgraded, 5 newly installed, 0 to remove and 2 not upgraded.
Need to get 265 MB of archives.
After this operation, 756 MB of additional disk space will be used.
Get:1 https://mirrors.tuna.tsinghua.edu.cn/ubuntu jammy/main amd64 libcap2-bin amd64 1:2.44-1build3 [26.0 kB]
Get:2 https://mirrors.tuna.tsinghua.edu.cn/ubuntu jammy/main amd64 libpam-cap amd64 1:2.44-1build3 [7932 B]
Get:3 https://packages.clickhouse.com/deb stable/main amd64 clickhouse-common-static amd64 22.12.1.1752 [265 MB]
9% [3 clickhouse-common-static 4215 kB/265 MB 2%] 282 kB/s 15min 25s
……………………
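Unlike with yum, the installation prompts for the default user's password and will not continue until it is answered. If no password is needed, just press Enter.
On either distribution, once installation finishes a clickhouse-server script can be seen under /etc/init.d.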
[root@pig init.d]# pwd
/etc/init.d
[root@pig init.d]# ls
clickhouse-server functions README
[root@pig init.d]#
(III) Testing
1. Starting the service
(1) Starting with systemctl
Where systemctl is available, use it directly for management and maintenance:
[root@pig init.d]# systemctl start clickhouse-server
[root@pig init.d]# systemctl status clickhouse-server.service
● clickhouse-server.service - ClickHouse Server (analytic DBMS for big data)
Loaded: loaded (/usr/lib/systemd/system/clickhouse-server.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2022-12-20 02:02:14 EST; 17s ago
Main PID: 38925 (clickhouse-serv)
Tasks: 219 (limit: 11047)
Memory: 187.5M
CGroup: /system.slice/clickhouse-server.service
└─38925 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-serve>
12月 20 02:02:10 pig systemd[1]: Starting ClickHouse Server (analytic DBMS for big data)...
12月 20 02:02:10 pig clickhouse-server[38925]: Processing configuration file '/etc/clickhouse-server/config.xml'.
12月 20 02:02:10 pig clickhouse-server[38925]: Logging trace to /var/log/clickhouse-server/clickhouse-server.log
12月 20 02:02:10 pig clickhouse-server[38925]: Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
12月 20 02:02:13 pig clickhouse-server[38925]: Processing configuration file '/etc/clickhouse-server/config.xml'.
12月 20 02:02:13 pig clickhouse-server[38925]: Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.xml'.
12月 20 02:02:13 pig clickhouse-server[38925]: Processing configuration file '/etc/clickhouse-server/users.xml'.
12月 20 02:02:13 pig clickhouse-server[38925]: Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/users.xml'.
12月 20 02:02:14 pig systemd[1]: Started ClickHouse Server (analytic DBMS for big data).
[root@pig init.d]# systemctl enable clickhouse-server
Synchronizing state of clickhouse-server.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable clickhouse-server
[root@pig init.d]#
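(2) Starting with the service command
Using the service command is effectively the same as using systemctl. Where systemctl cannot be used, service generally cannot be used either.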
[root@pig init.d]# service clickhouse-server start
[root@pig init.d]# service clickhouse-server status
/var/run/clickhouse-server/clickhouse-server.pid file exists and contains pid = 38925.
The process with pid = 38925 is running.
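The status output above boils down to two questions: does the pid file exist, and is that process alive? A minimal sketch of the same check follows. pidfile_running is a hypothetical helper and not how the init script itself is written. Note that kill -0 needs permission to signal the process, so it should be run as root or as the clickhouse user.

```shell
# Hypothetical helper: succeeds if FILE exists, holds a numeric PID on its
# first line, and a process with that PID can be signalled.
pidfile_running() {
    [ -s "$1" ] || return 1
    _pid=$(head -n 1 "$1")
    case "$_pid" in
        '' | *[!0-9]*) return 1 ;;
    esac
    kill -0 "$_pid" 2>/dev/null
}

# Example:
#   pidfile_running /var/run/clickhouse-server/clickhouse-server.pid && echo running
```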
(3) Starting in the background from the command line
Without systemctl or service, the officially recommended command-line background start can be used:
sudo /etc/init.d/clickhouse-server start
Looking at the shell code of /etc/init.d/clickhouse-server, when the argument is start the command executed is:
case "$1" in
start)
service_or_func start && enable_cron
;;
What service_or_func does is call systemctl start clickhouse-server directly if systemctl is usable, and otherwise call the start() function:
service_or_func()
{
if [ -x "/bin/systemctl" ] && [ -f /etc/systemd/system/clickhouse-server.service ] && [ -d /run/systemd/system ]; then
systemctl $1 $PROGRAM
else
$1
fi
}
The start function:
start()
{
${CLICKHOUSE_GENERIC_PROGRAM} start --user "${CLICKHOUSE_USER}" --pid-path "${CLICKHOUSE_PIDDIR}" --config-path "${CLICKHOUSE_CONFDIR}" --binary-path "${CLICKHOUSE_BINDIR}"
}
According to the variable definitions in the script, this statement expands to:
clickhouse start --user "clickhouse" --pid-path "/var/run/clickhouse-server" --config-path "/etc/clickhouse-server" --binary-path "/usr/bin"
The reason for digging this far is that the official guide runs /etc/init.d/clickhouse-server start under sudo, and since my system has no sudo, running it directly produces an error.
root@5ebe7112320a:/etc/init.d# /etc/init.d/clickhouse-server start
chown -R clickhouse: '/var/run/clickhouse-server/'
Will run sudo -u 'clickhouse' /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
/bin/sh: 1: sudo: not found
Code: 302. DB::Exception: Child process was exited with return code 127. (CHILD_WAS_NOT_EXITED_NORMALLY) (version 22.12.1.1752 (official build))
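The script analysis makes it clear that ClickHouse expects the server to be started as the clickhouse user.
For example, looking at the ClickHouse directories, everything except the client's is owned by clickhouse, which is probably why it has to be started as clickhouse.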
root@5ebe7112320a:/etc# ls -l
………………
drwxr-xr-x 2 root root 4096 Dec 20 04:46 clickhouse-client
drwx------ 4 clickhouse clickhouse 4096 Dec 20 04:11 clickhouse-server
………………
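Running clickhouse start --user "clickhouse" …… directly fails in the same way, with the same error message. Although clickhouse is a compiled executable, the error suggests that internally it still uses sudo to invoke the clickhouse-server executable in the same bin directory (not to be confused with the shell script above).
So let's just install sudo: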
root@72afdf9527a8:/usr/bin# apt install sudo
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
sudo
0 upgraded, 1 newly installed, 0 to remove and 2 not upgraded.
Need to get 820 kB of archives.
After this operation, 2564 kB of additional disk space will be used.
Get:1 https://mirrors.tuna.tsinghua.edu.cn/ubuntu jammy-updates/main amd64 sudo amd64 1.9.9-1ubuntu2.1 [820 kB]
Fetched 820 kB in 8s (100 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package sudo.
(Reading database ... 5264 files and directories currently installed.)
Preparing to unpack .../sudo_1.9.9-1ubuntu2.1_amd64.deb ...
Unpacking sudo (1.9.9-1ubuntu2.1) ...
Setting up sudo (1.9.9-1ubuntu2.1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
root@72afdf9527a8:/usr/bin#
Sure enough, once it is installed, both running clickhouse directly and going through the clickhouse-server script work:
root@72afdf9527a8:/usr/bin# clickhouse start --user "clickhouse" --pid-path "/var/run/clickhouse-server" --config-path "/etc/clickhouse-server" --binary-path "/usr/bin"
chown -R clickhouse: '/var/run/clickhouse-server/'
Will run sudo -u 'clickhouse' /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
Waiting for server to start
Waiting for server to start
Server started
root@72afdf9527a8:/usr/bin#
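In hindsight, the return code 127 in the earlier error is the shell's usual "command not found" status, which pointed at the missing sudo binary. A pre-flight check along the following lines would have surfaced that before the start attempt. It is only a sketch, and require_cmds is a name I made up.

```shell
# Succeeds only if every named command can be found on PATH; the missing
# ones are reported on stderr.
require_cmds() {
    _missing=0
    for _cmd in "$@"; do
        if ! command -v "$_cmd" >/dev/null 2>&1; then
            echo "missing command: $_cmd" >&2
            _missing=1
        fi
    done
    [ "$_missing" -eq 0 ]
}

# Example:
#   require_cmds sudo clickhouse && /etc/init.d/clickhouse-server start
```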
(4) Starting in the console
The console start method given officially is really just the manual form of what the systemctl/service unit does.
[root@pig system]# pwd
/usr/lib/systemd/system
[root@pig system]# cat clickhouse-server.service
[Unit]
Description=ClickHouse Server (analytic DBMS for big data)
Requires=network-online.target
# NOTE: that After/Wants=time-sync.target is not enough, you need to ensure
# that the time was adjusted already, if you use systemd-timesyncd you are
# safe, but if you use ntp or some other daemon, you should configure it
# additionaly.
After=time-sync.target network-online.target
Wants=time-sync.target
[Service]
Type=notify
# Switching off watchdog is very important for sd_notify to work correctly.
Environment=CLICKHOUSE_WATCHDOG_ENABLE=0
User=clickhouse
Group=clickhouse
Restart=always
RestartSec=30
RuntimeDirectory=clickhouse-server
ExecStart=/usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
# Minus means that this file is optional.
EnvironmentFile=-/etc/default/clickhouse
LimitCORE=infinity
LimitNOFILE=500000
CapabilityBoundingSet=CAP_NET_ADMIN CAP_IPC_LOCK CAP_SYS_NICE CAP_NET_BIND_SERVICE
[Install]
# ClickHouse should not start from the rescue shell (rescue.target).
WantedBy=multi-user.target
[root@pig system]#
The command for starting in the console is:
[root@pig system]# clickhouse-server --config-file=/etc/clickhouse-server/config.xml
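Note that typing clickhouse-server runs the executable under /usr/bin, not the ./clickhouse-server shell script.
2. Connecting with the client
For a simple test, just start clickhouse-client on the same machine and run SELECT 1. Output like the following means the installation succeeded: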
root@72afdf9527a8:/etc/init.d# clickhouse-client
ClickHouse client version 22.12.1.1752 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.12.1 revision 54461.
Warnings:
* Linux transparent hugepages are set to "always". Check /sys/kernel/mm/transparent_hugepage/enabled
72afdf9527a8 :) SELECT 1
SELECT 1
Query id: f946c4ec-7e21-425a-8574-67e2e86288af
┌─1─┐
│ 1 │
└───┘
1 row in set. Elapsed: 0.001 sec.
72afdf9527a8 :) quit
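For scripted checks the same probe can be run non-interactively, and since the server takes a few seconds to come up, a small retry loop helps. The sketch below is generic: wait_until is a hypothetical helper, and the two probes in the trailing comments (clickhouse-client with --query, and the HTTP /ping endpoint on port 8123) assume a default local installation.

```shell
# Retry a command up to TRIES times, sleeping DELAY seconds between attempts.
# Usage: wait_until TRIES DELAY CMD [ARGS...]
wait_until() {
    _tries=$1
    _delay=$2
    shift 2
    while [ "$_tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        _tries=$((_tries - 1))
        if [ "$_tries" -gt 0 ]; then
            sleep "$_delay"
        fi
    done
    return 1
}

# Examples (assuming a local server on the default ports):
#   wait_until 10 1 clickhouse-client --query "SELECT 1"
#   wait_until 10 1 curl -fsS http://localhost:8123/ping
```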
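III. A ClickHouse Programming Environment on VSCode
Personally, I think that besides its performance optimizations, one thing ClickHouse got very right is how much it resembles MySQL: both the installation flow and the feel of using it are far friendlier than HBase. That low barrier to entry should attract a good many users.
So setting up VSCode for ClickHouse is no different from setting it up for MySQL; even the SQLTools extension carries over. The only addition is a ClickHouse driver extension.
(I) Local deployment
First we set up VSCode for ClickHouse locally. The precondition is of course that clickhouse-server is running locally and that clickhouse-client works.
Our local environment is a CentOS Stream 8 virtual machine in VMware, with ClickHouse installed as described above and VSCode installed as described in "CENTOS上的网络安全工具(五)CODE来打个酱油" (lhyzws's blog on CSDN).
Next, much as in "CENTOS上的网络安全工具(七)MYSQL也不能少" (lhyzws's blog on CSDN), install the SQLTools extension:
After installation a database icon should appear on the left:
For the driver, the difference from MySQL is that you search for and install the ClickHouse driver extension:
Once installed, click the SQLTools icon in the left bar.
Click the "+" icon or the Add New Connection button.
Select the clickhouse database driver.
Configure the connection parameters as shown. testCK is a name of my own choosing, used later to identify the active connection. For the server, use localhost rather than the machine's IP address: an IP address makes it a remote connection rather than a local one, and with the default configuration that will not connect. 8123 is the default port; leave it alone. default can be used for both the database and the username, as both generally exist. Then click Test Connection: on success a small green box pops up as in the figure, otherwise a red error box.
If it is correct, click "Save Connection".
The extension shows the JSON configuration of the connection. Then you can "Connect Now".
Create a SQL query and choose Run on active connection to get the following result:
(II) Remote deployment
If you would rather not install VSCode inside the CentOS VM on VMware, you can connect remotely from VSCode on Windows.
However, as noted under local deployment, a remote login must address the server by IP address, and by default ClickHouse only accepts connections from localhost.
1. Enabling remote access
Remote access for ClickHouse is configured in the config.xml file under /etc/clickhouse-server/: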
[root@pig clickhouse-server]# pwd
/etc/clickhouse-server
[root@pig clickhouse-server]# ls
config.d config.xml users.d users.xml
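[root@pig clickhouse-server]# vim config.xml
Search for the string listen to find the section shown in the figure below.
There, the commented-out <listen_host>::</listen_host> opens remote access when both IPv4 and IPv6 are supported, and <listen_host>0.0.0.0</listen_host> opens it when only IPv4 is supported. To enable one, just remove the comment markers.
Because config.xml is a read-only file, after editing it in vim it has to be written with the :w! command.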
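2. Configuring VSCode
(1) Using the SQLTools extension
With remote access enabled, everything else is the same as the local case: install SQLTools and the ClickHouse driver extension and configure the connection parameters. The one difference is that here you must fill in the IP address of the host running clickhouse-server:
(2) Using MySQL Database Manager
Other extensions work too, for example the MySQL Database Manager tool by Weijan Chen that we used when tinkering with MySQL. More and more of its features have become paid, though, so I stopped using it much.
Likewise, when adding a connection it lists all the engines it supports (a paid tool is different after all), and ClickHouse is among them; selecting it sets the connection options below to their defaults. Configure it the same way: user default, empty password.
IV. Single-Node Deployment on Docker
(I) Deploying the server on Docker
As described in Part II, in an Ubuntu:latest (22.04 jammy) image we installed clickhouse-server and clickhouse-client by the process above, then exported the result as the clickhouse:pig image.
When running the image again, the listening ports need to be published.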
C:\Users\lhyzw>docker run -it --name pig -p 9000:9000 -p 8123:8123 clickhouse:pig bash
Let's first go through the ways that fail:
The VM local way: start from the command line (sudo has to be installed first, of course)
root@02aea9ee1a0e:/etc/clickhouse-server# sudo /etc/init.d/clickhouse-server start
/var/run/clickhouse-server/clickhouse-server.pid file exists and contains pid = 67.
Will run sudo -u 'clickhouse' /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
Waiting for server to start
Server started
The service starts normally and clickhouse-client inside the container connects fine, just as above. But no open ports show up:
root@c2bb84e9c057:/etc/clickhouse-server# netstat -ltpn
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
root@c2bb84e9c057:/etc/clickhouse-server#
Naturally, nothing can connect from outside.
The VM remote way: edit /etc/clickhouse-server/config.xml
sed -i 's@<!-- <listen_host>::</listen_host> -->@<listen_host>::</listen_host>@g' config.xml
sed -i 's@<!-- <listen_host>0.0.0.0</listen_host> -->@<listen_host>0.0.0.0</listen_host>@g' config.xml
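Then start with sudo again. The result: clickhouse-client can no longer connect locally via localhost:9000, the listening ports still do not come up, and nothing can connect remotely from outside either.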
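One thing worth ruling out at this point is whether the sed expressions matched at all: they only fire when the commented line matches character for character, and sed exits successfully either way. The sketch below applies the same two substitutions as a filter and then counts the listen_host lines that are actually active. Both function names are hypothetical, and the count only recognises elements that start a line, which is how they appear in the stock config.xml.

```shell
# Apply the two substitutions from above; reads a config on stdin and
# writes the result to stdout (the input file is left untouched).
uncomment_listen_hosts() {
    sed -e 's@<!-- <listen_host>::</listen_host> -->@<listen_host>::</listen_host>@' \
        -e 's@<!-- <listen_host>0.0.0.0</listen_host> -->@<listen_host>0.0.0.0</listen_host>@'
}

# Print how many lines start with an uncommented <listen_host> element.
count_active_listen_hosts() {
    grep -c '^[[:space:]]*<listen_host>' "$1"
}

# Example:
#   uncomment_listen_hosts < config.xml > /tmp/config.xml.new
#   count_active_listen_hosts /tmp/config.xml.new
```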
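The way that works:
I was baffled as to the cause..., but fortunately the official ClickHouse image is available for reference. After some comparison, it turned out that only one thing needs to change: create a file named docker_related_config.xml under /etc/clickhouse-server/config.d, with the following content: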
root@f5f12d60e478:/etc/clickhouse-server/config.d# cat docker_related_config.xml
<clickhouse>
<!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
<listen_host>::</listen_host>
<listen_host>0.0.0.0</listen_host>
<listen_try>1</listen_try>
<!--
<logger>
<console>1</console>
</logger>
-->
</clickhouse>
root@f5f12d60e478:/etc/clickhouse-server/config.d#
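For a home-made image, the file shown above can be created with a heredoc instead of copying it out of the official image. This is a sketch only: write_docker_listen_config is a made-up name, it takes the target directory as its argument so it can be tried in a scratch directory first, and it leaves out the commented-out logger block.

```shell
# Write the listen override shown above into DIR
# (normally /etc/clickhouse-server/config.d).
write_docker_listen_config() {
    cat > "$1/docker_related_config.xml" <<'EOF'
<clickhouse>
    <!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
    <listen_host>::</listen_host>
    <listen_host>0.0.0.0</listen_host>
    <listen_try>1</listen_try>
</clickhouse>
EOF
}

# Example (as root inside the container):
#   write_docker_listen_config /etc/clickhouse-server/config.d
```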
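The changes are essentially the same as the ones made in /etc/clickhouse-server/config.xml, except for the listen_try option. The comment in the official config describes it as "Don't exit if IPv6 or IPv4 networks are unavailable while trying to listen"; that is the behavior when it is set to 1 (the commented-out default is 0). With this in place, starting again with sudo goes perfectly smoothly: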
C:\Users\lhyzw>docker run -it -p 8123:8123 -p9000:9000 --name pig1 clickhouse:pig bash
root@f5f12d60e478:/# sudo /etc/init.d/clickhouse-server start
/var/run/clickhouse-server/clickhouse-server.pid file exists and contains pid = 67.
Will run sudo -u 'clickhouse' /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
Waiting for server to start
Server started
root@f5f12d60e478:/# netstat -ltpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:8123 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:9009 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:9004 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:9005 0.0.0.0:* LISTEN -
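The cause is unknown. It may be an issue with Docker's network driver; my guess is that ClickHouse might bypass the traditional network stack, so that Docker needs some special handling... In any case, adding this file under config.d with listen_try set to 1 did the job.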
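To turn the netstat check above into something scriptable, the listing can be piped through a small filter. port_is_listening is a hypothetical helper; it assumes the column layout printed by netstat -ltn (or ss -ltn), where the local address ends in :PORT.

```shell
# Succeeds if stdin (output of `netstat -ltn` or `ss -ltn`) contains a
# LISTEN socket with an address field ending in :PORT.
port_is_listening() {
    awk -v port="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++) {
                n = split($i, parts, ":")
                if (n >= 2 && parts[n] == port) { found = 1; exit }
            }
        }
        END { exit !found }
    '
}

# Example:
#   netstat -ltn | port_is_listening 8123 && echo "HTTP interface is up"
```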
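With this configuration, VSCode connects successfully using localhost, 127.0.0.1, or the IP address of the WSL subsystem. Very satisfying.
(II) Using the official image directly
1. Loading the image
Using the official image directly is simpler; see "clickhouse/clickhouse-server - Docker Image | Docker Hub". Taking the official image clickhouse/clickhouse-server as an example:
Start clickhouse-server following the official usage notes for the image; I mapped the ports to the same numbers, since there are too many to remember otherwise. In addition, --ulimit nofile=262144:262144 raises the limit on the number of open files. The default soft limit on Linux is typically 1024 open files per process, which is clearly not enough for ClickHouse. For a simple connection test, though, leaving this option out is fine.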
C:\Users\lhyzw>docker run -d -p 8123:8123 -p 9000:9000 --name pig --ulimit nofile=262144:262144 clickhouse/clickhouse-server
c617624c8629919a135f739a35e74c1c376c4214db74f0099a5837072e6ce6f0
C:\Users\lhyzw>docker exec -it pig bash
root@c617624c8629:/# clickhouse-client
ClickHouse client version 22.12.1.1752 (official build).
c617624c8629 :) quit
Bye.
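Once it is running, you can attach to the container and first test locally whether it connects...; it generally does.
Then, as with our home-made container above, connect from VSCode; that generally works too.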
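To see whether the --ulimit nofile option from the command above actually took effect, the soft limit can be inspected from a shell inside the container. The sketch below wraps that in a check; nofile_at_least is a made-up name, and the threshold in the example is simply the value used above.

```shell
# Succeeds if the current soft limit on open files is at least MIN
# (or is reported as "unlimited").
nofile_at_least() {
    _limit=$(ulimit -n)
    [ "$_limit" = "unlimited" ] || [ "$_limit" -ge "$1" ]
}

# Example (inside the container, e.g. after `docker exec -it pig bash`):
#   nofile_at_least 262144 && echo "nofile limit OK" || echo "nofile limit too low"
```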
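2. Specifying where data is stored
Two locations in ClickHouse matter most to users: /var/lib/clickhouse, where ClickHouse stores its data, and /var/log/clickhouse-server, where the logs go. To pin these two directories explicitly, map them with the -v option.
In addition, three files or directories related to container configuration may need to be mapped as well:
/etc/clickhouse-server/config.d/*.xml: configuration; this is the one we cp'd in :)
/etc/clickhouse-server/users.d/*.xml: users; user passwords can be added here;
/docker-entrypoint-initdb.d/: where database initialization scripts are placed.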
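Putting the two data mappings together, a start command could be assembled roughly as follows. The helper only prints the command so it can be reviewed first. The function name and the data and logs sub-directory names under the host base path are my own choices, and paths containing spaces are not handled.

```shell
# Print (not run) a docker run command that bind-mounts the data and log
# directories from BASE on the host.
# Usage: clickhouse_docker_cmd BASE NAME
clickhouse_docker_cmd() {
    printf 'docker run -d --name %s -p 8123:8123 -p 9000:9000' "$2"
    printf ' --ulimit nofile=262144:262144'
    printf ' -v %s/data:/var/lib/clickhouse' "$1"
    printf ' -v %s/logs:/var/log/clickhouse-server' "$1"
    printf ' clickhouse/clickhouse-server\n'
}

# Example:
#   clickhouse_docker_cmd /srv/clickhouse pig
```

The configuration paths can be mapped the same way, but note that bind-mounting an empty host directory over config.d would hide the image's own docker_related_config.xml, so mounting individual files there is the safer choice.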