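Background
Our company currently stores time-series data in InfluxDB, but InfluxDB has been painful: querying a single day of data sends memory usage soaring until the process crashes, and the query language never felt natural to us. So I evaluated TDengine and ran a performance comparison between InfluxDB and TDengine.
The result: TDengine was indeed faster than InfluxDB, its memory usage did not balloon, and its SQL was comfortable to use. But TDengine's cluster mode had only just gotten started. Many questions went unanswered and the community was not very active: support ran through a few WeChat groups where you post your question and hope someone replies. That felt a bit small-time; who watches group chats all day?
In the end we picked neither and are making do for now. The servers were scaled up, which more or less solved the problem.
Recently I came across ClickHouse, which looks like a good fit for things like user-profile analytics, so I'm learning it and scouting the way.
I'm writing this post because I ran into a few pitfalls that newcomers are likely to hit, and I hope sharing them saves you some detours. I'm new to this too, so the content is fairly basic.
Installing and testing by following the official docs went fine; the trouble started when connecting from Java.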
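1. Remote connections
I installed ClickHouse (CH) in a virtual machine and tested from my host machine, so inevitably I had to allow remote connections, as with a lot of server software.
The configuration file is: /etc/clickhouse-server/config.xml
The correct way to configure it is as follows.
Leave these two entries alone and uncomment the :: entry above them: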
<listen_host>::1</listen_host>
<listen_host>127.0.0.1</listen_host>
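As the comment in the file says, these two are the default values: the IPv6 and IPv4 loopback addresses, respectively.
Setting listen_host to :: makes the server listen on all addresses; binding to the machine's own IP is stricter and of course better.
Restart the server after the change: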
systemctl restart clickhouse-server.service
You can verify the result by checking the port.
Before the change:
lsof -i :8123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
clickhous 2901438 clickhouse 53u IPv6 2962808809 0t0 TCP localhost:8123 (LISTEN)
clickhous 2901438 clickhouse 55u IPv4 2962808813 0t0 TCP localhost:8123 (LISTEN)
Both the IPv4 and the IPv6 socket are bound to localhost.
curl 'http://localhost:8123/'
Ok.
curl 'http://192.168.1.100:8123/'
curl: (7) Failed to connect to 192.168.1.100 port 8123: Connection refused
Queries via localhost work; queries via the IP address do not.
After the change it works:
lsof -i :8123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
clickhous 2923409 clickhouse 58u IPv6 2963328449 0t0 TCP *:8123 (LISTEN)
curl 'http://192.168.1.100:8123/'
Ok.
Without this configuration, the error in Java code looks roughly like:
com.clickhouse.client.ClickHouseException: 拒绝连接 (Connection refused)
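To tell a network problem apart from a driver problem, a plain TCP probe from the client machine can help. The following is a minimal sketch, not part of any ClickHouse library: the class name PortProbe and the 2-second timeout are my own choices, and the host and port are the ones used in this post.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // "Connection refused" and connect timeouts both end up here
            return false;
        }
    }

    public static void main(String[] args) {
        // Address from this post; adjust to your own setup
        System.out.println(isReachable("192.168.1.100", 8123, 2000));
    }
}
```

If this prints false while curl against localhost on the server answers Ok., the listen_host setting (or a firewall) is the likely culprit rather than the Java client.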
If the restart fails, for example:
systemctl start clickhouse-server.service
Job for clickhouse-server.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status clickhouse-server.service" and "journalctl -xe" for details.
check the startup log: /var/log/clickhouse-server/clickhouse-server.err.log
The error I ran into was:
Address already in use: [::]:9000
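That was because I had Hadoop installed, and it was already using port 9000. Changing tcp_port in config.xml to 9001 fixed it.
Then specify the port when connecting with the client: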
clickhouse-client --port 9001
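Another error I saw was Address already in use: [::]:8123
This one looked odd, because lsof showed nothing on that port. The reason is that after the previous failed start, CH does not simply fail and stop; it stays in a perpetual "starting" state, with systemd restarting it over and over: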
systemctl status clickhouse-server.service
● clickhouse-server.service - ClickHouse Server (analytic DBMS for big data)
Loaded: loaded (/usr/lib/systemd/system/clickhouse-server.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: protocol) since Sun 2023-06-25 10:34:54 CST; 22s ago
Main PID: 2903753
So you need to stop CH first and then start it again.
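Before editing ports in config.xml, it can be handy to see which of the default ports are already taken on the server. Here is a rough sketch that simply tries to bind each port; PortCheck is a made-up helper name, and the port list is the set of defaults from config.xml. A failed bind only tells you that something holds the port, not what.

```java
import java.io.IOException;
import java.net.ServerSocket;

final class PortCheck {
    /** Returns true if this process can bind the given TCP port right now. */
    static boolean isFree(int port) {
        try (ServerSocket ignored = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Typically "Address already in use"
            return false;
        }
    }

    public static void main(String[] args) {
        // Default ClickHouse ports from config.xml
        int[] ports = {8123, 9000, 9004, 9005, 9009};
        for (int port : ports) {
            System.out.println(port + (isFree(port) ? " is free" : " is in use"));
        }
    }
}
```

Run it on the server while ClickHouse is stopped; any port reported as in use (9000 in my case, thanks to Hadoop) needs a different value in config.xml.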
For reference, these are the ports CH uses:
<!-- Port for HTTP API. See also 'https_port' for secure connections.
This interface is also used by ODBC and JDBC drivers (DataGrip, Dbeaver, ...)
and by most of web interfaces (embedded UI, Grafana, Redash, ...).
-->
<http_port>8123</http_port>
<!-- Port for interaction by native protocol with:
- clickhouse-client and other native ClickHouse tools (clickhouse-benchmark, clickhouse-copier);
- clickhouse-server with other clickhouse-servers for distributed query processing;
- ClickHouse drivers and applications supporting native protocol
(this protocol is also informally called as "the TCP protocol");
See also 'tcp_port_secure' for secure connections.
-->
<tcp_port>9000</tcp_port>
<!-- Compatibility with MySQL protocol.
ClickHouse will pretend to be MySQL for applications connecting to this port.
-->
<mysql_port>9004</mysql_port>
<!-- Compatibility with PostgreSQL protocol.
ClickHouse will pretend to be PostgreSQL for applications connecting to this port.
-->
<postgresql_port>9005</postgresql_port>
<!-- HTTP API with TLS (HTTPS).
You have to configure certificate to enable this interface.
See the openSSL section below.
-->
<!-- <https_port>8443</https_port> -->
<!-- Native interface with TLS.
You have to configure certificate to enable this interface.
See the openSSL section below.
-->
<!-- <tcp_port_secure>9440</tcp_port_secure> -->
<!-- Native interface wrapped with PROXYv1 protocol
PROXYv1 header sent for every connection.
ClickHouse will extract information about proxy-forwarded client address from the header.
-->
<!-- <tcp_with_proxy_port>9011</tcp_with_proxy_port> -->
<!-- Port for communication between replicas. Used for data exchange.
It provides low-level data access between servers.
This port should not be accessible from untrusted networks.
See also 'interserver_http_credentials'.
Data transferred over connections to this port should not go through untrusted networks.
See also 'interserver_https_port'.
-->
<interserver_http_port>9009</interserver_http_port>
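8123 is the HTTP port, and 9000 is the port for the native TCP protocol.
The Java client speaks HTTP, so it uses 8123;
clickhouse-client speaks the native protocol, so it defaults to 9000.
The MySQL and PostgreSQL ports let you connect to CH with those databases' own clients, for example: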
mysql --protocol tcp -u default -P 9004
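Since the Java client goes through the HTTP port, the curl check from earlier can also be reproduced from Java without any ClickHouse dependency. This is a small sketch using the JDK's built-in java.net.http.HttpClient (Java 11+); HttpPing is a name I made up, and the timeouts are arbitrary.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class HttpPing {
    /** Sends GET to baseUrl and returns the trimmed response body. */
    static String ping(String baseUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body().trim();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // Address from this post; the HTTP interface answers "Ok." on "/"
            System.out.println(ping("http://192.168.1.100:8123/"));
        } catch (IOException e) {
            System.out.println("HTTP port not reachable: " + e);
        }
    }
}
```

If this returns Ok. from the client machine, the HTTP interface is reachable and any remaining failure is on the driver or URL side.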
2. Connecting from Java
There are several Java APIs for CH; a quick search turns up roughly three.
The official one:
https://github.com/ClickHouse/clickhouse-java
<dependency>
<groupId>com.clickhouse</groupId>
<!-- or clickhouse-grpc-client if you prefer gRPC -->
<artifactId>clickhouse-http-client</artifactId>
<version>0.4.6</version>
</dependency>
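The artifact reportedly used to be ru.yandex.clickhouse, which is no longer updated; as of this writing (June 25, 2023) it is com.clickhouse.
There are also third-party ones, such as: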
https://github.com/housepower/ClickHouse-Native-JDBC
https://github.com/blynkkk/clickhouse4j
Either way, they all wrap and abstract the protocol interfaces that CH exposes.
Here I'll only cover the official APIs:
- Java client
- JDBC Driver
- R2DBC Driver
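The Java client is the base layer, and JDBC and R2DBC are built on top of it; JDBC is synchronous and R2DBC is asynchronous. Performance-wise, the client should come out ahead, since it skips the extra layer.
First, the client example from the official site:
That example is a bit of a trap. Leaving aside the slightly odd syntax of the query further down, the URL at the top does not need the jdbc:ch:// prefix; with it, you actually get an error: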
16:13:07.163 [ClickHouseScheduler-1] DEBUG com.clickhouse.client.ClickHouseNode - Failed to probe localhost:0
java.net.ConnectException: connect: Address is invalid on local machine, or port is not valid on remote machine
I debugged for a long time without figuring out why it was connecting to localhost:0. Then I noticed this line in the code:
ClickHouseClient client = ClickHouseClient.newInstance(ClickHouseProtocol.HTTP);
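Since the protocol is HTTP, could it be that a URL starting with jdbc simply isn't recognized? I changed it back to start with http and, sure enough, it worked right away.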
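If the same URL string is shared between JDBC code and Java client code, a tiny guard can prevent this mistake. The sketch below is purely illustrative: EndpointUrls and toHttpEndpoint are hypothetical names of mine, not part of clickhouse-java, and it only handles the two JDBC prefixes the official driver accepts (jdbc:ch:// and jdbc:clickhouse://).

```java
final class EndpointUrls {
    // JDBC-style prefixes that the plain Java client did not accept in my test
    private static final String[] JDBC_PREFIXES = {"jdbc:clickhouse://", "jdbc:ch://"};

    /** Rewrites a jdbc:ch:// style URL into an http:// endpoint; other URLs pass through. */
    static String toHttpEndpoint(String url) {
        for (String prefix : JDBC_PREFIXES) {
            if (url.startsWith(prefix)) {
                return "http://" + url.substring(prefix.length());
            }
        }
        return url;
    }

    public static void main(String[] args) {
        // prints http://192.168.1.100:8123/tutorial
        System.out.println(toHttpEndpoint("jdbc:ch://192.168.1.100:8123/tutorial"));
    }
}
```

The result can then be passed to ClickHouseNodes.of(...) in place of the raw JDBC URL.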
Here is the code that ran successfully. The connect method used on the official site is also deprecated; it has been replaced by read.
import com.clickhouse.client.*;
import com.clickhouse.data.ClickHouseFormat;
import com.clickhouse.data.ClickHouseRecord;

public class CHClientTest {
    public static void main(String[] args) throws ClickHouseException {
        // Use a plain http:// URL here, not jdbc:ch://
        ClickHouseNodes servers = ClickHouseNodes.of(
                "http://192.168.1.100:8123/tutorial"
                        + "?load_balancing_policy=random&health_check_interval=5000&failover=2");

        // try-with-resources closes both the client and the response
        try (ClickHouseClient client = ClickHouseClient.newInstance(ClickHouseProtocol.HTTP);
             ClickHouseResponse response = client.read(servers) // read replaces the deprecated connect
                     // you'll have to parse response manually if using a different format
                     .format(ClickHouseFormat.RowBinaryWithNamesAndTypes)
                     .query("select * from numbers(:limit)")
                     .params(1000)
                     .executeAndWait()) {
            ClickHouseResponseSummary summary = response.getSummary();
            long totalRows = summary.getTotalRowsToRead();
            for (ClickHouseRecord r : response.records()) {
                int num = r.getValue(0).asInteger();
                System.out.println(num);
            }
        }
    }
}
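Now the JDBC example:
Here you only need to change the password passed to getConnection; by default, the CH user is default and the password is an empty string.
Here is the code: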
import com.clickhouse.jdbc.ClickHouseDataSource;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class CHJDBCTest {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:ch://192.168.1.100:8123/tutorial"; // use http protocol and port 8123 by default
        // String url = "jdbc:ch://my-server:8443/system?ssl=true&sslmode=strict&sslrootcert=/mine.crt";
        Properties properties = new Properties();
        // properties.setProperty("ssl", "true");
        // properties.setProperty("sslmode", "NONE"); // NONE to trust all servers; STRICT for trusted only

        ClickHouseDataSource dataSource = new ClickHouseDataSource(url, properties);
        // Default user is "default" with an empty password
        try (Connection conn = dataSource.getConnection("default", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM tutorial.visits_v1")) {
            while (rs.next()) {
                System.out.println(rs.getBigDecimal(1));
            }
        }
    }
}
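As an aside, the parentheses after try are the try-with-resources mechanism: resources that implement the java.lang.AutoCloseable interface are declared in the parentheses after try, and they are closed automatically whether the try block finishes normally or with an exception.
The part inside the parentheses is where the resources are declared. The compiler effectively generates the finally logic for us and calls each resource's close method there.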
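To see the closing behavior without a database, here is a small self-contained demo. The Resource class is a stand-in I made up for Connection and Statement; it just records when close is called.

```java
import java.util.ArrayList;
import java.util.List;

final class TryWithResourcesDemo {
    /** Stand-in for Connection/Statement that logs its own close call. */
    static final class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    /** Runs a try-with-resources block and returns the order of events. */
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try (Resource conn = new Resource("conn", log);
             Resource stmt = new Resource("stmt", log)) {
            log.add("body");
            if (fail) {
                throw new IllegalStateException("boom");
            }
        } catch (IllegalStateException e) {
            // Resources are already closed by the time we get here
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [body, close stmt, close conn]
        System.out.println(run(true));  // [body, close stmt, close conn, caught]
    }
}
```

Two things are worth noticing: resources are closed in the reverse order of their declaration, and they are closed before any catch block runs.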
That wraps up this write-up. I'll come back and update it when I learn something new (I have to plant that flag).