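Background
While migrating data from an Oracle database to LightDB with a migration tool, a user found that some GBK-encoded characters, once converted to UTF8, raise an error when inserted into LightDB. Take the GBK encoding AAA1 as an example: LightDB's GBK/UTF8 mapping table does not support converting it. All of the unsupported GBK encodings fall within the user-defined areas of GBK.
In the figure above, the UTF8 encoding EE8080 corresponds to the GBK encoding AAA1, and inserting it into LightDB raises an error.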
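The correspondence between AAA1 and EE8080 can be checked by hand. As a minimal sketch, assuming the GB18030 convention that GBK user-defined area 1 (AAA1 to AFFE) is laid out row by row onto the Unicode Private Use Area starting at U+E000, the helper below (a hypothetical name, not part of LightDB) computes the code point for AAA1 and its UTF8 bytes:

```python
def gbk_uda1_to_codepoint(lead: int, trail: int) -> int:
    """Map a GBK user-defined area 1 byte pair (AAA1..AFFE) to a code point.

    Assumption: the 6 rows x 94 cells are numbered consecutively from U+E000.
    """
    if not (0xAA <= lead <= 0xAF and 0xA1 <= trail <= 0xFE):
        raise ValueError("not in GBK user-defined area 1")
    return 0xE000 + (lead - 0xAA) * 94 + (trail - 0xA1)


cp = gbk_uda1_to_codepoint(0xAA, 0xA1)
utf8_hex = chr(cp).encode("utf-8").hex().upper()
print(f"U+{cp:04X} -> {utf8_hex}")  # U+E000 -> EE8080
```

U+E000 is a private-use code point with no standard meaning, which is consistent with a conversion table leaving it unmapped.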
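To avoid this error, LightDB 23.4 replaces every character that fails GBK/UTF8 conversion with a space. The feature only takes effect when the client and server encodings differ and a conversion between GBK and UTF8 is actually required.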
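The replacement behaviour can be illustrated outside the database. The following is a minimal sketch in Python, not LightDB's actual implementation: `in_user_defined_area` and `gbk_to_utf8_lenient` are hypothetical names, the three ranges are the GBK user-defined areas, and the sample input is made up for illustration. Each double-byte character that cannot be converted becomes a single space.

```python
def in_user_defined_area(lead: int, trail: int) -> bool:
    """Return True if the GBK byte pair lies in a user-defined area."""
    if 0xAA <= lead <= 0xAF and 0xA1 <= trail <= 0xFE:  # area 1: AAA1..AFFE
        return True
    if 0xF8 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:  # area 2: F8A1..FEFE
        return True
    # area 3: A140..A7A0, skipping the unused trail byte 0x7F
    return 0xA1 <= lead <= 0xA7 and 0x40 <= trail <= 0xA0 and trail != 0x7F


def gbk_to_utf8_lenient(data: bytes) -> bytes:
    """Convert GBK bytes to UTF-8, replacing unconvertible characters by a space."""
    out = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:  # single-byte ASCII passes through unchanged
            out.append(chr(data[i]))
            i += 1
            continue
        pair = data[i:i + 2]
        if len(pair) < 2 or in_user_defined_area(pair[0], pair[1]):
            out.append(" ")
        else:
            try:
                out.append(pair.decode("gbk"))
            except UnicodeDecodeError:
                out.append(" ")  # any other unmapped pair is also replaced
        i += 2
    return "".join(out).encode("utf-8")


# Hypothetical input: 我 (CED2), the unmappable AAA1, 们 (C3C7).
sample = bytes.fromhex("ced2aaa1c3c7")
result = gbk_to_utf8_lenient(sample)
print(" ".join(f"{b:02x}" for b in result))  # e6 88 91 20 e4 bb ac
```

The explicit range check runs before the codec is consulted, so the result does not depend on how a particular GBK codec treats the user-defined areas.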
Feature example
Client encoding GBK, server encoding UTF8
- Create the table and insert data
lightdb@postgres=# \c lt_test
You are now connected to database "lt_test" as user "lightdb".
compatible type: postgresql
lightdb@lt_test=#
lightdb@lt_test=#
lightdb@lt_test=# create table t1(cont text);
CREATE TABLE
lightdb@lt_test=#
lightdb@lt_test=#
lightdb@lt_test=# show server_encoding ;
server_encoding
-----------------
UTF8
(1 row)
lightdb@lt_test=#
lightdb@lt_test=# \encoding gbk
lightdb@lt_test=#
lightdb@lt_test=# \i gbk.test
INSERT 0 1
lightdb@lt_test=#
lightdb@lt_test=# select * from t1;
cont
-------
我 们
(1 row)
lightdb@lt_test=# select oid,datname from pg_database where datname = 'lt_test';
oid | datname
-------+---------
25485 | lt_test
(1 row)
lightdb@lt_test=#
lightdb@lt_test=# select oid,relfilenode from pg_class where relname = 't1';
oid | relfilenode
-------+-------------
25486 | 25486
(1 row)
lightdb@lt_test=# checkpoint ;
CHECKPOINT
lightdb@lt_test=#
lightdb@lt_test=# select oid,relfilenode from pg_class where relname = 't1';
oid | relfilenode
-------+-------------
25486 | 25486
(1 row)
lightdb@lt_test=#
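- Contents of the gbk.test file
- Inspect the data in the actual table file. The stored bytes are e6 88 91 20 e4 bb ac: e6 88 91 is 我, e4 bb ac is 们, and the 20 in between is the space produced by the conversion.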
[lightdb@localhost 25485]$ hexdump -C 25486
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00002000
[lightdb@localhost 25485]$ hexdump -C 25486
00000000 00 00 00 00 f0 a9 0c 22 00 00 00 00 1c 00 e0 1f |......."........|
00000010 00 20 04 20 00 00 00 00 e0 9f 40 00 00 00 00 00 |. . ......@.....|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001fe0 07 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001ff0 01 00 01 00 02 09 18 00 11 e6 88 91 20 e4 bb ac |............ ...|
00002000
[lightdb@localhost 25485]$
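The last row of the dump can be decoded to confirm this reading. This is a small sketch that assumes LightDB keeps the PostgreSQL on-disk format on a little-endian machine, where a short text value carries a one-byte header whose low bit is set and whose remaining bits give the total length including the header. The variable names are illustrative.

```python
# Last 8 bytes of the page, copied from the hexdump row at offset 0x1ff0.
raw = bytes.fromhex("11 e6 88 91 20 e4 bb ac")

header = raw[0]
if not header & 0x01:
    raise ValueError("expected a one-byte (short) varlena header")

total_len = header >> 1        # length including the header byte itself
payload = raw[1:total_len]     # the 7 bytes of UTF-8 text

text = payload.decode("utf-8")
print(total_len, repr(text))   # 8 '我 们'
```

The header byte 0x11 yields a total length of 8, so the value is exactly the seven bytes e6 88 91 20 e4 bb ac, with the substituted space at the fourth position.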