文章目录
- MySQL
- TiDB
- Oracle
- 结论
我们组在团队内维护了一套TiDB,有时候会有其他同事来请教一些问题,当然遇到比较复杂的问题,我也会直接抛给DBA。今天有个同事来问了一下TiDB的char字段查询是否需要补空格。在我的印象中,TiDB是高度兼容MySQL的,因此也没想太多,准备简单回复和MySQL一样。但转念一想,万一有什么不一样的地方呢,因此仔细试了一把,还真有些不同,值得记录。
MySQL
我们首先看MySQL的官方的说明和例子。
MySQL 8.1 Reference Manual - 11.3.2 The CHAR and VARCHAR Types
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.
– CHAR列的长度是固定的,等于创建表时申明的长度。长度可以是0到255之间的任何值。当存储CHAR值时,右边会补空格来填充至指定长度。当从数据库中检索CHAR值时,右侧空格影响会被溢出,除非启用了PAD_CHAR_TO_FULL_LENGTH SQL 模式。
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
– VARCHAR列中的值是可变长字符串。长度可以指定为0到65535之间的值。VARCHAR列的有效最大长度是由最大行大小(65,535字节,所有列都共享)和使用字符集确定的。
Value | CHAR(4) | Storage Required | VARCHAR(4) | Storage Required |
---|---|---|---|---|
‘’ | ’ ’ | 4 bytes | ‘’ | 1 byte |
‘ab’ | 'ab ’ | 4 bytes | ‘ab’ | 3 bytes |
‘abcd’ | ‘abcd’ | 4 bytes | ‘abcd’ | 5 bytes |
‘abcdefgh’ | ‘abcd’ | 4 bytes | ‘abcd’ | 5 bytes |
我们按照官网给的例子简单试运行,以下代码在MySQL 8.0.32版本中进行了验证。
mysql> CREATE TABLE vc (v VARCHAR(4), c CHAR(4));
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO vc VALUES ('ab ', 'ab ');
Query OK, 1 row affected (0.00 sec)
mysql> SELECT CONCAT('(', v, ')'), CONCAT('(', c, ')') FROM vc;
+---------------------+---------------------+
| CONCAT('(', v, ')') | CONCAT('(', c, ')') |
+---------------------+---------------------+
| (ab ) | (ab) |
+---------------------+---------------------+
1 row in set (0.06 sec)
mysql> CREATE TABLE names (myname CHAR(10));
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO names VALUES ('Jones');
Query OK, 1 row affected (0.00 sec)
mysql> SELECT myname = 'Jones', myname = 'Jones ' FROM names;
+------------------+--------------------+
| myname = 'Jones' | myname = 'Jones ' |
+------------------+--------------------+
| 1 | 1 |
+------------------+--------------------+
1 row in set (0.00 sec)
mysql> SELECT myname LIKE 'Jones', myname LIKE 'Jones ' FROM names;
+---------------------+-----------------------+
| myname LIKE 'Jones' | myname LIKE 'Jones ' |
+---------------------+-----------------------+
| 1 | 0 |
+---------------------+-----------------------+
1 row in set (0.00 sec)
TiDB
TiDB的相关说明,也可以在官网上找到,不过看起来相对笼统,很容易给人一种与MySQL完全一致的错觉。
TiDB-字符串类型
但事实操作起来,有些微的不同。为便于理解,我们使用与MySQL完全一致的代码格式来跑一个例子。以下代码在TiDB 5.2.4版本中进行了验证。
TiDB> CREATE TABLE vc (v VARCHAR(4), c CHAR(4));
Query OK, 0 rows affected (0.01 sec)
TiDB> INSERT INTO vc VALUES ('ab ', 'ab ');
Query OK, 1 row affected (0.00 sec)
TiDB> SELECT CONCAT('(', v, ')'), CONCAT('(', c, ')') FROM vc;
+---------------------+---------------------+
| CONCAT('(', v, ')') | CONCAT('(', c, ')') |
+---------------------+---------------------+
| (ab ) | (ab) |
+---------------------+---------------------+
1 row in set (0.06 sec)
TiDB> CREATE TABLE names (myname CHAR(10));
Query OK, 0 rows affected (0.03 sec)
TiDB> INSERT INTO names VALUES ('Jones');
Query OK, 1 row affected (0.00 sec)
TiDB> SELECT myname = 'Jones', myname = 'Jones ' FROM names;
-- 唯一的不同之处!!! 不再匹配右侧补全的空格
+------------------+--------------------+
| myname = 'Jones' | myname = 'Jones ' |
+------------------+--------------------+
| 1 | 0 |
+------------------+--------------------+
1 row in set (0.00 sec)
TiDB> SELECT myname LIKE 'Jones', myname LIKE 'Jones ' FROM names;
+---------------------+-----------------------+
| myname LIKE 'Jones' | myname LIKE 'Jones ' |
+---------------------+-----------------------+
| 1 | 0 |
+---------------------+-----------------------+
1 row in set (0.00 sec)
Oracle
Oracle官网资料目录比较深,找了好一会,这是19版本的。VARCHAR2 and CHAR Data Types
In contrast to VARCHAR2, CHAR stores fixed-length character strings. When you create a table with a CHAR column, the column requires a string length. The default is 1 byte. The database uses blanks to pad the value to the specified length.
Oracle Database compares VARCHAR2 values using nonpadded comparison semantics and compares CHAR values using blank-padded comparison semantics.
这里就不翻译了,大致意思和MySQL的基本一样。
为便于理解,我们还是使用与MySQL完全一致的代码格式。以下SQL不保证能执行,大家意会即可。代码在Oracle 19.4.0.0.190716版本中进行了验证。
Oracle> CREATE TABLE vc (v VARCHAR2(4), c CHAR(4));
Query OK, 0 rows affected (0.01 sec)
Oracle> INSERT INTO vc VALUES ('ab ', 'ab ');
Query OK, 1 row affected (0.00 sec)
Oracle> SELECT CONCAT('(', v, ')'), CONCAT('(', c, ')') FROM vc;
+---------------------+---------------------+
| CONCAT('(', v, ')') | CONCAT('(', c, ')') |
+---------------------+---------------------+
| (ab ) | (ab ) |
+---------------------+---------------------+
1 row in set (0.06 sec)
-- 不同之处,char右侧手动插入的空格会被查询到
Oracle> CREATE TABLE names (myname CHAR(10));
Query OK, 0 rows affected (0.03 sec)
Oracle> INSERT INTO names VALUES ('Jones');
Query OK, 1 row affected (0.00 sec)
Oracle> SELECT myname = 'Jones', myname = 'Jones ' FROM names;
+------------------+--------------------+
| myname = 'Jones' | myname = 'Jones ' |
+------------------+--------------------+
| 1 | 1 |
+------------------+--------------------+
1 row in set (0.00 sec)
Oracle> SELECT myname LIKE 'Jones', myname LIKE 'Jones ' FROM names;
+---------------------+-----------------------+
| myname LIKE 'Jones' | myname LIKE 'Jones ' |
+---------------------+-----------------------+
| 0 | 1 |
+---------------------+-----------------------+
1 row in set (0.00 sec)
-- 诡异的地方,不包含空格的like不会被检索,包含空格的反而会。
结论
对于TiDB和MySQL而言,建议在插入与查询的时候,都忽略char字段右侧自动补全的空格。而对于Oracle则恰好相反,建议在插入与查询中,对于char字段都严格按照长度进行手工的右侧空格补全处理,避免出现意料外的情况。
当然,本身来说,char字段的使用以及长度的值,应该成为设计评审中重点关注的对象。