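This article does not cover the framework's internals or source code analysis; it only walks through usage examples.
Data sharding:
Table sharding helps the reviews application manage its ever-growing reviews table more efficiently, improving performance and scalability while also making backup and maintenance tasks easier to handle.
Apache ShardingSphere comes in two forms:
- ShardingSphere-JDBC is a lightweight Java framework that provides additional services at Java's JDBC layer.
- ShardingSphere-Proxy is a transparent database proxy that exposes a database server and encapsulates the database binary protocol to support heterogeneous languages.
This article focuses on data sharding with ShardingSphere-JDBC.
Dependencies: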
org.apache.shardingsphere:shardingsphere-jdbc-core:5.3.2
org.apache.shardingsphere:shardingsphere-cluster-mode-core:5.3.2
org.apache.shardingsphere:shardingsphere-cluster-mode-repository-zookeeper:5.3.2
org.apache.shardingsphere:shardingsphere-cluster-mode-repository-api:5.3.2
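We recommend ShardingSphere 5.x, ideally the plain artifacts rather than the Spring Boot starter, which gives you more flexibility.
Configuration:
ShardingSphere-JDBC can be configured in two main ways: YAML configuration and Java configuration. This article uses YAML configuration.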
application.yaml
spring:
  datasource:
    username: my_user
    password: my_password
    url: jdbc:mysql://localhost:3306/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    tomcat:
      validation-query: "SELECT 1"
      test-while-idle: true
  jpa:
    properties:
      hibernate:
        dialect: org.hibernate.dialect.MySQL8Dialect
    open-in-view: false
    hibernate:
      ddl-auto: none
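We specify that the data source driver will be ShardingSphereDriver, and that the url should point to the sharding.yaml file. In practice this means setting driver-class-name to org.apache.shardingsphere.driver.ShardingSphereDriver and url to jdbc:shardingsphere:classpath:sharding.yaml in place of the plain MySQL URL; the actual database connection settings then live in sharding.yaml.
sharding.yaml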
dataSources:
  master:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1
mode:
  type: Standalone
  repository:
    type: JDBC
rules:
  - !SHARDING
    tables:
      reviews:
        actualDataNodes: master.reviews_$->{0..1}
        tableStrategy:
          standard:
            shardingColumn: course_id
            shardingAlgorithmName: inline
    shardingAlgorithms:
      inline:
        type: INLINE
        props:
          algorithm-expression: reviews_$->{course_id % 2}
          allow-range-query-with-inline-sharding: true
props:
  proxy-hint-enabled: true
  sql-show: true
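Now let's go through the important properties in this configuration:
- dataSources.master: the definition of our primary data source.
- mode: either Standalone with the JDBC repository type, or Cluster with the ZooKeeper repository type (recommended for production); it controls how configuration information is persisted.
- rules: here we enable the various ShardingSphere features, for example !SHARDING.
- tables.reviews: here we describe the actual tables using the inline syntax, which means we will have two tables, reviews_0 and reviews_1, sharded by the course_id column.
- shardingAlgorithms: here we describe the inline sharding algorithm through a Groovy expression, which splits the reviews table into two tables based on the course_id column.
- props: here we enable interception and formatting of SQL queries (so p6spy can be disabled or commented out).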
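To make the inline rule more concrete, here is a minimal plain-Java sketch of what actualDataNodes: master.reviews_$->{0..1} and algorithm-expression: reviews_$->{course_id % 2} amount to. It does not use ShardingSphere at all; the class InlineShardingSketch and its methods are hypothetical names introduced only for illustration, and the sketch assumes non-negative course ids.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the arithmetic of the inline expression,
// not ShardingSphere's actual Groovy-based implementation.
class InlineShardingSketch {

    // Expands a range such as {0..1} into concrete table names,
    // analogous to actualDataNodes: master.reviews_$->{0..1}.
    static List<String> actualTables(String logicTable, int from, int to) {
        List<String> tables = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            tables.add(logicTable + "_" + i);
        }
        return tables;
    }

    // Picks the target table for a sharding value,
    // analogous to algorithm-expression: reviews_$->{course_id % 2}.
    // Assumption: courseId is non-negative.
    static String route(String logicTable, long courseId, int shardCount) {
        return logicTable + "_" + (courseId % shardCount);
    }

    public static void main(String[] args) {
        System.out.println(actualTables("reviews", 0, 1));
        System.out.println(route("reviews", 123L, 2));
        System.out.println(route("reviews", 124L, 2));
    }
}
```

Under these assumptions, course_id 123 maps to reviews_1 and course_id 124 maps to reviews_0.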
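Important: before starting the application, we need to make sure the shards we defined actually exist, so I created two tables in my database: reviews_0 and reviews_1 (init.sql).
Testing the requests:
Now we are ready to start the application and execute some requests.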
POST http://localhost:8070/api/v1/reviews/
Content-Type: application/json
{
  "text": "This is a great course!",
  "author": "John Doe",
  "authorTelephone": "555-1234",
  "authorEmail": "johndoe@example.com",
  "invoiceCode": "ABC123",
  "courseId": 123
}
We can see the following log:
INFO 35412 --- [nio-8070-exec-2] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_1 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 15:42:01.8069745, 2023-04-17 15:42:01.8069745, John Doe, johndoe@example.com, 555-1234, 123, ABC123, This is a great course!, 4]
If we execute another request with a different payload:
INFO 35412 --- [nio-8070-exec-8] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_1 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 15:43:47.3267788, 2023-04-17 15:43:47.3267788, Mike Scott, mikescott@example.com, 555-1234, 123, ABC123, This is an amazing course!, 5]
Now we can query reviews by course_id:
GET http://localhost:8070/api/v1/reviews/filter?courseId=123
GET http://localhost:8070/api/v1/reviews/filter?courseId=124
and observe in the logs how routing between our two tables takes place.
INFO 35412 --- [nio-8070-exec-9] ShardingSphere-SQL: Actual SQL: master ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_1 review0_ where review0_.course_id=? ::: [123]
INFO 35412 --- [nio-8070-exec-5] ShardingSphere-SQL: Actual SQL: master ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]
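The first select targets the reviews_1 table and the second targets reviews_0: sharding in action.
Advanced sharding:
The default sharding strategy applies the algorithm-expression rule to the shardingColumn from the configuration file. For more customized scenarios, you can also implement the sharding logic yourself. Note that the property keys and the PreciseShardingAlgorithm / RangeShardingAlgorithm interfaces shown below follow the older Sharding-JDBC style from before 5.x; in 5.x both methods are combined in the StandardShardingAlgorithm interface, so adapt the example to the version you use.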
sharding.jdbc.config.sharding.tables.reviews.actual-data-nodes= master.reviews_$->{0..1}
sharding.jdbc.config.sharding.tables.reviews.table-strategy.standard.sharding-column=course_id
sharding.jdbc.config.sharding.tables.reviews.table-strategy.standard.precise-algorithm-class-name=com.demo.shardingjdbc.PreciseShardingDBAlgorithm
sharding.jdbc.config.sharding.tables.reviews.table-strategy.standard.range-algorithm-class-name=com.demo.shardingjdbc.RangeShardingDBAlgorithm
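Custom precise sharding algorithm:
Mainly used for = and IN conditions in the WHERE clause.
import java.util.Collection;
public class PreciseShardingDBAlgorithm implements PreciseShardingAlgorithm<String> {
    @Override
    public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<String> preciseShardingValue) {
        // Placeholder rule (an assumption): match the target table by hash(value) % target count.
        String suffix = "_" + Math.floorMod(preciseShardingValue.getValue().hashCode(), availableTargetNames.size());
        return availableTargetNames.stream().filter(name -> name.endsWith(suffix)).findFirst()
                .orElseThrow(() -> new IllegalStateException("No target table for " + preciseShardingValue.getValue()));
    }
}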
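Custom range sharding algorithm (used for range conditions such as BETWEEN):
import java.util.Collection;
public class RangeShardingDBAlgorithm implements RangeShardingAlgorithm<String> {
    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames, RangeShardingValue<String> rangeShardingValue) {
        // Placeholder rule (an assumption): a range may span every shard, so query all targets.
        return availableTargetNames;
    }
}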
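As an illustration of the decision a range algorithm has to make, the following plain-Java sketch computes which tables a BETWEEN condition on course_id can touch when rows are distributed by a modulo rule. It is independent of ShardingSphere; RangeRoutingSketch and targetsFor are hypothetical names, and both the modulo rule and the ordering of the target list (index i holds the table with suffix _i) are assumptions.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: shows the routing decision for a range query
// when rows are distributed by course_id % (number of tables).
class RangeRoutingSketch {

    // Returns the target tables that may hold rows with
    // lower <= course_id <= upper (inclusive, non-negative bounds).
    // Assumption: availableTargets.get(i) is the table with suffix _i.
    static Collection<String> targetsFor(List<String> availableTargets, long lower, long upper) {
        int shardCount = availableTargets.size();
        Set<String> result = new LinkedHashSet<>();
        // Once the range spans shardCount consecutive values, every table is hit,
        // so there is no need to iterate further than that.
        long last = Math.min(upper, lower + shardCount - 1);
        for (long id = lower; id <= last; id++) {
            result.add(availableTargets.get((int) (id % shardCount)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> targets = Arrays.asList("reviews_0", "reviews_1");
        System.out.println(targetsFor(targets, 123, 123));
        System.out.println(targetsFor(targets, 100, 200));
    }
}
```

A single-value range stays on one table, while any range covering at least as many consecutive ids as there are tables has to be sent to all of them.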
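Read-write splitting:
Now let's imagine another problem: the reviews application may come under heavy load during peak hours, which slows response times and degrades the user experience. To address this, we can implement read-write splitting to balance the load and improve performance.
ShardingSphere provides a read-write splitting solution. Read-write splitting directs read queries to replica databases and write queries to the primary database, so that read requests do not interfere with write requests and database performance is optimized.
Before configuring read-write splitting, we have to change the database architecture to a primary-replica setup.
Read-write splitting data source configuration: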
dataSources:
  master:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1
  slave0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:49922/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1
  slave1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:49923/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1
Read-write splitting rule:
- !READWRITE_SPLITTING
  dataSources:
    readwrite_ds:
      staticStrategy:
        writeDataSourceName: master
        readDataSourceNames:
          - slave0
          - slave1
      loadBalancerName: readwrite-load-balancer
  loadBalancers:
    readwrite-load-balancer:
      type: ROUND_ROBIN
We set the write data source name to master and point the read data sources at our replicas, slave0 and slave1; for load balancing we chose the round-robin algorithm.
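The routing idea behind this rule can be sketched in plain Java as follows. This is only an illustration under simplifying assumptions, not ShardingSphere's implementation: the class ReadWriteRoutingSketch is a hypothetical name, and treating every statement that starts with "select" as a read is a deliberately rough classification.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: writes go to the primary, reads rotate over the replicas.
class ReadWriteRoutingSketch {
    private final String writeDataSource;
    private final List<String> readDataSources;
    private final AtomicInteger counter = new AtomicInteger();

    ReadWriteRoutingSketch(String writeDataSource, List<String> readDataSources) {
        this.writeDataSource = writeDataSource;
        this.readDataSources = readDataSources;
    }

    // Rough classification (an assumption): only statements starting with SELECT are reads.
    String route(String sql) {
        boolean isRead = sql.trim().regionMatches(true, 0, "select", 0, 6);
        if (!isRead) {
            return writeDataSource;
        }
        // Round-robin: take the next replica in turn, wrapping around at the end.
        int index = Math.floorMod(counter.getAndIncrement(), readDataSources.size());
        return readDataSources.get(index);
    }

    public static void main(String[] args) {
        ReadWriteRoutingSketch router =
                new ReadWriteRoutingSketch("master", Arrays.asList("slave0", "slave1"));
        System.out.println(router.route("insert into reviews values (1)"));
        System.out.println(router.route("select * from reviews"));
        System.out.println(router.route("select * from reviews"));
        System.out.println(router.route("select * from reviews"));
    }
}
```

With two replicas, consecutive reads alternate between slave0 and slave1, while every write goes to master.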
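Important: the last change concerns the sharding rule, which knows nothing about the newly configured read-write splitting rule and still points directly at master. The actual data nodes must reference readwrite_ds instead; in the YAML configuration used earlier this corresponds to actualDataNodes: readwrite_ds.reviews_$->{0..1}.
Sharding data source change: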
sharding.jdbc.config.sharding.tables.reviews.actual-data-nodes=readwrite_ds.reviews_$->{0..1}
We can now start the application, run the same POST request, and look at the logs:
INFO 22860 --- [nio-8070-exec-1] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_0 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 16:12:07.25473, 2023-04-17 16:12:07.25473, Mike Scott, mikescott@example.com, 555-1234, 124, ABC123, This is an amazing course!, 7]
Sharding still works here, and the query runs against the master data source (the write data source). But if we run a few GET requests, we observe the following:
INFO 22860 --- [nio-8070-exec-2] ShardingSphere-SQL: Actual SQL: slave0 ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]
INFO 22860 --- [nio-8070-exec-4] ShardingSphere-SQL: Actual SQL: slave1 ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]
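Here you can see read-write splitting in action: our write queries run on the master data source, while our read queries run on the replicas (slave0 and slave1), all while the sharding rules stay correct.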
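Data masking:
Here is another hypothetical problem for our application. Imagine that, because of data privacy regulations, sensitive information such as customer emails, phone numbers, and invoice codes must be accessible to certain users or applications while staying hidden from everyone else.
To solve this, we can implement a data masking solution that masks sensitive data when mapping results or at the SQL level. ShardingSphere covers this with another easy-to-enable feature: data masking.
Configuration change: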
- !MASK
  tables:
    reviews:
      columns:
        invoice_code:
          maskAlgorithm: md5_mask
        author_email:
          maskAlgorithm: mask_before_special_chars_mask
        author_telephone:
          maskAlgorithm: keep_first_n_last_m_mask
  maskAlgorithms:
    md5_mask:
      type: MD5
    mask_before_special_chars_mask:
      type: MASK_BEFORE_SPECIAL_CHARS
      props:
        special-chars: '@'
        replace-char: '*'
    keep_first_n_last_m_mask:
      type: KEEP_FIRST_N_LAST_M
      props:
        first-n: 3
        last-m: 2
        replace-char: '*'
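Let's look at what we have here:
- tables.reviews: we assign a masking algorithm to each of the three columns mentioned earlier.
- maskAlgorithms.md5_mask: we specify the MD5 algorithm type for invoice_code.
- maskAlgorithms.mask_before_special_chars_mask: we configure the MASK_BEFORE_SPECIAL_CHARS algorithm for the author_email column, which means every character before the @ symbol is replaced with the * symbol.
- maskAlgorithms.keep_first_n_last_m_mask: we configure the KEEP_FIRST_N_LAST_M algorithm for the author_telephone column, which means only the first 3 and last 2 characters of the phone number stay unchanged; everything in between is masked with the * symbol.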
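To show what these three mask types do to a value, here is a plain-Java approximation using only the JDK. It is a sketch, not ShardingSphere's code: MaskingSketch and its method names are hypothetical, and the handling of edge cases (values too short to mask, a missing special character) is an assumption that may differ from the real algorithms.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: plain-Java approximations of the three configured mask types.
class MaskingSketch {

    // KEEP_FIRST_N_LAST_M: keep the first n and last m characters, replace the rest.
    static String keepFirstNLastM(String value, int firstN, int lastM, char replaceChar) {
        if (value.length() <= firstN + lastM) {
            return value; // nothing left to mask in this sketch
        }
        StringBuilder masked = new StringBuilder(value);
        for (int i = firstN; i < value.length() - lastM; i++) {
            masked.setCharAt(i, replaceChar);
        }
        return masked.toString();
    }

    // MASK_BEFORE_SPECIAL_CHARS: replace every character before the first
    // occurrence of specialChars.
    static String maskBeforeSpecialChars(String value, String specialChars, char replaceChar) {
        int index = value.indexOf(specialChars);
        if (index < 0) {
            return value; // marker not found; left unchanged in this sketch
        }
        StringBuilder masked = new StringBuilder(value);
        for (int i = 0; i < index; i++) {
            masked.setCharAt(i, replaceChar);
        }
        return masked.toString();
    }

    // MD5: replace the value with its 32-character hexadecimal MD5 digest.
    static String md5(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(keepFirstNLastM("555-1234", 3, 2, '*'));
        System.out.println(maskBeforeSpecialChars("mikescott@example.com", "@", '*'));
        System.out.println(md5("ABC123"));
    }
}
```

With first-n 3 and last-m 2, the phone number 555-1234 becomes 555***34, and mikescott@example.com becomes *********@example.com; the invoice code is replaced by a 32-character hexadecimal digest.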
We start the application and execute the same POST request:
INFO 35296 --- [nio-8070-exec-1] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_0 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 16:26:51.8188306, 2023-04-17 16:26:51.8188306, Mike Scott, mikescott@example.com, 555-1234, 124, ABC123, This is an amazing course!, 9]
[
  {
    "text": "This is an amazing course!",
    "author": "Mike Scott",
    "authorTelephone": "555***34",
    "authorEmail": "*********@example.com",
    "invoiceCode": "bbf2dead374654cbb32a917afd236656",
    "courseId": 124,
    "id": 9,
    "lastModifiedAt": "2023-04-17T15:44:43"
  }
]
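The data stays unchanged in the database, but when it is queried and returned, the phone number, email, and invoice code are masked according to the algorithms defined in our data masking rule.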