ShardingSphere Tutorial: Data Sharding, Read/Write Splitting, and Data Masking

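This article does not cover the framework's internals or source code; it only walks through usage examples.

Data sharding:

Table sharding helps the reviews application manage its ever-growing reviews table more efficiently, improving performance and scalability while making backup and maintenance tasks easier to handle.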

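Apache ShardingSphere comes in two forms:

ShardingSphere-JDBC is a lightweight Java framework that provides additional services at Java's JDBC layer.
ShardingSphere-Proxy is a transparent database proxy: it provides a database server that encapsulates the database binary protocol in order to support heterogeneous languages.

This article focuses on data sharding with ShardingSphere-JDBC.

Dependencies: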

org.apache.shardingsphere:shardingsphere-jdbc-core:5.3.2

org.apache.shardingsphere:shardingsphere-cluster-mode-core:5.3.2

org.apache.shardingsphere:shardingsphere-cluster-mode-repository-zookeeper:5.3.2

org.apache.shardingsphere:shardingsphere-cluster-mode-repository-api:5.3.2

ShardingSphere 5.x is recommended, preferably without the Spring Boot starter, which leaves you more flexibility.

Configuration:

ShardingSphere-JDBC can be configured in two main ways: YAML or Java. This article uses YAML.

application.yaml

spring:
  datasource:
    username: my_user
    password: my_password
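    # Route all JDBC access through ShardingSphere; the MySQL connection itself is defined in sharding.yaml
    driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
    url: jdbc:shardingsphere:classpath:sharding.yaml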
    tomcat:
      validation-query: "SELECT 1"
      test-while-idle: true
  jpa:
    properties:
      hibernate:
        dialect: org.hibernate.dialect.MySQL8Dialect
    open-in-view: false
    hibernate:
      ddl-auto: none

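For ShardingSphere-JDBC to take effect, the data source driver must be ShardingSphereDriver (org.apache.shardingsphere.driver.ShardingSphereDriver) and the url must point at the ShardingSphere configuration file, for example jdbc:shardingsphere:classpath:sharding.yaml. The content of sharding.yaml: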

dataSources:
  master:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1

mode:
  type: Standalone
  repository:
    type: JDBC

rules:
  - !SHARDING
    tables:
      reviews:
        actualDataNodes: master.reviews_$->{0..1}
        tableStrategy:
          standard:
            shardingColumn: course_id
            shardingAlgorithmName: inline
    shardingAlgorithms:
      inline:
        type: INLINE
        props:
          algorithm-expression: reviews_$->{course_id % 2}
          allow-range-query-with-inline-sharding: true
props:
  proxy-hint-enabled: true
  sql-show: true

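Now let's go through the important properties in this configuration:

dataSources.master: the definition of our primary data source.
mode: either Standalone with a JDBC repository, or Cluster with a ZooKeeper repository (recommended for production); it controls where the configuration metadata is persisted.
rules: this is where the individual ShardingSphere features are enabled, for example !SHARDING.

tables.reviews: describes the actual tables with an inline expression, which means there are two tables, reviews_0 and reviews_1, sharded by the course_id column.
shardingAlgorithms: defines the inline sharding algorithm with a Groovy expression that splits the reviews table across the two tables according to course_id.

props: here we enable logging of the rewritten SQL statements (sql-show: true), so p6spy can be disabled or commented out.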
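
To make the effect of the inline expression reviews_$->{course_id % 2} concrete, here is a minimal Java sketch of the same arithmetic. It is not ShardingSphere code: the class InlineShardingSketch and its route method are hypothetical names used only for illustration, and the sketch assumes course_id is a non-negative integer.

```java
/** Illustrative only: reproduces the arithmetic of the INLINE expression reviews_$->{course_id % 2}. */
final class InlineShardingSketch {

    // Number of physical tables, matching actualDataNodes: master.reviews_$->{0..1}
    static final int SHARD_COUNT = 2;

    /** Returns the physical table name for a course_id (assumed non-negative). */
    static String route(long courseId) {
        // Same computation as the Groovy expression: table suffix = course_id % 2
        return "reviews_" + (courseId % SHARD_COUNT);
    }
}
```

Under these assumptions, route(123) yields reviews_1 and route(124) yields reviews_0.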

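Important: before starting the application, the shards we defined must already exist, so I created two tables in my database, reviews_0 and reviews_1 (init.sql).

Debugging requests:

Now we are ready to start the application and execute a few requests: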

POST http://localhost:8070/api/v1/reviews/
Content-Type: application/json

{
  "text": "This is a great course!",
  "author": "John Doe",
  "authorTelephone": "555-1234",
  "authorEmail": "johndoe@example.com",
  "invoiceCode": "ABC123",
  "courseId": 123
}

We can see the following log entry:

INFO 35412 --- [nio-8070-exec-2] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_1 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 15:42:01.8069745, 2023-04-17 15:42:01.8069745, John Doe, johndoe@example.com, 555-1234, 123, ABC123, This is a great course!, 4]

If we execute another request with a different payload:

INFO 35412 --- [nio-8070-exec-8] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_1 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 15:43:47.3267788, 2023-04-17 15:43:47.3267788, Mike Scott, mikescott@example.com, 555-1234, 123, ABC123, This is an amazing course!, 5]

Now we can query reviews by course_id:

GET http://localhost:8070/api/v1/reviews/filter?courseId=123

GET http://localhost:8070/api/v1/reviews/filter?courseId=124

and observe in the logs how the queries are routed between our two tables.

INFO 35412 --- [nio-8070-exec-9] ShardingSphere-SQL: Actual SQL: master ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_1 review0_ where review0_.course_id=? ::: [123]

INFO 35412 --- [nio-8070-exec-5] ShardingSphere-SQL: Actual SQL: master ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]

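The first select targets the reviews_1 table and the second targets reviews_0: sharding in action.

Advanced sharding:

By default, the sharding strategy evaluates the algorithm-expression rule against the shardingColumn from the configuration file. For more specialized requirements you can implement the sharding logic yourself. Note that the properties and interfaces below follow the older Sharding-JDBC Spring Boot property style rather than the 5.x YAML format used above; in 5.x the usual equivalent is a class implementing StandardShardingAlgorithm, registered through a CLASS_BASED sharding algorithm.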

sharding.jdbc.config.sharding.tables.reviews.actual-data-nodes= master.reviews_$->{0..1}
sharding.jdbc.config.sharding.tables.reviews.table-strategy.standard.sharding-column=course_id
sharding.jdbc.config.sharding.tables.reviews.table-strategy.standard.precise-algorithm-class-name=com.demo.shardingjdbc.PreciseShardingDBAlgorithm
sharding.jdbc.config.sharding.tables.reviews.table-strategy.standard.range-algorithm-class-name=com.demo.shardingjdbc.RangeShardingDBAlgorithm

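Custom precise-match strategy implementation:

Used mainly for = and IN conditions in the WHERE clause.

public class PreciseShardingDBAlgorithm implements PreciseShardingAlgorithm<Long> {
    // Assumes course_id is bound as a Long; adjust the type parameter to match your entity mapping.
    @Override
    public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<Long> shardingValue) {
        // Pick the table whose suffix equals course_id % 2, e.g. reviews_1 for course_id = 123.
        String suffix = "_" + (shardingValue.getValue() % 2);
        return availableTargetNames.stream().filter(name -> name.endsWith(suffix)).findFirst()
                .orElseThrow(() -> new IllegalStateException("No table for course_id " + shardingValue.getValue()));
    }
}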

Custom range-match strategy implementation (used for range conditions such as BETWEEN, > and <):

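public class RangeShardingDBAlgorithm implements RangeShardingAlgorithm<Long> {
    // Assumes course_id is bound as a Long; adjust the type parameter to match your entity mapping.
    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames, RangeShardingValue<Long> shardingValue) {
        // With modulo sharding, a range of course_id values can touch every shard, so query all tables.
        return availableTargetNames;
    }
}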
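
To illustrate the decision a range strategy has to make, the following sketch works out which tables a course_id range can touch under the course_id % 2 rule. It is plain Java and independent of the ShardingSphere interfaces; RangeRoutingSketch and route are hypothetical names, and the bounds are assumed to be inclusive and non-negative.

```java
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: which reviews_N tables can hold rows with lower <= course_id <= upper. */
final class RangeRoutingSketch {

    static final int SHARD_COUNT = 2;

    static Set<String> route(long lower, long upper) {
        Set<String> tables = new TreeSet<>();
        // Stop as soon as every shard has been seen: consecutive ids cycle through all remainders.
        for (long id = lower; id <= upper && tables.size() < SHARD_COUNT; id++) {
            tables.add("reviews_" + (id % SHARD_COUNT));
            if (id == Long.MAX_VALUE) {
                break; // avoid overflow when upper is Long.MAX_VALUE
            }
        }
        return tables;
    }
}
```

A single-value range such as route(124, 124) stays on reviews_0, while any range spanning two or more consecutive ids, such as route(123, 124), has to query both tables. This is why simply returning all available targets is a safe default for modulo sharding.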

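Read/write splitting:

Now let's imagine another problem: during peak hours the reviews application may come under heavy load, which slows response times and degrades the user experience. To address this, we can use read/write splitting to balance the load and improve performance.

ShardingSphere provides a read/write splitting solution. It directs read queries to replica databases and write queries to the primary database, so that read requests do not interfere with write requests and database performance improves.

Before configuring read/write splitting, the database architecture has to be changed to a primary/replica (master/slave) setup.

Data sources for read/write splitting: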

dataSources:
  master:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1

  slave0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:49922/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1

  slave1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:49923/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
    username: my_user
    password: my_password
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
    minPoolSize: 1

Read/write splitting rule:

- !READWRITE_SPLITTING
  dataSources:
    readwrite_ds:
      staticStrategy:
        writeDataSourceName: master
        readDataSourceNames:
          - slave0
          - slave1
      loadBalancerName: readwrite-load-balancer
  loadBalancers:
    readwrite-load-balancer:
      type: ROUND_ROBIN

We set the write data source to master and point the read data sources at our replicas, slave0 and slave1; we also chose the ROUND_ROBIN load-balancing algorithm.
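
The routing idea behind this rule can be sketched in a few lines of Java. This is a simplified illustration, not ShardingSphere's implementation: ReadWriteRouterSketch is a hypothetical class, a statement counts as a read only if it starts with SELECT, and transactions and hints are ignored.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: writes go to one data source, reads rotate over the replicas. */
final class ReadWriteRouterSketch {

    private final String writeDataSource;
    private final List<String> readDataSources;
    private final AtomicInteger counter = new AtomicInteger();

    ReadWriteRouterSketch(String writeDataSource, List<String> readDataSources) {
        this.writeDataSource = writeDataSource;
        this.readDataSources = List.copyOf(readDataSources);
    }

    /** Returns the name of the data source that should execute the statement. */
    String route(String sql) {
        // Simplifying assumption: only statements starting with SELECT are reads.
        boolean isRead = sql.stripLeading().regionMatches(true, 0, "select", 0, 6);
        if (!isRead || readDataSources.isEmpty()) {
            return writeDataSource;
        }
        // Round robin: each read takes the next replica in turn.
        int index = Math.floorMod(counter.getAndIncrement(), readDataSources.size());
        return readDataSources.get(index);
    }
}
```

With a router built as new ReadWriteRouterSketch("master", List.of("slave0", "slave1")), an insert is routed to master and consecutive selects alternate between slave0 and slave1.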

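Important: the last change concerns the sharding rule, which knows nothing about the newly configured read/write splitting rule and still points directly at master:

Sharding data source change (in sharding.yaml this is the actualDataNodes value of the reviews table):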

sharding.jdbc.config.sharding.tables.reviews.actual-data-nodes=readwrite_ds.reviews_$->{0..1}

We can now start the application, run the same POST request, and check the logs:

INFO 22860 --- [nio-8070-exec-1] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_0 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 16:12:07.25473, 2023-04-17 16:12:07.25473, Mike Scott, mikescott@example.com, 555-1234, 124, ABC123, This is an amazing course!, 7]

Sharding still works, and the statement runs on the master data source (the write data source). If we run a few GET requests, however, we observe the following:

INFO 22860 --- [nio-8070-exec-2] ShardingSphere-SQL: Actual SQL: slave0 ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]

INFO 22860 --- [nio-8070-exec-4] ShardingSphere-SQL: Actual SQL: slave1 ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]

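Here you can see read/write splitting at work: write queries run on the master data source, while read queries run on the replicas (slave0 and slave1), and the sharding rules are still applied correctly.

Data masking:

Another hypothetical problem with our application: because of data privacy regulations, sensitive information such as customer emails, phone numbers, and invoice codes may need to be accessible to certain users or applications while remaining hidden from others.

To solve this, we can apply data masking, either when mapping results or at the SQL level. ShardingSphere covers this with another feature that is easy to enable: data masking.

Configuration change: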

- !MASK
  tables:
    reviews:
      columns:
        invoice_code:
          maskAlgorithm: md5_mask
        author_email:
          maskAlgorithm: mask_before_special_chars_mask
        author_telephone:
          maskAlgorithm: keep_first_n_last_m_mask

  maskAlgorithms:
    md5_mask:
      type: MD5
    mask_before_special_chars_mask:
      type: MASK_BEFORE_SPECIAL_CHARS
      props:
        special-chars: '@'
        replace-char: '*'
    keep_first_n_last_m_mask:
      type: KEEP_FIRST_N_LAST_M
      props:
        first-n: 3
        last-m: 2
        replace-char: '*'

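Let's see what we have here:

tables.reviews: assigns a mask algorithm to each of the three columns mentioned above.
maskAlgorithms.md5_mask: the MD5 algorithm type, used for the invoice_code column.
maskAlgorithms.mask_before_special_chars_mask: the MASK_BEFORE_SPECIAL_CHARS algorithm, configured for the author_email column, which means every character before the @ symbol is replaced with *.
maskAlgorithms.keep_first_n_last_m_mask: the KEEP_FIRST_N_LAST_M algorithm, configured for the author_telephone column, which means only the first 3 and last 2 characters of the phone number are kept and everything in between is replaced with *.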
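
The three transformations described above can be sketched in plain Java as follows. MaskSketch and its method names are hypothetical and this is not ShardingSphere's implementation; in particular, returning the value unchanged when it is too short or contains no special character, and hashing the UTF-8 bytes, are assumptions of this sketch.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: the effect of the three mask algorithm types configured above. */
final class MaskSketch {

    /** KEEP_FIRST_N_LAST_M: keep the first n and last m characters, replace the rest. */
    static String keepFirstNLastM(String value, int n, int m, char replaceChar) {
        if (value.length() <= n + m) {
            return value; // assumption: nothing in between to mask
        }
        int maskedLength = value.length() - n - m;
        return value.substring(0, n)
                + String.valueOf(replaceChar).repeat(maskedLength)
                + value.substring(value.length() - m);
    }

    /** MASK_BEFORE_SPECIAL_CHARS: replace every character before the first occurrence of specialChars. */
    static String maskBeforeSpecialChars(String value, String specialChars, char replaceChar) {
        int index = value.indexOf(specialChars);
        if (index < 0) {
            return value; // assumption: no special character, nothing to mask
        }
        return String.valueOf(replaceChar).repeat(index) + value.substring(index);
    }

    /** MD5: replace the value with its 32-character hexadecimal MD5 digest. */
    static String md5Hex(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this JVM", e);
        }
    }
}
```

With the props from the rule above, keepFirstNLastM("555-1234", 3, 2, '*') gives 555***34 and maskBeforeSpecialChars("mikescott@example.com", "@", '*') gives *********@example.com.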

We start the application and execute the same POST request:

INFO 35296 --- [nio-8070-exec-1] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_0 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 16:26:51.8188306, 2023-04-17 16:26:51.8188306, Mike Scott, mikescott@example.com, 555-1234, 124, ABC123, This is an amazing course!, 9]

Reading the reviews back then returns the masked values:

[
  {
    "text": "This is an amazing course!",
    "author": "Mike Scott",
    "authorTelephone": "555***34",
    "authorEmail": "*********@example.com",
    "invoiceCode": "bbf2dead374654cbb32a917afd236656",
    "courseId": 124,
    "id": 9,
    "lastModifiedAt": "2023-04-17T15:44:43"
  }
]

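The data stays unchanged in the database, but when it is queried and returned, the phone number, email, and invoice code are masked by the algorithms defined in the data masking rule.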