The intersect component solves the private set intersection problem in vertical federated learning.
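FATE offers three private intersection methods: raw, rsa, and dh. The raw method is not secure; rsa and dh are. dh is a secure intersection built on symmetric encryption, rsa is a secure intersection built on RSA (asymmetric encryption), and the dh method is also used for secure information retrieval (SIR). FATE intersect also supports a multi-host mode, in which one guest intersects with several hosts.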
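To make the idea behind the dh method concrete, here is a minimal sketch of an intersection built on a commutative cipher (modular exponentiation). It is not FATE's implementation: the modulus, the function names, and the single-process layout are assumptions for illustration, and the toy modulus is far too small for real use.

```python
import hashlib
import math
import secrets

# Toy modulus (the Mersenne prime 2**127 - 1). It keeps the sketch short
# and is NOT a secure parameter choice.
P = 2**127 - 1


def hash_to_group(value: str) -> int:
    """Map an ID to a non-zero integer modulo P via SHA-256."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def make_key() -> int:
    """Pick a secret exponent coprime to P - 1, so x -> x**k is a bijection."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def encrypt(x: int, key: int) -> int:
    return pow(x, key, P)


def commutative_intersection(guest_ids, host_ids):
    guest_key, host_key = make_key(), make_key()
    # Guest encrypts its IDs, then the host encrypts them a second time.
    guest_double = {
        encrypt(encrypt(hash_to_group(i), guest_key), host_key): i
        for i in guest_ids
    }
    # Host encrypts its IDs, then the guest encrypts them a second time.
    host_double = {
        encrypt(encrypt(hash_to_group(i), host_key), guest_key)
        for i in host_ids
    }
    # Both encryption orders give the same ciphertext, so matches mark shared IDs.
    return sorted(i for c, i in guest_double.items() if c in host_double)
```

In a real deployment each party keeps its key private and only ciphertexts cross the network; the sketch runs both roles in one process purely to show why the doubly encrypted values can be compared.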
The configurable hash methods are sha256, md5, and sm3. Raw intersection supports base64 encoding, and intersection against a cache is also supported.
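The sketch below illustrates what a hash-based raw intersection looks like and why it is considered insecure. The function names are hypothetical, and applying base64 to the hash digest is an assumption about where the encoding sits, not a statement about FATE's code.

```python
import base64
import hashlib


def encode_id(value, hash_method="sha256", use_base64=False):
    """Hash an ID; optionally return the digest base64-encoded instead of hex."""
    digest = hashlib.new(hash_method, value.encode("utf-8")).digest()
    if use_base64:
        return base64.b64encode(digest).decode("ascii")
    return digest.hex()


def raw_intersection(guest_ids, host_ids, hash_method="sha256"):
    """Intersect by comparing plain hashes of the IDs."""
    host_hashes = {encode_id(i, hash_method) for i in host_ids}
    return sorted(i for i in guest_ids if encode_id(i, hash_method) in host_hashes)


def guess_ids(observed_hashes, candidates, hash_method="sha256"):
    """Whoever sees the hashes can test guesses against them (dictionary attack)."""
    return [c for c in candidates if encode_id(c, hash_method) in observed_hashes]
```

Because the hash is unkeyed, low-entropy IDs such as phone numbers can be recovered by enumerating candidates, which is what `guess_ids` demonstrates.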
System example scripts: /data/projects/fate/examples/dsl/v2/
1. 1v1 example
In this example the job is submitted from the guest, which intersects with the host.
| | host | guest |
| Data file name | xxl_test_host.csv | xxl_test_guest.csv |
| Namespace | sp_host | sp_guest |
| Table name | tb_host | tb_guest |
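Create a test directory under /data/projects/fate/examples and copy the system configuration into it:
mkdir -p /data/projects/fate/examples/mytest
cd /data/projects/fate/examples/mytest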
cp /data/projects/fate/examples/dsl/v2/intersect/test_intersect_job_dsl.json ./
cp /data/projects/fate/examples/dsl/v2/intersect/test_intersect_job_raw_conf.json ./
1.1 Upload files
1.1.1 Create the upload script on the host:
upload_xxl_host.json
{
"file": "/data/projects/fate/examples/mytest/xxl_test_host.csv",
"head": 1,
"partition": 10,
"work_mode": 0,
"namespace": "sp_host",
"table_name": "tb_host"
}
xxl_test_host.csv is the data file (it must have a header row)
namespace: the namespace name
table_name: the table name
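Since the data file must start with a header row, a small sketch for producing a test file may help. The column names `id` and `x0` and the helper name are hypothetical; the source does not say which columns the real files contain.

```python
import csv
from pathlib import Path


def write_sample_csv(path, ids, header=("id", "x0")):
    """Write a CSV whose first line is the header, then one row per ID."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # the upload config uses "head": 1
        for n, sample_id in enumerate(ids):
            writer.writerow([sample_id, n])  # placeholder feature value
    return path
```

For example, `write_sample_csv("xxl_test_host.csv", ["u1", "u2", "u3"])` creates a three-row file with a header in the current directory.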
1.1.2 Create the upload script on the guest:
upload_xxl_guest.json
{
"file": "/data/projects/fate/examples/mytest/xxl_test_guest.csv",
"head": 1,
"partition": 10,
"work_mode": 0,
"namespace": "sp_guest",
"table_name": "tb_guest"
}
1.1.3 Upload the files
# On the host:
source /data/projects/fate/bin/init_env.sh
flow data upload -c upload_xxl_host.json
# On the guest:
source /data/projects/fate/bin/init_env.sh
flow data upload -c upload_xxl_guest.json
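The host and guest upload scripts differ only in the file path, namespace, and table name, so they can be generated from one helper. This is a sketch under that assumption; `build_upload_conf` is a hypothetical name, and the fixed values simply mirror the JSON shown above.

```python
import json


def build_upload_conf(file_path, namespace, table_name, partition=10):
    """Return an upload config with the same fields as the scripts above."""
    return {
        "file": file_path,
        "head": 1,            # the data file has a header row
        "partition": partition,
        "work_mode": 0,
        "namespace": namespace,
        "table_name": table_name,
    }


if __name__ == "__main__":
    conf = build_upload_conf(
        "/data/projects/fate/examples/mytest/xxl_test_host.csv",
        "sp_host",
        "tb_host",
    )
    print(json.dumps(conf, indent=2))
```

Redirecting the printed JSON into upload_xxl_host.json gives a file that can be passed to `flow data upload -c`.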
1.2 Create the job scripts
1.2.1 Create the DSL file
test_intersect_job_dsl.json
{
"components": {
"reader_0": {
"module": "Reader",
"output": {
"data": [
"data"
]
}
},
"data_transform_0": {
"module": "DataTransform",
"input": {
"data": {
"data": [
"reader_0.data"
]
}
},
"output": {
"data": [
"data"
],
"model": [
"model"
]
}
},
"intersect_0": {
"module": "Intersection",
"input": {
"data": {
"data": [
"data_transform_0.data"
]
}
},
"output": {
"data": [
"data"
]
}
}
}
}
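In the DSL above, each component's input refers to another component's output by a name such as `reader_0.data`. A typo in one of these references is easy to miss, so the following sketch checks them before submission. It is a hypothetical helper, not part of FATE, and it only looks at data inputs and outputs.

```python
def find_dangling_inputs(dsl):
    """Return (component, reference) pairs whose reference nobody produces."""
    components = dsl["components"]
    # Every "<component>.<output name>" declared as a data output.
    produced = {
        f"{name}.{out}"
        for name, spec in components.items()
        for out in spec.get("output", {}).get("data", [])
    }
    dangling = []
    for name, spec in components.items():
        for refs in spec.get("input", {}).get("data", {}).values():
            for ref in refs:
                if ref not in produced:
                    dangling.append((name, ref))
    return dangling
```

Loading test_intersect_job_dsl.json with `json.load` and passing the result to `find_dangling_inputs` should return an empty list for the file shown above.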
1.2.2 Create the job configuration file
test_intersect_job_rsa_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"guest": [
9999
],
"host": [
10000
]
},
"component_parameters": {
"common": {
"intersect_0": {
"intersect_method": "rsa",
"sync_intersect_ids": false,
"only_output_key": true,
"rsa_params": {
"hash_method": "sha256",
"final_hash_method": "sha256",
"split_calculation": false,
"key_length": 2048
}
}
},
"role": {
"guest": {
"0": {
"reader_0": {
"table": {
"name": "tb_guest",
"namespace": "sp_guest"
}
},
"data_transform_0": {
"with_label": false,
"output_format": "dense"
}
}
},
"host": {
"0": {
"reader_0": {
"table": {
"name": "tb_host",
"namespace": "sp_host"
}
},
"data_transform_0": {
"with_label": false,
"output_format": "dense"
}
}
}
}
}
}
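The table names and namespaces in the job configuration must match what was uploaded in step 1.1. The sketch below lists the tables each role's readers point at so they can be compared by eye; `reader_tables` is a hypothetical helper that relies only on the structure of the configuration shown above.

```python
def reader_tables(conf):
    """Map (role, party index, component) to the (namespace, name) it reads."""
    tables = {}
    by_role = conf["component_parameters"]["role"]
    for role, parties in by_role.items():
        for index, components in parties.items():
            for name, params in components.items():
                table = params.get("table")
                if table:  # only reader components carry a "table" entry
                    tables[(role, index, name)] = (table["namespace"], table["name"])
    return tables
```

For the configuration above this would report `sp_guest`/`tb_guest` for the guest reader and `sp_host`/`tb_host` for the host reader.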
1.3 Run the job (on the guest)
cd /data/projects/fate/examples/mytest
source /data/projects/fate/bin/init_env.sh
flow job submit -d test_intersect_job_dsl.json -c test_intersect_job_rsa_conf.json
2. 1v2 example
In this example the guest runs the intersection job against two files on the host.
2.1 Upload files
2.1.1 Create the upload scripts on the host (two):
# First file: upload_xxl_host1.json
{
"file": "/data/projects/fate/examples/mytest/1v2/xxl_test_host1.csv",
"head": 1,
"partition": 10,
"work_mode": 0,
"namespace": "sp_host1",
"table_name": "tb_host1"
}
# Second file: upload_xxl_host2.json
{
"file": "/data/projects/fate/examples/mytest/xxl_test_host2.csv",
"head": 1,
"partition": 10,
"work_mode": 0,
"namespace": "sp_host2",
"table_name": "tb_host2"
}
2.1.2 Create the upload script on the guest:
upload_xxl_guest.json
{
"file": "/data/projects/fate/examples/mytest/1v2/xxl_test_guest.csv",
"head": 1,
"partition": 10,
"work_mode": 0,
"namespace": "sp_guest1",
"table_name": "tb_guest1"
}
2.1.3 Upload the files
# On the host:
source /data/projects/fate/bin/init_env.sh
cd /data/projects/fate/examples/mytest/1v2
flow data upload -c upload_xxl_host1.json
flow data upload -c upload_xxl_host2.json
# On the guest:
source /data/projects/fate/bin/init_env.sh
cd /data/projects/fate/examples/mytest/1v2
flow data upload -c upload_xxl_guest.json
2.2 Create the job scripts
This job runs on the guest, so the job scripts live on the guest.
2.2.1 Create the DSL file
test_union_job_dsl.json
{
"components": {
"reader_0": {
"module": "Reader",
"output": {
"data": [
"data"
]
}
},
"reader_1": {
"module": "Reader",
"output": {
"data": [
"data"
]
}
},
"intersection_0": {
"module": "Intersection",
"input": {
"data": {
"data": [
"reader_0.data"
]
}
},
"output": {
"data": [
"data"
]
}
},
"intersection_1": {
"module": "Intersection",
"input": {
"data": {
"data": [
"reader_1.data"
]
}
},
"output": {
"data": [
"data"
]
}
},
"union_0": {
"module": "Union",
"input": {
"data": {
"data": [
"intersection_0.data",
"intersection_1.data"
]
}
},
"output": {
"data": [
"data"
]
}
}
}
}
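The DSL above follows a regular pattern: one Reader and one Intersection per host table, with every intersection feeding a single Union. A sketch that generates the same structure for any number of tables is shown below; `build_one_to_many_dsl` is a hypothetical helper, and whether FATE handles larger values of `n_tables` equally well is not something the source confirms.

```python
def build_one_to_many_dsl(n_tables):
    """Build a DSL dict with n readers, n intersections, and one union."""
    components = {}
    for i in range(n_tables):
        components[f"reader_{i}"] = {
            "module": "Reader",
            "output": {"data": ["data"]},
        }
    for i in range(n_tables):
        components[f"intersection_{i}"] = {
            "module": "Intersection",
            "input": {"data": {"data": [f"reader_{i}.data"]}},
            "output": {"data": ["data"]},
        }
    # The union consumes the output of every intersection component.
    components["union_0"] = {
        "module": "Union",
        "input": {
            "data": {"data": [f"intersection_{i}.data" for i in range(n_tables)]}
        },
        "output": {"data": ["data"]},
    }
    return {"components": components}
```

Calling `build_one_to_many_dsl(2)` and dumping the result with `json.dumps(..., indent=2)` reproduces the components of test_union_job_dsl.json.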
2.2.2 Create the job configuration file
test_union_job_conf.json
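Note: the guest has only one file while the host has two, and the guest's single file is intersected with both host files. For the guest role, reader_0 and reader_1 therefore read the same data from the same guest table.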
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
10000
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"union_0": {
"allow_missing": false,
"need_run": true
}
},
"role": {
"guest": {
"0": {
"reader_0": {
"table": {
"name": "tb_guest1",
"namespace": "sp_guest1"
}
},
"reader_1": {
"table": {
"name": "tb_guest1",
"namespace": "sp_guest1"
}
}
}
},
"host": {
"0": {
"reader_0": {
"table": {
"name": "tb_host1",
"namespace": "sp_host1"
}
},
"reader_1": {
"table": {
"name": "tb_host2",
"namespace": "sp_host2"
}
}
}
}
}
}
}
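At the level of plain sets, the job configured above intersects the guest IDs with each host table and then merges the results. The sketch below shows that logic only; it ignores encryption, and treating the union as a set (no duplicate IDs) is an assumption about the final output rather than a description of FATE's Union component.

```python
def one_to_many_result(guest_ids, host_tables):
    """Union of (guest ∩ host_table) over all host tables, as a sorted list."""
    guest = set(guest_ids)
    merged = set()
    for host_ids in host_tables:
        # One intersection per host table, always against the same guest IDs.
        merged |= guest & set(host_ids)
    return sorted(merged)
```

For instance, with guest IDs `a, b, c, d` and host tables `[a, x]` and `[c, a]`, the two intersections are `{a}` and `{a, c}`, and the merged result is `a, c`.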
2.3 Run the job (on the guest)
cd /data/projects/fate/examples/mytest/1v2
source /data/projects/fate/bin/init_env.sh
flow job submit -d test_union_job_dsl.json -c test_union_job_conf.json
When the intersection succeeds, the page displays: