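Background

FATE runs as a service that implements federated learning, so each participant takes on a role. This post loosely calls the two roles client and host (the FATE code further down names them guest and host). Most users are on the client side: they want to upload their own data, combine it with other parties' data, and end up with a better model, which is why the data has to be "uploaded" first.

In the FATE framework, horizontal federated learning is called homo and vertical federated learning is called hetero. For example, the vertical secure boosting tree model is called hetero secure boost.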
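
To make the homo/hetero distinction concrete, here is a minimal sketch in plain Python (no FATE involved). The toy rows, the column names, and the helpers split_homo and split_hetero are all made up for illustration: a horizontal split gives each party different samples with the same columns, while a vertical split gives each party the same samples with different columns.

```python
# Toy dataset: one dict per sample, keyed by id, with two features and a label y.
rows = [
    {"id": 1, "age": 30, "income": 50, "y": 0},
    {"id": 2, "age": 41, "income": 72, "y": 1},
    {"id": 3, "age": 25, "income": 38, "y": 0},
    {"id": 4, "age": 57, "income": 90, "y": 1},
]


def split_homo(rows, cut):
    """Horizontal (homo) split: same columns, different samples per party."""
    return rows[:cut], rows[cut:]


def split_hetero(rows, guest_cols, host_cols):
    """Vertical (hetero) split: same samples, different columns per party."""
    guest = [{"id": r["id"], **{c: r[c] for c in guest_cols}} for r in rows]
    host = [{"id": r["id"], **{c: r[c] for c in host_cols}} for r in rows]
    return guest, host


homo_a, homo_b = split_homo(rows, cut=2)
hetero_guest, hetero_host = split_hetero(
    rows, guest_cols=["age", "y"], host_cols=["income"]
)

print("homo party A ids:", [r["id"] for r in homo_a])
print("homo party B ids:", [r["id"] for r in homo_b])
print("hetero guest columns:", sorted(hetero_guest[0]))
print("hetero host columns:", sorted(hetero_host[0]))
```

In the hetero case only one side holds the label column y, which is why the two parties have to line up their samples by id before training.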

Upload

Official docs: https://fate.readthedocs.io/en/latest/tutorial/pipeline
I strongly recommend reading this post side by side with the official docs!

Tools

The FATE framework can upload data with the pipeline tool.

Install fate_client first, because Pipeline is one of the tools bundled in fate_client.

pip install fate_client

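According to the docs, pipeline has to be set up from the command line before it can be used:

pipeline init --ip=xxx --port=xxx

Initialize pipeline in a terminal first; it will not work until you do. The ip and port must match the ones the FATE service was started with. For a standalone deployment, the ip is 127.0.0.1 and the port is usually 9380.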
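
Before running pipeline init, it can help to check that something is actually listening on the ip and port you plan to pass in. The helper below is a small sketch using only the Python standard library; can_connect is a hypothetical name, not part of fate_client, and a successful TCP connection only shows that the port is open, not that the service behind it is FATE.

```python
import socket


def can_connect(ip, port, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Values from the standalone case described above; adjust to your deployment.
    ip, port = "127.0.0.1", 9380
    if can_connect(ip, port):
        print(f"{ip}:{port} is reachable, try: pipeline init --ip={ip} --port={port}")
    else:
        print(f"nothing is listening on {ip}:{port}; is the FATE service running?")
```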

If you can't remember FATE's own configuration, use (I haven't found the command yet; to be filled in later)

flow

If you can't remember pipeline's configuration, run

pipeline config show

to view it.

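Python development

The code below, saved in a Python file, uploads a CSV file.
Every uploaded dataset gets its own table_name and namespace; FATE uses these two fields together to name each uploaded dataset and tell it apart from the others.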

from pipeline.backend.pipeline import PipeLine

# guest is the party that initiates the job (9999 here); host is the partner party (10000)
pipeline = PipeLine() \
        .set_initiator(role='guest', party_id=9999) \
        .set_roles(guest=9999, host=10000)


data_path = '/root/Downloads/dummy.csv'
table_name = 'dummy'
namespace = 'dummy'
pipeline.add_upload_data(file=data_path, table_name=table_name, namespace=namespace)
# drop=1: replace an existing table that has the same table_name and namespace
# (drop=0 should refuse to overwrite instead; check the docs to confirm)
pipeline.upload(drop=1)

After a successful run, the terminal prints output like the following. [Screenshot: terminal output after a successful upload]
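
The way table_name and namespace identify a dataset, and what drop does when that pair is already taken, can be pictured with a toy in-memory store. This is only a mental model: ToyTableStore is a hypothetical class, not FATE's storage layer, and the drop behavior it mimics (0 refuses to overwrite, 1 replaces) is my reading of the docs rather than something checked against FATE's source.

```python
class ToyTableStore:
    """In-memory stand-in for a table store keyed by (namespace, table_name)."""

    def __init__(self):
        self._tables = {}

    def upload(self, rows, table_name, namespace, drop=0):
        """Store rows under (namespace, table_name) and return the row count.

        If the key already exists, drop=0 raises and drop=1 replaces the table.
        """
        key = (namespace, table_name)
        if key in self._tables and not drop:
            raise ValueError(
                f"table {namespace}.{table_name} already exists; pass drop=1 to replace it"
            )
        self._tables[key] = list(rows)
        return len(self._tables[key])

    def read(self, table_name, namespace):
        """Return the rows stored under (namespace, table_name)."""
        return self._tables[(namespace, table_name)]


store = ToyTableStore()
store.upload([{"id": 1}, {"id": 2}], table_name="dummy", namespace="dummy")

# Same table_name in a different namespace is a different table.
store.upload([{"id": 9}], table_name="dummy", namespace="other")

# Re-uploading to an existing pair only works with drop=1.
store.upload([{"id": 3}], table_name="dummy", namespace="dummy", drop=1)
print(store.read("dummy", "dummy"))
```

Seen this way, drop=1 is handy while experimenting, since rerunning the upload script does not fail on the second run.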

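Fetching data from the FATE service

Official docs: https://fate.readthedocs.io/en/latest/tutorial/pipeline/pipeline_tutorial_hetero_sbt/#install
I strongly recommend reading this post side by side with the official docs!

The sbt in the docs is simply Secure Boost Tree, a decision tree model; it is called Secure because it runs on FATE.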

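Tools

FATE uses the Reader class to fetch data from the FATE framework.

The docs say "load data", and at first I assumed that meant loading from local disk. Oops! The docs would be clearer if they said "load data from FATE service"...

Once Reader has fetched the data, the DataTransform class can transform it. The docs and the code both cover this, so refer to the docs. Intersection can be used to compute the PSI of the two datasets. In the Intersection component, PSI stands for private set intersection, that is, the set of sample IDs that both parties share. FATE of course offers many more components, and the code in the docs just uses PSI as one example.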
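
As a rough picture of what intersecting two parties' data means, here is a plaintext sketch. It is not how FATE's Intersection works internally: the real component is meant to find the shared sample IDs without either side revealing its other IDs, whereas this hypothetical intersect_ids helper simply compares IDs in the clear.

```python
def intersect_ids(guest_rows, host_rows, key="id"):
    """Return the shared ids plus each party's rows restricted to those ids."""
    guest_ids = {r[key] for r in guest_rows}
    host_ids = {r[key] for r in host_rows}
    common = guest_ids & host_ids
    guest_kept = [r for r in guest_rows if r[key] in common]
    host_kept = [r for r in host_rows if r[key] in common]
    return sorted(common), guest_kept, host_kept


# Made-up data: the guest holds the label y, the host holds an extra feature.
guest_rows = [{"id": 1, "y": 0}, {"id": 2, "y": 1}, {"id": 3, "y": 0}]
host_rows = [{"id": 2, "income": 72}, {"id": 3, "income": 38}, {"id": 5, "income": 11}]

common, guest_kept, host_kept = intersect_ids(guest_rows, host_rows)
print("shared ids:", common)
print("guest rows kept:", guest_kept)
print("host rows kept:", host_kept)
```

Only the rows whose id appears on both sides survive, which is what a vertical (hetero) model needs before it can train on the combined columns.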

Python

from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, DataTransform, HeteroSecureBoost, Evaluation
from pipeline.interface import Data

# set the initiator and the party ids that take part in the job
pipeline = PipeLine() \
        .set_initiator(role='guest', party_id=9999) \
        .set_roles(guest=9999, host=10000)

reader_0 = Reader(name="reader_0")
# bind the guest's reader to the table uploaded earlier
reader_0.get_party_instance(role='guest', party_id=9999).component_param(
    table={"name": "dummy", "namespace": "dummy"})
# note: only the guest is bound here; the host (10000) presumably needs its own table bound the same way

data_transform_0 = DataTransform(name="data_transform_0")
# tell the guest's data transform that its table contains a label column
data_transform_0.get_party_instance(role='guest', party_id=9999).component_param(
    with_label=True)

# declare the boosting tree model and the evaluation component
hetero_secureboost_0 = HeteroSecureBoost(name="hetero_secureboost_0",
                                         num_trees=5,
                                         bin_num=16,
                                         task_type="classification",
                                         objective_param={"objective": "cross_entropy"},
                                         encrypt_param={"method": "paillier"},
                                         tree_param={"max_depth": 3})
evaluation_0 = Evaluation(name="evaluation_0", eval_type="binary")


# add the components to the pipeline in order, feeding each one the previous component's output
pipeline.add_component(reader_0)
pipeline.add_component(data_transform_0, data=Data(train_data=reader_0.output.data))
pipeline.add_component(hetero_secureboost_0, data=Data(train_data=data_transform_0.output.data))
pipeline.add_component(evaluation_0, data=Data(data=hetero_secureboost_0.output.data))
pipeline.compile()


# training
pipeline.fit()

# to predict on another dataset, build a predict_pipeline and call:
# predict_pipeline.predict()

# save the pipeline to a file
pipeline.dump("pipeline_saved.pkl")
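
For the curious, the cross_entropy objective in the configuration above corresponds, in textbook gradient boosting for binary labels, to a per-sample gradient and second derivative like the ones below. This is a sketch of the standard formulas only; I'm assuming HeteroSecureBoost follows the same form, which this post does not verify, and the function names are made up.

```python
import math


def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy_grad_hess(y_true, raw_score):
    """First and second derivative of binary cross entropy w.r.t. the raw score.

    With p = sigmoid(raw_score): gradient = p - y, hessian = p * (1 - p).
    """
    p = sigmoid(raw_score)
    return p - y_true, p * (1.0 - p)


# A positive sample scored at 0.0 (probability 0.5) gets pushed upward.
grad, hess = cross_entropy_grad_hess(y_true=1, raw_score=0.0)
print(grad, hess)
```

Each new tree is fitted to these per-sample values, so a confident and correct prediction contributes a gradient close to zero.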