Preface
This walkthrough uses the master branch, because I could not find a release branch on the official site. Expect that master may have issues. cuiyaonan2000@163.com
References:
- https://github.com/alibaba/DataX
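- https://github.com/alibaba/DataX/blob/master/introduction.md (plugin documentation)
Building from source
- First, download the source code from GitHub (alibaba/DataX; DataX is the open-source version of Alibaba Cloud DataWorks Data Integration).
- The build failed under JDK 17, so I switched to JDK 1.8.
- DataX is launched through a Python script, so Python 2 or Python 3 must be installed. CentOS 7 ships with Python 2.7.5.
- Then build the executable distribution: mvn -U clean package assembly:assembly -Dmaven.test.skip=true
- On success, the packaged output is in the target directory under the project root. Building all Readers and Writers makes the build slower, but it is still worth doing.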
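Before starting a long Maven build, the prerequisites listed above can be checked with a small script. This is a minimal sketch and not part of DataX: the helper names `missing_tools` and `build_command` are hypothetical, and the script only checks that `java` and `mvn` are on PATH. It does not verify the JDK version.

```python
import shutil

# Tools the build step above relies on (names as found on PATH).
REQUIRED_TOOLS = ("java", "mvn")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def build_command(skip_tests=True):
    """Compose the Maven command from the build step as an argument list."""
    cmd = ["mvn", "-U", "clean", "package", "assembly:assembly"]
    if skip_tests:
        cmd.append("-Dmaven.test.skip=true")
    return cmd


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing from PATH:", ", ".join(absent))
    else:
        # Print rather than execute, so the script has no side effects.
        print("Run from the DataX source root:", " ".join(build_command()))
```

The command is printed rather than executed so that the check stays cheap; it can be passed to `subprocess.run` once the tools are confirmed to be present.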
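Running commands
From DataX's bin directory:
- python datax.py -r {YOUR_READER} -w {YOUR_WRITER} prints the JSON template for that reader and writer. The template can also be copied from the reader's or writer's documentation.
- python datax.py {JSON file} runs the job described by that JSON file.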
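When these two commands are driven from a script, it helps to build them as argument lists instead of shell strings. The following is a minimal sketch; the function names `template_command` and `run_command` are hypothetical, and the commands are assumed to run with DataX's bin directory as the working directory.

```python
import sys


def template_command(reader, writer, python=sys.executable):
    """Argument list that asks datax.py for a reader/writer JSON template."""
    return [python, "datax.py", "-r", reader, "-w", writer]


def run_command(job_file, python=sys.executable):
    """Argument list that runs the job described by `job_file`."""
    return [python, "datax.py", job_file]
```

Either list can be handed to `subprocess.run(cmd, cwd=bin_dir, check=True)`, where `bin_dir` is a placeholder for wherever the packaged bin directory ended up on your machine.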
Example: Stream To Stream
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,你好,世界-DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            }
        }
    }
}
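If the job file is generated rather than written by hand, the same configuration can be built as a Python dict and dumped to JSON. This is a minimal sketch: `stream_job` and `write_job` are hypothetical helper names, and the two parameters simply expose the `sliceRecordCount` and `channel` values used above.

```python
import json


def stream_job(slice_record_count=10, channel=5):
    """Build the Stream To Stream job shown above as a Python dict."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": slice_record_count,
                            "column": [
                                {"type": "long", "value": "10"},
                                {"type": "string", "value": "hello,你好,世界-DataX"},
                            ],
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"encoding": "UTF-8", "print": True},
                    },
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(path, job):
    """Write the job to `path` as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(job, fh, ensure_ascii=False, indent=4)
```

For example, `write_job("stream2stream.json", stream_job())` produces a file that can be passed to datax.py; the file name is only an illustration. `ensure_ascii=False` keeps the Chinese sample string readable instead of escaping it.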
Execution result
MysqlReader To Stream
Running python datax.py -r mysqlreader -w streamwriter prints the following template:
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
Please refer to the mysqlreader document:
https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
Please refer to the streamwriter document:
https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md
Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": [],
                                "table": []
                            }
                        ],
                        "password": "",
                        "username": "",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
Then edit the JSON, filling in the columns, connection details, and credentials:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["Name","GroupName"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://192.168.137.2:3306/test"],
                                "table": ["employee"]
                            }
                        ],
                        "password": "root",
                        "username": "root"
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
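Because the generated template leaves its fields empty, it is easy to run a job with something still unfilled. The sketch below flags the obvious gaps in a mysqlreader job before it is handed to datax.py. It is not DataX's own validation: `find_problems` is a hypothetical helper, and the set of fields treated as required (username, column, table, jdbcUrl) is an assumption based on what the edited example above fills in.

```python
def find_problems(job):
    """Return a list of human-readable problems in a mysqlreader job dict."""
    problems = []
    for index, item in enumerate(job.get("job", {}).get("content", [])):
        reader = item.get("reader", {})
        if reader.get("name") != "mysqlreader":
            continue  # only mysqlreader entries are checked in this sketch
        param = reader.get("parameter", {})
        where = f"content[{index}].reader"
        for key in ("username", "column"):
            if not param.get(key):
                problems.append(f"{where}: '{key}' is empty")
        connections = param.get("connection", [])
        if not connections:
            problems.append(f"{where}: 'connection' is empty")
        for conn in connections:
            if not conn.get("table"):
                problems.append(f"{where}: 'table' is empty")
            urls = conn.get("jdbcUrl", [])
            if not urls:
                problems.append(f"{where}: 'jdbcUrl' is empty")
            for url in urls:
                # Assumption: a MySQL JDBC URL starts with this prefix.
                if not url.startswith("jdbc:mysql://"):
                    problems.append(f"{where}: unexpected jdbcUrl {url!r}")
    return problems
```

Loading the job file with `json.load` and printing the result of `find_problems` gives a quick pre-flight check; an empty list means none of these particular gaps were found, not that the job is guaranteed to run.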