Getting Started with DataX (Installation and Basic Usage): Part 01
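- 1. Official site
- 2. Deploying the tool (by downloading the DataX package)
- 2.1 Download and extract
- 2.2 Configuration
- 2.2.1 View the configuration template
- 2.2.2 Write the JSON from the template
- 2.2.3 Start DataX
- 3. Basic DataX usage
- 3.1 mysql2stream
- 3.2 mysql2mysql
- 3.2.1 Filtering with a where clause
- 3.2.2 Writing the query SQL directly
- 4. Explanations
- 4.1 The setting block in the JSON
- 4.2 Parameter reference (using MySQL as the example)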
1. Official site
- The address is:
https://github.com/alibaba/DataX/blob/master/userGuid.md
- Introduction: see the official page above.
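2. Deploying the tool (by downloading the DataX package)
2.1 Download and extract
- The official guide is very detailed, so this is only a brief record:
Download datax.tar.gz, then extract it with:
tar -zxvf datax.tar.gz
- Then look at the extracted directory.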
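If you prefer to script the extraction step, the following is a minimal sketch of the same operation using Python's standard tarfile module. The helper name extract_archive and the destination directory are illustrative assumptions, not part of DataX; the function also returns the top-level entries so the extracted directory can be checked.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its top-level entry names.

    Hypothetical helper for illustration; roughly equivalent to `tar -zxvf`.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()   # every member path stored in the archive
        tar.extractall(dest)     # unpack all members under dest
    # Keep only the first path component of each member, without duplicates.
    return sorted({name.split("/")[0] for name in names})


# Example call (assumes datax.tar.gz is in the current directory):
# print(extract_archive("datax.tar.gz", "."))
```

Only extract archives obtained from a source you trust, such as the official download link.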
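2.2 Configuration
2.2.1 View the configuration template
- The command is as follows (datax.py is in the bin directory of the extracted package):
python datax.py -r streamreader -w streamwriter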
2.2.2 Write the JSON from the template
- Create the file stream2stream.json in the job directory:
cd /Users/susu/study_down/about_datax/datax/job
vim stream2stream.json
- The content of stream2stream.json is:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,你好,世界-DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            }
        }
    }
}
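As an alternative to typing the file by hand in vim, the same job definition can be generated from a script. This is only a sketch: build_stream_job and write_job are hypothetical helper names, while the field names and values are copied from the stream2stream.json shown above.

```python
import json


def build_stream_job(record_count=10, channel=5):
    """Return the streamreader -> streamwriter job from this post as a dict."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": record_count,
                            "column": [
                                {"type": "long", "value": "10"},
                                {"type": "string", "value": "hello,你好,世界-DataX"},
                            ],
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"encoding": "UTF-8", "print": True},
                    },
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(job, path):
    """Write the job as UTF-8 JSON; ensure_ascii=False keeps the Chinese text readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(job, fh, ensure_ascii=False, indent=4)


# Example call: write_job(build_stream_job(), "stream2stream.json")
```

Generating the file this way also guarantees that it is valid JSON, which a stray comment line or trailing comma in a hand-edited file would break.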
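2.2.3 Start DataX
- Run the following command from the job directory to start the sync:
python ../bin/datax.py stream2stream.json
- When the sync finishes, check the log that DataX prints.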
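When jobs need to be started from a script rather than a terminal, the launch command above can be wrapped with Python's subprocess module. The sketch below assumes the interpreter is available as python, as in the command above; build_datax_command and run_job are hypothetical helper names, and the DataX home path is whatever directory you extracted the package into.

```python
import subprocess
from pathlib import Path


def build_datax_command(datax_home, job_file, python_exe="python"):
    """Build the argument list for `python <datax_home>/bin/datax.py <job_file>`."""
    return [python_exe, str(Path(datax_home) / "bin" / "datax.py"), str(job_file)]


def run_job(datax_home, job_file):
    """Run the job and return the process exit code (0 normally means success)."""
    completed = subprocess.run(build_datax_command(datax_home, job_file))
    return completed.returncode


# Example call (paths are assumptions; adjust them to your installation):
# run_job("/Users/susu/study_down/about_datax/datax", "job/stream2stream.json")
```

Passing the command as a list avoids shell quoting problems when a path contains spaces.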
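3. Basic DataX usage
- My environment is limited, so the examples below focus on MySQL. Syncing from MySQL to other databases will be covered later if the opportunity arises.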
3.1 mysql2stream
- First view the template:
python datax.py -r mysqlreader -w streamwriter
- mysql2stream.json is as follows:
{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "username": "root",
                    "password": "susu@123",
                    "column": [
                        "dog_num",
                        "dog_name"
                    ],
                    "splitPk": "dog_num",
                    "connection": [{
                        "table": [
                            "dog"
                        ],
                        "jdbcUrl": [
                            "jdbc:mysql://127.0.0.1:3306/datax_1"
                        ]
                    }]
                }
            },
            "writer": {
                "name": "streamwriter",
                "parameter": {
                    "print": true
                }
            }
        }]
    }
}
- Run the job to see the result:
python ../bin/datax.py mysql2stream.json
3.2 mysql2mysql
- First view the template:
python datax.py -r mysqlreader -w mysqlwriter
3.2.1 Filtering with a where clause
- mysql2mysql_where.json is as follows:
{
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "column": ["*"],
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/datax_1"],
                        "table": ["dog"]
                    }],
                    "username": "root",
                    "password": "susu@123",
                    "where": "dog_num=1000003"
                }
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "column": ["*"],
                    "connection": [{
                        "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/datax_2",
                        "table": ["dog"]
                    }],
                    "username": "root",
                    "password": "susu@123",
                    "writeMode": "insert"
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
- Run the job to see the result:
python ../bin/datax.py mysql2mysql_where.json
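To make the role of where more concrete: the mysqlreader documentation describes the reader as assembling a SELECT statement from column, table, and where. The sketch below illustrates that idea only. build_select is a hypothetical function, not DataX code, and the exact SQL text DataX generates may differ.

```python
def build_select(columns, table, where=None):
    """Compose a SELECT from column/table/where settings (illustration only)."""
    sql = "select {} from {}".format(",".join(columns), table)
    if where:
        # The filter is appended as-is, so it must be a valid SQL condition.
        sql += " where ({})".format(where)
    return sql


# Using the values from mysql2mysql_where.json:
print(build_select(["*"], "dog", "dog_num=1000003"))
# select * from dog where (dog_num=1000003)
```

Seen this way, where is simply the filter condition of the generated query, which is why it is written without the where keyword in the JSON.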
3.2.2 Writing the query SQL directly
- Use the querySql parameter. Note that you should keep only one of querySql and the table-based settings (table, column, where); do not configure both.
- mysql2mysql_query.json is as follows:
{
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/datax_1"],
                        "querySql": [
                            "select t.dog_num,t.dog_name,t.db_source from dog t where dog_num=1000004"
                        ]
                    }],
                    "username": "root",
                    "password": "susu@123"
                }
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "column": ["*"],
                    "connection": [{
                        "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/datax_2",
                        "table": ["dog"]
                    }],
                    "username": "root",
                    "password": "susu@123",
                    "writeMode": "insert"
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
- Run the job to see the result:
python ../bin/datax.py mysql2mysql_query.json
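Since a reader connection should carry either table or querySql but not both, a small pre-flight check can catch a mixed-up job file before DataX is started. This is a sketch under that assumption; reader_mode is a hypothetical helper and does not reproduce DataX's own validation or error messages.

```python
def reader_mode(connection):
    """Return "table" or "querySql" for one reader connection entry.

    Raises ValueError when both or neither of the two keys are configured.
    """
    has_table = bool(connection.get("table"))
    has_query = bool(connection.get("querySql"))
    if has_table and has_query:
        raise ValueError("configure either table or querySql, not both")
    if not has_table and not has_query:
        raise ValueError("configure one of table or querySql")
    return "querySql" if has_query else "table"


# The connection entries from the two mysql2mysql examples above:
where_conn = {"jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/datax_1"], "table": ["dog"]}
query_conn = {
    "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/datax_1"],
    "querySql": ["select t.dog_num,t.dog_name,t.db_source from dog t where dog_num=1000004"],
}
print(reader_mode(where_conn))  # table
print(reader_mode(query_conn))  # querySql
```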
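4. Explanations
4.1 The setting block in the JSON
- About setting:
speed in setting controls concurrency, and channel sets the number of concurrent channels.
If print is set to true, as in the stream2stream example, sliceRecordCount * channel records are printed.
For a real transfer, such as importing from MySQL into HDFS, channel is the actual degree of concurrency rather than a multiplier on how many records are printed.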
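As a quick worked example of the rule described above, the number of records printed by the stream2stream job can be computed from its two settings. The function name expected_printed_records is a hypothetical one chosen for this sketch; the values 10 and 5 come from stream2stream.json.

```python
def expected_printed_records(slice_record_count, channel):
    """Records printed by streamwriter: sliceRecordCount multiplied by channel."""
    return slice_record_count * channel


# stream2stream.json uses sliceRecordCount = 10 and channel = 5:
print(expected_printed_records(10, 5))  # 50
```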
4.2 Parameter reference (using MySQL as the example)
- For everything else, see the official documentation:
https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md