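DataX overview:
DataX is an offline data synchronization tool/platform widely used within Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, HDFS, Hive, OceanBase, HBase, OTS, and ODPS. DataX uses a framework + plugin model. It is open source, and the code is hosted on GitHub.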
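To make the framework + plugin idea concrete, here is a minimal sketch in Python. It is not DataX code (DataX itself is written in Java), and the names ListReader, ListWriter, and run_sync are hypothetical, chosen only for illustration. The point is that the framework function moves records without knowing which source or target it is talking to.

```python
class ListReader:
    """Hypothetical reader plugin: yields records from an in-memory list."""

    def __init__(self, rows):
        self.rows = rows

    def read(self):
        for row in self.rows:
            yield row


class ListWriter:
    """Hypothetical writer plugin: collects records into a list."""

    def __init__(self):
        self.written = []

    def write(self, record):
        self.written.append(record)


def run_sync(reader, writer):
    """The 'framework' part: copy every record from a reader to a writer."""
    count = 0
    for record in reader.read():
        writer.write(record)
        count += 1
    return count


rows = [("1001", "Alice"), ("1002", "Bob")]
writer = ListWriter()
copied = run_sync(ListReader(rows), writer)
print(copied)
```

Supporting a new data source in this sketch means adding another class with a read or write method; run_sync stays unchanged.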
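Installation and deployment:
Environment preparation:
System Requirements:
Linux (a local machine also works)
JDK (1.6 or later; 1.6 recommended)
Python (Python 2.6.x recommended). It must be Python 2: datax.py uses print statements without parentheses. Python 2 accepts print with or without parentheses, but Python 3 requires them and reports a syntax error, so the run fails.
Apache Maven 3.x (to compile DataX)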
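The Python 2 requirement comes down to how print is parsed. The following sketch checks whether the interpreter you are running can compile each form of print. The helper name parses_here is hypothetical, and the snippet only illustrates the syntax difference; it does not inspect datax.py itself.

```python
import sys


def parses_here(source):
    """Return True if this interpreter can compile `source`, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Python 2 style print statement, the form the note above says datax.py relies on.
old_style = 'print "hello"'
# Function-call form, accepted by both Python 2 and Python 3.
new_style = 'print("hello")'

print("interpreter major version: %d" % sys.version_info[0])
print("old-style print parses: %s" % parses_here(old_style))
print("new-style print parses: %s" % parses_here(new_style))
```

If the old-style line reports False, the interpreter is Python 3 and a script written with print statements will not start under it.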
1: Download the installation package and extract it.
Download: https://github.com/alibaba/DataX
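2: Create a JSON job file, preferably in the bin directory (MySQL to MySQL is used as the example). The // comments in the example are annotations only; strict JSON does not allow comments, so remove them if the file fails to parse.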
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "StudentNo",
                            "LoginPwd",
                            "StudentName",
                            "Sex",
                            "GradeId",
                            "Phone",
                            "Address",
                            "BornDate",
                            "Email" // columns to read (example; change to suit your needs)
                        ],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://ip:port/database?characterEncoding=utf8"],
                                "table": ["table name (to read from)"]
                            }
                        ],
                        "password": "password",
                        "username": "username"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [
                            "StudentNo",
                            "LoginPwd",
                            "StudentName",
                            "Sex",
                            "GradeId",
                            "Phone",
                            "Address",
                            "BornDate",
                            "Email" // columns to write (example; change to suit your needs)
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://ip:port/database?characterEncoding=utf8",
                                "table": ["table name (to write to)"]
                            }
                        ],
                        "password": "password",
                        "username": "username"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
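Writing this file by hand is error-prone, so one option is to generate it. The sketch below builds a job with the same shape as the example (the reader's jdbcUrl as a list, the writer's as a single string, channel as the string "1") and serializes it with the standard json module, which produces comment-free JSON. The helper names endpoint and build_job, the table names, and all connection values are placeholders assumed for illustration.

```python
import json


def endpoint(name, jdbc_url, table, columns, username, password):
    """Build one reader or writer entry shaped like the example job."""
    return {
        "name": name,
        "parameter": {
            "column": list(columns),
            "connection": [{"jdbcUrl": jdbc_url, "table": [table]}],
            "username": username,
            "password": password,
        },
    }


def build_job(columns, source, target, channel="1"):
    """source and target are dicts with url, table, username, and password keys."""
    # The example passes the reader's jdbcUrl as a list and the writer's as a string.
    reader = endpoint("mysqlreader", [source["url"]], source["table"], columns,
                      source["username"], source["password"])
    writer = endpoint("mysqlwriter", target["url"], target["table"], columns,
                      target["username"], target["password"])
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": channel}},
        }
    }


# Placeholder values only; replace them with your own connection details.
columns = ["StudentNo", "StudentName", "Email"]
source = {"url": "jdbc:mysql://ip:port/database?characterEncoding=utf8",
          "table": "source_table", "username": "username", "password": "password"}
target = {"url": "jdbc:mysql://ip:port/database?characterEncoding=utf8",
          "table": "target_table", "username": "username", "password": "password"}

job_text = json.dumps(build_job(columns, source, target), indent=4)
print(job_text)
```

Redirect the printed output to a file such as mysqlTomysql.json to get a job file ready for the next step.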
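3: Open a terminal, go to the bin directory of the extracted datax folder, and run the command.
python datax.py E:\datax\datax.tar\datax\bin\mysqlTomysql.json
Replace the path with the location of your own JSON file, and run the command from the bin directory.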
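If you launch jobs often, the command can be wrapped in a small script. This is a sketch under assumptions: build_command and run_job are hypothetical helper names, the DataX home directory is whatever folder you extracted, and python_exe must point to a Python 2 interpreter as noted in the requirements.

```python
import os
import subprocess


def build_command(datax_home, job_file, python_exe="python"):
    """Return the argument list equivalent to: python <datax_home>/bin/datax.py <job_file>."""
    datax_py = os.path.join(datax_home, "bin", "datax.py")
    return [python_exe, datax_py, job_file]


def run_job(datax_home, job_file, python_exe="python"):
    """Run the job and return the exit code of the launcher process."""
    return subprocess.call(build_command(datax_home, job_file, python_exe))


# Example paths are placeholders; nothing is executed until run_job is called.
command = build_command("/opt/datax", "/opt/datax/bin/mysqlTomysql.json")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces or backslashes.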