Syncing Hive Data to MySQL with DataX


Contents

1. Component environment
2. Install DataX
2.1 Download and extract DataX
3. Install datax-web
3.0 Download and build the datax-web source
3.1 Create the datax-web metadata database in MySQL
3.2 Install datax-web
3.2.1 Run install.sh to extract and deploy
3.2.2 Manually edit the datax-admin configuration file
3.2.3 Manually edit the datax-executor configuration file
3.2.4 Replace the Python launcher scripts under datax
3.2.5 Replace the MySQL JDBC jar
4. Create the MySQL and Hive databases
4.1 Create the MySQL database
4.2 Create the Hive database
5. Configure DataX and datax-web
5.1 Start the datax-web services
5.2 Web UI access and configuration
5.2.1 Create a project
5.2.2 Add the MySQL and Hive data sources
5.2.3 Create a DataX task template
5.2.4 Build the task
5.2.5 View the created task
5.2.6 Insert data in Hive
6. Verify the result

1. Component environment

Name      | Version         | Description               | Download
hadoop    | 3.4.0           | official binary download  |
hive      | 3.1.3           | built from source         |
mysql     | 8.0.31          |                           |
datax     | 0.0.1-SNAPSHOT  |                           | http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
datax-web | datax-web-2.1.2 | built from source         | github.com
centos    | centos7         | x86                       |
java      | 1.8             |                           |

2. Install DataX

2.1 Download and extract DataX

http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

tar -zxvf datax.tar.gz

Extract it under the /cluster directory.

Run the test job:

./bin/datax.py job/job.json

It fails with the following error:

  File "/cluster/datax/bin/./datax.py", line 114
    print readerRef
    ^^^^^^^^^^^^^^^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?

Explanation: the machine has both Python 3 and Python 2 installed, with Python 3 as the default. The launcher scripts under datax/bin only support Python 2, so either invoke them with python2 explicitly or back up and replace the three scripts with the Python 3 versions provided by datax-web under its doc directory (see the sketch after this list).

  • Python (2.x) (to support Python 3, replace the three Python files under datax/bin with the ones in doc/datax-web/datax-python3). Required; it is mainly used to launch the underlying DataX jobs. By default DataX runs as a Java child process, and users can customize this through the Python launcher.
  • Reference: datax-web/doc/datax-web/datax-web-deploy.md at master · WeiYe-Jing/datax-web (github.com)
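If you prefer to keep python pointing at Python 3, here is a minimal sketch of the file swap (it assumes DataX sits in /cluster/datax, that the three launcher scripts are datax.py, dxprof.py and perftrace.py, and that the datax-python3 replacements are available from the datax-web tree at the path below; verify these against your own layout):

# back up the original Python 2 launcher scripts
cd /cluster/datax/bin
mkdir -p bak && cp datax.py dxprof.py perftrace.py bak/
# copy in the Python 3 versions shipped under doc/datax-web/datax-python3
cp /cluster/datax-web-2.1.2/doc/datax-web/datax-python3/*.py /cluster/datax/bin/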

Alternatively, run the launcher with python2 directly:

python2 bin/datax.py job/job.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2024-10-13 21:04:41.543 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-10-13 21:04:41.553 [main] INFO  Engine - the machine info  => 

    osInfo:    Oracle Corporation 1.8 25.40-b25
    jvmInfo:    Linux amd64 5.8.13-1.el7.elrepo.x86_64
    cpu num:    8

    totalPhysicalMemory:    -0.00G
    freePhysicalMemory:    -0.00G
    maxFileDescriptorCount:    -1
    currentOpenFileDescriptorCount:    -1

    GC Names    [PS MarkSweep, PS Scavenge]

    MEMORY_NAME                    | allocation_size                | init_size                      
    PS Eden Space                  | 256.00MB                       | 256.00MB                       
    Code Cache                     | 240.00MB                       | 2.44MB                         
    Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
    PS Survivor Space              | 42.50MB                        | 42.50MB                        
    PS Old Gen                     | 683.00MB                       | 683.00MB                       
    Metaspace                      | -0.00MB                        | 0.00MB                         


2024-10-13 21:04:41.575 [main] INFO  Engine - 
{
    "content":[
        {
            "reader":{
                "name":"streamreader",
                "parameter":{
                    "column":[
                        {
                            "type":"string",
                            "value":"DataX"
                        },
                        {
                            "type":"long",
                            "value":19890604
                        },
                        {
                            "type":"date",
                            "value":"1989-06-04 00:00:00"
                        },
                        {
                            "type":"bool",
                            "value":true
                        },
                        {
                            "type":"bytes",
                            "value":"test"
                        }
                    ],
                    "sliceRecordCount":100000
                }
            },
            "writer":{
                "name":"streamwriter",
                "parameter":{
                    "encoding":"UTF-8",
                    "print":false
                }
            }
        }
    ],
    "setting":{
        "errorLimit":{
            "percentage":0.02,
            "record":0
        },
        "speed":{
            "byte":10485760
        }
    }
}

2024-10-13 21:04:41.599 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2024-10-13 21:04:41.601 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2024-10-13 21:04:41.601 [main] INFO  JobContainer - DataX jobContainer starts job.
2024-10-13 21:04:41.604 [main] INFO  JobContainer - Set jobId = 0
2024-10-13 21:04:41.623 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2024-10-13 21:04:41.624 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2024-10-13 21:04:41.624 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2024-10-13 21:04:41.624 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2024-10-13 21:04:41.625 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2024-10-13 21:04:41.626 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2024-10-13 21:04:41.627 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2024-10-13 21:04:41.649 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2024-10-13 21:04:41.654 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2024-10-13 21:04:41.657 [job-0] INFO  JobContainer - Running by standalone Mode.
2024-10-13 21:04:41.666 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2024-10-13 21:04:41.671 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2024-10-13 21:04:41.672 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2024-10-13 21:04:41.685 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2024-10-13 21:04:41.986 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[302]ms
2024-10-13 21:04:41.987 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2024-10-13 21:04:51.677 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.022s |  All Task WaitReaderTime 0.040s | Percentage 100.00%
2024-10-13 21:04:51.677 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2024-10-13 21:04:51.678 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2024-10-13 21:04:51.678 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2024-10-13 21:04:51.678 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2024-10-13 21:04:51.680 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /cluster/datax/hook
2024-10-13 21:04:51.682 [job-0] INFO  JobContainer - 
     [total cpu info] => 
        averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
        -1.00%                         | -1.00%                         | -1.00%
                        

     [total gc info] => 
         NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
         PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
         PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2024-10-13 21:04:51.682 [job-0] INFO  JobContainer - PerfTrace not enable!
2024-10-13 21:04:51.683 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.022s |  All Task WaitReaderTime 0.040s | Percentage 100.00%
2024-10-13 21:04:51.684 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2024-10-13 21:04:41
任务结束时刻                    : 2024-10-13 21:04:51
任务总计耗时                    :                 10s
任务平均流量                    :          253.91KB/s
记录写入速度                    :          10000rec/s
读出记录总数                    :              100000
读写失败总数                    :                   0

DataX can also print a job configuration template for a given reader/writer pair, which is useful when writing the HDFS-to-MySQL job by hand:

./bin/datax.py -r hdfsreader -w mysqlwriter

3. Install datax-web

3.0 Download and build the datax-web source

git@github.com:WeiYe-Jing/datax-web.git

mvn -U clean package assembly:assembly -Dmaven.test.skip=true

  • After a successful build, the packaged output is located under {DataX_source_code_home}/target/datax/datax/ with the following layout:

    $ cd  {DataX_source_code_home}
    $ ls ./target/datax/datax/
    bin		conf		job		lib		log		log_perf	plugin		script		tmp
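The datax-web build output is separate from the DataX package above; unpack it to /cluster before running the installer. A minimal sketch (the tarball name and its location under the source tree are assumptions based on the 2.1.2 release; check where your Maven assembly actually writes the package):

# extract the packaged datax-web release to /cluster
tar -zxvf /path/to/datax-web/build/datax-web-2.1.2.tar.gz -C /cluster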

3.1 Create the datax-web metadata database in MySQL

mysql -u root -p

password:******

CREATE DATABASE dataxweb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

use dataxweb;
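install.sh in the next step can initialize this schema automatically; if you would rather load it by hand, here is a sketch of importing the SQL script bundled with the datax-web package (the bin/db/datax_web.sql path is an assumption based on the 2.1.2 layout and may differ in your build):

# import the datax-web schema into the dataxweb database
mysql -u root -p dataxweb < /cluster/datax-web-2.1.2/bin/db/datax_web.sql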

3.2 Install datax-web

3.2.1 Run install.sh to extract and deploy

In the /cluster/datax-web-2.1.2/bin directory, run:

./install.sh

Follow the interactive prompts.

When it finishes, the modules are extracted and the metadata database is initialized.

 

3.2.2 Manually edit the datax-admin configuration file

/cluster/datax-web-2.1.2/modules/datax-admin/bin/env.properties

The contents are as follows:

# environment variables

JAVA_HOME=/java/jdk

WEB_LOG_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/logs
WEB_CONF_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/conf

DATA_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/data
SERVER_PORT=6895

PID_FILE_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/dataxadmin.pid


# mail account
MAIL_USERNAME="1024122298@qq.com"
MAIL_PASSWORD="*********************"


#debug
REMOTE_DEBUG_SWITCH=true
REMOTE_DEBUG_PORT=7223
 

3.2.3 Manually edit the datax-executor configuration file

/cluster/datax-web-2.1.2/modules/datax-executor/bin/env.properties

The contents are as follows; the key setting is PYTHON_PATH=/cluster/datax/bin/datax.py:

# environment variables

JAVA_HOME=/java/jdk

SERVICE_LOG_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/logs
SERVICE_CONF_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/conf
DATA_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/data


## location of the DataX json job files
JSON_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/json


## executor_port
EXECUTOR_PORT=9999


## keep consistent with the datax-admin port
DATAX_ADMIN_PORT=6895

## path of the Python launcher script
PYTHON_PATH=/cluster/datax/bin/datax.py

## datax-web executor service port
SERVER_PORT=9504

PID_FILE_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/service.pid


# debug: remote debug port
REMOTE_DEBUG_SWITCH=true
REMOTE_DEBUG_PORT=7224
 

3.2.4 Replace the Python launcher scripts under datax

  • Python (2.x) (to support Python 3, replace the three Python files under datax/bin with the ones in doc/datax-web/datax-python3). Required; it is mainly used to launch the underlying DataX jobs. By default DataX runs as a Java child process, and users can customize this through the Python launcher.
  • These three files can be taken from the datax-web source directory.

3.2.5 Replace the MySQL JDBC jar

The MySQL reader and writer plugins bundled with DataX ship an old JDBC driver; since the target database is MySQL 8, replace the jar in both plugins. The target paths are:

/cluster/datax/plugin/writer/mysqlwriter/libs/mysql-connector-j-8.3.0.jar

/cluster/datax/plugin/reader/mysqlreader/libs/mysql-connector-j-8.3.0.jar
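A minimal sketch of the swap (the name of the old bundled driver jar is an assumption, and the new connector is assumed to have been downloaded to /tmp; adjust as needed):

# writer plugin: remove the old driver and drop in the MySQL 8 connector
rm -f /cluster/datax/plugin/writer/mysqlwriter/libs/mysql-connector-java-*.jar
cp /tmp/mysql-connector-j-8.3.0.jar /cluster/datax/plugin/writer/mysqlwriter/libs/
# reader plugin: same replacement
rm -f /cluster/datax/plugin/reader/mysqlreader/libs/mysql-connector-java-*.jar
cp /tmp/mysql-connector-j-8.3.0.jar /cluster/datax/plugin/reader/mysqlreader/libs/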

4. Create the MySQL and Hive databases

4.1 Create the MySQL database

MySQL DDL:

-- m31094.mm definition
CREATE TABLE `mm` (
  `uuid` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `time` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `age` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `sex` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `job` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `address` text COLLATE utf8mb4_unicode_ci,
  PRIMARY KEY (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Optionally, you can skip setting uuid as the primary key.

4.2 Create the Hive database

Hive DDL:
create database m31094;
drop table m31094.mm;
CREATE TABLE m31094.mm (
 `uuid` STRING COMMENT '主键',
 `name` STRING COMMENT '姓名',
 `time` STRING COMMENT '时间',
 `age` STRING COMMENT '年龄',
 `sex` STRING COMMENT '性别',
 `job` STRING COMMENT '工作',
 `address` STRING COMMENT '地址'
) COMMENT '美女表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Table details:

To view the table description in Hive:

 DESCRIBE FORMATTED m31094.mm;
 

5. Configure DataX and datax-web

5.1 Start the datax-web services

Start commands:

/cluster/datax-web-2.1.2/modules/datax-admin/bin/datax-admin.sh restart

tail -f /cluster/datax-web-2.1.2/modules/datax-admin/bin/console.out


/cluster/datax-web-2.1.2/modules/datax-executor/bin/datax-executor.sh restart
tail -f /cluster/datax-web-2.1.2/modules/datax-executor/bin/console.out

Check the running processes with jps -l.

To kill the processes:

sudo kill -9 $(ps -ef|grep datax|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')

datax-executor must start successfully; once it does, the executor registers itself automatically and can be seen in the web UI.

5.2 Web UI access and configuration

http://ip:6895/index.html

6895 is the custom port configured earlier (SERVER_PORT in the datax-admin env.properties); adjust it to your setup.

Login username/password: admin/123456

5.2.1 Create a project

5.2.2 Add the MySQL and Hive data sources

MySQL

HIVE

5.2.3 Create a DataX task template

5.2.4 Build the task

Step 1: configure the reader (input)

Step 2: configure the writer (output)

Step 3: map the fields

Step 4: build; select the task template, copy the JSON, and click next

The generated JSON template:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 3,
        "byte": 1048576
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "hdfsreader",
          "parameter": {
            "path": "/cluster/hive/warehouse/m31094.db/mm",
            "defaultFS": "hdfs://10.7.215.33:8020",
            "fileType": "text",
            "fieldDelimiter": ",",
            "skipHeader": false,
            "column": [
              {
                "index": "0",
                "type": "string"
              },
              {
                "index": "1",
                "type": "string"
              },
              {
                "index": "2",
                "type": "string"
              },
              {
                "index": "3",
                "type": "string"
              },
              {
                "index": "4",
                "type": "string"
              },
              {
                "index": "5",
                "type": "string"
              },
              {
                "index": "6",
                "type": "string"
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "yRjwDFuoPKlqya9h9H2Amg==",
            "password": "XCYVpFosvZBBWobFzmLWvA==",
            "column": [
              "`uuid`",
              "`name`",
              "`time`",
              "`age`",
              "`sex`",
              "`job`",
              "`address`"
            ],
            "connection": [
              {
                "table": [
                  "mm"
                ],
                "jdbcUrl": "jdbc:mysql://10.7.215.33:3306/m31094"
              }
            ]
          }
        }
      }
    ]
  }
}
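Note that datax-web stores data source credentials encrypted, so the username and password fields above are ciphertext that only the datax-web executor decrypts at run time. If you want to test the generated job outside datax-web, a sketch (assuming the JSON is saved as /tmp/hive2mysql.json with the credentials replaced by plaintext values):

# run the generated job directly through the DataX launcher
python2 /cluster/datax/bin/datax.py /tmp/hive2mysql.json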

5.2.5 View the created task

5.2.6 Insert data in Hive

insert into m31094.mm values('9','hive数据使用datax同步到MySQL',from_unixtime(unix_timestamp()),'1000000000090101','北京','新疆','加油');

Console output:

6. Verify the result

Job execution log:

2024-10-13 21:30:00 [JobThread.run-130] <br>----------- datax-web job execute start -----------<br>----------- Param:
2024-10-13 21:30:00 [BuildCommand.buildDataXParam-100] ------------------Command parameters:
2024-10-13 21:30:00 [ExecutorJobHandler.execute-57] ------------------DataX process id: 95006
2024-10-13 21:30:00 [ProcessCallbackThread.callbackLog-186] <br>----------- datax-web job callback finish.
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.588 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.597 [main] INFO  Engine - the machine info  => 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	osInfo:	Oracle Corporation 1.8 25.40-b25
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	jvmInfo:	Linux amd64 5.8.13-1.el7.elrepo.x86_64
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	cpu num:	8
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	totalPhysicalMemory:	-0.00G
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	freePhysicalMemory:	-0.00G
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	maxFileDescriptorCount:	-1
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	currentOpenFileDescriptorCount:	-1
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	GC Names	[PS MarkSweep, PS Scavenge]
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	MEMORY_NAME                    | allocation_size                | init_size                      
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	PS Eden Space                  | 256.00MB                       | 256.00MB                       
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	Code Cache                     | 240.00MB                       | 2.44MB                         
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	PS Survivor Space              | 42.50MB                        | 42.50MB                        
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	PS Old Gen                     | 683.00MB                       | 683.00MB                       
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	Metaspace                      | -0.00MB                        | 0.00MB                         
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.622 [main] INFO  Engine - 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] {
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	"content":[
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"reader":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				"name":"hdfsreader",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				"parameter":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"column":[
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"0",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"1",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"2",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"3",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"4",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"5",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"6",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					],
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"defaultFS":"hdfs://10.7.215.33:8020",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"fieldDelimiter":",",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"fileType":"text",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"path":"hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"skipHeader":false
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"writer":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				"name":"mysqlwriter",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				"parameter":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"column":[
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`uuid`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`name`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`time`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`age`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`sex`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`job`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`address`"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					],
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"connection":[
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"jdbcUrl":"jdbc:mysql://10.7.215.33:3306/m31094",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"table":[
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 								"mm"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							]
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					],
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"password":"******",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"username":"root"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	],
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	"setting":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		"errorLimit":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"percentage":0.02,
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"record":0
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		"speed":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"byte":1048576,
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"channel":3
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] }
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.645 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.647 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.648 [main] INFO  JobContainer - DataX jobContainer starts job.
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.650 [main] INFO  JobContainer - Set jobId = 0
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.667 [job-0] INFO  HdfsReader$Job - init() begin...
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.993 [job-0] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":[]}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.994 [job-0] INFO  HdfsReader$Job - init() ok and end...
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.580 [job-0] INFO  OriginalConfPretreatmentUtil - table:[mm] all columns:[
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] uuid,name,time,age,sex,job,address
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] ].
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.613 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] INSERT INTO %s (`uuid`,`name`,`time`,`age`,`sex`,`job`,`address`) VALUES(?,?,?,?,?,?,?)
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] ], which jdbcUrl like:[jdbc:mysql://10.7.215.33:3306/m31094?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  JobContainer - DataX Reader.Job [hdfsreader] do prepare work .
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  HdfsReader$Job - prepare(), start to getAllFiles...
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  HdfsReader$Job - get HDFS all files in path = [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm]
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.699 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.709 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_1]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.718 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_2]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.728 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_3]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.737 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_4]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.759 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_5]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.771 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_6]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.771 [job-0] INFO  HdfsReader$Job - 您即将读取的文件数为: [7], 列表为: [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_5,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_4,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_3,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_2,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_1,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_6]
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.772 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.773 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.774 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 1048576 bytes.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.775 [job-0] INFO  HdfsReader$Job - split() begin...
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.777 [job-0] INFO  JobContainer - DataX Reader.Job [hdfsreader] splits to [7] tasks.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.779 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [7] tasks.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.797 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.810 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.813 [job-0] INFO  JobContainer - Running by standalone Mode.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.825 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [7] tasks.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.830 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.831 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.845 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] attemptCount[1] is started
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.883 [0-0-2-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.885 [0-0-2-reader] INFO  Reader$Task - read start
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.886 [0-0-2-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_3]
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.918 [0-0-2-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.925 [0-0-2-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.247 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] is successed, used[403]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.250 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[5] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.286 [0-0-5-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.287 [0-0-5-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.287 [0-0-5-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.290 [0-0-5-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.292 [0-0-5-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.351 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[5] is successed, used[101]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.354 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.379 [0-0-4-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.380 [0-0-4-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.380 [0-0-4-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_1]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.384 [0-0-4-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.386 [0-0-4-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.454 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] is successed, used[101]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.457 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.486 [0-0-0-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.487 [0-0-0-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.487 [0-0-0-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_5]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.489 [0-0-0-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.491 [0-0-0-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.558 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[101]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.561 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.587 [0-0-1-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.588 [0-0-1-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.588 [0-0-1-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_4]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.592 [0-0-1-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.594 [0-0-1-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.662 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] is successed, used[101]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.664 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.691 [0-0-3-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.691 [0-0-3-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.691 [0-0-3-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_2]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.694 [0-0-3-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.696 [0-0-3-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.765 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] is successed, used[101]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.768 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[6] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.791 [0-0-6-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.791 [0-0-6-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.791 [0-0-6-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_6]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.795 [0-0-6-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.798 [0-0-6-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.868 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[6] is successed, used[100]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.869 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.838 [job-0] INFO  StandAloneJobContainerCommunicator - Total 7 records, 282 bytes | Speed 28B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.028s | Percentage 100.00%
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.838 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.838 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.839 [job-0] INFO  JobContainer - DataX Reader.Job [hdfsreader] do post work.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.839 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.840 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /cluster/datax/hook
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.841 [job-0] INFO  JobContainer - 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 	 [total cpu info] => 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 		averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 		-1.00%                         | -1.00%                         | -1.00%
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]                         
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 	 [total gc info] => 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 		 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 		 PS MarkSweep         | 1                  | 1                  | 1                  | 0.040s             | 0.040s             | 0.040s             
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 		 PS Scavenge          | 1                  | 1                  | 1                  | 0.022s             | 0.022s             | 0.022s             
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.841 [job-0] INFO  JobContainer - PerfTrace not enable!
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.842 [job-0] INFO  StandAloneJobContainerCommunicator - Total 7 records, 282 bytes | Speed 28B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.028s | Percentage 100.00%
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.843 [job-0] INFO  JobContainer - 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务启动时刻                    : 2024-10-13 21:30:00
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务结束时刻                    : 2024-10-13 21:30:12
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务总计耗时                    :                 12s
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务平均流量                    :               28B/s
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 记录写入速度                    :              0rec/s
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 读出记录总数                    :                   7
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 读写失败总数                    :                   0
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 十月 13, 2024 9:30:01 下午 org.apache.hadoop.util.NativeCodeLoader <clinit>
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-10-13 21:30:12 [JobThread.run-165] <br>----------- datax-web job execute end(finish) -----------<br>----------- ReturnT:ReturnT [code=200, msg=LogStatistics{taskStartTime=2024-10-13 21:30:00, taskEndTime=2024-10-13 21:30:12, taskTotalTime=12s, taskAverageFlow=28B/s, taskRecordWritingSpeed=0rec/s, taskRecordReaderNum=7, taskRecordWriteFailNum=0}, content=null]
2024-10-13 21:30:12 [TriggerCallbackThread.callbackLog-186] <br>----------- datax-web job callback finish.

Query the MySQL database to confirm the data was synced.
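A quick command-line check, as a sketch (adjust the host and credentials to your environment):

# confirm the rows synced from Hive arrived in MySQL
mysql -h 10.7.215.33 -u root -p -e "SELECT uuid, name, time, age, sex, job, address FROM m31094.mm;"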
