Apache DolphinScheduler 1.3.9 Source Code Analysis (Part 2)

2024/11/25 20:29:14

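Introduction

As big data has grown, task scheduling systems have become a critical part of data processing and management. Apache DolphinScheduler is an open-source distributed workflow scheduling platform that is widely used in big data scenarios.

In this article we analyze the source code of Apache DolphinScheduler 1.3.9, focusing on how the Master and the Worker interact.

Interested readers can also revisit the previous article in this series: Apache DolphinScheduler 1.3.9 Source Code Analysis (Part 1)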

Worker configuration file

# worker listener port
worker.listen.port=1234

# worker execute thread number to limit task instances in parallel
# (upper bound on the number of task instances a worker runs concurrently)
worker.exec.threads=100

# worker heartbeat interval, the unit is second
worker.heartbeat.interval=10

# worker max cpuload avg, only higher than the system cpu load average, worker server can be dispatched tasks. default value -1: the number of cpu cores * 2
# (the worker accepts tasks only while the system CPU load average is below this value;
# with the default of -1 the limit is the number of CPU cores * 2)
worker.max.cpuload.avg=-1

# worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G
# (the worker accepts tasks only while the available system memory is at least this value, in GB)
worker.reserved.memory=0.3

# default worker groups separated by comma, like 'worker.groups=default,test'
worker.groups=default
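
The two thresholds above can be read as a simple admission check. The following is a minimal sketch under stated assumptions, not DolphinScheduler's actual code: the class and method names are hypothetical, and the load and memory readings are passed in as plain numbers instead of being read from the operating system.

```java
final class WorkerLoadCheckSketch {

    /** Resolves the effective CPU load limit: the configured -1 means "CPU cores * 2". */
    static double resolveMaxCpuLoadAvg(double configured, int cpuCores) {
        return configured == -1 ? cpuCores * 2.0 : configured;
    }

    /**
     * Returns true when the worker may accept tasks: the load average does not exceed
     * the limit and the available memory is at least the reserved amount (both in GB).
     */
    static boolean canAcceptTasks(double loadAvg, double availableMemoryGb,
                                  double maxCpuLoadAvg, double reservedMemoryGb) {
        return loadAvg <= maxCpuLoadAvg && availableMemoryGb >= reservedMemoryGb;
    }
}
```

With `worker.max.cpuload.avg=-1` on an 8-core machine, the sketch resolves the limit to 16; a worker with a load average of 20 or with only 0.1 GB of free memory would then be skipped.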

WorkerServer startup

public void run() {
    // init remoting server
    NettyServerConfig serverConfig = new NettyServerConfig();
    serverConfig.setListenPort(workerConfig.getListenPort());
    this.nettyRemotingServer = new NettyRemotingServer(serverConfig);
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_REQUEST, new TaskExecuteProcessor());
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_KILL_REQUEST, new TaskKillProcessor());
    this.nettyRemotingServer.registerProcessor(CommandType.DB_TASK_ACK, new DBTaskAckProcessor());
    this.nettyRemotingServer.registerProcessor(CommandType.DB_TASK_RESPONSE, new DBTaskResponseProcessor());
    this.nettyRemotingServer.start();

    // worker registry
    try {
        this.workerRegistry.registry();
        this.workerRegistry.getZookeeperRegistryCenter().setStoppable(this);
        Set<String> workerZkPaths = this.workerRegistry.getWorkerZkPaths();
        this.workerRegistry.getZookeeperRegistryCenter().getRegisterOperator().handleDeadServer(workerZkPaths, ZKNodeType.WORKER, Constants.DELETE_ZK_OP);
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
        throw new RuntimeException(e);
    }

    // retry report task status
    this.retryReportTaskStatusThread.start();

    /**
     * register hooks, which are called before the process exits
     */
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        if (Stopper.isRunning()) {
            close("shutdownHook");
        }
    }));
}
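run() registers processors for four commands:
  1. TASK_EXECUTE_REQUEST: the Master's request to execute a task
  2. TASK_KILL_REQUEST: the Master's request to stop a task
  3. DB_TASK_ACK: the Master's confirmation that it has persisted the Worker's ACK
  4. DB_TASK_RESPONSE: the Master's confirmation that it has persisted the task result
It then:
  • registers the WorkerServer in ZooKeeper and sends heartbeats
  • starts the thread that retries reporting task execution status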

RetryReportTaskStatusThread

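This is a fallback mechanism. The thread periodically re-reports task status to the Master until the Master acknowledges it, so that no task status is lost.

Every 5 minutes it checks whether the ACK cache and the response cache in ResponceCache are empty. For every entry still present, it resends the ack command or the response command to the Master.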

public void run() {
    ResponceCache responceCache = ResponceCache.get();

    while (Stopper.isRunning()){

        // sleep 5 minutes
        ThreadUtils.sleep(RETRY_REPORT_TASK_STATUS_INTERVAL);

        try {
            if (!responceCache.getAckCache().isEmpty()){
                Map<Integer,Command> ackCache =  responceCache.getAckCache();
                for (Map.Entry<Integer, Command> entry : ackCache.entrySet()){
                    Integer taskInstanceId = entry.getKey();
                    Command ackCommand = entry.getValue();
                    taskCallbackService.sendAck(taskInstanceId,ackCommand);
                }
            }

            if (!responceCache.getResponseCache().isEmpty()){
                Map<Integer,Command> responseCache =  responceCache.getResponseCache();
                for (Map.Entry<Integer, Command> entry : responseCache.entrySet()){
                    Integer taskInstanceId = entry.getKey();
                    Command responseCommand = entry.getValue();
                    taskCallbackService.sendResult(taskInstanceId,responseCommand);
                }
            }
        }catch (Exception e){
            logger.warn("retry report task status error", e);
        }
    }
}
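
The cache-and-resend idea behind this thread can be sketched in isolation. This is a simplified illustration, not the real ResponceCache: the class name, the String payloads, and the `sender` callback are assumptions made for the example, and the 5-minute loop is reduced to a single `retryOnce` pass.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

final class RetryReportSketch {

    // taskInstanceId -> message the Master has not confirmed yet
    private final Map<Integer, String> pending = new ConcurrentHashMap<>();

    /** Remember a message before it is sent for the first time. */
    void cache(int taskInstanceId, String message) {
        pending.put(taskInstanceId, message);
    }

    /** Called when the Master confirms the message; it will no longer be retried. */
    void confirm(int taskInstanceId) {
        pending.remove(taskInstanceId);
    }

    /** One retry pass: resend everything still pending and return how many were resent. */
    int retryOnce(BiConsumer<Integer, String> sender) {
        int sent = 0;
        for (Map.Entry<Integer, String> entry : pending.entrySet()) {
            sender.accept(entry.getKey(), entry.getValue());
            sent++;
        }
        return sent;
    }
}
```

An entry leaves the map only on confirmation, so a lost message is resent on every pass until the Master answers. The receiver therefore has to tolerate duplicates.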

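Master and Worker interaction design

The Apache DolphinScheduler Master and Worker modules are two independent JVM processes that can be deployed on different servers. They communicate through RPC implemented on Netty, using seven processors in total.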

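| Module | Processor | Purpose |
|--------|-----------|---------|
| master | TaskResponseProcessor | Handles TaskExecuteResponseCommand messages and adds them to the task response queue of TaskResponseService |
| master | TaskAckProcessor | Handles TaskExecuteAckCommand messages and adds them to the task response queue of TaskResponseService |
| master | TaskKillResponseProcessor | Handles TaskKillResponseCommand messages and logs their content |
| worker | TaskExecuteProcessor | Handles TaskExecuteRequestCommand messages, sends a TaskExecuteAckCommand to the master, and submits the task for execution |
| worker | TaskKillProcessor | Handles TaskKillRequestCommand messages, kills the task's process with kill -9 pid, and sends a TaskKillResponseCommand to the master |
| worker | DBTaskAckProcessor | Handles DBTaskAckCommand messages; when the status is SUCCESS, removes the cached ACK from ResponceCache |
| worker | DBTaskResponseProcessor | Handles DBTaskResponseCommand messages; when the status is SUCCESS, removes the cached response from ResponceCache |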
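
The one-processor-per-command-type arrangement can be illustrated without Netty. The sketch below is an assumption-laden simplification: the class name, the String bodies, and the `Function`-based processor type are hypothetical stand-ins for the real Command and processor classes.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

final class ProcessorRegistrySketch {

    // Only the four worker-side command types are modelled here.
    enum CommandType { TASK_EXECUTE_REQUEST, TASK_KILL_REQUEST, DB_TASK_ACK, DB_TASK_RESPONSE }

    private final Map<CommandType, Function<String, String>> processors =
            new EnumMap<>(CommandType.class);

    /** Associates one processor with one command type; a later registration replaces an earlier one. */
    void registerProcessor(CommandType type, Function<String, String> processor) {
        processors.put(type, processor);
    }

    /** Routes an incoming message body to the processor registered for its type. */
    String dispatch(CommandType type, String body) {
        Function<String, String> processor = processors.get(type);
        if (processor == null) {
            return "unsupported: " + type;
        }
        return processor.apply(body);
    }
}
```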

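How task dispatch works

master#TaskPriorityQueueConsumer

The Master runs a TaskPriorityQueueConsumer that takes tasks from the TaskPriorityQueue in batches of 3 (the value of master.dispatch.task.num) and dispatches them to Workers for execution. This is where the TaskExecuteRequestCommand is created.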

TaskPriorityQueueConsumer#run()

@Override
public void run() {
    List<TaskPriority> failedDispatchTasks = new ArrayList<>();
    while (Stopper.isRunning()){
        try {
            // number of tasks dispatched per batch, master.dispatch.task.num = 3
            int fetchTaskNum = masterConfig.getMasterDispatchTaskNumber();
            failedDispatchTasks.clear();
            for(int i = 0; i < fetchTaskNum; i++){
                if(taskPriorityQueue.size() <= 0){
                    Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                    continue;
                }
                // blocks here if the queue has no task
                // take a task from the queue
                TaskPriority taskPriority = taskPriorityQueue.take();
                // dispatch it to a worker for execution
                boolean dispatchResult = dispatch(taskPriority);
                if(!dispatchResult){
                    failedDispatchTasks.add(taskPriority);
                }
            }
            if (!failedDispatchTasks.isEmpty()) {
                // tasks that failed to dispatch are put back into the queue to wait for the next attempt
                for (TaskPriority dispatchFailedTask : failedDispatchTasks) {
                    taskPriorityQueue.put(dispatchFailedTask);
                }
                // If there are tasks in a cycle that cannot find the worker group,
                // sleep for 1 second
                if (taskPriorityQueue.size() <= failedDispatchTasks.size()) {
                    TimeUnit.MILLISECONDS.sleep(Constants.SLEEP_TIME_MILLIS);
                }
            }
        }catch (Exception e){
            logger.error("dispatcher task error",e);
        }
    }
}
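
The batch-and-requeue pattern in run() can be reduced to a few lines. This is a minimal sketch, not the consumer itself: tasks are plain integers, the `dispatcher` predicate is a hypothetical stand-in for dispatch(), and the non-blocking `poll()` replaces `take()` so that one batch can be exercised in isolation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.function.Predicate;

final class BatchDispatchSketch {

    /**
     * Takes up to batchSize tasks in priority order, tries to dispatch each one,
     * and puts the failures back into the queue. Returns the number of failures.
     */
    static int dispatchBatch(PriorityBlockingQueue<Integer> queue, int batchSize,
                             Predicate<Integer> dispatcher) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            Integer task = queue.poll();   // returns null instead of blocking when empty
            if (task == null) {
                break;
            }
            if (!dispatcher.test(task)) {
                failed.add(task);
            }
        }
        queue.addAll(failed);              // failed tasks wait for the next batch
        return failed.size();
    }
}
```

Collecting the failures first and re-adding them after the loop keeps a failed high-priority task from being picked again within the same batch.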

TaskPriorityQueueConsumer#dispatch()

/**
 * dispatch task
 *
 * @param taskPriority taskPriority
 * @return result
 */
protected boolean dispatch(TaskPriority taskPriority) {
    boolean result = false;
    try {
        int taskInstanceId = taskPriority.getTaskId();
        TaskExecutionContext context = getTaskExecutionContext(taskInstanceId);
        // the TaskExecuteRequestCommand is created here, via context.toCommand()
        ExecutionContext executionContext = new ExecutionContext(context.toCommand(), ExecutorType.WORKER, context.getWorkerGroup());

        if (taskInstanceIsFinalState(taskInstanceId)){
            // when task finish, ignore this task, there is no need to dispatch anymore
            return true;
        }else{
            // dispatch the task
            // supported selection algorithms: lower weight (low load first), random, round-robin
            result = dispatcher.dispatch(executionContext);
        }
    } catch (ExecuteException e) {
        logger.error("dispatch error: {}",e.getMessage());
    }
    return result;
}

TaskExecutionContext

// excerpt from org.apache.dolphinscheduler.server.entity.TaskExecutionContext#toCommand
public Command toCommand(){
    TaskExecuteRequestCommand requestCommand = new TaskExecuteRequestCommand();
    requestCommand.setTaskExecutionContext(FastJsonSerializer.serializeToString(this));
    return requestCommand.convert2Command();
}

Dispatch algorithm implementations

Random selection

public class RandomSelector<T> implements Selector<T> {

    private final Random random = new Random();
    public T select(final Collection<T> source) {

        if (source == null || source.size() == 0) {
            throw new IllegalArgumentException("Empty source.");
        }

        if (source.size() == 1) {
            return (T) source.toArray()[0];
        }

        int size = source.size();
        int randomIndex = random.nextInt(size);

        return (T) source.toArray()[randomIndex];
    }

}

Round-robin selection

public class RoundRobinSelector<T> implements Selector<T> {

    private final AtomicInteger index = new AtomicInteger(0);

    public T select(Collection<T> source) {
        if (source == null || source.size() == 0) {
            throw new IllegalArgumentException("Empty source.");
        }
        if (source.size() == 1) {
            return (T)source.toArray()[0];
        }

        int size = source.size();
        return (T) source.toArray()[index.getAndIncrement() % size];
    }
}
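
One edge case in the round-robin excerpt is worth noting: `index.getAndIncrement() % size` becomes negative once the counter overflows past Integer.MAX_VALUE, because Java's `%` keeps the sign of the dividend. The following is a hedged sketch of an overflow-safe variant, not a patch from the project; the class name and the `start` parameter (used only to reach the overflow quickly) are assumptions.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class SafeRoundRobinSketch<T> {

    private final AtomicInteger index;

    /** start lets a caller begin near Integer.MAX_VALUE to observe the wrap-around. */
    SafeRoundRobinSketch(int start) {
        this.index = new AtomicInteger(start);
    }

    T select(List<T> source) {
        if (source == null || source.isEmpty()) {
            throw new IllegalArgumentException("Empty source.");
        }
        // Math.floorMod never returns a negative value for a positive divisor,
        // so the index stays valid after the counter wraps to Integer.MIN_VALUE.
        return source.get(Math.floorMod(index.getAndIncrement(), source.size()));
    }
}
```

In practice the overflow needs about two billion selections, so this matters mainly for long-running processes.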

Lower-weight (low load first) selection

public class LowerWeightRoundRobin implements Selector<HostWeight>{
    public HostWeight select(Collection<HostWeight> sources){
        int totalWeight = 0;
        int lowWeight = 0;
        HostWeight lowerNode = null;
        for (HostWeight hostWeight : sources) {
            totalWeight += hostWeight.getWeight();
            hostWeight.setCurrentWeight(hostWeight.getCurrentWeight() + hostWeight.getWeight());
            if (lowerNode == null || lowWeight > hostWeight.getCurrentWeight() ) {
                lowerNode = hostWeight;
                lowWeight = hostWeight.getCurrentWeight();
            }
        }
        lowerNode.setCurrentWeight(lowerNode.getCurrentWeight() + totalWeight);
        return lowerNode;

    }
}
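
To see how the current weights evolve, the same loop can be run on two hypothetical hosts. This is an illustrative sketch under stated assumptions: `Node` is a simplified stand-in for HostWeight, the static weights 1 and 3 are made up, and both current weights are assumed to start at 0, which may differ from how the real HostWeight is initialized.

```java
import java.util.List;

final class LowerWeightSketch {

    /** Simplified stand-in for HostWeight: a fixed weight plus a running current weight. */
    static final class Node {
        final String name;
        final int weight;
        int currentWeight;   // starts at 0 in this sketch

        Node(String name, int weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    /** Same steps as the excerpt: grow every current weight, pick the lowest, then penalize it. */
    static Node select(List<Node> nodes) {
        int totalWeight = 0;
        Node lowest = null;
        for (Node node : nodes) {
            totalWeight += node.weight;
            node.currentWeight += node.weight;
            if (lowest == null || node.currentWeight < lowest.currentWeight) {
                lowest = node;
            }
        }
        lowest.currentWeight += totalWeight;   // pushes the chosen node back in the ordering
        return lowest;
    }
}
```

With A (weight 1) and B (weight 3), four consecutive calls pick A, A, B, A: the lighter host is chosen more often, but the heavier one is not starved. Like the excerpt, the sketch would fail with a NullPointerException on an empty list.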

TaskExecuteRequestCommand

TaskExecuteProcessor
Constructor
public TaskExecuteProcessor() {
    this.taskCallbackService = SpringApplicationContext.getBean(TaskCallbackService.class);
    this.workerConfig = SpringApplicationContext.getBean(WorkerConfig.class);
    // worker.exec.threads, default 100
    this.workerExecService = ThreadUtils.newDaemonFixedThreadExecutor("Worker-Execute-Thread", workerConfig.getWorkerExecThreads());
    this.taskExecutionContextCacheManager = SpringApplicationContext.getBean(TaskExecutionContextCacheManagerImpl.class);
}
process() method
public void process(Channel channel, Command command) {
    Preconditions.checkArgument(CommandType.TASK_EXECUTE_REQUEST == command.getType(),
                                String.format("invalid command type : %s", command.getType()));

    // deserialize the TaskExecuteRequestCommand
    TaskExecuteRequestCommand taskRequestCommand = FastJsonSerializer.deserialize(
        command.getBody(), TaskExecuteRequestCommand.class);

    logger.info("received command : {}", taskRequestCommand);

    if (taskRequestCommand == null) {
        logger.error("task execute request command is null");
        return;
    }

    String contextJson = taskRequestCommand.getTaskExecutionContext();
    TaskExecutionContext taskExecutionContext = JSONObject.parseObject(contextJson, TaskExecutionContext.class);

    if (taskExecutionContext == null) {
        logger.error("task execution context is null");
        return;
    }
    // store the context in taskExecutionContextCacheManager
    setTaskCache(taskExecutionContext);
    // create the task logger
    Logger taskLogger = LoggerFactory.getLogger(LoggerUtils.buildTaskId(LoggerUtils.TASK_LOGGER_INFO_PREFIX,
                                                                        taskExecutionContext.getProcessDefineId(),
                                                                        taskExecutionContext.getProcessInstanceId(),
                                                                        taskExecutionContext.getTaskInstanceId()));

    taskExecutionContext.setHost(NetUtils.getAddr(workerConfig.getListenPort()));
    taskExecutionContext.setStartTime(new Date());
    taskExecutionContext.setLogPath(getTaskLogPath(taskExecutionContext));

    // local execute path
    String execLocalPath = getExecLocalPath(taskExecutionContext);
    logger.info("task instance local execute path : {}", execLocalPath);
    taskExecutionContext.setExecutePath(execLocalPath);

    // keep the task logger in a ThreadLocal
    FileUtils.taskLoggerThreadLocal.set(taskLogger);
    try {
        // create the working directory (and the tenant user) if absent
        FileUtils.createWorkDirAndUserIfAbsent(execLocalPath, taskExecutionContext.getTenantCode());
    } catch (Throwable ex) {
        String errorLog = String.format("create execLocalPath : %s", execLocalPath);
        LoggerUtils.logError(Optional.ofNullable(logger), errorLog, ex);
        LoggerUtils.logError(Optional.ofNullable(taskLogger), errorLog, ex);
        taskExecutionContextCacheManager.removeByTaskInstanceId(taskExecutionContext.getTaskInstanceId());
    }
    FileUtils.taskLoggerThreadLocal.remove();

    taskCallbackService.addRemoteChannel(taskExecutionContext.getTaskInstanceId(),
                                         new NettyRemoteChannel(channel, command.getOpaque()));

    // send a TaskExecuteAckCommand to the master
    this.doAck(taskExecutionContext);

    // submit task
    workerExecService.submit(new TaskExecuteThread(taskExecutionContext, taskCallbackService, taskLogger));
}

private void doAck(TaskExecutionContext taskExecutionContext){
    // tell master that task is in executing
    TaskExecuteAckCommand ackCommand = buildAckCommand(taskExecutionContext);
    ResponceCache.get().cache(taskExecutionContext.getTaskInstanceId(),ackCommand.convert2Command(),Event.ACK);
    taskCallbackService.sendAck(taskExecutionContext.getTaskInstanceId(), ackCommand.convert2Command());
}

TaskExecuteThread

Constructor
public TaskExecuteThread(TaskExecutionContext taskExecutionContext
                         , TaskCallbackService taskCallbackService
                         , Logger taskLogger) {
    this.taskExecutionContext = taskExecutionContext;
    this.taskCallbackService = taskCallbackService;
    this.taskExecutionContextCacheManager = SpringApplicationContext.getBean(TaskExecutionContextCacheManagerImpl.class);
    this.taskLogger = taskLogger;
}
run() method
public void run() {

    TaskExecuteResponseCommand responseCommand = new TaskExecuteResponseCommand(taskExecutionContext.getTaskInstanceId());
    try {
        logger.info("script path : {}", taskExecutionContext.getExecutePath());
        // task node
        TaskNode taskNode = JSONObject.parseObject(taskExecutionContext.getTaskJson(), TaskNode.class);

        // copy hdfs/minio file to local
        // download the required resources, e.g. Spark/Flink jars and UDFs
        downloadResource(taskExecutionContext.getExecutePath(),
                         taskExecutionContext.getResources(),
                         logger);

        taskExecutionContext.setTaskParams(taskNode.getParams());
        taskExecutionContext.setEnvFile(CommonUtils.getSystemEnvPath());
        taskExecutionContext.setDefinedParams(getGlobalParamsMap());

        // set task timeout
        setTaskTimeout(taskExecutionContext, taskNode);

        taskExecutionContext.setTaskAppId(String.format("%s_%s_%s",
                                                        taskExecutionContext.getProcessDefineId(),
                                                        taskExecutionContext.getProcessInstanceId(),
                                                        taskExecutionContext.getTaskInstanceId()));

        // create the task
        task = TaskManager.newTask(taskExecutionContext, taskLogger);

        // initialize the task
        task.init();
        // build the parameters the task needs
        preBuildBusinessParams();
        // execute the task
        task.handle();

        // post-execution actions
        task.after();
        responseCommand.setStatus(task.getExitStatus().getCode());
        responseCommand.setEndTime(new Date());
        responseCommand.setProcessId(task.getProcessId());
        responseCommand.setAppIds(task.getAppIds());
        logger.info("task instance id : {},task final status : {}", taskExecutionContext.getTaskInstanceId(), task.getExitStatus());
    } catch (Exception e) {
        logger.error("task scheduler failure", e);
        // on exception, kill the task
        kill();
        responseCommand.setStatus(ExecutionStatus.FAILURE.getCode());
        responseCommand.setEndTime(new Date());
        responseCommand.setProcessId(task.getProcessId());
        responseCommand.setAppIds(task.getAppIds());
    } finally {
        // remove the task execution context from the cache
        taskExecutionContextCacheManager.removeByTaskInstanceId(taskExecutionContext.getTaskInstanceId());
        // cache the responseCommand
        ResponceCache.get().cache(taskExecutionContext.getTaskInstanceId(), responseCommand.convert2Command(), Event.RESULT);
        // send the ResponseCommand to the master
        taskCallbackService.sendResult(taskExecutionContext.getTaskInstanceId(), responseCommand.convert2Command());
        // clean up the task execution path
        clearTaskExecPath();
    }
}
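
The shape of run() is a try/catch/finally in which the result is reported on every path. A minimal sketch of that shape follows, with hypothetical names throughout: the `Status` enum, the `Callable`-based task, and the in-memory `reported` list stand in for ExecutionStatus, the real task, and the callback to the Master.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

final class TaskLifecycleSketch {

    enum Status { SUCCESS, FAILURE }

    // Stand-in for "send the result to the Master": one "id:status" entry per task.
    final List<String> reported = new ArrayList<>();

    void execute(int taskInstanceId, Callable<Status> task) {
        Status status = Status.FAILURE;
        try {
            status = task.call();          // run the task and take its exit status
        } catch (Exception e) {
            status = Status.FAILURE;       // any exception is reported as a failure
        } finally {
            // runs on both paths, so a result is always reported
            reported.add(taskInstanceId + ":" + status);
        }
    }
}
```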

master#TaskResponseService

While executing a dispatched task, the Worker sends an ACK command and a Response command to the Master.

On the Master side they are handled by TaskAckProcessor and TaskResponseProcessor.

TaskAckProcessor

public void process(Channel channel, Command command) {
    Preconditions.checkArgument(CommandType.TASK_EXECUTE_ACK == command.getType(), String.format("invalid command type : %s", command.getType()));
    TaskExecuteAckCommand taskAckCommand = FastJsonSerializer.deserialize(command.getBody(), TaskExecuteAckCommand.class);
    logger.info("taskAckCommand : {}", taskAckCommand);

    // cache the task instance
    taskInstanceCacheManager.cacheTaskInstance(taskAckCommand);

    String workerAddress = ChannelUtils.toAddress(channel).getAddress();

    ExecutionStatus ackStatus = ExecutionStatus.of(taskAckCommand.getStatus());

    // TaskResponseEvent
    TaskResponseEvent taskResponseEvent = TaskResponseEvent.newAck(ackStatus,
            taskAckCommand.getStartTime(),
            workerAddress,
            taskAckCommand.getExecutePath(),
            taskAckCommand.getLogPath(),
            taskAckCommand.getTaskInstanceId(),
            channel);

    // main processing logic
    taskResponseService.addResponse(taskResponseEvent);
}

TaskResponseProcessor

public void process(Channel channel, Command command) {
    Preconditions.checkArgument(CommandType.TASK_EXECUTE_RESPONSE == command.getType(), String.format("invalid command type : %s", command.getType()));

    TaskExecuteResponseCommand responseCommand = FastJsonSerializer.deserialize(command.getBody(), TaskExecuteResponseCommand.class);
    logger.info("received command : {}", responseCommand);

    // cache the task instance
    taskInstanceCacheManager.cacheTaskInstance(responseCommand);

    // TaskResponseEvent
    TaskResponseEvent taskResponseEvent = TaskResponseEvent.newResult(ExecutionStatus.of(responseCommand.getStatus()),
            responseCommand.getEndTime(),
            responseCommand.getProcessId(),
            responseCommand.getAppIds(),
            responseCommand.getTaskInstanceId(),
            channel);
    // main processing logic
    taskResponseService.addResponse(taskResponseEvent);
}

TaskResponseService

As TaskResponseProcessor and TaskAckProcessor show, the main logic lives in the TaskResponseService class, which processes events on its TaskResponseWorker thread.

// the TaskResponseEvent queue is a blocking queue, bounded to 5000 entries
private final BlockingQueue<TaskResponseEvent> eventQueue = new LinkedBlockingQueue<>(5000);


class TaskResponseWorker extends Thread {

        @Override
        public void run() {

            while (Stopper.isRunning()){
                try {
                    // blocks here while there is no task event
                    TaskResponseEvent taskResponseEvent = eventQueue.take();
                    // persist the task instance state to the database
                    persist(taskResponseEvent);
                } catch (InterruptedException e){
                    break;
                } catch (Exception e){
                    logger.error("persist task error",e);
                }
            }
            logger.info("TaskResponseWorker stopped");
        }
    }

    /**
     * persist  taskResponseEvent
     * @param taskResponseEvent taskResponseEvent
     */
    private void persist(TaskResponseEvent taskResponseEvent){
        Event event = taskResponseEvent.getEvent();
        Channel channel = taskResponseEvent.getChannel();

        switch (event){
            case ACK:
                try {
                    TaskInstance taskInstance = processService.findTaskInstanceById(taskResponseEvent.getTaskInstanceId());
                    if (taskInstance != null) {
                        ExecutionStatus status = taskInstance.getState().typeIsFinished() ? taskInstance.getState() : taskResponseEvent.getState();
                        processService.changeTaskState(status,
                            taskResponseEvent.getStartTime(),
                            taskResponseEvent.getWorkerAddress(),
                            taskResponseEvent.getExecutePath(),
                            taskResponseEvent.getLogPath(),
                            taskResponseEvent.getTaskInstanceId());
                    }
                    // send DB_TASK_ACK to the worker
                    DBTaskAckCommand taskAckCommand = new DBTaskAckCommand(ExecutionStatus.SUCCESS.getCode(), taskResponseEvent.getTaskInstanceId());
                    channel.writeAndFlush(taskAckCommand.convert2Command());
                }catch (Exception e){
                    logger.error("worker ack master error",e);
                    DBTaskAckCommand taskAckCommand = new DBTaskAckCommand(ExecutionStatus.FAILURE.getCode(),-1);
                    channel.writeAndFlush(taskAckCommand.convert2Command());
                }
                break;
            case RESULT:
                try {
                    TaskInstance taskInstance = processService.findTaskInstanceById(taskResponseEvent.getTaskInstanceId());
                    if (taskInstance != null){
                        processService.changeTaskState(taskResponseEvent.getState(),
                                taskResponseEvent.getEndTime(),
                                taskResponseEvent.getProcessId(),
                                taskResponseEvent.getAppIds(),
                                taskResponseEvent.getTaskInstanceId());
                    }
                    // send DB_TASK_RESPONSE to the worker
                    DBTaskResponseCommand taskResponseCommand = new DBTaskResponseCommand(ExecutionStatus.SUCCESS.getCode(),taskResponseEvent.getTaskInstanceId());
                    channel.writeAndFlush(taskResponseCommand.convert2Command());
                }catch (Exception e){
                    logger.error("worker response master error",e);
                    DBTaskResponseCommand taskResponseCommand = new DBTaskResponseCommand(ExecutionStatus.FAILURE.getCode(),-1);
                    channel.writeAndFlush(taskResponseCommand.convert2Command());
                }
                break;
            default:
                throw new IllegalArgumentException("invalid event type : " + event);
        }
    }
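
The queue-plus-worker-thread structure can be tried out in a self-contained form. The sketch below rests on several assumptions: events are plain strings, `persisted` stands in for the database update, `addResponse` uses the non-blocking `offer`, and shutdown uses a hypothetical stop marker so the example finishes deterministically. None of these details are claimed to match TaskResponseService.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

final class ResponseQueueSketch {

    private static final String STOP = "__stop__";   // hypothetical stop marker
    private final BlockingQueue<String> eventQueue = new LinkedBlockingQueue<>(5000);
    final List<String> persisted = new CopyOnWriteArrayList<>();
    private final Thread worker = new Thread(this::drain, "response-worker-sketch");

    void start() {
        worker.start();
    }

    /** Non-blocking enqueue; returns false when the bounded queue is full. */
    boolean addResponse(String event) {
        return eventQueue.offer(event);
    }

    /** Enqueues the stop marker and waits until the worker has handled everything before it. */
    void stop() {
        try {
            eventQueue.put(STOP);
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void drain() {
        try {
            while (true) {
                String event = eventQueue.take();   // blocks while the queue is empty
                if (STOP.equals(event)) {
                    return;
                }
                persisted.add(event);               // stand-in for the database update
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A single consumer thread means events are persisted in arrival order, so an ACK queued before a result is also written before it.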

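Worker#DBTaskAckProcessor and DBTaskResponseProcessor

When the Worker receives the db_task_ack command and the db_task_response command from the Master, they are handled by DBTaskAckProcessor and DBTaskResponseProcessor respectively. Both simply remove the corresponding task instance command from ResponceCache.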

DBTaskAckProcessor

public void process(Channel channel, Command command) {
    Preconditions.checkArgument(CommandType.DB_TASK_ACK == command.getType(),
            String.format("invalid command type : %s", command.getType()));

    DBTaskAckCommand taskAckCommand = FastJsonSerializer.deserialize(
            command.getBody(), DBTaskAckCommand.class);

    if (taskAckCommand == null){
        return;
    }

    if (taskAckCommand.getStatus() == ExecutionStatus.SUCCESS.getCode()){
        ResponceCache.get().removeAckCache(taskAckCommand.getTaskInstanceId());
    }
}

DBTaskResponseProcessor

public void process(Channel channel, Command command) {
    Preconditions.checkArgument(CommandType.DB_TASK_RESPONSE == command.getType(),
                                String.format("invalid command type : %s", command.getType()));

    DBTaskResponseCommand taskResponseCommand = FastJsonSerializer.deserialize(
        command.getBody(), DBTaskResponseCommand.class);

    if (taskResponseCommand == null){
        return;
    }

    if (taskResponseCommand.getStatus() == ExecutionStatus.SUCCESS.getCode()){
        ResponceCache.get().removeResponseCache(taskResponseCommand.getTaskInstanceId());
    }
}

How stopping a task works

MasterTaskExecThread#waitTaskQuit

public Boolean waitTaskQuit(){
    // query new state
    taskInstance = processService.findTaskInstanceById(taskInstance.getId());

    while (Stopper.isRunning()){
        try {
            // code omitted...

            // task instance add queue , waiting worker to kill
            // if the master has received a cancel request, or the process instance is in the READY_STOP state,
            // the master sends a kill request command to the worker
            if(this.cancel || this.processInstance.getState() == ExecutionStatus.READY_STOP){
                cancelTaskInstance();
            }

            // code omitted...
        } catch (Exception e) {
            // code omitted...
        }
    }
    return true;
}

private void cancelTaskInstance() throws Exception{
    if(alreadyKilled){
        return;
    }
    alreadyKilled = true;
    taskInstance = processService.findTaskInstanceById(taskInstance.getId());
    if(StringUtils.isBlank(taskInstance.getHost())){
        taskInstance.setState(ExecutionStatus.KILL);
        taskInstance.setEndTime(new Date());
        processService.updateTaskInstance(taskInstance);
        return;
    }

    // build the TaskKillRequestCommand
    TaskKillRequestCommand killCommand = new TaskKillRequestCommand();
    killCommand.setTaskInstanceId(taskInstance.getId());

    ExecutionContext executionContext = new ExecutionContext(killCommand.convert2Command(), ExecutorType.WORKER);

    Host host = Host.of(taskInstance.getHost());
    executionContext.setHost(host);

    nettyExecutorManager.executeDirectly(executionContext);

    logger.info("master kill taskInstance name :{} taskInstance id:{}",
            taskInstance.getName(), taskInstance.getId() );
}

Worker#TaskKillProcessor

TaskKillProcessor handles the kill request command sent by the Master.

public void process(Channel channel, Command command) {
    Preconditions.checkArgument(CommandType.TASK_KILL_REQUEST == command.getType(), String.format("invalid command type : %s", command.getType()));
    TaskKillRequestCommand killCommand = FastJsonSerializer.deserialize(command.getBody(), TaskKillRequestCommand.class);
    logger.info("received kill command : {}", killCommand);

    Pair<Boolean, List<String>> result = doKill(killCommand);

    taskCallbackService.addRemoteChannel(killCommand.getTaskInstanceId(),
            new NettyRemoteChannel(channel, command.getOpaque()));

    // send the kill response command to the master
    TaskKillResponseCommand taskKillResponseCommand = buildKillTaskResponseCommand(killCommand,result);
    taskCallbackService.sendResult(taskKillResponseCommand.getTaskInstanceId(), taskKillResponseCommand.convert2Command());
    taskExecutionContextCacheManager.removeByTaskInstanceId(taskKillResponseCommand.getTaskInstanceId());
}


private Pair<Boolean, List<String>> doKill(TaskKillRequestCommand killCommand){
    boolean processFlag = true;
    List<String> appIds = Collections.emptyList();
    int taskInstanceId = killCommand.getTaskInstanceId();
    TaskExecutionContext taskExecutionContext = taskExecutionContextCacheManager.getByTaskInstanceId(taskInstanceId);
    try {
        Integer processId = taskExecutionContext.getProcessId();

        if (processId.equals(0)) {
            taskExecutionContextCacheManager.removeByTaskInstanceId(taskInstanceId);
            logger.info("the task has not been executed and has been cancelled, task id:{}", taskInstanceId);
            return Pair.of(true, appIds);
        }

        // run kill -9 to terminate the process directly
        // Spark or Flink jobs submitted to a cluster cannot be killed this way for now
        String pidsStr = ProcessUtils.getPidsStr(taskExecutionContext.getProcessId());
        if (StringUtils.isNotEmpty(pidsStr)) {
            String cmd = String.format("sudo kill -9 %s", ProcessUtils.getPidsStr(taskExecutionContext.getProcessId()));
            logger.info("process id:{}, cmd:{}", taskExecutionContext.getProcessId(), cmd);
            OSUtils.exeCmd(cmd);
        }

    } catch (Exception e) {
        processFlag = false;
        logger.error("kill task error", e);
    }
    // find log and kill yarn job
    Pair<Boolean, List<String>> yarnResult = killYarnJob(Host.of(taskExecutionContext.getHost()).getIp(),
            taskExecutionContext.getLogPath(),
            taskExecutionContext.getExecutePath(),
            taskExecutionContext.getTenantCode());
    return Pair.of(processFlag && yarnResult.getLeft(), yarnResult.getRight());
}
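
The command string built in doKill can be isolated into a small helper. This is a sketch under assumptions: the helper name is hypothetical, and it takes a list of process ids joined by spaces, whereas the excerpt obtains an already formatted string from ProcessUtils.getPidsStr. Nothing is executed here; the sketch only builds the string.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class KillCommandSketch {

    /** Builds the kill command for the given process ids, or empty when there is nothing to kill. */
    static Optional<String> buildKillCommand(List<Integer> pids) {
        if (pids == null || pids.isEmpty()) {
            return Optional.empty();
        }
        String joined = pids.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" "));
        return Optional.of(String.format("sudo kill -9 %s", joined));
    }
}
```

Returning an empty Optional mirrors the `StringUtils.isNotEmpty(pidsStr)` guard in the excerpt: no process ids, no command.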

master#TaskKillResponseProcessor

TaskKillResponseProcessor is used by the Master to handle the Worker's response to a task kill request.

public void process(Channel channel, Command command) {
    Preconditions.checkArgument(CommandType.TASK_KILL_RESPONSE == command.getType(), String.format("invalid command type : %s", command.getType()));

    TaskKillResponseCommand responseCommand = FastJsonSerializer.deserialize(command.getBody(), TaskKillResponseCommand.class);
    logger.info("received task kill response command : {}", responseCommand);
}

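In this analysis of the Apache DolphinScheduler 1.3.9 source code, we have walked through how the Master and Worker interact: task dispatch, status reporting with ACK and retry, and task kill.

If you are interested in the DolphinScheduler source code, you can dig further into the details of its task scheduling strategies, or build on it for your own business scenarios to make full use of its scheduling capabilities.

That's all for this article!

This article was published with support from 白鲸开源科技!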
