[源码解析]Oozie来龙去脉之内部执行

0x00 摘要

Oozie由Cloudera公司贡献给Apache的基于工作流引擎的开源框架，是用于Hadoop平台的开源的工作流调度引擎，用来管理Hadoop作业，进行。本文是系列的第二篇，介绍Oozie的内部执行阶段。

前文[源码解析]Oozie的来龙去脉 --- (1)提交任务阶段已经为大家展示了用户提交一个Oozie Job之后做了什么，本文将沿着一个Workflow的执行流程为大家继续剖析Oozie接下来做什么。

大致如下：

在Oozie中准备Yarn Application Master
介绍新旧两版本的Yarn Application Master区别
介绍Hive on Yarn
Tez是如何乱入到这个流程中的
Java on Yarn会是如何执行
Yarn Job结束之后如何返回Oozie

0x01 Oozie阶段

1.1 ActionStartXCommand

我们假设Workflow在start之后，就进入到了一个Hive命令。

ActionStartXCommand的主要作用就是和Yarn交互，最后提交一个Yarn Application Master。

ActionStartXCommand是 WorkflowXCommand的子类。重点函数还是loadState和execute。

public class ActionStartXCommand extends ActionXCommand<org.apache.oozie.command.wf.ActionXCommand.ActionExecutorContext> {
    private String jobId = null;
    protected String actionId = null;
    protected WorkflowJobBean wfJob = null;
    protected WorkflowActionBean wfAction = null;
    private JPAService jpaService = null;
    private ActionExecutor executor = null;
    private List<UpdateEntry> updateList = new ArrayList<UpdateEntry>();
    private List<JsonBean> insertList = new ArrayList<JsonBean>();
    protected ActionExecutorContext context = null;  
}

loadState 的作用就是从数据库中获取 WorkflowJobBean 和 WorkflowActionBean 信息

protected void loadState() throws CommandException {
    try {
        jpaService = Services.get().get(JPAService.class);
        if (jpaService != null) {
            if (wfJob == null) {
                this.wfJob = WorkflowJobQueryExecutor.getInstance().get(WorkflowJobQuery.GET_WORKFLOW,jobId);
            }
            this.wfAction = WorkflowActionQueryExecutor.getInstance().get(WorkflowActionQuery.GET_ACTION,actionId);
        }
    }
}

execute函数如下。其主要业务就是executor.start(context,wfAction); 这里的executor是HiveActionExecutor。

@Override
protected ActionExecutorContext execute() throws CommandException {
    Configuration conf = wfJob.getWorkflowInstance().getConf();
    try {
        if(!caught) {
            // 这里是业务重点，就是启动任务
            executor.start(context,wfAction);
          
            if (wfAction.isExecutionComplete()) {
                if (!context.isExecuted()) {
                    failJob(context);
                } else {
                    wfAction.setPending();
                    if (!(executor instanceof ControlNodeActionExecutor)) {
                        queue(new ActionEndXCommand(wfAction.getId(),wfAction.getType()));
                    }
                    else {
                        execSynchronous = true;
                    }
                }
            }
            updateList.add(new UpdateEntry<WorkflowActionQuery>(WorkflowActionQuery.UPDATE_ACTION_START,wfAction));
        }
    }
    finally {
            BatchQueryExecutor.getInstance().executeBatchInsertUpdateDelete(insertList,updateList,null);
            ......
            if (execSynchronous) {
                // Changing to synchronous call from asynchronous queuing to prevent
                // undue delay from ::start:: to action due to queuing
                callActionEnd();
            }
        }
    }
    return null;
}

ActionExecutor.start是异步的，还需要检查Action执行状态来推进流程，oozie通过两种方式来检查任务是否完成。

回调：当一个任务和一个计算被启动后，会为任务提供一个回调url，该任务执行完成后，会执行回调来通知oozie
轮询：在任务执行回调失败的情况下，无论任何原因，都支持以轮询的方式进行查询。

oozie提供这两种方式来控制任务。后续我们会再提到。

1.2 HiveActionExecutor

上面代码中 executor.start(context,wfAction); 就是启动任务。

HiveActionExecutor继承 ScriptLanguageActionExecutor，ScriptLanguageActionExecutor继承 JavaActionExecutor，所以后续很多函数执行的是JavaActionExecutor中的函数。

public class HiveActionExecutor extends ScriptLanguageActionExecutor {}

ActionExecutor.start就是执行的JavaActionExecutor.start()。

其会检查文件系统，比如hdfs是不是支持，Action Dir是否ready，然后会submitLauncher。

public void start(Context context,WorkflowAction action) throws ActionExecutorException {
        FileSystem actionFs = context.getAppFileSystem();
        prepareActionDir(actionFs,context);
        submitLauncher(actionFs,context,action); // 这里是业务
        check(context,action);
}

submitLauncher主要功能是：

1)对于某些类型job，调用injectActionCallback配置回调Action
2)配置 action job
3)调用createLauncherConf配置LauncherAM,即Application Master
- 3.1)配置回调conf.set(LauncherAMCallbackNotifier.OOZIE_LAUNCHER_CALLBACK_URL,callback);
- 3.2)设置"launcher Main Class"。LauncherHelper.setupMainClass(launcherJobConf,getLauncherMain(launcherJobConf,actionXml));
4)调用HadoopAccessorService.createYarnClient来创建一个YarnClient
5)调用UserGroupInformation继续配置
6)调用yarnClient.createApplication创建一个YarnClientApplication
7)记录ApplicationId
8)调用createAppSubmissionContext建立Yarn App的执行环境
- 8.1)appContext.setApplicationType("Oozie Launcher");
- 8.2)设置容器信息 ContainerLaunchContext
- 8.3)vargs.add(LauncherAM.class.getCanonicalName()); 比如设置AM启动类
- 8.4)return appContext;
9)提交App，yarnClient.submitApplication(appContext); appContext就是前面return的。

具体代码如下：

public void submitLauncher(final FileSystem actionFs,final Context context,final WorkflowAction action)throws ActionExecutorException {
    YarnClient yarnClient = null;
    try {
        // action job configuration
        Configuration actionConf = loadHadoopDefaultResources(context,actionXml);
        setupActionConf(actionConf,actionXml,appPathRoot);
        addAppNameContext(context,action);
        setLibFilesArchives(context,appPathRoot,actionConf);
				// 配置回调Action
        injectActionCallback(context,actionConf);

        Configuration launcherConf = createLauncherConf(actionFs,action,actionConf);
        yarnClient = createYarnClient(context,launcherConf);
      
        //继续配置各种Credentials
        if (UserGroupInformation.isSecurityEnabled()) {
           ......
        }

        if (alreadyRunning && !isUserRetry) {
          ......
        }
        else {
            YarnClientApplication newApp = yarnClient.createApplication();
            ApplicationId appId = newApp.getNewApplicationResponse().getApplicationId();
            ApplicationSubmissionContext appContext =
                    createAppSubmissionContext(appId,launcherConf,actionConf,credentials,actionXml);
            // 这里正式与 Yarn 交互。
            yarnClient.submitApplication(appContext);

            launcherId = appId.toString();
            ApplicationReport appReport = yarnClient.getApplicationReport(appId);
            consoleUrl = appReport.getTrackingUrl();
        }

        String jobTracker = launcherConf.get(HADOOP_YARN_RM);
        context.setStartData(launcherId,jobTracker,consoleUrl);
    }
}

protected YarnClient createYarnClient(Context context,Configuration jobConf) throws HadoopAccessorException {
        String user = context.getWorkflow().getUser();
        return Services.get().get(HadoopAccessorService.class).createYarnClient(user,jobConf);
}

0x2 旧版本LauncherMapper

这里我们有必要提一下旧版本的实现：LauncherMapper。

网上关于Oozie的文章很多都是基于旧版本，所以基本都提到了 LauncherMapper，比如：

Oozie本质就是一个作业协调工具（底层原理是通过将xml语言转换成mapreduce程序来做，但只是在集中map端做处理，避免shuffle的过程）。

Oozie执行Action时，即ActionExecutor（最主要的子类是JavaActionExecutor，hive、spark等action都是这个类的子类），JavaActionExecutor首先会提交一个LauncherMapper（map任务）到yarn，其中会执行LauncherMain（具体的action是其子类，比如JavaMain、SparkMain等），spark任务会执行SparkMain，在SparkMain中会调用org.apache.spark.deploy.SparkSubmit来提交任务。其实诉我的map任务就是识别你是什么样的任务（hive,shell,spark等），并通过该任务来启动任务所需要的环境来提交任务。提供了提交任务的接口（如hive任务，启动hive客户端或beeline等）

从文档看，OOZIE-2918 Delete LauncherMapper and its test (asasvari via pbacsko) 这时候被移除了。

我们从旧版本代码中大致看看LauncherMapper的实现。

LauncherMapper继承了 import org.apache.hadoop.mapred.Mapper;，实现了 map 函数。其内部就是调用用户代码的主函数。

import org.apache.hadoop.mapred.Mapper;

public class LauncherMapper<K1,V1,K2,V2> implements Mapper<K1,V2>,Runnable {
   @Override
    public void map(K1 key,V1 value,OutputCollector<K2,V2> collector,Reporter reporter) throws IOException {
        SecurityManager initialSecurityManager = System.getSecurityManager();
        try {
            else {
                String mainClass = getJobConf().get(CONF_OOZIE_ACTION_MAIN_CLASS);

                    new LauncherSecurityManager();
                    setupHeartBeater(reporter);
                    setupMainConfiguration();
                    // Propagating the conf to use by child job.
                    propagateToHadoopConf();

                    executePrepare();
                    Class klass = getJobConf().getClass(CONF_OOZIE_ACTION_MAIN_CLASS,Object.class);
                    Method mainMethod = klass.getMethod("main",String[].class);
                    mainMethod.invoke(null,(Object) args);
             }
        }
    }
}

在LauncherMapperHelper中，会设置LauncherMapper为启动函数。

public static void setupLauncherInfo(JobConf launcherConf,String jobId,String actionId,Path actionDir,String recoveryId,Configuration actionConf,String prepareXML) throws IOException,HadoopAccessorException {
        launcherConf.setMapperClass(LauncherMapper.class);
}

在 JavaActionExecutor 中有 org.apache.hadoop.mapred.JobClient

import org.apache.hadoop.mapred.JobClient;

public void submitLauncher(FileSystem actionFs,Context context,WorkflowAction action) throws ActionExecutorException {
            jobClient = createJobClient(context,launcherJobConf);
            LauncherMapperHelper.setupLauncherInfo(launcherJobConf,jobId,actionId,actionDir,recoveryId,prepareXML);

            // Set the launcher Main Class
            LauncherMapperHelper.setupMainClass(launcherJobConf,actionXml)); 
            LauncherMapperHelper.setupMainArguments(launcherJobConf,args);
            ......
  
            runningJob = jobClient.submitJob(launcherJobConf);  // 这里进行了提交
}

综上所述，旧版本 LauncherMapper 实现了一个 import org.apache.hadoop.mapred.Mapper;，具体是org.apache.hadoop.mapred.JobClient 负责与hadoop交互。

0x3 新版本Yarn Application Master

新版本的Oozie是和Yarn深度绑定的，所以我们需要先介绍Yarn。

3. 1 YARN简介

YARN 是 Hadoop 2.0 中的资源管理系统，它的基本设计思想是将 MRv1 中的 JobTracker拆分成了两个独立的服务：一个全局的资源管理器 ResourceManager 和每个应用程序特有的ApplicationMaster。其中 ResourceManager 负责整个系统的资源管理和分配，而 ApplicationMaster负责单个应用程序的管理。

YARN 总体上仍然是 Master/Slave 结构，在整个资源管理框架中，ResourceManager 为Master，NodeManager 为 Slave，ResourceManager 负责对各个 NodeManager 上的资源进行统一管理和调度。

当用户提交一个应用程序时，需要提供一个用以跟踪和管理这个程序的ApplicationMaster，它负责向 ResourceManager 申请资源，并要求 NodeManager 启动可以占用一定资源的任务。由于不同的ApplicationMaster 被分布到不同的节点上，因此它们之间不会相互影响。

3.2 ApplicationMaster

用户提交的每个应用程序均包含一个 AM，主要功能包括：

与 RM 调度器协商以获取资源（用 Container 表示）；
将得到的任务进一步分配给内部的任务；
与 NM 通信以启动 / 停止任务；
监控所有任务运行状态，并在任务运行失败时重新为任务申请资源以重启任务。

当用户向 YARN 中提交一个应用程序后， YARN 将分两个阶段运行该应用程序：

第一个阶段是启动 ApplicationMaster ；
第二个阶段是由 ApplicationMaster 创建应用程序，为它申请资源，并监控它的整个运行过程，直到运行完成。

工作流程分为以下几个步骤：

用户向 YARN 中提交应用程序，其中包括 ApplicationMaster 程序、启动ApplicationMaster 的命令、用户程序等。
ResourceManager 为该应用程序分配第一个 Container，并与对应的 NodeManager 通信，要求它在这个 Container 中启动应用程序的 ApplicationMaster。
ApplicationMaster 首先向 ResourceManager 注册，这样用户可以直接通过ResourceManage 查看应用程序的运行状态，然后它将为各个任务申请资源，并监控它的运行状态，直到运行结束，即重复步骤 4~7。
ApplicationMaster 采用轮询的方式通过 RPC 协议向 ResourceManager 申请和领取资源。
一旦 ApplicationMaster 申请到资源后，便与对应的 NodeManager 通信，要求它启动任务。
NodeManager 为任务设置好运行环境（包括环境变量、 JAR 包、二进制程序等）后，将任务启动命令写到一个脚本中，并通过运行该脚本启动任务。
各个任务通过某个 RPC 协议向 ApplicationMaster 汇报自己的状态和进度，以让 ApplicationMaster 随时掌握各个任务的运行状态，从而可以在任务失败时重新启动任务。在应用程序运行过程中，用户可随时通过RPC向ApplicationMaster查询应用程序的当前运行状态。
应用程序运行完成后，ApplicationMaster 向 ResourceManager 注销并关闭自己。

3.3 LauncherAM

LauncherAM就是Oozie的ApplicationMaster实现。LauncherAM.main就是Yarn调用之处。

public class LauncherAM {
  
    public static void main(String[] args) throws Exception {
        final LocalFsOperations localFsOperations = new LocalFsOperations();
        final Configuration launcherConf = readLauncherConfiguration(localFsOperations);
        UserGroupInformation.setConfiguration(launcherConf);
        // MRAppMaster adds this call as well,but it's included only in Hadoop 2.9+
        // SecurityUtil.setConfiguration(launcherConf);
        UserGroupInformation ugi = getUserGroupInformation(launcherConf);
        // Executing code inside a doAs with an ugi equipped with correct tokens.
        ugi.doAs(new PrivilegedExceptionAction<Object>() {
            @Override
            public Object run() throws Exception {
                  LauncherAM launcher = new LauncherAM(new AMRMClientAsyncFactory(),new AMRMCallBackHandler(),new HdfsOperations(new SequenceFileWriterFactory()),new LocalFsOperations(),new PrepareActionsHandler(new LauncherURIHandlerFactory(null)),new LauncherAMCallbackNotifierFactory(),new LauncherSecurityManager(),sysenv.getenv(ApplicationConstants.Environment.CONTAINER_ID.name()),launcherConf);
                    launcher.run();
                    return null;
            }
        });
    }  
}

launcher.run主要完成

通过registerWithRM调用AMRMClientAsync来注册到Resource Manager

executePrepare / setupMainConfiguration 完成初始化，准备和配置
runActionMain会根据配置调用具体的main函数，比如HiveMain
- Class<?> klass = launcherConf.getClass(CONF_OOZIE_ACTION_MAIN_CLASS,null);
- Method mainMethod = klass.getMethod("main",String[].class);
- mainMethod.invoke(null,(Object) mainArgs);
调用uploadActionDataToHDFS同步HDFS
调用unregisterWithRM从RM解绑
调用LauncherAMCallbackNotifier.notifyURL通知Oozie

具体代码如下：

public void run() throws Exception {
    try {
        actionDir = new Path(launcherConf.get(OOZIE_ACTION_DIR_PATH));
        registerWithRM(amrmCallBackHandler);
        // Run user code without the AM_RM_TOKEN so users can't request containers
        UserGroupInformation ugi = getUserGroupInformation(launcherConf,AMRMTokenIdentifier.KIND_NAME);

        ugi.doAs(new PrivilegedExceptionAction<Object>() {
            @Override
            public Object run() throws Exception {
                executePrepare(errorHolder);
                setupMainConfiguration();
                runActionMain(errorHolder); // 会根据配置调用具体的main函数，比如HiveMain
                return null;
            }
        });
    } 
    finally {
        try {
            actionData.put(ACTION_DATA_FINAL_STATUS,actionResult.toString());
            hdfsOperations.uploadActionDataToHDFS(launcherConf,actionData);
        } finally {
            try {
                unregisterWithRM(actionResult,errorHolder.getErrorMessage());
            } finally {
                LauncherAMCallbackNotifier cn = callbackNotifierFactory.createCallbackNotifier(launcherConf);
                cn.notifyURL(actionResult);
            }
        }
    }
}

但是你会发现，对比之前所说的ApplicationMaster应该实现的功能，LauncherAM 做得恁少了点，这是个疑问！ 我们在后续研究中会为大家揭开这个秘密。

0x4 Hive on Yarn

上文提到，runActionMain会根据配置调用具体的main函数。我们假设是hive action，则对应的是HiveMain。

Hive job的入口函数是在HIVE_MAIN_CLASS_NAME配置的。

public class HiveActionExecutor extends ScriptLanguageActionExecutor {
    private static final String HIVE_MAIN_CLASS_NAME = "org.apache.oozie.action.hadoop.HiveMain";

	  @Override
    public List<Class<?>> getLauncherClasses() {
        List<Class<?>> classes = new ArrayList<Class<?>>();
        classes.add(Class.forName(HIVE_MAIN_CLASS_NAME)); // 这里配置了 HiveMain
        return classes;
    }  
}

HiveMain后续调用如下

HiveMain.main ----> run ----> runHive ----> CliDriver.main(args);

最后调用 org.apache.hadoop.hive.cli.CliDriver 完成了hive操作，大致有：

设定参数；
如果有脚本，则设定脚本路径；
如果有之前的yarn child jobs，杀掉；
执行hive；
写log；

具体如下：

public class HiveMain extends LauncherMain {
    public static void main(String[] args) throws Exception {
        run(HiveMain.class,args);
    }
  
   @Override
    protected void run(String[] args) throws Exception {
        Configuration hiveConf = setUpHiveSite();
        List<String> arguments = new ArrayList<String>();

        String logFile = setUpHiveLog4J(hiveConf);
        arguments.add("--hiveconf");
        arguments.add("hive.log4j.file=" + new File(HIVE_L4J_PROPS).getAbsolutePath());
        arguments.add("--hiveconf");
        arguments.add("hive.exec.log4j.file=" + new File(HIVE_EXEC_L4J_PROPS).getAbsolutePath());

        //setting oozie workflow id as caller context id for hive
        String callerId = "oozie:" + System.getProperty(LauncherAM.OOZIE_JOB_ID);
        arguments.add("--hiveconf");
        arguments.add("hive.log.trace.id=" + callerId);

        String scriptPath = hiveConf.get(HiveActionExecutor.HIVE_SCRIPT);
        String query = hiveConf.get(HiveActionExecutor.HIVE_QUERY);
        if (scriptPath != null) {
            ......
            // print out current directory & its contents
            File localDir = new File("dummy").getAbsoluteFile().getParentFile();
            String[] files = localDir.list();

            // Prepare the Hive Script
            String script = readStringFromFile(scriptPath);
            arguments.add("-f");
            arguments.add(scriptPath);
        } else if (query != null) {
            String filename = createScriptFile(query);
            arguments.add("-f");
            arguments.add(filename);
        } 

        // Pass any parameters to Hive via arguments
        ......
        String[] hiveArgs = ActionUtils.getStrings(hiveConf,HiveActionExecutor.HIVE_ARGS);
        for (String hiveArg : hiveArgs) {
            arguments.add(hiveArg);
        }
        LauncherMain.killChildYarnJobs(hiveConf);

        try {
            runHive(arguments.toArray(new String[arguments.size()]));
        }
        finally {
            writeExternalChildIDs(logFile,HIVE_JOB_IDS_PATTERNS,"Hive");
        }
    }  
}

因此我们能看到，Oozie ApplicationMaster 在被Yarn调用之后，就是通过org.apache.hadoop.hive.cli.CliDriver 给Hive发送命令让其执行，没有什么再和ResourceManager / NodeManager 交互的过程，这真的很奇怪。这个秘密要由下面的Tez来解答。

0x5 Tez计算框架

Tez是Apache开源的支持DAG作业的计算框架，它直接源于MapReduce框架，核心思想是将Map和Reduce两个操作进一步拆分，即Map被拆分成Input、Processor、Sort、Merge和Output， Reduce被拆分成Input、Shuffle、Sort、Merge、Processor和Output等，这样，这些分解后的元操作可以任意灵活组合，产生新的操作，这些操作经过一些控制程序组装后，可形成一个大的DAG作业。

Tez有以下特点：

Apache二级开源项目
运行在YARN之上
适用于DAG（有向图）应用（同Impala、Dremel和Drill一样，可用于替换Hive/Pig等）

可以看到，Tez也是和Yarn深度绑定的。

5.1 DAGAppMaster

首先我们就找到了Tez对应的Application Master，即Tez DAG Application Master。

public class DAGAppMaster extends AbstractService {
  public String submitDAGToAppMaster(DAGPlan dagPlan,Map<String,LocalResource> additionalResources) throws TezException {
      startDAG(dagPlan,additionalResources);
    }
  }  
}

我们能看到提交Application Master代码。

public class TezYarnClient extends FrameworkClient {
  @Override
  public ApplicationId submitApplication(ApplicationSubmissionContext appSubmissionContext)
      throws YarnException,IOException,TezException {
   	ApplicationId appId= yarnClient.submitApplication(appSubmissionContext);
    ApplicationReport appReport = getApplicationReport(appId);
    return appId;
  }
}

这里是建立Application Master context 代码，设置了Application Maste类和Container。

  public static ApplicationSubmissionContext createApplicationSubmissionContext(
      ApplicationId appId,DAG dag,String amName,AMConfiguration amConfig,LocalResource> tezJarResources,Credentials sessionCreds,boolean tezLrsAsArchive,TezApiVersionInfo apiVersionInfo,ServicePluginsDescriptor servicePluginsDescriptor,JavaOptsChecker javaOptsChecker)
      throws IOException,YarnException {

    // Setup the command to run the AM
    List<String> vargs = new ArrayList<String>(8);
    vargs.add(Environment.JAVA_HOME.$() + "/bin/java");

    String amOpts = constructAMLaunchOpts(amConfig.getTezConfiguration(),capability);
    vargs.add(amOpts);

    // 这里设置了 Application Master
    vargs.add(TezConstants.TEZ_APPLICATION_MASTER_CLASS);

    // 这里设置了命令行参数 
    Vector<String> vargsFinal = new Vector<String>(8);
    // Final command
    StringBuilder mergedCommand = new StringBuilder();
    for (CharSequence str : vargs) {
      mergedCommand.append(str).append(" ");
    }
    vargsFinal.add(mergedCommand.toString());

    // 设置了container
    // Setup ContainerLaunchContext for AM container
    ContainerLaunchContext amContainer =
        ContainerLaunchContext.newInstance(amLocalResources,environment,vargsFinal,serviceData,securityTokens,acls);

    // Set up the ApplicationSubmissionContext
    ApplicationSubmissionContext appContext = Records
        .newRecord(ApplicationSubmissionContext.class);

    appContext.setAMContainerSpec(amContainer);

    return appContext;
}

5.2 与Resource Manager交互

这里只摘要部分代码，能看到Tez实现了与Yarn Resource Manager交互。

YarnTaskSchedulerService实现了AMRMClientAsync.CallbackHandler，其功能是处理由Resource Manager收到的消息，其实现了方法

import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class YarnTaskSchedulerService extends TaskScheduler
                             implements AMRMClientAsync.CallbackHandler {
  @Override
  public void onContainersAllocated(List<Container> containers) {
      if (!shouldReuseContainers) {
        List<Container> modifiableContainerList = Lists.newLinkedList(containers);
        assignedContainers = assignNewlyAllocatedContainers(
            modifiableContainerList);
      } 
    }
    // upcall to app must be outside locks
    informAppAboutAssignments(assignedContainers);
  }

  @Override
  public void onContainersCompleted(List<ContainerStatus> statuses) {
    synchronized (this) {
      for(ContainerStatus containerStatus : statuses) {
        ContainerId completedId = containerStatus.getContainerId();
        HeldContainer delayedContainer = heldContainers.get(completedId);

        Object task = releasedContainers.remove(completedId);
        appContainerStatus.put(task,containerStatus);
        continue;
       }

        // not found in released containers. check currently allocated containers
        // no need to release this container as the RM has already completed it
        task = unAssignContainer(completedId,false);
        if (delayedContainer != null) {
          heldContainers.remove(completedId);
          Resources.subtract(allocatedResources,delayedContainer.getContainer().getResource());
        } 
        if(task != null) {
          // completion of a container we have allocated currently
          // an allocated container completed. notify app. This will cause attempt to get killed
          appContainerStatus.put(task,containerStatus);
          continue;
        }
      }
    }

    // upcall to app must be outside locks
    for (Entry<Object,ContainerStatus> entry : appContainerStatus.entrySet()) {
      getContext().containerCompleted(entry.getKey(),entry.getValue());
    }
  }
}

onContainersAllocated ：当有新的Container 可以使用。这里时启动container 的代码。
onContainersCompleted 是Container 运行结束。在onContainersCompleted 中，如果是失败的Container，我们需要重新申请并启动Container，成功的将做记录既可以。

由此我们可以看到，Oozie是一个甩手掌柜，他只管启动Hive，具体后续如何与RM交互，则完全由Tez搞定。这就解答了之前我们所有疑惑。

最后总结下新流程：

Oozie提交LauncherAM到Yarn；
LauncherAM运行HiveMain，其调用CliDriver.main给Hive提交任务；
Hive on Tez，所以Tez准备DAGAppMaster；
Yarn与Tez交互：Tez提交DAGAppMaster到Yarn，Tez解析运行Hive命令；
Hive运行结束后，调用回调 url 通知Oozie；

原谅我用这种办法画图，因为我最讨厌看到一篇好文，结果发现图没了......

+---------+                       +----------+                       +-----------+
|         | 1-submit LauncherAM   |          | 2.CliDriver.main      |           |  
|         |---------------------->| HiveMain |---------------------> |           |
|         |                       |          |                       |           |--+
| [Oozie] |                       |  [Yarn]  |                       |   [Hive]  |  | 3.Run 
|         |                       |          |                       |           |  | Hive     
|         | 5-notifyURL of Oozie  |          | 4-submit DAGAppMaster |           |<-+
|         |<----------------------|          | <-------------------->|    Tez    |
|         |                       |          |                       |           |
+---------+                       +----------+                       +-----------+

0x6 Java on Yarn

下面我们看看如果Oozie执行一个Java程序，是如何进行的。

Java程序的主执行函数是 JavaMain，这个就简单多了，就是直接调用用户的Java主函数。

public class JavaMain extends LauncherMain {
    public static final String JAVA_MAIN_CLASS = "oozie.action.java.main";

   /**
    * @param args Invoked from LauncherAM:run()
    * @throws Exception in case of error when running the application
    */
    public static void main(String[] args) throws Exception {
        run(JavaMain.class,args);
    }

    @Override
    protected void run(String[] args) throws Exception {

        Configuration actionConf = loadActionConf();
        setYarnTag(actionConf);
        setApplicationTags(actionConf,TEZ_APPLICATION_TAGS);
        setApplicationTags(actionConf,SPARK_YARN_TAGS);

        LauncherMain.killChildYarnJobs(actionConf);

        Class<?> klass = actionConf.getClass(JAVA_MAIN_CLASS,Object.class);
        Method mainMethod = klass.getMethod("main",String[].class);
        mainMethod.invoke(null,(Object) args);
    }
}

0x7 Yarn job 执行结束

7.1 检查任务机制

前面提到，ActionExecutor.start是异步的，还需要检查Action执行状态来推进流程，oozie通过两种方式来检查任务是否完成。

回调：当一个任务和一个计算被启动后，会为任务提供一个回调url，该任务执行完成后，会执行回调来通知oozie
轮询：在任务执行回调失败的情况下，无论任何原因，都支持以轮询的方式进行查询。

oozie提供这两种方式来控制任务。

7.2 回调机制

LauncherAM 在用户程序执行完成之后，会做如下调用，以通知Oozie。这就用到了“回调”机制。

LauncherAMCallbackNotifier cn = callbackNotifierFactory.createCallbackNotifier(launcherConf);
                cn.notifyURL(actionResult);

Oozie的CallbackServlet会响应这个调用。可以看到，DagEngine.processCallback是Oozie处理程序结束之处。

public class CallbackServlet extends JsonRestServlet {
    @Override
    protected void doGet(HttpServletRequest request,HttpServletResponse response) throws ServletException,IOException {
        String queryString = request.getQueryString();
        CallbackService callbackService = Services.get().get(CallbackService.class);

        String actionId = callbackService.getActionId(queryString);

        DagEngine dagEngine = Services.get().get(DagEngineService.class).getSystemDagEngine();

        dagEngine.processCallback(actionId,callbackService.getExternalStatus(queryString),null);
        }
    }
}

DagEngine.processCallback主要是使用CompletedActionXCommand来进行。可以看到这个命令是放到 CallableQueueService 的 queue中，所以下面我们需要介绍 CallableQueueService。

 public void processCallback(String actionId,String externalStatus,Properties actionData)
          throws DagEngineException {
      XCallable<Void> command = new CompletedActionXCommand(actionId,externalStatus,actionData,HIGH_PRIORITY);
      if (!Services.get().get(CallableQueueService.class).queue(command)) {
          LOG.warn(XLog.OPS,"queue is full or system is in SAFEMODE,ignoring callback");
      }
}

7.3 异步执行

7.3.1 CallableQueueService

Oozie 使用 CallableQueueService 来异步执行操作；

public class CallableQueueService implements Service,Instrumentable {
    private final Map<String,AtomicInteger> activeCallables = new HashMap<String,AtomicInteger>();
    private final Map<String,Date> uniqueCallables = new ConcurrentHashMap<String,Date>();
    private final ConcurrentHashMap<String,Set<XCallable<?>>> interruptCommandsMap = new ConcurrentHashMap<>();
    private Set<String> interruptTypes;
    private int interruptMapMaxSize;
    private int maxCallableConcurrency;
    private int queueAwaitTerminationTimeoutSeconds;
    private int queueSize;
    private PriorityDelayQueue<CallableWrapper<?>> queue;
    private ThreadPoolExecutor executor;
    private Instrumentation instrumentation;
    private boolean newImpl = false;
    private AsyncXCommandExecutor asyncXCommandExecutor; 
  
    public void init(Services services) {
          queue = new PollablePriorityDelayQueue<CallableWrapper<?>>(PRIORITIES,MAX_CALLABLE_WAITTIME_MS,TimeUnit.MILLISECONDS,queueSize) {
                @Override
                protected boolean eligibleToPoll(QueueElement<?> element) {
                    if (element != null) {
                        CallableWrapper wrapper = (CallableWrapper) element;
                        if (element.getElement() != null) {
                            return callableReachMaxConcurrency(wrapper.getElement());
                        }
                    }
                    return false;
                }
            };  
    }
}

特点：

加入执行队列的任务可能是可以立即被吊起的，也可能是未来某个时间才触发的。
执行线程池根据任务的执行时间和任务的优先级别来选取任务吊起。
执行线程池的任务队列大小可配置，当到达队列最大值，线程池将不再接收任务。

7.3.3 PriorityDelayQueue

线程池选取的队列是oozie自定义的队列 PriorityDelayQueue：

特点：

根据队列中元素的延时时间以及其执行优先级出队列：

实现策略：

PriorityDelayQueue 中为每个优先级别的任务设置一个 延时队列 DelayQueue
因为使用的是jdk自带的延时队列 DelayQueue，可以保证的是如果任务在该队列中的延时时间满足条件，我们
通过poll()方法即可得到满足延时条件的任务，如果 poll()得到的是null，说明该队列的中任务没有满足时间条件的任务。

如何编排多个优先级的队列：
每次从PriorityDelayQueue去选取任务，都优先从最高优先级的队列来poll出任务，如果最高的优先级队列中没有满足条件的任务，则次优先级队列poll出任务，如果仍未获取
将按照队列优先等级以此类推。
饿死现象：假如高优先级中的任务在每次获取的时候都满足条件，这样容易将低优先级的队列中满足条件的任务活活饿死，为了防止这种情况的产生，在每次选取任务之前，遍历
低优先级队列任务，如果任务早已经满足出队列条件，如果超时时间超过了我们设定的最大值，我们会为这个任务提高优先级，将这个任务优先级加一，添加到上个优先级队列中进行
排队。

7.3.3 PollablePriorityDelayQueue

特点：

在从队列中选取任务的时候，先判断满足时间的任务是否满足并发等限制，如果满足再从队列中取出，而不是像PriorityDelayQueue那样，先取出如果不满足并发等限制，再将该任务重新放置回去。

任务类型：

使用线程池异步执行任务，任务和任务之间是无序的，针对具体的业务场景，可能执行的单元是需要串序执行的。oozie中封装了 CompositeCallable 和 一般的 XCallable的任务类型，前者是XCallable的一个集合，它能保证的是这个集合里面的XCallable是顺序执行的。

7.4 跳转下一个操作

CompletedActionXCommand 当Workflow command结束时候会执行，且只执行一次。对于程序结束，会在异步队列中加入一个 ActionCheckXCommand。

public class CompletedActionXCommand extends WorkflowXCommand<Void> {
    @Override
    protected Void execute() throws CommandException {
        if (this.wfactionBean.getStatus() == WorkflowActionBean.Status.PREP) {
           .....
        } else {    // RUNNING
            ActionExecutor executor = Services.get().get(ActionService.class).getExecutor(this.wfactionBean.getType());
            // this is done because oozie notifications (of sub-wfs) is send
            // every status change,not only on completion.
            if (executor.isCompleted(externalStatus)) {
                queue(new ActionCheckXCommand(this.wfactionBean.getId(),getPriority(),-1));
            }
        }
        return null;
    }  
}

异步调用到ActionCheckXCommand，其主要作用是：

如果有重试机制，则做相应配置
调用 executor.check(context,wfAction); 来检查环境信息
更新数据库中的任务信息
因为已经结束了，所以用ActionEndXCommand来执行结束

public class ActionCheckXCommand extends ActionXCommand<Void> {
    @Override
    protected Void execute() throws CommandException {

        ActionExecutorContext context = null;
        boolean execSynchronous = false;
        try {
            boolean isRetry = false; // 如果有重试机制，则做相应配置
            if (wfAction.getRetries() > 0) {
                isRetry = true;
            }
            boolean isUserRetry = false;
            context = new ActionXCommand.ActionExecutorContext(wfJob,wfAction,isRetry,isUserRetry);
          
            executor.check(context,wfAction); // 检查环境信息

            if (wfAction.isExecutionComplete()) {
                if (!context.isExecuted()) {
                    failJob(context);
                    generateEvent = true;
                } else {
                    wfAction.setPending();
                    execSynchronous = true;
                }
            }
            updateList.add(new UpdateEntry<WorkflowActionQuery>(WorkflowActionQuery.UPDATE_ACTION_CHECK,wfAction));
            updateList.add(new UpdateEntry<WorkflowJobQuery> (WorkflowJobQuery.UPDATE_WORKFLOW_STATUS_INSTANCE_MODIFIED,wfJob));
        }
        finally {
                // 更新数据库中的任务信息
                BatchQueryExecutor.getInstance().executeBatchInsertUpdateDelete(null,null);
                if (generateEvent && EventHandlerService.isEnabled()) {
                    generateEvent(wfAction,wfJob.getUser());
                }
                if (execSynchronous) {
                    // 用ActionEndXCommand来执行结束
                    new ActionEndXCommand(wfAction.getId(),wfAction.getType()).call();
                }
        }
        return null;
    }
}

调用到 JavaActionExecutor.check

根据配置信息建立 yarnClient = createYarnClient(context,jobConf);
获取程序报告信息 ApplicationReport appReport = yarnClient.getApplicationReport(applicationId);
获取程序数据 Map<String,String> actionData = LauncherHelper.getActionData(actionFs,jobConf);
设置各种信息

@Override
public void check(Context context,WorkflowAction action) throws ActionExecutorException {
    boolean fallback = false;
    YarnClient yarnClient = null;
    try {
        Element actionXml = XmlUtils.parseXml(action.getConf());
        Configuration jobConf = createBaseHadoopConf(context,actionXml);
        FileSystem actionFs = context.getAppFileSystem();
        yarnClient = createYarnClient(context,jobConf); // 根据配置信息建立
        FinalApplicationStatus appStatus = null;
        try {
            final String effectiveApplicationId = findYarnApplicationId(context,action);
            final ApplicationId applicationId = ConverterUtils.toApplicationId(effectiveApplicationId);
            final ApplicationReport appReport = yarnClient.getApplicationReport(applicationId); // 获取程序报告信息
            final YarnApplicationState appState = appReport.getYarnApplicationState();
            if (appState == YarnApplicationState.FAILED || appState == YarnApplicationState.FINISHED
                    || appState == YarnApplicationState.KILLED) {
                appStatus = appReport.getFinalApplicationStatus();
            }
        } 
        if (appStatus != null || fallback) {
            Path actionDir = context.getActionDir();
            // load sequence file into object
            Map<String,jobConf);   // 获取程序数据
            if (fallback) {
                String finalStatus = actionData.get(LauncherAM.ACTION_DATA_FINAL_STATUS);
                if (finalStatus != null) {
                    appStatus = FinalApplicationStatus.valueOf(finalStatus);
                } else {
                    context.setExecutionData(FAILED,null);
                }
            }

            String externalID = actionData.get(LauncherAM.ACTION_DATA_NEW_ID);  // MapReduce was launched
            if (externalID != null) {
                context.setExternalChildIDs(externalID);
             }

           // Multiple child IDs - Pig or Hive action
            String externalIDs = actionData.get(LauncherAM.ACTION_DATA_EXTERNAL_CHILD_IDS);
            if (externalIDs != null) {
                context.setExternalChildIDs(externalIDs);
             }

            // 设置各种信息
            context.setExecutionData(appStatus.toString(),null);
            if (appStatus == FinalApplicationStatus.SUCCEEDED) {
                if (getCaptureOutput(action) && LauncherHelper.hasOutputData(actionData)) {
                    context.setExecutionData(SUCCEEDED,PropertiesUtils.stringToProperties(actionData
                            .get(LauncherAM.ACTION_DATA_OUTPUT_PROPS)));
                }
                else {
                    context.setExecutionData(SUCCEEDED,null);
                }
                if (LauncherHelper.hasStatsData(actionData)) {
                    context.setExecutionStats(actionData.get(LauncherAM.ACTION_DATA_STATS));
                }
                getActionData(actionFs,context);
            }
            else {
                ......
                context.setExecutionData(FAILED_KILLED,null);
            }
        }
    }
    finally {
        if (yarnClient != null) {
            IOUtils.closeQuietly(yarnClient);
        }
    }
}

ActionEndXCommand会进行结束和跳转:

调用Executor来完成结束操作 executor.end(context,wfAction);
更新数据库的job信息 BatchQueryExecutor.getInstance().executeBatchInsertUpdateDelete
用 SignalXCommand 来进行跳转，进行下一个Action的执行

public class ActionEndXCommand extends ActionXCommand<Void> {
    @Override
    protected Void execute() throws CommandException {

        Configuration conf = wfJob.getWorkflowInstance().getConf();

        if (!(executor instanceof ControlNodeActionExecutor)) {
            maxRetries = conf.getInt(OozieClient.ACTION_MAX_RETRIES,executor.getMaxRetries());
            retryInterval = conf.getLong(OozieClient.ACTION_RETRY_INTERVAL,executor.getRetryInterval());
        }

        executor.setMaxRetries(maxRetries);
        executor.setRetryInterval(retryInterval);

        boolean isRetry = false;
        if (wfAction.getStatus() == WorkflowActionBean.Status.END_RETRY
                || wfAction.getStatus() == WorkflowActionBean.Status.END_MANUAL) {
            isRetry = true;
        }
        boolean isUserRetry = false;
        ActionExecutorContext context = new ActionXCommand.ActionExecutorContext(wfJob,isUserRetry);
        try {
          
            executor.end(context,wfAction); // 调用Executor来完成结束操作

            if (!context.isEnded()) {
                failJob(context);
            } else {
                wfAction.setRetries(0);
                wfAction.setEndTime(new Date());

                boolean shouldHandleUserRetry = false;
                Status slaStatus = null;
                switch (wfAction.getStatus()) {
                    case OK:
                        slaStatus = Status.SUCCEEDED;
                        break;
                    ......
                }
                if (!shouldHandleUserRetry || !handleUserRetry(context,wfAction)) {
                    SLAEventBean slaEvent = SLADbXOperations.createStatusEvent(wfAction.getSlaXml(),wfAction.getId(),slaStatus,SlaAppType.WORKFLOW_ACTION);
                    if(slaEvent != null) {
                        insertList.add(slaEvent);
                    }
                }
            }
            WorkflowInstance wfInstance = wfJob.getWorkflowInstance();
            DagELFunctions.setActionInfo(wfInstance,wfAction);
            wfJob.setWorkflowInstance(wfInstance);

            updateList.add(new UpdateEntry<WorkflowActionQuery>(WorkflowActionQuery.UPDATE_ACTION_END,wfAction));
            wfJob.setLastModifiedTime(new Date());
            updateList.add(new UpdateEntry<WorkflowJobQuery>(WorkflowJobQuery.UPDATE_WORKFLOW_STATUS_INSTANCE_MODIFIED,wfJob));
        }
        finally {
            try { 
                // 更新数据库的job信息
                BatchQueryExecutor.getInstance().executeBatchInsertUpdateDelete(insertList,null);
            }
            if (!(executor instanceof ControlNodeActionExecutor) && EventHandlerService.isEnabled()) {
                generateEvent(wfAction,wfJob.getUser());
            }
            new SignalXCommand(jobId,actionId).call(); // 进行跳转，进行下一个Action的执行
        }
        return null;
    }  
}