Apache Oozie Interview Questions and Answers

Apache Oozie is a server-based workflow scheduling system to manage Hadoop jobs. Workflows in Oozie are defined as a collection of control flow and action nodes in a directed acyclic graph. Control flow nodes define the beginning and the end of a workflow (start, end, and failure nodes) as well as a mechanism to control the workflow execution path (decision, fork, and join nodes).


1)What is Oozie?


Answer)Oozie is a workflow scheduler for Hadoop. It allows a user to define workflows as Directed Acyclic Graphs of actions, which can be run sequentially or in parallel on Hadoop. It can also run plain Java classes and Pig workflows, and interact with HDFS.


2)Why use Oozie instead of just cascading jobs one after another?


Answer)Major flexibility: jobs can be started, stopped, suspended and re-run.

Oozie also allows us to restart a workflow from the point of failure.


3)How to make a workflow?


Answer)First make a Hadoop job and make sure that it works. Package the classes into a jar, then create a workflow.xml file and copy all of the job configuration properties into it:

Input files

Output files

Input readers and writers

Mappers and reducers

Job-specific arguments

Then create a job.properties file with the values that workflow.xml refers to.
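As a rough illustration of what such a workflow.xml can look like, the sketch below embeds a minimal map-reduce workflow in a Python string and checks that every transition points at a node that exists. The workflow name, node names, class names (com.example.MyMapper, com.example.MyReducer) and the ${inputDir} / ${outputDir} parameters are hypothetical placeholders, and the schema version 0.5 is an assumption that should be matched to the Oozie release in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow definition: names, classes and paths are placeholders.
WORKFLOW_XML = """\
<workflow-app xmlns="uri:oozie:workflow:0.5" name="example-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>com.example.MyMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>com.example.MyReducer</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map-reduce action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""


def check_transitions(xml_text):
    """Return the transition targets that do not name any node in the workflow."""
    root = ET.fromstring(xml_text)
    # Named nodes are the direct children of workflow-app (action, kill, end, ...).
    names = {el.get("name") for el in root if el.get("name")}
    # Every "to" attribute (start, ok, error, join) is a transition target.
    targets = {el.get("to") for el in root.iter() if el.get("to")}
    return targets - names


if __name__ == "__main__":
    print("dangling transitions:", check_transitions(WORKFLOW_XML))
```

The ${jobTracker} and ${nameNode} placeholders are resolved by Oozie at submission time from job.properties; the Python wrapper only checks the XML locally and does not talk to a cluster.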


4)What are the properties that we have to mention in job.properties?


Answer)Name Node

Job Tracker

oozie.wf.application.path

Lib Path

Jar Path
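A job.properties file covering the entries above might be generated as in the following sketch. All host names, ports and HDFS paths are made-up placeholders; nameNode and jobTracker are conventional variable names rather than fixed keys, and the sketch assumes the job jar sits in the lib/ directory of the workflow application, so no separate jar path entry is written.

```python
def render_job_properties(props):
    """Render a dict as key=value lines, one property per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# All values below are hypothetical placeholders for a real cluster.
job_props = {
    "nameNode": "hdfs://namenode.example.com:8020",
    "jobTracker": "jobtracker.example.com:8032",
    # HDFS directory that contains workflow.xml (and a lib/ directory with the jar).
    "oozie.wf.application.path": "${nameNode}/user/${user.name}/apps/example-wf",
    # Make the Oozie share lib available to the actions.
    "oozie.use.system.libpath": "true",
    # Extra library directory on HDFS (the "Lib Path" entry).
    "oozie.libpath": "${nameNode}/user/${user.name}/share/lib",
}

text = render_job_properties(job_props)

if __name__ == "__main__":
    print(text, end="")
```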


5)What is an application pipeline in Oozie?


Answer)It is often necessary to connect workflow jobs that run regularly, but at different time intervals. The outputs of multiple subsequent runs of a workflow become the input to the next workflow. Chaining these workflows together results in what is referred to as a data application pipeline.


6)How to run Oozie?


Answer)$ oozie job -oozie http://172.20.95.107:11000/oozie -config job.properties -run

Here 172.20.95.107 is the Oozie server node. The command prints the job id.

To know the status: $ oozie job -oozie http://172.20.95.107:11000/oozie -info <job id>
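The two commands can be scripted, for example as in the sketch below. It assumes the oozie CLI is on the PATH and that the submit command prints a line of the form "job: <id>"; the helper names (run_command, info_command, parse_job_id, submit_and_check) are hypothetical.

```python
import re
import subprocess

# Oozie server URL used in this answer; replace with your own server node.
OOZIE_URL = "http://172.20.95.107:11000/oozie"


def run_command(config_file="job.properties", oozie_url=OOZIE_URL):
    """Build the argument list that submits and starts a workflow job."""
    return ["oozie", "job", "-oozie", oozie_url, "-config", config_file, "-run"]


def info_command(job_id, oozie_url=OOZIE_URL):
    """Build the argument list that queries the status of a job."""
    return ["oozie", "job", "-oozie", oozie_url, "-info", job_id]


def parse_job_id(cli_output):
    """Extract the id from a 'job: <id>' line (assumed output format)."""
    match = re.search(r"^job:\s*(\S+)", cli_output, re.MULTILINE)
    return match.group(1) if match else None


def submit_and_check():
    """Submit the workflow, then return its status report. Needs a live server."""
    submitted = subprocess.run(run_command(), capture_output=True, text=True, check=True)
    job_id = parse_job_id(submitted.stdout)
    if job_id is None:
        raise ValueError("could not find a job id in the oozie output")
    status = subprocess.run(info_command(job_id), capture_output=True, text=True, check=True)
    return status.stdout
```

Only submit_and_check() contacts the server; the other helpers just build and parse strings, so they can be tried without a cluster.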


7)What are all the actions that can be performed in Oozie?


Answer)Email Action

Hive Action

Shell Action

Ssh Action

Sqoop Action

Writing a custom Action Executor


8)Why do we use the Fork and Join nodes of Oozie?


Answer)A fork node splits one path of execution into multiple concurrent paths of execution.

A join node waits until every concurrent execution path of a previous fork node arrives at it.

The fork and join nodes must be used in pairs. The join node assumes concurrent execution paths are children of the same fork node.
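The pairing rule can be illustrated with a small workflow in which one fork starts two paths and both paths end at the same join. This is only a sketch: the node names, the fs/mkdir actions and the /tmp/fork-demo paths are hypothetical, and the schema version 0.5 is an assumption. The Python helper checks locally that every forked path transitions to the join.

```python
import xml.etree.ElementTree as ET

NS = "{uri:oozie:workflow:0.5}"

# Hypothetical workflow: node names and HDFS paths are placeholders.
FORK_JOIN_XML = """\
<workflow-app xmlns="uri:oozie:workflow:0.5" name="fork-join-wf">
    <start to="forking"/>
    <fork name="forking">
        <path start="first-job"/>
        <path start="second-job"/>
    </fork>
    <action name="first-job">
        <fs><mkdir path="${nameNode}/tmp/fork-demo/first"/></fs>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <action name="second-job">
        <fs><mkdir path="${nameNode}/tmp/fork-demo/second"/></fs>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <join name="joining" to="end"/>
    <kill name="fail">
        <message>A forked action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""


def fork_targets_join(xml_text):
    """Return True if every path started by the fork ends at the single join."""
    root = ET.fromstring(xml_text)
    fork = root.find(NS + "fork")
    join = root.find(NS + "join")
    # Names of the actions that the fork starts concurrently.
    started = {path.get("start") for path in fork.findall(NS + "path")}
    # Where each of those actions goes on success.
    ok_targets = {
        action.find(NS + "ok").get("to")
        for action in root.findall(NS + "action")
        if action.get("name") in started
    }
    return ok_targets == {join.get("name")}


if __name__ == "__main__":
    print("fork and join are paired:", fork_targets_join(FORK_JOIN_XML))
```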

