Apache Hama Interview Questions and Answers

Apache Hama is a framework for Big Data analytics that uses the Bulk Synchronous Parallel (BSP) computing model. It was established in 2012 as a Top-Level Project of The Apache Software Foundation.

It provides not only a pure BSP programming model but also vertex-centric and neuron-centric programming models, inspired by Google's Pregel and DistBelief.

1) I get ": hostname nor servname provided, or not known" error on Cygwin/Windows.

Answer) You can fix this by changing the value of the hama.zookeeper.quorum property from 'localhost' to '127.0.0.1'.
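One plausible reason this workaround helps is that the name 'localhost' does not resolve on the affected machine, while a literal IP address needs no name lookup. The following is a minimal diagnostic sketch under that assumption; the class name HostCheck is hypothetical and is not part of Hama.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper (not part of Hama): reports whether a host name resolves.
class HostCheck {

    // Returns true if the JVM can turn the given host into an address.
    // For a literal IP such as "127.0.0.1" no name lookup is performed.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String host : new String[] {"localhost", "127.0.0.1"}) {
            System.out.println(host + " resolves: " + resolves(host));
        }
    }
}
```

If 'localhost' is reported as not resolving, the hosts file of the machine is a likely place to look, in addition to applying the fix above.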

2) I get ": Incorrect header or version mismatch from 127.0.0.1:52772 got version 3 expected version 4." while starting.

Answer) Please use a release of Hadoop that is compatible with your Hama release.

3) I get ": FATAL org.apache.hama.BSPMasterRunner: java.net.UnknownHostException: Invalid hostname for server: local" while starting.

Answer) This happens if you are in local mode and try to launch Hama via the start script. In this mode, nothing has to be launched: a multithreaded runner starts automatically when you submit your job.

4) When I submit a job, I see that it fails immediately without running a task.

Answer) Please check the BSPMaster log at $HAMA_HOME/logs/hama-$USER-bspmaster-$HOSTNAME.log. If you see a line like

2012-07-28 17:45:34,708 ERROR org.apache.hama.bsp.SimpleTaskScheduler: Scheduling of job test.jar could not be done successfully. Killing it!

the scheduler could not schedule your job because your cluster does not have enough free resources (task slots). Watch the output closely while submitting the job. If it says

2012-07-28 17:45:34 INFO bsp.FileInputFormat: Total # of splits: 4

and your cluster shows (for example, in the web UI) only 3 free slots, the scheduler cannot schedule all the tasks. If you are familiar with Hadoop, this behaviour may be surprising. The reason is that BSP needs all tasks of a job to run in parallel, whereas in MapReduce the map tasks do not depend on each other, so they can be processed one after another. We apologize for the missing error message and will fix this in the near future.
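The difference between the two scheduling styles can be sketched with a little arithmetic. This is only an illustration of the reasoning above, not Hama's scheduler code; the class and method names are hypothetical.

```java
// Hypothetical illustration (not Hama code) of all-or-nothing BSP scheduling
// versus MapReduce-style scheduling in successive waves.
class SlotCheck {

    // A BSP job can only start if every task gets a slot at the same time.
    static boolean canScheduleBsp(int numTasks, int freeSlots) {
        return numTasks <= freeSlots;
    }

    // Independent map tasks can run in waves: ceil(numTasks / freeSlots).
    static int mapReduceWaves(int numTasks, int freeSlots) {
        if (freeSlots <= 0) {
            throw new IllegalArgumentException("freeSlots must be positive");
        }
        return (numTasks + freeSlots - 1) / freeSlots;
    }

    public static void main(String[] args) {
        int splits = 4;     // "Total # of splits: 4" from the job output
        int freeSlots = 3;  // free slots shown by the cluster
        System.out.println("BSP job schedulable: " + canScheduleBsp(splits, freeSlots));
        System.out.println("MapReduce waves needed: " + mapReduceWaves(splits, freeSlots));
    }
}
```

With 4 splits and 3 free slots, the BSP check fails, while independent map tasks would simply need two waves.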

5) Is there any restriction on the maximum number of messages sent before sync()?

Answer) With the memory-based queue, messages are kept in memory, so the limit depends on the memory available. With the spilling queue, there is no such limit.
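To make the distinction concrete, here is a conceptual sketch of a buffer that keeps a bounded number of messages in memory and writes the rest to a temporary file. It only illustrates the general idea of spilling; it is not Hama's queue implementation, and the class name, the memoryLimit parameter, and the one-message-per-line file format are all assumptions.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only (not Hama's implementation).
// Messages must not contain line breaks, because the file stores one per line.
class SpillingBuffer {
    private final int memoryLimit;              // max messages held in memory
    private final List<String> inMemory = new ArrayList<>();
    private Path spillFile;                     // created lazily on first spill
    private int spilled = 0;

    SpillingBuffer(int memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    void add(String message) {
        if (inMemory.size() < memoryLimit) {
            inMemory.add(message);
            return;
        }
        try {
            if (spillFile == null) {
                spillFile = Files.createTempFile("spill", ".txt");
                spillFile.toFile().deleteOnExit();
            }
            // Reopening the file per message keeps the sketch short; a real queue would buffer.
            try (BufferedWriter w = Files.newBufferedWriter(
                    spillFile, StandardCharsets.UTF_8, StandardOpenOption.APPEND)) {
                w.write(message);
                w.newLine();
            }
            spilled++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int spilledCount() {
        return spilled;
    }

    // Returns all messages in insertion order and resets the buffer.
    List<String> drain() {
        List<String> all = new ArrayList<>(inMemory);
        try {
            if (spillFile != null) {
                all.addAll(Files.readAllLines(spillFile, StandardCharsets.UTF_8));
                Files.delete(spillFile);
                spillFile = null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        inMemory.clear();
        spilled = 0;
        return all;
    }
}
```

In this sketch the number of buffered messages is bounded by disk space rather than by heap size, which is the practical meaning of "no limit" here.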

6) java.lang.IllegalArgumentException: Messages must never be behind the vertex in ID!

Answer) This exception is thrown when a received message is addressed to a non-existent vertex (a dangling link). To ignore such messages, set "hama.check.missing.vertex" to false.
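The wording of the exception suggests that vertices and incoming messages are walked in sorted ID order, so a message whose target ID falls "behind" the current vertex has no vertex to receive it. The sketch below illustrates that idea under this assumption; it is not Hama's actual code, and the class, method, and the checkMissing flag (standing in for "hama.check.missing.vertex") are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Hama code) of detecting messages for missing vertices.
class MissingVertexCheck {

    // Both arrays must be sorted ascending. Returns the targets that were dropped
    // because no vertex exists for them; throws instead when checkMissing is true.
    static List<Integer> deliver(int[] vertexIds, int[] messageTargets, boolean checkMissing) {
        List<Integer> dropped = new ArrayList<>();
        int m = 0;
        for (int vertexId : vertexIds) {
            // Targets smaller than the current vertex ID match no vertex.
            while (m < messageTargets.length && messageTargets[m] < vertexId) {
                handleMissing(messageTargets[m++], checkMissing, dropped);
            }
            // Consume the messages addressed to the current vertex.
            while (m < messageTargets.length && messageTargets[m] == vertexId) {
                m++;
            }
        }
        // In this sketch, targets beyond the last vertex are treated the same way.
        while (m < messageTargets.length) {
            handleMissing(messageTargets[m++], checkMissing, dropped);
        }
        return dropped;
    }

    private static void handleMissing(int target, boolean checkMissing, List<Integer> dropped) {
        if (checkMissing) {
            throw new IllegalArgumentException(
                "Messages must never be behind the vertex in ID! Target: " + target);
        }
        dropped.add(target);
    }

    public static void main(String[] args) {
        int[] vertices = {1, 2, 4};
        int[] targets = {1, 3, 4}; // 3 is a dangling link
        System.out.println("Dropped: " + deliver(vertices, targets, false));
    }
}
```

With the check disabled, the message for the missing vertex 3 is silently dropped; with it enabled, the same input raises the exception.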
