anasanjaria - 5 months ago
Java Question

Container allocation code in YARN (Hadoop)

I am trying to tinker with the YARN container allocation code. By container allocation, I mean the decision to place the container on a specific machine in the cluster.

I want to write my own container allocation code. To begin with, I am running Hadoop in pseudo-distributed mode with YARN, and I am trying to locate the relevant points in the source code. So far, using print statements, I have been able to pinpoint the method

hadoop-source-code/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationMasterProtocolPBClientImpl.java#allocate

where allocation appears to take place. However, I am unable to narrow it down further: print statements placed deeper inside this method produce no output.

To recap: I would like to locate the exact point in the Hadoop source code where I would need to write my own code to replace the existing container allocation mechanism.

Answer
"I have not been able to print anything"

At first, I thought logging was application-specific, but everything the resource manager logs goes to a file named hadoop-{username}-resourcemanager-{hostname}.log in the log folder. Instead of print statements, I used LOG.info for debugging.
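As a minimal sketch of the LOG.info approach: the class name AllocationTracer and its methods are hypothetical, and java.util.logging is used here only so the example compiles without Hadoop on the classpath. Inside the Hadoop source you would more likely reuse the LOG field that the class you are editing already declares.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for a scheduler class being instrumented.
class AllocationTracer {
    // Assumption: java.util.logging replaces Hadoop's own logger so that
    // this sketch runs standalone.
    private static final Logger LOG =
        Logger.getLogger(AllocationTracer.class.getName());

    // Builds the message separately so it can be checked without reading a log file.
    static String describe(String nodeId, int freeMemoryMb) {
        return "Considering node " + nodeId + " with " + freeMemoryMb + " MB free";
    }

    // Emits the message at INFO level, the same level LOG.info uses.
    static void trace(String nodeId, int freeMemoryMb) {
        LOG.info(describe(nodeId, freeMemoryMb));
    }

    public static void main(String[] args) {
        trace("node-1", 2048);
    }
}
```

In this standalone form the message goes to the console; in a running cluster the equivalent LOG.info call would end up in the resource manager log file mentioned above.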

Location of the allocation mechanism in the Hadoop source code

I am using the FIFO scheduler, where the allocation mechanism is in the method FifoScheduler#assignContainersOnNode. That method is called from FifoScheduler#assignContainers, which in turn is called from FifoScheduler#nodeUpdate.

The FifoScheduler#handle method (more information here) dispatches the different scheduler events. NODE_UPDATE is one of them; it is triggered often, and each time it fires, containers are assigned on the given node.
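The call chain described above can be pictured with a toy model. This is not the Hadoop implementation: only the method names mirror FifoScheduler, while the class ToyFifoScheduler, its Node and EventType types, the memory-only resource model, and the "stop at the first request that does not fit" rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of handle -> nodeUpdate -> assignContainers -> assignContainersOnNode.
// Hypothetical types; only the method names follow FifoScheduler.
class ToyFifoScheduler {
    enum EventType { NODE_ADDED, NODE_UPDATE }

    static final class Node {
        final String id;
        int freeMemoryMb;
        Node(String id, int freeMemoryMb) {
            this.id = id;
            this.freeMemoryMb = freeMemoryMb;
        }
    }

    // Pending container requests in arrival (FIFO) order, as memory in MB.
    private final List<Integer> pendingRequestsMb = new ArrayList<>();
    // Record of placements, formatted as "<memory>MB@<node id>".
    final List<String> placements = new ArrayList<>();

    void submit(int memoryMb) {
        pendingRequestsMb.add(memoryMb);
    }

    // Entry point: dispatches on the event type.
    void handle(EventType type, Node node) {
        if (type == EventType.NODE_UPDATE) {
            nodeUpdate(node);
        }
        // Other event types are ignored in this sketch.
    }

    private void nodeUpdate(Node node) {
        assignContainers(node);
    }

    // Walks requests in FIFO order; stops at the first one that does not fit.
    private void assignContainers(Node node) {
        while (!pendingRequestsMb.isEmpty()) {
            int requestMb = pendingRequestsMb.get(0);
            if (!assignContainersOnNode(node, requestMb)) {
                break;
            }
            pendingRequestsMb.remove(0); // removes by index, not by value
        }
    }

    // The placement decision: a custom policy would replace this logic.
    private boolean assignContainersOnNode(Node node, int requestMb) {
        if (requestMb > node.freeMemoryMb) {
            return false;
        }
        node.freeMemoryMb -= requestMb;
        placements.add(requestMb + "MB@" + node.id);
        return true;
    }

    public static void main(String[] args) {
        ToyFifoScheduler scheduler = new ToyFifoScheduler();
        scheduler.submit(1024);
        scheduler.submit(1024);
        Node node = new Node("node-1", 3000);
        scheduler.handle(EventType.NODE_UPDATE, node);
        System.out.println(scheduler.placements);
    }
}
```

In this model, a custom allocation policy amounts to changing the body of assignContainersOnNode, which matches the point in the real scheduler identified above.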