This repo contains experimental code for running the Constellation runtime system on YARN/HDFS.
The nl.esciencecenter package contains a simple example application that reads an input file and computes a hash of each block.
To run this example you need a machine which has the following software installed:
- Java 1.7 or higher
- ant 1.7 or higher
- Hadoop 2.5 or higher
Note that Hadoop must be set up properly, i.e., the $HADOOP_HOME environment variable must be set. In addition, make sure you
compile this example with the same Java version that is used to run Hadoop. Using a different version may result in unsupported version
exceptions.
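If you do run into such a version mismatch (on a standard JVM it typically shows up as java.lang.UnsupportedClassVersionError), it can help to compare the class-file version of a compiled class with the version the running JVM supports. The following is a minimal sketch and not part of this repo; the class name ClassVersionSketch is hypothetical. It reads the major version from the header of one or more .class files given on the command line (51 corresponds to Java 7, 52 to Java 8).

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of this repo.
public class ClassVersionSketch {

    /** Returns the class-file major version (51 = Java 7, 52 = Java 8, ...). */
    static int majorVersion(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        // Every class file starts with the magic number 0xCAFEBABE.
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a Java class file");
        }
        in.readUnsignedShort(); // minor version, not needed here
        return in.readUnsignedShort(); // major version
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Running on Java " + System.getProperty("java.version")
                + ", supports class version " + System.getProperty("java.class.version"));
        for (String name : args) {
            try (InputStream in = new FileInputStream(name)) {
                System.out.println(name + ": class version " + majorVersion(in));
            }
        }
    }
}
```

Run it with the same java binary that Hadoop uses. If a class file reports a higher major version than the JVM supports, the example was compiled with a newer Java version than the one running Hadoop.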
After cloning this repo, change to its directory and compile using:
ant
This compiles the example, and creates a ./dist directory containing a ConstellationOnYARN.jar plus all dependencies.
To run the example, you first need to put an input file in HDFS. As usual with Hadoop, bigger is better. You can use an Ubuntu source image, for example:
wget http://ftp.acc.umu.se/mirror/cdimage.ubuntu.com/releases/xenial/release/source/ubuntu-16.04-src-1.iso
Next, put this image into HDFS (after replacing /user/jason with your home directory in HDFS):
hdfs dfs -copyFromLocal ubuntu-16.04-src-1.iso /user/jason
Next, run the example (after replacing both occurrences of /user/jason with your home directory in HDFS):
java -cp ./dist/ConstellationOnYarn.jar:`yarn classpath` nl.esciencecenter.ConstellationSubmitter /user/jason/ ./dist /user/jason/ubuntu-16.04-src-1.iso true 1
This starts the nl.esciencecenter.ConstellationSubmitter example locally, which connects to the YARN scheduler and submits the example. This class
expects the following command-line parameters:
- The HDFS root directory to which the dependencies should be staged in, /user/jason in this example.
- The local directory containing the dependencies that need to be staged in. Use ./dist to run this example.
- The input file (on HDFS) to process, /user/jason/ubuntu-16.04-src-1.iso in this example.
- true/false, indicating whether an attempt should be made to create precise activity contexts.
- The number of YARN Workers to submit, 1 in this example.
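To make the order and types of these five parameters concrete, here is a minimal sketch of how they could be parsed and validated. This is not the actual ConstellationSubmitter code; the class name SubmitterArgsSketch, its field names, and the validation rules (for example, requiring at least one worker) are assumptions made for illustration.

```java
// Hypothetical sketch, not the parsing code used by ConstellationSubmitter.
public class SubmitterArgsSketch {

    final String hdfsRoot;          // HDFS directory the dependencies are staged in to
    final String localLibDir;       // local directory containing the dependencies
    final String inputFile;         // input file on HDFS
    final boolean preciseContexts;  // try to create precise activity contexts?
    final int workerCount;          // number of YARN workers to submit

    SubmitterArgsSketch(String hdfsRoot, String localLibDir, String inputFile,
            boolean preciseContexts, int workerCount) {
        this.hdfsRoot = hdfsRoot;
        this.localLibDir = localLibDir;
        this.inputFile = inputFile;
        this.preciseContexts = preciseContexts;
        this.workerCount = workerCount;
    }

    static SubmitterArgsSketch parse(String[] args) {
        if (args.length != 5) {
            throw new IllegalArgumentException(
                    "Expected: <hdfsRoot> <localLibDir> <inputFile> <true|false> <workers>");
        }
        // Boolean.parseBoolean silently maps anything but "true" to false,
        // so check the literal values explicitly.
        if (!args[3].equals("true") && !args[3].equals("false")) {
            throw new IllegalArgumentException("Fourth parameter must be true or false");
        }
        int workers;
        try {
            workers = Integer.parseInt(args[4]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Worker count must be an integer: " + args[4]);
        }
        if (workers < 1) {
            throw new IllegalArgumentException("Worker count must be at least 1");
        }
        return new SubmitterArgsSketch(args[0], args[1], args[2],
                Boolean.parseBoolean(args[3]), workers);
    }

    public static void main(String[] args) {
        SubmitterArgsSketch a = parse(args);
        System.out.println("Staging " + a.localLibDir + " to " + a.hdfsRoot
                + ", processing " + a.inputFile + " with " + a.workerCount
                + " worker(s), precise contexts: " + a.preciseContexts);
    }
}
```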
The example first copies the ./dist directory to HDFS, as these files are needed on the YARN nodes to run the example. It then submits the job to YARN
and waits for it to finish.
The example code can be found in 'src' and consists of the following packages:
- nl.esciencecenter: the main example code
- nl.esciencecenter.constellation: the Constellation application
- nl.esciencecenter.yarn: a somewhat generic library to access YARN
Like most YARN applications, running this application requires three parts:
- nl.esciencecenter.ConstellationSubmitter: the main application that connects to the YARN scheduler and submits the nl.esciencecenter.ApplicationMaster.
- nl.esciencecenter.ApplicationMaster: the master application that runs on a YARN node, acquires the nodes to run one or more nl.esciencecenter.constellation.ConstellationWorker applications and starts them, starts a nl.esciencecenter.constellation.ConstellationMaster itself, and waits for everything to finish.
- nl.esciencecenter.constellation.ConstellationWorker: the worker application that is started on the YARN nodes to process one or more nl.esciencecenter.constellation.SHA1Job instances.
In addition, the following classes are used:
- nl.esciencecenter.constellation.ConstellationMaster: the Constellation master application which starts a Constellation, submits the jobs, and gathers the results.
- nl.esciencecenter.constellation.SHA1Job: a job that is created by the ConstellationMaster and executed on one of the ConstellationWorkers. Each of these jobs reads a block of data from a file in HDFS, and calculates the SHA1 hash of this block.
- nl.esciencecenter.constellation.SHA1Result: a result object that is returned by each SHA1Job to the ConstellationMaster.
- nl.esciencecenter.yarn.YarnSubmitter: a generic class used by 'ConstellationOnYarn' to submit the application to YARN.
- nl.esciencecenter.yarn.YarnMaster: a generic class used by 'ApplicationMaster' to submit the workers to YARN.
- nl.esciencecenter.yarn.Utils: various utility methods used by 'ApplicationSubmitter' and 'ApplicationMaster'.
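The per-block hashing that each SHA1Job performs can be illustrated with a small standalone sketch. It is not the SHA1Job implementation: it reads a local file through java.io instead of HDFS, processes all blocks sequentially in one JVM, and the class name BlockSha1Sketch and the default block size of 128 MB are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical local stand-in for the block hashing done by SHA1Job.
public class BlockSha1Sketch {

    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Splits the stream into blocks of blockSize bytes and returns one SHA-1 hash per block. */
    static List<String> hashBlocks(InputStream in, long blockSize)
            throws IOException, NoSuchAlgorithmException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<String> hashes = new ArrayList<>();
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] buf = new byte[64 * 1024];
        long inBlock = 0; // bytes of the current block fed to the digest so far
        while (true) {
            // Never read past the end of the current block.
            int want = (int) Math.min(buf.length, blockSize - inBlock);
            int n = in.read(buf, 0, want);
            if (n < 0) {
                break; // end of input
            }
            md.update(buf, 0, n);
            inBlock += n;
            if (inBlock == blockSize) {
                hashes.add(toHex(md.digest())); // digest() also resets md
                inBlock = 0;
            }
        }
        if (inBlock > 0) {
            hashes.add(toHex(md.digest())); // final, partial block
        }
        return hashes;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default of 128 MB per block; pass a second argument to override.
        long blockSize = args.length > 1 ? Long.parseLong(args[1]) : 128L * 1024 * 1024;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            List<String> hashes = hashBlocks(in, blockSize);
            for (int i = 0; i < hashes.size(); i++) {
                System.out.println("block " + i + ": " + hashes.get(i));
            }
        }
    }
}
```

In the actual example the blocks are distributed as separate jobs over the ConstellationWorkers, and each result is sent back to the ConstellationMaster as a SHA1Result. The sketch only shows the hashing step itself.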