First, you need to get an Amazon AWS account and set up the EC2 API tools on your machine; Amazon's instructions are pretty clear. You also need to download and unzip the latest version of Hadoop on a local machine. For convenience, add the
src/contrib/ec2/bin subdirectory of Hadoop to your path.
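For example, assuming Hadoop was unpacked to ~/hadoop (adjust the path for wherever you put it), something like this in your shell profile does the trick:
export HADOOP_HOME=~/hadoop
export PATH=$PATH:$HADOOP_HOME/src/contrib/ec2/bin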
Now, you must edit
src/contrib/ec2/bin/hadoop-ec2-env.sh
and set the variables specifying your AWS credentials. For KEYNAME, use the name that you selected when you created your key; the private key file should be called id_rsa-$KEYNAME, though you can change that on the PRIVATE_KEY_PATH line. You shouldn't need to change anything other than the first five variables in the file.
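As a rough sketch (the variable names here follow the stock hadoop-ec2-env.sh but vary a bit between Hadoop versions, and the values are placeholders), the top of the file ends up looking something like:
# AWS account number and credentials
AWS_ACCOUNT_ID=123456789012
AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
# The EC2 keypair name you chose when creating your key
KEYNAME=gsg-keypair
# Where the matching private key lives
PRIVATE_KEY_PATH=~/.ec2/id_rsa-$KEYNAME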
If you've done things correctly, you should now be able to run
hadoop-ec2 launch-cluster my-cluster 2
which should churn for a while, create the right security groups, permissions, etc., and boot up all the machines; once it finishes, your cluster is good to go. "my-cluster" is the name you chose for the cluster and can be anything you want; 2 is the number of slave nodes to run. If you have an existing cluster running, you can add machines to it by running
hadoop-ec2 launch-slaves my-cluster n
to launch n more slaves for that cluster. It will take a little time for them to boot up, but within a few minutes they should join the cluster.
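One quick way to check that the new slaves have registered (assuming your version of the script provides the login subcommand; hadoop dfsadmin -report is standard Hadoop) is to log in to the master and ask HDFS how many datanodes it sees:
hadoop-ec2 login my-cluster
hadoop dfsadmin -report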
The cluster can be turned off with
hadoop-ec2 terminate-cluster my-cluster
The next Hadoop-related post will talk about how the cluster can be used and how data can be imported from and exported to S3.