For example, the main code for a word-count app looks like this:
object WordCount extends ScalaHadoopTool {
  def run(args: Array[String]): Int = {
    (MapReduceTaskChain.init() -->
      IO.Text[LongWritable, Text](args(0)).input -->                 // read text input from args(0)
      MapReduceTask.MapReduceTask(TokenizerMap1, SumReducer1) -->    // one map/reduce stage
      IO.Text[Text, LongWritable](args(1)).output) execute;          // write text output to args(1)
    return 0;
  }
}
with the mapper and reducer looking like this:
object TokenizerMap1 extends TypedMapper[LongWritable, Text, Text, LongWritable] {
  // Split each input line on spaces and tabs, emitting a (word, 1) pair per token.
  override def doMap: Unit =
    v split " |\t" foreach ((word) => context.write(word, 1L))
}

object SumReducer1 extends TypedReducer[Text, LongWritable, Text, LongWritable] {
  // Fold over all counts for a key, emitting (word, total).
  override def doReduce: Unit =
    context.write(k, (0L /: v)((total, next) => total + next))
}
It's possible to chain map and reduce tasks together, use custom file formats, and so on, but many of these features aren't documented yet.
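As a rough sketch of what chaining might look like, a second stage can be appended to the pipeline with another `-->`. Only `MapReduceTaskChain`, `IO.Text`, and `MapReduceTask` are taken from the word-count example above; `InvertMap2` and `IdentityReduce2` are hypothetical user-defined objects, and the exact types accepted between stages are an assumption:

```scala
// Hypothetical two-stage pipeline: stage 1 counts words, stage 2 inverts
// (word, count) to (count, word), e.g. to group words by frequency.
// InvertMap2 and IdentityReduce2 are assumed to be defined like the
// TokenizerMap1/SumReducer1 objects above.
object WordCountByFreq extends ScalaHadoopTool {
  def run(args: Array[String]): Int = {
    (MapReduceTaskChain.init() -->
      IO.Text[LongWritable, Text](args(0)).input -->
      MapReduceTask.MapReduceTask(TokenizerMap1, SumReducer1) -->    // stage 1: word counts
      MapReduceTask.MapReduceTask(InvertMap2, IdentityReduce2) -->   // stage 2: invert key/value
      IO.Text[LongWritable, Text](args(1)).output) execute;
    return 0;
  }
}
```

The intermediate output of stage 1 would feed directly into stage 2 without an explicit `IO` step, which is the main convenience the chaining syntax appears to offer.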