Sunday, April 19, 2009

Hadoop Sequence Files

One way to store data that Hadoop works with is through sequence files. These are files, possibly compressed, containing pairs of Writable keys and values. Hadoop can split sequence files to run jobs in parallel, even when they are compressed, which makes them a convenient way to store your data without inventing your own format. To import your own data into a sequence file, execute something like:



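import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*; // FileSystem, Path
import org.apache.hadoop.io.*; // BytesWritable, SequenceFile, Text

Configuration conf = new Configuration();
Path path = new Path("filename.of.sequence.file");
// Raw local file system (no checksum files), already initialized with conf
FileSystem fs = FileSystem.getLocal(conf).getRawFileSystem();

SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, Text.class, BytesWritable.class);
// "values" is a placeholder for your own data (any Iterable<byte[]>)
for (byte[] value : values)
    writer.append(new Text("key"), new BytesWritable(value));
writer.close(); // flush buffered records and close the file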



In the example above, the key is of type Text and the value is of type BytesWritable; you can use other Writable types.
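
If none of the built-in types fit, a key or value can be a class of your own, as long as it is Writable. Here is a minimal sketch of what that might look like. The PointWritable class and the roundTrip helper are made up for illustration, and the Writable interface is declared locally (with the same two methods as org.apache.hadoop.io.Writable) only so the sketch compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring the two methods of org.apache.hadoop.io.Writable
// (an assumption made so this sketch runs without Hadoop).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type: a 2-D integer point.
class PointWritable implements Writable {
    int x;
    int y;

    // No-arg constructor, so an empty instance can be created and then filled by readFields.
    PointWritable() {}

    PointWritable(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields are written in a fixed order...
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // ...and must be read back in exactly the same order.
        x = in.readInt();
        y = in.readInt();
    }
}

class PointWritableDemo {
    // Serialize a point to bytes, then deserialize it into a fresh instance.
    static PointWritable roundTrip(PointWritable point) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            point.write(new DataOutputStream(bytes));
            PointWritable copy = new PointWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PointWritable copy = roundTrip(new PointWritable(3, 4));
        System.out.println(copy.x + "," + copy.y);
    }
}
```

With Hadoop on the classpath you would drop the local interface, implement the real org.apache.hadoop.io.Writable instead, and pass PointWritable.class as the value class when creating the writer.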
