Hadoop - Overview
SequenceFile
- append-only (unlike key-value structures such as B-Trees, you can't seek to a given key and edit, add, or remove it)
- binary key-value pairs
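A minimal sketch of what "append-only binary key-value pairs" means in practice. It uses plain Java and an in-memory buffer, not the Hadoop SequenceFile API; the class name AppendOnlyKvLog and its methods are hypothetical and only for illustration. Records can only be added at the end, and finding a key means scanning from the start.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy class (not part of Hadoop): an append-only log of binary key-value records.
public class AppendOnlyKvLog {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    // The only write operation: add one record at the end of the log.
    public void append(String key, String value) {
        try {
            out.writeUTF(key);
            out.writeUTF(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // There is no index, so a lookup reads every record from the beginning.
    // Returns all values stored under the key, in append order.
    public List<String> valuesFor(String key) {
        List<String> found = new ArrayList<>();
        try {
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            while (true) {
                String k = in.readUTF();
                String v = in.readUTF();
                if (k.equals(key)) {
                    found.add(v);
                }
            }
        } catch (EOFException endOfLog) {
            // Reached the end of the log: the scan is complete.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        AppendOnlyKvLog log = new AppendOnlyKvLog();
        log.append("a", "1");
        log.append("b", "2");
        log.append("a", "3"); // the earlier ("a", "1") record cannot be edited, only followed
        System.out.println(log.valuesFor("a")); // prints [1, 3]
    }
}
```

Both records for "a" stay in the log; nothing is overwritten in place.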
3 formats:
- Uncompressed: neither keys nor values are compressed.
- Record Compressed: only the values are compressed, one record at a time.
- Block Compressed: keys and values are collected into 'blocks' separately, and each block is compressed. The block size is configurable.
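A rough sketch of why the record/block distinction matters, assuming small and repetitive values. It uses java.util.zip.Deflater from the JDK rather than Hadoop's codecs, handles values only, and the class and method names (CompressionLayouts, recordStyleSize, blockStyleSize) are hypothetical. Compressing each value alone pays the compressor's fixed overhead per record; batching values into one buffer lets the compressor exploit repetition across records.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical illustration (not Hadoop code): per-record vs per-block compression of values.
public class CompressionLayouts {

    // Deflate one byte array and return the number of compressed bytes produced.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // "Record" style: every value is compressed on its own.
    static int recordStyleSize(List<String> values) {
        int total = 0;
        for (String value : values) {
            total += deflatedSize(value.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // "Block" style: values are gathered into one buffer and compressed once.
    static int blockStyleSize(List<String> values) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (String value : values) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            block.write(bytes, 0, bytes.length);
        }
        return deflatedSize(block.toByteArray());
    }

    public static void main(String[] args) {
        // 100 identical short values: an assumed, deliberately repetitive data set.
        List<String> values = Collections.nCopies(100, "status=OK;region=us-east-1");
        System.out.println("record style bytes: " + recordStyleSize(values));
        System.out.println("block style bytes:  " + blockStyleSize(values));
    }
}
```

On repetitive data like this the block-style total comes out much smaller; the exact byte counts depend on the compressor.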
Map/reduce
- map: reads its input from HDFS and writes its intermediate output to local disk.
- reduce: reads the map output and writes the final result to HDFS.
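The two phases above can be sketched as a single-process word count. This is plain Java with in-memory lists standing in for HDFS and local disk, not the Hadoop MapReduce API; the class name LocalWordCount and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process imitation of the map and reduce phases (not Hadoop code).
public class LocalWordCount {

    // Map phase: turn one input line into (word, 1) pairs.
    // In Hadoop, this intermediate output would be written to the map node's local disk.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the counts for each key.
    // In Hadoop, this final output would be written to HDFS.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Run map over every line, collect the intermediate pairs, then reduce them.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        return reduce(intermediate);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

A real job runs many map and reduce tasks in parallel and moves the intermediate pairs between nodes; this sketch only shows the data flow.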
HBase vs HDFS
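HBase: low-latency random reads and writes by row key; it stores its data on HDFS.
HDFS: high-throughput sequential access to large files; no in-place random updates.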
Copy From Local
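import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Load the cluster settings (e.g. core-site.xml and hdfs-site.xml).
Configuration conf = new Configuration();
conf.addResource(new Path(pathHadoopCoreSite));
conf.addResource(new Path(pathHadoopHDFSSite));
// Get the file system named by the configuration (HDFS here).
FileSystem fs = FileSystem.get(conf);
Path src = new Path(pathLocal); // source file on the local disk
Path dst = new Path(pathHDFS);  // destination path in HDFS
fs.copyFromLocalFile(src, dst);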