Hadoop - Configuration
Configuration Files
Hadoop uses its default settings unless they are overridden in the site-specific config files.
The config files can be found in $HADOOP_HOME/conf
or in the directory pointed to by $HADOOP_CONF_DIR:
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml
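If you keep the config files somewhere else, export the variable before starting anything (the path below is just an illustration):

$ export HADOOP_CONF_DIR=/etc/hadoop/conf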
Core
fs.default.name
The default is the local file system, so do not be surprised if you see your local files when calling $ hadoop fs -ls:
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
Set it to HDFS in core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
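With this in place, hadoop fs commands operate on HDFS rather than the local file system (assuming a NameNode is listening on localhost:54310 as configured above):

$ hadoop fs -ls /                        # now lists the HDFS root, not local files
$ hadoop fs -ls hdfs://localhost:54310/  # equivalent, with an explicit URI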
fs.trash.interval
Enable the trash bin, which is disabled by default; the value is in minutes (1440 min = 24 hr):
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
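With trash enabled, deleted files are moved to the .Trash directory under your HDFS home instead of being removed immediately, and -expunge empties the trash by hand. A quick illustration (the file name is hypothetical):

$ hadoop fs -rm /user/hadoop/old.log  # moved to trash, recoverable for 24 hr
$ hadoop fs -ls .Trash                # inspect what is pending deletion
$ hadoop fs -expunge                  # empty the trash right away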
hadoop.tmp.dir
The default tmp folder is /tmp/hadoop-${user.name}:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>
Add the following to core-site.xml to override it:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop</value>
</property>
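Several other paths, including the HDFS name and data directories below, default to subdirectories of hadoop.tmp.dir, so make sure the directory exists and is writable by the user running the daemons (the hadoop user below is an assumption):

$ sudo mkdir -p /tmp/hadoop
$ sudo chown hadoop:hadoop /tmp/hadoop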
HDFS
default: $HADOOP_HOME/src/hdfs/hdfs-default.xml
site: $HADOOP_HOME/conf/hdfs-site.xml
dfs.name.dir / dfs.data.dir
Set the folders for the namenode and the datanodes. By default ${hadoop.tmp.dir}/dfs/name
and ${hadoop.tmp.dir}/dfs/data
are used. Point them at other folders if you want:
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/dfs/data</value>
</property>
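A freshly configured name directory has to be formatted before the namenode will start. On an existing cluster this destroys all HDFS metadata, so only run it on a new setup:

$ hadoop namenode -format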
dfs.replication
Set it to 1 for a pseudo-distributed cluster (the default is 3):
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
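dfs.replication only applies to files written after the change; the replication factor of existing files can be adjusted with -setrep (the path below is hypothetical):

$ hadoop fs -setrep -R 1 /user/hadoop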
MapReduce
default: $HADOOP_HOME/src/mapred/mapred-default.xml
site: $HADOOP_HOME/conf/mapred-site.xml
mapred.job.tracker
The host and port that the MapReduce job tracker runs at. The default is "local", which runs jobs in-process as a single map and reduce task:
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>
Set it to localhost in mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
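Once the site files are in place, start the daemons and confirm they are running with jps; on a pseudo-distributed Hadoop 1.x setup the output should look something like this (PIDs will differ):

$ start-all.sh
$ jps
12046 NameNode
12158 DataNode
12271 SecondaryNameNode
12360 JobTracker
12471 TaskTracker
12500 Jps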
hadoop-env.sh
Remember to set JAVA_HOME in this file.
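For example (the path varies by system; this one is a common location on older Ubuntu installs):

export JAVA_HOME=/usr/lib/jvm/java-6-sun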
Auto-completion
Add the following to ~/.bashrc or ~/.bash_profile:
## Autocompletion for HDFS
# hdfs(1) completion
# Assumes "hdfs" invokes the HDFS shell, e.g. via: alias hdfs='hadoop fs'
have()
{
    unset -v have
    PATH=$PATH:/sbin:/usr/sbin:/usr/local/sbin type "$1" &>/dev/null &&
        have="yes"
}

have hadoop &&
_hdfs()
{
    local cur prev
    COMPREPLY=()
    cur=${COMP_WORDS[COMP_CWORD]}
    prev=${COMP_WORDS[COMP_CWORD-1]}

    # Complete the subcommand right after "hdfs"
    if [[ "$prev" == hdfs ]]; then
        COMPREPLY=( $( compgen -W '-ls -lsr -cat -du -dus -count -mv -cp -rm \
            -rmr -expunge -put -copyFromLocal -moveToLocal -mkdir -setrep \
            -touchz -test -stat -tail -chmod -chown -chgrp -help' -- "$cur" ) )
    fi

    # Complete HDFS paths for subcommands that take one
    if [[ "$prev" == -ls || "$prev" == -lsr || "$prev" == -du || \
          "$prev" == -dus || "$prev" == -cat || "$prev" == -mkdir || \
          "$prev" == -put || "$prev" == -rm || "$prev" == -rmr || \
          "$prev" == -tail || "$prev" == -cp ]]; then
        if [[ -z "$cur" ]]; then
            COMPREPLY=( $( compgen -W "$( hdfs -ls / 2>/dev/null | grep -v '^Found' | awk '{print $8}' )" -- "$cur" ) )
        elif [[ "$cur" == */ ]]; then
            COMPREPLY=( $( compgen -W "$( hdfs -ls "$cur" 2>/dev/null | grep -v '^Found' | awk '{print $8}' )" -- "$cur" ) )
        else
            COMPREPLY=( $( compgen -W "$( hdfs -ls "$cur"* 2>/dev/null | grep -v '^Found' | awk '{print $8}' )" -- "$cur" ) )
        fi
    fi
} &&
complete -F _hdfs hdfs
unset have
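Reload your shell config and try it out (this assumes hdfs resolves to the HDFS shell, e.g. alias hdfs='hadoop fs'):

$ source ~/.bashrc
$ hdfs -<Tab><Tab>      # lists the available subcommands
$ hdfs -ls /user/<Tab>  # completes HDFS paths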