Spark - Configuration
This page lists ways to set and get Spark-related configs.
Spark Configs
Spark-related configs are ultimately held in a SparkConf object. There are 3 ways to set them:
Option 1: set when calling spark-submit
Use --conf KEY=VALUE
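For example, a sketch of passing configs on the command line (the class name and application jar are placeholders):
spark-submit \
  --master yarn \
  --conf spark.executor.memory=4g \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --class com.example.MyApp \
  my-app.jar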
Option 2: set in code
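A minimal Scala sketch, assuming the app name and config value are just placeholders:
import org.apache.spark.{SparkConf, SparkContext}

// configs must be set on the SparkConf before the SparkContext is created
val conf = new SparkConf()
  .setAppName("my-app")                  // placeholder app name
  .set("spark.executor.memory", "4g")    // any KEY/VALUE pair works here
val sc = new SparkContext(conf)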
Option 3: set in the defaults file (spark-defaults.conf)
Ensure that SPARK_HOME and SPARK_CONF_DIR are correctly set.
$SPARK_CONF_DIR can be set to $SPARK_HOME/conf, or to a copy of $SPARK_HOME/conf placed somewhere else. The benefit of a separate copy is that multiple Spark installations (versions) can share the same conf folder, and nothing needs to change when upgrading to a new version.
Then the config files can be found in:
$SPARK_CONF_DIR/spark-defaults.conf
$SPARK_CONF_DIR/spark-env.sh
(they are not there by default; instead they are called spark-defaults.conf.template and spark-env.sh.template, just make a copy and rename them)
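For example, assuming the conf folder is $SPARK_HOME/conf:
cd $SPARK_HOME/conf
cp spark-defaults.conf.template spark-defaults.conf
cp spark-env.sh.template spark-env.sh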
In spark-env.sh, HADOOP_CONF_DIR should be defined if you want to run Spark in yarn mode:
HADOOP_CONF_DIR=/path/to/hadoop/conf
Hadoop/YARN/HDFS Configs
Ensure that HADOOP_HOME, HADOOP_CONF_DIR and/or YARN_CONF_DIR are correctly set.
Hadoop configs:
$HADOOP_CONF_DIR/core-site.xml
$HADOOP_CONF_DIR/hdfs-site.xml
To print the Hadoop Configuration from code:
val hadoopConf = sc.hadoopConfiguration.iterator()
while (hadoopConf.hasNext) {
println(hadoopConf.next().toString())
}
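To read a single value from the Hadoop Configuration (the key fs.defaultFS is just an example):
// returns null if the key is not set
val defaultFs = sc.hadoopConfiguration.get("fs.defaultFS")
println(defaultFs)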
Print Configs
To print the SparkConf:
sc.getConf.toDebugString
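Individual values can also be read from the SparkConf; the key and fallback below are only examples:
// the second argument is the fallback used when the key is not set
val executorMemory = sc.getConf.get("spark.executor.memory", "1g")
println(executorMemory)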
Spark SQL Configs
Set SQL Configs: SET key=value;
sqlContext.sql("SET spark.sql.shuffle.partitions=10;")
View SQL Configs:
val sqlConf = sqlContext.getAllConfs
sqlConf.foreach(x => println(x._1 + " : " + x._2))
Extra Classpath
- run spark-submit with --driver-class-path to augment the driver classpath
- set spark.executor.extraClassPath to augment the executor classpath
- or copy the jars to the $SPARK_HOME/jars folder
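For example (the jar paths, class name and application jar below are placeholders):
spark-submit \
  --driver-class-path /path/to/extra-lib.jar \
  --conf spark.executor.extraClassPath=/path/to/extra-lib.jar \
  --class com.example.MyApp \
  my-app.jar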
Logging Configs
Logging can be configured in $SPARK_CONF_DIR/log4j.properties (copy and rename log4j.properties.template if it does not exist yet).
To enable the DEBUG logging level for org.apache.spark.SparkEnv:
log4j.logger.org.apache.spark.SparkEnv=DEBUG
Other Configs
In spark-defaults.conf:
spark.yarn.dist.files $SPARK_HOME/conf/metrics.properties