Configure an Apache Spark application by setting Spark properties before the SparkContext or SparkSession starts. A Spark application can be configured with SparkConf, with spark-submit options, or with shared values in spark-defaults.conf. This Apache Spark tutorial explains the common properties and where each value should be set.
Configure Apache Spark Application using Spark Properties
The following properties and configuration methods can be used to tune a Spark application. For version-specific defaults and advanced configuration keys, refer to the official Spark Configuration reference.
- Spark configuration sources and priority
- Spark Application Name
- Number of Spark Driver Cores
- Spark Driver’s Maximum Result Size
- Spark Driver’s Memory
- spark-submit and spark-defaults.conf
- Executor memory, master, deploy mode, local directory, logging, and listeners
- Apache Spark configuration FAQs
Spark configuration sources and priority
Spark properties can be supplied from more than one place. When the same property is supplied more than once, values set directly on SparkConf take the highest priority. Values passed through spark-submit options or a properties file come next. Values in spark-defaults.conf are used when no higher-priority value is supplied.
| Configuration place | Best used for |
|---|---|
| SparkConf | Application name and runtime settings controlled by code. |
| spark-submit | Per-run deployment values such as master, deploy mode, driver memory, executor memory, and driver cores. |
| spark-defaults.conf | Cluster or team defaults shared by many Spark jobs. |
| spark-env.sh | Per-machine environment values such as local disk directories. |
| log4j2.properties | Spark logging format and log levels. |
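The priority order described above can be pictured as three maps merged in sequence, with the highest-priority source applied last. This is only a minimal sketch in plain Java: the class ConfigPrecedenceSketch and the method resolve are hypothetical names used for illustration, not part of Spark. It models which value wins, not when Spark reads the value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigPrecedenceSketch {

    // Merge three property sources. Later putAll calls overwrite earlier ones,
    // so the highest-priority source (values set in code) is applied last.
    static Map<String, String> resolve(Map<String, String> defaultsFile,
                                       Map<String, String> submitOptions,
                                       Map<String, String> sparkConfInCode) {
        Map<String, String> effective = new LinkedHashMap<>();
        effective.putAll(defaultsFile);
        effective.putAll(submitOptions);
        effective.putAll(sparkConfInCode);
        return effective;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for spark-defaults.conf,
        // spark-submit options, and SparkConf.set(...) calls.
        Map<String, String> defaults = Map.of(
                "spark.app.name", "DefaultName",
                "spark.executor.memory", "4g",
                "spark.driver.maxResultSize", "512m");
        Map<String, String> submit = Map.of(
                "spark.app.name", "DailySparkJob",
                "spark.executor.memory", "8g");
        Map<String, String> code = Map.of(
                "spark.app.name", "SparkApplicationName");

        Map<String, String> effective = resolve(defaults, submit, code);
        System.out.println(effective.get("spark.app.name"));             // SparkApplicationName
        System.out.println(effective.get("spark.executor.memory"));      // 8g
        System.out.println(effective.get("spark.driver.maxResultSize")); // 512m
    }
}
```

Each property is resolved independently: the name comes from code, the executor memory from the submit options, and the result-size limit from the shared defaults.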
Use explicit units for memory and time values. For example, write 512m, 2g, or 1t for memory where the property expects a size, and use supported time units such as ms, s, m, or h where the property expects a duration.
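To make the size suffixes concrete, the sketch below converts a few size strings into bytes, assuming binary multiples (1k = 1024 bytes), which is how Spark documents its size units. The class SizeUnitSketch and the method toBytes are hypothetical helpers written for this tutorial; they are not Spark's own parser and handle only single-letter suffixes.

```java
import java.util.Locale;

public class SizeUnitSketch {

    // Convert strings such as "512m", "2g", or "1t" into bytes (binary multiples).
    static long toBytes(String size) {
        String s = size.trim().toLowerCase(Locale.ROOT);
        if (s.length() < 2) {
            throw new IllegalArgumentException("Expected a number followed by a unit: " + size);
        }
        char unit = s.charAt(s.length() - 1);
        long number = Long.parseLong(s.substring(0, s.length() - 1));
        switch (unit) {
            case 'k': return number << 10;
            case 'm': return number << 20;
            case 'g': return number << 30;
            case 't': return number << 40;
            default:
                throw new IllegalArgumentException("Unsupported unit in: " + size);
        }
    }

    public static void main(String[] args) {
        System.out.println(toBytes("512m")); // 536870912
        System.out.println(toBytes("2g"));   // 2147483648
        System.out.println(toBytes("1t"));   // 1099511627776
    }
}
```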
Application Name
Property Name : spark.app.name
Default value : (none)
This is the name of your Spark application. It appears in the web UI and in the logs, which makes debugging easier when multiple Spark applications are running on the same machine or cluster.
The following example sets the Spark application name:
AppConfigureExample.java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

/**
 * Configure Apache Spark Application Name
 */
public class AppConfigureExample {
    public static void main(String[] args) {
        // configure spark
        SparkConf conf = new SparkConf().setMaster("local[2]");
        conf.set("spark.app.name", "SparkApplicationName");

        // start a spark context
        SparkContext sc = new SparkContext(conf);

        // print the configuration
        System.out.println(sc.getConf().toDebugString());

        // stop the spark context
        sc.stop();
    }
}
Output
spark.app.id=local-1501222987079
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.port=44103
spark.executor.id=driver
spark.master=local[2]
You can also set the application name during submission with --name. That approach is useful when the same application JAR runs with different job names in different environments.
./bin/spark-submit --class com.tutorialkart.spark.AppConfigureExample --master local[2] --name "SparkApplicationName" target/spark-app.jar
Number of Spark Driver Cores
Property Name : spark.driver.cores
Default value : 1
Exception : This property is considered only in cluster mode.
It represents the number of cores the driver process may use.
The following example sets the number of Spark driver cores.
AppConfigureExample.java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
    public static void main(String[] args) {
        // configure spark
        SparkConf conf = new SparkConf().setMaster("local[2]");
        conf.set("spark.app.name", "SparkApplicationName");
        conf.set("spark.driver.cores", "2");

        // start a spark context
        SparkContext sc = new SparkContext(conf);

        // print the configuration
        System.out.println(sc.getConf().toDebugString());

        // stop the spark context
        sc.stop();
    }
}
Output
spark.app.id=local-1501223394277
spark.app.name=SparkApplicationName
spark.driver.cores=2
spark.driver.host=192.168.1.100
spark.driver.port=42100
spark.executor.id=driver
spark.master=local[2]
The value is visible in the configuration output, but it affects the actual driver process only when the driver is launched in cluster mode. For submitted cluster jobs, prefer the submit-time option.
./bin/spark-submit --class com.tutorialkart.spark.AppConfigureExample --master spark://spark-master:7077 --deploy-mode cluster --driver-cores 2 target/spark-app.jar
Driver’s Maximum Result Size
Property Name : spark.driver.maxResultSize
Default value : 1g (meaning 1 GB)
Exception : Must be at least 1m, or 0 for no limit.
This is the upper limit on the total size of the serialized results of all partitions for each Spark action, such as collect(). Jobs are aborted if the total size exceeds the limit. Setting it to 0 means there is no upper limit, but that can allow large results to cause out-of-memory errors in the driver.
The following example sets the maximum result size for the Spark driver.
AppConfigureExample.java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
    public static void main(String[] args) {
        // configure spark
        SparkConf conf = new SparkConf().setMaster("local[2]");
        conf.set("spark.app.name", "SparkApplicationName");
        conf.set("spark.driver.maxResultSize", "200m");

        // start a spark context
        SparkContext sc = new SparkContext(conf);

        // print the configuration
        System.out.println(sc.getConf().toDebugString());

        // stop the spark context
        sc.stop();
    }
}
Output
spark.app.id=local-1501224103438
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.maxResultSize=200m
spark.driver.port=35249
spark.executor.id=driver
spark.master=local[2]
Use this setting as a protective limit. If a job frequently reaches this limit, avoid returning large datasets to the driver with actions such as collect(). Write the result to storage or aggregate the data further.
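The rule behind this limit can be pictured with a small stand-alone sketch: add up the serialized size of every partition's result and compare the total with the configured limit, treating 0 as "no limit". The class ResultSizeLimitSketch and the method exceedsLimit are hypothetical and only illustrate the rule described above; Spark's internal bookkeeping is not shown here.

```java
public class ResultSizeLimitSketch {

    // Returns true when the combined serialized size exceeds the limit.
    // A limit of 0 is treated as "no limit".
    static boolean exceedsLimit(long[] partitionResultBytes, long limitBytes) {
        if (limitBytes == 0) {
            return false;
        }
        long total = 0;
        for (long bytes : partitionResultBytes) {
            total += bytes;
        }
        return total > limitBytes;
    }

    public static void main(String[] args) {
        long limit = 200L << 20;                          // 200m expressed in bytes
        long[] small = {50L << 20, 60L << 20, 70L << 20}; // 180 MiB in total
        long[] large = {80L << 20, 80L << 20, 80L << 20}; // 240 MiB in total

        System.out.println(exceedsLimit(small, limit)); // false
        System.out.println(exceedsLimit(large, limit)); // true
        System.out.println(exceedsLimit(large, 0));     // false, because 0 disables the limit
    }
}
```

Note that no single partition in the second case is large; the limit applies to the sum across all partitions of one action.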
Driver’s Memory Usage
Property Name : spark.driver.memory
Default value : 1g (meaning 1 GB)
Exception : If the Spark application is submitted in client mode, this property has to be set with the command-line option --driver-memory or in the default properties file.
This is the amount of heap memory used by the Spark driver, where SparkContext is initialized. In client mode, the driver JVM has already started before your application code creates SparkConf, so setting spark.driver.memory inside the program does not resize the running driver JVM.
The following example sets spark.driver.memory on SparkConf. The value appears in the configuration output, but in client mode it does not change the heap of the driver JVM that is already running.
AppConfigureExample.java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
    public static void main(String[] args) {
        // configure spark
        SparkConf conf = new SparkConf().setMaster("local[2]");
        conf.set("spark.app.name", "SparkApplicationName");
        // recorded in the configuration; does not resize an already-running driver JVM
        conf.set("spark.driver.memory", "600m");

        // start a spark context
        SparkContext sc = new SparkContext(conf);

        // print the configuration
        System.out.println(sc.getConf().toDebugString());

        // stop the spark context
        sc.stop();
    }
}
Output
spark.app.id=local-1501225134344
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.memory=600m
spark.driver.port=43159
spark.executor.id=driver
spark.master=local[2]
For client-mode submissions, set driver memory before the JVM starts.
./bin/spark-submit --class com.tutorialkart.spark.AppConfigureExample --master yarn --deploy-mode client --driver-memory 2g target/spark-app.jar
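One way to check whether the driver memory setting actually took effect is to ask the running JVM for its maximum heap. The sketch below uses only the standard Runtime API; the class name DriverHeapCheck is a hypothetical helper, and the same call could be placed inside your driver's main method. The reported value may differ somewhat from the requested size, depending on the JVM.

```java
public class DriverHeapCheck {

    // Maximum heap the current JVM will attempt to use, in mebibytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // When run inside the driver, compare this number with the value
        // passed to --driver-memory.
        System.out.println("JVM max heap (MiB): " + maxHeapMiB());
    }
}
```

If the number stays the same after changing spark.driver.memory in code, the value was most likely set after the driver JVM had already started.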
Configure Spark application with spark-submit and spark-defaults.conf
spark-submit is the usual entry point for running a Spark application outside an IDE. Use dedicated options when available, and use --conf for Spark properties that do not have a dedicated option.
./bin/spark-submit --class com.tutorialkart.spark.AppConfigureExample --master yarn --deploy-mode cluster --name "DailySparkJob" --driver-memory 2g --executor-memory 4g --conf spark.driver.maxResultSize=512m target/spark-app.jar
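When submit commands are generated by a script or scheduler, the same rule can be applied programmatically: use dedicated options for the values that have one, and pass every other property as --conf key=value. The sketch below only assembles the argument list in plain Java; the class SubmitCommandSketch and the method buildCommand are hypothetical, and the command is printed rather than executed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SubmitCommandSketch {

    static List<String> buildCommand(String mainClass, String master, String deployMode,
                                     Map<String, String> extraConf, String jar) {
        List<String> cmd = new ArrayList<>();
        cmd.add("./bin/spark-submit");
        // Dedicated options come first.
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add("--master");
        cmd.add(master);
        cmd.add("--deploy-mode");
        cmd.add(deployMode);
        // Properties without a dedicated option are passed as --conf key=value.
        // TreeMap keeps the order of the generated arguments stable.
        for (Map.Entry<String, String> e : new TreeMap<>(extraConf).entrySet()) {
            cmd.add("--conf");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        // The application JAR goes last.
        cmd.add(jar);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "com.tutorialkart.spark.AppConfigureExample",
                "yarn",
                "cluster",
                Map.of("spark.driver.maxResultSize", "512m"),
                "target/spark-app.jar");
        System.out.println(String.join(" ", cmd));
    }
}
```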
Use spark-defaults.conf when several jobs should share the same defaults. Each line contains a property name and value separated by whitespace.
spark.master yarn
spark.submit.deployMode cluster
spark.executor.memory 4g
spark.driver.memory 2g
spark.driver.maxResultSize 512m
spark.logConf false
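The whitespace-separated layout shown above is compatible with the standard Java properties format, where a key ends at the first whitespace, = or : character. As a rough illustration, the sketch below reads a few such lines with java.util.Properties; the class DefaultsFileSketch and the method parse are hypothetical names, and the file content is inlined as a string so that the example runs without a Spark installation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class DefaultsFileSketch {

    // Parse properties-file content where each key and value are separated by whitespace.
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String content = String.join("\n",
                "# shared defaults (lines starting with # are comments)",
                "spark.master yarn",
                "spark.executor.memory 4g",
                "spark.driver.maxResultSize 512m");

        Properties props = parse(content);
        System.out.println(props.getProperty("spark.master"));               // yarn
        System.out.println(props.getProperty("spark.executor.memory"));      // 4g
        System.out.println(props.getProperty("spark.driver.maxResultSize")); // 512m
    }
}
```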
Common Spark properties for executors, master, local directory, logging, and listeners
| Spark property | Default | Purpose |
|---|---|---|
| spark.executor.memory | 1g | Heap memory for each executor process. Set it with --executor-memory for submitted jobs. |
| spark.master | (none) | Cluster manager URL, such as local[2], spark://host:7077, yarn, or k8s://https://host:port. |
| spark.submit.deployMode | client | Defines whether the driver runs on the submitting machine or inside the cluster. |
| spark.local.dir | /tmp | Local scratch space for shuffle files and spilled data. Cluster managers may override this with environment variables. |
| spark.logConf | false | Logs the effective SparkConf at startup when set to true. |
| spark.extraListeners | (none) | Comma-separated listener classes for custom monitoring, auditing, or metrics. |
| spark.log.callerContext | (none) | Concise application context written to YARN/HDFS audit logs when applicable. |
| spark.driver.supervise | false | Restarts the driver automatically in Spark standalone cluster mode if it fails with a non-zero status. |
After startup, verify the final values in the Spark web UI under the Environment tab. In local mode, the UI is usually available at http://localhost:4040 while the application is running. You can also print sc.getConf().toDebugString(), as shown in the examples above.
Apache Spark application configuration QA checklist
- The application name is clear enough to identify the job in Spark UI and logs.
- Driver memory is set with --driver-memory for client-mode jobs.
- Executor memory is configured separately from driver memory.
- spark.driver.maxResultSize is used as a safety limit, not as a substitute for better result handling.
- The effective Spark properties are verified in the Spark UI or startup logs.
FAQs about configuring Apache Spark application
How to configure Apache Spark application?
You can configure an Apache Spark application using SparkConf, spark-submit options, and spark-defaults.conf. Use SparkConf for application-controlled settings and spark-submit for deployment values such as master, deploy mode, driver memory, executor memory, and driver cores.
Which Spark configuration source has the highest priority?
Values set directly on SparkConf have the highest priority. Values passed through spark-submit or a properties file come next. Values in spark-defaults.conf are used when no higher-priority value is supplied.
Why does spark.driver.memory not change when set inside application code?
In client mode, the driver JVM has already started before the application creates SparkConf. Therefore, setting spark.driver.memory in code does not resize the JVM. Set driver memory with --driver-memory or a defaults/properties file before startup.
Where can I view the final Spark configuration values?
Open the Spark application web UI and check the Environment tab. You can also print sc.getConf().toDebugString() or enable spark.logConf=true while debugging configuration issues.
Should spark.driver.maxResultSize be set to 0?
Setting spark.driver.maxResultSize to 0 removes the limit, but it can expose the driver to out-of-memory errors. Prefer reducing the data returned to the driver or writing results to storage instead of collecting very large results.
Summary of Apache Spark application configuration
In this Apache Spark Tutorial, we learned how to configure a Spark application using SparkConf, spark-submit, and spark-defaults.conf. We also discussed common Spark properties including spark.app.name, spark.driver.cores, spark.driver.maxResultSize, spark.driver.memory, spark.executor.memory, spark.master, spark.submit.deployMode, spark.local.dir, spark.logConf, spark.extraListeners, spark.log.callerContext, and spark.driver.supervise.
TutorialKart.com