Spark - YARN
yarn-cluster vs yarn-client
- yarn-cluster: the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.
- yarn-client: the driver runs in the client process, and the application master is only used for requesting resources from YARN.
--master
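- Spark standalone and Mesos modes: --master <master's address> (for example spark://host:7077 or mesos://host:5050)
- YARN: --master yarn. The ResourceManager address is picked up from the Hadoop configuration, that is, the directory that HADOOP_CONF_DIR or YARN_CONF_DIR points to.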
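As a rough illustration of how the two deploy modes are selected on the command line: in Spark 2.0 and later the master is plain `yarn` and the mode is chosen with `--deploy-mode`. The sketch below only prints the command instead of running it, so it works without a cluster. The helper name `submit_cmd`, the class `com.example.App` and the jar `app.jar` are hypothetical placeholders, not part of these notes.

```shell
#!/usr/bin/env bash
# Print (do not run) a spark-submit command line for the given deploy mode.
# com.example.App and app.jar are hypothetical placeholders.
submit_cmd() {
  local mode="$1"   # "cluster" or "client"
  case "$mode" in
    cluster|client) ;;
    *) echo "unknown deploy mode: $mode" >&2; return 1 ;;
  esac
  echo "spark-submit --master yarn --deploy-mode $mode --class com.example.App app.jar"
}

submit_cmd cluster   # driver runs inside the YARN application master
submit_cmd client    # driver runs in the local client process
```

To actually submit, run the printed line on a machine where the Hadoop client configuration is available.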
YARN Commands
Show logs of an application (requires log aggregation to be enabled)
$ yarn logs -applicationId <applicationId>
List all running nodes (only nodes with Node-State RUNNING)
$ yarn node -list
List all nodes (not only RUNNING nodes, but also LOST, DECOMMISSIONED, etc.)
$ yarn node -list -all
Check queue status
$ yarn queue -status default
Queue Name : default
State : RUNNING
Capacity : 4.9%
Current Capacity : 91.2%
Maximum Capacity : 50.0%
Default Node Label expression :
Accessible Node Labels : *
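The queue status output above is easy to check from a script. The sketch below is one possible way to pull a single field out of it with awk and warn when the queue is nearly full. The helper name `queue_field` and the 90% threshold are assumptions made for illustration, and the sample text is copied from the output above rather than read from a live cluster. It also assumes that "Current Capacity" is the share of the queue's own configured capacity that is in use.

```shell
#!/usr/bin/env bash
# Read `yarn queue -status <queue>` output on stdin and print the value of
# one field, without a trailing percent sign. Leading tabs/spaces are ignored.
queue_field() {
  local field="$1"
  awk -F' : ' -v f="$field" '
    { key = $1; sub(/^[ \t]+/, "", key) }
    key == f { val = $2; sub(/%$/, "", val); print val }'
}

# Sample copied from the output shown above; on a cluster, pipe the real
# command instead: yarn queue -status default | queue_field "Current Capacity"
sample='Queue Name : default
State : RUNNING
Capacity : 4.9%
Current Capacity : 91.2%
Maximum Capacity : 50.0%'

used=$(printf '%s\n' "$sample" | queue_field "Current Capacity")
echo "current capacity: ${used}%"

# Compare with awk because bash arithmetic is integer-only.
if awk -v u="$used" 'BEGIN { exit !(u > 90) }'; then
  echo "queue is above 90% of its configured capacity"
fi
```

Matching on the exact field name keeps "Capacity" and "Current Capacity" from being confused with each other.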