start the master

./sbin/start-master.sh

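this command starts the master service. it has two points of interaction that are relevant here: a web ui and a master url (of the form spark://host:port) that workers and jobs connect to. the web ui can be accessed at localhost:8080, and the master url is shown at the top of that page (port 7077 by default).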
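
a quick way to confirm the master is up without opening a browser. this is a minimal sketch, assuming the web ui also serves a json view of its status at localhost:8080/json. the field names used here (url, status, workers) are assumptions about that payload, and fetch_master_status and summarize are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_master_status(ui_url="http://localhost:8080"):
    """Fetch the master's status as a dict (assumes a json view at <ui_url>/json)."""
    with urlopen(ui_url.rstrip("/") + "/json", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(status):
    """Pick out the master url, state and worker count.

    The keys read here are assumptions; missing keys fall back to None / 0.
    """
    return {
        "master_url": status.get("url"),
        "state": status.get("status"),
        "workers": len(status.get("workers", [])),
    }


# usage, with the master running:
#   print(summarize(fetch_master_status()))
```

if the keys differ in your spark version, print the raw dict from fetch_master_status() and adjust summarize to match.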

start the slaves (workers). create three instances (SPARK_WORKER_INSTANCES=3), limiting each to 2 cores (-c 2) and 4G of ram (-m 4G).

SPARK_WORKER_INSTANCES=3 ./sbin/start-slave.sh spark://your-computer-name.local:7077 -c 2 -m 4G
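
the limits are per instance, so the totals the master can hand out are the per-worker values times the instance count. a small sketch of that arithmetic (cluster_totals is a hypothetical helper, not part of spark):

```python
def cluster_totals(instances, cores_per_worker, mem_gb_per_worker):
    """Return (total cores, total ram in GB) across all worker instances."""
    return instances * cores_per_worker, instances * mem_gb_per_worker


# values from the command above: 3 instances, -c 2, -m 4G
cores, mem_gb = cluster_totals(3, 2, 4)
print(f"{cores} cores, {mem_gb}G ram")  # 6 cores, 12G ram
```

it is worth checking that these totals fit on the machine, since all three instances run on the same host here.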

load the python shell, connected to the master

./bin/pyspark --master spark://your-computer-name.local:7077
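
the same connection can be made from a standalone script instead of the shell. a minimal sketch, assuming spark 2.0 or later (for SparkSession) and that pyspark is importable. master_url and run_smoke_job are hypothetical helper names, and "smoke-test" is an arbitrary app name.

```python
def master_url(host, port=7077):
    """Build the spark://host:port url that workers and jobs connect to."""
    return f"spark://{host}:{port}"


def run_smoke_job(master):
    """Connect to the master, sum 0..99 on the cluster, and return the result."""
    # imported here so the helper above works without pyspark installed
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master(master).appName("smoke-test").getOrCreate()
    try:
        # 6 partitions, one per core across the three 2-core workers
        return spark.sparkContext.parallelize(range(100), 6).sum()
    finally:
        spark.stop()


# usage, with the master and workers running:
#   print(run_smoke_job(master_url("your-computer-name.local")))  # 4950
```

while the job runs it should show up under running applications in the web ui at localhost:8080.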
