MIDACO Parallelization in Python (Spark)

MIDACO 6.0 Python Spark Gateway

midaco.py.txt

The gateway above uses Spark instead of Multiprocessing to execute several solution candidates in parallel.

MIDACO parallelization with Apache Spark is especially useful for (massive) parallelization on cluster and cloud computing systems that consist of individual virtual machines (VMs). Such systems are provided, for example, by Amazon EC2, Google Cloud, IBM Cloud, Digital Ocean and many academic institutions.

Set up a Spark Cluster (Linux)
Step 0.1	Set up several virtual machines (VMs), each with its own IP, for example: IP-VM1 = 100.100.10.01, IP-VM2 = 100.100.10.02, IP-VM3 = 100.100.10.03. Ensure that all VMs can access each other via SSH keys.
Step 0.2	Download Spark: https://spark.apache.org/downloads.html (for example spark-2.3.0-bin-hadoop2.7.tgz, pre-built for Apache Hadoop).
Step 0.3	Store a copy of the unpacked Spark folder on every VM. Name it, for example, "spark".
Step 0.4	Select one VM (e.g. 100.100.10.01) as the master node by executing on it the command: ./spark/sbin/start-master.sh --host 100.100.10.01
Step 0.5	Start every other VM as a slave node by executing on it the command: ./spark/sbin/start-slave.sh spark://100.100.10.01:7077
Step 0.6	The Spark cluster should now be up and running. Visiting the address 100.100.10.01:8080 in a web browser should now show the Spark master web page listing the connected nodes.
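
The commands from Step 0.4 and Step 0.5 follow a fixed pattern, so they can be generated for any number of machines. The following is a minimal Python sketch, assuming the Spark folder is named "spark" on every VM and the master listens on port 7077 as in the commands above. The helper name cluster_commands is hypothetical and not part of Spark or MIDACO; it only builds the command strings and does not execute anything.

```python
def cluster_commands(ips, spark_dir="./spark", port=7077):
    """Return {ip: command}, treating the first IP as the master node.

    Hypothetical helper: it only formats the commands from Step 0.4 and 0.5.
    """
    if not ips:
        raise ValueError("at least one IP address is required")
    master = ips[0]
    # Step 0.4: the master node is started on the first VM
    commands = {master: f"{spark_dir}/sbin/start-master.sh --host {master}"}
    # Step 0.5: every other VM is attached to the master
    for ip in ips[1:]:
        commands[ip] = f"{spark_dir}/sbin/start-slave.sh spark://{master}:{port}"
    return commands


vm_ips = ["100.100.10.01", "100.100.10.02", "100.100.10.03"]
for ip, cmd in cluster_commands(vm_ips).items():
    print(ip, "->", cmd)
```

Each printed command would then be run on the corresponding VM, for example over SSH.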
Running MIDACO on the Spark Cluster
Step 1	Download the MIDACO Python Spark gateway above and remove the .txt extension
Step 2	Download the appropriate library file midacopy.dll or midacopy.so here
Step 3	Download an example (e.g. example.py) and remove the .txt extension
Step 4	Execute MIDACO on the Spark cluster with a command like this: ./spark/bin/spark-submit --master spark://100.100.10.01:7077 example.py
Note: The advanced Text-I/O examples are particularly well suited for use with Spark
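
The command from Step 4 can also be assembled in Python, for instance when several examples are submitted from one driver script. This is a minimal sketch, assuming the same folder name "spark" and master address as above; the function name submit_command is hypothetical. The launch itself is left as a comment so the sketch has no side effects.

```python
def submit_command(master_ip, script="example.py", spark_dir="./spark", port=7077):
    """Build the spark-submit argument list from Step 4 (hypothetical helper)."""
    return [
        f"{spark_dir}/bin/spark-submit",
        "--master", f"spark://{master_ip}:{port}",
        script,
    ]


cmd = submit_command("100.100.10.01")
print(" ".join(cmd))

# To actually launch the job from Python, the list could be passed to
# the standard library:
#   import subprocess
#   subprocess.run(cmd, check=True)
```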

Screenshot of MIDACO running on a Spark cluster with 32 quad-core CPUs

A comprehensive step-by-step instruction for MIDACO on Spark is [ under construction ]

These are some preliminary bash scripts for a 36-machine Spark cluster:

spark_setup_commands.sh

run_example_on_spark.sh

Note that the Spark-relevant commands inside midaco.py.txt itself are minimal:

[Line 24]   from pyspark import SparkContext
[Line 237]   sc = SparkContext(appName="MIDACO-SPARK-PARALLEL")
[Line 254]   rdd = sc.parallelize( A , p ).map(lambda x: problem_function(x))
[Line 256]   B = rdd.take(p)
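
The pattern in the lines above distributes a block A of p solution candidates over p partitions, applies problem_function to each candidate on the cluster, and collects the p results in B. The following is a minimal, standalone sketch of that pattern and not the gateway code itself: the wrapper evaluate_candidates, the objective sphere and the candidate values are assumptions made for illustration, and the real problem_function in a MIDACO example may return more than a single value.

```python
def evaluate_candidates(sc, problem_function, A, p):
    """Evaluate the candidates in A in parallel; results keep the order of A."""
    # One partition per candidate, as in sc.parallelize( A , p ) above
    rdd = sc.parallelize(A, p).map(lambda x: problem_function(x))
    # take(p) returns the first p results in their original order
    return rdd.take(p)


def sphere(x):
    # Hypothetical objective, used here only as a stand-in problem function
    return sum(v * v for v in x)


def main():
    # Imported here so the helper above can be read without pyspark installed
    from pyspark import SparkContext

    sc = SparkContext(appName="MIDACO-SPARK-PARALLEL")
    try:
        A = [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]  # illustrative candidates
        B = evaluate_candidates(sc, sphere, A, len(A))
        print(B)
    finally:
        sc.stop()
```

Calling main() at the end of a script and submitting that script with spark-submit, as in Step 4, would run the evaluation on the cluster. Since all p results are needed, rdd.collect() would be an alternative to rdd.take(p) in this sketch.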
