Physics Data Reduction With Spark 20190124
Graph 1: Runtime performance in minutes for different input sizes (22 TB to 110 TB) with 407 Spark executors, 2 cores per Spark executor, and 7 GB per Spark executor. The “readAhead” connector buffer is set to 32 MB.

Graph 2: Runtime performance in minutes for different input sizes (22 TB to 1 PB) with 100 Spark executors, 8 cores per Spark executor, and 7 GB per Spark executor. The “readAhead” connector buffer is set to 64 KB, which drastically improved the performance compared to Graph 1.

Graph 5: Number of concurrent active tasks throughout job execution for 1 PB of input with a 64 KB “readAhead” buffer, 100 Spark executors, and 8 logical cores per executor.

EOS Service
A disk-based, low-latency storage service with a highly scalable hierarchical namespace, which enables data access through the XRootD protocol. It provides storage for both physics and user use cases via different service instances such as EOSPUBLIC, EOSCMS, etc.
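The executor and buffer settings quoted in the graph captions map directly onto Spark configuration properties. The sketch below is a minimal illustration, not the poster's actual job: the executor count, cores per executor, and memory follow the Graph 2 caption, while the readAhead property key, the EOS input path, and the Parquet data format are assumptions made for the example (the real property name depends on the Hadoop-XRootD connector release, and real datasets live under instance-specific EOS prefixes).

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a reduction job reading from EOS over the XRootD protocol.
// Resource shape mirrors the Graph 2 caption; path, property key, and format are
// illustrative assumptions rather than the configuration used for the measurements.
object DataReductionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("physics-data-reduction-sketch")
      // 100 executors, 8 cores and 7 GB of memory per executor (as in Graph 2).
      .config("spark.executor.instances", "100")
      .config("spark.executor.cores", "8")
      .config("spark.executor.memory", "7g")
      // Hypothetical property name for the connector's 64 KB "readAhead" buffer;
      // check the Hadoop-XRootD connector documentation for the exact key.
      .config("spark.hadoop.fs.xrootd.readahead.size", (64 * 1024).toString)
      .getOrCreate()

    // Hypothetical input path on the EOSPUBLIC instance, accessed via root:// URLs.
    val inputPath = "root://eospublic.cern.ch//eos/opendata/example/dataset"

    // Placeholder reduction step: read the dataset (assuming the XRootD connector is
    // on the classpath so Spark can resolve the root:// scheme), keep a subset of the
    // rows, and write the reduced output next to the input.
    val df = spark.read.parquet(inputPath)
    df.limit(1000).write.mode("overwrite").parquet(inputPath + "-reduced")

    spark.stop()
  }
}
```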