Abstract
Like other emerging fields, Stream Processing Engines (SPEs) pose several challenges to the researchers such as resource awareness, dynamic configurations, heterogeneous clusters, and load balancing. All of these aspects play a major role in the job scheduling process. Inefficiency in any of them causes problems for achieving the maximum throughput. SPEs must contemplate other aspects like resource provisioning, job’s computation requirement, physical distance between communicating nodes, etc. Currently, SPEs ignore topology’s structure as well as inter-executor traffic while scheduling. Due to this, frequently communicating tasks may end up at different computing nodes which increases network latency. In this paper, A3-Storm, a scheduler, based on topology and traffic is proposed that optimizes resource usage for heterogeneous clusters. The aim is to improve efficiency using resource-aware task assignments that results in enhanced throughput and resource utilization. A3-Storm schedules topology using inter-executor traffic and supervisor node’s computing power. A3-Storm is divided into two phases: in the first phase, executors are logically grouped to minimize inter-group communication traffic according to the topology structure or inter-executor traffic. In the second phase, these groups are assigned to physical nodes starting from the most powerful node. Apache Storm (a popular open-source SPE) is used for the implementation of A3-Storm. Results are generated with the help of 2 benchmark topologies, and results are compared with 3 state-of-the-art algorithms. Extensive experiment results show up to 25% and 12% improvement in throughput as compared to the default Storm scheduler and resource-aware scheduler, respectively, with a significant amount of resource savings through consolidation.




























Similar content being viewed by others
References
Chen M, Mao S, Liu Y (2014) Big data: a survey. Mob Netw Appl 19:171–209. https://doi.org/10.1007/s11036-013-0489-0
Alotaibi S, Mehmood R, Katib I (2020) The role of big data and twitter data analytics in healthcare supply chain management. Springer, Cham, pp 267–279
Casado R, Younas M (2015) Emerging trends and technologies in big data processing. Concurr Comput 27:2078–2091. https://doi.org/10.1002/cpe.3398
Apache (2015) Storm. Apache storm. In: Apache. http://storm-project.net/. Accessed Mar 2020
Apache Software Foundation (2014) S4 incubation status—Apache incubator. http://incubator.apache.org/projects/s4.html. Accessed Mar 2020
Apache Software Foundation (2018) Apache Spark™—unified analytics engine for big data. In: Apache Spark. https://spark.apache.org/. Accessed Mar 2020
Philip Chen CL, Zhang CY (2014) Data-intensive applications, challenges, techniques and technologies: a survey on Big Data. Inf Sci (NY) 275:314–347. https://doi.org/10.1016/j.ins.2014.01.015
Apache Storm Scheduler. http://storm.apache.org/releases/2.0.0/Storm-Scheduler.html. Accessed 19 Mar 2020
Aniello L, Baldoni R, Querzoni L (2013) Adaptive online scheduling in storm. In: DEBS 2013—Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems. ACM, pp 207–218
Eskandari L, Mair J, Huang Z, Eyers D (2018) Poster: iterative scheduling for distributed stream processing systems. In: DEBS 2018—Proceedings of the 12th ACM International Conference on Distributed and Event-Based Systems. ACM Press, New York, New York, USA, pp 234–237
Eskandari L, Mair J, Huang Z, Eyers D (2018) T3-Scheduler: a topology and Traffic aware two-level Scheduler for stream processing systems in a heterogeneous cluster. Futur Gener Comput Syst 89:617–632. https://doi.org/10.1016/j.future.2018.07.011
Light J (2017) Energy usage profiling for green computing. In: Proceeding of the IEEE International Conference on Computing, Communication and Automation, ICCCA 2017 2017-Janua, pp 1287–1291. https://doi.org/10.1109/CCAA.2017.8230017
Liu X, Buyya R (2018) D-Storm: dynamic resource-efficient scheduling of stream processing applications. In: Proceedings of the International Conference on Parallel Distributed System—ICPADS 2017-Decem, pp 485–492. https://doi.org/10.1109/ICPADS.2017.00070
Fan J, Chen H, Hu F (2016) Adaptive task scheduling in storm. In: Proceedings of 2015 4th International Conference on Computer Science and Network Technology, ICCSNT 2015. IEEE, pp 309–314
Peng B, Hosseini M, Hong Z, et al (2015) R-storm: resource-aware scheduling in storm. In: Middleware 2015—Proceedings of the 16th Annual Middleware Conference. ACM, pp 149–161
Weng Z, Guo Q, Wang C et al (2017) AdaStorm: resource efficient storm with adaptive configuration. In: Proceedings of the International Conference on Data Engineering. IEEE, pp 1363–1364
Li C, Zhang J (2017) Real-time scheduling based on optimized topology and communication traffic in distributed real-time computation platform of storm. J Netw Comput Appl 87:100–115. https://doi.org/10.1016/j.jnca.2017.03.007
Eskandari L, Huang Z, Eyers D (2016) P-scheduler: adaptive hierarchical scheduling in Apache Storm. In: ACM International Conference Proceeding Series. ACM, pp 1–10
Xu J, Chen Z, Tang J, Su S (2014) T-storm: traffic-aware online scheduling in storm. In: Proceedings of the International Conference on Distributed Computing Systems. IEEE, pp 535–544
Apache Zookeeper (2016) Apache ZooKeeper—Home. https://zookeeper.apache.org/. Accessed Mar 2020
Noll MG (2012) Understanding the parallelism of a storm topology what is storm ? http://storm.apache.org/releases/2.1.0/Understanding-the-parallelism-of-a-Storm-topology.html. Accessed Mar 2020
Van Der Veen JS, Van Der Waaij B, Lazovik E et al (2015) Dynamically scaling apache storm for the analysis of streaming data. In: Proceedings of the 2015 IEEE 1st International Conference on Big Data Computing Service and Applications, BigDataService 2015. IEEE, pp 154–161
Zhang J, Li C, Zhu L, Liu Y (2017) The real-time scheduling strategy based on traffic and load balancing in storm. In: Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications, 14th IEEE International Conference on Smart City and 2nd IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2016. IEEE, pp 372–379
Smirnov P, Melnik M, Nasonov D (2017) Performance-aware scheduling of streaming applications using genetic algorithm. Procedia Comput Sci 108:2240–2249. https://doi.org/10.1016/j.procs.2017.05.249
Madsen KGS, Zhou Y (2015) Dynamic resource management in a Massively Parallel Stream Processing Engine. In: International Conference on Information and Knowledge Management, Proceedings. ACM, pp 13–22
Khalid YN, Aleem M, Prodan R et al (2018) E-OSched: a load balancing scheduler for heterogeneous multicores. J Supercomput 74:5399–5431. https://doi.org/10.1007/s11227-018-2435-1
Floating Point Operations Per Second (FLOPS) Definition. https://techterms.com/definition/flops. Accessed Mar 2020
Storm A (2014) Isolation scheduler. In: GitHub. https://storm.incubator.apache.org/2013/01/11/storm082-released.html. Accessed Mar 2020
Storm topology explained using word count topology example | CoreJavaGuru. http://www.corejavaguru.com/bigdata/storm/word-count-topology. Accessed 19 Mar 2020
Exclamation topology. https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/ExclamationTopology.java. Accessed 19 Mar 2020
Karanth S (2014) Mastering Hadoop. https://books.google.com.pk/books?id=IdEGBgAAQBAJ&pg=PT374&lpg=PT374&dq=Exclamation+Topology&source=bl&ots=lVJVSSfjyR&sig=ACfU3U3AEcjALh6n1V8XaQqheeZ0teZ_IQ&hl=en&sa=X&ved=2ahUKEwiolfDIyKHoAhUvSRUIHRyQAQwQ6AEwCXoECC8QAQ#v=onepage&q=ExclamationTopology&f=. Accessed Mar 2020
Hussain A, Aleem M, Khan A et al (2018) RALBA: a computation-aware load balancing scheduler for cloud computing. Clust Comput 21:1667–1680. https://doi.org/10.1007/s10586-018-2414-6
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Muhammad, A., Aleem, M. A3-Storm: topology-, traffic-, and resource-aware storm scheduler for heterogeneous clusters. J Supercomput 77, 1059–1093 (2021). https://doi.org/10.1007/s11227-020-03289-9
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-020-03289-9