MLlib is an Apache Spark library that provides many machine learning algorithms and data processing utilities. Although the default configuration of these algorithms yields satisfactory results for practitioners, further tuning is often needed to improve resource efficiency. Furthermore, tuned MLlib algorithms may run faster than those using default configurations. However, this improvement depends on several factors, including machine settings, dataset design, and operating system preferences. Previous studies have generally focused on developing sophisticated tuners for MLlib and on evaluating algorithm-focused optimizers for their competitiveness. Although derivative-based and model-free optimizers have been modified for use with MLlib, sampling-based optimizers are generally overlooked. To fill this research gap, this study empirically compares sampling-based and model-free techniques for tuning MLlib. First, Monte Carlo and Cross-Entropy sampling algorithms are adapted to optimize MLlib algorithms. Subsequently, model-free techniques, including grid and random search algorithms, are compared with these sampling-based algorithms. Through extensive experimentation, their advantages and limitations are highlighted. Finally, threats to validity and future directions for unlocking the tuning potential of Apache Spark are discussed by interpreting performance bottlenecks and promising areas for optimization.
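To make the contrast between the two families concrete, the following is a minimal sketch, not the study's implementation. It compares a model-free random search with a sampling-based cross-entropy search on a toy objective. The function `toy_cost`, its optimum, the parameter bounds, and all budget settings are hypothetical stand-ins; in practice the cost would be a measured quantity such as the runtime of an MLlib job under a given configuration.

```python
import random


def toy_cost(params):
    # Hypothetical stand-in for a measured tuning cost (e.g. job runtime).
    # Assumption: two normalized parameters with an optimum at (0.3, 0.7).
    x, y = params
    return (x - 0.3) ** 2 + (y - 0.7) ** 2


def random_search(cost, bounds, n_trials, rng):
    # Model-free baseline: draw each candidate uniformly and keep the best.
    best = None
    for _ in range(n_trials):
        cand = [rng.uniform(lo, hi) for lo, hi in bounds]
        c = cost(cand)
        if best is None or c < best[1]:
            best = (cand, c)
    return best


def cross_entropy_search(cost, bounds, n_iters, pop_size, elite_frac, rng):
    # Sampling-based search: fit a per-dimension Gaussian to the elite
    # samples of each iteration and sample the next population from it.
    means = [(lo + hi) / 2 for lo, hi in bounds]
    stds = [(hi - lo) / 2 for lo, hi in bounds]
    n_elite = max(2, int(pop_size * elite_frac))
    best = None
    for _ in range(n_iters):
        pop = []
        for _ in range(pop_size):
            # Sample from the current Gaussian and clip to the bounds.
            cand = [
                min(hi, max(lo, rng.gauss(m, s)))
                for (lo, hi), m, s in zip(bounds, means, stds)
            ]
            pop.append((cand, cost(cand)))
        pop.sort(key=lambda p: p[1])
        if best is None or pop[0][1] < best[1]:
            best = pop[0]
        elites = [p[0] for p in pop[:n_elite]]
        for d in range(len(bounds)):
            vals = [e[d] for e in elites]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            means[d] = mean
            # Keep a small floor so the distribution never collapses fully.
            stds[d] = max(var ** 0.5, 1e-6)
    return best


if __name__ == "__main__":
    bounds = [(0.0, 1.0), (0.0, 1.0)]
    # Both searches receive the same budget of 600 cost evaluations.
    rs = random_search(toy_cost, bounds, 600, random.Random(0))
    ce = cross_entropy_search(toy_cost, bounds, 20, 30, 0.2, random.Random(0))
    print("random search best cost:", rs[1])
    print("cross-entropy best cost:", ce[1])
```

Both searches are given the same evaluation budget so that the comparison is about how samples are placed rather than how many are drawn. On a real MLlib workload each evaluation is expensive and noisy, so the relative ranking seen on a smooth toy function should not be assumed to carry over.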
M. Maruf Ozturk
ADBA computer science.
Suleyman Demirel University
M. Maruf Ozturk studied this question.
www.synapsesocial.com/papers/69a75acdc6e9836116a2118d — DOI: https://doi.org/10.69882/adba.cs.2026012