International Journal of Technology (IJTech), Vol 17, No 1 (2026)

A Zeroth-Order Stochastic Gradient Descent Method for Communication-Efficient Federated Learning

Hodaka Nishi, Shiro Yano, Megumi Miyashita, Shunta Onishi, Yuta Goto, Toshiyuki Kondo


Cite this article as:
Nishi, H., Yano, S., Miyashita, M., Onishi, S., Goto, Y., & Kondo, T. (2026). A zeroth-order stochastic gradient descent method for communication-efficient federated learning. International Journal of Technology, 17 (1), 250–260.


Hodaka Nishi, Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan
Shiro Yano, InfoTech Div., Toyota Motor Corporation, Otemachi Bldg. 6F, 1-6-1 Otemachi, Chiyoda-ku, Tokyo 100-0004, Japan
Megumi Miyashita, Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan
Shunta Onishi, Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan
Yuta Goto, Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan
Toshiyuki Kondo, Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan

Abstract

Federated learning (FL) has emerged as a key paradigm for privacy-preserving machine learning on decentralized data. However, substantial communication costs often hinder its practical application, especially as deep learning models scale to millions or billions of parameters. This communication bottleneck becomes particularly acute in heterogeneous networks with resource-constrained clients. To address this challenge, this study proposes a novel FL framework that leverages black-box optimization, specifically the zeroth-order (ZO) method, to reduce communication overhead. The proposed method, named ZO-FedSGD, reframes the learning process to eliminate the need to transmit high-dimensional model parameters. Instead, each communication round exchanges only a constant number of scalar values, namely a random seed and function evaluations, making the communication cost independent of the model size. Extensive experiments compared ZO-FedSGD with the existing FedAvg algorithm on the MNIST dataset, evaluating model accuracy and total communication efficiency. The results reveal a trade-off: ZO-FedSGD required more rounds to converge and achieved a slightly lower final accuracy, but it demonstrated superior communication efficiency. To reach 90% accuracy, ZO-FedSGD required approximately 10⁴ communicated parameters, compared with 10⁶ for FedAvg, a two-order-of-magnitude reduction. In conclusion, this study validates ZO-FedSGD as a viable and highly efficient alternative for FL in communication-constrained scenarios, offering a new direction for designing scalable FL systems and a promising approach to the statistical heterogeneity problem.

Black-box optimization; Federated learning; Two-point estimation
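
To make the protocol concrete, the following is a minimal single-client Python sketch of the seed-plus-scalars exchange described in the abstract, built on a standard two-point zeroth-order gradient estimate. The function names, the Gaussian perturbation, and all hyperparameters are illustrative assumptions, not the authors' reference implementation.

import numpy as np

# Minimal sketch of a ZO-FedSGD-style communication round (illustrative,
# not the authors' reference implementation). Each round, the server
# broadcasts one integer seed and the client uploads two scalars, so the
# traffic is independent of the model dimension d.

def client_evaluate(loss_fn, theta, seed, mu=1e-3):
    """Client: regenerate the perturbation u from the shared seed and
    return the two loss values needed by the two-point estimator."""
    u = np.random.default_rng(seed).standard_normal(theta.shape)
    return loss_fn(theta + mu * u), loss_fn(theta - mu * u)

def server_update(theta, seed, f_plus, f_minus, mu=1e-3, lr=0.02):
    """Server: rebuild the same u from the seed and apply the two-point
    gradient estimate g = (f_plus - f_minus) / (2 * mu) * u."""
    u = np.random.default_rng(seed).standard_normal(theta.shape)
    g = (f_plus - f_minus) / (2.0 * mu) * u
    return theta - lr * g

# Toy run on a d-dimensional quadratic. The step size is kept small
# (roughly 1/d) because two-point estimates are noisy in high dimensions.
d = 50
theta = np.ones(d)
loss = lambda w: 0.5 * np.dot(w, w)
for round_id in range(500):
    seed = round_id                                       # downlink: one integer
    f_plus, f_minus = client_evaluate(loss, theta, seed)  # uplink: two scalars
    theta = server_update(theta, seed, f_plus, f_minus)
print(f"loss after 500 rounds: {loss(theta):.5f}")        # decreases from 25.0

With multiple clients, the server could average the uploaded scalar evaluations before updating; either way the per-round payload stays a constant handful of numbers rather than the d model parameters that FedAvg-style aggregation transmits.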
