Optimizing Neural Networks with Linearly Combined Activation Functions: A Novel Approach to Enhance Gradient Flow and Learning Dynamics
Abstract
Activation functions are central to the efficacy of neural networks: they introduce nonlinearity and govern gradient propagation. Traditional activation functions, including Sigmoid, ReLU, Tanh, Leaky ReLU, and ELU, each offer distinct advantages but also exhibit limitations such as vanishing gradients and inactive ("dead") neurons. This research introduces a method that linearly combines these five activation functions, using linearly independent coefficients, to form a new hybrid activation function. The combined function aims to harmonize the strengths of each component, mitigate their individual weaknesses, and improve network training and generalization. Mathematical analysis, graphical visualization, and simulated experiments indicate that the combined activation function yields stronger gradient flow in deeper layers, faster convergence, and better generalization than any individual activation function. Computational benchmarks show a 25% faster convergence rate and a 15% improvement in validation accuracy on standard datasets, highlighting the advantages of the proposed approach.
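In symbols, the proposed hybrid is F(x) = a1*Sigmoid(x) + a2*ReLU(x) + a3*Tanh(x) + a4*LeakyReLU(x) + a5*ELU(x), where a1 through a5 are the combining coefficients. The abstract does not report the coefficient values, so the following is a minimal Python/NumPy sketch with hypothetical equal weights; the function name combined_activation, the coeffs argument, and the Leaky ReLU slope and ELU alpha defaults are illustrative assumptions, not the authors' fitted parameters.

import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes inputs to (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Rectified linear unit: zero for negative inputs.
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Leaky ReLU: a small negative slope keeps neurons from going inactive.
    return np.where(x > 0, x, slope * x)

def elu(x, alpha=1.0):
    # Exponential linear unit: smooth, saturating negative branch.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def combined_activation(x, coeffs=(0.2, 0.2, 0.2, 0.2, 0.2)):
    # Linear combination of the five base activations. The equal weights
    # here are placeholders; the paper's coefficients (whether fixed or
    # learned per layer) are not specified in the abstract.
    a1, a2, a3, a4, a5 = coeffs
    return (a1 * sigmoid(x) + a2 * relu(x) + a3 * np.tanh(x)
            + a4 * leaky_relu(x) + a5 * elu(x))

# Example: inspect the hybrid's shape on a small grid of inputs.
x = np.linspace(-3.0, 3.0, 7)
print(combined_activation(x))

Because each component is differentiable almost everywhere, the derivative of the combination is the same weighted sum of the component derivatives, which is the mechanism behind the improved gradient flow claimed in the abstract.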