A Model Decomposition-in-Time of Recurrent Neural Networks: A Feasibility Analysis
Abstract
In the context of Recurrent Neural Networks, minimization of the loss function (LF) accounts for most of the training overhead. Following parallel-in-time approaches, we introduce an ab-initio decomposition along the time direction. The key point of our approach is the definition of local objective functions, which allows us to overcome the sequential nature of the network and the management of dependencies between time steps. In particular, we define local RNNs by adding to the local objective functions a suitable overlapping operator that guarantees matching between adjacent subsequences. In this way, we obtain a fully parallelizable decomposition of the RNN whose implementation avoids global synchronizations and pipelining. Nearest-neighbour communications guarantee the algorithm's convergence. We hope that these findings encourage readers to extend the framework according to their specific application requirements.
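The abstract does not give the precise form of the local objectives or of the overlapping operator, so the following is only a minimal Python/NumPy sketch of the idea it describes: the time axis is split into overlapping subsequences, each subsequence carries its own local RNN and local loss, and a quadratic matching penalty on the shared hidden states (an assumed stand-in for the overlapping operator) couples adjacent subsequences, so that only boundary states need to be exchanged between nearest neighbours. All names and parameters here (split_with_overlap, local_objective, mu, etc.) are illustrative, not the authors' exact formulation.

```python
import numpy as np

def split_with_overlap(T, n_sub, overlap):
    """Partition time indices 0..T-1 into n_sub contiguous subsequences
    that share `overlap` steps with each neighbour (illustrative only)."""
    core = T // n_sub
    chunks = []
    for k in range(n_sub):
        start = max(0, k * core - overlap)
        end = min(T, (k + 1) * core + overlap)
        chunks.append((start, end))
    return chunks

def rnn_forward(x, h0, W, U, b):
    """Simple Elman-type recurrence: h_t = tanh(W h_{t-1} + U x_t + b)."""
    h, hs = h0, []
    for t in range(x.shape[0]):
        h = np.tanh(W @ h + U @ x[t] + b)
        hs.append(h)
    return np.stack(hs)

def local_objective(x_chunk, y_chunk, h0, W, U, b, V,
                    h_left=None, h_right=None, mu=1.0):
    """Local loss on one subsequence: a data-fit term plus a quadratic
    penalty matching the hidden states shared with the neighbours
    (assumed here to play the role of the overlapping operator)."""
    hs = rnn_forward(x_chunk, h0, W, U, b)
    preds = hs @ V.T
    loss = np.mean((preds - y_chunk) ** 2)
    if h_left is not None:   # match first hidden state with left neighbour
        loss += mu * np.sum((hs[0] - h_left) ** 2)
    if h_right is not None:  # match last hidden state with right neighbour
        loss += mu * np.sum((hs[-1] - h_right) ** 2)
    return loss, hs

# Toy usage: each local objective below can be evaluated in parallel;
# only the boundary hidden states are exchanged with nearest neighbours.
rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 60, 3, 5, 2
x = rng.normal(size=(T, d_in))
y = rng.normal(size=(T, d_out))
W = 0.1 * rng.normal(size=(d_h, d_h))
U = 0.1 * rng.normal(size=(d_h, d_in))
b = np.zeros(d_h)
V = 0.1 * rng.normal(size=(d_out, d_h))

chunks = split_with_overlap(T, n_sub=3, overlap=2)
boundary = {k: (None, None) for k in range(len(chunks))}

states = []
for k, (s, e) in enumerate(chunks):
    h_left, h_right = boundary[k]
    loss_k, hs_k = local_objective(x[s:e], y[s:e], np.zeros(d_h),
                                   W, U, b, V, h_left, h_right)
    states.append(hs_k)
    print(f"subsequence {k}: steps [{s},{e}), local loss = {loss_k:.4f}")

# Nearest-neighbour exchange of overlap states for the next outer iteration.
for k in range(len(chunks)):
    h_left = states[k - 1][-1] if k > 0 else None
    h_right = states[k + 1][0] if k < len(chunks) - 1 else None
    boundary[k] = (h_left, h_right)
```

In this sketch the outer loop over subsequences carries no dependency between iterations, which is what makes the decomposition parallelizable; coupling enters only through the exchanged boundary states, mirroring the nearest-neighbour communication described in the abstract.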
Article Details
Copyright (c) 2025 D’Amore L.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.