A Model Decomposition-in-Time of Recurrent Neural Networks: A Feasibility Analysis
Abstract
In the context of Recurrent Neural Networks, minimization of the loss function (LF) accounts for most of the training overhead. Following parallel-in-time approaches, we introduce an ab-initio decomposition along the time direction. The key point of our approach is the definition of local objective functions, which allows us to overcome the sequential nature of the network and the management of dependencies between time steps. In particular, we define local RNNs by adding to the local objective functions a suitable overlapping operator that guarantees matching between adjacent subsequences. In this way, we obtain a fully parallelizable decomposition of the RNN whose implementation avoids global synchronizations and pipelining. Nearest-neighbour communications guarantee the algorithm's convergence. We hope that these findings encourage readers to extend the framework according to their specific application requirements.
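The abstract does not give the precise form of the local objectives or of the overlapping operator, so the following is only a minimal Python/NumPy sketch of the idea it describes: the time axis is split into overlapping subsequences, each subsequence carries its own local RNN and local loss, and a quadratic matching penalty on the shared hidden states (an assumed stand-in for the overlapping operator) couples adjacent subsequences, so that only boundary states need to be exchanged between nearest neighbours. All names and parameters here (split_with_overlap, local_objective, mu, etc.) are illustrative, not the authors' exact formulation.

```python
import numpy as np

def split_with_overlap(T, n_sub, overlap):
    """Partition time indices 0..T-1 into n_sub contiguous subsequences
    that share `overlap` steps with each neighbour (illustrative only)."""
    core = T // n_sub
    chunks = []
    for k in range(n_sub):
        start = max(0, k * core - overlap)
        end = min(T, (k + 1) * core + overlap)
        chunks.append((start, end))
    return chunks

def rnn_forward(x, h0, W, U, b):
    """Simple Elman-type recurrence: h_t = tanh(W h_{t-1} + U x_t + b)."""
    h, hs = h0, []
    for t in range(x.shape[0]):
        h = np.tanh(W @ h + U @ x[t] + b)
        hs.append(h)
    return np.stack(hs)

def local_objective(x_chunk, y_chunk, h0, W, U, b, V,
                    h_left=None, h_right=None, mu=1.0):
    """Local loss on one subsequence: a data-fit term plus a quadratic
    penalty matching the hidden states shared with the neighbours
    (assumed here to play the role of the overlapping operator)."""
    hs = rnn_forward(x_chunk, h0, W, U, b)
    preds = hs @ V.T
    loss = np.mean((preds - y_chunk) ** 2)
    if h_left is not None:   # match first hidden state with left neighbour
        loss += mu * np.sum((hs[0] - h_left) ** 2)
    if h_right is not None:  # match last hidden state with right neighbour
        loss += mu * np.sum((hs[-1] - h_right) ** 2)
    return loss, hs

# Toy usage: each local objective below can be evaluated in parallel;
# only the boundary hidden states are exchanged with nearest neighbours.
rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 60, 3, 5, 2
x = rng.normal(size=(T, d_in))
y = rng.normal(size=(T, d_out))
W = 0.1 * rng.normal(size=(d_h, d_h))
U = 0.1 * rng.normal(size=(d_h, d_in))
b = np.zeros(d_h)
V = 0.1 * rng.normal(size=(d_out, d_h))

chunks = split_with_overlap(T, n_sub=3, overlap=2)
boundary = {k: (None, None) for k in range(len(chunks))}

states = []
for k, (s, e) in enumerate(chunks):
    h_left, h_right = boundary[k]
    loss_k, hs_k = local_objective(x[s:e], y[s:e], np.zeros(d_h),
                                   W, U, b, V, h_left, h_right)
    states.append(hs_k)
    print(f"subsequence {k}: steps [{s},{e}), local loss = {loss_k:.4f}")

# Nearest-neighbour exchange of overlap states for the next outer iteration.
for k in range(len(chunks)):
    h_left = states[k - 1][-1] if k > 0 else None
    h_right = states[k + 1][0] if k < len(chunks) - 1 else None
    boundary[k] = (h_left, h_right)
```

In this sketch the outer loop over subsequences carries no dependency between iterations, which is what makes the decomposition parallelizable; coupling enters only through the exchanged boundary states, mirroring the nearest-neighbour communication described in the abstract.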
Article Details
Copyright (c) 2025 D’Amore L.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.