A Model Decomposition-in-Time of Recurrent Neural Networks: A Feasibility Analysis
Abstract
In the context of Recurrent Neural Networks (RNNs), minimization of the Loss Function (LF) accounts for most of the training overhead. Following Parallel-in-Time approaches, we introduce an ab-initio decomposition along the time direction. The key point of our approach lies in the innovative definition of local objective functions, which allows us to overcome the sequential nature of the network and to manage the dependencies between time steps. In particular, we define local RNNs by adding a suitable overlapping operator to the local objective functions, which guarantees their matching across adjacent subsequences. In this way, we obtain a fully parallelizable decomposition of the RNN whose implementation avoids global synchronizations and pipelining. Nearest-neighbour communications guarantee the algorithm's convergence. We hope that these findings encourage readers to further extend the framework according to their specific application requirements.
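To make the idea above concrete, the following minimal Python/NumPy sketch illustrates one plausible reading of the decomposition: the training sequence is split into subsequences, each equipped with a local objective that combines its own data-fit term with a quadratic overlap penalty tying its boundary hidden state to the state expected by the adjacent subsequence. The Elman cell, the quadratic form of the overlap operator, and all names below are illustrative assumptions, not the paper's exact formulation.

import numpy as np

rng = np.random.default_rng(0)

# Toy Elman RNN parameters (hidden size 4, input size 3, output size 2);
# purely illustrative, not taken from the paper.
W_h = rng.standard_normal((4, 4)) * 0.1
W_x = rng.standard_normal((4, 3)) * 0.1
W_y = rng.standard_normal((2, 4)) * 0.1

def local_objective(x_chunk, y_chunk, h0, h_target, mu=1.0):
    """Local objective on one time subsequence.

    x_chunk, y_chunk : inputs/targets restricted to this subsequence
    h0               : hidden state received from the left neighbour
    h_target         : hidden state the right neighbour expects at the overlap
    mu               : weight of the matching (overlap) penalty
    """
    h = h0
    loss = 0.0
    for t in range(x_chunk.shape[0]):
        h = np.tanh(W_h @ h + W_x @ x_chunk[t])          # Elman recurrence
        loss += 0.5 * np.sum((W_y @ h - y_chunk[t]) ** 2)  # local data-fit term
    # Overlap operator (assumed quadratic): keep the final hidden state close
    # to the state the adjacent subsequence starts from.
    loss += 0.5 * mu * np.sum((h - h_target) ** 2)
    return loss, h

# Split a length-T sequence into P subsequences; each one only needs the
# boundary hidden states of its nearest neighbours.
T, P = 20, 4
x = rng.standard_normal((T, 3))
y = rng.standard_normal((T, 2))
chunks = np.array_split(np.arange(T), P)

# Boundary hidden states: the quantities exchanged between neighbouring
# subsequences. Here they are swept serially for clarity; in the parallel
# algorithm each local objective would be evaluated concurrently.
h_bound = [np.zeros(4) for _ in range(P + 1)]

local_losses = []
for p, idx in enumerate(chunks):
    loss_p, h_end = local_objective(x[idx], y[idx], h_bound[p], h_bound[p + 1])
    local_losses.append(loss_p)
    h_bound[p + 1] = h_end  # nearest-neighbour exchange towards chunk p+1

print("sum of local objectives:", sum(local_losses))

In a parallel implementation, the P local objectives (and their gradients) would be evaluated concurrently, and only the boundary hidden states h_bound[p] would be exchanged with nearest neighbours between outer iterations, which is the communication pattern the abstract refers to.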
Article Details
Copyright (c) 2025 D’Amore L.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Licensing and protecting authors' rights are central to the publishing business. Peertechz is dedicated to making it easier for people to share and build upon the work of others while remaining consistent with copyright rules. Peertechz's licensing terms are formulated to facilitate reuse of the manuscripts published in its journals, taking maximum advantage of Open Access publication and disseminating knowledge.
We support 'libre' open access, which defines Open Access in its true sense as free-of-charge online access together with usage rights. The usage rights are granted through specific Creative Commons licenses.
Peertechz complies with [CC BY 4.0].
Explanation
'CC' stands for Creative Commons license. 'BY' indicates that users must give attribution to the creator when the published manuscripts are used or shared. This license allows redistribution and reuse, commercial and non-commercial, provided credit is given to the author.
Please note that Creative Commons user licenses are irrevocable. We recommend that authors check whether their funding body requires a specific license.
Under this license, after publishing with Peertechz, authors may share their research by posting a free draft copy of their article to any repository or website.
'CC BY' license observance:
License Name | Permission to read and download | Permission to display in a repository | Permission to translate | Commercial uses of manuscript
CC BY 4.0 | Yes | Yes | Yes | Yes
Authors should note that Creative Commons licenses are focused on making creative works available for discovery and reuse. Creative Commons licenses provide an alternative to standard copyright, allowing authors to specify the ways their work can be used without granting permission for each individual request. Authors who want to reserve all of their rights under copyright law should not use CC licenses.