A Model Decomposition-in-Time of Recurrent Neural Networks: A Feasibility Analysis


L. D’Amore*

Abstract

In the context of Recurrent Neural Networks (RNNs), minimization of the Loss Function (LF) accounts for most of the training overhead. Following Parallel-in-Time approaches, we introduce an ab initio decomposition along the time direction. The key point of our approach lies in the innovative definition of local objective functions, which allows us to overcome the sequential nature of the network and to manage the dependencies between time steps. In particular, we define local RNNs by adding a suitable overlapping operator to the local objective functions, which guarantees their matching across adjacent subsequences. In this way, we obtain a fully parallelizable decomposition of the RNN whose implementation avoids global synchronization and pipelining. Nearest-neighbour communications guarantee the algorithm's convergence. We hope these findings encourage readers to further extend the framework according to their specific application requirements.
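To make the scheme concrete, below is a minimal, hedged sketch (Python/NumPy) of one way such a decomposition-in-time could be organized: the time axis is split into overlapping subsequences, each local objective combines a data-fit term with an overlap penalty matching the left neighbour's hidden states on the shared steps, and all subsequences are updated independently in a Jacobi-like sweep with only nearest-neighbour exchange. The window layout, the penalty weight mu, and the quadratic form of the overlap operator are illustrative assumptions, not the paper's actual formulation.

import numpy as np

# Illustrative sketch of a decomposition-in-time of an Elman RNN loss.
# Names (P, ov, mu) and the penalty form are assumptions for exposition.
rng = np.random.default_rng(0)
T, d_in, d_h = 64, 3, 8                      # time steps, input dim, hidden dim
x = rng.normal(size=(T, d_in))               # synthetic input sequence
y = rng.normal(size=(T, d_h))                # synthetic targets
W_in = rng.normal(scale=0.1, size=(d_h, d_in))
W_h = rng.normal(scale=0.1, size=(d_h, d_h))

def run_rnn(xs, h0):
    # Roll a single Elman cell over a subsequence; return all hidden states.
    hs, h = [], h0
    for t in range(len(xs)):
        h = np.tanh(W_in @ xs[t] + W_h @ h)
        hs.append(h)
    return np.stack(hs)

# Split [0, T) into P local subsequences, each overlapping its left
# neighbour by ov time steps.
P, ov = 4, 4
L = T // P
windows = [(max(0, k * L - ov), (k + 1) * L) for k in range(P)]

h_loc = [np.zeros((b - a, d_h)) for a, b in windows]  # local hidden states
mu = 1.0                                              # overlap penalty weight

for sweep in range(3):
    new_h, losses = [], []
    for k, (a, b) in enumerate(windows):  # each k is independent: parallelizable
        # Warm-start from the left neighbour's state just before this window
        # (nearest-neighbour communication); window 0 starts from zero.
        h0 = h_loc[k - 1][a - 1 - windows[k - 1][0]] if k > 0 else np.zeros(d_h)
        hs = run_rnn(x[a:b], h0)
        fit = np.mean((hs - y[a:b]) ** 2)    # local data-fit term
        # Overlap operator: penalize mismatch with neighbour k-1 on the
        # ov shared time steps, enforcing matching between subsequences.
        match = np.mean((hs[:ov] - h_loc[k - 1][-ov:]) ** 2) if k > 0 else 0.0
        losses.append(fit + mu * match)
        new_h.append(hs)
    h_loc = new_h                            # Jacobi-style simultaneous update
    print(f"sweep {sweep}: local objectives = {np.round(losses, 3)}")

In an actual parallel implementation, each value of k would be assigned to its own processor, and only the ov hidden states on each overlap would be exchanged between neighbouring processors after every sweep; gradient updates of the shared weights W_in and W_h are omitted to keep the sketch short.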


Article Details

D’Amore, L. (2025). A Model Decomposition-in-Time of Recurrent Neural Networks: A Feasibility Analysis. Trends in Computer Science and Information Technology, 007–010. https://doi.org/10.17352/tcsit.000091
Short Communications

Copyright (c) 2025 D’Amore L.


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Licensing and protecting author rights is the central aim and core of the publishing business. Peertechz dedicates itself to making it easier for people to share and build upon the work of others while maintaining consistency with the rules of copyright. Peertechz's licensing terms are formulated to facilitate the reuse of manuscripts published in its journals, so as to take maximum advantage of Open Access publication for the purpose of disseminating knowledge.

We support 'libre' open access, which defines Open Access in its true sense: free-of-charge online access along with usage rights. Usage rights are granted through the use of a specific Creative Commons license.

Peertechz complies with [CC BY 4.0].

Explanation

'CC' stands for Creative Commons license. 'BY' signifies that users must give attribution to the creator when the published manuscripts are used or shared. This license allows redistribution, commercial and non-commercial, as long as the work is passed along unchanged and in whole, with credit to the author.

Please note that Creative Commons user licenses are irrevocable. We recommend that authors check whether their funding body requires a specific license.

Under this license, after publishing with Peertechz, authors may share their research by posting a free draft copy of their article to any repository or website.
Permissions granted under the 'CC BY' license:

License Name | Permission to read and download | Permission to display in a repository | Permission to translate | Commercial uses of manuscript
CC BY 4.0 | Yes | Yes | Yes | Yes

Authors should note that the Creative Commons license is focused on making creative works available for discovery and reuse. Creative Commons licenses provide an alternative to standard copyright, allowing authors to specify the ways in which their works may be used without having to grant permission for each individual request. Authors who wish to reserve all of their rights under copyright law should not use CC licenses.
