Statistical Quality and Reproducibility of Pseudorandom Number Generators in Machine Learning Technologies

Detailed bibliography
Published in:	International Journal of Data Informatics and Intelligent Computing, Vol. 4, No. 3, pp. 23-32
Main author:	Antunes, Benjamin
Format:	Journal Article
Language:	English
Publication date:	20 August 2025
ISSN:	2583-6250 (same for ISSN and EISSN)

Abstract	Machine learning (ML) frameworks rely heavily on pseudorandom number generators (PRNGs) for tasks such as data shuffling, weight initialization, dropout, and optimization. Yet the statistical quality and reproducibility of these generators, particularly when integrated into frameworks such as PyTorch, TensorFlow, and NumPy, remain underexplored. In this paper, we compare the statistical quality of the PRNGs used in ML frameworks (Mersenne Twister, PCG, and Philox) against their original C implementations. Using the rigorous TestU01 BigCrush test suite, we evaluate 896 independent random streams for each generator. Our results challenge claims of statistical robustness: even generators labelled "crush-resistant" (e.g., PCG, Philox) may fail certain statistical tests. Surprisingly, we observe differences in failure profiles between the native and framework-integrated versions of the same algorithm, which points to possible implementation differences between them. The Mersenne Twister implementations in PyTorch and NumPy do not have exactly the same failure profile as the original C implementation, and the same holds for the TensorFlow implementation of Philox.
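The abstract reports that framework-integrated Mersenne Twister implementations do not share the failure profile of the C original. The paper's method is statistical testing with TestU01 BigCrush. A cheaper, complementary check, which is not taken from the paper, is a known-answer test of the raw output stream. The sketch below is a pure-Python port of the reference `init_genrand()` and `genrand_int32()` routines from `mt19937ar.c`. It is checked against the widely published outputs for seed 5489: the first output is 3499211612 and the 10000th is 4123659995. It is also cross-checked against CPython's `random` module, which implements MT19937 as well. The class `MT19937` and the helper `cpython_random_from` are hypothetical names introduced only for this illustration.

```python
import random

# MT19937 parameters from the reference mt19937ar.c (Matsumoto & Nishimura).
N, M = 624, 397
MATRIX_A = 0x9908B0DF
UPPER_MASK = 0x80000000
LOWER_MASK = 0x7FFFFFFF


class MT19937:
    """Illustrative pure-Python port of init_genrand() and genrand_int32()."""

    def __init__(self, seed: int) -> None:
        # init_genrand(): fill the 624-word state from a single 32-bit seed.
        self.mt = [seed & 0xFFFFFFFF]
        for i in range(1, N):
            prev = self.mt[-1]
            self.mt.append((1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF)
        self.mti = N  # forces a full twist before the first output

    def _twist(self) -> None:
        # Regenerate all 624 words in place; (i + M) % N covers the
        # wrap-around cases that the C code handles with separate loops.
        for i in range(N):
            y = (self.mt[i] & UPPER_MASK) | (self.mt[(i + 1) % N] & LOWER_MASK)
            self.mt[i] = self.mt[(i + M) % N] ^ (y >> 1) ^ (MATRIX_A if y & 1 else 0)
        self.mti = 0

    def next_u32(self) -> int:
        # genrand_int32(): take the next state word and apply tempering.
        if self.mti >= N:
            self._twist()
        y = self.mt[self.mti]
        self.mti += 1
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y


def cpython_random_from(seed: int) -> random.Random:
    """Load the init_genrand() state into CPython's own MT19937 via setstate()."""
    ref = MT19937(seed)
    rng = random.Random()
    # Version-3 state: 624 state words plus the position index, then gauss_next.
    rng.setstate((3, tuple(ref.mt) + (ref.mti,), None))
    return rng


if __name__ == "__main__":
    ours = MT19937(5489)
    stream = [ours.next_u32() for _ in range(10000)]
    print(stream[0])   # 3499211612
    print(stream[-1])  # 4123659995 (the 10000th output)
    cpy = cpython_random_from(5489)
    assert stream == [cpy.getrandbits(32) for _ in range(10000)]
```

A matching raw stream only shows that two implementations agree for a given internal state. It does not show that they derive the same state from the same user seed. For example, CPython's `random.seed(5489)` goes through the reference `init_by_array()` routine rather than `init_genrand()`, so it produces a different stream from the one above. Seeding and stream-splitting code is therefore a separate place where otherwise identical generators can diverge.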
DOI	10.59461/ijdiic.v4i3.214
License	https://creativecommons.org/licenses/by-sa/4.0
ORCID	0000-0002-0700-6558
OpenAccessLink	https://ijdiic.com/index.php/research/article/download/214/145
PageCount	10