Statistical Quality and Reproducibility of Pseudorandom Number Generators in Machine Learning Technologies


Bibliographic details
Published in: International Journal of Data Informatics and Intelligent Computing, Volume 4, Issue 3, pp. 23-32
Main author: Antunes, Benjamin
Format: Journal Article
Language: English
Published: 20 August 2025
ISSN: 2583-6250
Abstract: Machine learning (ML) frameworks rely heavily on pseudorandom number generators (PRNGs) for tasks such as data shuffling, weight initialization, dropout, and optimization. Yet the statistical quality and reproducibility of these generators, particularly as integrated into frameworks such as PyTorch, TensorFlow, and NumPy, remain underexplored. In this paper, we compare the statistical quality of the PRNGs used in ML frameworks (Mersenne Twister, PCG, and Philox) against their original C implementations. Using the rigorous TestU01 BigCrush test suite, we evaluate 896 independent random streams for each generator. Our results challenge claims of statistical robustness: even generators labelled "crush-resistant" (e.g., PCG, Philox) may fail certain statistical tests. We also observe differences in failure profiles between the native and framework-integrated versions of the same algorithm, suggesting that implementation differences may exist. In particular, the Mersenne Twister implementations in PyTorch and NumPy do not have exactly the same failure profile as the original C implementation, and the same holds for the TensorFlow implementation of Philox.
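The two properties named in the abstract, reproducibility and statistical quality of independently seeded streams, can be illustrated with a small sketch. This is not the paper's method and not BigCrush: it assumes only the Python standard library, whose `random` module is built on Mersenne Twister (MT19937), and it applies a single toy monobit frequency check. The helper names `stream_words` and `monobit_z_score`, the seeds, and the stream count are hypothetical choices made for illustration.

```python
import math
import random


def stream_words(seed, n_words):
    """Return n_words 32-bit outputs from one seeded Mersenne Twister stream."""
    rng = random.Random(seed)  # CPython's random module uses MT19937
    return [rng.getrandbits(32) for _ in range(n_words)]


def monobit_z_score(words):
    """z-score of the number of 1 bits under the fair-bit hypothesis.

    For n fair bits the count of ones has mean n/2 and standard
    deviation sqrt(n)/2, so z = (2 * ones - n) / sqrt(n).
    """
    n_bits = 32 * len(words)
    ones = sum(bin(w).count("1") for w in words)
    return (2 * ones - n_bits) / math.sqrt(n_bits)


# Reproducibility: the same seed must replay exactly the same stream.
assert stream_words(42, 1000) == stream_words(42, 1000)

# Toy quality check over a few independently seeded streams.
# Illustrative only: the paper evaluates 896 streams per generator with BigCrush.
N_STREAMS = 8
z_scores = [monobit_z_score(stream_words(seed, 10_000)) for seed in range(N_STREAMS)]
for seed, z in enumerate(z_scores):
    print(f"stream {seed}: z = {z:+.3f}")
```

A single frequency statistic like this can only catch gross bias. Comparing failure profiles between a framework's generator and its C reference, as the abstract describes, would instead require feeding the raw outputs of each stream to a full test battery.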
DOI: 10.59461/ijdiic.v4i3.214
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)
Author ORCID: 0000-0002-0700-6558
Open access link: https://ijdiic.com/index.php/research/article/download/214/145