Statistical Quality and Reproducibility of Pseudorandom Number Generators in Machine Learning Technologies
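ML frameworks draw on PRNGs when shuffling data, initializing weights, and sampling dropout masks. Reproducing a run requires that an identically seeded generator yield a bit-identical stream. The following is a minimal sketch, assuming NumPy 1.17 or later. It uses NumPy's `MT19937`, `PCG64`, and `Philox` bit generators, which correspond to the three algorithm families studied in the paper. The helpers `raw_stream` and `is_reproducible` are hypothetical names for illustration and are not part of the paper's code.

```python
import numpy as np

# NumPy's BitGenerator classes for the three PRNG families named in the abstract.
BIT_GENERATORS = {
    "MT19937": np.random.MT19937,
    "PCG64": np.random.PCG64,
    "Philox": np.random.Philox,
}


def raw_stream(name: str, seed: int, n: int) -> np.ndarray:
    """Return the first n raw outputs (as uint64) of a freshly seeded generator.

    Hypothetical helper: random_raw() skips any conversion to floats or
    bounded integers, so these are the words a statistical battery would see.
    """
    bitgen = BIT_GENERATORS[name](seed)
    return bitgen.random_raw(n)


def is_reproducible(name: str, seed: int, n: int = 1000) -> bool:
    """Check that two generators built from the same seed agree bit for bit."""
    return np.array_equal(raw_stream(name, seed, n), raw_stream(name, seed, n))


if __name__ == "__main__":
    for name in BIT_GENERATORS:
        # Expected: True for every generator.
        print(name, is_reproducible(name, seed=12345))
```

NumPy seeds these classes through `SeedSequence`. As a result, `MT19937(5489)` is not initialized the same way as `init_genrand(5489)` in the original C code of Mersenne Twister. Any side-by-side comparison of native and framework streams therefore has to control for how each side is seeded.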
| Published in: | International Journal of Data Informatics and Intelligent Computing, Vol. 4, No. 3, pp. 23-32 |
|---|---|
| Main author: | Antunes, Benjamin |
| Format: | Journal Article |
| Language: | English |
| Published: | 20.08.2025 |
| ISSN: | 2583-6250 (ISSN and EISSN) |
| Online access: | Full text: https://ijdiic.com/index.php/research/article/download/214/145 |
| Abstract | Machine learning (ML) frameworks rely heavily on pseudorandom number generators (PRNGs) for tasks such as data shuffling, weight initialization, dropout, and optimization. Yet the statistical quality and reproducibility of these generators, particularly when integrated into frameworks such as PyTorch, TensorFlow, and NumPy, remain underexplored. In this paper, we compare the statistical quality of the PRNGs used in ML frameworks (Mersenne Twister, PCG, and Philox) against their original C implementations. Using the rigorous TestU01 BigCrush test suite, we evaluate 896 independent random streams for each generator. Our results challenge claims of statistical robustness: even generators labelled "crush-resistant" (e.g., PCG and Philox) may fail certain statistical tests. Surprisingly, we observe differences in failure profiles between the native and framework-integrated versions of the same algorithm, which points to possible implementation differences. The Mersenne Twister implementations in PyTorch and NumPy do not have exactly the same failure profile as the original C implementation, and the same holds for the TensorFlow implementation of Philox. |
|---|---|
| Author | Antunes, Benjamin |
| Author_xml | sequence: 1; givenname: Benjamin; orcidid: 0000-0002-0700-6558; surname: Antunes; fullname: Antunes, Benjamin |
| ContentType | Journal Article |
| DBID | AAYXX CITATION |
| DOI | 10.59461/ijdiic.v4i3.214 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | CrossRef |
| DeliveryMethod | fulltext_linktorsrc |
| EISSN | 2583-6250 |
| EndPage | 32 |
| ExternalDocumentID | 10_59461_ijdiic_v4i3_214 |
| GroupedDBID | AAYXX CITATION M~E |
| ID | FETCH-LOGICAL-c834-6733c8071f5ebc17cf0e3d5231d3a80642cd92acb6bf37cbb45773e736eadfab3 |
| ISSN | 2583-6250 |
| IngestDate | Sat Nov 29 07:37:55 EST 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Issue | 3 |
| Language | English |
| License | https://creativecommons.org/licenses/by-sa/4.0 |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c834-6733c8071f5ebc17cf0e3d5231d3a80642cd92acb6bf37cbb45773e736eadfab3 |
| ORCID | 0000-0002-0700-6558 |
| OpenAccessLink | https://ijdiic.com/index.php/research/article/download/214/145 |
| PageCount | 10 |
| ParticipantIDs | crossref_primary_10_59461_ijdiic_v4i3_214 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-08-20 |
| PublicationDateYYYYMMDD | 2025-08-20 |
| PublicationDate_xml | month: 08; year: 2025; text: 2025-08-20; day: 20 |
| PublicationDecade | 2020 |
| PublicationTitle | International Journal of Data Informatics and Intelligent Computing |
| PublicationYear | 2025 |
| SSID | ssib051452284 |
| Score | 1.9189041 |
| SourceID | crossref |
| SourceType | Index Database |
| StartPage | 23 |
| Title | Statistical Quality and Reproducibility of Pseudorandom Number Generators in Machine Learning Technologies |
| Volume | 4 |
| hasFullText | 1 |
| inHoldings | 1 |
| journalDatabaseRights | providerCode: PRVHPJ; databaseName: ROAD: Directory of Open Access Scholarly Resources; eissn: 2583-6250; dateEnd: 99991231; omitProxy: false; ssIdentifier: ssib051452284; issn: 2583-6250; databaseCode: M~E; dateStart: 20220101; isFulltext: true; titleUrlDefault: https://road.issn.org; providerName: ISSN International Centre |
| linkProvider | ISSN International Centre |
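The abstract reports 896 independent random streams per generator but does not state how those streams were derived. One common approach in NumPy is to spawn child `SeedSequence` objects from a single root seed and give each child its own bit generator. This is shown here as an assumption, not as the paper's procedure. `spawn_streams` and `first_words` are hypothetical helpers, and the root seed 2025 is arbitrary.

```python
import numpy as np

N_STREAMS = 896  # stream count reported in the abstract


def spawn_streams(bitgen_cls, root_seed: int, n_streams: int) -> list:
    """Seed n_streams bit generators from children of one root SeedSequence.

    Assumption: SeedSequence.spawn is NumPy's recommended route to
    independent streams; the paper may have used a different scheme.
    """
    children = np.random.SeedSequence(root_seed).spawn(n_streams)
    return [bitgen_cls(child) for child in children]


def first_words(streams: list, n: int) -> np.ndarray:
    """Stack the first n raw uint64 outputs of every stream into a 2-D array."""
    return np.stack([bg.random_raw(n) for bg in streams])


if __name__ == "__main__":
    for cls in (np.random.MT19937, np.random.PCG64, np.random.Philox):
        streams = spawn_streams(cls, root_seed=2025, n_streams=N_STREAMS)
        block = first_words(streams, 4)  # shape (896, 4)
        # Count distinct rows: identical rows would suggest duplicated streams.
        distinct = len({row.tobytes() for row in block})
        print(cls.__name__, block.shape, distinct)
```

Deriving every stream from one root seed keeps the whole experiment reproducible without hand-picking consecutive integer seeds. Actually running BigCrush on these streams would still require a bridge to the TestU01 C library, which is not shown here.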