An Open Data Service for Supporting Research in Machine Learning on Tokamak Data
| Published in: | IEEE Transactions on Plasma Science, vol. 53, no. 9, pp. 2440-2449 |
|---|---|
| Main authors: | Jackson, Samuel; Khan, Saiful; Cummings, Nathan; Hodson, James; de Witt, Shaun; Pamela, Stanislas; Akers, Rob; Thiyagalingam, Jeyan |
| Format: | Journal Article |
| Language: | English |
| Publication details: | IEEE, 2025-09-01 |
| Subjects: | |
| ISSN: | 0093-3813, 1939-9375 |
| Online access: | Get full text |
| Abstract | The increasing complexity and volume of plasma fusion experimental data, coupled with the growing adoption of machine learning in fusion research, necessitate advanced and efficient data management solutions. We propose an open data service for fusion experiments operated by the UKAEA, designed to address the evolving needs of machine-learning-driven fusion research. Our system provides a framework to organize MAST, MAST Upgrade (MAST-U), and Joint European Torus (JET) experimental data in accordance with findability, accessibility, interoperability, and reuse (FAIR) principles, using distributed object storage for scalability and a relational database for efficient metadata indexing. In addition, it offers simplified abstractions through an application programming interface (API), facilitating seamless data access and integration with data analysis and machine learning workflows. Performance evaluation of metrics such as data load time and throughput, across varying numbers of parallel workers, demonstrates the data pipeline's optimization for efficient machine learning application development. Our solution significantly enhances support for data-driven research and machine learning applications in fusion by laying the groundwork for open, FAIR-compliant fusion data, which enables cross-machine analysis, prompts international collaboration, and potentially accelerates advancements in fusion energy research. |
|---|---|
| Author | Cummings, Nathan; Jackson, Samuel; Akers, Rob; Hodson, James; Thiyagalingam, Jeyan; Khan, Saiful; Pamela, Stanislas; de Witt, Shaun |
| Author_xml | – sequence: 1 givenname: Samuel orcidid: 0000-0001-5301-5095 surname: Jackson fullname: Jackson, Samuel email: samuel.jackson@ukaea.uk organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K – sequence: 2 givenname: Saiful orcidid: 0000-0002-6796-5670 surname: Khan fullname: Khan, Saiful email: saiful.khan@stfc.ac.uk organization: Scientific Computing Department, Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Campus, Didcot, U.K – sequence: 3 givenname: Nathan orcidid: 0000-0003-4359-6337 surname: Cummings fullname: Cummings, Nathan email: nathan.cummings@ukaea.uk organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K – sequence: 4 givenname: James orcidid: 0009-0002-4797-3419 surname: Hodson fullname: Hodson, James email: james.hodson@ukaea.uk organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K – sequence: 5 givenname: Shaun orcidid: 0000-0003-4196-3658 surname: de Witt fullname: de Witt, Shaun email: shaun.de-witt@ukaea.uk organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K – sequence: 6 givenname: Stanislas orcidid: 0000-0001-8854-1749 surname: Pamela fullname: Pamela, Stanislas email: Stanislas.Pamela@ukaea.uk organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K – sequence: 7 givenname: Rob surname: Akers fullname: Akers, Rob email: rob.akers@ukaea.uk organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K – sequence: 8 givenname: Jeyan orcidid: 0000-0002-2167-1343 surname: Thiyagalingam fullname: Thiyagalingam, Jeyan email: t.jeyan@stfc.ac.uk organization: Scientific Computing Department, Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Campus, Didcot, U.K |
| CODEN | ITPSBD |
| ContentType | Journal Article |
| DBID | 97E ESBDL RIA RIE AAYXX CITATION |
| DOI | 10.1109/TPS.2025.3583419 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present; IEEE Xplore Open Access Journals; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE/IET Electronic Library; CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Applied Sciences Physics |
| EISSN | 1939-9375 |
| EndPage | 2449 |
| ExternalDocumentID | 10_1109_TPS_2025_3583419 11128905 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: UK Atomic Energy Authority; UKAEA funderid: 10.13039/100008516 |
| GroupedDBID | -~X .DC 0R~ 29I 4.4 53G 5GY 5VS 6IK 97E AAJGR AASAJ AAWTH ABAZT ABQJQ ABVLG ACGFO ACGFS ACGOD ACIWK ACNCT AENEX AETIX AGQYO AGSQL AHBIQ AI. AIBXA AKJIK AKQYR ALLEH ALMA_UNASSIGNED_HOLDINGS ASUFR ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 EBS EJD ESBDL HZ~ H~9 IAAWW IBMZZ ICLAB IFIPE IFJZH IPLJI JAVBF LAI M43 MS~ O9- OCL P2P PQQKQ RIA RIE RNS TAE TN5 TWZ VH1 AAYXX CITATION |
| ID | FETCH-LOGICAL-c261t-4a21dfb5c8293885a0eac315c646e06dff9694ee20aba25dab272c75d35aaec13 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001556121000001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0093-3813 |
| IngestDate | Sat Nov 29 07:27:18 EST 2025 Wed Oct 01 07:05:15 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 9 |
| Language | English |
| License | https://creativecommons.org/licenses/by/4.0/legalcode |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c261t-4a21dfb5c8293885a0eac315c646e06dff9694ee20aba25dab272c75d35aaec13 |
| ORCID | 0009-0002-4797-3419 0000-0001-8854-1749 0000-0002-2167-1343 0000-0003-4196-3658 0000-0002-6796-5670 0000-0003-4359-6337 0000-0001-5301-5095 |
| OpenAccessLink | https://ieeexplore.ieee.org/document/11128905 |
| PageCount | 10 |
| ParticipantIDs | crossref_primary_10_1109_TPS_2025_3583419 ieee_primary_11128905 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-09-01 |
| PublicationDateYYYYMMDD | 2025-09-01 |
| PublicationDate_xml | – month: 09 year: 2025 text: 2025-09-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE Transactions on Plasma Science |
| PublicationTitleAbbrev | TPS |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0014502 |
| Score | 2.4469361 |
| SourceID | crossref ieee |
| SourceType | Index Database Publisher |
| StartPage | 2440 |
| SubjectTerms | Application programming interfaces; Collaboration; Computational modeling; Data analysis; Fusion data; Interoperability; Machine learning; Metadata; Open data; scientific data management; Tokamak devices; Transforms; web service application programming interface (API) |
| Title | An Open Data Service for Supporting Research in Machine Learning on Tokamak Data |
| URI | https://ieeexplore.ieee.org/document/11128905 |
| Volume | 53 |
| WOSCitedRecordID | wos001556121000001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE/IET Electronic Library customDbUrl: eissn: 1939-9375 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0014502 issn: 0093-3813 databaseCode: RIE dateStart: 19730101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+Open+Data+Service+for+Supporting+Research+in+Machine+Learning+on+Tokamak+Data&rft.jtitle=IEEE+transactions+on+plasma+science&rft.au=Jackson%2C+Samuel&rft.au=Khan%2C+Saiful&rft.au=Cummings%2C+Nathan&rft.au=Hodson%2C+James&rft.date=2025-09-01&rft.issn=0093-3813&rft.eissn=1939-9375&rft.volume=53&rft.issue=9&rft.spage=2440&rft.epage=2449&rft_id=info:doi/10.1109%2FTPS.2025.3583419&rft.externalDBID=n%2Fa&rft.externalDocID=10_1109_TPS_2025_3583419 |
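
The abstract reports a performance evaluation of data load time and throughput across varying numbers of parallel workers. The following is a minimal sketch of how such a measurement could be set up, using only the Python standard library. It is not the authors' pipeline: `load_chunk`, the chunk size, and the simulated latency are hypothetical stand-ins for fetching signal data from object storage, and the numbers it prints say nothing about the real service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_chunk(chunk_id: int, size_bytes: int = 1024) -> bytes:
    """Hypothetical stand-in for fetching one chunk of data from object storage."""
    time.sleep(0.001)  # simulated I/O latency; an assumption, not a measured value
    return bytes(size_bytes)


def benchmark(num_workers: int, num_chunks: int = 32) -> dict:
    """Load num_chunks chunks with num_workers threads; report load time and throughput."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        chunks = list(pool.map(load_chunk, range(num_chunks)))
    elapsed = time.perf_counter() - start
    total_bytes = sum(len(chunk) for chunk in chunks)
    return {
        "workers": num_workers,
        "chunks": len(chunks),
        "bytes": total_bytes,
        "seconds": elapsed,
        "bytes_per_second": total_bytes / elapsed,
    }


if __name__ == "__main__":
    # Sweep the worker count, as in a load-time/throughput scaling study.
    for workers in (1, 2, 4, 8):
        result = benchmark(workers)
        print(
            f"{workers} workers: {result['seconds']:.3f} s, "
            f"{result['bytes_per_second']:.0f} B/s"
        )
```

Threads are used here because the simulated work is I/O-bound; a real evaluation would replace `load_chunk` with actual reads from the data service and would likely repeat each configuration several times to average out noise.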