An Open Data Service for Supporting Research in Machine Learning on Tokamak Data

The increasing complexity and volume of plasma fusion experimental data, coupled with the growing adoption of machine learning in fusion research, necessitate advanced and efficient data management solutions. We propose an open data service for fusion experiments operated by the UKAEA, designed to a...

Bibliographic details
Published in: IEEE Transactions on Plasma Science, Vol. 53, No. 9, pp. 2440-2449
Main authors: Jackson, Samuel; Khan, Saiful; Cummings, Nathan; Hodson, James; de Witt, Shaun; Pamela, Stanislas; Akers, Rob; Thiyagalingam, Jeyan
Format: Journal Article
Language: English
Publication details: IEEE, 1 September 2025
Subjects:
ISSN: 0093-3813, 1939-9375
Online access: Get full text
Abstract The increasing complexity and volume of plasma fusion experimental data, coupled with the growing adoption of machine learning in fusion research, necessitate advanced and efficient data management solutions. We propose an open data service for fusion experiments operated by the UKAEA, designed to address the evolving needs of machine-learning-driven fusion research. Our system provides a framework to organize MAST, MAST upgrade (MAST-U), and Joint European Torus (JET) experimental data in accordance with findability, accessibility, interoperability, and reuse (FAIR) principles, using distributed object storage for scalability and a relational database for efficient metadata indexing. In addition, it offers simplified abstractions through an application programming interface (API), facilitating seamless data access and integration with data analysis and machine learning workflows. Performance evaluation of metrics such as data load time and throughput, across varying numbers of parallel workers, demonstrates the data pipeline's optimization for efficient machine learning application development. Our solution significantly enhances support for data-driven research and machine learning applications in fusion by laying the groundwork for open, FAIR-compliant fusion data, which enables cross-machine analysis, prompts international collaboration, and potentially accelerates advancements in fusion energy research.
Author Cummings, Nathan
Jackson, Samuel
Akers, Rob
Hodson, James
Thiyagalingam, Jeyan
Khan, Saiful
Pamela, Stanislas
de Witt, Shaun
Author_xml – sequence: 1
  givenname: Samuel
  orcidid: 0000-0001-5301-5095
  surname: Jackson
  fullname: Jackson, Samuel
  email: samuel.jackson@ukaea.uk
  organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K.
– sequence: 2
  givenname: Saiful
  orcidid: 0000-0002-6796-5670
  surname: Khan
  fullname: Khan, Saiful
  email: saiful.khan@stfc.ac.uk
  organization: Scientific Computing Department, Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Campus, Didcot, U.K.
– sequence: 3
  givenname: Nathan
  orcidid: 0000-0003-4359-6337
  surname: Cummings
  fullname: Cummings, Nathan
  email: nathan.cummings@ukaea.uk
  organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K.
– sequence: 4
  givenname: James
  orcidid: 0009-0002-4797-3419
  surname: Hodson
  fullname: Hodson, James
  email: james.hodson@ukaea.uk
  organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K.
– sequence: 5
  givenname: Shaun
  orcidid: 0000-0003-4196-3658
  surname: de Witt
  fullname: de Witt, Shaun
  email: shaun.de-witt@ukaea.uk
  organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K.
– sequence: 6
  givenname: Stanislas
  orcidid: 0000-0001-8854-1749
  surname: Pamela
  fullname: Pamela, Stanislas
  email: Stanislas.Pamela@ukaea.uk
  organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K.
– sequence: 7
  givenname: Rob
  surname: Akers
  fullname: Akers, Rob
  email: rob.akers@ukaea.uk
  organization: U.K. Atomic Energy Authority (UKAEA), Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, U.K.
– sequence: 8
  givenname: Jeyan
  orcidid: 0000-0002-2167-1343
  surname: Thiyagalingam
  fullname: Thiyagalingam, Jeyan
  email: t.jeyan@stfc.ac.uk
  organization: Scientific Computing Department, Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Campus, Didcot, U.K.
CODEN ITPSBD
ContentType Journal Article
DBID 97E
ESBDL
RIA
RIE
AAYXX
CITATION
DOI 10.1109/TPS.2025.3583419
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE Xplore Open Access Journals
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE/IET Electronic Library
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
Physics
EISSN 1939-9375
EndPage 2449
ExternalDocumentID 10_1109_TPS_2025_3583419
11128905
Genre orig-research
GrantInformation_xml – fundername: UK Atomic Energy Authority; UKAEA
  funderid: 10.13039/100008516
ID FETCH-LOGICAL-c261t-4a21dfb5c8293885a0eac315c646e06dff9694ee20aba25dab272c75d35aaec13
IEDL.DBID RIE
ISICitedReferencesCount 0
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001556121000001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 0093-3813
IngestDate Sat Nov 29 07:27:18 EST 2025
Wed Oct 01 07:05:15 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 9
Language English
License https://creativecommons.org/licenses/by/4.0/legalcode
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c261t-4a21dfb5c8293885a0eac315c646e06dff9694ee20aba25dab272c75d35aaec13
ORCID 0009-0002-4797-3419
0000-0001-8854-1749
0000-0002-2167-1343
0000-0003-4196-3658
0000-0002-6796-5670
0000-0003-4359-6337
0000-0001-5301-5095
OpenAccessLink https://ieeexplore.ieee.org/document/11128905
PageCount 10
ParticipantIDs crossref_primary_10_1109_TPS_2025_3583419
ieee_primary_11128905
PublicationCentury 2000
PublicationDate 2025-09-01
PublicationDateYYYYMMDD 2025-09-01
PublicationDate_xml – month: 09
  year: 2025
  text: 2025-09-01
  day: 01
PublicationDecade 2020
PublicationTitle IEEE Transactions on Plasma Science
PublicationTitleAbbrev TPS
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0014502
Score 2.4469361
Snippet The increasing complexity and volume of plasma fusion experimental data, coupled with the growing adoption of machine learning in fusion research, necessitate...
SourceID crossref
ieee
SourceType Index Database
Publisher
StartPage 2440
SubjectTerms Application programming interfaces
Collaboration
Computational modeling
Data analysis
Fusion data
Interoperability
Machine learning
Metadata
Open data
Scientific data management
Tokamak devices
Transforms
Web service application programming interface (API)
Title An Open Data Service for Supporting Research in Machine Learning on Tokamak Data
URI https://ieeexplore.ieee.org/document/11128905
Volume 53
WOSCitedRecordID wos001556121000001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE/IET Electronic Library
  customDbUrl:
  eissn: 1939-9375
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0014502
  issn: 0093-3813
  databaseCode: RIE
  dateStart: 19730101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE