The ontological politics of synthetic data: Normalities, outliers, and intersectional hallucinations

Bibliographic Details
Title: The ontological politics of synthetic data: Normalities, outliers, and intersectional hallucinations
Authors: Lee, Francis, 1974; Hajisharif, Saghi; Johnson, Ericka
Source: Big Data and Society. 12(2)
Subject Terms: intersectionality, data ethics, classification, data bias, Synthetic structured data, ontological politics
Description: Synthetic data is increasingly used as a substitute for real data for ethical, legal, and logistical reasons. However, the rise of synthetic data also raises critical questions about its entanglement with the politics of classification and the reproduction of social norms and categories. This paper aims to problematize the use of synthetic data by examining how its production is intertwined with the maintenance of certain worldviews and classifications. We argue that synthetic data, like real data, is embedded with societal biases and power structures, leading to the reproduction of existing social inequalities. Through empirical examples, we demonstrate that synthetic data tends to highlight majority elements as the “normal” and minimize minority elements, and that the slight changes to the data structures that create synthetic data will also inevitably result in what we term “intersectional hallucinations.” These hallucinations are inherent to synthetic data and cannot be entirely eliminated without compromising the purpose of creating synthetic datasets. We contend that decisions about synthetic data involve determining which intersections are essential and which can be disregarded, a practice that imbues these decisions with norms and values. Our study underscores the need for critical engagement with the mathematical and statistical choices in synthetic data production and advocates for careful consideration of the ontological and political implications of these choices during curatorial-style production of synthetic structured data.
File Description: electronic
Access URL: https://research.chalmers.se/publication/546035
https://research.chalmers.se/publication/546035/file/546035_Fulltext.pdf
Database: SwePub
FullText Text:
  Availability: 0
Header DbId: edsswe
DbLabel: SwePub
An: edsswe.oai.research.chalmers.se.04caa1ee.3e55.4725.ac5d.fa2e70105971
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsswe&AN=edsswe.oai.research.chalmers.se.04caa1ee.3e55.4725.ac5d.fa2e70105971
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1177/20539517251318289
    Languages:
      – Text: English
    Subjects:
      – SubjectFull: intersectionality
        Type: general
      – SubjectFull: data ethics
        Type: general
      – SubjectFull: classification
        Type: general
      – SubjectFull: data bias
        Type: general
      – SubjectFull: Synthetic structured data
        Type: general
      – SubjectFull: ontological politics
        Type: general
    Titles:
      – TitleFull: The ontological politics of synthetic data: Normalities, outliers, and intersectional hallucinations
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Lee, Francis
      – PersonEntity:
          Name:
            NameFull: Hajisharif, Saghi
      – PersonEntity:
          Name:
            NameFull: Johnson, Ericka
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 01
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 20539517
            – Type: issn-locals
              Value: SWEPUB_FREE
            – Type: issn-locals
              Value: CTH_SWEPUB
          Numbering:
            – Type: volume
              Value: 12
            – Type: issue
              Value: 2
          Titles:
            – TitleFull: Big Data and Society
              Type: main