Parallel clustering for visualizing large scientific line data

Published in: 2011 IEEE Symposium on Large Data Analysis and Visualization, pp. 47-55
Main authors: Jishang Wei, Hongfeng Yu, Chen, J. H., Kwan-Liu Ma
Format: Conference paper
Language: English
Published: IEEE, 1 October 2011
ISBN: 9781467301565, 1467301566
Abstract Scientists often need to extract, visualize and analyze lines from vast amounts of data to understand dynamic structures and interactions. The effectiveness of such a visual validation and analysis process mainly relies on a good strategy to categorize and visualize the lines. However, the sheer size of line data produced by state-of-the-art scientific simulations poses great challenges to preparing the data for visualization. In this paper, we present a parallelization design of regression model-based clustering to categorize large line data derived from detailed scientific simulations by leveraging the power of heterogeneous computers. This parallel clustering method employs the Expectation Maximization algorithm to iteratively approximate the optimal data partitioning. First, we use a sorted-balance algorithm to partition and distribute the lines with various lengths among multiple compute nodes. During the following iterative clustering process, regression model parameters are recovered based on the local lines on each individual node, with only a few inter-node message exchanges involved. Meanwhile, the workload of regression model computing is well balanced across the nodes. The experimental results demonstrate that our approach can effectively categorize large line data in a scalable manner to concisely convey dynamic structures and interactions, leading to a visualization that captures salient features and suppresses visual clutter to facilitate scientific exploration of large line data.
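The abstract's first step, distributing lines of varying length across compute nodes, can be illustrated with the classic sorted-balance greedy heuristic: visit the lines from longest to shortest and hand each one to the node with the smallest load so far. The following is only a minimal single-process sketch under stated assumptions, not the authors' implementation. The function name `sorted_balance`, the use of per-line point counts as the workload measure, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
import heapq


def sorted_balance(line_lengths, num_nodes):
    """Greedy sorted-balance: longest line first, each to the least-loaded node.

    line_lengths: per-line workload (assumed here to be the number of points).
    Returns (assignment, loads), where assignment[n] lists the line indices
    given to node n and loads[n] is the total workload on node n.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")

    # Min-heap of (current load, node index); ties go to the lower node index.
    heap = [(0, node) for node in range(num_nodes)]
    assignment = [[] for _ in range(num_nodes)]

    # Visit line indices in order of decreasing length.
    order = sorted(range(len(line_lengths)),
                   key=lambda i: line_lengths[i], reverse=True)
    for i in order:
        load, node = heapq.heappop(heap)
        assignment[node].append(i)
        heapq.heappush(heap, (load + line_lengths[i], node))

    loads = [sum(line_lengths[i] for i in part) for part in assignment]
    return assignment, loads


# Hypothetical example: five lines with 7, 5, 4, 3 and 1 points, two nodes.
assignment, loads = sorted_balance([7, 5, 4, 3, 1], 2)
print(assignment, loads)
```

In a distributed setting each node would then run the clustering iterations on its own share of the lines. How the paper measures workload and exchanges model parameters between nodes is not stated in this record, so those parts are left out of the sketch.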
Authors:
1. Jishang Wei (jswei@ucdavis.edu), Univ. of California Davis, Davis, CA, USA
2. Hongfeng Yu (hyu@sandia.gov), Sandia Nat. Labs., Albuquerque, NM, USA
3. Chen, J. H. (jhchen@sandia.gov), Sandia Nat. Labs., Albuquerque, NM, USA
4. Kwan-Liu Ma (ma@cs.ucdavis.edu), Univ. of California Davis, Davis, CA, USA
DOI 10.1109/LDAV.2011.6092316
EISBN 1467301558
9781467301558
SubjectTerms Clustering algorithms
Computational modeling
Data models
Data visualization
Graphics processing unit
Mathematical model
Vectors
URI https://ieeexplore.ieee.org/document/6092316