Scalable Strategies for Computing with Massive Data

Bibliographic Details
Published in: Journal of Statistical Software, Vol. 55, No. 14, pp. 1-19
Main Authors: Kane, Michael J.; Emerson, John; Weston, Stephen
Format: Journal Article
Language: English
Published: Foundation for Open Access Statistics, 2013-11-01
ISSN: 1548-7660

Abstract

This paper presents two complementary statistical computing frameworks that address challenges in parallel processing and the analysis of massive data. First, the foreach package allows users of the R programming environment to define parallel loops that may be run sequentially on a single machine, in parallel on a symmetric multiprocessing (SMP) machine, or in cluster environments without platform-specific code. Second, the bigmemory package implements memory- and file-mapped data structures that provide (a) access to arbitrarily large data while retaining a look and feel that is familiar to R users and (b) data structures that are shared across processor cores in order to support efficient parallel computing techniques. Although these packages may be used independently, this paper shows how they can be used in combination to address challenges that have effectively been beyond the reach of researchers who lack specialized software development skills or expensive hardware.
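
The combination the abstract describes can be illustrated outside R: a loop body is written once and then run under interchangeable backends, and it writes into a file-backed matrix that every worker maps into its own address space. What follows is a minimal Python sketch under stated assumptions. It is an analogy, not the foreach or bigmemory API. The helpers `create_backing_file`, `fill_column`, `read_matrix`, `run_loop`, and `fill_matrix` are hypothetical names. `concurrent.futures` executors stand in for foreach's interchangeable parallel backends, and a memory-mapped binary file stands in for bigmemory's file-mapped data structures.

```python
# Hypothetical sketch: a Python analogy to combining foreach-style loops with
# bigmemory-style file-mapped storage. This is not either R package's API.
import mmap
import os
import struct
import tempfile
from concurrent.futures import ProcessPoolExecutor
from functools import partial

CELL = struct.Struct("d")  # each matrix cell is one 8-byte double


def create_backing_file(path, nrow, ncol):
    """Allocate a zero-filled file large enough for an nrow x ncol matrix of doubles."""
    with open(path, "wb") as f:
        f.truncate(nrow * ncol * CELL.size)


def offset(nrow, i, j):
    """Byte offset of cell (i, j) in column-major order, the order R uses for matrices."""
    return (j * nrow + i) * CELL.size


def fill_column(path, nrow, j):
    """Loop body: map the shared file, write column j in place, and return j."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        for i in range(nrow):
            CELL.pack_into(mm, offset(nrow, i, j), float(i + 1) * (j + 1))
        mm.flush()
    return j


def read_matrix(path, nrow, ncol):
    """Map the file read-only and return its contents as a list of columns."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return [[CELL.unpack_from(mm, offset(nrow, i, j))[0] for i in range(nrow)]
                for j in range(ncol)]


def run_loop(body, items, executor=None):
    """Run body over items sequentially (executor=None) or through any Executor."""
    if executor is None:
        return [body(x) for x in items]
    return list(executor.map(body, items))


def fill_matrix(path, nrow, ncol, executor=None):
    """Create the backing file, then fill one column per loop iteration."""
    create_backing_file(path, nrow, ncol)
    done = run_loop(partial(fill_column, path, nrow), range(ncol), executor)
    return sorted(done)


if __name__ == "__main__":
    nrow, ncol = 4, 3
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "matrix.bin")
        # Same loop body as the sequential case; only the backend changes.
        with ProcessPoolExecutor(max_workers=2) as pool:
            fill_matrix(path, nrow, ncol, executor=pool)
        print(read_matrix(path, nrow, ncol))
```

Passing `executor=None` runs the loop sequentially, while the `__main__` block swaps in a `ProcessPoolExecutor` without touching `fill_column`. The loop body is bound with `functools.partial` rather than a lambda because process pools need a picklable, top-level callable. Each iteration writes a disjoint column, so the workers never contend for the same bytes of the shared file.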

DOI: 10.18637/jss.v055.i14
Open Access: https://doaj.org/article/4f632e074659460e9a90312657087e60