Content selection and curation for web archiving: The gatekeepers vs. the masses

Bibliographic Details
Published in: JCDL '16: Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries, June 19-23, 2016, Newark, NJ, USA, pp. 107-110
Main Authors: Milligan, Ian; Ruest, Nick; Lin, Jimmy
Format: Conference Proceeding
Language: English
Published: ACM 01.06.2016
Abstract Any preservation effort must begin with an assessment of what content to preserve, and web archiving is no different. There have historically been two answers to the question "what should we archive?" The Internet Archive's broad entire-web crawls have been supplemented by narrower domain- or topic-specific collections gathered by numerous libraries. We can characterize this as content selection and curation by "gatekeepers". In contrast, we have witnessed the emergence of another approach driven by "the masses" - we can archive pages that are contained in social media streams such as Twitter. The interesting question, of course, is how these approaches differ. We provide an answer to this question in the context of a case study about the 2015 Canadian federal elections. Based on our analysis, we recommend a hybrid approach that combines an effort driven by social media and more traditional curatorial methods.
Author_xml – sequence: 1
  givenname: Ian
  surname: Milligan
  fullname: Milligan, Ian
  email: i2milligan@uwaterloo.ca
– sequence: 2
  givenname: Nick
  surname: Ruest
  fullname: Ruest, Nick
  email: ruestn@yorku.ca
– sequence: 3
  givenname: Jimmy
  surname: Lin
  fullname: Lin, Jimmy
  email: jimmylin@uwaterloo.ca
DOI 10.1145/2910896.2910913
Discipline Library & Information Science
EISBN 1450342299
9781450342292
EndPage 110
ExternalDocumentID 7559571
Genre orig-research
ISICitedReferencesCount 9
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly false
Language English
OpenAccessLink https://hdl.handle.net/10012/11649
PageCount 4
ParticipantIDs ieee_primary_7559571
PublicationCentury 2000
PublicationDate 2016-June
PublicationDateYYYYMMDD 2016-06-01
PublicationDate_xml – month: 06
  year: 2016
  text: 2016-June
PublicationDecade 2010
PublicationTitle JCDL '16 : proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries : June 19-23, 2016, Newark, NJ, USA
PublicationTitleAbbrev JCDL
PublicationYear 2016
Publisher ACM
Publisher_xml – name: ACM
StartPage 107
SubjectTerms Internet
Libraries
Logic gates
Media
Nominations and elections
Tagging
Twitter
Title Content selection and curation for web archiving: The gatekeepers vs. the masses
URI https://ieeexplore.ieee.org/document/7559571