Large-scale learning with AdaGrad on Spark

Detailed Bibliography
Published in: 2015 IEEE International Conference on Big Data (Big Data), pp. 2828-2830
Main Authors: Hadgu, Asmelash Teka; Nigam, Aastha; Diaz-Aviles, Ernesto
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2015
Online Access: Get full text
Abstract Stochastic Gradient Descent (SGD) is a simple yet very efficient online learning algorithm for optimizing convex (and often non-convex) functions and one of the most popular stochastic optimization methods in machine learning today. One drawback of SGD is that it is sensitive to the learning rate hyper-parameter. The Adaptive Sub-gradient Descent, AdaGrad, dynamically incorporates knowledge of the geometry of the data observed in earlier iterations to calculate a different learning rate for every feature. In this work, we implement a distributed version of AdaGrad for large-scale machine learning tasks using Apache Spark. Apache Spark is a fast cluster computing engine that provides similar scalability and fault tolerance properties to MapReduce, but in contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's multi-stage in-memory primitives allow user programs to load data into a cluster's memory and query it repeatedly, which makes it ideal for building scalable machine learning applications. We empirically evaluate our implementation on large-scale real-world problems in the machine learning canonical tasks of classification and regression. Comparing our implementation of AdaGrad with the SGD scheduler currently available in Spark's Machine Learning Library (MLlib), we experimentally show that AdaGrad saves time by avoiding manually setting a learning-rate hyperparameter, converges fast and can even achieve better generalization errors.
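As a quick illustration of what the abstract describes (notation and code are ours, not taken from the paper): AdaGrad keeps a running sum of squared gradients per feature and uses it to scale the step for each coordinate, which is commonly written as

\[
w_{t+1,i} = w_{t,i} - \frac{\eta}{\sqrt{\sum_{\tau=1}^{t} g_{\tau,i}^{2}} + \epsilon}\, g_{t,i},
\]

where $g_{t,i}$ is the $i$-th coordinate of the gradient at step $t$, $\eta$ is a base step size, and $\epsilon$ is a small constant for numerical stability. A minimal Scala sketch of one such step on Spark, assuming squared loss, full-batch gradients, and Breeze vectors (the helper `adaGradStep` and its signature are hypothetical, not the authors' implementation):

```scala
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// One AdaGrad iteration for least-squares regression over an RDD.
// Gradients are summed across the cluster with treeAggregate, the same
// primitive MLlib's GradientDescent uses to avoid a driver bottleneck.
def adaGradStep(
    data: RDD[LabeledPoint],   // training examples
    w: BDV[Double],            // current weight vector
    accum: BDV[Double],        // running sum of squared gradients, one entry per feature
    eta: Double = 1.0,         // base step size; AdaGrad is comparatively insensitive to it
    eps: Double = 1e-8         // numerical stabilizer
): (BDV[Double], BDV[Double]) = {
  val n = data.count().toDouble
  val zero = BDV.zeros[Double](w.length)

  // Average gradient of 0.5 * (w·x - y)^2 over the whole data set.
  val grad = data.map { p =>
    val x = BDV(p.features.toArray)
    x * ((w dot x) - p.label)
  }.treeAggregate(zero)(_ + _, _ + _) / n

  // Per-coordinate learning rate: eta / sqrt(accumulated squared gradient).
  val newAccum = accum + (grad *:* grad)
  val newW = w - (grad /:/ newAccum.map(a => math.sqrt(a) + eps)) * eta
  (newW, newAccum)
}
```

Calling `adaGradStep` in a loop while carrying the weight and accumulator vectors gives the full optimizer; a production version would sample mini-batches and broadcast `w` rather than capturing it in the closure.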
Author Hadgu, Asmelash Teka (teka@L3S.de) – L3S Res. Center, Hannover, Germany
Nigam, Aastha (anigam@nd.edu) – Univ. of Notre Dame, Notre Dame, IN, USA
Diaz-Aviles, Ernesto (e.diaz-aviles@ie.ibm.com) – IBM Res., Dublin, Ireland
ContentType Conference Proceeding
DOI 10.1109/BigData.2015.7364091
EISBN 9781479999262
1479999261
EndPage 2830
ExternalDocumentID 7364091
Genre orig-research
Language English
PageCount 3
PublicationDate 2015-10-01
PublicationTitle 2015 IEEE International Conference on Big Data (Big Data)
PublicationTitleAbbrev BigData
PublicationYear 2015
Publisher IEEE
StartPage 2828
SubjectTerms Adaptive gradient
Aggregates
Distributed machine learning
History
Spark
Sparks
Stochastic processes
Support vector machines
Training
Title Large-scale learning with AdaGrad on Spark
URI https://ieeexplore.ieee.org/document/7364091