Understanding Performance Concerns in the API Documentation of Data Science Libraries

Bibliographic Details
Published in: 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 895-906
Main Authors: Tao, Yida; Jiang, Jiefang; Liu, Yepang; Xu, Zhiwu; Qin, Shengchao
Format: Conference Proceeding
Language: English
Published: ACM, September 2020
ISSN: 2643-1572
Abstract: The development of efficient data science applications is often impeded by unbearably long execution time and rapid RAM exhaustion. Since API documentation is the primary information source for troubleshooting, we investigate how performance concerns are documented in popular data science libraries. Our quantitative results reveal the prevalence of data science APIs that are documented in performance-related context and the infrequent maintenance activities on such documentation. Our qualitative analyses further reveal that crowd documentation like Stack Overflow and GitHub are highly complementary to official documentation in terms of the API coverage, the knowledge distribution, as well as the specific information conveyed through performance-related content. Data science practitioners could benefit from our findings by learning a more targeted search strategy for resolving performance issues. Researchers can be more assured of the advantages of integrating both the official and the crowd documentation to achieve a holistic view on the performance concerns in data science development.
Affiliations:
  Tao, Yida (yidatao@szu.edu.cn): College of Computer Science and Software Engineering, Shenzhen University, China
  Jiang, Jiefang (jiangjiefang2018@email.szu.edu.cn): College of Computer Science and Software Engineering, Shenzhen University, China
  Liu, Yepang (liuyp1@sustech.edu.cn): Department of Computer Science and Engineering, Southern University of Science and Technology, China
  Xu, Zhiwu (xuzhiwu@szu.edu.cn): College of Computer Science and Software Engineering, Shenzhen University, China
  Qin, Shengchao (s.qin@tees.ac.uk): School of Computing, Engineering & Digital Technologies, Teesside University, UK; College of Computer Science & Software Engineering, Shenzhen University, China
CODEN: IEEPAD
DOI: 10.1145/3324884.3416543
Discipline: Computer Science
EISBN: 9781450367684 (ISBN-10 form: 1450367682)
Genre: Original research
Funding: National Natural Science Foundation of China (funder ID 10.13039/501100001809), grants 61972260, 61772347, 61836005, 61932021, 61802164
Page Count: 12
Subjects: API documentation; Data science; Documentation; Empirical study; Libraries; Maintenance engineering; Search problems; Software development management; Software engineering
URL: https://ieeexplore.ieee.org/document/9286046