Understanding Performance Concerns in the API Documentation of Data Science Libraries
The development of efficient data science applications is often impeded by unbearably long execution time and rapid RAM exhaustion. Since API documentation is the primary information source for troubleshooting, we investigate how performance concerns are documented in popular data science libraries....
| Published in: | 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE) pp. 895 - 906 |
|---|---|
| Main Authors: | Tao, Yida; Jiang, Jiefang; Liu, Yepang; Xu, Zhiwu; Qin, Shengchao |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | ACM, 01.09.2020 |
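| Subjects: | API documentation, Data science, Documentation, empirical study, Libraries, Maintenance engineering, Search problems, Software development management, Software engineering |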
| ISSN: | 2643-1572 |
| Abstract | The development of efficient data science applications is often impeded by unbearably long execution time and rapid RAM exhaustion. Since API documentation is the primary information source for troubleshooting, we investigate how performance concerns are documented in popular data science libraries. Our quantitative results reveal the prevalence of data science APIs that are documented in performance-related context and the infrequent maintenance activities on such documentation. Our qualitative analyses further reveal that crowd documentation like Stack Overflow and GitHub are highly complementary to official documentation in terms of the API coverage, the knowledge distribution, as well as the specific information conveyed through performance-related content. Data science practitioners could benefit from our findings by learning a more targeted search strategy for resolving performance issues. Researchers can be more assured of the advantages of integrating both the official and the crowd documentation to achieve a holistic view on the performance concerns in data science development. |
|---|---|
| Author | Qin, Shengchao Xu, Zhiwu Liu, Yepang Jiang, Jiefang Tao, Yida |
| Author_xml | – sequence: 1 givenname: Yida surname: Tao fullname: Tao, Yida email: yidatao@szu.edu.cn organization: College of Computer Science and Software Engineering, Shenzhen University, China – sequence: 2 givenname: Jiefang surname: Jiang fullname: Jiang, Jiefang email: jiangjiefang2018@email.szu.edu.cn organization: College of Computer Science and Software Engineering, Shenzhen University, China – sequence: 3 givenname: Yepang surname: Liu fullname: Liu, Yepang email: liuyp1@sustech.edu.cn organization: Southern University of Science and Technology, Department of Computer Science and Engineering, China – sequence: 4 givenname: Zhiwu surname: Xu fullname: Xu, Zhiwu email: xuzhiwu@szu.edu.cn organization: College of Computer Science and Software Engineering, Shenzhen University, China – sequence: 5 givenname: Shengchao surname: Qin fullname: Qin, Shengchao email: s.qin@tees.ac.uk organization: School of Computing, Engr. & Digital Technologies, Teesside University, UK; College of Computer Sci. & Software Engr., Shenzhen University, China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3324884.3416543 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9781450367684 1450367682 |
| EISSN | 2643-1572 |
| EndPage | 906 |
| ExternalDocumentID | 9286046 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China grantid: 61972260,61772347,61836005,61932021,61802164 funderid: 10.13039/501100001809 |
| GroupedDBID | 29I 6IE 6IF 6IH 6IK 6IL 6IM 6IN 6J9 AAJGR AAWTH ABLEC ACREN ADYOE ADZIZ AFYQB ALMA_UNASSIGNED_HOLDINGS AMTXH APO BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL |
| ID | FETCH-LOGICAL-a247t-2a36bb106fb4aea7c23f538eca76550677f0466f652eb2bdd72d990dff2694a53 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 3 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000651313500075&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:33:27 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a247t-2a36bb106fb4aea7c23f538eca76550677f0466f652eb2bdd72d990dff2694a53 |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_9286046 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-Sept. |
| PublicationDateYYYYMMDD | 2020-09-01 |
| PublicationDate_xml | – month: 09 year: 2020 text: 2020-Sept. |
| PublicationDecade | 2020 |
| PublicationTitle | 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE) |
| PublicationTitleAbbrev | ASE |
| PublicationYear | 2020 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssj0051577 ssj0002871035 |
| Score | 2.1716602 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 895 |
| SubjectTerms | API documentation; Data science; Documentation; empirical study; Libraries; Maintenance engineering; Search problems; Software development management; Software engineering |
| Title | Understanding Performance Concerns in the API Documentation of Data Science Libraries |
| URI | https://ieeexplore.ieee.org/document/9286046 |
| WOSCitedRecordID | wos000651313500075&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |