InspectJS: Leveraging Code Similarity and User-Feedback for Effective Taint Specification Inference for JavaScript


Bibliographic details
Published in: 2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 165-174
Main authors: Dutta, Saikat; Garbervetsky, Diego; Lahiri, Shuvendu K.; Schafer, Max
Format: Conference paper
Language: English
Published: IEEE, 1 May 2022
Abstract: Static analysis has established itself as a weapon of choice for detecting security vulnerabilities. Taint analysis in particular is a very general and powerful technique, where security policies are expressed in terms of forbidden flows, either from untrusted input sources to sensitive sinks (in integrity policies) or from sensitive sources to untrusted sinks (in confidentiality policies). The appeal of this approach is that the taint-tracking mechanism has to be implemented only once, and can then be parameterized with different taint specifications (that is, sets of sources and sinks, as well as any sanitizers that render otherwise problematic flows innocuous) to detect many different kinds of vulnerabilities. But while techniques for implementing scalable inter-procedural static taint tracking are fairly well established, crafting taint specifications is still more of an art than a science, and in practice tends to involve a lot of manual effort. Past work has focussed on automated techniques for inferring taint specifications for libraries either from their implementation or from the way they tend to be used in client code. Among the latter, machine learning-based approaches have shown great promise. In this work we present our experience combining an existing machine-learning approach to mining sink specifications for JavaScript libraries with manual taint modelling in the context of GitHub's CodeQL analysis framework. We show that the machine-learning component can successfully infer many new taint sinks that either are not part of the manual modelling or are not detected due to analysis incompleteness. Moreover, we present techniques for organizing sink predictions using automated ranking and code-similarity metrics that allow an analysis engineer to efficiently sift through large numbers of predictions to identify true positives.
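Illustrative note: the abstract's central idea, that one taint-tracking mechanism can be reused with different taint specifications, can be sketched in a few lines. The sketch below is a toy reachability check over a hand-written data-flow graph; it is not the paper's method and does not use CodeQL's API. The class `TaintSpec`, the function `find_forbidden_flows`, and all node names (such as `req.query.id` or `db.query(arg0)`) are hypothetical names chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintSpec:
    """A taint specification: the only part that changes per vulnerability class."""
    sources: frozenset
    sinks: frozenset
    sanitizers: frozenset = frozenset()


def find_forbidden_flows(edges, spec):
    """Return sorted (source, sink) pairs linked by a path that avoids sanitizers.

    `edges` maps each node to the nodes its value flows into.
    """
    flows = set()
    for source in spec.sources:
        seen = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if node in spec.sinks:
                flows.add((source, node))
            for succ in edges.get(node, ()):
                # A sanitizer renders the flow innocuous, so do not go past it.
                if succ not in seen and succ not in spec.sanitizers:
                    seen.add(succ)
                    queue.append(succ)
    return sorted(flows)


# Hypothetical data-flow graph for a small JavaScript request handler.
EDGES = {
    "req.query.id": ["sqlText"],
    "sqlText": ["db.query(arg0)"],
    "req.query.name": ["escapeHtml(name)"],
    "escapeHtml(name)": ["res.send(arg0)"],
}
UNTRUSTED = frozenset({"req.query.id", "req.query.name"})

# Same tracking function, three different specifications.
SQLI = TaintSpec(UNTRUSTED, frozenset({"db.query(arg0)"}))
XSS = TaintSpec(UNTRUSTED, frozenset({"res.send(arg0)"}),
                frozenset({"escapeHtml(name)"}))
XSS_NO_SANITIZER = TaintSpec(UNTRUSTED, frozenset({"res.send(arg0)"}))

for name, spec in [("sqli", SQLI), ("xss", XSS), ("xss, no sanitizer", XSS_NO_SANITIZER)]:
    print(name, find_forbidden_flows(EDGES, spec))
```

In this toy setting, a sink that is missing from the specification is simply never reported, which is the kind of gap that inferred sink specifications are meant to close.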
Authors and affiliations:
1. Dutta, Saikat (saikatd2@illinois.edu), UIUC, Urbana, USA
2. Garbervetsky, Diego (diegog@dc.uba.ar), DC/UBA, ICC/CONICET, Buenos Aires, Argentina
3. Lahiri, Shuvendu K. (shuvendu.lahiri@microsoft.com), Microsoft Research, Seattle, USA
4. Schafer, Max (max-schaefer@github.com), GitHub, Oxford, UK
DOI: 10.1145/3510457.3513048
EISBN: 1665495901, 9781665495905
Subject terms: Analytical models; Codes; JavaScript; Machine learning; Manuals; Measurement; Static analysis; Taint Analysis; Weapons
URI: https://ieeexplore.ieee.org/document/9794015