InspectJS: Leveraging Code Similarity and User-Feedback for Effective Taint Specification Inference for JavaScript

Bibliographic details
Published in:	2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 165-174
Main authors:	Dutta, Saikat; Garbervetsky, Diego; Lahiri, Shuvendu K.; Schafer, Max
Format:	Conference paper
Language:	English
Published:	IEEE, 2022-05-01
Subjects:	Analytical models; Codes; JavaScript; Machine learning; Manuals; Measurement; Static analysis; Taint Analysis; Weapons

Abstract	Static analysis has established itself as a weapon of choice for detecting security vulnerabilities. Taint analysis in particular is a very general and powerful technique, where security policies are expressed in terms of forbidden flows, either from untrusted input sources to sensitive sinks (in integrity policies) or from sensitive sources to untrusted sinks (in confidentiality policies). The appeal of this approach is that the taint-tracking mechanism has to be implemented only once, and can then be parameterized with different taint specifications (that is, sets of sources and sinks, as well as any sanitizers that render otherwise problematic flows innocuous) to detect many different kinds of vulnerabilities. But while techniques for implementing scalable inter-procedural static taint tracking are fairly well established, crafting taint specifications is still more of an art than a science, and in practice tends to involve a lot of manual effort. Past work has focussed on automated techniques for inferring taint specifications for libraries either from their implementation or from the way they tend to be used in client code. Among the latter, machine learning-based approaches have shown great promise. In this work we present our experience combining an existing machine-learning approach to mining sink specifications for JavaScript libraries with manual taint modelling in the context of GitHub's CodeQL analysis framework. We show that the machine-learning component can successfully infer many new taint sinks that either are not part of the manual modelling or are not detected due to analysis incompleteness. Moreover, we present techniques for organizing sink predictions using automated ranking and code-similarity metrics that allow an analysis engineer to efficiently sift through large numbers of predictions to identify true positives.
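The abstract's point that one taint-tracking mechanism can be parameterized with different specifications can be illustrated with a minimal sketch. It is not the paper's implementation and not CodeQL: the toy straight-line program, the `TaintSpec` class, the `find_flows` function, and every function name in the program (`read_query_param`, `escape_sql`, `run_sql`, `load_api_key`, `send_http`) are hypothetical, chosen only to show an integrity policy and a confidentiality policy running on the same engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintSpec:
    """Hypothetical taint specification: sources, sinks, and sanitizers by function name."""
    sources: frozenset
    sinks: frozenset
    sanitizers: frozenset = frozenset()


def find_flows(program, spec):
    """Return (sink_function, statement_index) for each sink call that receives tainted data.

    `program` is a list of (result_var, called_function, argument_vars) statements,
    executed in order with no branching (a deliberate simplification).
    """
    tainted = set()
    findings = []
    for index, (result, func, args) in enumerate(program):
        arg_tainted = any(arg in tainted for arg in args)
        if func in spec.sinks and arg_tainted:
            findings.append((func, index))
        if func in spec.sources:
            tainted.add(result)        # sources introduce taint
        elif func in spec.sanitizers:
            tainted.discard(result)    # sanitizers make their result safe
        elif arg_tainted:
            tainted.add(result)        # otherwise taint propagates from arguments
        else:
            tainted.discard(result)    # result overwritten with untainted data
    return findings


# Toy program with invented function names (assumptions for illustration only).
program = [
    ("q", "read_query_param", ()),
    ("safe", "escape_sql", ("q",)),
    ("r1", "run_sql", ("safe",)),
    ("r2", "run_sql", ("q",)),
    ("key", "load_api_key", ()),
    ("r3", "send_http", ("key",)),
]

# Integrity policy: untrusted input must not reach a sensitive sink unsanitized.
integrity = TaintSpec(
    sources=frozenset({"read_query_param"}),
    sinks=frozenset({"run_sql"}),
    sanitizers=frozenset({"escape_sql"}),
)

# Confidentiality policy: sensitive data must not reach an untrusted sink.
confidentiality = TaintSpec(
    sources=frozenset({"load_api_key"}),
    sinks=frozenset({"send_http"}),
)

print(find_flows(program, integrity))
print(find_flows(program, confidentiality))
```

Only the specification changes between the two calls; the tracking loop is shared. Under this sketch, a sink missing from the `sinks` set is never reported, which is the gap that sink-specification inference is meant to narrow.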
Author affiliations:
Dutta, Saikat (saikatd2@illinois.edu), UIUC, Urbana, USA
Garbervetsky, Diego (diegog@dc.uba.ar), DC/UBA. ICC/CONICET, Buenos Aires, Argentina
Lahiri, Shuvendu K. (shuvendu.lahiri@microsoft.com), Microsoft Research, Seattle, USA
Schafer, Max (max-schaefer@github.com), GitHub, Oxford, UK
DOI	10.1145/3510457.3513048
EISBN	1665495901 9781665495905
URI	https://ieeexplore.ieee.org/document/9794015