Black Boxes, White Noise: Similarity Detection for Neural Functions

Bibliographic details
Published in: arXiv.org
Main authors: Farmahinifarahani, Farima; Lopes, Cristina V
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 20 February 2023
ISSN: 2331-8422
Abstract: Similarity, or clone, detection has important applications in detecting copyright violations and software theft, in code search, and in identifying malicious components. There are now many open source and proprietary clone detectors for programs written in traditional programming languages. However, the increasing adoption of deep learning models in software poses a challenge to these tools: these models implement functions that are inscrutable black boxes. As more software includes these DNN functions, new techniques are needed to assess the similarity between deep learning components of software. Previous work has unveiled techniques for comparing the representations learned at various layers of deep neural network models by feeding canonical inputs to the models. Our goal is to be able to compare DNN functions when canonical inputs are not available, as may be the case in many application scenarios. The challenge, then, is to generate appropriate inputs and to identify a metric that, for those inputs, is capable of representing the degree of functional similarity between two comparable DNN functions. Our approach uses random input with values between -1 and 1, in a shape that is compatible with what the DNN models expect. We then compare the outputs by performing correlation analysis. Our study shows that it is possible to perform similarity analysis even in the absence of meaningful canonical inputs. The response of two comparable DNN functions to random inputs exposes those functions' similarity, or lack thereof. Of all the metrics tried, we find that Spearman's rank correlation coefficient is the most powerful and versatile, although in special cases other methods and metrics are more expressive. We present a systematic empirical study comparing the effectiveness of several similarity metrics using a dataset of 56,355 classifiers collected from GitHub. This is accompanied by a sensitivity analysis that reveals how certain training-related properties of the models affect the effectiveness of the similarity metrics. To the best of our knowledge, this is the first work that shows how similarity of DNN functions can be detected by using random inputs. Our study of correlation metrics, and the identification of the Spearman correlation coefficient as the most powerful among them for this purpose, establishes a complete and practical method for DNN clone detection that can be used in the design of new tools. It may also serve as inspiration for other program analysis tasks whose approaches break in the presence of DNN components.
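The procedure outlined in the abstract (random inputs in [-1, 1], then rank correlation of the outputs) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy linear-plus-softmax classifiers, the helper names (`make_model`, `similarity`, `ranks`), the sample count, and the choice to flatten all class scores into one vector before correlating are assumptions made for illustration only; the abstract does not specify these details.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def make_model(weights):
    """Hypothetical stand-in for a DNN classifier: linear layer + softmax."""
    def model(x):
        logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]
    return model


def similarity(model_a, model_b, input_size, n_samples=200, seed=0):
    """Feed both models the same random inputs in [-1, 1] and correlate
    their flattened outputs (flattening is an assumption of this sketch)."""
    rng = random.Random(seed)
    out_a, out_b = [], []
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(input_size)]
        out_a.extend(model_a(x))
        out_b.extend(model_b(x))
    return spearman(out_a, out_b)


# Toy data: a base model, a near-clone (slightly perturbed weights),
# and an unrelated model with independently drawn weights.
w_rng = random.Random(1)
base = [[w_rng.gauss(0, 1) for _ in range(8)] for _ in range(3)]
clone = [[w + w_rng.gauss(0, 0.01) for w in row] for row in base]
other = [[w_rng.gauss(0, 1) for _ in range(8)] for _ in range(3)]
m_base, m_clone, m_other = make_model(base), make_model(clone), make_model(other)

print("base vs clone:", similarity(m_base, m_clone, 8))
print("base vs other:", similarity(m_base, m_other, 8))
```

In this toy setting the near-clone should score much closer to 1 than the unrelated model. For real models, an established routine such as `scipy.stats.spearmanr` would be a natural replacement for the hand-written `spearman` helper.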
Copyright 2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI: 10.48550/arxiv.2302.10005
Genre: Working Paper/Pre-Print
Open access: yes
Peer reviewed: no
Subject terms: Applications programs; Artificial neural networks; Correlation analysis; Correlation coefficients; Deep learning; Effectiveness; Empirical analysis; Functionals; Machine learning; Programming languages; Sensitivity analysis; Similarity; Software; Theft; White noise
URI: https://www.proquest.com/docview/2778493772