Trust evaluation of health websites by eliminating phishing websites and using similarity techniques

Bibliographic details
Published in: Concurrency and computation, Volume 35, Issue 21
Main authors: Gupta, Sarika; Bansal, Himani
Format: Journal Article
Language: English
Publication details: Hoboken, USA John Wiley & Sons, Inc 25.09.2023
Wiley Subscription Services, Inc
ISSN:1532-0626, 1532-0634
Description
Summary: Users turn to search engines to find health information on websites. Our research considers content-rich health websites, because wrong information on these websites can threaten life. A search engine returns a list of URLs related to the search keyword, and users generally follow the top websites displayed. Newly constructed websites do not have ratings, hit counts, or reviews, so the search engine does not display them in its top ranks. As a result, a newly constructed website with the same content as the top-ranked website loses the user's trust. Another problem is that the Google search engine also displays phishing website URLs, which appear similar to genuine websites. To solve these problems and enhance users' trust in health websites that are not at the top of the search results, we propose an approach that extracts all URLs based on the keyword and identifies the legitimate URLs using a machine learning classifier. Address bar features, domain name features, and HTML and JavaScript features were identified for the dataset used to obtain legitimate URLs. Three classifiers (Decision Tree, Random Forest, and Support Vector Machine) were trained and evaluated. Decision Tree has the highest training accuracy (94.125), testing accuracy (92.75), and precision score (96.97). The cross-validation score of all three models is almost 93. Therefore, Decision Tree is used to identify legitimate websites. After the list of legitimate URLs is obtained, all the content of each legitimate website is extracted. The semantic similarity between the content of the top-ranked legitimate website and that of the other legitimate websites is computed using natural language processing techniques. The websites are then ranked by similarity and assigned a trust value ranging from highly trustable to less trustable. We compared and correlated our results with the Web of Trust, a reputation tool for trust analysis, and achieved a positive correlation. Thus, our approach removes phishing websites and enhances trust in other websites that are not at the top of the search engine results.
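The similarity-based ranking step mentioned in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: it substitutes a simple term-frequency cosine similarity (a lexical measure) for the semantic similarity the article refers to, and the function names, the example URLs, the thresholds 0.7 and 0.3, and the intermediate label "moderately trustable" are all assumptions made only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[term] * vec_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page has no measurable similarity
    return dot / (norm_a * norm_b)


def rank_by_similarity(reference_text, candidates):
    """Return (url, score) pairs sorted from most to least similar.

    `candidates` maps each legitimate URL to its extracted page text.
    """
    scored = [
        (url, cosine_similarity(reference_text, text))
        for url, text in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def trust_label(score, high=0.7, low=0.3):
    """Map a similarity score to a trust label (thresholds are hypothetical)."""
    if score >= high:
        return "highly trustable"
    if score >= low:
        return "moderately trustable"
    return "less trustable"


if __name__ == "__main__":
    # Hypothetical content of the top-ranked legitimate website.
    top_ranked = "vitamin d supports bone health"
    # Hypothetical legitimate websites that rank lower in the search results.
    pages = {
        "https://example.org/a": "vitamin d supports bone health and immunity",
        "https://example.org/b": "buy cheap watches online now",
    }
    for url, score in rank_by_similarity(top_ranked, pages):
        print(url, round(score, 3), trust_label(score))
```

In this sketch, a lower-ranked page that shares most of its wording with the top-ranked page receives a high score and therefore a high trust label. A semantic measure such as one based on sentence embeddings would be closer to what the summary describes.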
DOI:10.1002/cpe.7695