Recognition Method of Component Names in Patent Documents Based on the Algorithm of Word Frequency Difference and Library of Left-segmentation Words

Bibliographic Details
Published in: Ji suan ji ke xue Vol. 50; No. 7; pp. 229-236
Main authors: Kong, Jiabin; Lyu, Jianwen; Liu, Jiangnan; Du, Wenxuan
Format: Journal Article
Language: Chinese
Published: Chongqing Guojia Kexue Jishu Bu, 1 July 2023
Editorial office of Computer Science
ISSN:1002-137X
Description
Abstract: Mechanical patent literature contains a large amount of domain knowledge, in which component names serve as information units. Because their word formation is flexible and changeable, component names tend to be unique, complex, and unfamiliar. The difficulty of recognizing component names accurately by computer is an obstacle to patent knowledge mining. To develop an efficient method for recognizing component names, the word-formation features of patent text statements are analyzed and extracted. Starting from external words related to component names, the characters to the left of the appended drawing reference signs (ADRS) are identified. Candidate names are then retrieved automatically from the text, and a set of candidate names is constructed. A word frequency difference algorithm is proposed to filter redundant characters from the set of candidate names. By building a left-segmentation library (LSL) dynamically, redundant characters which are not filtered
DOI:10.11896/jsjkx.220500068