Recognition Method of Component Names in Patent Documents Based on the Algorithm of Word Frequency Difference and Library of Left-segmentation Words

Bibliographic Details
Published in: Ji suan ji ke xue, Vol. 50, No. 7, pp. 229-236
Main authors: Kong, Jiabin; Lyu, Jianwen; Liu, Jiangnan; Du, Wenxuan
Format: Journal Article
Language: Chinese
Publication details: Chongqing Guojia Kexue Jishu Bu 01.07.2023
Editorial office of Computer Science
ISSN:1002-137X
Description
Summary: Mechanical patent literature contains a large amount of domain knowledge in which component names serve as information units. Because their word formation is flexible and changeable, component names tend to be unique, complex, and little-known expressions. The difficulty computers have in recognizing component names accurately is an obstacle to patent knowledge mining. To develop an efficient recognition method, the word-formation features of patent text statements are analyzed and extracted. Starting from external words related to component names, the characters to the left of the appended drawing reference signs (ADRS) are identified. Candidate names are then retrieved automatically from the text, and a set of candidate names is constructed. A word frequency difference algorithm is proposed to filter redundant characters from the set of candidate names. By building a left-segmentation library (LSL) dynamically, redundant characters which are not filtered
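
The candidate-retrieval idea in the summary (taking the characters to the left of each drawing reference sign) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the function names, the fixed 4-character window, treating ASCII digits as reference signs, and the sample sentence. The filtering step uses a plain longest-common-suffix heuristic as a stand-in. It is not the word frequency difference algorithm or the left-segmentation library from the article, whose details this record does not give.

```python
import re
from collections import defaultdict


def extract_candidates(text, window=4):
    """Collect up to `window` CJK characters immediately left of each numeral.

    Assumption: reference signs are runs of ASCII digits.
    """
    candidates = []
    for match in re.finditer(r"\d+", text):
        left = text[max(0, match.start() - window):match.start()]
        # Keep only the run of CJK characters that touches the numeral.
        run = re.search(r"[\u4e00-\u9fff]+$", left)
        if run:
            candidates.append((run.group(), match.group()))
    return candidates


def common_suffix(strings):
    """Return the longest suffix shared by all strings (stand-in filter)."""
    suffix = []
    for chars in zip(*(reversed(s) for s in strings)):
        if len(set(chars)) != 1:
            break
        suffix.append(chars[0])
    return "".join(reversed(suffix))


def filter_candidates(candidates):
    """Group candidates by reference sign and trim differing left characters."""
    groups = defaultdict(list)
    for name, sign in candidates:
        groups[sign].append(name)
    return {sign: common_suffix(names) for sign, names in groups.items()}


# Hypothetical sample sentence, not taken from the article.
text = "齿轮箱1与传动轴2连接，所述传动轴2带动齿轮箱1"
candidates = extract_candidates(text)
names = filter_candidates(candidates)
print(candidates)
print(names)
```

In this sample the raw candidates carry extra characters on the left (for example a conjunction in front of the component name), which is the kind of redundancy the summary says must be filtered. The suffix heuristic only works when a sign appears more than once with different left contexts, so it is a rough illustration at best.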
DOI:10.11896/jsjkx.220500068