Recognition Method of Component Names in Patent Documents Based on the Algorithm of Word Frequency Difference and Library of Left-segmentation Words


Detailed bibliography
Published in: Ji suan ji ke xue, Vol. 50, No. 7, pp. 229-236
Main authors: Kong, Jiabin; Lyu, Jianwen; Liu, Jiangnan; Du, Wenxuan
Format: Journal Article
Language: Chinese
Published: Chongqing Guojia Kexue Jishu Bu, 01.07.2023
Editorial office of Computer Science
ISSN:1002-137X
Description
Abstract: Mechanical patent literature contains a large amount of domain knowledge, in which component names serve as basic information units. Because component names are formed flexibly and variably, they tend to be unique, complex, and expressed in unfamiliar ways. The difficulty computers have in recognizing component names accurately is therefore an obstacle to patent knowledge mining. To obtain an efficient recognition method, the word-formation features of sentences in patent texts are analyzed and extracted. Starting from the external words associated with component names, the characters to the left of the appended drawing reference signs (ADRS) are identified. Candidate names are then retrieved from the texts automatically, and a set of candidate names is constructed. A word frequency difference algorithm is proposed to filter redundant characters from this set. By building a left-segmentation library (LSL) dynamically, redundant characters which are not filtered…
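The abstract does not spell out the word frequency difference algorithm, so the Python below is only a hypothetical sketch of the general idea, built on our own assumptions rather than the authors' method. It assumes that an ADRS is a run of digits directly after a Chinese character (as in 齿轮1, "gear 1"), that a candidate name is the run of at most `max_len` characters to its left, and that "frequency difference" means the drop in frequency when a candidate is extended one more character leftward. The names `left_contexts`, `recognize_names`, `ADRS_PATTERN`, `BOUNDARY`, `max_len`, and `min_ratio` are illustrative and do not come from the paper.

```python
import re
from collections import Counter, defaultdict

# Assumption: an ADRS is a run of digits that directly follows a CJK ideograph,
# e.g. the "1" in "齿轮1" ("gear 1").
ADRS_PATTERN = re.compile(r"(?<=[\u4e00-\u9fff])\d+")
# Assumption: a candidate name never crosses punctuation, whitespace, or digits.
BOUNDARY = set("，。、；：,.;: \n\t0123456789")


def left_contexts(text, max_len=6):
    """Yield (sign, left) for each ADRS, where left is at most max_len
    characters immediately to its left, cut at the nearest boundary."""
    for m in ADRS_PATTERN.finditer(text):
        start = m.start()
        i = start
        while i > 0 and start - i < max_len and text[i - 1] not in BOUNDARY:
            i -= 1
        yield m.group(), text[i:start]


def recognize_names(text, max_len=6, min_ratio=0.6):
    """Hypothetical frequency-difference filter: grow each name leftwards one
    character at a time and stop once the most frequent longer form covers
    fewer than min_ratio of the mentions that support the current form."""
    contexts = defaultdict(list)
    for sign, left in left_contexts(text, max_len):
        contexts[sign].append(left)

    names = {}
    for sign, lefts in contexts.items():
        best = ""
        for k in range(1, max_len + 1):
            # Mentions still consistent with the name found so far.
            support = [s for s in lefts if s.endswith(best)]
            longer = Counter(s[-k:] for s in support if len(s) >= k)
            if not longer:
                break  # every supporting mention has hit a boundary
            cand, freq = longer.most_common(1)[0]
            if freq < min_ratio * len(support):
                break  # sharp frequency drop: the extra character is context
            best = cand
        names[sign] = best
    return names


text = "齿轮1与连杆2连接。所述齿轮1通过轴3驱动连杆2，该齿轮1的转速由轴3决定。"
print(recognize_names(text))  # {'1': '齿轮', '2': '连杆', '3': '轴'}
```

Because a component referenced several times is usually preceded by varying words (所述, 该, 与), its frequency stays stable up to the true name boundary and then drops, which is what the `min_ratio` test detects. A name mentioned only once gets no such evidence: `recognize_names("所述齿轮1")` keeps the redundant prefix 所述. Leftover characters of this kind appear to be what the abstract says the dynamically built LSL is meant to handle.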
DOI:10.11896/jsjkx.220500068