A Method of Web Information Automatic Extraction Based on XML
With the increasing speed of the internet and the growing amount of data it contains, users find it more and more difficult to obtain useful information from the web. Extracting accurate information from the Web efficiently has become an urgent problem. Web information...
| Published in: | Applied Mechanics and Materials Vol. 20-23; pp. 178-183 |
|---|---|
| Main authors: | Zhang, Na; Gu, Jun Hua; Song, Jie; Liu, Yan Liu |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Zurich: Trans Tech Publications Ltd, 01.01.2010 |
| Subject: | |
| ISBN: | 0878492879, 9780878492879 |
| ISSN: | 1660-9336, 1662-7482 |
| Abstract | With the increasing speed of the internet and the growing amount of data it contains, users find it more and more difficult to obtain useful information from the web. Extracting accurate information from the Web efficiently has become an urgent problem, and Web information extraction technology has emerged to address it. The proposed method of XML-based automatic Web information extraction works in three steps: it standardizes the HTML document with a data translation algorithm, builds an extraction rule base by learning the XPath expressions of sample pages, and applies that rule base to automatically extract data from pages of the same kind. The results show that this approach should lead to higher recall and precision ratios, and that the extracted result should be self-describing, which makes it convenient for building data extraction systems for individual domains. |
|---|---|
| Author | Liu, Yan Liu; Zhang, Na; Song, Jie; Gu, Jun Hua |
| Author_xml | Zhang, Na (zhang_na00@163.com); Gu, Jun Hua (jhgu@hebut.edu.cn); Song, Jie (songjie@scse.hebut.edu.cn); Liu, Yan Liu (hbsdliuyanliu@163.com). Organization for all authors: Hebei University of Technology, School of Computer Science and Engineering |
| Cites_doi | 10.1109/icdsc.2001.918966 |
| ContentType | Journal Article |
| Copyright | 2010 Trans Tech Publications Ltd Copyright Trans Tech Publications Ltd. Jan 2010 |
| DOI | 10.4028/www.scientific.net/AMM.20-23.178 |
| Discipline | Engineering |
| EISSN | 1662-7482 |
| EndPage | 183 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | XSL Information Extraction XPath Learning XML |
| Language | English |
| License | https://www.scientific.net/PolicyAndEthics/PublishingPolicies https://www.scientific.net/license/TDM_Licenser.pdf |
| Notes | Selected, peer-reviewed papers from the 2010 International Conference on Information Technology for Manufacturing Systems (ITMS 2010), Macao, China, Jan. 30-31, 2010 |
| PageCount | 6 |
| PublicationDate | 2010-01-01 |
| PublicationPlace | Zurich |
| PublicationTitle | Applied Mechanics and Materials |
| PublicationYear | 2010 |
| Publisher | Trans Tech Publications Ltd |
| StartPage | 178 |
| Title | A Method of Web Information Automatic Extraction Based on XML |
| URI | https://www.scientific.net/AMM.20-23.178 https://www.proquest.com/docview/1443807553 |
| Volume | 20-23 |

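The three steps named in the abstract (standardizing HTML, learning XPath expressions from samples, and reusing them on pages of the same kind) can be illustrated with a minimal Python sketch. This record does not include the paper's actual algorithms, so everything here is an assumption made for illustration: the helper names `standardize`, `learn_rule`, `extract`, and `to_xml` are hypothetical, the "learning" step is reduced to recording the positional path of one labelled sample value, and the sample pages are invented. Only the Python standard library is used, whose `xml.etree.ElementTree` supports a limited XPath subset.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class _TreeBuilder(HTMLParser):
    """Builds a well-formed element tree from loosely written HTML."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:  # void tags never receive children
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        node = self.stack[-1]
        if len(node):  # text after a child element is stored as that child's tail
            node[-1].tail = (node[-1].tail or "") + data
        else:
            node.text = (node.text or "") + data


def standardize(html):
    """Step 1 (assumed form): turn HTML into a well-formed XML tree."""
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def learn_rule(root, sample_text):
    """Step 2 (simplified): positional path of the first element with this text."""
    def walk(node, path):
        if (node.text or "").strip() == sample_text:
            return path
        counts = {}
        for child in node:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            found = walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
            if found is not None:
                return found
        return None
    return walk(root, ".")


def extract(root, rules):
    """Step 3: apply the rule base {field: xpath} to a page of the same kind."""
    record = {}
    for field, xpath in rules.items():
        node = root.find(xpath)
        record[field] = (node.text or "").strip() if node is not None else None
    return record


def to_xml(record):
    """Serialize the result as self-describing XML, one element per field."""
    rec = ET.Element("record")
    for field, value in record.items():
        ET.SubElement(rec, field).text = value
    return ET.tostring(rec, encoding="unicode")


# Invented sample page with labelled values, and a second page of the same kind.
sample = "<html><body><div><h1>Blue Kettle</h1><p>Price: <b>19.90</b><br>In stock</div></body></html>"
target = "<html><body><div><h1>Red Teapot</h1><p>Price: <b>7.50</b><br>Sold out</div></body></html>"

tree = standardize(sample)
rules = {"name": learn_rule(tree, "Blue Kettle"), "price": learn_rule(tree, "19.90")}
print(to_xml(extract(standardize(target), rules)))
```

Purely positional paths like these break as soon as the page layout shifts, so a practical rule base would presumably generalize over several samples; the sketch only shows how a learned XPath expression can be reused across structurally similar pages.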
