Evaluating BERT's Encoding of Intrinsic Semantic Features of OCR'd Digital Library Collections

The uncertainty caused by optical character recognition (OCR) noise has been a primary barrier for digital libraries (DL) to promote their curated datasets for research purposes, particularly when the datasets are fed into advanced language models with less transparency. To shed some light on this i...

Bibliographic Details
Published in: 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 308-309
Main Authors: Jiang, Ming, Hu, Yuerong, Worthey, Glen, Dubnicek, Ryan C, Underwood, Ted, Downie, J Stephen
Format: Conference Proceeding
Language: English
Published: IEEE 01.09.2021