Evaluating BERT's Encoding of Intrinsic Semantic Features of OCR'd Digital Library Collections
The uncertainty caused by optical character recognition (OCR) noise has been a primary barrier for digital libraries (DL) to promote their curated datasets for research purposes, particularly when the datasets are fed into advanced language models with less transparency. To shed some light on this i...
| Published in: | 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 308-309 |
|---|---|
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.09.2021 |