View in EDS

Comparative Evaluation of GPT-4o, GPT-OSS-120B and Llama-3.1-8B-Instruct Language Models in a Reproducible CV-to-JSON Extraction Pipeline.

Saved in:

Bibliographic Details
Title:	Comparative Evaluation of GPT-4o, GPT-OSS-120B and Llama-3.1-8B-Instruct Language Models in a Reproducible CV-to-JSON Extraction Pipeline.
Authors:	Nawalny, Marcin, Łępicki, Mateusz, Latkowski, Tomasz, Bujak, Sebastian, Bukowski, Michał, Świderski, Bartosz, Baranik, Grzegorz, Nowak, Bogusz, Zakowicz, Robert, Dobrakowski, Łukasz, Oczeretko, Agnieszka, Sadowski, Piotr, Szlaga, Konrad, Kubica, Bartłomiej, Kurek, Jarosław
Source:	Applied Sciences (2076-3417); Jan2026, Vol. 16 Issue 1, p217, 36p
Subject Terms:	LANGUAGE models, EMPLOYEE recruitment, JOB resumes, ANONYMITY, GENERATIVE pre-trained transformers, DATA protection laws
Abstract:	Recruitment automation increasingly relies on Large Language Models (LLMs) for extracting structured information from unstructured CVs and job postings. However, production data often arrive as heterogeneous, privacy-sensitive PDFs, limiting reproducibility and compliance. This study introduces a deterministic, GDPR-aligned pipeline that converts recruitment documents into structured, anonymized Markdown and subsequently into validated JSON ready for downstream AI processing. The workflow combines the Docling PDF-to-Markdown converter with a two-pass anonymization protocol and evaluates three LLM back-ends—GPT-4o (Azure, frozen proprietary), GPT-OSS-120B and Llama-3.1-8B-Instruct—using identical prompts and schema constraints under near-zero-temperature decoding. Each model's output was assessed across 2280 multilingual CVs using two complementary metrics: reference-based completeness and content similarity. The proprietary GPT-4o achieved perfect schema coverage and served as the reproducibility baseline, while the open-weight models reached 73–79% completeness and 59–72% content similarity depending on section complexity. Llama-3.1-8B-Instruct performed strongly on standardized sections such as contact and legal, whereas GPT-OSS-120B better-handled less frequent narrative fields. The results demonstrate that fully deterministic, auditable document extraction is achievable with both proprietary and open LLMs when guided by strong schema validation and anonymization. The proposed pipeline bridges the gap between document ingestion and reliable, bias-aware data preparation for AI-driven recruitment systems. [ABSTRACT FROM AUTHOR]
	Copyright of Applied Sciences (2076-3417) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database:	Complementary Index

Full Text Finder

Nájsť tento článok vo Web of Science

Description
Abstract:	Recruitment automation increasingly relies on Large Language Models (LLMs) for extracting structured information from unstructured CVs and job postings. However, production data often arrive as heterogeneous, privacy-sensitive PDFs, limiting reproducibility and compliance. This study introduces a deterministic, GDPR-aligned pipeline that converts recruitment documents into structured, anonymized Markdown and subsequently into validated JSON ready for downstream AI processing. The workflow combines the Docling PDF-to-Markdown converter with a two-pass anonymization protocol and evaluates three LLM back-ends—GPT-4o (Azure, frozen proprietary), GPT-OSS-120B and Llama-3.1-8B-Instruct—using identical prompts and schema constraints under near-zero-temperature decoding. Each model's output was assessed across 2280 multilingual CVs using two complementary metrics: reference-based completeness and content similarity. The proprietary GPT-4o achieved perfect schema coverage and served as the reproducibility baseline, while the open-weight models reached 73–79% completeness and 59–72% content similarity depending on section complexity. Llama-3.1-8B-Instruct performed strongly on standardized sections such as contact and legal, whereas GPT-OSS-120B better-handled less frequent narrative fields. The results demonstrate that fully deterministic, auditable document extraction is achievable with both proprietary and open LLMs when guided by strong schema validation and anonymization. The proposed pipeline bridges the gap between document ingestion and reliable, bias-aware data preparation for AI-driven recruitment systems. [ABSTRACT FROM AUTHOR]
ISSN:	20763417
DOI:	10.3390/app16010217