Memory-based Language Models: An Efficient, Explainable, and Eco-friendly Approach to Large Language Modeling

Detailed bibliography
Title: Memory-based Language Models: An Efficient, Explainable, and Eco-friendly Approach to Large Language Modeling
Authors: van den Bosch, Antal, Risco Patón, Ainhoa, Buijse, Teun, Berck, Peter, van Gompel, Maarten
Contributors: Lund University, Joint Faculties of Humanities and Theology, Units, Lund University Humanities Lab, Lunds universitet, Humanistiska och teologiska fakulteterna, Fakultetsgemensamma verksamheter, Humanistlaboratoriet, Originator
Subjects: Natural Sciences, Computer and Information Sciences, Computer Sciences, Naturvetenskap, Data- och informationsvetenskap (Datateknik), Datavetenskap (Datalogi)
Description: We present memory-based language modeling as an efficient, eco-friendly alternative to deep neural network-based language modeling. It offers log-linearly scalable next-token prediction performance and strong memorization capabilities. Implementing fast approximations of k-nearest neighbor classification, memory-based language modeling leaves a relatively small ecological footprint both in training and in inference mode, as it relies fully on CPUs and attains low token latencies. Its internal workings are simple and fully transparent. We compare our implementation of memory-based language modeling, OLIFANT, with GPT-2 and GPT-Neo on next-token prediction accuracy, estimated emissions and speeds, and offer some deeper analyses of the model.
Access URL: https://arxiv.org/abs/2510.22317
Database: SwePub
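The description's idea of next-token prediction through k-nearest neighbor classification can be illustrated with a minimal sketch. This is not the OLIFANT implementation and does not include its fast approximations: it is an exact, brute-force k-NN over stored contexts. The function names, the fixed context size, the overlap similarity, and the majority vote are all assumptions chosen for illustration.

```python
from collections import Counter


def build_memory(tokens, context_size=3):
    """Store every (context, next-token) pair found in the training tokens."""
    memory = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        memory.append((context, tokens[i]))
    return memory


def overlap(a, b):
    """Count positions where two equal-length contexts hold the same token."""
    return sum(x == y for x, y in zip(a, b))


def predict_next(memory, context, k=3):
    """Return the majority next token among the k most similar stored contexts.

    Hypothetical helper: similarity is plain positional overlap, and ties
    keep the order in which instances were stored.
    """
    context = tuple(context)
    ranked = sorted(memory, key=lambda item: overlap(item[0], context), reverse=True)
    votes = Counter(next_token for _, next_token in ranked[:k])
    return votes.most_common(1)[0][0]
```

In this sketch, "training" only stores instances, and the neighbors that produced a prediction can be inspected directly, which reflects the transparency the description mentions. The brute-force search is linear in memory size, so it does not demonstrate the low latencies the description reports.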