Linguistic structure from a bottleneck on sequential information processing.
| Title: | Linguistic structure from a bottleneck on sequential information processing. |
|---|---|
| Authors: | Futrell R (University of California, Irvine, Irvine, CA, USA; rfutrell@uci.edu); Hahn M (Saarland University, Saarbrücken, Germany) |
| Source: | Nature Human Behaviour [Nat Hum Behav] 2025 Nov 24. Date of Electronic Publication: 2025 Nov 24. |
| Publication Model: | Ahead of Print |
| Publication Type: | Journal Article |
| Language: | English |
| Journal Information: | Publisher: Springer Nature Publishing Country of Publication: England NLM ID: 101697750 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 2397-3374 (Electronic) Linking ISSN: 23973374 NLM ISO Abbreviation: Nat Hum Behav Subsets: MEDLINE |
| Imprint Name(s): | Original Publication: [London] : Springer Nature Publishing, [2017]- |
| Abstract: | Human language has a distinct systematic structure, where utterances break into individually meaningful words that are combined to form phrases. Here we show that natural-language-like systematicity arises in codes that are constrained by a statistical measure of complexity called predictive information, also known as excess entropy. Predictive information is the mutual information between the past and future of a stochastic process. In simulations, we find that codes that minimize predictive information break messages into groups of approximately independent features that are expressed systematically and locally, corresponding to words and phrases. Next, drawing on cross-linguistic text corpora, we find that actual human languages are structured in a way that yields low predictive information compared with baselines at the levels of phonology, morphology, syntax and lexical semantics. Our results establish a link between the statistical and algebraic structure of language and reinforce the idea that these structures are shaped by communication under general cognitive constraints. (© 2025. The Author(s).) |
| Competing Interests: | The authors declare no competing interests. |
| Entry Date(s): | Date Created: 20251125 Latest Revision: 20251125 |
| Update Code: | 20251125 |
| DOI: | 10.1038/s41562-025-02336-w |
| PMID: | 41286011 |
| Database: | MEDLINE |
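The abstract defines predictive information (excess entropy) as the mutual information between the past and future of a stochastic process. As a minimal illustrative sketch (not the paper's code), for a stationary first-order Markov chain this quantity reduces to the mutual information between consecutive symbols, I(X_t; X_{t+1}), since the future depends on the past only through the most recent symbol. The transition matrix below is an assumed toy example:

```python
import numpy as np

# Toy two-state transition matrix (assumed for illustration).
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Stationary distribution: left eigenvector of T with eigenvalue 1.
evals, evecs = np.linalg.eig(T.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# Joint distribution P(X_t = i, X_{t+1} = j) = pi[i] * T[i, j].
joint = pi[:, None] * T
marg_next = joint.sum(axis=0)

# Predictive information for an order-1 Markov chain:
# E = I(X_t; X_{t+1}), in bits.
mask = joint > 0
indep = pi[:, None] * marg_next[None, :]
E = float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))
print(f"predictive information = {E:.4f} bits")
```

A nearly deterministic chain (transition probabilities close to 0 or 1) drives this quantity toward the entropy of the stationary distribution, while an i.i.d. process drives it to zero, which is the sense in which low predictive information constrains sequential structure.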