Understanding encoder–decoder structures in machine learning using information measures
| Published in: | Signal Processing, Vol. 234, Art. 109983 |
|---|---|
| Main authors: | Silva, Jorge F.; Faraggi, Victor; Ramirez, Camilo; Egaña, Alvaro; Pavez, Eduardo |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.09.2025 |
| Subjects: | Representation learning; Information bottleneck; Invariant models; Encoder expressiveness; Digital models; Encoder–decoder design; Cross-entropy loss; Information sufficiency; Sparse models; Explainability |
| ISSN: | 0165-1684 |
| Online access: | Full text |
| Abstract | We present a theory of representation learning to model and understand the role of encoder–decoder design in machine learning (ML) from an information-theoretic angle. We use two main information concepts, information sufficiency (IS) and mutual information loss, to represent predictive structures in machine learning. Our first main result provides a functional expression that characterizes the class of probabilistic models consistent with an IS encoder–decoder latent predictive structure. This result formally justifies the encoder–decoder forward stages many modern ML architectures adopt to learn latent (compressed) representations for classification. To illustrate IS as a realistic and relevant model assumption, we revisit some known ML concepts and present some interesting new examples: invariant, robust, sparse, and digital models. Furthermore, our IS characterization allows us to tackle the fundamental question of how much performance could be lost, using the cross-entropy risk, when a given encoder–decoder architecture is adopted in a learning setting. Here, our second main result shows that a mutual information loss quantifies the lack of expressiveness attributed to the choice of a (biased) encoder–decoder ML design. Finally, we address the problem of universal cross-entropy learning with an encoder–decoder design, where necessary and sufficient conditions are established to meet this requirement. In all these results, Shannon’s information measures offer new interpretations and explanations for representation learning. |
|---|---|
| Highlights | • A new theory of representation learning to understand encoder–decoder design. • Information sufficiency to model and characterize predictive structures in learning. • Shannon’s information loss is proposed to measure the encoder’s lack of expressiveness. • New results for universal cross-entropy learning. • On the appropriateness of digital encoders and the information bottleneck for learning. |
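For orientation, here is a hedged formal reading of the two information concepts the abstract names. These are the standard information-theoretic definitions, stated in illustrative notation that may differ from the article's: a representation U = η(X) is information sufficient when encoding X into U preserves all predictive information about Y; the mutual information loss is the non-negative gap guaranteed by the data-processing inequality; and the last identity links that gap to the minimal cross-entropy risk achievable by a decoder operating on U.

```latex
% Standard definitions, given here for orientation; the notation is
% illustrative and may not match the article's.
U=\eta(X)\ \text{is information sufficient (IS) for } Y
  \;\Longleftrightarrow\; I(U;Y)=I(X;Y)
  \;\Longleftrightarrow\; Y \perp\!\!\!\perp X \mid U .

\underbrace{I(X;Y)-I(U;Y)}_{\text{mutual information loss}} \;\ge\; 0
  \quad\text{(data-processing inequality, since } Y \to X \to U\text{)}.

\min_{q(\cdot\mid u)}\ \mathbb{E}\!\left[-\log q(Y\mid U)\right]
  \;=\; H(Y\mid U)
  \;=\; H(Y\mid X) + \bigl(I(X;Y)-I(U;Y)\bigr).
```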
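To make that gap concrete numerically, below is a minimal sketch (an illustration, not code from the article): it builds a toy discrete joint pmf p(x, y), applies a hypothetical deterministic encoder U = f(X) that merges input symbols, and reports I(X;Y), I(U;Y), and the resulting mutual information loss. The pmf, the merge map f, and all names are assumptions chosen for the example.

```python
# Minimal sketch (illustrative, not from the article): mutual
# information loss I(X;Y) - I(U;Y) of a deterministic encoder
# U = f(X) on a toy discrete joint distribution.
import numpy as np

def mutual_information(p_xy: np.ndarray) -> float:
    """I(X;Y) in nats for a joint pmf given as a 2-D array."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p_xy > 0                         # skip zero-probability terms
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))

# Toy joint pmf over X in {0,1,2,3} and Y in {0,1}; entries sum to 1.
p_xy = np.array([[0.30, 0.05],
                 [0.20, 0.05],
                 [0.05, 0.15],
                 [0.05, 0.15]])

# Hypothetical encoder: merge the four x-symbols into two cells U = f(X).
f = np.array([0, 0, 1, 1])
p_uy = np.zeros((2, p_xy.shape[1]))
for x, u in enumerate(f):
    p_uy[u] += p_xy[x]                      # push-forward pmf of (U, Y)

i_xy = mutual_information(p_xy)
i_uy = mutual_information(p_uy)
# The data-processing inequality guarantees a non-negative loss.
print(f"I(X;Y) = {i_xy:.4f} nats")
print(f"I(U;Y) = {i_uy:.4f} nats")
print(f"mutual information loss = {i_xy - i_uy:.4f} nats")
```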
| ArticleNumber | 109983 |
| Authors | Silva, Jorge F.; Faraggi, Victor; Ramirez, Camilo; Egaña, Alvaro; Pavez, Eduardo |
| Author details | 1. Jorge F. Silva (ORCID 0000-0002-0256-282X; josilva@ing.uchile.cl), Information and Decision System Group (IDS), Universidad de Chile, Chile. 2. Victor Faraggi (ORCID 0000-0001-8167-0310; victor.faraggi@ug.uchile.cl), Information and Decision System Group (IDS), Universidad de Chile, Chile. 3. Camilo Ramirez (ORCID 0000-0002-2774-0012; camilo.ramirez@ug.uchile.cl), Information and Decision System Group (IDS), Universidad de Chile, Chile. 4. Alvaro Egaña (ORCID 0000-0001-8720-4783; aegana@alges.cl), Advanced Laboratory for Geostatistical Supercomputing (ALGES), Universidad de Chile, Chile. 5. Eduardo Pavez (ORCID 0000-0001-8985-2872; pavezcar@usc.edu), Department of Electrical and Computer Engineering, University of Southern California, United States of America. |
| Cited by (DOI) | 10.3390/en18184867; 10.1016/j.jmapro.2025.02.053; 10.1016/j.sigpro.2025.110208 |
| ContentType | Journal Article |
| Copyright | 2025 |
| DOI | 10.1016/j.sigpro.2025.109983 |
| Discipline | Engineering |
| ISSN | 0165-1684 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Representation learning; Information bottleneck; Invariant models; Encoder expressiveness; Digital models; Encoder–decoder design; Cross-entropy loss; Information sufficiency; Sparse models; Explainability |
| Language | English |
| PublicationDate | September 2025 (2025-09-01) |
| PublicationTitle | Signal Processing |
| Publisher | Elsevier B.V. |
| StartPage | 109983 |
| Title | Understanding encoder–decoder structures in machine learning using information measures |
| URI | https://dx.doi.org/10.1016/j.sigpro.2025.109983 |
| Volume | 234 |