Understanding encoder–decoder structures in machine learning using information measures

Bibliographic details
Published in: Signal Processing, Vol. 234, Article 109983
Main authors: Silva, Jorge F., Faraggi, Victor, Ramirez, Camilo, Egaña, Alvaro, Pavez, Eduardo
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.09.2025
ISSN: 0165-1684
Online access: Full text
Abstract We present a theory of representation learning to model and understand the role of encoder–decoder design in machine learning (ML) from an information-theoretic angle. We use two main information concepts, information sufficiency (IS) and mutual information loss, to represent predictive structures in machine learning. Our first main result provides a functional expression that characterizes the class of probabilistic models consistent with an IS encoder–decoder latent predictive structure. This result formally justifies the encoder–decoder forward stages that many modern ML architectures adopt to learn latent (compressed) representations for classification. To illustrate IS as a realistic and relevant model assumption, we revisit some known ML concepts and present some interesting new examples: invariant, robust, sparse, and digital models. Furthermore, our IS characterization allows us to tackle the fundamental question of how much performance could be lost, under the cross-entropy risk, when a given encoder–decoder architecture is adopted in a learning setting. Here, our second main result shows that a mutual information loss quantifies the lack of expressiveness attributed to the choice of a (biased) encoder–decoder ML design. Finally, we address the problem of universal cross-entropy learning with an encoder–decoder design, where necessary and sufficient conditions are established to meet this requirement. In all these results, Shannon's information measures offer new interpretations and explanations for representation learning.
Highlights:
• A new theory of representation learning to understand encoder–decoder design.
• Information sufficiency to model and characterize the predictive structures in learning.
• Shannon's information loss is proposed to measure the encoder's lack of expressiveness.
• New results for universal cross-entropy learning.
• On the appropriateness of digital encoders and the information bottleneck for learning.
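The two information measures named in the abstract have standard Shannon forms. The display below is a sketch in our own notation (input X, label Y, latent representation U = f(X) produced by a deterministic encoder f); the paper's exact setup may differ:

```latex
% By the data-processing inequality, I(f(X);Y) <= I(X;Y), so the
% mutual information loss of an encoder f is non-negative:
\[
  \underbrace{I(X;Y) - I\bigl(f(X);Y\bigr)}_{\text{mutual information loss of } f} \;\ge\; 0,
\]
% with equality if and only if f is information sufficient (IS) for Y,
% i.e., the representation U = f(X) retains all of X's predictive
% information about the label Y.
```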
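To make these quantities concrete, the following self-contained toy computation compares a sufficient encoder with a lossy one. The joint distribution and both encoders are our own illustrative choices, not taken from the paper: the label Y depends on X only through the parity of X, so u = x mod 2 is information sufficient, while u = x // 2 discards all predictive information.

```python
import numpy as np

# Toy joint pmf p(x, y) with X in {0,1,2,3} and Y in {0,1}.
# By construction, p(y|x) depends on x only through x mod 2.
p_xy = np.array([
    [0.20, 0.05],
    [0.05, 0.20],
    [0.20, 0.05],
    [0.05, 0.20],
])

def mutual_information(p):
    """Mutual information (in bits) between the row and column variables."""
    pr = p.sum(axis=1, keepdims=True)   # marginal of the row variable
    pc = p.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / (pr @ pc)[mask])))

def push_through_encoder(p, f, num_u):
    """Joint pmf of (U, Y) for a deterministic encoder u = f(x)."""
    p_uy = np.zeros((num_u, p.shape[1]))
    for x in range(p.shape[0]):
        p_uy[f[x]] += p[x]
    return p_uy

i_xy = mutual_information(p_xy)
for name, f in [("sufficient encoder u = x mod 2", [0, 1, 0, 1]),
                ("lossy encoder      u = x // 2 ", [0, 0, 1, 1])]:
    i_uy = mutual_information(push_through_encoder(p_xy, f, 2))
    print(f"{name}: I(U;Y) = {i_uy:.4f} bits, "
          f"loss I(X;Y) - I(U;Y) = {i_xy - i_uy:.4f} bits")
```

The first encoder reports zero loss (it is IS for Y), while the second loses all of I(X;Y) ≈ 0.278 bits; in the abstract's terms, that loss quantifies the lack of expressiveness of the second (biased) encoder design.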
ArticleNumber 109983
Author Silva, Jorge F.
Egaña, Alvaro
Ramirez, Camilo
Pavez, Eduardo
Faraggi, Victor
Author_xml – sequence: 1
  givenname: Jorge F.
  orcidid: 0000-0002-0256-282X
  surname: Silva
  fullname: Silva, Jorge F.
  email: josilva@ing.uchile.cl
  organization: Information and Decision System Group (IDS), Universidad de Chile, Chile
– sequence: 2
  givenname: Victor
  orcidid: 0000-0001-8167-0310
  surname: Faraggi
  fullname: Faraggi, Victor
  email: victor.faraggi@ug.uchile.cl
  organization: Information and Decision System Group (IDS), Universidad de Chile, Chile
– sequence: 3
  givenname: Camilo
  orcidid: 0000-0002-2774-0012
  surname: Ramirez
  fullname: Ramirez, Camilo
  email: camilo.ramirez@ug.uchile.cl
  organization: Information and Decision System Group (IDS), Universidad de Chile, Chile
– sequence: 4
  givenname: Alvaro
  orcidid: 0000-0001-8720-4783
  surname: Egaña
  fullname: Egaña, Alvaro
  email: aegana@alges.cl
  organization: Advanced Laboratory for Geostatistical Supercomputing (ALGES), Universidad de Chile, Chile
– sequence: 5
  givenname: Eduardo
  orcidid: 0000-0001-8985-2872
  surname: Pavez
  fullname: Pavez, Eduardo
  email: pavezcar@usc.edu
  organization: Department of Electrical and Computer Engineering, University of Southern California, United States of America
ContentType Journal Article
Copyright 2025
DOI 10.1016/j.sigpro.2025.109983
Discipline Engineering
ISSN 0165-1684
IsPeerReviewed true
IsScholarly true
Keywords Representation learning
Information bottleneck
Invariant models
Encoder expressiveness
Digital models
Encoder–decoder design
Cross-entropy loss
Information sufficiency
Sparse models
Explainability
Language English
ORCID 0000-0002-2774-0012
0000-0001-8720-4783
0000-0001-8167-0310
0000-0001-8985-2872
0000-0002-0256-282X
PublicationCentury 2000
PublicationDate September 2025
PublicationDateYYYYMMDD 2025-09-01
PublicationDate_xml – month: 09
  year: 2025
  text: September 2025
PublicationDecade 2020
PublicationTitle Signal processing
PublicationYear 2025
Publisher Elsevier B.V
StartPage 109983
URI https://dx.doi.org/10.1016/j.sigpro.2025.109983
Volume 234