Subspace exploration: Bounds on Projected Frequency Estimation

Given an n × d dimensional dataset A, a projection query specifies a subset C ⊆ [d] of columns which yields a new n × |C| array. We study the space complexity of computing data analysis functions over such subspaces, including heavy hitters and norms, when the subspaces are revealed only after obser...

Bibliographic details
Published in: Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Vol. 2021; p. 273
Main authors: Cormode, Graham; Dickens, Charlie; Woodruff, David P
Format: Journal Article
Language: English
Publication details: 20.06.2021
ISSN: 1055-6338
Abstract Given an n × d dimensional dataset A, a projection query specifies a subset C ⊆ [d] of columns which yields a new n × |C| array. We study the space complexity of computing data analysis functions over such subspaces, including heavy hitters and norms, when the subspaces are revealed only after observing the data. We show that this important class of problems is typically hard: for many problems, we show 2^{Ω(d)} lower bounds. However, we present upper bounds which demonstrate space dependency better than 2^d. That is, for c, c' ∈ (0, 1) and a parameter N = 2^d, an N^c-approximation can be obtained in space min(N^{c'}, n), showing that it is possible to improve on the naïve approach of keeping information for all 2^d subsets of d columns. Our results are based on careful constructions of instances using coding theory and novel combinatorial reductions that exhibit such space-approximation tradeoffs.
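To make the notion of a projection query concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It stores the whole dataset and answers a projected frequency or heavy-hitter query exactly once the column subset C is revealed. The function names, the 0-indexed columns, the threshold parameter phi, and the toy dataset are all assumptions introduced here for illustration.

```python
from collections import Counter

def project(rows, cols):
    """Restrict every row to the columns in `cols` (0-indexed, an assumption)."""
    return [tuple(row[j] for j in cols) for row in rows]

def projected_frequencies(rows, cols):
    """Count how often each projected row pattern occurs."""
    return Counter(project(rows, cols))

def heavy_hitters(rows, cols, phi):
    """Patterns whose projected frequency is at least phi * n (hypothetical helper)."""
    freqs = projected_frequencies(rows, cols)
    threshold = phi * len(rows)
    return {pattern: count for pattern, count in freqs.items() if count >= threshold}

# Toy dataset A with n = 4 rows and d = 3 binary columns (made up for illustration).
A = [
    (0, 1, 0),
    (0, 1, 1),
    (1, 1, 0),
    (0, 0, 0),
]

# The query subset C is only known after the data has been seen.
C = [0, 1]
freqs = projected_frequencies(A, C)
hh = heavy_hitters(A, C, phi=0.5)

# Number of possible column subsets a precomputing approach would have to cover.
num_subsets = 2 ** len(A[0])
```

This baseline keeps all n rows, which corresponds to the n term in the min(N^{c'}, n) bound; precomputing an answer for every one of the 2^d column subsets is the naïve alternative the abstract refers to.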
Author Dickens, Charlie
Woodruff, David P
Cormode, Graham
Author_xml – sequence: 1
  givenname: Graham
  surname: Cormode
  fullname: Cormode, Graham
– sequence: 2
  givenname: Charlie
  surname: Dickens
  fullname: Dickens, Charlie
– sequence: 3
  givenname: David P
  surname: Woodruff
  fullname: Woodruff, David P
ContentType Journal Article
DBID 7X8
DOI 10.1145/3452021.3458312
DatabaseName MEDLINE - Academic
DatabaseTitle MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
Database_xml – sequence: 1
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Computer Science
ISICitedReferencesCount 1
ISSN 1055-6338
IngestDate Fri Jul 11 08:54:09 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PQID 2559675358
PQPubID 23479
ParticipantIDs proquest_miscellaneous_2559675358
PublicationCentury 2000
PublicationDate 2021-06-20
PublicationDateYYYYMMDD 2021-06-20
PublicationDate_xml – month: 06
  year: 2021
  text: 2021-06-20
  day: 20
PublicationDecade 2020
PublicationTitle Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
PublicationYear 2021
SSID ssj0047223
Score 2.1621509
SourceID proquest
SourceType Aggregation Database
StartPage 273
Title Subspace exploration: Bounds on Projected Frequency Estimation
URI https://www.proquest.com/docview/2559675358
Volume 2021