Empirically determining the sample size for large-scale gene network inference algorithms

The performance of genome-wide gene regulatory network inference algorithms depends on the sample size. It is generally considered that the larger the sample size, the better the gene network inference performance. Nevertheless, there is not adequate information on determining the sample size for op...

Bibliographic Details
Published in: IET systems biology, Vol. 6, no. 2, p. 35
Main Author: Altay, G
Format: Journal Article
Language: English
Published: England 01.04.2012
Subjects:
ISSN: 1751-8849
Abstract The performance of genome-wide gene regulatory network inference algorithms depends on the sample size. It is generally assumed that the larger the sample size, the better the inference performance. Nevertheless, there is little information on how to determine the sample size needed for optimal performance. In this study, the author systematically demonstrates the effect of sample size on information-theory-based gene network inference algorithms with an ensemble approach. The empirical results show that the inference performance of the considered algorithms tends to converge beyond a particular sample size region. As a specific example, a sample size of around 64 is sufficient to obtain most of the inference performance with respect to precision using the representative algorithm C3NET on synthetic steady-state data sets of Escherichia coli and on a time-series data set of a Homo sapiens subnetwork. The author also verified the convergence result on a large, real data set of E. coli. The results give biologists evidence for better designing experiments to infer gene networks. Further, the effect of the cutoff on inference performance over various sample sizes is considered. [Includes supplementary material].
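The kind of experiment the abstract describes (repeatedly inferring a network at several sample sizes and averaging the precision over an ensemble of data sets) can be illustrated with a small toy sketch. This is not the author's code and not C3NET: it assumes a hypothetical chain-shaped "true" network, simulated Gaussian data, and absolute Pearson correlation as a stand-in for mutual information. All function names and parameters are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_chain(n_samples, n_genes, rng, a=0.7):
    """Toy data (assumption): gene g depends only on gene g-1.

    Each gene has unit variance; neighbours correlate with strength a.
    Returns one list of n_samples values per gene.
    """
    data = [[0.0] * n_samples for _ in range(n_genes)]
    for s in range(n_samples):
        value = rng.gauss(0, 1)
        data[0][s] = value
        for g in range(1, n_genes):
            value = a * value + math.sqrt(1 - a * a) * rng.gauss(0, 1)
            data[g][s] = value
    return data


def infer_max_edge(data):
    """Keep, for every gene, only the edge to its strongest partner.

    Loosely inspired by a maximum-edge-per-gene rule; uses |correlation|
    in place of mutual information and applies no significance cutoff.
    """
    n = len(data)
    edges = set()
    for i in range(n):
        best = max(
            (j for j in range(n) if j != i),
            key=lambda j: abs(pearson(data[i], data[j])),
        )
        edges.add(frozenset((i, best)))
    return edges


def precision(inferred, true_edges):
    """Fraction of inferred edges that are in the true network."""
    return len(inferred & true_edges) / len(inferred)


def mean_precision(n_samples, n_genes=10, ensemble_size=50, seed=0):
    """Average precision over an ensemble of simulated data sets."""
    rng = random.Random(seed)
    true_edges = {frozenset((g, g + 1)) for g in range(n_genes - 1)}
    scores = [
        precision(infer_max_edge(simulate_chain(n_samples, n_genes, rng)), true_edges)
        for _ in range(ensemble_size)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64, 128):
        print(n, round(mean_precision(n), 3))
```

Printing the mean precision for increasing sample sizes gives a curve whose flattening can be inspected by eye. Where the curve flattens in this toy setting depends entirely on the assumed network and noise level, so it says nothing about the sample size region reported in the article.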
Author: Altay, G
Affiliation: University of Cambridge, Department of Oncology, Cambridge, UK
Email: ga303@cam.ac.uk
BackLink https://www.ncbi.nlm.nih.gov/pubmed/22519356$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1038_s41598_019_50885_8
crossref_primary_10_1109_TBCAS_2013_2288035
crossref_primary_10_3390_plants12152767
crossref_primary_10_1109_TNSE_2025_3563303
crossref_primary_10_1016_j_tplants_2015_06_013
crossref_primary_10_3389_fmed_2021_652824
crossref_primary_10_1371_journal_pone_0089815
crossref_primary_10_1186_s12918_017_0440_2
crossref_primary_10_1016_j_pharmthera_2013_01_016
ContentType Journal Article
DOI 10.1049/iet-syb.2010.0091
Discipline Biology
ExternalDocumentID 22519356
Genre Journal Article
ISICitedReferencesCount 20
ISSN 1751-8849
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
PMID 22519356
PublicationCentury 2000
PublicationDate 2012-Apr
20120401
PublicationDateYYYYMMDD 2012-04-01
PublicationDate_xml – month: 04
  year: 2012
  text: 2012-Apr
PublicationDecade 2010
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle IET systems biology
PublicationTitleAlternate IET Syst Biol
PublicationYear 2012
StartPage 35
SubjectTerms Algorithms
Escherichia coli - genetics
Gene Regulatory Networks - genetics
Genome, Bacterial - genetics
Models, Statistical
Sample Size
Systems Biology - methods
Time Factors
Title Empirically determining the sample size for large-scale gene network inference algorithms
URI https://www.ncbi.nlm.nih.gov/pubmed/22519356
https://www.proquest.com/docview/1009130245
Volume 6
WOSCitedRecordID wos000302939100001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D