Empirically determining the sample size for large-scale gene network inference algorithms
The performance of genome-wide gene regulatory network inference algorithms depends on the sample size. It is generally considered that the larger the sample size, the better the gene network inference performance. Nevertheless, there is not adequate information on determining the sample size for op...
| Published in: | IET systems biology Vol. 6; no. 2; p. 35 |
|---|---|
| Main Author: | Altay, G |
| Format: | Journal Article |
| Language: | English |
| Published: | England, 01.04.2012 |
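| Subjects: | Algorithms; Escherichia coli - genetics; Gene Regulatory Networks - genetics; Genome, Bacterial - genetics; Models, Statistical; Sample Size; Systems Biology - methods; Time Factors |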
| ISSN: | 1751-8849 |
| Abstract | The performance of genome-wide gene regulatory network inference algorithms depends on the sample size. It is generally assumed that the larger the sample size, the better the gene network inference performance. Nevertheless, there is little information on how to determine the sample size needed for optimal performance. In this study, the author systematically demonstrates the effect of sample size on information-theory-based gene network inference algorithms with an ensemble approach. The empirical results show that the inference performance of the considered algorithms tends to converge beyond a particular sample size region. As a specific example, a sample size of around 64 is sufficient to obtain most of the inference performance with respect to precision using the representative algorithm C3NET on synthetic steady-state data sets of Escherichia coli and on a time-series data set of a Homo sapiens subnetwork. The author verified the convergence result on a large, real data set of E. coli as well. The results give biologists evidence for better designing experiments to infer gene networks. Further, the effect of the cutoff on inference performance over various sample sizes is considered. [Includes supplementary material]. |
|---|---|
| Author | Altay, G |
| Author_xml | Altay, G (ga303@cam.ac.uk); University of Cambridge, Department of Oncology, Cambridge, UK |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/22519356$$D View this record in MEDLINE/PubMed |
| ContentType | Journal Article |
| DOI | 10.1049/iet-syb.2010.0091 |
| Discipline | Biology |
| ExternalDocumentID | 22519356 |
| Genre | Journal Article |
| ISICitedReferencesCount | 20 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Language | English |
| PMID | 22519356 |
| PublicationDate | 2012-04-01 |
| PublicationPlace | England |
| PublicationTitle | IET systems biology |
| PublicationTitleAlternate | IET Syst Biol |
| PublicationYear | 2012 |
| StartPage | 35 |
| SubjectTerms | Algorithms; Escherichia coli - genetics; Gene Regulatory Networks - genetics; Genome, Bacterial - genetics; Models, Statistical; Sample Size; Systems Biology - methods; Time Factors |
| Title | Empirically determining the sample size for large-scale gene network inference algorithms |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/22519356 https://www.proquest.com/docview/1009130245 |
| Volume | 6 |
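
The ensemble procedure summarized in the abstract (subsample the data at several sample sizes, infer a network from each subsample, and track precision until it levels off) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the function names are hypothetical, absolute Pearson correlation is swapped in for the mutual-information estimate that C3NET uses, and the tolerance that defines "converged" is an arbitrary choice.

```python
import numpy as np


def c3net_like(expr, cutoff=0.0):
    """Stand-in inference step (assumption, not the published C3NET):
    each gene keeps only the edge to its highest-scoring partner, scored by
    absolute Pearson correlation rather than mutual information."""
    score = np.abs(np.corrcoef(expr))  # genes x genes similarity matrix
    np.fill_diagonal(score, 0.0)       # ignore self-similarity
    edges = set()
    for gene in range(score.shape[0]):
        partner = int(np.argmax(score[gene]))
        if score[gene, partner] > cutoff:
            edges.add(frozenset((gene, partner)))  # undirected edge
    return edges


def precision(predicted, truth):
    """Fraction of predicted edges that are present in the reference network."""
    return len(predicted & truth) / len(predicted) if predicted else 0.0


def mean_precision_by_size(expr, truth, sizes, n_rep=20, seed=0):
    """For each sample size, average precision over an ensemble of random
    subsamples (columns of expr drawn without replacement)."""
    rng = np.random.default_rng(seed)
    n_samples = expr.shape[1]
    curve = {}
    for n in sizes:
        values = []
        for _ in range(n_rep):
            cols = rng.choice(n_samples, size=n, replace=False)
            values.append(precision(c3net_like(expr[:, cols]), truth))
        curve[n] = float(np.mean(values))
    return curve


def convergence_size(curve, tol=0.02):
    """Smallest sample size whose mean precision is no more than tol below
    the mean precision at the largest size tested (hypothetical criterion)."""
    sizes = sorted(curve)
    final = curve[sizes[-1]]
    for n in sizes:
        if final - curve[n] <= tol:
            return n
    return sizes[-1]
```

Given a genes-by-samples expression matrix `expr` and a reference edge set `truth`, calling `convergence_size(mean_precision_by_size(expr, truth, sizes=[8, 16, 32, 64, 128]))` would return the size at which the precision curve flattens under this criterion. The answer depends on the data, the scoring function, and the tolerance, so it should not be expected to reproduce the figure reported in the abstract.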