Application of supervised and unsupervised algorithms to find the important features related to barley ('Hordeum vulgare' L.) grain yield: A new vista in data mining

Bibliographic Details
Published in: Australian Journal of Crop Science Vol. 8; no. 12; pp. 1590-1596
Main Authors: Ehsan Bijanzadeh, Ruhollah Naderi
Format: Journal Article
Language: English
Published: Lismore, N.S.W.: Southern Cross Publishers, 01.12.2014
ISSN: 1835-2693
Description
Summary: Data mining methods help crop physiologists search large datasets for patterns among agronomic factors and may assist in selecting the most important features for an individual site and field. To find the main features contributing to barley grain yield (the output), supervised and unsupervised algorithms, in the form of feature selection and attribute weighting, were run using SPSS Clementine 11.1 and Rapid Miner 5.0.001, respectively. The data were collected from the literature on barley physiology in Iran available on the http://sid.ir website. A total of 10563 data points were extracted, comprising 21 features and 503 records. Ranking by feature selection indicated that, of 20 input features, 10 features, including culture type, location, irrigation regime, biological yield, nitrogen applied to the soil, rainfall amount, and genotype, had a value of 1.0 and were the most important features related to barley grain yield. A general linear model relating location to barley grain yield showed that Kermanshah, with 3721 kg/ha, differed significantly (p=0.01) from Badjgah, Sararood and Gachsaran under dryland farming. Across ten attribute weighting algorithms, 13 features had weights = 0.5, and biological yield, location, genotype, and culture type were the most important features related to grain yield, highlighted by 7, 6, 5 and 5 algorithms, respectively. Overall, feature classification by supervised and unsupervised algorithms can provide a comprehensive view of important features such as biological yield, location, culture type, irrigation regime, nitrogen applied and genotype, which contribute to grain yield improvement.
Bibliography: Australian Journal of Crop Science, Vol. 8, No. 12, Dec 2014, 1590-1596
Informit, Melbourne (Vic)
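
Note: The summary ranks features by how many attribute weighting algorithms highlight them. The tallying step might be sketched as below. This is only an illustration and not the authors' Rapid Miner workflow: the algorithm names (`algo_a` and so on), the weight values, and the reading of the cutoff as "weight at or above 0.5" are all assumptions made for the example, and none of the numbers come from the study.

```python
from collections import Counter

# Hypothetical normalized weights (0..1): algorithm -> {feature: weight}.
# These values are invented for illustration; they are not the study's data.
weights = {
    "algo_a": {"biological yield": 0.9, "location": 0.7, "genotype": 0.4},
    "algo_b": {"biological yield": 0.6, "location": 0.3, "genotype": 0.8},
    "algo_c": {"biological yield": 1.0, "location": 0.5, "genotype": 0.2},
}


def count_supporting_algorithms(weights, threshold=0.5):
    """Count, per feature, how many algorithms assign a weight >= threshold.

    Treating the cutoff as ">=" is an assumption of this sketch.
    """
    counts = Counter()
    for per_feature in weights.values():
        for feature, weight in per_feature.items():
            if weight >= threshold:
                counts[feature] += 1
    return counts


counts = count_supporting_algorithms(weights)

# Rank features by number of supporting algorithms (ties broken by name).
ranking = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for feature, n_algorithms in ranking:
    print(f"{feature}: highlighted by {n_algorithms} algorithm(s)")
```

With real output from ten weighting algorithms in place of the toy dictionary, the same tally would give per-feature counts of the kind reported in the summary (for example, a feature highlighted by 7 of the algorithms).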