Trigonometric-Euclidean-Smoother Interpolator (TESI) for continuous time-series and non-time-series data augmentation for deep neural network applications in agriculture
| Published in: | Computers and Electronics in Agriculture Vol. 206; p. 107646 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 1 March 2023 |
| Subjects: | |
| ISSN: | 0168-1699, 1872-7107 |
| Online access: | Full text |
| Summary: | • A new method is proposed for data augmentation for deep neural network use. • The method uses a trigonometric-Euclidean space to generate the new data points. • The new method is compared with deep learning-based augmentation methods. • The new method retained the data's original distribution, achieving the lowest loss. • The coefficient of determination (R²) range increased from 0.60–0.68 to 0.77–0.99.
Biomass estimation, fertilisation, and crop production reflect crop yield potential. Predicting these variables allows crop cultivars with high yield potential to be selected. Deep neural networks (DNNs) can predict such crop variables. However, DNNs are data-hungry algorithms that overfit or underfit on small datasets, and collecting large datasets is expensive and laborious. Generating large synthetic datasets is therefore preferable. This study aims to: (i) develop a trigonometric-Euclidean-smoother interpolator (TESI) for continuous time-series and non-time-series data augmentation to prevent DNNs from underfitting or overfitting; (ii) compare the TESI performance with that of the tabular variational autoencoder (TVAE) and the conditional tabular generative adversarial network (CTGAN); and (iii) compare the DNN performance before and after data augmentation. Two time-series datasets (oil palm production and rice production) and two non-time-series datasets (fertiliser and rice total aboveground biomass, TAGB) were augmented using the TESI, TVAE, and CTGAN algorithms. The TESI retained the features' original probability distribution in the four datasets. The C-TESI achieved the lowest mean absolute error percentage (MAEP) on the oil palm (0.60–2.85%), rice (0.77–1.72%), and fertiliser (2.04–2.21%) datasets. The TESI retained variance inflation factor (VIF) ranges of less than 10 on the four datasets: it either retained a VIF range of 1.99–10.06 or reduced the VIF range to 1.55–6.66. Furthermore, the TESI either retained the Spearman's rank correlation coefficient (rs) range of 0.79–0.97 or increased it to 0.81–0.99 on the four datasets. The DNN achieved the highest coefficient of determination (R²) range (0.77–0.99) and the lowest root mean squared error (RMSE) range (2.8E+01–8.1E+05) on the four datasets augmented with the TESI.
The Q-TESI, C-TESI, and L-TESI outperformed the LN-TESI in retaining the features' original probability distribution, minimising the augmentation loss, reducing the VIF, increasing the rs, and decreasing DNN underfitting and overfitting. The Q-TESI, C-TESI, and L-TESI may approximate the nonlinear changes of crop phenology in time-spaced sampling, thereby reducing the cost of sampling for scientists. In addition, they intensify zonal synthetic sampling, thereby reducing sampling labour. |
|---|---|
| DOI: | 10.1016/j.compag.2023.107646 |
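
The summary reports several standard diagnostics (VIF, Spearman's rs, R², RMSE) without stating how they are computed. The sketch below shows the textbook definitions of these four quantities in NumPy, as one way a reader might check an augmented dataset. It is not the authors' code: the function names are hypothetical, the Spearman helper assumes no tied values, and the MAEP is omitted because the record does not define it.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def vif(X):
    """Variance inflation factor of each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least
    squares fit of column j on all other columns plus an intercept.
    Perfectly collinear columns give a division by zero (infinite VIF).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        out.append(1.0 / (1.0 - r2(target, design @ coef)))
    return np.array(out)


def spearman(x, y):
    """Spearman's rank correlation (assumption: no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])
```

Under these definitions, a VIF near 1 indicates a feature that is almost uncorrelated with the others, while values around 10 are commonly read as a sign of strong multicollinearity, which is the threshold the summary refers to.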