Early And Accurate Detection Of Apple Leaf Diseases Using Attention U-Net And Transformers
DOI:
https://doi.org/10.64252/8068db27
Keywords:
Crop disease detection, Deep learning, Attention mechanism, Convolutional Neural Network (CNN), Vision Transformer (ViT), Explainable AI.
Abstract
Early detection of crop diseases is critical for safeguarding agricultural productivity and ensuring food security. This study proposes an attention-based deep learning framework for robust classification of apple leaf diseases using the publicly available Plant Pathology 2020 and 2021 datasets. The smaller Plant Pathology 2020 dataset (3,651 images) was used for prototyping, while the larger Plant Pathology 2021 dataset (19,000 images) enabled large-scale multi-label classification under field conditions. The framework integrates Convolutional Neural Networks (CNNs) enhanced with Convolutional Block Attention Modules (CBAM), transformer-based architectures such as the Vision Transformer (ViT) and Swin Transformer, and a hybrid Attention U-Net model. Models were evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. On the Plant Pathology 2020 dataset, baseline CNNs achieved accuracies of 91.8–92.7%, and adding CBAM raised accuracy to 95.3–95.9%, with macro-F1 up to 0.96. On the Plant Pathology 2021 dataset, transformer-based models outperformed CNNs: ViT achieved 96.8% accuracy and Swin Transformer achieved 97.4% accuracy, with F1-scores above 0.96 and ROC-AUC values of 0.98. Explainability techniques such as Grad-CAM and transformer attention maps confirmed that the models focused on biologically relevant lesion regions. These results indicate that attention-driven architectures combine high classification performance with improved interpretability, making them well suited to precision agriculture applications.
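For readers unfamiliar with the macro-F1 metric reported above, the following is a minimal sketch of how it can be computed for multi-label predictions. It is not the authors' evaluation code: the function name `macro_f1`, the three example labels, and the toy predictions are all assumptions made for illustration.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Macro-averaged F1 for multi-label data.

    Each row is one image; each column is one label (1 = present, 0 = absent).
    F1 is computed per label and then averaged with equal weight, so rare
    disease classes count as much as common ones.
    """
    n_labels = len(y_true[0])
    per_label_f1: List[float] = []
    for k in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 0 and p[k] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 0)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        per_label_f1.append(2 * tp / denom if denom else 0.0)
    return sum(per_label_f1) / n_labels


# Hypothetical label order: [healthy, scab, rust]; toy data, not study results.
y_true = [[1, 0, 0], [0, 1, 0], [0, 1, 1], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 1]]
score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.3f}")
```

In this toy example the per-label F1 values are 1.0, 0.8, and about 0.667, so the macro average is about 0.822. Because every label is weighted equally, a model that neglects a rare disease class is penalized more heavily than it would be under plain accuracy.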




