Image Captioning Using Deep Learning Techniques Like CNN-LSTM
DOI:
https://doi.org/10.64252/8mg8z355

Keywords:
CNN, LSTM, Image Captioning, Deep Learning, Attention Mechanism, Agriculture Sector

Abstract
The capacity to detect and understand visual content in images has significant implications across various sectors, including automated driving, assistive technologies, content management, and agriculture. This paper explores a hybrid deep learning architecture combining Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks for the task of image detection and captioning. Our method uses a CNN as a feature extractor to encode visual inputs; the extracted features are then passed to an LSTM, which generates descriptive textual captions. We employ a novel combination of pre-trained models, InceptionV3 and VGG16, and demonstrate our system's efficacy through experiments on the Flickr8k and Flickr30k datasets. The results, measured by BLEU scores, show promising improvements over current state-of-the-art approaches. We discuss the hardware and software tools used, the experimental setup, and the practical challenges encountered. This work concludes with potential applications and future research directions, aiming to further bridge the gap between visual data and natural language processing.
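To illustrate the encoder-decoder pipeline described above, the following is a minimal sketch of a CNN-LSTM captioning model, assuming TensorFlow/Keras, an InceptionV3 encoder, and illustrative hyperparameters (vocabulary size, caption length, embedding width) that are not taken from the paper:

```python
# Minimal sketch of a CNN-LSTM captioning model (merge-style decoder).
# Assumptions: TensorFlow/Keras, InceptionV3 as the frozen feature extractor,
# and placeholder values for vocabulary size and maximum caption length.
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000    # assumed vocabulary size
MAX_LEN = 34         # assumed maximum caption length (tokens)
FEATURE_DIM = 2048   # InceptionV3 global-average-pooled feature size

# --- Encoder: pre-trained CNN used purely as a feature extractor ---
cnn = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
cnn.trainable = False

def extract_features(image_batch):
    """image_batch: float array of shape (N, 299, 299, 3) with values in [0, 255]."""
    return cnn.predict(preprocess_input(image_batch))

# --- Decoder: LSTM over partial captions, conditioned on the image features ---
img_input = Input(shape=(FEATURE_DIM,))
img_dense = Dense(256, activation="relu")(Dropout(0.5)(img_input))

seq_input = Input(shape=(MAX_LEN,))
seq_embed = Embedding(VOCAB_SIZE, 256, mask_zero=True)(seq_input)
seq_lstm = LSTM(256)(Dropout(0.5)(seq_embed))

merged = add([img_dense, seq_lstm])            # fuse image and text representations
hidden = Dense(256, activation="relu")(merged)
output = Dense(VOCAB_SIZE, activation="softmax")(hidden)  # next-word distribution

caption_model = Model(inputs=[img_input, seq_input], outputs=output)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")
caption_model.summary()
```

At inference time, a caption would be produced word by word: the partial caption generated so far is fed back through `seq_input` together with the fixed image feature vector, and the most probable next word (or a beam-search candidate) is appended until an end token or `MAX_LEN` is reached.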