Image Captioning Using Deep Learning Techniques Like CNN-LSTM

Authors

  • Ranjana B Battur, Arundhati Nelli, Sushant Mangasuli, Vijay S Rajpurohit, Prashant Y Niranjan, Sayeda Anjum K Munshi, Alok Gaddi

DOI:

https://doi.org/10.64252/8mg8z355

Keywords:

CNN, LSTM, Image Captioning, Deep learning, Attention Mechanism, Agriculture sector

Abstract

The capacity to detect and understand visual content in images has significant implications across various sectors, including automated driving, assistive technologies, content management, and agriculture, as well as for general-purpose imagery. This paper explores a hybrid deep learning architecture that combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks for the task of image detection and captioning. Our method uses a CNN as a feature extractor for the visual input; the extracted features are then passed to an LSTM, which generates a descriptive textual caption. We employ a novel combination of pre-trained models, InceptionV3 and VGG16, and demonstrate our system's efficacy through experiments on the Flickr8k and Flickr30k datasets. The results, measured by BLEU scores, show promising improvements over current state-of-the-art approaches. We discuss the hardware and software tools used, the experimental setup, and the practical challenges encountered. The paper concludes with potential applications and future research directions, aiming to further bridge the gap between visual data and natural language processing.
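
Since the abstract reports results as BLEU scores, a small illustration of how such a score is computed may help. The abstract does not say which BLEU variant or tooling the authors used, so the following is only a minimal sketch under assumptions: unsmoothed, sentence-level BLEU with uniform n-gram weights, applied to pre-tokenized captions. The function names (ngrams, modified_precision, sentence_bleu) are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # Clip limit per n-gram: its highest count in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n grams."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    # Brevity penalty uses the reference length closest to the candidate length
    # (ties resolved in favour of the shorter reference).
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    # Made-up captions for illustration only; not taken from Flickr8k or Flickr30k.
    generated = "a dog runs on grass".split()
    human = ["a dog runs on the grass".split()]
    print(sentence_bleu(generated, human, max_n=2))
```

Reported BLEU figures are usually corpus-level and often smoothed, so scores from this sketch are not directly comparable to published numbers; an established implementation is the safer choice for actual evaluation.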

Published

2025-10-01

Section

Articles

How to Cite

Image Captioning Using Deep Learning Techniques Like CNN-LSTM. (2025). International Journal of Environmental Sciences, 1354-1363. https://doi.org/10.64252/8mg8z355