Text Information Extraction From Images Using Deep CNN

Authors

  • Samit Arun Ingole
  • Dr. Shitalkumar A Jain

DOI:

https://doi.org/10.64252/4mn15v76

Keywords:

Convolutional Neural Networks, Tesseract, Optical Character Recognition, Long Short-Term Memory, Text Extraction.

Abstract

This paper develops an Optical Character Recognition (OCR) technique for extracting text from images. It uses Tesseract-OCR together with OpenCV-Python and the Pillow Python library for image preprocessing, text detection, and text extraction. The preprocessing steps (grayscale conversion, image cleaning, thresholding, and edge detection) make the text more legible and easier for the OCR engine to recognize. The text returned by PyTesseract is then stripped of unwanted whitespace so that it can be fed directly into downstream data-processing pipelines. The approach accepts most popular image formats, including JPEG, PNG, TIFF, PSP, and GIF. It also handles noisy, distorted, or low-contrast images by applying adaptive filters and contrast-enhancement algorithms. The system is intended for applications such as document scanning, automated data entry, real-time text analysis, and assistive tools for visually impaired individuals, and it can be extended to text translation, handwriting recognition, and data extraction from invoices, forms, and other official documents. Future work may adopt stronger models, such as EAST for text detection and CRNN or transformer-based recognizers, and deploy the OCR service on cloud platforms for large-scale processing. Because extracting and processing text from diverse image formats is useful across many industries, the scalability and efficiency of the present work add to its value.
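The grayscale-conversion and thresholding steps can be illustrated with a short sketch. It is not the authors' code, which relies on OpenCV: it assumes an image held as nested lists of (R, G, B) tuples, converts it to luma with the ITU-R BT.601 weights (0.299, 0.587, 0.114), and chooses a single global threshold with Otsu's method, which the abstract does not name. The function names `to_grayscale`, `otsu_threshold`, and `binarize` are hypothetical. In OpenCV the same two steps correspond to `cv2.cvtColor` with `cv2.COLOR_BGR2GRAY` and `cv2.threshold` with the `cv2.THRESH_OTSU` flag.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255
Gray = List[List[int]]        # 8-bit intensities, 0 = black, 255 = white


def to_grayscale(image: List[List[Pixel]]) -> Gray:
    """Convert RGB pixels to 8-bit luma with ITU-R BT.601 weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


def otsu_threshold(gray: Gray) -> int:
    """Return the threshold t that maximizes between-class variance.

    Class 0 holds pixels <= t (dark ink), class 1 pixels > t (light paper).
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # pixel count of class 0
    sum0 = 0  # intensity sum of class 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor (1 / total**2).
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(gray: Gray, t: int) -> Gray:
    """Map pixels above t to white (255) and the rest to black (0)."""
    return [[255 if v > t else 0 for v in row] for row in gray]
```

Otsu's method picks the split that best separates dark ink from light paper, which suits evenly lit scans; for uneven lighting, a locally adaptive method such as OpenCV's `cv2.adaptiveThreshold` is usually the better choice.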
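The abstract lists edge detection among the preprocessing steps without naming a detector. The sketch below assumes the Sobel operator, applied in plain Python to a grayscale image stored as nested lists of 0-255 integers; `sobel_edges` is a hypothetical helper, and OpenCV users would typically call `cv2.Sobel` or `cv2.Canny` instead. It approximates the gradient magnitude as |Gx| + |Gy| and clips the result to 255.

```python
from typing import List

# 3x3 Sobel kernels: KX responds to vertical edges, KY to horizontal ones.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(gray: List[List[int]]) -> List[List[int]]:
    """Return an edge-strength map clipped to 0-255.

    Border pixels, which lack a full 3x3 neighbourhood, are set to 0.
    The kernels are applied without flipping (correlation); for Sobel this
    only changes the sign of Gx and Gy, which abs() removes anyway.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = gray[y + dy - 1][x + dx - 1]
                    gx += KX[dy][dx] * v
                    gy += KY[dy][dx] * v
            # L1 approximation of the gradient magnitude, clipped to 8 bits.
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out
```

Strong responses trace the outlines of glyphs, which is why edge maps are often used to locate candidate text regions before recognition.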
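For low-contrast input, the abstract mentions contrast-enhancement algorithms without specifying one. A minimal sketch, assuming global linear contrast stretching as a simpler stand-in for adaptive methods such as CLAHE (available in OpenCV as `cv2.createCLAHE`), maps the darkest pixel to 0 and the brightest to 255. `stretch_contrast` is a hypothetical name, and the image is assumed to be nested lists of 0-255 integers.

```python
from typing import List


def stretch_contrast(gray: List[List[int]]) -> List[List[int]]:
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        # Flat image: nothing to stretch, so return an unchanged copy.
        return [row[:] for row in gray]
    # Integer arithmetic keeps both endpoints exact (lo -> 0, hi -> 255).
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in gray]
```

A single extreme pixel (dust or glare) can blunt this global mapping, which is one reason percentile-based or tile-based adaptive variants are often preferred for photographed documents.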
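The whitespace cleanup applied to the recognized text can be sketched as a small post-processing function. What "removing unwanted spaces" involves is an assumption here: runs of spaces and tabs are collapsed, lines are trimmed, and empty lines are dropped, including the trailing form feed that Tesseract normally emits as a page separator. `clean_ocr_text` is a hypothetical name; in a full pipeline its input would be the string returned by `pytesseract.image_to_string`.

```python
def clean_ocr_text(raw: str) -> str:
    """Normalize whitespace in raw OCR output.

    - collapses runs of spaces and tabs inside a line to one space
    - strips leading and trailing whitespace on each line
    - drops lines that end up empty (str.splitlines also treats a
      form feed as a line boundary, so page separators disappear)
    """
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

Dropping every blank line flattens paragraph breaks; if downstream tools need them, collapse consecutive blank lines to one instead of removing them.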

Published

2025-06-02

Section

Articles

How to Cite

Ingole, S. A., & Jain, S. A. (2025). Text Information Extraction From Images Using Deep CNN. International Journal of Environmental Sciences, 1442-1455. https://doi.org/10.64252/4mn15v76