A System and Method for Image-To-Text Conversion

Technology #ua18-094

Researchers
Marek Rychlik
President, Mathematics
Managed By
Lewis Humphreys
Licensing Manager (520) 626-2574

Title: A System and Method for Image-to-Text Conversion

 

Invention: This technology is software that converts scanned, printed, and handwritten text into Unicode text. Combining machine learning algorithms with image processing, the system improves on conventional optical character recognition (OCR) technology.

 

Background: OCR technologies provide a simpler alternative to entering metadata manually. By digitizing documents, they have changed document management and helped many offices become paperless workspaces. Current OCR algorithms are designed for modern layouts, so they are limited in converting older documents, whose layout structures differ. Additionally, existing OCR software lacks language support for Pashto and Traditional Chinese. The technology presented here was developed to annotate large library collections and convert them into accessible formats, particularly Pashto documents whose originals were destroyed in a fire in Kabul.

 

Applications:

  • Document preservation
  • Accounting/bookkeeping
  • Transport and logistics
  • Government
  • Retail

 

Advantages:

  • Annotates documents of any size, from small to large
  • Provides additional language support (Pashto, Traditional Chinese)
  • Converts existing documents into Unicode text

 

Licensing Manager:

John Geikler

JohnG@tla.arizona.edu

(520) 626-4605